{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5d40d974",
   "metadata": {},
   "source": [
    "# 99 - Writting to and reading from different formats\n",
    "\n",
    "**Description:** This notebook contains an example on how to write a list of CIMA objects into the FOF_CT volumetric     \n",
    "\n",
    "The following index has links to the different sections of the notebook. Some sections may have additional indexes to access the subparts.  \n",
    "If you are using VS Code to visualize the notebook, the links will not work. This is due to how VC Code render the notebook. To navigate the notebook use the outline panel. To know more about it, check this [link](https://code.visualstudio.com/docs/getstarted/userinterface#_outline-view)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5820879c",
   "metadata": {},
   "source": [
    "Content:\n",
    "\n",
    "- [Library imports and functions](#library-imports-and-functions)\n",
    "- [Setting some variables](#setting-some-variables)\n",
    "- [Exporting the data with CIMA](#exporting-the-data-with-cima)\n",
    "- [Writting to FOF-CT volumetric format](#writting-to-fof-ct-volumetic-format)\n",
    "- [Reading the FOF-CT volumetric format](#reading-the-fof-ct-volumetic-format)\n",
    "- [Reading from XYZ coordinates format](#reading-from-xyz-coordinate-format)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8cde2c0d",
   "metadata": {},
   "source": [
    "## Library imports and functions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07f8c8b7",
   "metadata": {},
   "source": [
    "[Back to the general index](#morphological-features-extraction)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4608fdf1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "from pathlib import Path\n",
    "from itertools import product\n",
    "\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "\n",
    "from cima.parsers.parser_csv import CSVParser\n",
    "from cima.utils.misc import fof_volumetric_ct_writer, fof_volumetric_ct_reader, fof_bs_ct_reader\n",
    "\n",
    "## Regular expressions to get the metadata from the filename\n",
    "regex_patterns = {\n",
    "    \"nucleusID\": re.compile(r\"(?i)nuc(\\d+)\"), # This searches for \"Nuc\" or \"nuc\" followed by digits\n",
    "    \"cellID\": re.compile(r\"(?i)cell(\\d+)\"), # This searches for \"Cell\" or \"cell\" followed by digits\n",
    "    \"locationID\" : re.compile(r\"(?i)loc-?(\\d+)\"), # This searches for \"Loc\" or \"loc\" followed by optional hyphen and digits\n",
    "    \"date\" : re.compile(r\"(\\d{4}[-\\.]\\d{2}[-\\.]\\d{2})\"), # This searches for dates in the format YYYY-MM-DD or YYYY.MM.DD\n",
    "    \"homolog\" : re.compile(r\"_([ABPM01pm])\"), ## Added just in case the homolog is in the filename.\n",
    "    \"chromosome\" : re.compile(r\"(?i)(chr[\\d+MXY])\") ## Added just in case the chromosome is in the filename.\n",
    "    }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d970bd6d",
   "metadata": {},
   "source": [
    "## Setting some variables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65c741e1",
   "metadata": {},
   "source": [
    "In the following cell you should set up some variables for the script to run properly. Change the variables at your convenience, and then you can press \"Run all\" in the Jupyter Notebook.  \n",
    "\n",
    "Brief explanation of the different variables:\n",
    "<ul>\n",
    "<li><strong><u>work_folder</strong></u>: this variable is where the plots and the tsv files that the script creates will be stored.</li><br>\n",
    "<li><strong><u>data_file</strong></u>: this variable is where the csv files to be assessed are stored.</li><br>\n",
    "<li><strong><u>info_file</strong></u>: file that contains information about your walks, such as the chromosome, start and end of the steps. You need a columns that matches the 'timepoint' column in the csv files, to be able to show proper information.</li><br>\n",
    "<li><strong><u>column_round</strong></u>: the column that matches the 'timepoint' column in the csv files.</li><br>\n",
    "<li><strong><u>column_name</strong></u>: the column with the \"nice\" name for the steps. This will be used instead of the timepoint number. Use \"\" in case the information file does not have this column (in this case the name of the steps will be in the form of 'StepN' where N is an incremental number form 1 to the max number of steps).</li><br>\n",
    "<li><strong><u>column_size</strong></u>: the column for the size of the segment. Use \"\" in case the information file does not have this column (in this case the notebook will calculate the size automatically of the segments according to the start and end positions in the file).</li><br>\n",
    "<li><strong><u>column_start_pos</strong></u>: the column of the start position of the timepoint.</li><br>\n",
    "<li><strong><u>column_end_pos</strong></u>: the column of the start position of the timepoint.</li><br>\n",
    "<li><strong><u>number_of_steps</strong></u>: the number of steps in your library.</li><br>\n",
    "<li><strong><u>starting_step</strong></u>: the step at which analysis should start.</li><br>\n",
    "<li><strong><u>prec_mean_dataset</strong></u>: the worst precision of the three axes (usually the Z). This will be used as a blurring factor when creating the 3D maps to calculate the volumetric features.</li><br>\n",
    "<li><strong><u>save_plots</strong></u>: save the plots into a folder or show them in the notebook.</li><br>\n",
    "<li><strong><u>file_suffix</strong></u>: add a suffix to the different files created, in case you wish to save the results from different assessments in the same folder. It will be added as \"_suffix\" before the .tsv/.pdf.</li><br>\n",
    "</ul>\n",
    "\n",
    "The notebook will create a folder in the work_folder path called `6_morphological_features_extraction`. Inside this folder 2 subfolders will be created, one for `tsvs` and one for `plots`. Inside each folder, a sub-subfolder with the suffix name will have the plots/tsvs for that experiment.\n",
    "\n",
    "<pre style=\"font-family: monospace; line-height: 1.2;\">\n",
    "6_morphological_features_extraction/\n",
    "|-- tsvs/\n",
    "|   |-- example_tsv_SUFFIX.tsv\n",
    "|-- plots/\n",
    "|   |-- example_plot_SUFFIX.pdf\n",
    "</pre>\n",
    "\n",
    "[Back to the general index](#morphological-features-extraction)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4b6667e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "work_folder = \"/scratch/CIMA_tutorial/\"\n",
    "data_folder = \"/scratch/CIMA_tutorial/data/chr3\"\n",
    "\n",
    "info_file = \"/scratch/CIMA_tutorial/data/chr3/Info_chr3.txt\"\n",
    "column_round = \"Round\"\n",
    "column_name = \"Name\"\n",
    "column_size = \"Size(kb)\"\n",
    "column_chr = \"Chr\"\n",
    "column_start_pos = \"Start(hg19)\"\n",
    "column_end_pos = \"End(hg19)\"\n",
    "\n",
    "number_of_steps_library = 16\n",
    "starting_step = 6\n",
    "\n",
    "prec_mean_dataset = 30\n",
    "\n",
    "save_plots = False\n",
    "\n",
    "file_suffix = \"chr3\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "92e8536d",
   "metadata": {},
   "outputs": [],
   "source": [
    "work_folder = Path(work_folder)\n",
    "data_folder = Path(data_folder)\n",
    "pre_assessment_folder = Path(work_folder, \"2_experimental_quality_assessment\")\n",
    "reconstruction_tsv_folder = Path(work_folder, \"3_3D_reconstruction_assessment\", \"tsvs\", file_suffix)\n",
    "\n",
    "info_df = pd.read_table(Path(info_file), sep=\"\\t\")\n",
    "info_df.columns = info_df.columns.str.strip() # In case there are leading or trailing spaces in the column names\n",
    "\n",
    "info_file_dict = {\n",
    "    \"column_round\" : column_round,\n",
    "    \"column_name\" : column_name,\n",
    "    \"column_size\" : column_size,\n",
    "    \"column_chr\" : column_chr,\n",
    "    \"column_start_pos\" : column_start_pos,\n",
    "    \"column_end_pos\" : column_end_pos,\n",
    "}\n",
    "\n",
    "del column_round, column_name, column_size, column_chr, column_start_pos, column_end_pos, info_file"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d73273a",
   "metadata": {},
   "source": [
    "## Exporting the data with CIMA"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85c2d610",
   "metadata": {},
   "source": [
    "Content:\n",
    "- [Creating the list of CIMA StructuralObject](#creating-the-list-of-cima-structuralobject)\n",
    "- [Checking the information file](#checking-the-information-file)\n",
    "- [Creating a color palette and the results files](#creating-a-color-palette-and-the-results-folders-for-the-tsv-and-plot-files)\n",
    "\n",
    "[Back to the general index](#morphological-features-extraction)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9c8ffb6",
   "metadata": {},
   "source": [
    "### Creating the list of CIMA StructuralObject"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07b3463a",
   "metadata": {},
   "source": [
    "In this cell we create a list of StructuralObjects (the main CIMA class type).  \n",
    "This list is the basis to do the assessment through this Jupyter Notebook.  \n",
    "\n",
    "When loading the data, we will also take out those traces or CS that did not pass the assessment in the previous notebooks.  \n",
    "  \n",
    "When reading the file, it will also add some metadata gathered from the filename of the csv file:\n",
    "<ul>\n",
    "<li><strong><u>nucleusID</strong></u>: searches for \"nuc\" (with any combination of lower and upper case) followed by digits. If there is not a match, the default is filecount (first file read will be number 1, ...).</li><br>\n",
    "<li><strong><u>cellID</strong></u>: searches for \"cell\" (with any combination of lower and upper case) followed by digits. If there is not a match, the default is filecount (first file read will be number 1, ...).</li><br>\n",
    "<li><strong><u>locationID</strong></u>: searches for \"loc\" (with any combination of lower and upper case) followed by optional hyphen and digits.If there is not a match, the default is filecount (first file read will be number 1, ...).</li><br>\n",
    "<li><strong><u>date</strong></u>: searches for dates in the format YYYY-MM-DD or YYYY.MM.DD. If there is not a match, the default is the file name.</li><br>\n",
    "<li><strong><u>homolog</strong></u>: searches for \"_\" followed by A, B, M, P, m, p, 0 or 1 (only one instance of this). If there is not a match, the default is \"A\".</li><br>\n",
    "<li><strong><u>chromosome</strong></u>: searches for \"chr\" (with any combination of lower and upper case) followed by a number, M, X or Y. If there is not a match, the default is \"chrN\".</li><br>\n",
    "</ul>\n",
    "\n",
    "[Back to the section index](#exporting-the-data-with-cima)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "99dc6c02",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total number of files found: 33\n",
      "Loading the experimental quality data from the CSV files\n",
      "Loading the bad flags from the CSV files\n",
      "\tCreating the 'flag' column for reconstruction-based removal\n",
      "Skipping file 2: 2018-07-10_nuc04_A_chr3 because it did not pass the experimental quality assessment.\n",
      "Skipping file 5: 2018-07-10_nuc03_A_chr3 because it did not pass the experimental quality assessment.\n",
      "\tExcluding timepoint 19 in experimentID: 2018-09-04, nucleusID: 7, homolog: A because it has less than 10 points.\n",
      "Skipping file 26: 2018-09-04_nuc01_A_chr3 because it did not pass the experimental quality assessment.\n",
      "Processing file 33: 2018-09-04_nuc06_A_chr3\n",
      "Total number of objects created: 30\n"
     ]
    }
   ],
   "source": [
    "# Get a list of all CSV files in the specified data folder\n",
    "csv_files = list(data_folder.glob('*.csv'))\n",
    "\n",
    "if len(csv_files) == 0:\n",
    "    print(\"No files found, please input the correct work_folder\")\n",
    "else:\n",
    "    print(f\"Total number of files found: {len(csv_files)}\")\n",
    "\n",
    "print(\"Loading the experimental quality data from the CSV files\")\n",
    "traces_pass = pd.read_csv(Path(pre_assessment_folder, f\"experimental_assessment_pass_{file_suffix}.txt\"), sep=\"\\t\", names=[\"trace\"])\n",
    "\n",
    "print(\"Loading the bad flags from the CSV files\")\n",
    "segments_to_remove_CCC = pd.read_csv(Path(reconstruction_tsv_folder, f'segments2remove_bycccvariation_t2{(\"_\" + file_suffix) if file_suffix != \"\" else \"\"}.tsv'), sep=\"\\t\")\n",
    "if \"flag\" not in segments_to_remove_CCC.columns and not segments_to_remove_CCC.empty:\n",
    "    print(\"\\tCreating the 'flag' column for reconstruction-based removal\")\n",
    "    segments_to_remove_CCC[\"flag\"] = segments_to_remove_CCC[[\"experimentID\", \"nucleusID\", \"homolog\", \"timepoint\"]].astype(str).apply(\"_\".join, axis=1)\n",
    "    bad_flags = set(segments_to_remove_CCC[\"flag\"].astype(str).values)\n",
    "elif not segments_to_remove_CCC.empty:\n",
    "    bad_flags = set(segments_to_remove_CCC[\"flag\"].astype(str).values)\n",
    "else:\n",
    "    bad_flags = set()\n",
    "\n",
    "# Get a list of metadata keys to extract from filenames\n",
    "metadata_keys = [\"nucleusID\", \"cellID\", \"locationID\", \"date\", \"homolog\", \"chromosome\"]\n",
    "\n",
    "# Initialize an empty list to store the StructuralObject instances\n",
    "obj_list = []\n",
    "\n",
    "if len(csv_files) > 0:\n",
    "    # Iterate over each file in the list of file names\n",
    "    for count, filein in enumerate(csv_files, start=1):\n",
    "\n",
    "        stem = Path(filein).stem\n",
    "\n",
    "        if not stem in traces_pass[\"trace\"].values:\n",
    "            print(f\"Skipping file {count}: {stem} because it did not pass the experimental quality assessment.\")\n",
    "            continue\n",
    "\n",
    "        print(f\"Processing file {count}: {stem}\", end=\"\\r\")\n",
    "\n",
    "        # We mine the metadata from the filename using regex patterns. If not possible, assign default values.\n",
    "        # In the case of nucleus, cell and location, the default is the filecount (first file read will be number 1, ...)\n",
    "        # In the case of date, if no date can be found, the filename gets used.\n",
    "        # If no valid homolog denomination is found, the default value is A.\n",
    "        # If no valid chromosome denomination is found, the default value is chrN.\n",
    "        metadata_CIMA = {\n",
    "            key: (match.group(1) if (match := regex_patterns[key].search(stem)) else default)\n",
    "            for key, default in zip(metadata_keys, [count, count, count, stem, \"A\", \"chrN\"])\n",
    "        }\n",
    "\n",
    "        # Adjust the metadata for nucleus and cell (if nucleus is count, but cell is something, use the cell match in nucleus)\n",
    "        if \"cellID\" in metadata_CIMA and metadata_CIMA[\"nucleusID\"] == count:\n",
    "            metadata_CIMA[\"nucleusID\"] = metadata_CIMA[\"cellID\"]\n",
    "\n",
    "        # Set experimentID as date for consistency\n",
    "        metadata_CIMA[\"experimentID\"] = metadata_CIMA[\"date\"]\n",
    "\n",
    "        # Read the CSV file and create a StructuralObject instance\n",
    "        objin = CSVParser.read_CSV_file(filein.as_posix(), metadata = metadata_CIMA, content_type = \"srx\")\n",
    "\n",
    "        # Filter the atomList to include only the specified timepoints in the info file\n",
    "        objin.atomList = objin.atomList[(objin.atomList['timepoint'] >= starting_step - 1) &\n",
    "                                        (objin.atomList['timepoint'] < (starting_step + number_of_steps_library - 1))].copy()\n",
    "        \n",
    "        experiment_id = objin.metadata[\"experimentID\"]\n",
    "        nucleus_id = int(objin.metadata[\"nucleusID\"])\n",
    "        homolog = objin.metadata[\"homolog\"]\n",
    "        \n",
    "        full_flags = (experiment_id + \"_\" + str(nucleus_id) + \"_\" +homolog + \"_\" + objin.atomList[\"timepoint\"].astype(str))\n",
    "        remove_mask = full_flags.isin(bad_flags)\n",
    "\n",
    "        if remove_mask.any():\n",
    "            removed_tps = objin.atomList.loc[remove_mask, \"timepoint\"].unique()\n",
    "            # removed_tps = \",\".join([tp_names[tp] for tp in removed_tps])\n",
    "            # print(f\"Excluding timepoints {removed_tps} in experimentID: {experiment_id}, nucleusID: {nucleus_id}, homolog: {homolog} due to poor assessment.\")\n",
    "            objin.atomList = objin.atomList.loc[~remove_mask].copy()\n",
    "\n",
    "        for tp in objin.atomList[\"timepoint\"].unique():\n",
    "            # Take out the timepoint from the atomList if it has less tha 10 points\n",
    "            tp_mask = objin.atomList[\"timepoint\"] == tp\n",
    "            if tp_mask.sum() < 10:\n",
    "                print(f\"\\tExcluding timepoint {tp} in experimentID: {experiment_id}, nucleusID: {nucleus_id}, homolog: {homolog} because it has less than 10 points.\")\n",
    "                objin.atomList = objin.atomList.loc[~tp_mask].copy()\n",
    "        \n",
    "        # Append the object to the list\n",
    "        obj_list.append(objin)\n",
    "\n",
    "    # Nice print to tell the user the number of objects created.\n",
    "    print(f\"\\nTotal number of objects created: {len(obj_list)}\")\n",
    "\n",
    "    # Clean up memory by deleting some of the variables no longer needed\n",
    "    del count, filein, stem, metadata_CIMA, match, objin\n",
    "\n",
    "del csv_files, metadata_keys\n",
    "del bad_flags\n",
    "del experiment_id, nucleus_id, homolog, full_flags, remove_mask\n",
    "if 'segments_to_remove_precision' in locals():\n",
    "    del segments_to_remove_CCC"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "caa19ca6",
   "metadata": {},
   "source": [
    "### Checking the information file"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7a2a8f2",
   "metadata": {},
   "source": [
    "This cells makes an adjustment in the info_df.  \n",
    "\n",
    "It checks if the column of the info_df that we will use as the link between this data frame and our walks has the same timepoints. This happens because most programs start with timepoint 0, while most information files start at timepoint 1.  \n",
    "\n",
    "This will create a new column in the dataframe that is called 'timepoint'. This column will be modified so it has the same range as the timepoint column from the csv files.\n",
    "\n",
    "[Back to the section index](#exporting-the-data-with-cima)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "73ebe4db",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "List of timepoints found across all objects: [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]\n",
      "List of timepoints in the information file: [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]\n",
      "\n",
      "Note: The column for the information file that has the timepoint is offset by 1 compared to the objects, adjusting accordingly.\n"
     ]
    }
   ],
   "source": [
    "list_of_timepoints = sorted({int(tp) for obj in obj_list for tp in obj.atomList.timepoint.unique()})\n",
    "info_of_timepoints = list(map(int, info_df[info_file_dict[\"column_round\"]].tolist()))\n",
    "\n",
    "print(f'List of timepoints found across all objects: {list_of_timepoints}')\n",
    "print(f'List of timepoints in the information file: {info_of_timepoints}\\n')\n",
    "\n",
    "if list_of_timepoints == info_of_timepoints:\n",
    "    print('The timepoints found in the objects match those in the information file.')\n",
    "    info_df[\"timepoint\"] = info_df[info_file_dict[\"column_round\"]]\n",
    "elif set(list_of_timepoints).issubset(info_of_timepoints):\n",
    "    print('Note: The timepoints found in the objects are a subset of those in the information file. Not all timepoints are present.')\n",
    "    info_df[\"timepoint\"] = info_df[info_file_dict[\"column_round\"]]\n",
    "elif list_of_timepoints == [x - 1 for x in info_of_timepoints]:\n",
    "    print('Note: The column for the information file that has the timepoint is offset by 1 compared to the objects, adjusting accordingly.')\n",
    "    info_df[\"timepoint\"] = info_df[info_file_dict[\"column_round\"]] - 1\n",
    "elif set(list_of_timepoints).issubset({x - 1 for x in info_of_timepoints}):\n",
    "    print('Note: The timepoints found in the objects are a subset of those in the information file, which are offset by 1 compared to the objects. Adjusting accordingly.')\n",
    "    info_df[\"timepoint\"] = info_df[info_file_dict[\"column_round\"]] - 1\n",
    "else:\n",
    "    print('Warning: The timepoints found in the objects do not match those in the information file.')\n",
    "\n",
    "if info_file_dict[\"column_size\"] == \"\":\n",
    "    print(\"Calculating size column from start and end positions...\")\n",
    "    column_size = \"size\"\n",
    "    info_file_dict[\"column_size\"] = column_size\n",
    "    info_df[column_size] = info_df[info_file_dict[\"column_end_pos\"]] - info_df[info_file_dict[\"column_start_pos\"]]\n",
    "\n",
    "if info_file_dict[\"column_name\"] == \"\":\n",
    "    print(\"Setting name column from round numbers...\")\n",
    "    info_file_dict[\"column_name\"] = \"name\"\n",
    "    info_df[\"name\"] = [f\"Step{i}\" for i in range(1, info_df.shape[0]+1)]\n",
    "\n",
    "tp_names = info_df.set_index(\"timepoint\")[info_file_dict[\"column_name\"]].to_dict()\n",
    "    \n",
    "del list_of_timepoints, info_of_timepoints"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36fc7977",
   "metadata": {},
   "source": [
    "### Creating a color palette and the results folders for the tsv and plot files\n",
    "\n",
    "Here we create a color palette automatically based on the different experiment ID found across the files. Experiment IDs are usually the date of the experiment. This allows to easily check for a possible batch effect.\n",
    "\n",
    "We also create the folders were to store the results.\n",
    "<ul>\n",
    "<li><strong><u>FOF_CT_files</strong></u>: a folder that will have all the files created. It will have a subfolder with the file_suffix chosen.</li><br>\n",
    "</ul>\n",
    "\n",
    "[Back to the section index](#exporting-the-data-with-cima)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "1215c5d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the color palette for experiments. Use the date or the experimentID (date) as the key.\n",
    "color_palette = {exp_date: color for exp_date, color in zip(\n",
    "    # We get a set of unique dates from the metadata of the objects\n",
    "    set([file.metadata[\"experimentID\"] for file in obj_list]),\n",
    "    # We assign a color from the tab10 palette for each unique date\n",
    "    sns.color_palette(\"tab10\", n_colors=len(set([file.metadata[\"experimentID\"] for file in obj_list]))) \n",
    ")}\n",
    "\n",
    "# This palette is fixed for the current experiments used in the tutorial, which aim to reproduce the figures in the CIMA paper.\n",
    "color_palette = {'2018-01-11':\"#a6cee3\" ,'2018-06-28':'#1f78b4', '2018-07-10':'#b2df8a', '2018-09-04':'#33a02c'}\n",
    "\n",
    "# Create the folders to save the results. If the folders already exist, do not raise an error.\n",
    "fof_ct_folder = Path(work_folder, \"FOF_CT_files\", file_suffix)\n",
    "fof_ct_folder.mkdir(exist_ok=True, parents=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d04a54de",
   "metadata": {},
   "source": [
    "## Writting to FOF-CT volumetic format"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5a1bdf3",
   "metadata": {},
   "source": [
    "CIMA can write the contents of an object into the FOF_CT volumetric format.  \n",
    "<br>\n",
    "This format is composed of 3 different tables (core.csv, trace.csv and spot.csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59fe8edb",
   "metadata": {},
   "source": [
    "#### core.csv\n",
    "This has information about the Spot_ID. In our case the Spot_ID corresponds to a chromatin segment. Internally it's called 4dn_FOF_CT_core. The columns of this table are as follow:\n",
    "<ul>\n",
    "<li><strong><u>Spot_ID</strong></u>: incremental number to have a unique identifier for every chromatin segment.</li><br>\n",
    "<!-- <li><strong><u>Trace_ID</strong></u>: identifier of the trace from where this SpotID comes from.</li><br> -->\n",
    "<li><strong><u>X, Y, Z</strong></u>: coordinates of the centroid of the chromatin segment.</li><br>\n",
    "<li><strong><u>Chrom</strong></u>: chromosome of the chromatin segment.</li><br>\n",
    "<li><strong><u>Chrom_Start</strong></u>: genomic start coordinate of the chromatin segment.</li><br>\n",
    "<li><strong><u>Chrom_End</strong></u>: genomic end coordinate of the chromatin segment.</li><br>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ab2c6eb",
   "metadata": {},
   "source": [
    "#### trace.csv\n",
    "this has information about the Trace_ID. In our case, the Trace_ID corresponds to the trace. Internally I think is called 4dn_FOF_CT_trace. The columns of this table are as follow:\n",
    "<ul>\n",
    "<li><strong><u>Trace_ID</strong></u>: incremental number to have a unique identifier for every trace.</li><br>\n",
    "<li><strong><u>LocID</strong></u>: custom column for this table. Indentifier of the location/field of view where the trace is located. We gather this from the name of the file, as it is tipically included.</li><br>\n",
    "<li><strong><u>NucleusID</strong></u>: custom column for this table. Corresponds to the cellID/nucleusID where the trace is located. We gather this from the name of the file, as it is tipically included.</li><br>\n",
    "<li><strong><u>Homolog</strong></u>: custom column for this table. Corresponds to the homolog of the trace. We gather this from the name of the file, as it is tipically included.</li><br>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea19fe74",
   "metadata": {},
   "source": [
    "#### spot.csv\n",
    "This has the data that each Spot_ID contains. In our case, this table contains the blinks themselves. Internally it is called 4dn_FOF_CT_quality. The columns of this table are as follow:\n",
    "<ul>\n",
    "<li><strong><u>Loc_ID</strong></u>: this column correspond to the imageID from the SRX csv file. </li><br>\n",
    "<li><strong><u>Spot_ID</strong></u>: this corresponds to the Spot_ID from the core.csv table.</li><br>\n",
    "<li><strong><u>X</strong></u>: this corresponds to the X coordinate of the blink.</li><br>\n",
    "<li><strong><u>Y</strong></u>: this corresponds to the Y coordinate of the blink.</li><br>\n",
    "<li><strong><u>Z</strong></u>: this corresponds to the Z coordinate of the blink.</li><br>\n",
    "<li><strong><u>Fluor</strong></u>: this corresponds to the fluorophore used. Right now it has a dummy value as it only says Fluor.</li><br>\n",
    "<li><strong><u>Prec_X</strong></u>: custom column for this table. This corresponds to the xprec columns of the atomList in the CIMA object.</li><br>\n",
    "<li><strong><u>Prec_Y</strong></u>: custom column for this table. This corresponds to the yprec columns of the atomList in the CIMA object.</li><br>\n",
    "<li><strong><u>Prec_Z</strong></u>: custom column for this table. This corresponds to the zprec columns of the atomList in the CIMA object.</li><br>\n",
    "<li><strong><u>timepoint</strong></u>: custom column for this table. This corresponds to the timepoint column of the atomList in the CIMA object.</li><br>\n",
    "<li><strong><u>z_step</strong></u>: custom column for this table. This corresponds to the z-step column of the atomList in the CIMA object.</li><br>\n",
    "<li><strong><u>cycle</strong></u>: custom column for this table. This corresponds to the cycle column of the atomList in the CIMA object.</li><br>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "435a94ae",
   "metadata": {},
   "source": [
    "In order for the fof_ct_wirter function to work, the user must provide a bit of metadata of the experiment. It is described below:\n",
    "\n",
    "<ul>\n",
    "<li><strong><u>genome_assembly</strong></u>: the genome assembly used on the experiment.</li><br>\n",
    "<li><strong><u>lab_name</strong></u>: the name(s) of the lab(s) that did the experiment. Can be multiple ones.</li><br>\n",
    "<li><strong><u>experimenter_name</strong></u>: the name(s) of the experimenter(s) that did the experiment. Can be multiple ones.</li><br>\n",
    "<li><strong><u>experimenter_contact</strong></u>: the contact information of the experimenter(s) that did the experiment. Can be multiple ones.</li><br>\n",
    "<li><strong><u>description</strong></u>: small description of the experiment.</li><br>\n",
    "</ul>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "54257d4a",
   "metadata": {},
   "outputs": [],
   "source": [
    "GENOME_ASSEMBLY = \"hg19\"\n",
    "LAB_NAME = \"Wu Lab\"\n",
    "EXPERIMENTER_NAME = \"Ting Wu, Guy Nir\"\n",
    "EXPERIMENTER_CONTACT = \"ting.wu@example.com, guy.nir@example.com\"\n",
    "DESCRIPTION = \"chr3 files\"\n",
    "\n",
    "# This is just an example, the user should fill this dictionary with the actual fluorophores used in the experiment.\n",
    "# The structure of the dictionary should be {timepoint: fluorophore}, where timepoint is the integer timepoint and fluorophore is a string with the name of the fluorophore used in that timepoint.\n",
    "# Numbering of timepoints sould be consistent with the timepoint column in the CSV file.\n",
    "# If you are using the structure of the tutorial files, you have a DataFrame with a column named \"timepoint\" that has the timepoint information.\n",
    "# Example of the dictionary structure:\n",
    "FLUOROPHORE_DICT = {\n",
    "    5: 'Alexa647',\n",
    "    6: 'Alexa647',\n",
    "    7: 'Alexa647',\n",
    "    8: 'Alexa647',\n",
    "    9: 'Alexa647',\n",
    "    10: 'Alexa647',\n",
    "    11: 'Alexa647',\n",
    "    12: 'Alexa647',\n",
    "    13: 'Alexa647',\n",
    "    14: 'Alexa647',\n",
    "    15: 'Alexa647',\n",
    "    16: 'Alexa647',\n",
    "    17: 'Alexa647',\n",
    "    18: 'Alexa647',\n",
    "    19: 'Alexa647',\n",
    "    20: 'Alexa647'}\n",
    "\n",
    "# This is a pythonic way to create the same dictionary using data avialable through the tutorial notebooks.\n",
    "# This only works if you have only ONE fluorophore across all timepoints.\n",
    "# FLUOROPHORE_DICT = {tp: color for tp, color in product(info_df[\"timepoint\"], [\"Alexa647\"])}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "417e02e5",
   "metadata": {},
   "source": [
    "The call below writes a volumetric FOF_CT-compliant file.<br>\n",
    "If you do not pass the dictionary that maps every timepoint to its fluorophore, the function raises an error.<br>\n",
    "Similarly, if the dictionary does not contain the same number of keys as there are timepoints in the dataset, it raises an error.<br>\n",
    "\n",
    "The function assumes that the order of the regions matches the order of the timepoints in the file."
   ]
  },
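  {
   "cell_type": "markdown",
   "id": "a3c19e57",
   "metadata": {},
   "source": [
    "The two requirements above (a fluorophore for every timepoint, and as many keys as timepoints) can be checked before calling the writer.<br>\n",
    "The next cell is a minimal sketch of such a check and is not part of CIMA: `check_fluorophore_dict` is a hypothetical helper, and the toy timepoints are made up for illustration. It compares the sets of keys, which is slightly stricter than comparing their number.<br>\n",
    "With the tutorial data you would pass `FLUOROPHORE_DICT` and `info_df[\"timepoint\"]` instead of the toy values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d84b02f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def check_fluorophore_dict(fluor, timepoints):\n",
    "    \"\"\"Hypothetical helper: compare the dictionary keys with the dataset timepoints.\n",
    "\n",
    "    Returns two sorted lists: timepoints without a fluorophore (missing)\n",
    "    and dictionary keys that are not timepoints of the dataset (extra).\n",
    "    \"\"\"\n",
    "    expected = {int(tp) for tp in timepoints}\n",
    "    provided = {int(tp) for tp in fluor}\n",
    "    missing = sorted(expected - provided)\n",
    "    extra = sorted(provided - expected)\n",
    "    return missing, extra\n",
    "\n",
    "\n",
    "# Toy values for illustration only (assumption, not the tutorial data).\n",
    "toy_timepoints = [5, 6, 7, 8]\n",
    "toy_fluor = {5: \"Alexa647\", 6: \"Alexa647\", 7: \"Alexa647\"}\n",
    "\n",
    "missing, extra = check_fluorophore_dict(toy_fluor, toy_timepoints)\n",
    "print(f\"Timepoints without fluorophore: {missing}\")\n",
    "print(f\"Keys that are not timepoints: {extra}\")"
   ]
  },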
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "9c2db27d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing file 2018-09-04_nuc03_A_chr3 [1/30]\n",
      "\tTime segment not found for region chr3:150000000-150500000, skipping...\n",
      "\tTime segment not found for region chr3:152500000-153000000, skipping...\n",
      "\tTime segment not found for region chr3:153500000-154000000, skipping...\n",
      "Processing file 2018-07-10_nuc02_A_chr3 [2/30]\n",
      "Processing file 2018-07-10_nuc05_A_chr3 [3/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc07_A_chr3 [4/30]\n",
      "\tTime segment not found for region chr3:150000000-150500000, skipping...\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc09_A_chr3 [5/30]\n",
      "Processing file 2018-07-10_nuc11_A_chr3 [6/30]\n",
      "Processing file 2018-09-04_nuc01_B_chr3 [7/30]\n",
      "Processing file 2018-07-10_nuc06_B_chr3 [8/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc05_B_chr3 [9/30]\n",
      "Processing file 2018-07-10_nuc06_A_chr3 [10/30]\n",
      "Processing file 2018-06-28_nuc08_A_chr3 [11/30]\n",
      "Processing file 2018-06-28_nuc05_A_chr3 [12/30]\n",
      "\tTime segment not found for region chr3:152000000-152500000, skipping...\n",
      "Processing file 2018-07-10_nuc07_A_chr3 [13/30]\n",
      "\tTime segment not found for region chr3:156500000-157000000, skipping...\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-07-10_nuc01_A_chr3 [14/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-07-10_nuc02_B_chr3 [15/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc01_A_chr3 [16/30]\n",
      "\tTime segment not found for region chr3:150000000-150500000, skipping...\n",
      "Processing file 2018-09-04_nuc04_A_chr3 [17/30]\n",
      "Processing file 2018-07-10_nuc04_B_chr3 [18/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-09-04_nuc03_B_chr3 [19/30]\n",
      "Processing file 2018-07-10_nuc03_B_chr3 [20/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-07-10_nuc01_B_chr3 [21/30]\n",
      "Processing file 2018-09-04_nuc07_A_chr3 [22/30]\n",
      "\tTime segment not found for region chr3:151500000-152000000, skipping...\n",
      "\tTime segment not found for region chr3:157000000-157500000, skipping...\n",
      "Processing file 2018-06-28_nuc02_A_chr3 [23/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-07-10_nuc10_A_chr3 [24/30]\n",
      "Processing file 2018-07-10_nuc08_A_chr3 [25/30]\n",
      "Processing file 2018-09-04_nuc05_A_chr3 [26/30]\n",
      "Processing file 2018-06-28_nuc01_B_chr3 [27/30]\n",
      "Processing file 2018-09-04_nuc10_A_chr3 [28/30]\n",
      "\tTime segment not found for region chr3:157000000-157500000, skipping...\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc07_B_chr3 [29/30]\n",
      "\tTime segment not found for region chr3:150000000-150500000, skipping...\n",
      "\tTime segment not found for region chr3:152000000-152500000, skipping...\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-09-04_nuc06_A_chr3 [30/30]\n"
     ]
    }
   ],
   "source": [
    "fof_ct_metadata = {\"genome_assembly\": GENOME_ASSEMBLY,\n",
    "                   \"lab_name\": LAB_NAME,\n",
    "                   \"experimenter_name\": EXPERIMENTER_NAME,\n",
    "                   \"experimenter_contact\": EXPERIMENTER_CONTACT,\n",
    "                   \"description\": DESCRIPTION,\n",
    "                   }\n",
    "\n",
    "fof_volumetric_ct_writer(table_folder=fof_ct_folder,\n",
    "                         data2write=obj_list,\n",
    "                         fof_metadata=fof_ct_metadata,\n",
    "                         reg_dict={row[\"timepoint\"]: (row[\"Chr\"], row[\"Start(hg19)\"], row[\"End(hg19)\"]) for _, row in info_df.iterrows()},\n",
    "                         fluor=FLUOROPHORE_DICT,\n",
    "                         file_suffix = file_suffix\n",
    "                         )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96a3f9d5",
   "metadata": {},
   "source": [
    "Using the same function, we can turn the volumetric format off to write the standard ball-and-stick FOF_CT format instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "d2485792",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing file 2018-09-04_nuc03_A_chr3 [1/30]\n",
      "\tTime segment not found for region chr3:150000000-150500000, skipping...\n",
      "\tTime segment not found for region chr3:152500000-153000000, skipping...\n",
      "\tTime segment not found for region chr3:153500000-154000000, skipping...\n",
      "Processing file 2018-07-10_nuc02_A_chr3 [2/30]\n",
      "Processing file 2018-07-10_nuc05_A_chr3 [3/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc07_A_chr3 [4/30]\n",
      "\tTime segment not found for region chr3:150000000-150500000, skipping...\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc09_A_chr3 [5/30]\n",
      "Processing file 2018-07-10_nuc11_A_chr3 [6/30]\n",
      "Processing file 2018-09-04_nuc01_B_chr3 [7/30]\n",
      "Processing file 2018-07-10_nuc06_B_chr3 [8/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc05_B_chr3 [9/30]\n",
      "Processing file 2018-07-10_nuc06_A_chr3 [10/30]\n",
      "Processing file 2018-06-28_nuc08_A_chr3 [11/30]\n",
      "Processing file 2018-06-28_nuc05_A_chr3 [12/30]\n",
      "\tTime segment not found for region chr3:152000000-152500000, skipping...\n",
      "Processing file 2018-07-10_nuc07_A_chr3 [13/30]\n",
      "\tTime segment not found for region chr3:156500000-157000000, skipping...\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-07-10_nuc01_A_chr3 [14/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-07-10_nuc02_B_chr3 [15/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc01_A_chr3 [16/30]\n",
      "\tTime segment not found for region chr3:150000000-150500000, skipping...\n",
      "Processing file 2018-09-04_nuc04_A_chr3 [17/30]\n",
      "Processing file 2018-07-10_nuc04_B_chr3 [18/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-09-04_nuc03_B_chr3 [19/30]\n",
      "Processing file 2018-07-10_nuc03_B_chr3 [20/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-07-10_nuc01_B_chr3 [21/30]\n",
      "Processing file 2018-09-04_nuc07_A_chr3 [22/30]\n",
      "\tTime segment not found for region chr3:151500000-152000000, skipping...\n",
      "\tTime segment not found for region chr3:157000000-157500000, skipping...\n",
      "Processing file 2018-06-28_nuc02_A_chr3 [23/30]\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-07-10_nuc10_A_chr3 [24/30]\n",
      "Processing file 2018-07-10_nuc08_A_chr3 [25/30]\n",
      "Processing file 2018-09-04_nuc05_A_chr3 [26/30]\n",
      "Processing file 2018-06-28_nuc01_B_chr3 [27/30]\n",
      "Processing file 2018-09-04_nuc10_A_chr3 [28/30]\n",
      "\tTime segment not found for region chr3:157000000-157500000, skipping...\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-06-28_nuc07_B_chr3 [29/30]\n",
      "\tTime segment not found for region chr3:150000000-150500000, skipping...\n",
      "\tTime segment not found for region chr3:152000000-152500000, skipping...\n",
      "\tTime segment not found for region chr3:157500000-158000000, skipping...\n",
      "Processing file 2018-09-04_nuc06_A_chr3 [30/30]\n"
     ]
    }
   ],
   "source": [
    "fof_ct_metadata = {\"genome_assembly\": GENOME_ASSEMBLY,\n",
    "                   \"lab_name\": LAB_NAME,\n",
    "                   \"experimenter_name\": EXPERIMENTER_NAME,\n",
    "                   \"experimenter_contact\": EXPERIMENTER_CONTACT,\n",
    "                   \"description\": DESCRIPTION,\n",
    "                   }\n",
    "\n",
    "fof_volumetric_ct_writer(table_folder=fof_ct_folder,\n",
    "                         data2write=obj_list,\n",
    "                         fof_metadata=fof_ct_metadata,\n",
    "                         reg_dict={row[\"timepoint\"]: (row[\"Chr\"], row[\"Start(hg19)\"], row[\"End(hg19)\"]) for _, row in info_df.iterrows()},\n",
    "                        #  fluor=FLUOROPHORE_DICT,\n",
    "                         file_suffix = file_suffix,\n",
    "                         volumetric_format = False\n",
    "                         )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "107aed50",
   "metadata": {},
   "source": [
    "## Reading the FOF-CT volumetic format"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f5c29b0",
   "metadata": {},
   "source": [
    "In a similar fashion, CIMA can read the core.csv, trace.csv and spot.csv files created by the `fof_volumetric_ct_writer` function. With them we can recreate a list of CIMA objects similar to the original one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "79eb7274",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Reconstructing file: loc1_nuc3_A\n",
      "Reconstructing file: loc3_nuc2_A\n",
      "Reconstructing file: loc4_nuc5_A\n",
      "Reconstructing file: loc6_nuc7_A\n",
      "Reconstructing file: loc7_nuc9_A\n",
      "Reconstructing file: loc8_nuc11_A\n",
      "Reconstructing file: loc9_nuc1_B\n",
      "Reconstructing file: loc10_nuc6_B\n",
      "Reconstructing file: loc11_nuc5_B\n",
      "Reconstructing file: loc12_nuc6_A\n",
      "Reconstructing file: loc13_nuc8_A\n",
      "Reconstructing file: loc14_nuc5_A\n",
      "Reconstructing file: loc15_nuc7_A\n",
      "Reconstructing file: loc16_nuc1_A\n",
      "Reconstructing file: loc17_nuc2_B\n",
      "Reconstructing file: loc18_nuc1_A\n",
      "Reconstructing file: loc19_nuc4_A\n",
      "Reconstructing file: loc20_nuc4_B\n",
      "Reconstructing file: loc21_nuc3_B\n",
      "Reconstructing file: loc22_nuc3_B\n",
      "Reconstructing file: loc23_nuc1_B\n",
      "Reconstructing file: loc24_nuc7_A\n",
      "Reconstructing file: loc25_nuc2_A\n",
      "Reconstructing file: loc27_nuc10_A\n",
      "Reconstructing file: loc28_nuc8_A\n",
      "Reconstructing file: loc29_nuc5_A\n",
      "Reconstructing file: loc30_nuc1_B\n",
      "Reconstructing file: loc31_nuc10_A\n",
      "Reconstructing file: loc32_nuc7_B\n",
      "Reconstructing file: loc33_nuc6_A\n"
     ]
    }
   ],
   "source": [
    "core_file = Path(fof_ct_folder, f\"core_{file_suffix}_volumetric.csv\")\n",
    "trace_file = Path(fof_ct_folder, f\"trace_{file_suffix}_volumetric.csv\")\n",
    "spot_file = Path(fof_ct_folder, f\"spot_{file_suffix}_volumetric.csv\")\n",
    "\n",
    "obj_CIMA_list = fof_volumetric_ct_reader(fof_core_file=core_file, fof_spot_file=spot_file, fof_trace_file=trace_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "f47ef0cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "imageID",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "cycle",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "zstep",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "frame",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "accum",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount11",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount12",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount21",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount22",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "psfx",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "psfy",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "psfz",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "psfphotoncount",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "x",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "y",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "z",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "stdev",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "amp",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "background11",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "background12",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "background21",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "background22",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "maxResidualSlope",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "chi",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "loglike",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "accuracy",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "llr",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "clusterID",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "xprec",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "yprec",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "zprec",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "timepoint",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "record_time",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "record_name",
         "rawType": "str",
         "type": "string"
        },
        {
         "name": "chromosomes",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "s11",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "s12",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "shiftz",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "mass",
         "rawType": "object",
         "type": "unknown"
        }
       ],
       "ref": "b250a615-4f54-4629-979b-666c8e9d14e6",
       "rows": [
        [
         "0",
         "335775",
         "0",
         "6",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "11777.0",
         "1338.5799560546875",
         "1321.719970703125",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "10.845",
         "9.96838",
         "85.8496",
         "6",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "1321.719970703125",
         "10.0"
        ],
        [
         "1",
         "335783",
         "0",
         "6",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "11786.400390625",
         "1340.9599609375",
         "1337.5400390625",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "11.8319",
         "10.767",
         "90.884",
         "6",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "1337.5400390625",
         "10.0"
        ],
        [
         "2",
         "336116",
         "0",
         "8",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "11702.7998046875",
         "1566.4200439453125",
         "1431.239990234375",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "5.01669",
         "4.67027",
         "37.7125",
         "6",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "1431.239990234375",
         "10.0"
        ],
        [
         "3",
         "336118",
         "0",
         "8",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "11719.599609375",
         "1555.4599609375",
         "1376.93994140625",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "7.10123",
         "6.80031",
         "60.6664",
         "6",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "1376.93994140625",
         "10.0"
        ],
        [
         "4",
         "336119",
         "0",
         "8",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "11706.7998046875",
         "1550.989990234375",
         "1417.5799560546875",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "6.98016",
         "6.51492",
         "55.7733",
         "6",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "1417.5799560546875",
         "10.0"
        ]
       ],
       "shape": {
        "columns": 40,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>imageID</th>\n",
       "      <th>cycle</th>\n",
       "      <th>zstep</th>\n",
       "      <th>frame</th>\n",
       "      <th>accum</th>\n",
       "      <th>photoncount</th>\n",
       "      <th>photoncount11</th>\n",
       "      <th>photoncount12</th>\n",
       "      <th>photoncount21</th>\n",
       "      <th>photoncount22</th>\n",
       "      <th>...</th>\n",
       "      <th>yprec</th>\n",
       "      <th>zprec</th>\n",
       "      <th>timepoint</th>\n",
       "      <th>record_time</th>\n",
       "      <th>record_name</th>\n",
       "      <th>chromosomes</th>\n",
       "      <th>s11</th>\n",
       "      <th>s12</th>\n",
       "      <th>shiftz</th>\n",
       "      <th>mass</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>335775</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>9.96838</td>\n",
       "      <td>85.8496</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1321.719971</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>335783</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>10.76700</td>\n",
       "      <td>90.8840</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1337.540039</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>336116</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>4.67027</td>\n",
       "      <td>37.7125</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1431.239990</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>336118</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>6.80031</td>\n",
       "      <td>60.6664</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1376.939941</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>336119</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>6.51492</td>\n",
       "      <td>55.7733</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1417.579956</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 40 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   imageID  cycle  zstep  frame  accum  photoncount  photoncount11  \\\n",
       "0   335775      0      6      0      0            0              0   \n",
       "1   335783      0      6      0      0            0              0   \n",
       "2   336116      0      8      0      0            0              0   \n",
       "3   336118      0      8      0      0            0              0   \n",
       "4   336119      0      8      0      0            0              0   \n",
       "\n",
       "   photoncount12  photoncount21  photoncount22  ...     yprec    zprec  \\\n",
       "0              0              0              0  ...   9.96838  85.8496   \n",
       "1              0              0              0  ...  10.76700  90.8840   \n",
       "2              0              0              0  ...   4.67027  37.7125   \n",
       "3              0              0              0  ...   6.80031  60.6664   \n",
       "4              0              0              0  ...   6.51492  55.7733   \n",
       "\n",
       "   timepoint  record_time  record_name  chromosomes  s11  s12       shiftz  \\\n",
       "0          6            0         sLOC            0  NaN  NaN  1321.719971   \n",
       "1          6            0         sLOC            0  NaN  NaN  1337.540039   \n",
       "2          6            0         sLOC            0  NaN  NaN  1431.239990   \n",
       "3          6            0         sLOC            0  NaN  NaN  1376.939941   \n",
       "4          6            0         sLOC            0  NaN  NaN  1417.579956   \n",
       "\n",
       "   mass  \n",
       "0  10.0  \n",
       "1  10.0  \n",
       "2  10.0  \n",
       "3  10.0  \n",
       "4  10.0  \n",
       "\n",
       "[5 rows x 40 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj_CIMA_list[0].atomList.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f2b1578",
   "metadata": {},
   "source": [
    "## Reading the FOF-CT bs format"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2078f10",
   "metadata": {},
   "source": [
    "In a similar fashion, CIMA can read the core.csv and trace.csv files of the ball-and-stick format. With them we can recreate a list of CIMA objects similar to the original one.<br>\n",
    "\n",
    "<div style=\"text-align: center;\">\n",
    "  <span style=\"color: red; font-size: 2em;\"><strong><u>IMPORTANT</u></strong></span>\n",
    "</div>  \n",
    "<br>\n",
    "<div style=\"text-align: center;\">\n",
    "The atomList of each object will be mostly filled with 0s, because most of the localization information is not stored in this format.\n",
    "</div>  "
   ]
  },
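  {
   "cell_type": "markdown",
   "id": "5be7c410",
   "metadata": {},
   "source": [
    "To see which columns of a reconstructed `atomList` are only placeholders, one option is to list the numeric columns that contain nothing but zeros.<br>\n",
    "The next cell is a minimal sketch on a toy DataFrame: `placeholder_columns` is a hypothetical helper, not a CIMA function, and the toy values only mimic some of the column names shown in this notebook.<br>\n",
    "With real data you would pass the `atomList` of one of the objects returned by the reader."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e90f3a2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def placeholder_columns(atom_df):\n",
    "    \"\"\"Hypothetical helper: names of the numeric columns whose values are all zero.\"\"\"\n",
    "    numeric = atom_df.select_dtypes(include=\"number\")\n",
    "    return [col for col in numeric.columns if (numeric[col] == 0).all()]\n",
    "\n",
    "\n",
    "# Toy table for illustration only (assumption, not real CIMA output).\n",
    "toy_atoms = pd.DataFrame({\n",
    "    \"x\": [11777.0, 11786.4],\n",
    "    \"y\": [1338.58, 1340.96],\n",
    "    \"photoncount\": [0, 0],\n",
    "    \"xprec\": [0, 0],\n",
    "    \"timepoint\": [6, 6],\n",
    "    \"record_name\": [\"sLOC\", \"sLOC\"],\n",
    "})\n",
    "\n",
    "print(placeholder_columns(toy_atoms))"
   ]
  },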
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "b1d0b142",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Reconstructing file: loc1_nuc3_A\n",
      "Reconstructing file: loc3_nuc2_A\n",
      "Reconstructing file: loc4_nuc5_A\n",
      "Reconstructing file: loc6_nuc7_A\n",
      "Reconstructing file: loc7_nuc9_A\n",
      "Reconstructing file: loc8_nuc11_A\n",
      "Reconstructing file: loc9_nuc1_B\n",
      "Reconstructing file: loc10_nuc6_B\n",
      "Reconstructing file: loc11_nuc5_B\n",
      "Reconstructing file: loc12_nuc6_A\n",
      "Reconstructing file: loc13_nuc8_A\n",
      "Reconstructing file: loc14_nuc5_A\n",
      "Reconstructing file: loc15_nuc7_A\n",
      "Reconstructing file: loc16_nuc1_A\n",
      "Reconstructing file: loc17_nuc2_B\n",
      "Reconstructing file: loc18_nuc1_A\n",
      "Reconstructing file: loc19_nuc4_A\n",
      "Reconstructing file: loc20_nuc4_B\n",
      "Reconstructing file: loc21_nuc3_B\n",
      "Reconstructing file: loc22_nuc3_B\n",
      "Reconstructing file: loc23_nuc1_B\n",
      "Reconstructing file: loc24_nuc7_A\n",
      "Reconstructing file: loc25_nuc2_A\n",
      "Reconstructing file: loc27_nuc10_A\n",
      "Reconstructing file: loc28_nuc8_A\n",
      "Reconstructing file: loc29_nuc5_A\n",
      "Reconstructing file: loc30_nuc1_B\n",
      "Reconstructing file: loc31_nuc10_A\n",
      "Reconstructing file: loc32_nuc7_B\n",
      "Reconstructing file: loc33_nuc6_A\n"
     ]
    }
   ],
   "source": [
    "core_file = Path(fof_ct_folder, f\"core_{file_suffix}_bs.csv\")\n",
    "trace_file = Path(fof_ct_folder, f\"trace_{file_suffix}_bs.csv\")\n",
    "\n",
    "obj_CIMA_list = fof_bs_ct_reader(fof_core_file=core_file, fof_trace_file=trace_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2e7a1592",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "imageID",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "cycle",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "zstep",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "frame",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "accum",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount11",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount12",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount21",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "photoncount22",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "psfx",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "psfy",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "psfz",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "psfphotoncount",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "x",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "y",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "z",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "stdev",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "amp",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "background11",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "background12",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "background21",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "background22",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "maxResidualSlope",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "chi",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "loglike",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "accuracy",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "llr",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "clusterID",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "xprec",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "yprec",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "zprec",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "timepoint",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "record_time",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "record_name",
         "rawType": "str",
         "type": "string"
        },
        {
         "name": "chromosomes",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "s11",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "s12",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "shiftz",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "mass",
         "rawType": "object",
         "type": "unknown"
        }
       ],
       "ref": "09367220-399a-4c12-9ec0-3f4b88a348db",
       "rows": [
        [
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "11715.0888671875",
         "1557.898193359375",
         "1481.0479736328125",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "1",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "1481.0479736328125",
         "10.0"
        ],
        [
         "1",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "12328.1142578125",
         "1338.091064453125",
         "1398.165283203125",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "2",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "1398.165283203125",
         "10.0"
        ],
        [
         "2",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "12419.5302734375",
         "1516.2933349609375",
         "963.5399780273438",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "3",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "963.5399780273438",
         "10.0"
        ],
        [
         "3",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "12283.1416015625",
         "1473.8743896484375",
         "951.85205078125",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "4",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "951.85205078125",
         "10.0"
        ],
        [
         "4",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "12763.0087890625",
         "1573.85546875",
         "919.4849243164062",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "0",
         "6",
         "0",
         "sLOC",
         "0",
         null,
         null,
         "919.4849243164062",
         "10.0"
        ]
       ],
       "shape": {
        "columns": 40,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>imageID</th>\n",
       "      <th>cycle</th>\n",
       "      <th>zstep</th>\n",
       "      <th>frame</th>\n",
       "      <th>accum</th>\n",
       "      <th>photoncount</th>\n",
       "      <th>photoncount11</th>\n",
       "      <th>photoncount12</th>\n",
       "      <th>photoncount21</th>\n",
       "      <th>photoncount22</th>\n",
       "      <th>...</th>\n",
       "      <th>yprec</th>\n",
       "      <th>zprec</th>\n",
       "      <th>timepoint</th>\n",
       "      <th>record_time</th>\n",
       "      <th>record_name</th>\n",
       "      <th>chromosomes</th>\n",
       "      <th>s11</th>\n",
       "      <th>s12</th>\n",
       "      <th>shiftz</th>\n",
       "      <th>mass</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1481.047974</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1398.165283</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>963.539978</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>951.852051</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>919.484924</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 40 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   imageID  cycle  zstep  frame  accum  photoncount  photoncount11  \\\n",
       "0        0      0      0      0      0            0              0   \n",
       "1        0      0      0      0      0            0              0   \n",
       "2        0      0      0      0      0            0              0   \n",
       "3        0      0      0      0      0            0              0   \n",
       "4        0      0      0      0      0            0              0   \n",
       "\n",
       "   photoncount12  photoncount21  photoncount22  ...  yprec  zprec  timepoint  \\\n",
       "0              0              0              0  ...      0      0          1   \n",
       "1              0              0              0  ...      0      0          2   \n",
       "2              0              0              0  ...      0      0          3   \n",
       "3              0              0              0  ...      0      0          4   \n",
       "4              0              0              0  ...      0      0          6   \n",
       "\n",
       "   record_time  record_name  chromosomes  s11  s12       shiftz  mass  \n",
       "0            0         sLOC            0  NaN  NaN  1481.047974  10.0  \n",
       "1            0         sLOC            0  NaN  NaN  1398.165283  10.0  \n",
       "2            0         sLOC            0  NaN  NaN   963.539978  10.0  \n",
       "3            0         sLOC            0  NaN  NaN   951.852051  10.0  \n",
       "4            0         sLOC            0  NaN  NaN   919.484924  10.0  \n",
       "\n",
       "[5 rows x 40 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj_CIMA_list[0].atomList.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d32cc4e2",
   "metadata": {},
   "source": [
    "## Reading from XYZ coordinate format"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93a139c7",
   "metadata": {},
   "source": [
    "CIMA also can read files that only contain X, Y, Z coordinates.<br>\n",
    "In this case we have to use the SegmentXYZ. This only requires that the data have the columns x, y and z. All other columns will be preserved.<br>\n",
    "Here there is an example using chr21 data as an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "f5d15789",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "Z(nm)",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "X(nm)",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "Y(nm)",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "Genomic coordinate",
         "rawType": "str",
         "type": "string"
        },
        {
         "name": "Chromosome copy number",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "Gene names",
         "rawType": "str",
         "type": "string"
        },
        {
         "name": "Transcription",
         "rawType": "str",
         "type": "string"
        },
        {
         "name": "TSS ZXY(nm)",
         "rawType": "str",
         "type": "string"
        }
       ],
       "ref": "7cde1dca-05e2-43dd-81a7-bf4f7d4a5c63",
       "rows": [
        [
         "0",
         "2449.0",
         "4700.0",
         "7234.0",
         "chr21:10400001-10450001",
         "1",
         null,
         null,
         null
        ],
        [
         "1",
         "3731.0",
         "4629.0",
         "7409.0",
         "chr21:10500001-10550001",
         "1",
         null,
         null,
         null
        ],
        [
         "2",
         "2248.0",
         "4690.0",
         "7148.0",
         "chr21:10600001-10650001",
         "1",
         null,
         null,
         null
        ],
        [
         "3",
         "2211.0",
         "4065.0",
         "7567.0",
         "chr21:13250001-13300001",
         "1",
         null,
         null,
         null
        ],
        [
         "4",
         "2499.0",
         "3904.0",
         "7255.0",
         "chr21:14000001-14050001",
         "1",
         null,
         null,
         null
        ]
       ],
       "shape": {
        "columns": 8,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Z(nm)</th>\n",
       "      <th>X(nm)</th>\n",
       "      <th>Y(nm)</th>\n",
       "      <th>Genomic coordinate</th>\n",
       "      <th>Chromosome copy number</th>\n",
       "      <th>Gene names</th>\n",
       "      <th>Transcription</th>\n",
       "      <th>TSS ZXY(nm)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2449.0</td>\n",
       "      <td>4700.0</td>\n",
       "      <td>7234.0</td>\n",
       "      <td>chr21:10400001-10450001</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3731.0</td>\n",
       "      <td>4629.0</td>\n",
       "      <td>7409.0</td>\n",
       "      <td>chr21:10500001-10550001</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2248.0</td>\n",
       "      <td>4690.0</td>\n",
       "      <td>7148.0</td>\n",
       "      <td>chr21:10600001-10650001</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2211.0</td>\n",
       "      <td>4065.0</td>\n",
       "      <td>7567.0</td>\n",
       "      <td>chr21:13250001-13300001</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2499.0</td>\n",
       "      <td>3904.0</td>\n",
       "      <td>7255.0</td>\n",
       "      <td>chr21:14000001-14050001</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Z(nm)   X(nm)   Y(nm)       Genomic coordinate  Chromosome copy number  \\\n",
       "0  2449.0  4700.0  7234.0  chr21:10400001-10450001                       1   \n",
       "1  3731.0  4629.0  7409.0  chr21:10500001-10550001                       1   \n",
       "2  2248.0  4690.0  7148.0  chr21:10600001-10650001                       1   \n",
       "3  2211.0  4065.0  7567.0  chr21:13250001-13300001                       1   \n",
       "4  2499.0  3904.0  7255.0  chr21:14000001-14050001                       1   \n",
       "\n",
       "  Gene names Transcription TSS ZXY(nm)  \n",
       "0        NaN           NaN         NaN  \n",
       "1        NaN           NaN         NaN  \n",
       "2        NaN           NaN         NaN  \n",
       "3        NaN           NaN         NaN  \n",
       "4        NaN           NaN         NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from cima.segments.segment_info_xyz import SegmentXYZ\n",
    "\n",
    "chr21_bs_data = pd.read_csv('https://zenodo.org/records/3928890/files/chromosome21.tsv?download=1', sep='\\t')\n",
    "\n",
    "display(chr21_bs_data.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "adf4f016",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "x",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "y",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "z",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "gen_coord",
         "rawType": "str",
         "type": "string"
        },
        {
         "name": "chr_copy_num",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "timepoint",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "clusterID",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "record_name",
         "rawType": "str",
         "type": "string"
        },
        {
         "name": "mass",
         "rawType": "float64",
         "type": "float"
        }
       ],
       "ref": "04c98968-ee8e-480c-9c5a-4dc1e2bf8d9b",
       "rows": [
        [
         "0",
         "4700.0",
         "7234.0",
         "2449.0",
         "chr21:10400001-10450001",
         "1",
         "0",
         "0",
         "sLOC",
         "10.0"
        ],
        [
         "1",
         "4629.0",
         "7409.0",
         "3731.0",
         "chr21:10500001-10550001",
         "1",
         "0",
         "0",
         "sLOC",
         "10.0"
        ],
        [
         "2",
         "4690.0",
         "7148.0",
         "2248.0",
         "chr21:10600001-10650001",
         "1",
         "0",
         "0",
         "sLOC",
         "10.0"
        ],
        [
         "3",
         "4065.0",
         "7567.0",
         "2211.0",
         "chr21:13250001-13300001",
         "1",
         "0",
         "0",
         "sLOC",
         "10.0"
        ],
        [
         "4",
         "3904.0",
         "7255.0",
         "2499.0",
         "chr21:14000001-14050001",
         "1",
         "0",
         "0",
         "sLOC",
         "10.0"
        ]
       ],
       "shape": {
        "columns": 9,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "      <th>z</th>\n",
       "      <th>gen_coord</th>\n",
       "      <th>chr_copy_num</th>\n",
       "      <th>timepoint</th>\n",
       "      <th>clusterID</th>\n",
       "      <th>record_name</th>\n",
       "      <th>mass</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>4700.0</td>\n",
       "      <td>7234.0</td>\n",
       "      <td>2449.0</td>\n",
       "      <td>chr21:10400001-10450001</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4629.0</td>\n",
       "      <td>7409.0</td>\n",
       "      <td>3731.0</td>\n",
       "      <td>chr21:10500001-10550001</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4690.0</td>\n",
       "      <td>7148.0</td>\n",
       "      <td>2248.0</td>\n",
       "      <td>chr21:10600001-10650001</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4065.0</td>\n",
       "      <td>7567.0</td>\n",
       "      <td>2211.0</td>\n",
       "      <td>chr21:13250001-13300001</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3904.0</td>\n",
       "      <td>7255.0</td>\n",
       "      <td>2499.0</td>\n",
       "      <td>chr21:14000001-14050001</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>sLOC</td>\n",
       "      <td>10.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        x       y       z                gen_coord  chr_copy_num  timepoint  \\\n",
       "0  4700.0  7234.0  2449.0  chr21:10400001-10450001             1          0   \n",
       "1  4629.0  7409.0  3731.0  chr21:10500001-10550001             1          0   \n",
       "2  4690.0  7148.0  2248.0  chr21:10600001-10650001             1          0   \n",
       "3  4065.0  7567.0  2211.0  chr21:13250001-13300001             1          0   \n",
       "4  3904.0  7255.0  2499.0  chr21:14000001-14050001             1          0   \n",
       "\n",
       "   clusterID record_name  mass  \n",
       "0          0        sLOC  10.0  \n",
       "1          0        sLOC  10.0  \n",
       "2          0        sLOC  10.0  \n",
       "3          0        sLOC  10.0  \n",
       "4          0        sLOC  10.0  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "chr21_bs_data = chr21_bs_data[['X(nm)', 'Y(nm)', 'Z(nm)','Genomic coordinate','Chromosome copy number']]\n",
    "chr21_bs_data.rename({\n",
    "            'X(nm)': 'x',\n",
    "            'Y(nm)': 'y',\n",
    "            'Z(nm)': 'z',\n",
    "            'Genomic coordinate':'gen_coord',\n",
    "            'Chromosome copy number': 'chr_copy_num'},\n",
    "        axis=1, inplace=True)\n",
    "\n",
    "chr21_bs_data_CIMA = SegmentXYZ(chr21_bs_data)\n",
    "\n",
    "display(chr21_bs_data_CIMA.atomList.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "29dc2c8e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0]\n"
     ]
    }
   ],
   "source": [
    "print(chr21_bs_data_CIMA.atomList.timepoint.unique())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "8236d440",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<StringArray>\n",
      "['chr21:10400001-10450001', 'chr21:10500001-10550001',\n",
      " 'chr21:10600001-10650001', 'chr21:13250001-13300001',\n",
      " 'chr21:14000001-14050001', 'chr21:14050001-14100001',\n",
      " 'chr21:14100001-14150001', 'chr21:14150001-14200001',\n",
      " 'chr21:14200001-14250001', 'chr21:14250001-14300001',\n",
      " ...\n",
      " 'chr21:46200001-46250001', 'chr21:46250001-46300001',\n",
      " 'chr21:46300001-46350001', 'chr21:46350001-46400001',\n",
      " 'chr21:46400001-46450001', 'chr21:46450001-46500001',\n",
      " 'chr21:46500001-46550001', 'chr21:46550001-46600001',\n",
      " 'chr21:46600001-46650001', 'chr21:46650001-46700001']\n",
      "Length: 651, dtype: str\n"
     ]
    }
   ],
   "source": [
    "print(chr21_bs_data_CIMA.atomList.gen_coord.unique())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "CIMA_testing",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}