2 - Experimental quality assessment

Description: This notebook contains an example on how to check for outliers regarding numerosity.

Removing this outliers is recommended (is the default for the other notebooks).

In this notebooks we first check if the distribution of numerosities across an experiment (normally tied to the same chromosome / genomic region) is homogeneous or some of the traces stand out as outliers.

We do this in two steps.

  • Kruskal-Wallis test: the aim of this test is to identify if we have possible outliers. This tests only gives this information. To know which are the outliers we perform a second test. This test is an all-vs-all.

  • Kolmogorov-Smirnov test: the aim of this test is to identify if a particular trace is an outlier in respect to the background distribution of numerosities. This background distribution is build by adding up all the numerosities from all the experiments for that region. This test is a one-vs-all.

The following index has links to the different sections of the notebook. Some sections may have additional indexes to access the subparts.
If you are using VS Code to visualize the notebook, the links may not work. This is due to how VC Code render the notebook. To navigate the notebook use the outline panel. To know more about it, check this link.

Content:

Next notebook to run is:
3D Reconstruction Assessment

Library imports and functions

Back to the general index

import re
from pathlib import Path
from collections import defaultdict

import pandas as pd
import numpy as np
from cima.parsers.parser_csv import CSVParser
from scipy.stats import ks_2samp, kruskal

import matplotlib.pyplot as plt
import matplotlib.patheffects as pe
import seaborn as sns
import matplotlib as mpl
from matplotlib.collections import PolyCollection

## Regular expressions to get the metadata from the filename
regex_patterns = {
    "nucleusID": re.compile(r"(?i)nuc(\d+)"), # This searches for "Nuc" or "nuc" followed by digits
    "cellID": re.compile(r"(?i)cell(\d+)"), # This searches for "Cell" or "cell" followed by digits
    "locationID" : re.compile(r"(?i)loc-?(\d+)"), # This searches for "Loc" or "loc" followed by optional hyphen and digits
    "date" : re.compile(r"(\d{4}[-\.]\d{2}[-\.]\d{2})"), # This searches for dates in the format YYYY-MM-DD or YYYY.MM.DD
    "homolog" : re.compile(r"_([ABPM01pm])"), ## Added just in case the homolog is in the filename.
    "chromosome" : re.compile(r"(?i)_(chr[\d+MXY])") ## Added just in case the chromosome is in the filename.
    }

# Set the options for saving the plots
plot_opts = {"dpi": 300, "bbox_inches": 'tight', "transparent": True}
def key_explicit(s):
    date, rest = s.split("_cell")
    cell_num, suffix = rest.split("_")
    return (date, int(cell_num), suffix)

Setting some variables

In the following cell you should set up some variables for the script to run properly. Change the variables at your convenience, and then you can press “Run all” in the Jupyter Notebook.

This notebook assumes that your csv files are in SRX format.

Brief explanation of the different variables:

  • data_file: this variable is where the csv files to be assessed are stored.

  • work_folder: this variable is where the plots and the tsv files that the script creates will be stored.

  • number_of_steps_library: the number of steps in your library.

  • starting_step: the step at which analysis should start. Provide a dictionary using the chromosomes/folder names of your datasets and the starting step (usually located in the info file as the 'Round' column).

  • kw_test_thresholds: the threshold to consider for the Kruskal-Wallis test.

  • save_plots: save the plots into a folder or show them in the notebook.

The notebook will create a folder in the work_folder path called 2_experimental_quality_assessment where the plots/files created will be stored.

For the purposes of this notebook, save_plots is set to False to show the plots. By doing so only plots for chr3 will be shown. If you want to obtain the plots for the other chromosomes, run this notebook with save_plots set to True and check the folder.

Back to the general index

data_folder = "/scratch/CIMA_tutorial/data/"
work_folder = "/scratch/CIMA_tutorial/"

number_of_steps_library = 16
starting_step = {"chr3" : 6, "chr5" : 5, "chr8" : 4,"chr16" : 2, "chrX" : 3}

kw_test_thresholds = 0.01

save_plots = True

experimental_quality_assessment_folder = Path(work_folder, "2_experimental_quality_assessment")
experimental_quality_assessment_folder.mkdir(parents=True, exist_ok=True)

nice_feature_names = {
    "numerosity": "Numerosity",
    "precX": "X-axis precision",
    "precY": "Y-axis precision",
    "precZ": "Z-axis precision",}

Exporting the data and running the tests

We tests four different things:

  • Deviations in numerosity.
  • Deviations in precision in the X axis.
  • Deviations in precision in the Y axis.
  • Deviations in precision in the Z axis.

Then we will do violin plots to show the data of each trace against the “background distribution”. We create this “background distribution” by aggregating the data of the different traces.

Back to the general index

kw_test_data = defaultdict(list)

violin_plot_data = {}
ks_test_data = {}

for chrom in starting_step.keys():

    violin_plot_chr = defaultdict(list)

    data_folder_chr = Path(data_folder, chrom)

    print(f"Processing CSV files in folder: {data_folder_chr}")

    # Get a list of all CSV files in the specified data folder
    csv_files = list(data_folder_chr.glob('*.csv'))
    if len(csv_files) == 0:
        print("No files found, please input the correct work_folder")
    else:
        print(f"Total number of files found: {len(csv_files)}")

    # Get a list of metadata keys to extract from filenames
    metadata_keys = ["nucleusID", "cellID", "locationID", "date", "homolog", "chromosome"]

    # Initialize an empty list to store the StructuralObject instances
    obj_list = []

    # Initialize lists to store numerosity data for KS test and violin plot
    features_total = defaultdict(list)

    # Initialize a dictionary to store numerosities by file for the KS test
    features_by_file = defaultdict(lambda: defaultdict(list))

    if len(csv_files) > 0:
        # Iterate over each file in the list of file names
        for count, filein in enumerate(csv_files, start=1):

            stem = Path(filein).stem

            print(f"Processing file {count}: {stem}", end="\r")

            # We mine the metadata from the filename using regex patterns. If not possible, assign default values.
            metadata_CIMA = {
                key: (match.group(1) if (match := regex_patterns[key].search(stem)) else default)
                for key, default in zip(metadata_keys, [count, count, count, stem, "A", "chrN"])
            }

            # Adjust the metadata for nucleus and cell (if nucleus is count, but cell is something, use the cell match in nucleus)
            if "cell" in metadata_CIMA and metadata_CIMA["nucleusID"] == count:
                metadata_CIMA["nucleusID"] = metadata_CIMA["cell"]

            metadata_CIMA["nucleusID"] = int(metadata_CIMA["nucleusID"])
            metadata_CIMA["cellID"] = int(metadata_CIMA["cellID"])
            metadata_CIMA["locationID"] = int(metadata_CIMA["locationID"])

            # Set experimentID as date for consistency
            metadata_CIMA["experimentID"] = metadata_CIMA["date"]

            # Read the CSV file and create a StructuralObject instance
            objin = CSVParser.read_CSV_file(filein.as_posix(), metadata = metadata_CIMA, content_type = "srx")

            objin.atomList = objin.atomList[(objin.atomList['timepoint'] >= starting_step[chrom] - 1) &
                                            (objin.atomList['timepoint'] < (starting_step[chrom] + number_of_steps_library - 1))].copy()
            
            for tp in objin.atomList["timepoint"].unique():
                # Take out the timepoint from the atomList if it has less tha 10 points
                tp_mask = objin.atomList["timepoint"] == tp
                if tp_mask.sum() < 10:
                    print(f"Excluding timepoint {tp} in experimentID: {metadata_CIMA['experimentID']}, nucleusID: {metadata_CIMA['nucleusID']}, homolog: {metadata_CIMA['homolog']} because it has less than 10 points.")
                    objin.atomList = objin.atomList.loc[~tp_mask].copy()
            
            # We store the numerosities for each timepoint in the file for the KW/KS tests.
            for tp in objin.atomList.timepoint.unique():
                temp = objin.get_Time_segment(timepoint=tp)
                features_total["numerosity"].append(temp.atomList.shape[0])
                features_total["precX"].append(np.nanmedian(temp.atomList.xprec))
                features_total["precY"].append(np.nanmedian(temp.atomList.yprec))
                features_total["precZ"].append(np.nanmedian(temp.atomList.zprec))
                features_by_file[stem]["numerosity"].append(temp.atomList.shape[0])
                features_by_file[stem]["precX"].append(np.nanmedian(temp.atomList.xprec))
                features_by_file[stem]["precY"].append(np.nanmedian(temp.atomList.yprec))
                features_by_file[stem]["precZ"].append(np.nanmedian(temp.atomList.zprec))
            
            # Append the object to the list
            obj_list.append(objin)

        # Nice print to tell the user the number of objects created.
        print(f"\nTotal number of objects created: {len(obj_list)}")

        # Clean up memory by deleting some of the variables no longer needed
        del count, filein, stem, metadata_CIMA, match

        if 'default' in locals():
            del default
        if 'key' in locals():
            del key

    del csv_files, metadata_keys

    ks_test_summary = defaultdict(list)

    for objin in obj_list:

        repID = f"{objin.metadata['experimentID']}_cell{objin.metadata['nucleusID']}_{objin.metadata['homolog']}"

        file_features = features_by_file[Path(objin.metadata["filename"]).stem]

        ks_test_summary["filename"].append(Path(objin.metadata["filename"]).stem)

        for feature in ["numerosity", "precX", "precY", "precZ"]:
            ks_test = ks_2samp(file_features[feature], features_total[feature])
            ks_test_summary[f"statistic_{feature}"].append(ks_test.statistic)
            ks_test_summary[f"pvalue_{feature}"].append(ks_test.pvalue)
        
        ks_test_summary["n_total"].append(len(features_total["numerosity"]))

        violin_plot_chr["filename"].extend([Path(objin.metadata["filename"]).stem] * len(file_features["precX"]))
        violin_plot_chr["expID"].extend([objin.metadata["experimentID"]] * len(file_features["precX"]))
        violin_plot_chr["repID"].extend([repID] * len(file_features["precX"]))
        violin_plot_chr["numerosity"].extend(file_features["numerosity"])
        violin_plot_chr["precX"].extend(file_features["precX"])
        violin_plot_chr["precY"].extend(file_features["precY"])
        violin_plot_chr["precZ"].extend(file_features["precZ"])

    violin_plot_data[chrom] = pd.DataFrame(violin_plot_chr)
    ks_test_data[chrom] = pd.DataFrame(ks_test_summary)

    kw_test_data["chromosome"].append(chrom)

    for feature in ["numerosity", "precX", "precY", "precZ"]:
        kw_test = kruskal(*[features_by_file[fn][feature] for fn in features_by_file])
        kw_test_data[f"statistic_{feature}"].append(kw_test.statistic)
        kw_test_data[f"pvalue_{feature}"].append(kw_test.pvalue)

    del feature, features_by_file, file_features, ks_test, ks_test_summary, kw_test, repID, temp, tp, violin_plot_chr
    del objin, features_total

    print(f"\n{'-'*54}\n")

kw_test_data = pd.DataFrame(kw_test_data)

del chrom, data_folder_chr, obj_list
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr3
Total number of files found: 33
Excluding timepoint 19 in experimentID: 2018-09-04, nucleusID: 7, homolog: A because it has less than 10 points.
Processing file 33: 2018-09-04_nuc06_A_chr3
Total number of objects created: 33

------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr5
Total number of files found: 35
Processing file 35: 2018-09-04_nuc04_A_chr5
Total number of objects created: 35

------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr8
Total number of files found: 22
Processing file 22: 2018-09-04_nuc01_B_chr8
Total number of objects created: 22

------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr16
Total number of files found: 10
Processing file 10: 2018-09-04_nuc04_A_chr16
Total number of objects created: 10

------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chrX
Total number of files found: 5
Processing file 5: 2018-09-04_nuc01_A_chrX
Total number of objects created: 5

------------------------------------------------------

Violin plots of the different distributions of the traces against the background

Here we create the mentioned violin plots.

The trace is on the right, colored by date. The background is on the right, colored in grey.

color_palette = {'2018-01-11':"#a6cee3" ,'2018-06-28':'#1f78b4', '2018-07-10':'#b2df8a', '2018-09-04':'#33a02c'}
default_trace_color = "skyblue"
bg_color = "lightgray"

for chrom in violin_plot_data.keys():

    if not save_plots and chrom != "chr3":
        continue

    for feature in ["numerosity", "precX", "precY", "precZ"]:

        print(f"Creating violin plot for chromosome {chrom} and feature: {nice_feature_names[feature]}")

        violin_plot_chr = violin_plot_data[chrom]
        ks_test_chr = ks_test_data[chrom]
        numerosity_total_chr = violin_plot_chr[feature].to_numpy()

        groups = sorted(violin_plot_chr["repID"].unique(), key=key_explicit)

        rows = []

        for gr in groups:
            cond_vals = violin_plot_chr.loc[violin_plot_chr["repID"] == gr, feature].to_numpy()
            rows.append(pd.DataFrame({
                "Group": f"{gr}",
                "Side": "Trace",
                "Value": cond_vals
            }))
            rows.append(pd.DataFrame({
                "Group": f"{gr}",
                "Side": "Background",
                "Value": numerosity_total_chr
            }))

        plot_df = pd.concat(rows, ignore_index=True)
        plot_df["Pair"] = "Pair" 

        # For vertical set x to Side and y to Value
        # For horizontal set x to Value and y to Side
        g = sns.catplot(
            data = plot_df,
            x = "Side", # Side will put a space in the middle, Pair will take it out
            y = "Value",
            hue = "Side", # Side will determine the color (Trace vs Background)
            col = "Group", # Group will determine the facetting (one facet per file)
            col_wrap = 7, # N violins per row
            kind = "violin", # Type of estimation to use
            split = True, # Split violins to show Trace vs Background in the same violin
            inner = "quart", # Show the quartiles inside the violins
            linewidth = 1, height = 2.4, aspect = 0.9,
            sharey = True, # Share the y-axis across facets to make them comparable
            hue_order = ["Trace", "Background"],
            palette = {"Trace": default_trace_color, "Background": bg_color},
        )

        g.set_titles("")
        g.set_ylabels(nice_feature_names[feature])
        g.set_xlabels("")

        col2check = "Group" if "Group" in violin_plot_chr.columns else "repID"

        # Draw a rectangle around the ones that failed the KS test at p-value < 0.05
        for ax, gr in zip(g.axes.flat, groups):

            temp = gr.split("_")
            nice_title = f"Cell {temp[1].replace('cell', '')} allele {temp[2]}"
            ax.set_title(nice_title, fontsize = 10)

            mask = violin_plot_chr[col2check] == gr
            filename_temp = violin_plot_chr.loc[mask, "filename"].iat[0]

            exp_val = violin_plot_chr.loc[mask, "expID"].iat[0]
            trace_color = color_palette.get(exp_val, default_trace_color)

            # Find PolyCollection objects (violin halves). seaborn draws one poly per half.
            polys = [c for c in ax.collections if isinstance(c, PolyCollection)]

            # Apply colors to first two polys if present (Trace, Background)
            for poly, color in zip(polys[:2], (trace_color, bg_color)):
                try:
                    poly.set_facecolor(mpl.colors.to_rgba(color))
                except Exception:
                    pass

            # Get per-file n_total (if present) and derive a sensible p-threshold
            n_total_series = ks_test_chr.loc[ks_test_chr["filename"] == filename_temp, "n_total"]
            if not n_total_series.empty and int(n_total_series.iat[0]) > 0:
                p_threshold = 0.05 / int(n_total_series.iat[0])
            else:
                p_threshold = 0.05

            failed = ks_test_chr.loc[
                (ks_test_chr["filename"] == filename_temp) &
                (ks_test_chr[f"pvalue_{feature}"] < p_threshold)
            ]

            if not failed.empty:
                ax.text(
                    0.97, 0.97,
                    "*",
                    transform = ax.transAxes,
                    ha = "right", va = "top",
                    fontsize = 18, fontweight = "bold", color = "red",
                    path_effects = [pe.withStroke(linewidth = 3, foreground = "white")],
                )

        # Remove duplicate legends from facets; keep one
        if g._legend is not None:
            g._legend.set_title("")
            g._legend.set_bbox_to_anchor((1.02, 0.5))
            g._legend._loc = 6

        plt.tight_layout()

        if save_plots:
            plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
            plot_folder.mkdir(parents=True, exist_ok=True)
            filename = Path(plot_folder, f"violin_plot_{feature}_{chrom}.pdf")
            print(f"Saving plots to: {filename}")
            plt.savefig(filename, **plot_opts)
            del plot_folder, filename
        else:
            plt.show()

        plt.close()

    print(f"\n{'-'*54}\n")

del chrom, violin_plot_chr, ks_test_chr, groups, rows, gr, cond_vals, plot_df, g, col2check, feature
del ax, temp, nice_title, mask, filename_temp, exp_val, trace_color, polys, poly, color, n_total_series, p_threshold, failed
del bg_color, default_trace_color, color_palette, numerosity_total_chr
Creating violin plot for chromosome chr3 and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/violin_plot_numerosity_chr3.pdf
Creating violin plot for chromosome chr3 and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/violin_plot_precX_chr3.pdf
Creating violin plot for chromosome chr3 and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/violin_plot_precY_chr3.pdf
Creating violin plot for chromosome chr3 and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/violin_plot_precZ_chr3.pdf

------------------------------------------------------

Creating violin plot for chromosome chr5 and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/violin_plot_numerosity_chr5.pdf
Creating violin plot for chromosome chr5 and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/violin_plot_precX_chr5.pdf
Creating violin plot for chromosome chr5 and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/violin_plot_precY_chr5.pdf
Creating violin plot for chromosome chr5 and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/violin_plot_precZ_chr5.pdf

------------------------------------------------------

Creating violin plot for chromosome chr8 and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/violin_plot_numerosity_chr8.pdf
Creating violin plot for chromosome chr8 and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/violin_plot_precX_chr8.pdf
Creating violin plot for chromosome chr8 and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/violin_plot_precY_chr8.pdf
Creating violin plot for chromosome chr8 and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/violin_plot_precZ_chr8.pdf

------------------------------------------------------

Creating violin plot for chromosome chr16 and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/violin_plot_numerosity_chr16.pdf
Creating violin plot for chromosome chr16 and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/violin_plot_precX_chr16.pdf
Creating violin plot for chromosome chr16 and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/violin_plot_precY_chr16.pdf
Creating violin plot for chromosome chr16 and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/violin_plot_precZ_chr16.pdf

------------------------------------------------------

Creating violin plot for chromosome chrX and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/violin_plot_numerosity_chrX.pdf
Creating violin plot for chromosome chrX and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/violin_plot_precX_chrX.pdf
Creating violin plot for chromosome chrX and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/violin_plot_precY_chrX.pdf
Creating violin plot for chromosome chrX and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/violin_plot_precZ_chrX.pdf

------------------------------------------------------

Plotting X/Y/Z axis precision against numerosity

We can also plot the X/Y/Z axis mean precision of a given chromatin segment against its numerosity.

Back to the general index

color_palette = {'2018-01-11':"#a6cee3" ,'2018-06-28':'#1f78b4', '2018-07-10':'#b2df8a', '2018-09-04':'#33a02c'}

for chrom in violin_plot_data.keys():

    if not save_plots and chrom != "chr3":
        continue

    violin_plot_chr = violin_plot_data[chrom]

    sns.scatterplot(data=violin_plot_chr, x="precX", y="numerosity", hue="expID", palette=color_palette, legend=False)
    plt.xlabel("X-axis precision")
    plt.ylabel("Numerosity")
    plt.tight_layout()
    if save_plots:
        plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
        plot_folder.mkdir(parents=True, exist_ok=True)
        filename = Path(plot_folder, f"precX_vs_numerosity_{chrom}.pdf")
        print(f"Saving plots to: {filename}")
        plt.savefig(filename, **plot_opts)
        del plot_folder, filename
    else:
        plt.show()
    plt.close()

    sns.scatterplot(data=violin_plot_chr, x="precY", y="numerosity", hue="expID", palette=color_palette, legend=False)
    plt.xlabel("Y-axis precision")
    plt.ylabel("Numerosity")
    plt.tight_layout()
    if save_plots:
        plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
        plot_folder.mkdir(parents=True, exist_ok=True)
        filename = Path(plot_folder, f"precY_vs_numerosity_{chrom}.pdf")
        print(f"Saving plots to: {filename}")
        plt.savefig(filename, **plot_opts)
        del plot_folder, filename
    else:
        plt.show()
    plt.close()

    sns.scatterplot(data=violin_plot_chr, x="precZ", y="numerosity", hue="expID", palette=color_palette, legend=False)
    plt.xlabel("Z-axis precision")
    plt.ylabel("Numerosity")
    plt.tight_layout()
    if save_plots:
        plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
        plot_folder.mkdir(parents=True, exist_ok=True)
        filename = Path(plot_folder, f"precZ_vs_numerosity_{chrom}.pdf")
        print(f"Saving plots to: {filename}")
        plt.savefig(filename, **plot_opts)
        del plot_folder, filename
    else:
        plt.show()
    plt.close()
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/precX_vs_numerosity_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/precY_vs_numerosity_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/precZ_vs_numerosity_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/precX_vs_numerosity_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/precY_vs_numerosity_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/precZ_vs_numerosity_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/precX_vs_numerosity_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/precY_vs_numerosity_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/precZ_vs_numerosity_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/precX_vs_numerosity_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/precY_vs_numerosity_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/precZ_vs_numerosity_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/precX_vs_numerosity_chrX.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/precY_vs_numerosity_chrX.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/precZ_vs_numerosity_chrX.pdf

Getting a file per chromosome with the traces that pass the assessment

We create a file per each chromosome with the traces that passed the experimental assessment.

This way the user has a reference file to filter the traces in the next steps, without needing to move the files around.

Back to the general index

for chrom in ks_test_data.keys():

    ks_test_chr = ks_test_data[chrom]
    kw_test_chr = kw_test_data[kw_test_data["chromosome"] == chrom]

    Path(experimental_quality_assessment_folder, "tsv").mkdir(parents=True, exist_ok=True)

    ks_test_chr.to_csv(Path(experimental_quality_assessment_folder, "tsv", f"ks_test_summary_{chrom}.tsv"), index=False, sep="\t")
    kw_test_chr.to_csv(Path(experimental_quality_assessment_folder, "tsv", f"kw_test_summary_{chrom}.tsv"), index=False, sep="\t")

    for feature in ["numerosity", "precX", "precY", "precZ"]:
        print(f"Kruskal-Wallis test for chromosome {chrom} and feature {nice_feature_names[feature]}: p-value={kw_test_chr[f'pvalue_{feature}'].iat[0]:.4e}")

    trace_pass = []

    for _, row in ks_test_chr.iterrows():

        # display(row)

        p_value_threshold = 0.05 / row["n_total"] if row["n_total"] > 0 else 0.05

        if (row["pvalue_numerosity"] > p_value_threshold) & \
           (row["pvalue_precX"] > p_value_threshold) & \
           (row["pvalue_precY"] > p_value_threshold) & \
           (row["pvalue_precZ"] > p_value_threshold):
            
            trace_pass.append(row["filename"])

        # Write the contents of taces_pass to a text file
        with open(Path(experimental_quality_assessment_folder, f"experimental_assessment_pass_{chrom}.txt"), "w") as f:
            for item in trace_pass:
                f.write(f"{item}\n")

    
    print(f"Traces that \033[1;4mpassed\033[0m the KS test for chromosome {chrom}: {len(trace_pass)} out of {ks_test_chr.shape[0]} total traces.")
    print(f"\n{'-'*54}\n")

del chrom, ks_test_chr, kw_test_chr, trace_pass, row, p_value_threshold, f, item, feature
Kruskal-Wallis test for chromosome chr3 and feature Numerosity: p-value=2.3501e-14
Kruskal-Wallis test for chromosome chr3 and feature X-axis precision: p-value=5.0175e-08
Kruskal-Wallis test for chromosome chr3 and feature Y-axis precision: p-value=6.3053e-15
Kruskal-Wallis test for chromosome chr3 and feature Z-axis precision: p-value=8.1420e-09
Traces that passed the KS test for chromosome chr3: 30 out of 33 total traces.

------------------------------------------------------

Kruskal-Wallis test for chromosome chr5 and feature Numerosity: p-value=2.7458e-15
Kruskal-Wallis test for chromosome chr5 and feature X-axis precision: p-value=4.7086e-09
Kruskal-Wallis test for chromosome chr5 and feature Y-axis precision: p-value=3.7215e-13
Kruskal-Wallis test for chromosome chr5 and feature Z-axis precision: p-value=8.6281e-10
Traces that passed the KS test for chromosome chr5: 33 out of 35 total traces.

------------------------------------------------------

Kruskal-Wallis test for chromosome chr8 and feature Numerosity: p-value=2.4712e-05
Kruskal-Wallis test for chromosome chr8 and feature X-axis precision: p-value=3.0020e-04
Kruskal-Wallis test for chromosome chr8 and feature Y-axis precision: p-value=3.5394e-06
Kruskal-Wallis test for chromosome chr8 and feature Z-axis precision: p-value=2.4363e-04
Traces that passed the KS test for chromosome chr8: 22 out of 22 total traces.

------------------------------------------------------

Kruskal-Wallis test for chromosome chr16 and feature Numerosity: p-value=3.9739e-01
Kruskal-Wallis test for chromosome chr16 and feature X-axis precision: p-value=7.2336e-01
Kruskal-Wallis test for chromosome chr16 and feature Y-axis precision: p-value=5.9635e-01
Kruskal-Wallis test for chromosome chr16 and feature Z-axis precision: p-value=3.1732e-01
Traces that passed the KS test for chromosome chr16: 10 out of 10 total traces.

------------------------------------------------------

Kruskal-Wallis test for chromosome chrX and feature Numerosity: p-value=1.9648e-07
Kruskal-Wallis test for chromosome chrX and feature X-axis precision: p-value=4.9749e-01
Kruskal-Wallis test for chromosome chrX and feature Y-axis precision: p-value=2.3816e-01
Kruskal-Wallis test for chromosome chrX and feature Z-axis precision: p-value=3.1606e-01
Traces that passed the KS test for chromosome chrX: 4 out of 5 total traces.

------------------------------------------------------

Plotting efficiency matrices

Here we plot what we call efficiency matrices.

They show the different traces and (one per row) and the different chromatin segments (one per column).

If a chromatin segment is present in a trace, it is shown as a blue square; if absent is shown as a white square.

Back to the general index

color_palette = {'2018-01-11':"#a6cee3" ,'2018-06-28':'#1f78b4', '2018-07-10':'#b2df8a', '2018-09-04':'#33a02c'}

print("Creating efficiency matrices and plots for each chromosome...\n")

for chrom in starting_step.keys():

    if save_plots == False and chrom != "chr3":
        continue

    data_folder_chr = Path(data_folder, chrom)

    violin_plot_chr = violin_plot_data[chrom]

    print(f"Processing CSV files in folder: {data_folder_chr}")

    # Get a list of all CSV files in the specified data folder
    csv_files = list(data_folder_chr.glob('*.csv'))
    if len(csv_files) == 0:
        print("No files found, please input the correct work_folder")
    else:
        print(f"Total number of files found: {len(csv_files)}")

    traces_pass = pd.read_csv(Path(experimental_quality_assessment_folder, f"experimental_assessment_pass_{chrom}.txt"), sep="\t", names=["trace"])

    # Get a list of metadata keys to extract from filenames
    metadata_keys = ["nucleusID", "cellID", "locationID", "date", "homolog", "chromosome"]

    file_names = []

    if len(csv_files) > 0:
        efficiency_matrix_1 = np.zeros((len(csv_files), number_of_steps_library))
        efficiency_matrix_2 = np.zeros((len(csv_files), number_of_steps_library))
        # Iterate over each file in the list of file names
        for count, filein in enumerate(csv_files, start=1):

            stem = Path(filein).stem

            file_names.append(violin_plot_chr[violin_plot_chr["filename"] == stem]["repID"].unique()[0])

            print(f"Processing file {count}: {stem}", end="\r")

            # We mine the metadata from the filename using regex patterns. If not possible, assign default values.
            metadata_CIMA = {
                key: (match.group(1) if (match := regex_patterns[key].search(stem)) else default)
                for key, default in zip(metadata_keys, [count, count, count, stem, "A", "chrN"])
            }

            # Adjust the metadata for nucleus and cell (if nucleus is count, but cell is something, use the cell match in nucleus)
            if "cell" in metadata_CIMA and metadata_CIMA["nucleusID"] == count:
                metadata_CIMA["nucleusID"] = metadata_CIMA["cell"]

            metadata_CIMA["nucleusID"] = int(metadata_CIMA["nucleusID"])
            metadata_CIMA["cellID"] = int(metadata_CIMA["cellID"])
            metadata_CIMA["locationID"] = int(metadata_CIMA["locationID"])

            # Set experimentID as date for consistency
            metadata_CIMA["experimentID"] = metadata_CIMA["date"]

            # Read the CSV file and create a StructuralObject instance
            objin = CSVParser.read_CSV_file(filein.as_posix(), metadata = metadata_CIMA, content_type = "srx")

            objin.atomList = objin.atomList[(objin.atomList['timepoint'] >= starting_step[chrom] - 1) &
                                            (objin.atomList['timepoint'] < (starting_step[chrom] + number_of_steps_library - 1))].copy()
            
            for tp in objin.atomList["timepoint"].unique():
                # Take out the timepoint from the atomList if it has less tha 10 points
                tp_mask = objin.atomList["timepoint"] == tp
                if tp_mask.sum() < 10:
                    print(f"Excluding timepoint {tp} in experimentID: {metadata_CIMA['experimentID']}, nucleusID: {metadata_CIMA['nucleusID']}, homolog: {metadata_CIMA['homolog']} because it has less than 10 points.")
                    objin.atomList = objin.atomList.loc[~tp_mask].copy()


            # We store the numerosities for each timepoint in the file for the KW/KS tests.
            for tp in objin.atomList.timepoint.unique():
                efficiency_matrix_1[count-1, tp-starting_step[chrom]] = 1
                if stem in traces_pass["trace"].values:
                    efficiency_matrix_2[count-1, tp-starting_step[chrom]] = 1

        efficiency_matrix = pd.DataFrame(efficiency_matrix_1, columns = [f"{i+1}" for i in range(number_of_steps_library)])
        efficiency_matrix.index = file_names
        rownames = [temp.split("_")[1] for temp in efficiency_matrix.index]
        rownames = [f"Cell {temp.split('_')[1].replace('cell', '')} allelle {temp.split('_')[2]}" for temp in efficiency_matrix.index]

        fig_height = max(2, 0.3 * len(efficiency_matrix.index))

        g = sns.clustermap(data = efficiency_matrix, figsize=(5, fig_height),
               vmin=0, vmax=1, cmap=['#f7f7f7', '#00008B'], 
               row_colors=[color_palette.get(d.split("_")[0], "#a6cee3") for d in efficiency_matrix.index],
               col_cluster=False, row_cluster=False,
               yticklabels=rownames,
               xticklabels=[f"{i+1}" for i in range(efficiency_matrix.shape[1])],
        )

        g.ax_heatmap.set_xlabel('Step ID')
        g.ax_heatmap.set_xticklabels(g.ax_heatmap.get_xticklabels(), rotation=0, fontsize=8)
        g.ax_heatmap.set_yticklabels(g.ax_heatmap.get_yticklabels(), rotation=0, fontsize=8)
        g.ax_row_dendrogram.set_visible(False)
        g.ax_col_dendrogram.set_visible(False)
        g.ax_cbar.set_visible(False)

        if save_plots:
            plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
            plot_folder.mkdir(parents=True, exist_ok=True)
            filename = Path(plot_folder, f"efficiency_matrix_pre_experimental_assessment_{chrom}.pdf")
            print(f"Saving plots to: {filename}")
            plt.savefig(filename, **plot_opts)
            del plot_folder, filename
        else:
            plt.show()
        plt.close()

        efficiency_matrix = pd.DataFrame(efficiency_matrix_2, columns = [f"Step{i+1}" for i in range(number_of_steps_library)])
        efficiency_matrix.index = file_names

        g = sns.clustermap(data = efficiency_matrix, figsize=(5, fig_height),
               vmin=0, vmax=1, cmap=['#f7f7f7', '#00008B'], 
               row_colors=[color_palette.get(d.split("_")[0], "#a6cee3") for d in efficiency_matrix.index],
               col_cluster=False, row_cluster=False,
               yticklabels=rownames,
               xticklabels=[f"{i+1}" for i in range(efficiency_matrix.shape[1])],
        )

        g.ax_heatmap.set_xlabel('Step ID')
        g.ax_heatmap.set_xticklabels(g.ax_heatmap.get_xticklabels(), rotation=0, fontsize=8)
        g.ax_heatmap.set_yticklabels(g.ax_heatmap.get_yticklabels(), rotation=0, fontsize=8)
        g.ax_row_dendrogram.set_visible(False)
        g.ax_col_dendrogram.set_visible(False)
        g.ax_cbar.set_visible(False)

        if save_plots:
            plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
            plot_folder.mkdir(parents=True, exist_ok=True)
            filename = Path(plot_folder, f"efficiency_matrix_post_experimental_assessment_{chrom}.pdf")
            print(f"Saving plots to: {filename}")
            plt.savefig(filename, **plot_opts)
            del plot_folder, filename
        else:
            plt.show()
        plt.close()

    print(f"\n{'-'*54}\n")

del chrom, data_folder_chr, csv_files, traces_pass, metadata_keys, efficiency_matrix_1, efficiency_matrix_2
del count, filein, stem, metadata_CIMA, match, objin, tp, efficiency_matrix, g, fig_height, file_names, rownames
del violin_plot_chr
Creating efficiency matrices and plots for each chromosome...

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr3
Total number of files found: 33
Excluding timepoint 19 in experimentID: 2018-09-04, nucleusID: 7, homolog: A because it has less than 10 points.
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/efficiency_matrix_pre_experimental_assessment_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/efficiency_matrix_post_experimental_assessment_chr3.pdf

------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr5
Total number of files found: 35
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/efficiency_matrix_pre_experimental_assessment_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/efficiency_matrix_post_experimental_assessment_chr5.pdf

------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr8
Total number of files found: 22
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/efficiency_matrix_pre_experimental_assessment_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/efficiency_matrix_post_experimental_assessment_chr8.pdf

------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr16
Total number of files found: 10
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/efficiency_matrix_pre_experimental_assessment_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/efficiency_matrix_post_experimental_assessment_chr16.pdf

------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chrX
Total number of files found: 5
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/efficiency_matrix_pre_experimental_assessment_chrX.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/efficiency_matrix_post_experimental_assessment_chrX.pdf

------------------------------------------------------

Anisotropy factor

Here we calculate and plot the anisotropy factor.

The anisotrpy factor is a measure on how different is the error of the localization in the different axis.

It is calculated following this formula:

\[ \text{Anisotropy factor} = \frac{\operatorname{mean}(x_{\text{spread}},\ y_{\text{spread}})}{z_{\text{spread}}} \]

In the plots, lines are shown for 0.2, 0.3 and 0.5 to give the user a visual reference.

Back to the general index

collection_prec = defaultdict(list)
for chrom in starting_step.keys():

    data_folder_chr = Path(data_folder, chrom)

    print(f"Processing CSV files in folder: {data_folder_chr}")

    # Get a list of all CSV files in the specified data folder
    csv_files = list(data_folder_chr.glob('*.csv'))
    if len(csv_files) == 0:
        print("No files found, please input the correct work_folder")
    else:
        print(f"Total number of files found: {len(csv_files)}")

    traces_pass = pd.read_csv(Path(experimental_quality_assessment_folder, f"experimental_assessment_pass_{chrom}.txt"), sep="\t", names=["trace"])

    # Get a list of metadata keys to extract from filenames
    metadata_keys = ["nucleusID", "cellID", "locationID", "date", "homolog", "chromosome"]

    obj_list = []

    if len(csv_files) > 0:
        # Iterate over each file in the list of file names
        for count, filein in enumerate(csv_files, start=1):

            stem = Path(filein).stem

            if stem not in traces_pass["trace"].values:
                print(f"Skipping file {count}: {stem} because it did not pass the experimental quality assessment.")
                continue

            print(f"Processing file {count}: {stem}", end="\r")

            # We mine the metadata from the filename using regex patterns. If not possible, assign default values.
            metadata_CIMA = {
                key: (match.group(1) if (match := regex_patterns[key].search(stem)) else default)
                for key, default in zip(metadata_keys, [count, count, count, stem, "A", "chrN"])
            }

            # Adjust the metadata for nucleus and cell (if nucleus is count, but cell is something, use the cell match in nucleus)
            if "cell" in metadata_CIMA and metadata_CIMA["nucleusID"] == count:
                metadata_CIMA["nucleusID"] = metadata_CIMA["cell"]

            metadata_CIMA["nucleusID"] = int(metadata_CIMA["nucleusID"])
            metadata_CIMA["cellID"] = int(metadata_CIMA["cellID"])
            metadata_CIMA["locationID"] = int(metadata_CIMA["locationID"])

            # Set experimentID as date for consistency
            metadata_CIMA["experimentID"] = metadata_CIMA["date"]

            # Read the CSV file and create a StructuralObject instance
            objin = CSVParser.read_CSV_file(filein.as_posix(), metadata = metadata_CIMA, content_type = "srx")

            objin.atomList = objin.atomList[(objin.atomList['timepoint'] >= starting_step[chrom] - 1) &
                                            (objin.atomList['timepoint'] < (starting_step[chrom] + number_of_steps_library - 1))].copy()
            
            for tp in objin.atomList["timepoint"].unique():
                # Take out the timepoint from the atomList if it has less tha 10 points
                tp_mask = objin.atomList["timepoint"] == tp
                if tp_mask.sum() < 10:
                    print(f"Excluding timepoint {tp} in experimentID: {metadata_CIMA['experimentID']}, nucleusID: {metadata_CIMA['nucleusID']}, homolog: {metadata_CIMA['homolog']} because it has less than 10 points.")
                    objin.atomList = objin.atomList.loc[~tp_mask].copy()

            obj_list.append(objin)

        for obj in obj_list:
            for tp in obj.atomList["timepoint"].unique():
                # Take out the timepoint from the atomList if it has less tha 10 points
                tp_mask = obj.atomList["timepoint"] == tp
                temp = obj.atomList.loc[tp_mask].copy()

                x_spread = max(temp["xprec"].values) - min(temp["xprec"].values)
                y_spread = max(temp["yprec"].values) - min(temp["yprec"].values)
                z_spread = max(temp["zprec"].values) - min(temp["zprec"].values)
                anisotropy_factor = np.nanmean([x_spread, y_spread]) / z_spread

                collection_prec["chromosome"].append(chrom)
                collection_prec["file"].append(obj.metadata["experimentID"] + "_" + str(obj.metadata["nucleusID"]) + "_" + obj.metadata["homolog"])
                collection_prec["tp"].append(tp)
                collection_prec["spread_x"].append(x_spread)
                collection_prec["spread_y"].append(y_spread)
                collection_prec["spread_z"].append(z_spread)
                collection_prec["anisotropy_factor"].append(anisotropy_factor)

    print(f"\n{'-'*54}\n")

collection_prec = pd.DataFrame(collection_prec)

del chrom, data_folder_chr, csv_files, count, filein, traces_pass, metadata_keys, stem, metadata_CIMA, match, objin
del tp, tp_mask, obj_list, obj, temp, x_spread, y_spread, z_spread, anisotropy_factor
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr3
Total number of files found: 33
Skipping file 2: 2018-07-10_nuc04_A_chr3 because it did not pass the experimental quality assessment.
Skipping file 5: 2018-07-10_nuc03_A_chr3 because it did not pass the experimental quality assessment.
Excluding timepoint 19 in experimentID: 2018-09-04, nucleusID: 7, homolog: A because it has less than 10 points.
Skipping file 26: 2018-09-04_nuc01_A_chr3 because it did not pass the experimental quality assessment.
Processing file 33: 2018-09-04_nuc06_A_chr3
------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr5
Total number of files found: 35
Skipping file 2: 2018-07-10_nuc04_A_chr5 because it did not pass the experimental quality assessment.
Skipping file 34: 2018-09-04_nuc01_A_chr5 because it did not pass the experimental quality assessment.
Processing file 35: 2018-09-04_nuc04_A_chr5
------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr8
Total number of files found: 22
Processing file 22: 2018-09-04_nuc01_B_chr8
------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr16
Total number of files found: 10
Processing file 10: 2018-09-04_nuc04_A_chr16
------------------------------------------------------

Processing CSV files in folder: /scratch/CIMA_tutorial/data/chrX
Total number of files found: 5
Skipping file 2: 2018-09-04_nuc04_A_chrX because it did not pass the experimental quality assessment.
Processing file 5: 2018-09-04_nuc01_A_chrX
------------------------------------------------------
for chrom in starting_step.keys():

    temp = collection_prec[collection_prec["chromosome"] == chrom]
    print(f"For {chrom} the min anisotropy factor is {temp.anisotropy_factor.min()}")
    print(f"For {chrom} the max anisotropy factor is {temp.anisotropy_factor.max()}\n")

del chrom, temp
For chr3 the min anisotropy factor is 0.22194411423835986
For chr3 the max anisotropy factor is 0.8102558862289374

For chr5 the min anisotropy factor is 0.18651602294796202
For chr5 the max anisotropy factor is 0.7496463612103765

For chr8 the min anisotropy factor is 0.17096927338239262
For chr8 the max anisotropy factor is 0.5108808891327506

For chr16 the min anisotropy factor is 0.1550711582050499
For chr16 the max anisotropy factor is 0.5208693780499624

For chrX the min anisotropy factor is 0.17967870020756443
For chrX the max anisotropy factor is 0.4708960725633753
for chrom in starting_step.keys():

    if not save_plots and chrom != "chr3":
        continue
    
    print(f"Plotting anisotropy factor per trace for chromosome {chrom}...")

    collection_prec_chr = (
        collection_prec.loc[collection_prec["chromosome"] == chrom, ["file", "tp", "anisotropy_factor"]]
        .sort_values(["file", "tp"])
        .copy()
    )

    g = sns.relplot(
        data=collection_prec_chr,
        x="tp",
        y="anisotropy_factor",
        col="file",
        col_wrap=6,          # Number of plots per row, adjust as needed
        kind="line",
        marker="o",
        linewidth=1.2,
        height=2.5,
        aspect=1.2,
        facet_kws={"sharey": True, "sharex": True},
        # ylim = (0, 1)
    )

    g.set_axis_labels("Timepoint (tp)", "Anisotropy factor")
    g.set_titles("{col_name}")
    g.set(ylim=(0,1))

    for ax in g.axes.flat:
        x_min = int(collection_prec_chr["tp"].min())
        x_max = int(collection_prec_chr["tp"].max())
        ax.set_xticks(range(x_min, x_max + 1))
        ax.set_xticklabels(range(1, number_of_steps_library + 1), rotation=0)
        ax.axhline(0.25, color = "red", alpha = 0.25, zorder = 0)
        ax.axhline(0.3, color = "green", alpha = 0.25, zorder = 0)
        ax.axhline(0.5, color = "black", alpha = 0.25, zorder = 0)

    plt.tight_layout()

    if save_plots:
        plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
        plot_folder.mkdir(parents=True, exist_ok=True)
        filename = Path(plot_folder, f"anisotropy_factor_per_trace{chrom}.pdf")
        print(f"Saving plots to: {filename}")
        plt.savefig(filename, **plot_opts)
        del plot_folder, filename
    else:
        plt.show()
    plt.close()

    print(f"\n{'-'*54}\n")

del chrom, g, ax, collection_prec_chr, x_max, x_min
Plotting anisotropy factor per trace for chromosome chr3...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/anisotropy_factor_per_tracechr3.pdf

------------------------------------------------------

Plotting anisotropy factor per trace for chromosome chr5...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/anisotropy_factor_per_tracechr5.pdf

------------------------------------------------------

Plotting anisotropy factor per trace for chromosome chr8...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/anisotropy_factor_per_tracechr8.pdf

------------------------------------------------------

Plotting anisotropy factor per trace for chromosome chr16...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/anisotropy_factor_per_tracechr16.pdf

------------------------------------------------------

Plotting anisotropy factor per trace for chromosome chrX...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/anisotropy_factor_per_tracechrX.pdf

------------------------------------------------------
for chrom in collection_prec["chromosome"].unique():

    if not save_plots and chrom != "chr3":
        continue

    print(f"Plotting anisotropy factor distribution for chromosome {chrom}...")

    collection_prec_chr = collection_prec[collection_prec["chromosome"] == chrom]

    sns.boxenplot(data = collection_prec_chr, x = "tp", y = "anisotropy_factor")
    plt.ylim(min(collection_prec_chr["anisotropy_factor"].min() * 0.9, 0), collection_prec_chr["anisotropy_factor"].max() * 1.1)
    plt.xlabel("Timepoint")
    plt.ylabel("Anisotropy factor (mean(X,Y) spread / Z spread )")
    plt.axhline(0.25, color = "red")
    plt.axhline(0.20, color = "blue")
    plt.title(f"Chromosome {chrom} - Boxenplot")
    if save_plots:
        plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
        plot_folder.mkdir(parents=True, exist_ok=True)
        filename = Path(plot_folder, f"anisotropy_factor_boxenplot_{chrom}.pdf")
        print(f"Saving plots to: {filename}")
        plt.savefig(filename, **plot_opts)
        del plot_folder, filename
    else:
        plt.show()
    plt.close()

    sns.lineplot(data = collection_prec_chr, x = "tp", y = "anisotropy_factor")
    plt.ylim(collection_prec_chr["anisotropy_factor"].min() * 0.9, collection_prec_chr["anisotropy_factor"].max() * 1.1)
    plt.xlabel("Timepoint")
    plt.ylabel("Anisotropy factor (mean(X,Y) spread / Z spread )")
    plt.title(f"Chromosome {chrom} - Lineplot")
    if save_plots:
        plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
        plot_folder.mkdir(parents=True, exist_ok=True)
        filename = Path(plot_folder, f"anisotropy_factor_lineplot_{chrom}.pdf")
        print(f"Saving plots to: {filename}")
        plt.savefig(filename, **plot_opts)
        del plot_folder, filename
    else:
        plt.show()
    plt.close()

    print(f"\n{'-'*54}\n")

del chrom, collection_prec_chr
Plotting anisotropy factor distribution for chromosome chr3...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/anisotropy_factor_boxenplot_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/anisotropy_factor_lineplot_chr3.pdf

------------------------------------------------------

Plotting anisotropy factor distribution for chromosome chr5...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/anisotropy_factor_boxenplot_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/anisotropy_factor_lineplot_chr5.pdf

------------------------------------------------------

Plotting anisotropy factor distribution for chromosome chr8...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/anisotropy_factor_boxenplot_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/anisotropy_factor_lineplot_chr8.pdf

------------------------------------------------------

Plotting anisotropy factor distribution for chromosome chr16...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/anisotropy_factor_boxenplot_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/anisotropy_factor_lineplot_chr16.pdf

------------------------------------------------------

Plotting anisotropy factor distribution for chromosome chrX...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/anisotropy_factor_boxenplot_chrX.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/anisotropy_factor_lineplot_chrX.pdf

------------------------------------------------------