2 - Experimental quality assessment
Description: This notebook contains an example on how to check for outliers regarding numerosity.
Removing this outliers is recommended (is the default for the other notebooks).
In this notebooks we first check if the distribution of numerosities across an experiment (normally tied to the same chromosome / genomic region) is homogeneous or some of the traces stand out as outliers.
We do this in two steps.
Kruskal-Wallis test: the aim of this test is to identify if we have possible outliers. This tests only gives this information. To know which are the outliers we perform a second test. This test is an all-vs-all.
Kolmogorov-Smirnov test: the aim of this test is to identify if a particular trace is an outlier in respect to the background distribution of numerosities. This background distribution is build by adding up all the numerosities from all the experiments for that region. This test is a one-vs-all.
The following index has links to the different sections of the notebook. Some sections may have additional indexes to access the subparts.
If you are using VS Code to visualize the notebook, the links may not work. This is due to how VC Code render the notebook. To navigate the notebook use the outline panel. To know more about it, check this link.
Content:
Getting a file per chromosome with the traces that pass the assessment
Violin plots of the different distributions of the traces against the background
Next notebook to run is:
3D Reconstruction Assessment
Library imports and functions
import re
from pathlib import Path
from collections import defaultdict
import pandas as pd
import numpy as np
from cima.parsers.parser_csv import CSVParser
from scipy.stats import ks_2samp, kruskal
import matplotlib.pyplot as plt
import matplotlib.patheffects as pe
import seaborn as sns
import matplotlib as mpl
from matplotlib.collections import PolyCollection
## Regular expressions to get the metadata from the filename
regex_patterns = {
"nucleusID": re.compile(r"(?i)nuc(\d+)"), # This searches for "Nuc" or "nuc" followed by digits
"cellID": re.compile(r"(?i)cell(\d+)"), # This searches for "Cell" or "cell" followed by digits
"locationID" : re.compile(r"(?i)loc-?(\d+)"), # This searches for "Loc" or "loc" followed by optional hyphen and digits
"date" : re.compile(r"(\d{4}[-\.]\d{2}[-\.]\d{2})"), # This searches for dates in the format YYYY-MM-DD or YYYY.MM.DD
"homolog" : re.compile(r"_([ABPM01pm])"), ## Added just in case the homolog is in the filename.
"chromosome" : re.compile(r"(?i)_(chr[\d+MXY])") ## Added just in case the chromosome is in the filename.
}
# Set the options for saving the plots
plot_opts = {"dpi": 300, "bbox_inches": 'tight', "transparent": True}
def key_explicit(s):
date, rest = s.split("_cell")
cell_num, suffix = rest.split("_")
return (date, int(cell_num), suffix)
Setting some variables
In the following cell you should set up some variables for the script to run properly. Change the variables at your convenience, and then you can press “Run all” in the Jupyter Notebook.
This notebook assumes that your csv files are in SRX format.
Brief explanation of the different variables:
- data_file: this variable is where the csv files to be assessed are stored.
- work_folder: this variable is where the plots and the tsv files that the script creates will be stored.
- number_of_steps_library: the number of steps in your library.
- starting_step: the step at which analysis should start. Provide a dictionary using the chromosomes/folder names of your datasets and the starting step (usually located in the info file as the 'Round' column).
- kw_test_thresholds: the threshold to consider for the Kruskal-Wallis test.
- save_plots: save the plots into a folder or show them in the notebook.
The notebook will create a folder in the work_folder path called 2_experimental_quality_assessment where the plots/files created will be stored.
For the purposes of this notebook, save_plots is set to False to show the plots. By doing so only plots for chr3 will be shown. If you want to obtain the plots for the other chromosomes, run this notebook with save_plots set to True and check the folder.
data_folder = "/scratch/CIMA_tutorial/data/"
work_folder = "/scratch/CIMA_tutorial/"
number_of_steps_library = 16
starting_step = {"chr3" : 6, "chr5" : 5, "chr8" : 4,"chr16" : 2, "chrX" : 3}
kw_test_thresholds = 0.01
save_plots = True
experimental_quality_assessment_folder = Path(work_folder, "2_experimental_quality_assessment")
experimental_quality_assessment_folder.mkdir(parents=True, exist_ok=True)
nice_feature_names = {
"numerosity": "Numerosity",
"precX": "X-axis precision",
"precY": "Y-axis precision",
"precZ": "Z-axis precision",}
Exporting the data and running the tests
We tests four different things:
- Deviations in numerosity.
- Deviations in precision in the X axis.
- Deviations in precision in the Y axis.
- Deviations in precision in the Z axis.
Then we will do violin plots to show the data of each trace against the “background distribution”. We create this “background distribution” by aggregating the data of the different traces.
kw_test_data = defaultdict(list)
violin_plot_data = {}
ks_test_data = {}
for chrom in starting_step.keys():
violin_plot_chr = defaultdict(list)
data_folder_chr = Path(data_folder, chrom)
print(f"Processing CSV files in folder: {data_folder_chr}")
# Get a list of all CSV files in the specified data folder
csv_files = list(data_folder_chr.glob('*.csv'))
if len(csv_files) == 0:
print("No files found, please input the correct work_folder")
else:
print(f"Total number of files found: {len(csv_files)}")
# Get a list of metadata keys to extract from filenames
metadata_keys = ["nucleusID", "cellID", "locationID", "date", "homolog", "chromosome"]
# Initialize an empty list to store the StructuralObject instances
obj_list = []
# Initialize lists to store numerosity data for KS test and violin plot
features_total = defaultdict(list)
# Initialize a dictionary to store numerosities by file for the KS test
features_by_file = defaultdict(lambda: defaultdict(list))
if len(csv_files) > 0:
# Iterate over each file in the list of file names
for count, filein in enumerate(csv_files, start=1):
stem = Path(filein).stem
print(f"Processing file {count}: {stem}", end="\r")
# We mine the metadata from the filename using regex patterns. If not possible, assign default values.
metadata_CIMA = {
key: (match.group(1) if (match := regex_patterns[key].search(stem)) else default)
for key, default in zip(metadata_keys, [count, count, count, stem, "A", "chrN"])
}
# Adjust the metadata for nucleus and cell (if nucleus is count, but cell is something, use the cell match in nucleus)
if "cell" in metadata_CIMA and metadata_CIMA["nucleusID"] == count:
metadata_CIMA["nucleusID"] = metadata_CIMA["cell"]
metadata_CIMA["nucleusID"] = int(metadata_CIMA["nucleusID"])
metadata_CIMA["cellID"] = int(metadata_CIMA["cellID"])
metadata_CIMA["locationID"] = int(metadata_CIMA["locationID"])
# Set experimentID as date for consistency
metadata_CIMA["experimentID"] = metadata_CIMA["date"]
# Read the CSV file and create a StructuralObject instance
objin = CSVParser.read_CSV_file(filein.as_posix(), metadata = metadata_CIMA, content_type = "srx")
objin.atomList = objin.atomList[(objin.atomList['timepoint'] >= starting_step[chrom] - 1) &
(objin.atomList['timepoint'] < (starting_step[chrom] + number_of_steps_library - 1))].copy()
for tp in objin.atomList["timepoint"].unique():
# Take out the timepoint from the atomList if it has less tha 10 points
tp_mask = objin.atomList["timepoint"] == tp
if tp_mask.sum() < 10:
print(f"Excluding timepoint {tp} in experimentID: {metadata_CIMA['experimentID']}, nucleusID: {metadata_CIMA['nucleusID']}, homolog: {metadata_CIMA['homolog']} because it has less than 10 points.")
objin.atomList = objin.atomList.loc[~tp_mask].copy()
# We store the numerosities for each timepoint in the file for the KW/KS tests.
for tp in objin.atomList.timepoint.unique():
temp = objin.get_Time_segment(timepoint=tp)
features_total["numerosity"].append(temp.atomList.shape[0])
features_total["precX"].append(np.nanmedian(temp.atomList.xprec))
features_total["precY"].append(np.nanmedian(temp.atomList.yprec))
features_total["precZ"].append(np.nanmedian(temp.atomList.zprec))
features_by_file[stem]["numerosity"].append(temp.atomList.shape[0])
features_by_file[stem]["precX"].append(np.nanmedian(temp.atomList.xprec))
features_by_file[stem]["precY"].append(np.nanmedian(temp.atomList.yprec))
features_by_file[stem]["precZ"].append(np.nanmedian(temp.atomList.zprec))
# Append the object to the list
obj_list.append(objin)
# Nice print to tell the user the number of objects created.
print(f"\nTotal number of objects created: {len(obj_list)}")
# Clean up memory by deleting some of the variables no longer needed
del count, filein, stem, metadata_CIMA, match
if 'default' in locals():
del default
if 'key' in locals():
del key
del csv_files, metadata_keys
ks_test_summary = defaultdict(list)
for objin in obj_list:
repID = f"{objin.metadata['experimentID']}_cell{objin.metadata['nucleusID']}_{objin.metadata['homolog']}"
file_features = features_by_file[Path(objin.metadata["filename"]).stem]
ks_test_summary["filename"].append(Path(objin.metadata["filename"]).stem)
for feature in ["numerosity", "precX", "precY", "precZ"]:
ks_test = ks_2samp(file_features[feature], features_total[feature])
ks_test_summary[f"statistic_{feature}"].append(ks_test.statistic)
ks_test_summary[f"pvalue_{feature}"].append(ks_test.pvalue)
ks_test_summary["n_total"].append(len(features_total["numerosity"]))
violin_plot_chr["filename"].extend([Path(objin.metadata["filename"]).stem] * len(file_features["precX"]))
violin_plot_chr["expID"].extend([objin.metadata["experimentID"]] * len(file_features["precX"]))
violin_plot_chr["repID"].extend([repID] * len(file_features["precX"]))
violin_plot_chr["numerosity"].extend(file_features["numerosity"])
violin_plot_chr["precX"].extend(file_features["precX"])
violin_plot_chr["precY"].extend(file_features["precY"])
violin_plot_chr["precZ"].extend(file_features["precZ"])
violin_plot_data[chrom] = pd.DataFrame(violin_plot_chr)
ks_test_data[chrom] = pd.DataFrame(ks_test_summary)
kw_test_data["chromosome"].append(chrom)
for feature in ["numerosity", "precX", "precY", "precZ"]:
kw_test = kruskal(*[features_by_file[fn][feature] for fn in features_by_file])
kw_test_data[f"statistic_{feature}"].append(kw_test.statistic)
kw_test_data[f"pvalue_{feature}"].append(kw_test.pvalue)
del feature, features_by_file, file_features, ks_test, ks_test_summary, kw_test, repID, temp, tp, violin_plot_chr
del objin, features_total
print(f"\n{'-'*54}\n")
kw_test_data = pd.DataFrame(kw_test_data)
del chrom, data_folder_chr, obj_list
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr3
Total number of files found: 33
Excluding timepoint 19 in experimentID: 2018-09-04, nucleusID: 7, homolog: A because it has less than 10 points.
Processing file 33: 2018-09-04_nuc06_A_chr3
Total number of objects created: 33
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr5
Total number of files found: 35
Processing file 35: 2018-09-04_nuc04_A_chr5
Total number of objects created: 35
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr8
Total number of files found: 22
Processing file 22: 2018-09-04_nuc01_B_chr8
Total number of objects created: 22
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr16
Total number of files found: 10
Processing file 10: 2018-09-04_nuc04_A_chr16
Total number of objects created: 10
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chrX
Total number of files found: 5
Processing file 5: 2018-09-04_nuc01_A_chrX
Total number of objects created: 5
------------------------------------------------------
Violin plots of the different distributions of the traces against the background
Here we create the mentioned violin plots.
The trace is on the right, colored by date. The background is on the right, colored in grey.
color_palette = {'2018-01-11':"#a6cee3" ,'2018-06-28':'#1f78b4', '2018-07-10':'#b2df8a', '2018-09-04':'#33a02c'}
default_trace_color = "skyblue"
bg_color = "lightgray"
for chrom in violin_plot_data.keys():
if not save_plots and chrom != "chr3":
continue
for feature in ["numerosity", "precX", "precY", "precZ"]:
print(f"Creating violin plot for chromosome {chrom} and feature: {nice_feature_names[feature]}")
violin_plot_chr = violin_plot_data[chrom]
ks_test_chr = ks_test_data[chrom]
numerosity_total_chr = violin_plot_chr[feature].to_numpy()
groups = sorted(violin_plot_chr["repID"].unique(), key=key_explicit)
rows = []
for gr in groups:
cond_vals = violin_plot_chr.loc[violin_plot_chr["repID"] == gr, feature].to_numpy()
rows.append(pd.DataFrame({
"Group": f"{gr}",
"Side": "Trace",
"Value": cond_vals
}))
rows.append(pd.DataFrame({
"Group": f"{gr}",
"Side": "Background",
"Value": numerosity_total_chr
}))
plot_df = pd.concat(rows, ignore_index=True)
plot_df["Pair"] = "Pair"
# For vertical set x to Side and y to Value
# For horizontal set x to Value and y to Side
g = sns.catplot(
data = plot_df,
x = "Side", # Side will put a space in the middle, Pair will take it out
y = "Value",
hue = "Side", # Side will determine the color (Trace vs Background)
col = "Group", # Group will determine the facetting (one facet per file)
col_wrap = 7, # N violins per row
kind = "violin", # Type of estimation to use
split = True, # Split violins to show Trace vs Background in the same violin
inner = "quart", # Show the quartiles inside the violins
linewidth = 1, height = 2.4, aspect = 0.9,
sharey = True, # Share the y-axis across facets to make them comparable
hue_order = ["Trace", "Background"],
palette = {"Trace": default_trace_color, "Background": bg_color},
)
g.set_titles("")
g.set_ylabels(nice_feature_names[feature])
g.set_xlabels("")
col2check = "Group" if "Group" in violin_plot_chr.columns else "repID"
# Draw a rectangle around the ones that failed the KS test at p-value < 0.05
for ax, gr in zip(g.axes.flat, groups):
temp = gr.split("_")
nice_title = f"Cell {temp[1].replace('cell', '')} allele {temp[2]}"
ax.set_title(nice_title, fontsize = 10)
mask = violin_plot_chr[col2check] == gr
filename_temp = violin_plot_chr.loc[mask, "filename"].iat[0]
exp_val = violin_plot_chr.loc[mask, "expID"].iat[0]
trace_color = color_palette.get(exp_val, default_trace_color)
# Find PolyCollection objects (violin halves). seaborn draws one poly per half.
polys = [c for c in ax.collections if isinstance(c, PolyCollection)]
# Apply colors to first two polys if present (Trace, Background)
for poly, color in zip(polys[:2], (trace_color, bg_color)):
try:
poly.set_facecolor(mpl.colors.to_rgba(color))
except Exception:
pass
# Get per-file n_total (if present) and derive a sensible p-threshold
n_total_series = ks_test_chr.loc[ks_test_chr["filename"] == filename_temp, "n_total"]
if not n_total_series.empty and int(n_total_series.iat[0]) > 0:
p_threshold = 0.05 / int(n_total_series.iat[0])
else:
p_threshold = 0.05
failed = ks_test_chr.loc[
(ks_test_chr["filename"] == filename_temp) &
(ks_test_chr[f"pvalue_{feature}"] < p_threshold)
]
if not failed.empty:
ax.text(
0.97, 0.97,
"*",
transform = ax.transAxes,
ha = "right", va = "top",
fontsize = 18, fontweight = "bold", color = "red",
path_effects = [pe.withStroke(linewidth = 3, foreground = "white")],
)
# Remove duplicate legends from facets; keep one
if g._legend is not None:
g._legend.set_title("")
g._legend.set_bbox_to_anchor((1.02, 0.5))
g._legend._loc = 6
plt.tight_layout()
if save_plots:
plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
plot_folder.mkdir(parents=True, exist_ok=True)
filename = Path(plot_folder, f"violin_plot_{feature}_{chrom}.pdf")
print(f"Saving plots to: {filename}")
plt.savefig(filename, **plot_opts)
del plot_folder, filename
else:
plt.show()
plt.close()
print(f"\n{'-'*54}\n")
del chrom, violin_plot_chr, ks_test_chr, groups, rows, gr, cond_vals, plot_df, g, col2check, feature
del ax, temp, nice_title, mask, filename_temp, exp_val, trace_color, polys, poly, color, n_total_series, p_threshold, failed
del bg_color, default_trace_color, color_palette, numerosity_total_chr
Creating violin plot for chromosome chr3 and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/violin_plot_numerosity_chr3.pdf
Creating violin plot for chromosome chr3 and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/violin_plot_precX_chr3.pdf
Creating violin plot for chromosome chr3 and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/violin_plot_precY_chr3.pdf
Creating violin plot for chromosome chr3 and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/violin_plot_precZ_chr3.pdf
------------------------------------------------------
Creating violin plot for chromosome chr5 and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/violin_plot_numerosity_chr5.pdf
Creating violin plot for chromosome chr5 and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/violin_plot_precX_chr5.pdf
Creating violin plot for chromosome chr5 and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/violin_plot_precY_chr5.pdf
Creating violin plot for chromosome chr5 and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/violin_plot_precZ_chr5.pdf
------------------------------------------------------
Creating violin plot for chromosome chr8 and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/violin_plot_numerosity_chr8.pdf
Creating violin plot for chromosome chr8 and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/violin_plot_precX_chr8.pdf
Creating violin plot for chromosome chr8 and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/violin_plot_precY_chr8.pdf
Creating violin plot for chromosome chr8 and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/violin_plot_precZ_chr8.pdf
------------------------------------------------------
Creating violin plot for chromosome chr16 and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/violin_plot_numerosity_chr16.pdf
Creating violin plot for chromosome chr16 and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/violin_plot_precX_chr16.pdf
Creating violin plot for chromosome chr16 and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/violin_plot_precY_chr16.pdf
Creating violin plot for chromosome chr16 and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/violin_plot_precZ_chr16.pdf
------------------------------------------------------
Creating violin plot for chromosome chrX and feature: Numerosity
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/violin_plot_numerosity_chrX.pdf
Creating violin plot for chromosome chrX and feature: X-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/violin_plot_precX_chrX.pdf
Creating violin plot for chromosome chrX and feature: Y-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/violin_plot_precY_chrX.pdf
Creating violin plot for chromosome chrX and feature: Z-axis precision
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/violin_plot_precZ_chrX.pdf
------------------------------------------------------
Plotting X/Y/Z axis precision against numerosity
We can also plot the X/Y/Z axis mean precision of a given chromatin segment against its numerosity.
color_palette = {'2018-01-11':"#a6cee3" ,'2018-06-28':'#1f78b4', '2018-07-10':'#b2df8a', '2018-09-04':'#33a02c'}
for chrom in violin_plot_data.keys():
if not save_plots and chrom != "chr3":
continue
violin_plot_chr = violin_plot_data[chrom]
sns.scatterplot(data=violin_plot_chr, x="precX", y="numerosity", hue="expID", palette=color_palette, legend=False)
plt.xlabel("X-axis precision")
plt.ylabel("Numerosity")
plt.tight_layout()
if save_plots:
plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
plot_folder.mkdir(parents=True, exist_ok=True)
filename = Path(plot_folder, f"precX_vs_numerosity_{chrom}.pdf")
print(f"Saving plots to: {filename}")
plt.savefig(filename, **plot_opts)
del plot_folder, filename
else:
plt.show()
plt.close()
sns.scatterplot(data=violin_plot_chr, x="precY", y="numerosity", hue="expID", palette=color_palette, legend=False)
plt.xlabel("Y-axis precision")
plt.ylabel("Numerosity")
plt.tight_layout()
if save_plots:
plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
plot_folder.mkdir(parents=True, exist_ok=True)
filename = Path(plot_folder, f"precY_vs_numerosity_{chrom}.pdf")
print(f"Saving plots to: {filename}")
plt.savefig(filename, **plot_opts)
del plot_folder, filename
else:
plt.show()
plt.close()
sns.scatterplot(data=violin_plot_chr, x="precZ", y="numerosity", hue="expID", palette=color_palette, legend=False)
plt.xlabel("Z-axis precision")
plt.ylabel("Numerosity")
plt.tight_layout()
if save_plots:
plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
plot_folder.mkdir(parents=True, exist_ok=True)
filename = Path(plot_folder, f"precZ_vs_numerosity_{chrom}.pdf")
print(f"Saving plots to: {filename}")
plt.savefig(filename, **plot_opts)
del plot_folder, filename
else:
plt.show()
plt.close()
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/precX_vs_numerosity_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/precY_vs_numerosity_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/precZ_vs_numerosity_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/precX_vs_numerosity_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/precY_vs_numerosity_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/precZ_vs_numerosity_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/precX_vs_numerosity_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/precY_vs_numerosity_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/precZ_vs_numerosity_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/precX_vs_numerosity_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/precY_vs_numerosity_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/precZ_vs_numerosity_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/precX_vs_numerosity_chrX.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/precY_vs_numerosity_chrX.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/precZ_vs_numerosity_chrX.pdf
Getting a file per chromosome with the traces that pass the assessment
We create a file per each chromosome with the traces that passed the experimental assessment.
This way the user has a reference file to filter the traces in the next steps, without needing to move the files around.
for chrom in ks_test_data.keys():
ks_test_chr = ks_test_data[chrom]
kw_test_chr = kw_test_data[kw_test_data["chromosome"] == chrom]
Path(experimental_quality_assessment_folder, "tsv").mkdir(parents=True, exist_ok=True)
ks_test_chr.to_csv(Path(experimental_quality_assessment_folder, "tsv", f"ks_test_summary_{chrom}.tsv"), index=False, sep="\t")
kw_test_chr.to_csv(Path(experimental_quality_assessment_folder, "tsv", f"kw_test_summary_{chrom}.tsv"), index=False, sep="\t")
for feature in ["numerosity", "precX", "precY", "precZ"]:
print(f"Kruskal-Wallis test for chromosome {chrom} and feature {nice_feature_names[feature]}: p-value={kw_test_chr[f'pvalue_{feature}'].iat[0]:.4e}")
trace_pass = []
for _, row in ks_test_chr.iterrows():
# display(row)
p_value_threshold = 0.05 / row["n_total"] if row["n_total"] > 0 else 0.05
if (row["pvalue_numerosity"] > p_value_threshold) & \
(row["pvalue_precX"] > p_value_threshold) & \
(row["pvalue_precY"] > p_value_threshold) & \
(row["pvalue_precZ"] > p_value_threshold):
trace_pass.append(row["filename"])
# Write the contents of taces_pass to a text file
with open(Path(experimental_quality_assessment_folder, f"experimental_assessment_pass_{chrom}.txt"), "w") as f:
for item in trace_pass:
f.write(f"{item}\n")
print(f"Traces that \033[1;4mpassed\033[0m the KS test for chromosome {chrom}: {len(trace_pass)} out of {ks_test_chr.shape[0]} total traces.")
print(f"\n{'-'*54}\n")
del chrom, ks_test_chr, kw_test_chr, trace_pass, row, p_value_threshold, f, item, feature
Kruskal-Wallis test for chromosome chr3 and feature Numerosity: p-value=2.3501e-14
Kruskal-Wallis test for chromosome chr3 and feature X-axis precision: p-value=5.0175e-08
Kruskal-Wallis test for chromosome chr3 and feature Y-axis precision: p-value=6.3053e-15
Kruskal-Wallis test for chromosome chr3 and feature Z-axis precision: p-value=8.1420e-09
Traces that passed the KS test for chromosome chr3: 30 out of 33 total traces.
------------------------------------------------------
Kruskal-Wallis test for chromosome chr5 and feature Numerosity: p-value=2.7458e-15
Kruskal-Wallis test for chromosome chr5 and feature X-axis precision: p-value=4.7086e-09
Kruskal-Wallis test for chromosome chr5 and feature Y-axis precision: p-value=3.7215e-13
Kruskal-Wallis test for chromosome chr5 and feature Z-axis precision: p-value=8.6281e-10
Traces that passed the KS test for chromosome chr5: 33 out of 35 total traces.
------------------------------------------------------
Kruskal-Wallis test for chromosome chr8 and feature Numerosity: p-value=2.4712e-05
Kruskal-Wallis test for chromosome chr8 and feature X-axis precision: p-value=3.0020e-04
Kruskal-Wallis test for chromosome chr8 and feature Y-axis precision: p-value=3.5394e-06
Kruskal-Wallis test for chromosome chr8 and feature Z-axis precision: p-value=2.4363e-04
Traces that passed the KS test for chromosome chr8: 22 out of 22 total traces.
------------------------------------------------------
Kruskal-Wallis test for chromosome chr16 and feature Numerosity: p-value=3.9739e-01
Kruskal-Wallis test for chromosome chr16 and feature X-axis precision: p-value=7.2336e-01
Kruskal-Wallis test for chromosome chr16 and feature Y-axis precision: p-value=5.9635e-01
Kruskal-Wallis test for chromosome chr16 and feature Z-axis precision: p-value=3.1732e-01
Traces that passed the KS test for chromosome chr16: 10 out of 10 total traces.
------------------------------------------------------
Kruskal-Wallis test for chromosome chrX and feature Numerosity: p-value=1.9648e-07
Kruskal-Wallis test for chromosome chrX and feature X-axis precision: p-value=4.9749e-01
Kruskal-Wallis test for chromosome chrX and feature Y-axis precision: p-value=2.3816e-01
Kruskal-Wallis test for chromosome chrX and feature Z-axis precision: p-value=3.1606e-01
Traces that passed the KS test for chromosome chrX: 4 out of 5 total traces.
------------------------------------------------------
Plotting efficiency matrices
Here we plot what we call efficiency matrices.
They show the different traces and (one per row) and the different chromatin segments (one per column).
If a chromatin segment is present in a trace, it is shown as a blue square; if absent is shown as a white square.
color_palette = {'2018-01-11':"#a6cee3" ,'2018-06-28':'#1f78b4', '2018-07-10':'#b2df8a', '2018-09-04':'#33a02c'}
print("Creating efficiency matrices and plots for each chromosome...\n")
for chrom in starting_step.keys():
if save_plots == False and chrom != "chr3":
continue
data_folder_chr = Path(data_folder, chrom)
violin_plot_chr = violin_plot_data[chrom]
print(f"Processing CSV files in folder: {data_folder_chr}")
# Get a list of all CSV files in the specified data folder
csv_files = list(data_folder_chr.glob('*.csv'))
if len(csv_files) == 0:
print("No files found, please input the correct work_folder")
else:
print(f"Total number of files found: {len(csv_files)}")
traces_pass = pd.read_csv(Path(experimental_quality_assessment_folder, f"experimental_assessment_pass_{chrom}.txt"), sep="\t", names=["trace"])
# Get a list of metadata keys to extract from filenames
metadata_keys = ["nucleusID", "cellID", "locationID", "date", "homolog", "chromosome"]
file_names = []
if len(csv_files) > 0:
efficiency_matrix_1 = np.zeros((len(csv_files), number_of_steps_library))
efficiency_matrix_2 = np.zeros((len(csv_files), number_of_steps_library))
# Iterate over each file in the list of file names
for count, filein in enumerate(csv_files, start=1):
stem = Path(filein).stem
file_names.append(violin_plot_chr[violin_plot_chr["filename"] == stem]["repID"].unique()[0])
print(f"Processing file {count}: {stem}", end="\r")
# We mine the metadata from the filename using regex patterns. If not possible, assign default values.
metadata_CIMA = {
key: (match.group(1) if (match := regex_patterns[key].search(stem)) else default)
for key, default in zip(metadata_keys, [count, count, count, stem, "A", "chrN"])
}
# Adjust the metadata for nucleus and cell (if nucleus is count, but cell is something, use the cell match in nucleus)
if "cell" in metadata_CIMA and metadata_CIMA["nucleusID"] == count:
metadata_CIMA["nucleusID"] = metadata_CIMA["cell"]
metadata_CIMA["nucleusID"] = int(metadata_CIMA["nucleusID"])
metadata_CIMA["cellID"] = int(metadata_CIMA["cellID"])
metadata_CIMA["locationID"] = int(metadata_CIMA["locationID"])
# Set experimentID as date for consistency
metadata_CIMA["experimentID"] = metadata_CIMA["date"]
# Read the CSV file and create a StructuralObject instance
objin = CSVParser.read_CSV_file(filein.as_posix(), metadata = metadata_CIMA, content_type = "srx")
objin.atomList = objin.atomList[(objin.atomList['timepoint'] >= starting_step[chrom] - 1) &
(objin.atomList['timepoint'] < (starting_step[chrom] + number_of_steps_library - 1))].copy()
for tp in objin.atomList["timepoint"].unique():
# Take out the timepoint from the atomList if it has less tha 10 points
tp_mask = objin.atomList["timepoint"] == tp
if tp_mask.sum() < 10:
print(f"Excluding timepoint {tp} in experimentID: {metadata_CIMA['experimentID']}, nucleusID: {metadata_CIMA['nucleusID']}, homolog: {metadata_CIMA['homolog']} because it has less than 10 points.")
objin.atomList = objin.atomList.loc[~tp_mask].copy()
# We store the numerosities for each timepoint in the file for the KW/KS tests.
for tp in objin.atomList.timepoint.unique():
efficiency_matrix_1[count-1, tp-starting_step[chrom]] = 1
if stem in traces_pass["trace"].values:
efficiency_matrix_2[count-1, tp-starting_step[chrom]] = 1
efficiency_matrix = pd.DataFrame(efficiency_matrix_1, columns = [f"{i+1}" for i in range(number_of_steps_library)])
efficiency_matrix.index = file_names
rownames = [temp.split("_")[1] for temp in efficiency_matrix.index]
rownames = [f"Cell {temp.split('_')[1].replace('cell', '')} allelle {temp.split('_')[2]}" for temp in efficiency_matrix.index]
fig_height = max(2, 0.3 * len(efficiency_matrix.index))
g = sns.clustermap(data = efficiency_matrix, figsize=(5, fig_height),
vmin=0, vmax=1, cmap=['#f7f7f7', '#00008B'],
row_colors=[color_palette.get(d.split("_")[0], "#a6cee3") for d in efficiency_matrix.index],
col_cluster=False, row_cluster=False,
yticklabels=rownames,
xticklabels=[f"{i+1}" for i in range(efficiency_matrix.shape[1])],
)
g.ax_heatmap.set_xlabel('Step ID')
g.ax_heatmap.set_xticklabels(g.ax_heatmap.get_xticklabels(), rotation=0, fontsize=8)
g.ax_heatmap.set_yticklabels(g.ax_heatmap.get_yticklabels(), rotation=0, fontsize=8)
g.ax_row_dendrogram.set_visible(False)
g.ax_col_dendrogram.set_visible(False)
g.ax_cbar.set_visible(False)
if save_plots:
plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
plot_folder.mkdir(parents=True, exist_ok=True)
filename = Path(plot_folder, f"efficiency_matrix_pre_experimental_assessment_{chrom}.pdf")
print(f"Saving plots to: {filename}")
plt.savefig(filename, **plot_opts)
del plot_folder, filename
else:
plt.show()
plt.close()
efficiency_matrix = pd.DataFrame(efficiency_matrix_2, columns = [f"Step{i+1}" for i in range(number_of_steps_library)])
efficiency_matrix.index = file_names
g = sns.clustermap(data = efficiency_matrix, figsize=(5, fig_height),
vmin=0, vmax=1, cmap=['#f7f7f7', '#00008B'],
row_colors=[color_palette.get(d.split("_")[0], "#a6cee3") for d in efficiency_matrix.index],
col_cluster=False, row_cluster=False,
yticklabels=rownames,
xticklabels=[f"{i+1}" for i in range(efficiency_matrix.shape[1])],
)
g.ax_heatmap.set_xlabel('Step ID')
g.ax_heatmap.set_xticklabels(g.ax_heatmap.get_xticklabels(), rotation=0, fontsize=8)
g.ax_heatmap.set_yticklabels(g.ax_heatmap.get_yticklabels(), rotation=0, fontsize=8)
g.ax_row_dendrogram.set_visible(False)
g.ax_col_dendrogram.set_visible(False)
g.ax_cbar.set_visible(False)
if save_plots:
plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
plot_folder.mkdir(parents=True, exist_ok=True)
filename = Path(plot_folder, f"efficiency_matrix_post_experimental_assessment_{chrom}.pdf")
print(f"Saving plots to: {filename}")
plt.savefig(filename, **plot_opts)
del plot_folder, filename
else:
plt.show()
plt.close()
print(f"\n{'-'*54}\n")
del chrom, data_folder_chr, csv_files, traces_pass, metadata_keys, efficiency_matrix_1, efficiency_matrix_2
del count, filein, stem, metadata_CIMA, match, objin, tp, efficiency_matrix, g, fig_height, file_names, rownames
del violin_plot_chr
Creating efficiency matrices and plots for each chromosome...
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr3
Total number of files found: 33
Excluding timepoint 19 in experimentID: 2018-09-04, nucleusID: 7, homolog: A because it has less than 10 points.
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/efficiency_matrix_pre_experimental_assessment_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/efficiency_matrix_post_experimental_assessment_chr3.pdf
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr5
Total number of files found: 35
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/efficiency_matrix_pre_experimental_assessment_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/efficiency_matrix_post_experimental_assessment_chr5.pdf
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr8
Total number of files found: 22
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/efficiency_matrix_pre_experimental_assessment_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/efficiency_matrix_post_experimental_assessment_chr8.pdf
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr16
Total number of files found: 10
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/efficiency_matrix_pre_experimental_assessment_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/efficiency_matrix_post_experimental_assessment_chr16.pdf
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chrX
Total number of files found: 5
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/efficiency_matrix_pre_experimental_assessment_chrX.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/efficiency_matrix_post_experimental_assessment_chrX.pdf
------------------------------------------------------
Anisotropy factor
Here we calculate and plot the anisotropy factor.
The anisotrpy factor is a measure on how different is the error of the localization in the different axis.
It is calculated following this formula:
In the plots, lines are shown for 0.2, 0.3 and 0.5 to give the user a visual reference.
collection_prec = defaultdict(list)
for chrom in starting_step.keys():
data_folder_chr = Path(data_folder, chrom)
print(f"Processing CSV files in folder: {data_folder_chr}")
# Get a list of all CSV files in the specified data folder
csv_files = list(data_folder_chr.glob('*.csv'))
if len(csv_files) == 0:
print("No files found, please input the correct work_folder")
else:
print(f"Total number of files found: {len(csv_files)}")
traces_pass = pd.read_csv(Path(experimental_quality_assessment_folder, f"experimental_assessment_pass_{chrom}.txt"), sep="\t", names=["trace"])
# Get a list of metadata keys to extract from filenames
metadata_keys = ["nucleusID", "cellID", "locationID", "date", "homolog", "chromosome"]
obj_list = []
if len(csv_files) > 0:
# Iterate over each file in the list of file names
for count, filein in enumerate(csv_files, start=1):
stem = Path(filein).stem
if stem not in traces_pass["trace"].values:
print(f"Skipping file {count}: {stem} because it did not pass the experimental quality assessment.")
continue
print(f"Processing file {count}: {stem}", end="\r")
# We mine the metadata from the filename using regex patterns. If not possible, assign default values.
metadata_CIMA = {
key: (match.group(1) if (match := regex_patterns[key].search(stem)) else default)
for key, default in zip(metadata_keys, [count, count, count, stem, "A", "chrN"])
}
# Adjust the metadata for nucleus and cell (if nucleus is count, but cell is something, use the cell match in nucleus)
if "cell" in metadata_CIMA and metadata_CIMA["nucleusID"] == count:
metadata_CIMA["nucleusID"] = metadata_CIMA["cell"]
metadata_CIMA["nucleusID"] = int(metadata_CIMA["nucleusID"])
metadata_CIMA["cellID"] = int(metadata_CIMA["cellID"])
metadata_CIMA["locationID"] = int(metadata_CIMA["locationID"])
# Set experimentID as date for consistency
metadata_CIMA["experimentID"] = metadata_CIMA["date"]
# Read the CSV file and create a StructuralObject instance
objin = CSVParser.read_CSV_file(filein.as_posix(), metadata = metadata_CIMA, content_type = "srx")
objin.atomList = objin.atomList[(objin.atomList['timepoint'] >= starting_step[chrom] - 1) &
(objin.atomList['timepoint'] < (starting_step[chrom] + number_of_steps_library - 1))].copy()
for tp in objin.atomList["timepoint"].unique():
# Take out the timepoint from the atomList if it has less tha 10 points
tp_mask = objin.atomList["timepoint"] == tp
if tp_mask.sum() < 10:
print(f"Excluding timepoint {tp} in experimentID: {metadata_CIMA['experimentID']}, nucleusID: {metadata_CIMA['nucleusID']}, homolog: {metadata_CIMA['homolog']} because it has less than 10 points.")
objin.atomList = objin.atomList.loc[~tp_mask].copy()
obj_list.append(objin)
for obj in obj_list:
for tp in obj.atomList["timepoint"].unique():
# Take out the timepoint from the atomList if it has less tha 10 points
tp_mask = obj.atomList["timepoint"] == tp
temp = obj.atomList.loc[tp_mask].copy()
x_spread = max(temp["xprec"].values) - min(temp["xprec"].values)
y_spread = max(temp["yprec"].values) - min(temp["yprec"].values)
z_spread = max(temp["zprec"].values) - min(temp["zprec"].values)
anisotropy_factor = np.nanmean([x_spread, y_spread]) / z_spread
collection_prec["chromosome"].append(chrom)
collection_prec["file"].append(obj.metadata["experimentID"] + "_" + str(obj.metadata["nucleusID"]) + "_" + obj.metadata["homolog"])
collection_prec["tp"].append(tp)
collection_prec["spread_x"].append(x_spread)
collection_prec["spread_y"].append(y_spread)
collection_prec["spread_z"].append(z_spread)
collection_prec["anisotropy_factor"].append(anisotropy_factor)
print(f"\n{'-'*54}\n")
collection_prec = pd.DataFrame(collection_prec)
del chrom, data_folder_chr, csv_files, count, filein, traces_pass, metadata_keys, stem, metadata_CIMA, match, objin
del tp, tp_mask, obj_list, obj, temp, x_spread, y_spread, z_spread, anisotropy_factor
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr3
Total number of files found: 33
Skipping file 2: 2018-07-10_nuc04_A_chr3 because it did not pass the experimental quality assessment.
Skipping file 5: 2018-07-10_nuc03_A_chr3 because it did not pass the experimental quality assessment.
Excluding timepoint 19 in experimentID: 2018-09-04, nucleusID: 7, homolog: A because it has less than 10 points.
Skipping file 26: 2018-09-04_nuc01_A_chr3 because it did not pass the experimental quality assessment.
Processing file 33: 2018-09-04_nuc06_A_chr3
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr5
Total number of files found: 35
Skipping file 2: 2018-07-10_nuc04_A_chr5 because it did not pass the experimental quality assessment.
Skipping file 34: 2018-09-04_nuc01_A_chr5 because it did not pass the experimental quality assessment.
Processing file 35: 2018-09-04_nuc04_A_chr5
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr8
Total number of files found: 22
Processing file 22: 2018-09-04_nuc01_B_chr8
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chr16
Total number of files found: 10
Processing file 10: 2018-09-04_nuc04_A_chr16
------------------------------------------------------
Processing CSV files in folder: /scratch/CIMA_tutorial/data/chrX
Total number of files found: 5
Skipping file 2: 2018-09-04_nuc04_A_chrX because it did not pass the experimental quality assessment.
Processing file 5: 2018-09-04_nuc01_A_chrX
------------------------------------------------------
for chrom in starting_step.keys():
temp = collection_prec[collection_prec["chromosome"] == chrom]
print(f"For {chrom} the min anisotropy factor is {temp.anisotropy_factor.min()}")
print(f"For {chrom} the max anisotropy factor is {temp.anisotropy_factor.max()}\n")
del chrom, temp
For chr3 the min anisotropy factor is 0.22194411423835986
For chr3 the max anisotropy factor is 0.8102558862289374
For chr5 the min anisotropy factor is 0.18651602294796202
For chr5 the max anisotropy factor is 0.7496463612103765
For chr8 the min anisotropy factor is 0.17096927338239262
For chr8 the max anisotropy factor is 0.5108808891327506
For chr16 the min anisotropy factor is 0.1550711582050499
For chr16 the max anisotropy factor is 0.5208693780499624
For chrX the min anisotropy factor is 0.17967870020756443
For chrX the max anisotropy factor is 0.4708960725633753
for chrom in starting_step.keys():
if not save_plots and chrom != "chr3":
continue
print(f"Plotting anisotropy factor per trace for chromosome {chrom}...")
collection_prec_chr = (
collection_prec.loc[collection_prec["chromosome"] == chrom, ["file", "tp", "anisotropy_factor"]]
.sort_values(["file", "tp"])
.copy()
)
g = sns.relplot(
data=collection_prec_chr,
x="tp",
y="anisotropy_factor",
col="file",
col_wrap=6, # Number of plots per row, adjust as needed
kind="line",
marker="o",
linewidth=1.2,
height=2.5,
aspect=1.2,
facet_kws={"sharey": True, "sharex": True},
# ylim = (0, 1)
)
g.set_axis_labels("Timepoint (tp)", "Anisotropy factor")
g.set_titles("{col_name}")
g.set(ylim=(0,1))
for ax in g.axes.flat:
x_min = int(collection_prec_chr["tp"].min())
x_max = int(collection_prec_chr["tp"].max())
ax.set_xticks(range(x_min, x_max + 1))
ax.set_xticklabels(range(1, number_of_steps_library + 1), rotation=0)
ax.axhline(0.25, color = "red", alpha = 0.25, zorder = 0)
ax.axhline(0.3, color = "green", alpha = 0.25, zorder = 0)
ax.axhline(0.5, color = "black", alpha = 0.25, zorder = 0)
plt.tight_layout()
if save_plots:
plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
plot_folder.mkdir(parents=True, exist_ok=True)
filename = Path(plot_folder, f"anisotropy_factor_per_trace{chrom}.pdf")
print(f"Saving plots to: {filename}")
plt.savefig(filename, **plot_opts)
del plot_folder, filename
else:
plt.show()
plt.close()
print(f"\n{'-'*54}\n")
del chrom, g, ax, collection_prec_chr, x_max, x_min
Plotting anisotropy factor per trace for chromosome chr3...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/anisotropy_factor_per_tracechr3.pdf
------------------------------------------------------
Plotting anisotropy factor per trace for chromosome chr5...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/anisotropy_factor_per_tracechr5.pdf
------------------------------------------------------
Plotting anisotropy factor per trace for chromosome chr8...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/anisotropy_factor_per_tracechr8.pdf
------------------------------------------------------
Plotting anisotropy factor per trace for chromosome chr16...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/anisotropy_factor_per_tracechr16.pdf
------------------------------------------------------
Plotting anisotropy factor per trace for chromosome chrX...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/anisotropy_factor_per_tracechrX.pdf
------------------------------------------------------
for chrom in collection_prec["chromosome"].unique():
if not save_plots and chrom != "chr3":
continue
print(f"Plotting anisotropy factor distribution for chromosome {chrom}...")
collection_prec_chr = collection_prec[collection_prec["chromosome"] == chrom]
sns.boxenplot(data = collection_prec_chr, x = "tp", y = "anisotropy_factor")
plt.ylim(min(collection_prec_chr["anisotropy_factor"].min() * 0.9, 0), collection_prec_chr["anisotropy_factor"].max() * 1.1)
plt.xlabel("Timepoint")
plt.ylabel("Anisotropy factor (mean(X,Y) spread / Z spread )")
plt.axhline(0.25, color = "red")
plt.axhline(0.20, color = "blue")
plt.title(f"Chromosome {chrom} - Boxenplot")
if save_plots:
plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
plot_folder.mkdir(parents=True, exist_ok=True)
filename = Path(plot_folder, f"anisotropy_factor_boxenplot_{chrom}.pdf")
print(f"Saving plots to: {filename}")
plt.savefig(filename, **plot_opts)
del plot_folder, filename
else:
plt.show()
plt.close()
sns.lineplot(data = collection_prec_chr, x = "tp", y = "anisotropy_factor")
plt.ylim(collection_prec_chr["anisotropy_factor"].min() * 0.9, collection_prec_chr["anisotropy_factor"].max() * 1.1)
plt.xlabel("Timepoint")
plt.ylabel("Anisotropy factor (mean(X,Y) spread / Z spread )")
plt.title(f"Chromosome {chrom} - Lineplot")
if save_plots:
plot_folder = Path(experimental_quality_assessment_folder, "plots", chrom)
plot_folder.mkdir(parents=True, exist_ok=True)
filename = Path(plot_folder, f"anisotropy_factor_lineplot_{chrom}.pdf")
print(f"Saving plots to: {filename}")
plt.savefig(filename, **plot_opts)
del plot_folder, filename
else:
plt.show()
plt.close()
print(f"\n{'-'*54}\n")
del chrom, collection_prec_chr
Plotting anisotropy factor distribution for chromosome chr3...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/anisotropy_factor_boxenplot_chr3.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr3/anisotropy_factor_lineplot_chr3.pdf
------------------------------------------------------
Plotting anisotropy factor distribution for chromosome chr5...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/anisotropy_factor_boxenplot_chr5.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr5/anisotropy_factor_lineplot_chr5.pdf
------------------------------------------------------
Plotting anisotropy factor distribution for chromosome chr8...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/anisotropy_factor_boxenplot_chr8.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr8/anisotropy_factor_lineplot_chr8.pdf
------------------------------------------------------
Plotting anisotropy factor distribution for chromosome chr16...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/anisotropy_factor_boxenplot_chr16.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chr16/anisotropy_factor_lineplot_chr16.pdf
------------------------------------------------------
Plotting anisotropy factor distribution for chromosome chrX...
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/anisotropy_factor_boxenplot_chrX.pdf
Saving plots to: /scratch/CIMA_tutorial/2_experimental_quality_assessment/plots/chrX/anisotropy_factor_lineplot_chrX.pdf
------------------------------------------------------