cima.detection package

Submodules

cima.detection.beads_identification module

class cima.detection.beads_identification.BeadsFinder

Bases: object

fit(seg: Segment, n_jobs: int = 1, radius: int = 40, min_persistence: float = 0.75, eps: int = 30, min_samples: int = 100, plot_beads: bool = False, plot_path: str = '', plot_filename: str = '', plot_format: str = 'pdf')

Finds the beads in a given Segment, labels them, and stores that information in the Segment under graphs_df. Also adds a debeaded segment for ease of use.

Parameters:

seg (Segment) – The Segment object to find beads in.
n_jobs (int, optional) – The number of jobs to run in parallel, by default 1
radius (int, optional) – The radius to use for finding neighbors, by default 40
min_persistence (float, optional) – The minimum persistence ratio to classify a localization as bead-originated, by default 0.75
eps (int, optional) – The maximum distance between two samples for them to be considered as in the same neighborhood in the DBSCAN clustering, by default 30
min_samples (int, optional) – The number of samples in a neighborhood for a point to be considered as a core point in the DBSCAN clustering, by default 100
plot_beads (bool, optional) – Whether to plot the beads, by default False
plot_path (str, optional) – The directory in which to save the plot, by default “” (which saves it in the same directory as the segment’s file). Must not include the filename.
plot_filename (str, optional) – The filename under which to save the plot, by default “” (which generates a filename based on the segment’s filename). Must not include the format extension.
plot_format (str, optional) – The format to save the plot, by default “pdf”

Returns:

self – The fitted BeadsFinder instance.

Return type:

BeadsFinder
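
The three plot arguments are kept separate: a directory, a bare filename, and a format. A minimal sketch of how they might combine into one output path follows. The helper `build_plot_path` and the `_beads` suffix used for the default name are hypothetical and not part of cima; only the fallback rules are taken from the parameter descriptions above.

```python
from pathlib import Path


def build_plot_path(segment_file: str, plot_path: str = "",
                    plot_filename: str = "", plot_format: str = "pdf") -> Path:
    """Hypothetical helper: combine directory, stem and format into one path."""
    source = Path(segment_file)
    # An empty plot_path falls back to the directory of the segment's file.
    directory = Path(plot_path) if plot_path else source.parent
    # An empty plot_filename falls back to a name derived from the segment's file.
    # The "_beads" suffix is an assumption made for this sketch.
    stem = plot_filename if plot_filename else source.stem + "_beads"
    return directory / f"{stem}.{plot_format}"


default_path = build_plot_path("data/run1/cell3.csv")
custom_path = build_plot_path("data/run1/cell3.csv", "figures", "beads_cell3", "png")
```

Passing a filename inside plot_path, or an extension inside plot_filename, would duplicate those parts in a scheme like this one, which is presumably why the parameters forbid it.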

show(path2save: str | None = None, n_jobs: int = 1, radius: int = 40)

Displays a representation of the connection matrices, first considering all localizations and then considering one localization per frame. It then displays the persistence of each localization.

Parameters:

path2save (Optional[str], optional) – If provided, saves the plots to the specified path instead of displaying them, by default None
n_jobs (int, optional) – The number of jobs to run in parallel for the nearest neighbors calculation, by default 1
radius (int, optional) – The radius to use for finding neighbors in the nearest neighbors calculation, by default 40

cima.detection.clusters module

cima.detection.clusters.DBscan(StructureObj: Segment, epsilon2test=0, minpoints=100, n_jobs=8)

Performs DBSCAN clustering on a Segment with user-defined parameters.

Parameters:

StructureObj (Segment) – Structure Object to cluster
epsilon2test (int) – Epsilon value for DBSCAN, by default 0
minpoints (int) – Minimum points for DBSCAN
n_jobs (int) – Number of parallel jobs to run during the DBSCAN computation

Returns:

Clustered Structure Object with clusterID assigned.

Return type:

Segment
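
The two DBSCAN parameters decide which points count as core points. The sketch below shows that rule in plain Python, assuming the scikit-learn convention in which the point itself is counted in its own neighborhood. It illustrates the definition only and is not the cima implementation; `core_point_mask` is a hypothetical name.

```python
import math


def core_point_mask(points, eps, min_samples):
    """Return one bool per point: True if it is a DBSCAN core point."""
    mask = []
    for p in points:
        # Count points within eps of p. The point itself is included,
        # following the scikit-learn convention for min_samples (assumption).
        n_close = sum(1 for q in points if math.dist(p, q) <= eps)
        mask.append(n_close >= min_samples)
    return mask


# Four points packed together and one isolated point.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (50, 50, 50)]
mask = core_point_mask(points, eps=2.0, min_samples=4)
```

Here the four packed points are core points and the isolated one is not, so a DBSCAN run with these settings would label it as noise.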

class cima.detection.clusters.DBscan_grid_search_stable

Bases: object

Selects the pair of min_points and eps parameters that gives the most stable clustering, meaning the one that changes least when the parameters change. In the process, computes DBSCAN labels for all combinations of the specified parameters. Also computes the grid of ARI scores between neighboring labelings, and the variation of those ARI scores.

copy(): Return an identical copy of the object.

fit(segment: SegmentXYZ, min_pts_param: tuple[int, int, int] = (10, 300, 10), eps_param: tuple[int, int, int] = (0, 0, 0), consider_noise: bool = True, n_neighbors: int = 2, conv: bool = False, verbose: bool = False, n_jobs: int = 8, downsampling_rate: float = 1.0, limit_density: bool = False, random_seed: int = 0)

Computes labels, ARI and variance of ARI grids. Computes the rank of parameter combinations according to stability. Saves best labels and a copy of segment with them as clusterIDs.

Parameters:

segment (SegmentXYZ) – SegmentXYZ containing the coordinates on which to run the clustering
min_pts_param (tuple[int, int, int], optional) – Tuple of the form (min, max, step) defining the range of minimum number of points to consider in DBSCAN
eps_param (tuple[int, int, int], optional) – Tuple of the form (min, max, step) defining the range of eps values to provide to DBSCAN
consider_noise (bool, optional) – Whether to consider noise points in the ARI calculation, by default True. Excluding them may be useful when the majority of points are classified as noise, because it concentrates the comparison on the signal part.
n_neighbors (int, optional) – Number of neighbors to consider when computing the rank of stability, by default 2.
conv (bool, optional) – Whether to apply a blurring on the grid before computing the rank of stability, by default False.
n_jobs (int, optional) – How many cpus to use, by default 8.
downsampling_rate (float, optional) – Fraction of points retained for the DBSCAN run, which reduces computation time. When the rate is < 1, the min_pts_param range is adjusted so that the pattern on the grid closely resembles the one that would be obtained with rate = 1. The results become less comparable as the rate decreases towards 0. By default 1.0 (no downsampling).
limit_density (bool, optional) – Limit the search for stability among those parameters defining a density threshold between the 25th and 75th density percentile of coordinates. By default False.
random_seed (int, optional) – Used in the random selection of downsampled coords, to make it reproducible. By default 0.
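
The (min, max, step) tuples describe a two-dimensional parameter grid. A minimal sketch of how such tuples might expand into the list of combinations is given below. Two things are assumptions of the sketch: the upper bound is treated as inclusive, and the eps range shown is hypothetical. The default eps_param of (0, 0, 0) is not covered.

```python
from itertools import product


def expand(param):
    """Expand a (min, max, step) tuple into a list of values.

    Assumption for this sketch: the upper bound is inclusive.
    """
    start, stop, step = param
    return list(range(start, stop + 1, step))


min_pts_values = expand((10, 300, 10))   # the documented default range
eps_values = expand((20, 60, 10))        # hypothetical range, not a cima default
grid = list(product(min_pts_values, eps_values))
```

Each (min_pts, eps) pair in `grid` corresponds to one DBSCAN run, so the cost of a fit grows with the product of the two range lengths.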

mergeOtherDBSCANGrid(other_scanner: DBscan_grid_search_stable, conv: bool = False, consider_noise: bool = True, n_neighbors: int = 2, verbose: bool = False)

Integrates the grid of precomputed labels contained in other_scanner into this object. Then computes everything else from them. Useful when you want to extend the grid without having to recompute the labels that you already have.

Parameters:

other_scanner (DBscan_grid_search_stable) – The scanner (already fitted) to integrate into this one
conv (bool, optional) – Whether to apply a blurring on the grid before computing the rank of stability, by default False
consider_noise (bool, optional) – Whether to consider noise points in the ARI calculation, by default True. Excluding them may be useful when the majority of points are classified as noise, because it concentrates the comparison on the signal part.
n_neighbors (int, optional) – The number of neighbors to consider for the ARI calculation, by default 2
verbose (bool, optional) – Whether to print progress messages, by default False

Returns:

The updated DBscan_grid_search_stable object.

Return type:

DBscan_grid_search_stable

Raises:

ValueError – If self is not fitted.
ValueError – If the other_scanner is not fitted.
ValueError – If the segment coordinates don’t match.

plotAriGrid(): Plots the grid of ARI values. It will put a red dot on the best combination of parameters.

plotAriVarGrid(): Plots the grid of ARI variation values. It will put a red dot on the best combination of parameters.

saveLog(outfile): Saves the ARI and ARI variation values in a single csv file, in decreasing order of stability

class cima.detection.clusters.HDBSCAN_stable

Bases: object

Selects the value of min_cluster_size that gives the most stable clustering, meaning the one that changes least when the parameter changes. In the process, computes HDBSCAN labels for all the specified parameter values. Also computes the ARI scores between neighboring labelings, and the variation of those ARI scores.

ari_median

1D array containing the median ARI values for each min_cluster_size

Type: np.ndarray

ari_var

1D array containing the ARI variance values for each min_cluster_size

Type: np.ndarray

all_labels

2D array containing the clustering labels for each min_cluster_size

Type: np.ndarray

min_cluster_size_range

List of min_cluster_size values used

Type: list[int]

segment

SegmentXYZ containing the coordinates on which the clustering was run

Type: SegmentXYZ

ordered_params_df

DataFrame containing the ordered min_cluster_size, ARI and ARI variation values

Type: pd.DataFrame

best_mcs

The best min_cluster_size value

Type: int

best_ind

The index of the best min_cluster_size value in min_cluster_size_range

Type: int

labels_

1D array containing the clustering labels for the best min_cluster_size

Type: np.ndarray

optimal_segment

Copy of segment with clusterIDs set to labels_.

Type: SegmentXYZ

copy(): Returns a copy of this object

fit(segment: SegmentXYZ, min_cluster_size_range: list[int] = [], n_neighbors=2, n_jobs=8, conv=False, verbose=False, consider_noise=True)

Computes labels, ARI and ARI_var grids. Computes the rank of parameter combinations according to stability. Saves best labels and a copy of segment with them as clusterIDs.

Parameters:

segment (SegmentXYZ) – SegmentXYZ containing the coordinates on which to run the clustering
min_cluster_size_range (list[int], optional) – List of min_cluster_size values to provide to HDBSCAN
n_neighbors (int, optional) – The number of neighbors to use for the ARI computation, by default 2
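
The stability criterion compares each labeling with the labelings obtained at neighboring parameter values. The sketch below illustrates that idea in plain Python: it computes the adjusted Rand index (ARI) between two labelings, then the median ARI of each labeling against its neighbors. Picking the setting with the highest median ARI is an assumption of this sketch; how cima combines the median and the variance into a rank is not stated here. The function names and the toy labelings are hypothetical.

```python
from collections import Counter
from math import comb
from statistics import median


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same points."""
    sum_cells = sum(comb(n, 2) for n in Counter(zip(a, b)).values())
    sum_a = sum(comb(n, 2) for n in Counter(a).values())
    sum_b = sum(comb(n, 2) for n in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every point in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def median_ari_per_setting(all_labels, n_neighbors=2):
    """Median ARI between each labeling and its neighbors in parameter order."""
    scores = []
    for i, labels in enumerate(all_labels):
        lo = max(0, i - n_neighbors)
        hi = min(len(all_labels), i + n_neighbors + 1)
        neighbors = [all_labels[j] for j in range(lo, hi) if j != i]
        scores.append(median(adjusted_rand_index(labels, o) for o in neighbors))
    return scores


# Toy labelings for five consecutive (hypothetical) min_cluster_size values.
all_labels = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
scores = median_ari_per_setting(all_labels, n_neighbors=1)
best_ind = scores.index(max(scores))  # assumption: highest median ARI wins
```

In this toy case the middle setting is selected, because the labelings on both sides of it are identical to its own.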

mergeOtherHDBSCANGrid(other_scanner, conv: bool = False, consider_noise: bool = True, n_neighbors: int = 2, verbose: bool = False)

Integrates the grid of precomputed labels contained in other_scanner into this object. Then computes everything else from them. Useful when you want to extend the grid without having to recompute the labels that you already have.

Parameters:

other_scanner (HDBSCAN_stable) – The scanner (already fitted) to integrate into this one
conv (bool, optional) – Whether to apply convolution to the ARI scores, by default False
consider_noise (bool, optional) – Whether to consider noise points in the ARI calculation, by default True
n_neighbors (int, optional) – The number of neighbors to use for the ARI computation, by default 2
verbose (bool, optional) – Whether to print verbose output, by default False

plotAriGrid(): Plots the grid of ari values

plotAriVarGrid(): Plots the grid of ARI var values

saveLog(outfile)

Saves the ari and ari variation values in a single csv file, in decreasing order of stability

Parameters:: outfile (str) – Path to the output CSV file

class cima.detection.clusters.ThresholdClusterFilter

Bases: object

Class that filters clusters in a Segment based on computed features using specified thresholds.

features_to_use

Which features were used for the filtering

Type: list

feats_df

DataFrame with the computed features for all clusters

Type: pd.DataFrame

limits

The limits used for the filtering

Type: dict

where_retain

Boolean array indicating which clusters were retained

Type: np.ndarray

retained_clusters_ids

List of the IDs of the retained clusters

Type: list

transformed_segment

The segment after filtering

Type: Segment

new_labels

Array of the new cluster labels for the original segment

Type: np.ndarray

fit(segment: SegmentXYZ, features_to_use: list[str] = ['radius_of_gyration', 'volume', 'numerosity'], method: str = 'proportional', threshold: float = 0.2, custom_limits: dict = {}, n_jobs: int = 1, verbose: bool = False) → None

Find clusters to retain based on their features and the specified thresholds.

Parameters:

segment (SegmentXYZ) – Segment to filter
features_to_use (list) – which features to use. Any subset of radius_of_gyration, volume, numerosity
method (str) – ‘proportional’ or ‘percentile’ or ‘custom’
threshold (float) – threshold to use for the filtering. If method is ‘proportional’, it is the fraction of the maximum value of each feature to use as limit. If method is ‘percentile’, it is the percentile to use as limit (0-100)
custom_limits (dict) – in case of ‘custom’ method, which limits should be used for the features. It is expected to be a dictionary with feature names as keys and floats as values
n_jobs (int) – how many cpus to use for the computation
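
The ‘proportional’ and ‘percentile’ methods turn the same threshold argument into different limits. A minimal sketch of the two rules for a single feature is shown below. The helper names and the numerosity values are hypothetical, the percentile uses linear interpolation between ranks, and retaining clusters whose value reaches the limit is an assumption, since the direction of the comparison is not stated above.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def compute_limit(values, method, threshold):
    """Limit for one feature under the 'proportional' or 'percentile' method."""
    if method == "proportional":
        # Fraction of the maximum value of the feature.
        return threshold * max(values)
    if method == "percentile":
        return percentile(values, threshold)
    raise ValueError(f"unknown method: {method}")


# Hypothetical numerosity (points per cluster) for five clusters.
numerosity = [12, 45, 95, 150, 200]

prop_limit = compute_limit(numerosity, "proportional", 0.2)
perc_limit = compute_limit(numerosity, "percentile", 50)

# Assumption: a cluster is retained when its value reaches the limit.
retained = [n >= prop_limit for n in numerosity]
```

With these values the proportional rule removes only the smallest cluster, while a percentile of 50 would place the limit at the median cluster size.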

plot(label_clusters: bool = False, show_all: bool = False)

Display plots representing the clusters features and which have been filtered out.

Parameters:

label_clusters (bool) – whether to label the clusters on the plots, by default False
show_all (bool) – whether to show all clusters or just the retained ones, by default False

writeMRCs(mrc_path: str, filename_pattern: str = 'cluster', verbose: bool = False)

Writes an MRC file for every cluster found in the segment.

Parameters:

mrc_path (str) – Path to the directory where to save the MRC files
filename_pattern (str, optional) – Pattern for naming the files, by default ‘cluster’
verbose (bool, optional) – Whether to print progress messages, by default False

cima.detection.clusters.getPointwiseDensity(coords: ndarray, radius: float, n_jobs: int = 1, verbose: bool = False) → ndarray

Estimates the pointwise density of a set of points.

Parameters:

coords (np.ndarray) – A 2-dimensional numpy array with each row representing the location of a point
radius (float) – Radius inside of which to count neighbors
n_jobs (int) – How many CPUs have to be used for the computation

Returns:

A numpy array representing the density of each point in coords, estimated as the count of neighbors inside the specified radius divided by the volume of the corresponding sphere.

Return type:

np.ndarray
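
The density estimate described above can be sketched in a few lines of plain Python. This brute-force version assumes 3D coordinates and does not count a point as its own neighbor. Both are assumptions of the sketch, and `pointwise_density` is a hypothetical name rather than the cima function.

```python
import math


def pointwise_density(coords, radius):
    """Neighbors within `radius` of each point, divided by the sphere volume."""
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    densities = []
    for i, p in enumerate(coords):
        # Assumption: the point itself is not counted as its own neighbor.
        count = sum(1 for j, q in enumerate(coords)
                    if j != i and math.dist(p, q) <= radius)
        densities.append(count / volume)
    return densities


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0)]
density = pointwise_density(coords, radius=1.5)
```

The pairwise loop is quadratic in the number of points. For real localization data a spatial index would be the usual choice, which is presumably where n_jobs comes into play.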

cima.detection.clusters.search_epsilon(xyz: ndarray, n_neighbors: int = 0, show: bool = True, n_cpus: int = 1) → int

Given point coordinates, this function estimates the optimal value of the epsilon parameter for the DBSCAN clustering algorithm. Run for a varied number of neighbors to find the optimal epsilon value.

Parameters:

xyz (np.ndarray) – Array of shape (n_samples, 3) containing the 3D coordinates of the points.
n_neighbors (int, optional) – The number of neighbors to use for the nearest neighbors search, by default 0. If 0, it is set to 2 * len(xyz[0]) - 1.
show (bool, optional) – If True, the function plots a graph of the sorted distances and the estimated elbow point, by default True
n_cpus (int, optional) – The number of jobs to run in parallel, by default 1

Returns:

The estimated optimal value of the epsilon parameter for the DBSCAN clustering algorithm.

Return type:

int
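
A common way to estimate epsilon from sorted neighbor distances is to look for the elbow of the k-distance curve. The sketch below illustrates that approach in plain Python, with k set to 2 * len(xyz[0]) - 1 as in the default described above. The elbow rule used here (largest vertical gap below the chord joining the ends of the curve) is an assumption; the rule cima applies is not stated. The sketch also returns a float where the documented function returns an int.

```python
import math


def k_distances(xyz, k):
    """Sorted distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(xyz):
        dists = sorted(math.dist(p, q) for j, q in enumerate(xyz) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def elbow_index(curve):
    """Index of the point farthest below the chord joining the curve's ends."""
    n = len(curve)
    first, last = curve[0], curve[-1]
    # Vertical gap between the straight chord and the curve at each index.
    gaps = [first + (last - first) * i / (n - 1) - curve[i] for i in range(n)]
    return max(range(n), key=gaps.__getitem__)


# A dense 3 x 3 x 3 lattice plus three far-away outliers.
xyz = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
xyz += [(20, 0, 0), (0, 30, 0), (0, 0, 40)]

k = 2 * len(xyz[0]) - 1          # 5 for 3D coordinates
curve = k_distances(xyz, k)
eps = curve[elbow_index(curve)]
```

The estimate lands on the largest k-distance inside the dense lattice, just before the curve jumps to the outliers. Running this for several values of k shows how sensitive the estimate is to the neighbor count.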