cima.detection package
Submodules
cima.detection.beads_identification module
- class cima.detection.beads_identification.BeadsFinder
Bases: object
- fit(seg: Segment, n_jobs: int = 1, radius: int = 40, min_persistence: float = 0.75, eps: int = 30, min_samples: int = 100, plot_beads: bool = False, plot_path: str = '', plot_filename: str = '', plot_format: str = 'pdf')
Function to find the beads in a given Segment, label them and add that info to the Segment under graphs_df. Also adds a debeaded segment for ease of use.
- Parameters:
seg (Segment) – The Segment object to find beads in.
n_jobs (int, optional) – The number of jobs to run in parallel, by default 1
radius (int, optional) – The radius to use for finding neighbors, by default 40
min_persistence (float, optional) – The minimum persistence ratio to classify a localization as bead-originated, by default 0.75
eps (int, optional) – The maximum distance between two samples for them to be considered as in the same neighborhood in the DBSCAN clustering, by default 30
min_samples (int, optional) – The number of samples in a neighborhood for a point to be considered as a core point in the DBSCAN clustering, by default 100
plot_beads (bool, optional) – Whether to plot the beads, by default False
plot_path (str, optional) – The directory in which to save the plot, by default “” (which saves it in the same directory as the segment’s file). Do not include the filename.
plot_filename (str, optional) – The filename for the saved plot, by default “” (which generates a filename based on the segment’s filename). Do not include the file extension.
plot_format (str, optional) – The format to save the plot, by default “pdf”
- Returns:
self – The fitted BeadsFinder instance.
- Return type:
BeadsFinder
- show(path2save: str | None = None, n_jobs: int = 1, radius: int = 40)
Displays a representation of the connection matrices, first by considering all localizations and then considering one localization per frame. Then it also displays the persistence of each localization.
- Parameters:
path2save (Optional[str], optional) – If provided, saves the plots to the specified path instead of displaying them, by default None
n_jobs (int, optional) – The number of jobs to run in parallel for the nearest neighbors calculation, by default 1
radius (int, optional) – The radius to use for finding neighbors in the nearest neighbors calculation, by default 40
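To illustrate how a persistence ratio could separate beads from ordinary localizations, here is a minimal, dependency-free sketch. It assumes that the persistence of a localization is the fraction of frames containing at least one localization within `radius` of it; this definition, the function name `persistence_ratios` and the toy data are assumptions made for illustration and are not taken from the package.

```python
import math

def persistence_ratios(points, frames, radius):
    """Fraction of frames with a localization within `radius` of each point.

    points: list of (x, y) tuples; frames: one frame index per point.
    ASSUMPTION: this definition of persistence is illustrative only.
    """
    n_frames = len(set(frames))
    ratios = []
    for p in points:
        # Frames in which some localization (including p itself) lies near p
        frames_hit = {f for q, f in zip(points, frames) if math.dist(p, q) <= radius}
        ratios.append(len(frames_hit) / n_frames)
    return ratios

# Toy data: a "bead" seen at nearly the same place in all 4 frames,
# plus two isolated localizations that appear in a single frame each.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (500, 500), (900, 100)]
frames = [0, 1, 2, 3, 1, 2]

ratios = persistence_ratios(points, frames, radius=40)
is_bead = [r >= 0.75 for r in ratios]  # same threshold as the min_persistence default
print(ratios)
print(is_bead)
```

In this toy case only the four localizations that recur in every frame pass the 0.75 threshold.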
cima.detection.clusters module
- cima.detection.clusters.DBscan(StructureObj: Segment, epsilon2test=0, minpoints=100, n_jobs=8)
The function takes a StructureObj object and performs DBSCAN clustering with user-defined parameters.
- Parameters:
StructureObj (Segment) – Structure Object to cluster
epsilon2test (int) – Epsilon value for DBSCAN, by default 0
minpoints (int) – Minimum points for DBSCAN, by default 100
n_jobs (int) – Number of parallel jobs to run during the DBSCAN computation, by default 8
- Returns:
Clustered Structure Object with clusterID assigned.
- Return type:
Segment
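For readers unfamiliar with the roles of the epsilon and minimum-points parameters, the following is a naive O(n²) DBSCAN written from the textbook definition. It is a didactic sketch and not the implementation used by this function; the names `dbscan` and `NOISE` are hypothetical.

```python
import math

NOISE = -1

def dbscan(points, eps, min_points):
    """Naive DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    # Each neighborhood includes the point itself, as in the usual definition
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_points:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_points:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

# Two tight groups of four points and one isolated point
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 10), (11, 10), (10, 11), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_points=3)
print(labels)
```

Larger epsilon values merge nearby groups, while larger minimum-points values turn sparse groups into noise.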
- class cima.detection.clusters.DBscan_grid_search_stable
Bases: object
Selects the pair of min_points and eps parameters that gives the most stable clustering, meaning the one that changes least when the parameters change. In the process, computes DBSCAN labels for all combinations of the specified parameters. Also computes the grid of ARI scores between each labeling and its neighbors on the grid, and the variation of those ARI scores.
- copy()
Return an identical copy of the object.
- fit(segment: SegmentXYZ, min_pts_param: tuple[int, int, int] = (10, 300, 10), eps_param: tuple[int, int, int] = (0, 0, 0), consider_noise: bool = True, n_neighbors: int = 2, conv: bool = False, verbose: bool = False, n_jobs: int = 8, downsampling_rate: float = 1.0, limit_density: bool = False, random_seed: int = 0)
Computes labels, ARI and variance of ARI grids. Computes the rank of parameter combinations according to stability. Saves best labels and a copy of segment with them as clusterIDs.
- Parameters:
segment (SegmentXYZ) – SegmentXYZ containing the coordinates on which to run the clustering
min_pts_param (tuple[int, int, int], optional) – Tuple of the form (min, max, step) defining the range of minimum number of points to consider in DBSCAN, by default (10, 300, 10)
eps_param (tuple[int, int, int], optional) – Tuple of the form (min, max, step) defining the range of eps values to provide to DBSCAN, by default (0, 0, 0)
consider_noise (bool, optional) – Whether to consider noise points in the ARI calculation, by default True. This may be useful when the majority of points are classified as noise, because it concentrates the comparison on the signal part.
n_neighbors (int, optional) – Number of neighbors to consider when computing the rank of stability, by default 2.
conv (bool, optional) – Whether to apply a blurring on the grid before computing the rank of stability, by default False.
n_jobs (int, optional) – Number of CPUs to use, by default 8.
downsampling_rate (float, optional) – Fraction of points subsampled before running DBSCAN, which decreases computation time. When the rate is < 1, the range of min_pts values is adjusted so that the pattern on the grid is very similar to the one that would be obtained with rate = 1. The results become less comparable as the rate decreases towards 0. By default 1.0 (no downsampling).
limit_density (bool, optional) – Limit the search for stability among those parameters defining a density threshold between the 25th and 75th density percentile of coordinates. By default False.
random_seed (int, optional) – Used in the random selection of downsampled coords, to make it reproducible. By default 0.
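The stability ranking relies on the adjusted Rand index (ARI) between labelings obtained with neighboring parameter values. As a reminder of what that score measures, here is a small self-contained sketch of the standard ARI formula; the function name `adjusted_rand_index` is hypothetical, and how the package computes or aggregates the score internally is not shown here.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Standard ARI from the contingency table of two labelings."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # nothing to compare
    # Pairs of points that share a cluster in both labelings
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs that share a cluster in each labeling taken alone
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)

reference = [0, 0, 0, 1, 1, 1]
relabeled = [1, 1, 1, 0, 0, 0]   # same partition, different cluster ids
shifted = [0, 0, 1, 1, 1, 1]     # one point moved to the other cluster

print(adjusted_rand_index(reference, relabeled))
print(adjusted_rand_index(reference, shifted))
```

Because the score depends only on the partition, renaming cluster ids leaves it at 1, while moving a single point lowers it. A parameter combination is stable when its labeling scores high against the labelings of its neighbors on the grid.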
- mergeOtherDBSCANGrid(other_scanner: DBscan_grid_search_stable, conv: bool = False, consider_noise: bool = True, n_neighbors: int = 2, verbose: bool = False)
Integrates the grid of precomputed labels contained in other_scanner into this object. Then computes everything else from them. Useful when you want to extend the grid without having to recompute the labels that you already have.
- Parameters:
other_scanner (DBscan_grid_search_stable) – The scanner (already fitted) to integrate into this one
conv (bool, optional) – Whether to apply a blurring on the grid before computing the rank of stability, by default False
consider_noise (bool, optional) – Whether to consider noise points in the ARI calculation, by default True. This may be useful when the majority of points are classified as noise, because it concentrates the comparison on the signal part.
n_neighbors (int, optional) – The number of neighbors to consider for the ARI calculation, by default 2
verbose (bool, optional) – Whether to print progress messages, by default False
- Returns:
The updated DBscan_grid_search_stable object.
- Return type:
self
- Raises:
ValueError – If self is not fitted.
ValueError – If the other_scanner is not fitted.
ValueError – If the segment coordinates don’t match.
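The merge step and its three error conditions can be pictured with plain dictionaries. In this sketch each scanner is modeled as a dict holding the coordinates and a mapping from (min_pts, eps) to labels; that layout, the function name `merge_label_grids`, and the choice to keep this scanner's labels where both grids overlap are all assumptions made for illustration.

```python
def merge_label_grids(own, other):
    """Merge two label grids keyed by (min_pts, eps).

    ASSUMPTION: scanners are modeled as
    {"coords": [...], "labels": {(min_pts, eps): [...]}}.
    """
    if not own.get("labels"):
        raise ValueError("this scanner is not fitted")
    if not other.get("labels"):
        raise ValueError("the other scanner is not fitted")
    if own["coords"] != other["coords"]:
        raise ValueError("segment coordinates do not match")
    merged = dict(other["labels"])
    merged.update(own["labels"])  # on overlapping parameters, keep own labels
    return {"coords": own["coords"], "labels": merged}

coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (9.0, 9.0, 9.0)]
scanner_a = {"coords": coords,
             "labels": {(10, 20): [0, 0, -1], (10, 30): [0, 0, -1]}}
scanner_b = {"coords": list(coords),
             "labels": {(10, 30): [0, 0, 0], (10, 40): [0, 0, 0]}}

merged = merge_label_grids(scanner_a, scanner_b)
print(sorted(merged["labels"]))
```

After such a merge, the ARI grids and the stability ranking would be recomputed over the enlarged set of parameter combinations, without rerunning DBSCAN for the labels already available.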
- plotAriGrid()
Plots the grid of ARI values. It will put a red dot on the best combination of parameters.
- plotAriVarGrid()
Plots the grid of ARI variation values. It will put a red dot on the best combination of parameters.
- saveLog(outfile)
Saves the ARI and ARI variation values in a single CSV file, in decreasing order of stability.
- class cima.detection.clusters.HDBSCAN_stable
Bases: object
Selects the value of min_cluster_size that gives the most stable clustering, meaning the one that changes least when the parameter changes. In the process, computes HDBSCAN labels for all the specified parameter values. Also computes the ARI scores between each labeling and its neighbors in the parameter range, and the variation of those ARI scores.
- ari_median
1D array containing the median ARI values for each min_cluster_size
- Type:
np.ndarray
- ari_var
1D array containing the ARI variance values for each min_cluster_size
- Type:
np.ndarray
- all_labels
2D array containing the clustering labels for each min_cluster_size
- Type:
np.ndarray
- min_cluster_size_range
List of min_cluster_size values used
- Type:
list[int]
- segment
SegmentXYZ containing the coordinates on which the clustering was run
- Type:
SegmentXYZ
- ordered_params_df
DataFrame containing the ordered min_cluster_size, ARI and ARI variation values
- Type:
pd.DataFrame
- best_mcs
The best min_cluster_size value
- Type:
int
- best_ind
The index of the best min_cluster_size value in min_cluster_size_range
- Type:
int
- labels_
1D array containing the clustering labels for the best min_cluster_size
- Type:
np.ndarray
- copy()
Returns a copy of this object
- fit(segment: SegmentXYZ, min_cluster_size_range: list[int] = [], n_neighbors=2, n_jobs=8, conv=False, verbose=False, consider_noise=True)
Computes labels, ARI and ARI_var grids. Computes the rank of parameter combinations according to stability. Saves best labels and a copy of segment with them as clusterIDs.
- Parameters:
segment (SegmentXYZ) – SegmentXYZ containing the coordinates on which to run the clustering
min_cluster_size_range (list[int], optional) – List of min_cluster_size values to provide to HDBSCAN
n_neighbors (int, optional) – The number of neighbors to use for the ARI computation, by default 2
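The relation between ari_median, ari_var and best_mcs can be sketched as follows, starting from a precomputed matrix of ARI scores between labelings. The windowing by `n_neighbors` on each side and the ranking rule (highest median first, lowest variance as tie-breaker) are assumptions chosen for illustration; the package may aggregate and rank differently.

```python
from statistics import median, pvariance

def neighborhood_stability(ari, n_neighbors=2):
    """Median and variance of ARI between each labeling and its neighbors.

    ari[i][j] is the ARI between the labelings obtained with the i-th and
    j-th parameter values. Requires at least two parameter values.
    """
    n = len(ari)
    medians, variances = [], []
    for i in range(n):
        lo, hi = max(0, i - n_neighbors), min(n, i + n_neighbors + 1)
        scores = [ari[i][j] for j in range(lo, hi) if j != i]
        medians.append(median(scores))
        variances.append(pvariance(scores))
    return medians, variances

min_cluster_size_range = [10, 20, 30, 40]
# Hypothetical ARI scores between the four labelings (symmetric matrix)
ari = [
    [1.0, 0.5, 0.25, 0.25],
    [0.5, 1.0, 0.875, 0.5],
    [0.25, 0.875, 1.0, 0.75],
    [0.25, 0.5, 0.75, 1.0],
]

ari_median, ari_var = neighborhood_stability(ari, n_neighbors=1)
# ASSUMPTION: rank by highest median, then by lowest variance
order = sorted(range(len(ari)), key=lambda i: (-ari_median[i], ari_var[i]))
best_ind = order[0]
best_mcs = min_cluster_size_range[best_ind]
print(ari_median, best_mcs)
```

With these made-up scores, the labeling for a min_cluster_size of 30 agrees best with both of its neighbors and is ranked first.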
- mergeOtherHDBSCANGrid(other_scanner, conv: bool = False, consider_noise: bool = True, n_neighbors: int = 2, verbose: bool = False)
Integrates the grid of precomputed labels contained in other_scanner into this object. Then computes everything else from them. Useful when you want to extend the grid without having to recompute the labels that you already have.
- Parameters:
other_scanner (HDBSCAN_stable) – The scanner (already fitted) to integrate into this one
conv (bool, optional) – Whether to apply convolution to the ARI scores, by default False
consider_noise (bool, optional) – Whether to consider noise points in the ARI calculation, by default True
n_neighbors (int, optional) – The number of neighbors to use for the ARI computation, by default 2
verbose (bool, optional) – Whether to print verbose output, by default False
- plotAriGrid()
Plots the grid of ARI values.
- plotAriVarGrid()
Plots the grid of ARI variation values.
- saveLog(outfile)
Saves the ARI and ARI variation values in a single CSV file, in decreasing order of stability.
- Parameters:
outfile (str) – Path to the output CSV file
- class cima.detection.clusters.ThresholdClusterFilter
Bases: object
Class that filters clusters in a Segment based on computed features using specified thresholds.
- features_to_use
which features were used for the filtering
- Type:
list
- feats_df
dataframe with the computed features for all clusters
- Type:
pd.DataFrame
- limits
the limits used for the filtering
- Type:
dict
- where_retain
boolean array indicating which clusters were retained
- Type:
np.ndarray
- retained_clusters_ids
list of the ids of the retained clusters
- Type:
list
- new_labels
array of the new cluster labels for the original segment
- Type:
np.ndarray
- fit(segment: SegmentXYZ, features_to_use: list[str] = ['radius_of_gyration', 'volume', 'numerosity'], method: str = 'proportional', threshold: float = 0.2, custom_limits: dict = {}, n_jobs: int = 1, verbose: bool = False) → None
Find clusters to retain based on their features and the specified thresholds.
- Parameters:
segment (SegmentXYZ) – Segment to filter
features_to_use (list) – which features to use. Any subset of radius_of_gyration, volume, numerosity
method (str) – ‘proportional’ or ‘percentile’ or ‘custom’
threshold (float) – threshold to use for the filtering. If method is ‘proportional’, it is the fraction of the maximum value of each feature to use as limit. If method is ‘percentile’, it is the percentile to use as limit (0-100)
custom_limits (dict) – in case of ‘custom’ method, which limits should be used for the features. It is expected to be a dictionary with feature names as keys and floats as values
n_jobs (int) – Number of CPUs to use for the computation, by default 1
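The three methods differ only in how the per-feature limit is derived. The sketch below computes limits for each method and then marks the clusters to retain. It assumes that every limit is a lower bound and that a cluster is retained only if all its features reach their limits; both points, as well as the helper names `percentile`, `compute_limits` and `clusters_to_retain`, are assumptions and not taken from the package.

```python
import math

def percentile(values, q):
    """Linearly interpolated percentile, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def compute_limits(features, method="proportional", threshold=0.2, custom_limits=None):
    """features maps a feature name to one value per cluster."""
    limits = {}
    for name, values in features.items():
        if method == "proportional":
            limits[name] = threshold * max(values)  # fraction of the maximum
        elif method == "percentile":
            limits[name] = percentile(values, threshold)
        elif method == "custom":
            limits[name] = custom_limits[name]
        else:
            raise ValueError(f"unknown method: {method}")
    return limits

def clusters_to_retain(features, limits):
    # ASSUMPTION: limits are lower bounds and all of them must be satisfied
    n_clusters = len(next(iter(features.values())))
    return [all(features[name][i] >= limits[name] for name in limits)
            for i in range(n_clusters)]

# Hypothetical feature values for three clusters
features = {"numerosity": [100, 10, 60], "volume": [8.0, 1.0, 5.0]}

keep_prop = clusters_to_retain(features, compute_limits(features, "proportional", 0.2))
keep_pct = clusters_to_retain(features, compute_limits(features, "percentile", 75))
keep_custom = clusters_to_retain(
    features, compute_limits(features, "custom",
                             custom_limits={"numerosity": 50, "volume": 0.5}))
print(keep_prop, keep_pct, keep_custom)
```

Note that the same `threshold` argument is read as a fraction for the proportional method and as a value between 0 and 100 for the percentile method.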
- plot(label_clusters: bool = False, show_all: bool = False)
Display plots representing the clusters features and which have been filtered out.
- Parameters:
label_clusters (bool) – whether to label the clusters on the plots, by default False
show_all (bool) – whether to show all clusters or just the retained ones, by default False
- writeMRCs(mrc_path: str, filename_pattern: str = 'cluster', verbose: bool = False)
Write MRC files for every cluster found in the segment
- Parameters:
mrc_path (str) – Path to the directory where to save the MRC files
filename_pattern (str, optional) – Pattern for naming the files, by default ‘cluster’
verbose (bool, optional) – Whether to print progress messages, by default False
- cima.detection.clusters.getPointwiseDensity(coords: ndarray, radius: float, n_jobs: int = 1, verbose: bool = False) → ndarray
Estimates the pointwise density of a set of points.
- Parameters:
coords (np.ndarray) – A 2-dimensional numpy array with each row representing the location of a point
radius (float) – Radius inside of which to count neighbors
n_jobs (int) – Number of CPUs to use for the computation, by default 1
- Returns:
A numpy array representing the density of each point in coords, estimated as the count of neighbors inside the specified radius divided by the volume of the corresponding sphere.
- Return type:
np.ndarray
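The density estimate described above (neighbor count divided by the volume of a sphere of the given radius) can be written out in a few lines. This brute-force sketch uses only the standard library and is meant to show the arithmetic; whether the point itself is counted among its neighbors is an assumption (here it is excluded), and `pointwise_density` is a hypothetical name.

```python
import math

def pointwise_density(coords, radius):
    """Neighbors within `radius` divided by the sphere volume, per point."""
    sphere_volume = 4.0 / 3.0 * math.pi * radius ** 3
    densities = []
    for i, p in enumerate(coords):
        # ASSUMPTION: the point itself is not counted as its own neighbor
        count = sum(1 for j, q in enumerate(coords)
                    if j != i and math.dist(p, q) <= radius)
        densities.append(count / sphere_volume)
    return densities

coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10)]
densities = pointwise_density(coords, radius=2)
print(densities)
```

The three nearby points each have two neighbors within the radius, while the distant point has none and gets a density of zero.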
- cima.detection.clusters.search_epsilon(xyz: ndarray, n_neighbors: int = 0, show: bool = True, n_cpus: int = 1) → int
Given point coordinates, this function estimates the optimal value of the epsilon parameter for the DBSCAN clustering algorithm. Run for a varied number of neighbors to find the optimal epsilon value.
- Parameters:
xyz (np.ndarray) – Array of shape (n_samples, 3) containing the 3D coordinates of the points.
n_neighbors (int, optional) – The number of neighbors to use for the nearest neighbors search, by default 0. If 0, it is set to 2 * len(xyz[0]) - 1.
show (bool, optional) – If True, the function plots a graph of the sorted distances and the estimated elbow point, by default True
n_cpus (int, optional) – The number of jobs to run in parallel, by default 1
- Returns:
The estimated optimal value of the epsilon parameter for the DBSCAN clustering algorithm.
- Return type:
int
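A common way to estimate epsilon from sorted neighbor distances is the k-distance elbow heuristic, sketched below without external dependencies. The elbow is located as the point farthest from the chord joining the first and last sorted distances; this particular elbow rule, and the helper names `k_distances` and `elbow_index`, are assumptions and may differ from what the function does internally. The sketch also returns a float rather than an int.

```python
import math

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

def elbow_index(values):
    """Index of the value farthest from the chord between both ends.

    Requires at least two values.
    """
    n = len(values)
    x1, y0, y1 = n - 1, values[0], values[-1]
    norm = math.hypot(x1, y1 - y0)
    def distance_to_chord(i):
        return abs((y1 - y0) * i - x1 * values[i] + x1 * y0) / norm
    return max(range(n), key=distance_to_chord)

# A dense 3 x 3 x 3 unit grid plus three far-away outliers
grid = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
xyz = grid + [(20, 20, 20), (40, 0, 0), (0, 40, 40)]

k = 2 * len(xyz[0]) - 1  # the documented fallback when n_neighbors is 0
distances = k_distances(xyz, k)
eps = distances[elbow_index(distances)]
print(k, eps)
```

The sorted distances stay flat for the grid points and jump for the outliers, so the elbow falls on the last grid point and the estimate matches the grid's own neighbor spacing.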