asaplib.cluster package
Submodules
asaplib.cluster.ml_cluster_base module
Base classes for clustering algorithms.
asaplib.cluster.ml_cluster_fit module
Density-based Clustering Algorithms
class asaplib.cluster.ml_cluster_fit.DBCluster(trainer)
Bases: asaplib.cluster.ml_cluster_base.ClusterBase
Performs clustering using a density-based clustering algorithm.
class asaplib.cluster.ml_cluster_fit.LAIO_DB(distances=None, indices=None, dens_type=None, dc=None, percent=2.0)
Bases: asaplib.cluster.ml_cluster_base.FitClusterBase
Clustering by fast search and find of density peaks, Rodriguez and Laio (2014).
https://science.sciencemag.org/content/sci/344/6191/1492.full.pdf
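The local density of data point i and its distance to the nearest point of higher density are
$$\rho_i = \sum_j \chi(d_{ij} - d_{\mathrm{cut}}), \qquad \chi(x) = 1 \text{ if } x < 0 \text{, else } 0,$$
$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}.$$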
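The density term can be computed with a short NumPy sketch. It assumes a precomputed symmetric distance matrix `dist`. The helper name `cutoff_density` is hypothetical and is not part of asaplib. The sketch also assumes the j = i term is excluded from the sum; including it would simply add 1 to every rho_i.

```python
import numpy as np


def cutoff_density(dist: np.ndarray, d_cut: float) -> np.ndarray:
    """rho_i = sum_j chi(d_ij - d_cut), with chi(x) = 1 if x < 0 else 0.

    dist is a symmetric (Nele, Nele) distance matrix. The diagonal term
    (j = i) is dropped, so a point is not counted as its own neighbour.
    """
    within = dist < d_cut            # chi(d_ij - d_cut) for every pair (i, j)
    np.fill_diagonal(within, False)  # exclude the j = i term
    return within.sum(axis=1)        # count neighbours closer than d_cut


# Four 1-D points: three close together and one far away.
points = np.array([[0.0], [0.1], [0.2], [5.0]])
dist = np.abs(points - points.T)
rho = cutoff_density(dist, d_cut=0.15)  # -> [1, 2, 1, 0]
```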
fit(data, rho=None)
Compute the cluster labels.
- Parameters
data – numpy array of shape (Nele, proj_dim), where proj_dim is the number of components the kernel matrix has been projected to.
rho – densities; the default is None, since the DP.py module computes the densities itself.
- Returns
cluster_labels – numpy array of shape (Nele,) giving the cluster (int from 0 to N) to which each data point belongs. Halo points are designated as -1.
get_assignation(data)
- Parameters
data – numpy array of shape (Nele, proj_dim), where proj_dim is the number of dimensions the kernel matrix is projected to.
- Returns
self.halo – numpy array of shape (Nele,) in which halo points are designated as -1 and all other points are assigned to their respective cluster centres.
get_dc(data)
Compute the cutoff distance given the data.
- Parameters
data (np.matrix) – the data array of shape (Nele, proj_dim), where Nele is the number of data points and proj_dim is the number of projected components of the kernel matrix.
- Returns
self.dc – the cutoff distance
- Return type
float
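Rodriguez and Laio (2014) suggest choosing d_cut so that the average number of neighbours is around 1 to 2% of all points. The sketch below assumes get_dc does this by taking the `percent`-th percentile of the pairwise Euclidean distances. The `percent=2.0` default of LAIO_DB hints at this, but the exact rule used by asaplib is an assumption. `cutoff_from_percent` is a hypothetical name.

```python
import numpy as np


def cutoff_from_percent(data: np.ndarray, percent: float = 2.0) -> float:
    """Return d_cut as the `percent`-th percentile of all pairwise distances.

    About `percent`% of point pairs then lie closer than d_cut, so on average
    each point has roughly `percent`% of the data set as neighbours.
    """
    # Euclidean (Nele, Nele) distance matrix via broadcasting (O(Nele^2) memory).
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(data), k=1)  # each unordered pair once, no diagonal
    return float(np.percentile(dist[iu], percent))


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
dc = cutoff_from_percent(data, percent=2.0)  # roughly 2% of pairs lie within dc
```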
class asaplib.cluster.ml_cluster_fit.old_LAIO(deltamin=-1, rhomin=-1)
Bases: asaplib.cluster.ml_cluster_base.FitClusterBase
Laio clustering scheme: clustering by fast search and find of density peaks.
https://science.sciencemag.org/content/sci/344/6191/1492.full.pdf
The local density of data point x_i:
$$\rho_i = \sum_j \chi(d_{ij} - d_{\mathrm{cut}})$$
The minimum distance to a neighbour with higher density:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij}$$
A summary of the Laio clustering algorithm:
1. Perform a kernel density estimation (rho_i) for each data point i.
2. For each data point i, compute the distance delta_i between i and j, where j is the closest data point with a density higher than that of i, i.e. rho(j) > rho(i).
3. Plot the decision graph, which is a scatter plot of (rho_i, delta_i).
4. Select the cluster centers {cl}, which are the outliers in the decision graph that fulfil both i) rho(cl) > rhomin and ii) delta(cl) > deltamin.
5. After the cluster centers are determined, assign each data point to the nearest cluster center.
The two parameters rhomin and deltamin must therefore be set by the user.
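Steps 4 and 5 can be sketched in NumPy as below. The sketch assumes rho and delta have already been computed for every point and that `dist` is the full distance matrix. `select_and_assign` is a hypothetical helper, not asaplib's code. It follows the summary's rule of assigning each point to the nearest centre. Note that Rodriguez and Laio (2014) instead give each point the label of its nearest neighbour of higher density.

```python
import numpy as np


def select_and_assign(dist, rho, delta, rhomin, deltamin):
    """Pick centres from the decision graph, then label points by nearest centre.

    dist: (Nele, Nele) distance matrix; rho, delta: arrays of shape (Nele,).
    Returns (centres, labels); labels[i] is an index into centres.
    """
    rho = np.asarray(rho)
    delta = np.asarray(delta)
    # Decision-graph outliers: points with both high density and large delta.
    centres = np.flatnonzero((rho > rhomin) & (delta > deltamin))
    if centres.size == 0:
        raise ValueError("no point satisfies rho > rhomin and delta > deltamin")
    # Each point joins the closest selected centre.
    labels = np.argmin(dist[:, centres], axis=1)
    return centres, labels


# Two groups of three 1-D points; rho and delta are hand-computed for d_cut = 0.15.
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
dist = np.abs(points - points.T)
rho = [1, 2, 1, 1, 2, 1]
delta = [0.1, 5.1, 0.1, 0.1, 5.1, 0.1]
centres, labels = select_and_assign(dist, rho, delta, rhomin=1.5, deltamin=1.0)
# centres -> [1, 4], labels -> [0, 0, 0, 1, 1, 1]
```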
estimate_delta(dist, rho)
For each data point i, compute the distance delta_i between i and j, where j is the closest data point with a density higher than that of i, i.e. rho(j) > rho(i).
- Parameters
dist – distance matrix of shape (Nele, Nele).
rho – log densities for each data point, array of shape (Nele,).
- Returns
delta – numpy array giving, for each data point, the distance to its nearest point of higher density; nneight – numpy array giving the index of that point.
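A standalone sketch of this computation is shown below. `delta_and_neighbour` is a hypothetical name, and this is not asaplib's implementation. For the point of highest density, which has no denser neighbour, it uses the convention from Rodriguez and Laio (2014): delta_i is the largest distance from i, and here the neighbour index is set to i itself. Because the logarithm is monotonic, passing log densities gives the same result as raw densities.

```python
import numpy as np


def delta_and_neighbour(dist: np.ndarray, rho: np.ndarray):
    """delta_i = min distance from i to any point j with rho_j > rho_i.

    Returns (delta, nneigh), where nneigh[i] is the index of that nearest
    higher-density point (or i itself for a density maximum).
    """
    rho = np.asarray(rho)
    n = len(rho)
    delta = np.empty(n)
    nneigh = np.empty(n, dtype=int)
    for i in range(n):
        denser = np.flatnonzero(rho > rho[i])  # candidates j with rho_j > rho_i
        if denser.size:
            k = denser[np.argmin(dist[i, denser])]
            delta[i], nneigh[i] = dist[i, k], k
        else:
            # No denser point: use the largest distance by convention.
            delta[i], nneigh[i] = dist[i].max(), i
    return delta, nneigh


points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
dist = np.abs(points - points.T)
delta, nneigh = delta_and_neighbour(dist, np.array([1, 2, 1, 1, 2, 1]))
# nneigh -> [1, 1, 1, 4, 4, 4]; delta is about [0.1, 5.1, 0.1, 0.1, 5.1, 0.1]
```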
class asaplib.cluster.ml_cluster_fit.sklearn_DB(eps=None, min_samples=None, metrictype='precomputed')
Bases: asaplib.cluster.ml_cluster_base.FitClusterBase
DBSCAN clustering; see https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
- eps : float, optional
The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your dataset and distance function.
- min_samples : int, optional
The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself.
- metric : string, or callable
The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix and must be square. X may be a sparse matrix, in which case only “nonzero” elements may be considered neighbors for DBSCAN.
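The sketch below shows the underlying scikit-learn call with a precomputed distance matrix, which is the default metric of sklearn_DB. It calls `sklearn.cluster.DBSCAN` directly rather than going through asaplib. How sklearn_DB forwards `metrictype` to DBSCAN's `metric` argument is an assumption.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of 1-D points plus one isolated point.
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [9.0]])
dist = np.abs(points - points.T)  # square (Nele, Nele) distance matrix

# With metric="precomputed", fit() interprets its input as pairwise distances.
db = DBSCAN(eps=0.15, min_samples=2, metric="precomputed").fit(dist)
labels = db.labels_  # noise points are labelled -1
# labels -> [0, 0, 0, 1, 1, 1, -1]
```

Here the isolated point at 9.0 has only itself within `eps`, so it falls short of `min_samples=2` and is reported as noise (-1). This mirrors how halo points are marked in LAIO_DB.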
asaplib.cluster.ml_cluster_tools module
Tools to analyze clustering results
asaplib.cluster.ml_cluster_tools.array_handling(plist, attribute='mean')
Available attributes: mean, sum, min, max, mode, all.
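The docstring only lists the accepted attributes. A plausible reading is that array_handling reduces a list of values with the named statistic. Under that reading, 'mode' returns the most frequent value and 'all' returns the values unreduced. The sketch below illustrates this under stated assumptions. `reduce_values` is a hypothetical name, and its behaviour is not taken from asaplib.

```python
from collections import Counter

import numpy as np


def reduce_values(plist, attribute="mean"):
    """Hypothetical sketch: reduce a list of values by a named statistic."""
    values = np.asarray(plist)
    if attribute == "mean":
        return values.mean()
    if attribute == "sum":
        return values.sum()
    if attribute == "min":
        return values.min()
    if attribute == "max":
        return values.max()
    if attribute == "mode":
        # Most common value; on ties, the value encountered first wins.
        return Counter(values.tolist()).most_common(1)[0][0]
    if attribute == "all":
        return values  # no reduction
    raise ValueError(f"unknown attribute: {attribute!r}")


energies = [1, 2, 2, 3]
mean_energy = reduce_values(energies)          # 2.0
most_common = reduce_values(energies, "mode")  # 2
```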
asaplib.cluster.ml_cluster_tools.get_cluster_properties(labels, properties, attribute='mean')