asaplib.cluster package¶

Submodules¶

asaplib.cluster.ml_cluster_base module¶

Base classes for clustering algorithms.

class asaplib.cluster.ml_cluster_base.ClusterBase[source]¶

Bases: sklearn.base.ClusterMixin

Data structure to perform clustering and store data associated with the clustering output.

fit(X, y=None)[source]¶

Parameters

X –
y –

get_cluster_labels(index=[])[source]¶

Parameters: index –

get_name()[source]¶

get_params(deep=True)[source]¶

Parameters: deep –

class asaplib.cluster.ml_cluster_base.FitClusterBase[source]¶

Bases: object

fit(dmatrix, rho=None)[source]¶

Parameters

dmatrix –
rho –

asaplib.cluster.ml_cluster_fit module¶

Density-based Clustering Algorithms

class asaplib.cluster.ml_cluster_fit.DBCluster(trainer)[source]¶

Bases: asaplib.cluster.ml_cluster_base.ClusterBase

Performs clustering using a density-based clustering algorithm.

fit(dmatrix, rho=None)[source]¶: Fit the clustering model. The input is assumed to be either an NxN distance matrix or an Nxm coordinate array.

get_cluster_labels(index=[])[source]¶: Return the labels of the samples whose indices are listed in index.

get_n_cluster()[source]¶

get_n_noise()[source]¶
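As a rough illustration of what get_n_cluster() and get_n_noise() report, the sketch below counts clusters and noise points in a plain list of labels. It assumes that noise (halo) points carry the label -1, which is the convention stated for LAIO_DB on this page and used by DBSCAN; the helper name count_clusters_and_noise is hypothetical and is not part of asaplib.

```python
def count_clusters_and_noise(labels, noise_label=-1):
    """Return (n_clusters, n_noise) for a sequence of integer cluster labels.

    Assumption: noise/halo points are marked with `noise_label` (-1 by default).
    """
    n_noise = sum(1 for lab in labels if lab == noise_label)
    # Distinct labels, excluding the noise label, are the clusters.
    n_clusters = len({lab for lab in labels if lab != noise_label})
    return n_clusters, n_noise


labels = [0, 0, 1, -1, 1, 2, -1]
n_clusters, n_noise = count_clusters_and_noise(labels)
print(n_clusters, n_noise)  # prints: 3 2
```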

pack()[source]¶: return all the info

save_state(filename, mode='json')[source]¶

class asaplib.cluster.ml_cluster_fit.LAIO_DB(distances=None, indices=None, dens_type=None, dc=None, percent=2.0)[source]¶

Bases: asaplib.cluster.ml_cluster_base.FitClusterBase

Clustering by fast search and find of density peaks, Rodriguez and Laio (2014).

https://science.sciencemag.org/content/sci/344/6191/1492.full.pdf

math: \rho_i = \sum_j \chi(d_{ij} - d_{cut})

math: \delta_i = \min_{j: \rho_j > \rho_i}(d_{ij})
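The local density in the first formula can be sketched in a few lines of plain Python. This is an illustration only, not the LAIO_DB implementation: it assumes chi is the step function of the Rodriguez and Laio paper (chi(x) = 1 if x < 0, else 0), uses Euclidean distances, and excludes the point itself from its own count. The function name local_density is hypothetical.

```python
import math


def local_density(points, d_cut):
    """Count, for each point, the other points closer than d_cut.

    Implements rho_i = sum_j chi(d_ij - d_cut) with chi(x) = 1 for x < 0.
    Assumption: the self-term j == i is skipped.
    """
    n = len(points)
    rho = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) < d_cut:
                rho[i] += 1
    return rho


# Three points packed together and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
rho = local_density(points, d_cut=0.5)
print(rho)  # prints: [2, 2, 2, 0]
```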

fit(data, rho=None)[source]¶

Compute the center labels.

Parameters

data (numpy array of shape (Nele, proj_dim)) – proj_dim is the number of components the kernel matrix has been projected to.
rho (densities) – default is None, since the DP.py module computes the densities itself.

Returns

cluster_labels – numpy array of shape (Nele,) giving the cluster (int from 0 to N) each data point belongs to. Halo points are designated as -1.

get_assignation(data)[source]¶

Parameters

data (numpy array of shape (Nele, proj_dim)) – proj_dim gives the number of dimensions the kernel matrix is projected to.

Returns

self.halo – numpy array of shape (Nele,) where halo points are designated as -1 and all other points are assigned to their respective cluster centres.

get_dc(data)[source]¶

Compute the cutoff distance given the data.

Parameters

data (np.matrix) – The data array of shape (Nele, proj_dim), where Nele is the number of data points and proj_dim is the number of projected components of the kernel matrix.

Returns

self.dc – the cutoff distance

Return type

float
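How get_dc derives the cutoff is not spelled out here. One common heuristic, following the original density-peaks paper, is to sort all pairwise distances and take the value found at the given percentage of the sorted list, which matches the role the percent constructor argument appears to play. The sketch below shows that heuristic under those assumptions; estimate_cutoff is a hypothetical name and the code may differ from what asaplib actually does.

```python
import itertools
import math


def estimate_cutoff(points, percent=2.0):
    """Pick d_cut as the distance at `percent` % of the sorted pairwise distances.

    Assumption: this mirrors the heuristic of Rodriguez and Laio (2014),
    not necessarily the asaplib implementation.
    """
    dists = sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))
    if not dists:
        raise ValueError("need at least two points")
    # Clamp the position so that percent=100 still indexes a valid entry.
    position = min(len(dists) - 1, int(len(dists) * percent / 100.0))
    return dists[position]


# Eleven equally spaced points on a line (spacing 1.0), giving 55 pairs.
points = [(float(i), 0.0) for i in range(11)]
dc = estimate_cutoff(points, percent=20.0)
print(dc)  # prints: 2.0
```

A small percent keeps the cutoff near the shortest distances in the data, so each point counts only a few neighbours toward its density.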

get_decision_graph(data, fplot=True)[source]¶

Method currently doesn’t produce the decision graph.

Parameters

data (numpy array of shape (Nele, proj_dim)) –
fplot (Boolean indicating whether or not to plot the decision graph) –

pack()[source]¶: Return a dictionary containing self.dc, the cutoff distance within which data points contribute to the local density of another data point, and self.dens_cut, the density threshold for defining a cluster.

class asaplib.cluster.ml_cluster_fit.old_LAIO(deltamin=-1, rhomin=-1)[source]¶

Bases: asaplib.cluster.ml_cluster_base.FitClusterBase

Laio Clustering scheme

Clustering by fast search and find of density peaks https://science.sciencemag.org/content/sci/344/6191/1492.full.pdf

math: \rho_i = \sum_j \chi(d_{ij} - d_{cut}), i.e. the local density of data point x_i
math: \delta_i = \min_{j: \rho_j > \rho_i}(d_{ij}), i.e. the minimum distance to a neighbour with higher density

A summary of the Laio clustering algorithm:

1. Perform a kernel density estimation (rho_i) for each data point i.
2. For each data point i, compute the distance (delta_i) between i and j, where j is the closest data point that has a density higher than i, i.e. rho(j) > rho(i).
3. Plot the decision graph, which is a scatter plot of (rho_i, delta_i).
4. Select the cluster centers ({cl}), which are the outliers in the decision graph that fulfil: i) rho({cl}) > rhomin, ii) delta({cl}) > deltamin. One needs to set the two parameters rhomin and deltamin.
5. After the cluster centers are determined, assign each data point to the nearest cluster center.
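The last two steps of the summary above (thresholding the decision graph, then assigning points to the nearest center) can be sketched as follows. This is a minimal illustration that takes the description at face value; the function names are hypothetical, the rho and delta values are hand-set toy numbers, and the actual old_LAIO code may differ in its details.

```python
def select_centers(rho, delta, rhomin, deltamin):
    """Indices of points whose density and delta both exceed the thresholds."""
    return [i for i, (r, d) in enumerate(zip(rho, delta))
            if r > rhomin and d > deltamin]


def assign_to_nearest_center(dmatrix, centers):
    """Label each point with the position (0, 1, ...) of its nearest center."""
    if not centers:
        raise ValueError("no cluster centers selected")
    labels = []
    for row in dmatrix:
        nearest = min(range(len(centers)), key=lambda k: row[centers[k]])
        labels.append(nearest)
    return labels


# Toy 1D data: two groups of three points each.
x = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
dmatrix = [[abs(a - b) for b in x] for a in x]
rho = [1, 2, 1, 1, 2, 1]                  # hand-set toy densities
delta = [0.2, 5.0, 0.2, 0.2, 5.0, 0.2]    # hand-set toy delta values

centers = select_centers(rho, delta, rhomin=1.5, deltamin=1.0)
labels = assign_to_nearest_center(dmatrix, centers)
print(centers, labels)  # prints: [1, 4] [0, 0, 0, 1, 1, 1]
```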

estimate_delta(dist, rho)[source]¶

For each data point i, compute the distance (delta_i) between i and j, where j is the closest data point that has a density higher than i, i.e. rho(j) > rho(i).

Parameters

dist (distance matrix of shape (Nele, Nele)) –
rho (array of shape (Nele,)) – log densities for each data point

Returns

delta – numpy array of distances to nearest cluster centre for each datapoint.
nneight – numpy array giving the index of the nearest cluster centre.
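The computation described for estimate_delta can be sketched in plain Python as below. The treatment of the highest-density point is an assumption: it has no denser neighbour, so this sketch gives it the largest distance in its row (the convention of the original paper) and a neighbour index of -1. The name estimate_delta_sketch is hypothetical and marks this as an illustration, not the library code.

```python
def estimate_delta_sketch(dist, rho):
    """For each point, distance to and index of the nearest denser point.

    Assumption: a point with no denser neighbour gets delta = max of its
    distance row and neighbour index -1.
    """
    n = len(rho)
    delta = [0.0] * n
    nneigh = [-1] * n
    for i in range(n):
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if higher:
            j_min = min(higher, key=lambda j: dist[i][j])
            delta[i], nneigh[i] = dist[i][j_min], j_min
        else:
            delta[i] = max(dist[i])
    return delta, nneigh


x = [0.0, 1.0, 3.0]
dist = [[abs(a - b) for b in x] for a in x]
rho = [1.0, 3.0, 2.0]
delta, nneigh = estimate_delta_sketch(dist, rho)
print(delta, nneigh)  # prints: [1.0, 2.0, 2.0] [1, -1, 1]
```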

fit(dmatrix, rho=None)[source]¶

Parameters

dmatrix (array of shape (Nele, Nele)) – The distance matrix
rho (array of shape (Nele,)) – The log densities of the points

pack()[source]¶: return all the info

class asaplib.cluster.ml_cluster_fit.sklearn_DB(eps=None, min_samples=None, metrictype='precomputed')[source]¶

Bases: asaplib.cluster.ml_cluster_base.FitClusterBase

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

eps (float, optional) – The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your dataset and distance function.
min_samples (int, optional) – The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself.
metric (string or callable) – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix and must be square. X may be a sparse matrix, in which case only “nonzero” elements may be considered neighbors for DBSCAN.
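To make the roles of eps and min_samples concrete, the sketch below marks core points directly from a precomputed distance matrix: a point is a core point when at least min_samples samples, itself included, lie within eps of it. This only illustrates the definition given above; it is not the DBSCAN algorithm, and core_point_mask is a hypothetical helper rather than an asaplib or scikit-learn function.

```python
def core_point_mask(dmatrix, eps, min_samples):
    """True for each row with at least min_samples entries within eps.

    The diagonal entry (distance 0 to itself) is counted, so the point
    itself contributes to its own neighbourhood.
    """
    return [sum(1 for d in row if d <= eps) >= min_samples for row in dmatrix]


# Toy 1D data turned into a square distance matrix.
x = [0.0, 0.1, 0.2, 3.0]
dmatrix = [[abs(a - b) for b in x] for a in x]

print(core_point_mask(dmatrix, eps=0.15, min_samples=2))
# prints: [True, True, True, False]
print(core_point_mask(dmatrix, eps=0.15, min_samples=3))
# prints: [False, True, False, False]
```

Raising min_samples or lowering eps both make the core-point criterion stricter, which tends to produce more noise points.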

fit(dmatrix, rho=None)[source]¶

Parameters

dmatrix –
rho –

pack()[source]¶: return all the info

asaplib.cluster.ml_cluster_tools module¶

Tools to analyze clustering results

asaplib.cluster.ml_cluster_tools.array_handling(plist, attribute='mean')[source]¶: available attributes: mean, sum, min, max, mode, all
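The attribute names listed for array_handling suggest a small dispatch over aggregation functions. The sketch below shows one way such a dispatch could look using only the standard library. It is an assumption-laden illustration: the behaviour of 'all' (returning the values unchanged) and the error raised for unknown attributes are guesses, and the name aggregate is hypothetical.

```python
import statistics


def aggregate(values, attribute="mean"):
    """Reduce a list of numbers according to the named attribute.

    Assumption: 'all' returns a copy of the input list.
    """
    handlers = {
        "mean": statistics.fmean,
        "sum": sum,
        "min": min,
        "max": max,
        "mode": statistics.mode,
        "all": list,
    }
    if attribute not in handlers:
        raise ValueError(f"unknown attribute: {attribute!r}")
    return handlers[attribute](values)


values = [1, 2, 2, 5]
print(aggregate(values, "mean"))  # prints: 2.5
print(aggregate(values, "mode"))  # prints: 2
```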

asaplib.cluster.ml_cluster_tools.get_cluster_properties(labels, properties, attribute='mean')[source]¶

asaplib.cluster.ml_cluster_tools.get_cluster_size(labels)[source]¶
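The return type of get_cluster_size is not documented here. As a hedged illustration of the idea, cluster sizes can be tallied from a label list with collections.Counter; the dictionary return format and the name cluster_sizes below are assumptions.

```python
from collections import Counter


def cluster_sizes(labels):
    """Map each label to the number of samples carrying it."""
    return dict(Counter(labels))


sizes = cluster_sizes([0, 0, 1, -1, 1, 0])
print(sizes[0], sizes[1], sizes[-1])  # prints: 3 2 1
```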

asaplib.cluster.ml_cluster_tools.get_cluster_weighted_avg_properties(labels, properties, weights)[source]¶
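A per-cluster weighted average of a scalar property could be computed along the following lines. This sketch assumes labels, properties, and weights are equal-length sequences and that every cluster has a non-zero total weight; the name cluster_weighted_average and the dictionary return format are hypothetical and may not match asaplib.

```python
from collections import defaultdict


def cluster_weighted_average(labels, properties, weights):
    """Return {label: sum(w * p) / sum(w)} over the samples of each cluster.

    Assumption: each cluster's weights sum to a non-zero value.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for lab, prop, w in zip(labels, properties, weights):
        weighted_sum[lab] += w * prop
        total_weight[lab] += w
    return {lab: weighted_sum[lab] / total_weight[lab] for lab in weighted_sum}


averages = cluster_weighted_average(
    labels=[0, 0, 1],
    properties=[1.0, 3.0, 10.0],
    weights=[1.0, 3.0, 2.0],
)
print(averages)  # prints: {0: 2.5, 1: 10.0}
```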

asaplib.cluster.ml_cluster_tools.most_frequent(List)[source]¶

asaplib.cluster.ml_cluster_tools.output_cluster(prefix, labels, dicttags, tags)[source]¶

asaplib.cluster.ml_cluster_tools.output_cluster_sort(prefix, labels, dicttags, tags)[source]¶
