asaplib.cluster package

Submodules

asaplib.cluster.ml_cluster_base module

Base classes for clustering algorithms.

class asaplib.cluster.ml_cluster_base.ClusterBase[source]

Bases: sklearn.base.ClusterMixin

Data structure to perform clustering and store data associated with the clustering output.

fit(X, y=None)[source]
Parameters
  • X

  • y

get_cluster_labels(index=[])[source]
Parameters

index

get_name()[source]
get_params(deep=True)[source]
Parameters

deep

class asaplib.cluster.ml_cluster_base.FitClusterBase[source]

Bases: object

fit(dmatrix, rho=None)[source]
Parameters
  • dmatrix

  • rho

asaplib.cluster.ml_cluster_fit module

Density-based Clustering Algorithms

class asaplib.cluster.ml_cluster_fit.DBCluster(trainer)[source]

Bases: asaplib.cluster.ml_cluster_base.ClusterBase

Performs clustering using a density-based clustering algorithm.

fit(dmatrix, rho=None)[source]

Fit the clustering model. The input is assumed to be either an N x N distance matrix or N x m coordinates.

get_cluster_labels(index=[])[source]

Return the labels of the samples whose indices are listed in index.

get_n_cluster()[source]
get_n_noise()[source]
pack()[source]

Return all the stored information about the clustering.

save_state(filename, mode='json')[source]
class asaplib.cluster.ml_cluster_fit.LAIO_DB(distances=None, indices=None, dens_type=None, dc=None, percent=2.0)[source]

Bases: asaplib.cluster.ml_cluster_base.FitClusterBase

Clustering by fast search and find of density peaks, Rodriguez and Laio (2014).

https://science.sciencemag.org/content/sci/344/6191/1492.full.pdf

\rho_i = \sum_j \chi(d_{ij} - d_{cut})

\delta_i = \min_{j: \rho_j > \rho_i} (d_{ij})
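The two quantities defined above can be computed directly from a square distance matrix. The following is a minimal NumPy sketch, not the asaplib implementation: the function name `density_and_delta` is hypothetical, chi(x) is taken to be 1 for x < 0 and 0 otherwise, and the value assigned to delta for the densest point (its largest distance to any other point) is an assumed convention.

```python
import numpy as np

def density_and_delta(dist, d_cut):
    """Return (rho, delta) for a square distance matrix `dist` (hypothetical helper)."""
    n = dist.shape[0]
    # rho_i = sum_j chi(d_ij - d_cut), with chi(x) = 1 if x < 0 else 0.
    # The sum runs over all j, so each point counts itself (d_ii = 0);
    # this shifts every rho_i by one and does not change their ordering.
    rho = (dist < d_cut).sum(axis=1)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]  # points with strictly higher density
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            # Assumed convention for the densest point(s): no higher-density
            # neighbour exists, so use the largest distance in row i.
            delta[i] = dist[i].max()
    return rho, delta

# Five points on a line: a dense group near 0 and a pair near 5.
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1])
dist = np.abs(pts[:, None] - pts[None, :])
rho, delta = density_and_delta(dist, d_cut=0.15)
```

Points with both large rho and large delta are the candidates for cluster centers.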

fit(data, rho=None)[source]

Compute the center labels.

Parameters
  • data – numpy array of shape (Nele, proj_dim), where proj_dim is the number of components the kernel matrix has been projected to.

  • rho – densities; default is None since the DP.py module computes the densities itself.

Returns

cluster_labels – numpy array of shape (Nele,) giving the cluster (int from 0 to N) each data point belongs to. Halo points are designated as -1.

get_assignation(data)[source]
Parameters
  • data – numpy array of shape (Nele, proj_dim), where proj_dim gives the number of dimensions the kernel matrix is projected to.

Returns

self.halo – numpy array of shape (Nele,) where halo points are designated as -1 and all other points are assigned to their respective cluster centres.

get_dc(data)[source]

Compute the cutoff distance given the data.

Parameters
  • data (np.matrix) – The data array of shape (Nele, proj_dim), where Nele is the number of data points and proj_dim is the number of projected components of the kernel matrix.

Returns

self.dc – the cutoff distance

Return type

float
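This page does not say how the cutoff is derived from the data. One common heuristic for density-peaks clustering picks the cutoff as a low percentile of all pairwise distances. The sketch below assumes that reading of the percent argument and Euclidean distances; it is not taken from the asaplib source, and `estimate_dc` is a hypothetical name.

```python
import numpy as np

def estimate_dc(data, percent=2.0):
    """Cutoff distance as the `percent`-th percentile of pairwise distances.

    Assumption: Euclidean distances between the rows of `data`.
    """
    data = np.asarray(data, dtype=float)
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair once (strict upper triangle, no diagonal).
    iu = np.triu_indices(len(data), k=1)
    return float(np.percentile(dist[iu], percent))

# Three points on a line; the pairwise distances are 1, 2 and 3.
data = np.array([[0.0], [1.0], [3.0]])
dc_median = estimate_dc(data, percent=50)
```

With a small percent, only a few percent of all pairs fall within the cutoff, so each point has a correspondingly small neighbourhood.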

get_decision_graph(data, fplot=True)[source]

Method currently doesn’t produce the decision graph.

Parameters
  • data – numpy array of shape (Nele, proj_dim).

  • fplot – Boolean indicating whether or not to plot the decision graph.

pack()[source]

Return a dictionary containing self.dc, the cutoff distance within which data points contribute to the local density of another data point, and self.dens_cut, the density threshold for defining a cluster.

class asaplib.cluster.ml_cluster_fit.old_LAIO(deltamin=-1, rhomin=-1)[source]

Bases: asaplib.cluster.ml_cluster_base.FitClusterBase

Laio Clustering scheme

Clustering by fast search and find of density peaks https://science.sciencemag.org/content/sci/344/6191/1492.full.pdf

\rho_i = \sum_j \chi(d_{ij} - d_{cut}), i.e. the local density of data point x_i

\delta_i = \min_{j: \rho_j > \rho_i} (d_{ij}), i.e. the minimum distance to a neighbour with higher density

A summary of the Laio clustering algorithm:

  1. Perform a kernel density estimation (rho_i) for each data point i.

  2. For each data point i, compute the distance (delta_i) between i and j, where j is the closest data point that has a density higher than i, i.e. rho(j) > rho(i).

  3. Plot the decision graph, which is a scatter plot of (rho_i, delta_i).

  4. Select the cluster centers ({cl}), which are the outliers in the decision graph that fulfil: i) rho({cl}) > rhomin and ii) delta({cl}) > deltamin.

  5. After the cluster centers are determined, assign each data point to the nearest cluster center.

One needs to set two parameters: rhomin and deltamin.
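The center-selection and assignment steps in the summary above can be sketched as follows. This is an illustration under stated assumptions rather than the old_LAIO code: the function name `select_centers_and_assign` is hypothetical, rho and delta are taken as already computed, and labels are the positions of the centers in the returned array.

```python
import numpy as np

def select_centers_and_assign(dist, rho, delta, rhomin, deltamin):
    """Pick centers by thresholding (rho, delta), then assign every point
    to its nearest center. Hypothetical helper for illustration only."""
    rho = np.asarray(rho)
    delta = np.asarray(delta)
    # Centers are the outliers of the decision graph: high density AND
    # far from any point of higher density.
    centers = np.flatnonzero((rho > rhomin) & (delta > deltamin))
    if centers.size == 0:
        raise ValueError("no point passes both thresholds")
    # Label k means "closest to centers[k]".
    labels = np.argmin(dist[:, centers], axis=1)
    return centers, labels

# Two groups of three points on a line.
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
dist = np.abs(pts[:, None] - pts[None, :])
rho = np.array([2, 3, 2, 2, 3, 2])                  # example densities
delta = np.array([0.1, 5.1, 0.1, 0.1, 5.1, 0.1])    # example delta values
centers, labels = select_centers_and_assign(dist, rho, delta,
                                            rhomin=2, deltamin=1.0)
```

Raising rhomin or deltamin yields fewer centers, so in practice both thresholds are usually read off the decision graph.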

estimate_delta(dist, rho)[source]

For each data point i, compute the distance (delta_i) between i and j, where j is the closest data point that has a density higher than i, i.e. rho(j) > rho(i).

Parameters
  • dist – distance matrix of shape (Nele, Nele).

  • rho – log densities for each data point, array of shape (Nele,).

Returns

delta – numpy array of distances to nearest cluster centre for each datapoint.

nneight – numpy array giving the index of the nearest cluster centre.

fit(dmatrix, rho=None)[source]
Parameters
  • dmatrix – The distance matrix of shape (Nele, Nele).

  • rho – The log densities of the points, of shape (Nele,).

pack()[source]

return all the info

class asaplib.cluster.ml_cluster_fit.sklearn_DB(eps=None, min_samples=None, metrictype='precomputed')[source]

Bases: asaplib.cluster.ml_cluster_base.FitClusterBase

See https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

Parameters
  • eps (float, optional) – The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your dataset and distance function.

  • min_samples (int, optional) – The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself.

  • metric (string, or callable) – The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by sklearn.metrics.pairwise_distances for its metric parameter. If metric is “precomputed”, X is assumed to be a distance matrix and must be square. X may be a sparse matrix, in which case only “nonzero” elements may be considered neighbors for DBSCAN.
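To illustrate how these parameters behave with a precomputed distance matrix, here is a short sketch that calls scikit-learn's DBSCAN directly rather than the sklearn_DB class, whose internals are not shown on this page. The toy data and the eps and min_samples values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups on a line plus one isolated point.
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [20.0]])
dmatrix = np.abs(points - points.T)  # square (7, 7) distance matrix

# metric="precomputed" tells DBSCAN that the input is already a distance matrix.
model = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit(dmatrix)
labels = model.labels_  # -1 marks noise points

n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int((labels == -1).sum())
```

Here the isolated point has no neighbour within eps, so it is labelled -1 and counted as noise rather than forming its own cluster.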

fit(dmatrix, rho=None)[source]
Parameters
  • dmatrix

  • rho

pack()[source]

return all the info

asaplib.cluster.ml_cluster_tools module

Tools to analyze clustering results

asaplib.cluster.ml_cluster_tools.array_handling(plist, attribute='mean')[source]

Available attributes: mean, sum, min, max, mode, all.
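The listed attributes read as reductions applied to the property values of a group of samples. The sketch below shows one plausible interpretation applied per cluster label. Both function names (`reduce_values`, `per_cluster`) are hypothetical, and the behaviour of "mode" (most frequent value) and "all" (return the values unchanged) is an assumption, not something confirmed by the asaplib source.

```python
from collections import Counter

import numpy as np

def reduce_values(values, attribute="mean"):
    """Reduce a list of values according to `attribute` (hypothetical helper)."""
    values = list(values)
    if attribute == "mean":
        return float(np.mean(values))
    if attribute == "sum":
        return float(np.sum(values))
    if attribute == "min":
        return float(np.min(values))
    if attribute == "max":
        return float(np.max(values))
    if attribute == "mode":
        # Assumed meaning: the most frequent value.
        return Counter(values).most_common(1)[0][0]
    if attribute == "all":
        # Assumed meaning: no reduction, return every value.
        return values
    raise ValueError(f"unknown attribute: {attribute!r}")

def per_cluster(labels, properties, attribute="mean"):
    """Group `properties` by cluster label and reduce each group."""
    result = {}
    for label in sorted(set(labels)):
        members = [p for lab, p in zip(labels, properties) if lab == label]
        result[label] = reduce_values(members, attribute)
    return result

labels = [0, 0, 1, 1, 1]
properties = [1.0, 3.0, 2.0, 2.0, 5.0]
means = per_cluster(labels, properties, "mean")
```

Swapping the attribute string changes only the reduction step, while the grouping by label stays the same.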

asaplib.cluster.ml_cluster_tools.get_cluster_properties(labels, properties, attribute='mean')[source]
asaplib.cluster.ml_cluster_tools.get_cluster_size(labels)[source]
asaplib.cluster.ml_cluster_tools.get_cluster_weighted_avg_properties(labels, properties, weights)[source]
asaplib.cluster.ml_cluster_tools.most_frequent(List)[source]
asaplib.cluster.ml_cluster_tools.output_cluster(prefix, labels, dicttags, tags)[source]
asaplib.cluster.ml_cluster_tools.output_cluster_sort(prefix, labels, dicttags, tags)[source]

Module contents