How-to: asap cluster

The asap cluster sub-command clusters the data. Clustering can use either the high-dimensional design matrix generated by asap gen_desc or the low-dimensional projections of that design matrix generated by asap map.

Overview of sub-commands

sub-command	description
dbscan	Density-based spatial clustering of applications with noise…
fdb	Clustering by fast search and find of density peaks (FDB)
plot_pca	Plot the clustering results using a PCA map.

Note

plot_pca does not perform the clustering itself; it only plots existing clustering results on a PCA map. Use it after a clustering command, e.g. asap cluster -f some.xyz -dm '[*]' fdb plot_pca

asap cluster

Clustering using the design matrix. This command is evaluated before the specific sub-commands; it handles the general setup, such as reading the input files.

asap cluster [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
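
The usage line above allows several sub-commands to be chained after the general options. As a minimal sketch of that pattern, the following Python snippet assembles the argument list for one such invocation without running it. The helper name build_cluster_command is hypothetical and not part of ASAP; the flags and sub-command names are the ones listed on this page.

```python
import shlex

def build_cluster_command(fxyz, design_matrix, subcommands):
    """Assemble 'asap cluster' arguments: general options first, then chained sub-commands.

    Hypothetical helper for illustration only; it is not part of ASAP.
    """
    args = ["asap", "cluster", "-f", fxyz, "-dm", design_matrix]
    for sub in subcommands:
        # Each sub-command is a list: its name followed by its own options.
        args.extend(sub)
    return args

# Cluster with fdb, then plot the result with plot_pca.
args = build_cluster_command("some.xyz", "[*]", [["fdb"], ["plot_pca"]])

# shlex.join quotes '[*]' so the shell does not expand it.
command_line = shlex.join(args)
print(command_line)
```

The list in args could be passed to subprocess.run on a machine where asap is installed; passing a list rather than a string avoids shell expansion of the '[*]' pattern.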

Options

-f, --fxyz <fxyz>: Input file that contains XYZ coordinates. See a list of possible input formats: https://wiki.fysik.dtu.dk/ase/ase/io/io.html If a wildcard * is used, all files matching the pattern are read.

-p, --prefix <prefix>: Prefix to be used for the output file.

--only_use_species <only_use_species>: Only use the atomic descriptors of species with the specified atomic number. Only makes sense when --use_atomic_descriptors is also set.

-ua, --use_atomic_descriptors, --use_atomic: Use atomic descriptors instead of global ones.

-dm, --design_matrix <design_matrix>: Location of the descriptor matrix file, or the name of the tags in the ASE xyz file. The value is a list, e.g. '[dm1, dm2]', so that several design matrices can be combined.

-km, --kernel_matrix <kernel_matrix>: Location of a kernel matrix file.

--savetxt, --no-savetxt: Save the results to a txt file.

--savexyz, --no-savexyz: Save the results to an xyz file.

dbscan

Density-based spatial clustering of applications with noise (DBSCAN)

asap cluster dbscan [OPTIONS]

Options

--metric <metric>

Controls how distance is computed in the ambient space of the input data. See: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

Default: euclidean

-ms, --min_samples <min_samples>

The number of samples (or total weight) in a neighborhood for a point to be considered as a core point.

Default: 5

-e, --eps <eps>: The maximum distance between two samples for one to be considered as in the neighborhood of the other.
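
To make the roles of eps and min_samples concrete, here is a minimal pure-Python sketch of the DBSCAN idea with a Euclidean metric. It is an illustration under simplifying assumptions (brute-force neighbor search, function name dbscan chosen here), not the implementation that asap cluster dbscan calls.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples=5):
    """Return one label per point: a cluster index, or NOISE (-1)."""
    n = len(points)

    def neighbors(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels

# Two tight squares of four points each, plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

A larger eps merges nearby groups, while a larger min_samples turns sparse groups into noise; the isolated point is labelled -1 because its neighborhood contains only itself.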

fdb

Clustering by fast search and find of density peaks (FDB)

asap cluster fdb [OPTIONS]
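
For readers unfamiliar with density-peak clustering, the following pure-Python sketch shows the general idea: cluster centers are points with high local density that are also far from any denser point. The function name density_peaks and the parameters d_c (density cutoff) and n_clusters are assumptions made for this illustration; they are not options of asap cluster fdb, and ASAP's own implementation may differ.

```python
import math

def density_peaks(points, d_c, n_clusters):
    """Label points by the density-peaks idea (illustrative sketch only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n   # distance to the nearest denser point
    parent = [-1] * n   # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers have the largest rho * delta; the densest point always ranks first.
    centers = sorted(order, key=lambda i: rho[i] * delta[i], reverse=True)[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Every other point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

# Two squares, each with a point at its center.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = density_peaks(points, d_c=1.0, n_clusters=2)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this example the two central points have the highest density and are far from each other, so they become the two centers and the corner points attach to the nearer one.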

plot_pca

Plot the clustering results using a PCA map. Only use this command after fdb or dbscan.

asap cluster plot_pca [OPTIONS]

Options

-s, --style <style>

Style of the plot.

Default: default
Options: default|journal

-ar, --aspect_ratio <aspect_ratio>

Aspect ratio of the plot

Default: 2

-a, --annotate <annotate>: Location of tags to annotate the samples.

--adjusttext, --no-adjusttext: Adjust the annotation texts so they do not overlap.

--peratom: Save the per-atom projection.

--scale, --no-scale: Standard scaling of the coordinates.

-d, --dimension <dimension>: Number of dimensions to keep in the output XYZ file.

--axes <axes>: The projection axes along which to plot.

-p, --prefix <prefix>: Prefix to be used for the output file.
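
As an illustration of what the standard scaling named by the --scale option usually means, the sketch below centers each coordinate column to zero mean and divides it by its standard deviation. The function name standard_scale and the use of the population standard deviation are assumptions for this example; the page does not state which convention ASAP applies.

```python
from statistics import fmean, pstdev

def standard_scale(rows):
    """Scale each column of a row-major table to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation; constant columns are left unscaled.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Two columns on very different scales end up on the same footing.
scaled = standard_scale([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
for row in scaled:
    print(row)
```

Without scaling, the column with the larger numeric range would dominate any distance-based step such as PCA or clustering.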

Note