How-to: asap cluster

The asap cluster sub-command performs clustering of the data. Clustering can use either the high-dimensional design matrix generated by asap gen_desc or the low-dimensional projections of the design matrix generated by asap map.

Overview of sub-commands

option      description

dbscan      Density-based spatial clustering of applications with noise…

fdb         Clustering by fast search and find of density peaks (FDB)

plot_pca    Plot the clustering results using a PCA map.

Note

plot_pca does not perform the clustering task. It is only used to plot the clustering results using a PCA map, so it should follow a clustering command, e.g. asap cluster -f some.xyz -dm '[*]' fdb plot_pca

asap cluster

Clustering using the design matrix. This command is evaluated before the specific sub-commands; it handles the general setup, such as reading the input files.

asap cluster [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...
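The usage line above chains the general options with one or more sub-commands. As a rough illustration, the following sketch assembles such a command line in Python. The helper name build_cluster_command is hypothetical and not part of ASAP; only the flags and sub-command names come from this page.

```python
import shlex


def build_cluster_command(fxyz, design_matrix, subcommands):
    """Assemble an `asap cluster` argv (hypothetical helper, not part of ASAP).

    General options come first, followed by the chained sub-commands,
    each given as a (name, [args]) pair.
    """
    argv = ["asap", "cluster", "-f", fxyz, "-dm", design_matrix]
    for name, args in subcommands:
        argv.append(name)
        argv.extend(args)
    return argv


# Cluster with fdb, then plot the result with plot_pca.
argv = build_cluster_command("some.xyz", "[*]", [("fdb", []), ("plot_pca", [])])

# shlex.join quotes arguments such as [*] so the shell does not expand them.
command_line = shlex.join(argv)
print(command_line)
```

The resulting argv list could be passed to subprocess.run if asap is installed; the sketch only builds and prints the command.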

Options

-f, --fxyz <fxyz>

Input file that contains XYZ coordinates. See a list of possible input formats: https://wiki.fysik.dtu.dk/ase/ase/io/io.html If a wildcard * is used, all files matching the pattern are read.

-p, --prefix <prefix>

Prefix to be used for the output file.

--only_use_species <only_use_species>

Only use the atomic descriptors of species with the specified atomic number. Only makes sense if --use_atomic_descriptors is already set.

-ua, --use_atomic_descriptors, --use_atomic

Use atomic descriptors instead of global ones.

-dm, --design_matrix <design_matrix>

Location of the descriptor matrix file, or the name of the tags in the ASE xyz file. The type is a list, '[dm1, dm2]', because several design matrices can be combined at once.

-km, --kernel_matrix <kernel_matrix>

Location of a kernel matrix file

--savetxt, --no-savetxt

Save the results to the txt file

--savexyz, --no-savexyz

Save the results to the xyz file
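The -dm option above accepts a list of design matrices that are combined. One plausible reading, which is an assumption here and not confirmed by this page, is that the matrices are concatenated column-wise so that each sample keeps a single, longer descriptor row. A minimal sketch of that idea, with the hypothetical helper combine_design_matrices:

```python
def combine_design_matrices(matrices):
    """Concatenate several design matrices column-wise (one row per sample).

    Hypothetical helper: this illustrates one way to combine descriptors,
    not necessarily what ASAP does internally.
    """
    n_samples = len(matrices[0])
    for m in matrices:
        if len(m) != n_samples:
            raise ValueError("all design matrices must have the same number of rows")
    # For each sample, join its rows from every matrix into one long row.
    return [sum((list(m[i]) for m in matrices), []) for i in range(n_samples)]


dm1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 samples, 2 features
dm2 = [[1.0], [2.0], [3.0]]                 # 3 samples, 1 feature
combined = combine_design_matrices([dm1, dm2])
print(combined)
```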

dbscan

Density-based spatial clustering of applications with noise (DBSCAN)

asap cluster dbscan [OPTIONS]

Options

--metric <metric>

Controls how distance is computed in the ambient space of the input data. See: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

Default

euclidean

-ms, --min_samples <min_samples>

The number of samples (or total weight) in a neighborhood for a point to be considered as a core point.

Default

5

-e, --eps <eps>

The maximum distance between two samples for one to be considered as in the neighborhood of the other.
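To make the roles of --eps and --min_samples concrete, here is a minimal pure-Python sketch of the DBSCAN idea with a Euclidean metric. It is a teaching example under simplifying assumptions (brute-force neighbor search, small data), not the implementation that asap cluster dbscan runs.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is also a core point
                queue.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

With these toy points the two squares form two clusters and the isolated point is labeled as noise. A smaller eps or a larger min_samples would turn more points into noise.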

fdb

Clustering by fast search and find of density peaks (FDB)

asap cluster fdb [OPTIONS]
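No options are listed for fdb, so the sketch below only illustrates the general density-peaks idea: each point gets a local density rho and a distance delta to the nearest point of higher density, and cluster centers are points where both are large. The cutoff d_c, the function name, and the selection rule are assumptions made for this example; the ASAP implementation may differ.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) per point, following the density-peaks recipe."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbor gets its largest distance by convention.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),   # first group
          (10, 10), (10.5, 10.5), (11, 11)]             # second group
rho, delta = density_peaks(points, d_c=1.0)

# Rank candidates by rho * delta and keep the two strongest as centers.
centers = sorted(range(len(points)),
                 key=lambda i: rho[i] * delta[i], reverse=True)[:2]
print(rho, centers)
```

In the full algorithm, every remaining point is then assigned to the same cluster as its nearest neighbor of higher density.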

plot_pca

Plot the clustering results using a PCA map. Only use this command after fdb or dbscan.

asap cluster plot_pca [OPTIONS]

Options

-s, --style <style>

Style of the plot.

Default

default

Options

default|journal

-ar, --aspect_ratio <aspect_ratio>

Aspect ratio of the plot

Default

2

-a, --annotate <annotate>

Location of tags to annotate the samples.

--adjusttext, --no-adjusttext

Adjust the annotation texts so they do not overlap.

--peratom

Save the per-atom projection.

--scale, --no-scale

Standard scaling of the coordinates.

-d, --dimension <dimension>

Number of the dimensions to keep in the output XYZ file.

--axes <axes>

The projection axes along which to plot the projection.

-p, --prefix <prefix>

Prefix to be used for the output file.
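The --scale option above refers to standard scaling of the coordinates. Assuming this means the usual z-score transform (each column shifted to zero mean and divided by its standard deviation), a minimal sketch looks as follows. The function name and the use of the population standard deviation are assumptions for illustration.

```python
import statistics


def standard_scale(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


coords = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standard_scale(coords)
print(scaled)
```

After scaling, both columns carry the same values even though the raw columns differ by a factor of ten, which keeps one axis from dominating the plot.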

Note

More documentation to be added.