How-to: asap map
The asap map sub-command makes low-dimensional projections of the design matrix generated by the asap gen_desc command.
Overview of sub-commands
The following sub-commands control which algorithm is used for the dimensionality reduction:
| sub-command | description |
|---|---|
| pca | Principal Component Analysis |
| raw | Just plot the raw coordinates |
| skpca | Sparse Kernel Principal Component Analysis |
| tsne | t-SNE |
| umap | UMAP |
asap map
Make 2D maps using dimensionality reduction. This command is evaluated before the algorithm-specific sub-commands and takes care of the general setup, such as reading the input files.
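Part of this general setup is assembling the design matrix: -dm accepts a list such as '[dm1, dm2]', and --only_use_species restricts atomic descriptors to one species. The sketch below is a minimal pure-Python illustration, assuming (this page does not confirm it) that the listed matrices are concatenated column-wise and that each per-atom row comes with its atomic number. parse_tag_list, combine_design_matrices and filter_by_species are hypothetical helpers, not part of ASAP.

```python
from typing import Dict, List, Sequence, Set

Matrix = List[List[float]]


def parse_tag_list(text: str) -> List[str]:
    """Split a '[dm1, dm2]'-style string into tag names (hypothetical helper)."""
    return [t.strip() for t in text.strip().strip("[]").split(",") if t.strip()]


def combine_design_matrices(tags: Sequence[str], store: Dict[str, Matrix]) -> Matrix:
    """Concatenate the named N x D_k matrices side by side into one matrix.

    Assumption: "combining" means column-wise concatenation, one row per sample.
    """
    matrices = [store[tag] for tag in tags]
    n_rows = len(matrices[0])
    if any(len(m) != n_rows for m in matrices):
        raise ValueError("all design matrices must have the same number of rows")
    # Row i of the result is row i of the first matrix, then row i of the second, ...
    return [[v for m in matrices for v in m[i]] for i in range(n_rows)]


def filter_by_species(rows: Matrix, atomic_numbers: Sequence[int], species: Set[int]) -> Matrix:
    """Keep only the per-atom descriptor rows whose atomic number is in `species`."""
    return [row for row, z in zip(rows, atomic_numbers) if z in species]
```

For example, parse_tag_list("[dm1, dm2]") yields the two tag names that combine_design_matrices then looks up.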
asap map [OPTIONS] COMMAND [ARGS]...
Options
- -f, --fxyz <fxyz>: Input file that contains XYZ coordinates. If a wildcard * is used, all files matching the pattern are read. For the possible input formats, see: https://wiki.fysik.dtu.dk/ase/ase/io/io.html
- -p, --prefix <prefix>: Prefix to be used for the output file.
- --only_use_species <only_use_species>: Only use the atomic descriptors of the species with the specified atomic number. This only makes sense together with --use_atomic_descriptors.
- -ua, --use_atomic_descriptors, --use_atomic: Use atomic descriptors instead of global ones.
- -dm, --design_matrix <design_matrix>: Location of the descriptor matrix file, or the name of the tags in the ASE XYZ file. The value is a list such as '[dm1, dm2]', because several design matrices can be combined simultaneously.
- -s, --style <style>: Style of the plot.
  Default: default
  Options: default|journal
- -ar, --aspect_ratio <aspect_ratio>: Aspect ratio of the plot.
  Default: 2
- -a, --annotate <annotate>: Location of the tags used to annotate the samples.
- --adjusttext, --no-adjusttext: Adjust the annotation texts so they do not overlap.
- --peratom: Save the per-atom projection.
- -ep, --extra-properties <extra_properties>: Additional properties to be read for each frame, in CSV format.
- -o, --output <output>: Output file format.
  Options: xyz|matrix|none|chemiscope
- --keepraw, --no-keepraw: Keep the high-dimensional descriptor when writing the output XYZ file.
- -c, --color <color>: Location of a file, or the name of a property in the XYZ file, used to color the scatter plot of all samples (N floats).
- -ccol, --color_column <color_column>: The column number used in the color file, starting from 0.
- -clab, --color_label <color_label>: The label for the color bar.
- -c0, --color_from_zero: Set the minimum to zero and only plot the excess.
  Default: False
- -cmap, --colormap <colormap>: Colormap used. Common options: gnuplot, tab10, viridis, bwr, rainbow.
  Default: gnuplot
- -nbs, --normalized_by_size: Normalize the quantity used for the color function by the number of atoms in each frame.
  Default: False
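The color options above turn the color source into one number per sample before plotting. A minimal sketch of that preprocessing, assuming the color file has already been read into rows of numbers and that normalization by size is applied before the shift to zero (the order is an assumption, not documented here). color_values is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def color_values(color_rows: Sequence[Sequence[float]], color_column: int = 0,
                 n_atoms: Optional[Sequence[int]] = None,
                 from_zero: bool = False) -> List[float]:
    """Hypothetical helper mirroring -ccol, -nbs and -c0."""
    # -ccol: pick one column (0-based) from a multi-column color file.
    values = [float(row[color_column]) for row in color_rows]
    # -nbs: divide each frame's value by its number of atoms.
    if n_atoms is not None:
        values = [v / n for v, n in zip(values, n_atoms)]
    # -c0: shift so the minimum becomes zero, leaving only the excess.
    if from_zero:
        vmin = min(values)
        values = [v - vmin for v in values]
    return values
```

With per-frame energies -10, -18 and -30 for frames of 2, 3 and 5 atoms, -nbs gives -5, -6 and -6, and -c0 then maps them to 1, 0 and 0.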
pca
Principal Component Analysis
asap map pca [OPTIONS]
Options
- --scale, --no-scale: Standard scaling of the coordinates.
- -d, --dimension <dimension>: Number of dimensions to keep in the output XYZ file.
- --axes <axes>: The projection axes along which to plot the projection.
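To make concrete what the pca sub-command computes, here is a pure-Python sketch of textbook PCA: center the design matrix, form its covariance matrix, find the leading eigenvectors (here by power iteration with deflation), and project onto them; `dimension` plays the role of -d. This is an illustration only. ASAP presumably relies on a numerical library, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def center(x: Sequence[Sequence[float]]) -> Matrix:
    """Subtract the column means so every descriptor has zero mean."""
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [[v - m for v, m in zip(row, means)] for row in x]


def covariance(xc: Matrix) -> Matrix:
    """Sample covariance matrix (D x D) of already-centered data."""
    n, d = len(xc), len(xc[0])
    return [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(d)] for i in range(d)]


def leading_eigenvectors(c: Matrix, k: int, iters: int = 500, seed: int = 0) -> Matrix:
    """Top-k eigenvectors of a symmetric PSD matrix by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(c)
    c = [row[:] for row in c]  # work on a copy; deflation modifies it
    vectors = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break  # no variance left to explain
            v = [t / norm for t in w]
        eigenvalue = sum(v[i] * sum(c[i][j] * v[j] for j in range(d)) for i in range(d))
        vectors.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        c = [[c[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vectors


def pca(x: Sequence[Sequence[float]], dimension: int = 2) -> Matrix:
    """Project samples onto the leading `dimension` principal components."""
    xc = center(x)
    components = leading_eigenvectors(covariance(xc), dimension)
    return [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in xc]
```

Eigenvectors are only defined up to sign, so the same data can come out mirrored along an axis between implementations.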
raw
Just plot the raw coordinates
asap map raw [OPTIONS]
Options
- --scale, --no-scale: Standard scaling of the coordinates.
- -d, --dimension <dimension>: Number of dimensions to keep in the output XYZ file.
- --axes <axes>: The projection axes along which to plot the projection.
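The --scale flag requests standard scaling of the coordinates. A minimal sketch, assuming "standard scaling" follows the usual convention of scikit-learn's StandardScaler: subtract each column's mean, divide by its population standard deviation, and leave constant columns unscaled. Whether ASAP matches this in every detail is an assumption; standard_scale is a hypothetical helper.

```python
import math
from typing import List, Sequence


def standard_scale(x: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every column of x to zero mean and unit variance."""
    n = len(x)
    cols = list(zip(*x))
    means = [sum(c) / n for c in cols]
    # Population standard deviation (divide by n), as StandardScaler does.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    # Constant columns have zero spread; only center them instead of dividing by zero.
    stds = [s if s > 0 else 1.0 for s in stds]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in x]
```

Scaling matters when descriptor components live on very different numerical ranges, because otherwise the largest-valued columns dominate the plotted coordinates.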
skpca
Sparse Kernel Principal Component Analysis
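Sparse kernel PCA works on a kernel matrix rather than directly on the design matrix; -k selects the kernel function. The sketch below shows the three listed kernels under stated assumptions: the polynomial kernel is written as dot(x, y) ** p with the integer p taken from -kp, which is a guess at how --kernel_parameter is used, and the linear and cosine kernels ignore the parameter. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a: Vector, b: Vector, param: int = 1) -> float:
    return dot(a, b)  # the parameter is ignored


def polynomial_kernel(a: Vector, b: Vector, param: int = 2) -> float:
    # Assumption: the kernel parameter is the integer exponent of the dot product.
    return dot(a, b) ** param


def cosine_kernel(a: Vector, b: Vector, param: int = 1) -> float:
    # Dot product of the normalized vectors; zero vectors get similarity 0.
    norm = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0


KERNELS: Dict[str, Callable[[Vector, Vector, int], float]] = {
    "linear": linear_kernel,
    "polynomial": polynomial_kernel,
    "cosine": cosine_kernel,
}


def kernel_matrix(x: Sequence[Vector], y: Sequence[Vector],
                  kind: str = "linear", param: int = 2) -> List[List[float]]:
    """K[i][j] = k(x[i], y[j]); with y set to the sparse points this is an N x M block."""
    k = KERNELS[kind]
    return [[k(a, b, param) for b in y] for a in x]
```

In sparse variants, y is typically the set of representative samples chosen by -s and -n, so only an N x M kernel block is needed instead of the full N x N matrix.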
asap map skpca [OPTIONS]
Options
- --scale, --no-scale: Standard scaling of the coordinates.
- -d, --dimension <dimension>: Number of dimensions to keep in the output XYZ file.
- --axes <axes>: The projection axes along which to plot the projection.
- -k, --kernel <kernel>: Kernel function for converting the design matrix to a kernel matrix.
  Default: linear
  Options: linear|polynomial|cosine
- -kp, --kernel_parameter <kernel_parameter>: Parameter used in the kernel function.
- -s, --sparse_mode <sparse_mode>: Sparsification method to use.
  Default: fps
  Options: random|cur|fps|sequential
- -n, --n_sparse <n_sparse>: Number of representative samples; set a negative value to use no sparsification.
  Default: 100
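With the default -s fps, the -n representative samples are chosen by farthest point sampling: start from one sample, then repeatedly add the sample farthest from everything chosen so far. A plain-Python sketch of that greedy algorithm, including the negative-n_sparse convention from the option description; the starting index and the Euclidean distance are assumptions, and this is not ASAP's own implementation.

```python
import math
from typing import List, Sequence


def farthest_point_sampling(x: Sequence[Sequence[float]], n_sparse: int,
                            start: int = 0) -> List[int]:
    """Indices of n_sparse samples picked greedily by farthest point sampling."""
    n = len(x)
    if n_sparse < 0 or n_sparse >= n:
        return list(range(n))  # negative n_sparse: no sparsification, keep every sample
    if n_sparse == 0:
        return []
    selected = [start]
    # min_d[i]: distance from sample i to its nearest already-selected sample.
    min_d = [math.dist(x[i], x[start]) for i in range(n)]
    while len(selected) < n_sparse:
        nxt = max(range(n), key=lambda i: min_d[i])
        selected.append(nxt)
        min_d = [min(d, math.dist(x[i], x[nxt])) for i, d in enumerate(min_d)]
    return selected
```

Compared with random selection, FPS favors outliers and sparsely populated regions, so rare structures are more likely to be represented in the sparse set.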
tsne
t-SNE
asap map tsne [OPTIONS]
Options
- --metric <metric>: Controls how distance is computed in the ambient space of the input data. See: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
  Default: euclidean
- -l, --learning_rate <learning_rate>: The learning rate is usually in the range [10.0, 1000.0].
  Default: 200.0
- -e, --early_exaggeration <early_exaggeration>: Controls how tight natural clusters in the original space are in the embedded space.
  Default: 12.0
- -p, --perplexity <perplexity>: The perplexity is related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can give significantly different results.
  Default: 30.0
- --pca, --no-pca: Preprocess the data with PCA, reducing it to 50 dimensions. Recommended.
- --scale, --no-scale: Standard scaling of the coordinates.
- -d, --dimension <dimension>: Number of dimensions to keep in the output XYZ file.
- --axes <axes>: The projection axes along which to plot the projection.
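The perplexity option can be read as an "effective number of neighbors": in standard t-SNE, each sample i gets Gaussian affinities p_j|i to the other samples, with a width sigma_i tuned so that 2 ** H(p_i) equals the requested perplexity (H is the Shannon entropy in bits). A pure-Python sketch of that per-sample calibration by bisection; it illustrates the standard method and is not scikit-learn's or ASAP's code, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def conditional_probs(sq_dists: Sequence[float], sigma: float) -> List[float]:
    """Gaussian affinities of one sample to its neighbors (sq_dists excludes itself)."""
    dmin = min(sq_dists)
    # Shifting by dmin only rescales all weights, which normalization cancels;
    # it keeps at least one weight equal to 1 so the sum never underflows to 0.
    w = [math.exp(-(d - dmin) / (2.0 * sigma * sigma)) for d in sq_dists]
    s = sum(w)
    return [v / s for v in w]


def perplexity(p: Sequence[float]) -> float:
    """2 ** (Shannon entropy in bits) of a probability vector."""
    h = -sum(v * math.log2(v) for v in p if v > 0.0)
    return 2.0 ** h


def calibrate_sigma(sq_dists: Sequence[float], target: float,
                    tol: float = 1e-5, max_iter: int = 200) -> Tuple[float, List[float]]:
    """Bisect sigma (in log space) until the affinities reach the target perplexity."""
    lo, hi = 1e-10, 1e10
    for _ in range(max_iter):
        sigma = math.sqrt(lo * hi)
        p = conditional_probs(sq_dists, sigma)
        perp = perplexity(p)
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma  # too many effective neighbors: narrow the Gaussian
        else:
            lo = sigma  # too few effective neighbors: widen the Gaussian
    return sigma, p
```

Since perplexity can never exceed the number of neighbors, a perplexity close to the dataset size forces nearly uniform affinities, which is one reason larger datasets tolerate larger values.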
umap
UMAP
asap map umap [OPTIONS]
Options
- -nn, --n_neighbors <n_neighbors>: Controls how UMAP balances local versus global structure in the data.
  Default: 10
- -md, --min_dist <min_dist>: Controls how tightly UMAP is allowed to pack points together.
  Default: 0.1
- --metric <metric>: Controls how distance is computed in the ambient space of the input data. See: https://umap-learn.readthedocs.io/en/latest/parameters.html#metric
  Default: euclidean
- --scale, --no-scale: Standard scaling of the coordinates.
- -d, --dimension <dimension>: Number of dimensions to keep in the output XYZ file.
- --axes <axes>: The projection axes along which to plot the projection.
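The -nn option sets how many neighbors UMAP considers for each sample when it builds its neighborhood graph: small values emphasize local structure, large values more global structure. The sketch below shows only that first step, as an exact k-nearest-neighbor search with the default euclidean metric. umap-learn itself uses approximate search, its own neighbor-counting conventions and further steps (fuzzy edge weights and a layout optimization that -md influences), so this is illustrative only; knn_indices is a hypothetical helper.

```python
import math
from typing import List, Sequence


def knn_indices(x: Sequence[Sequence[float]], n_neighbors: int) -> List[List[int]]:
    """For every sample, the indices of its n_neighbors nearest other samples."""
    out = []
    for i, xi in enumerate(x):
        # Sort all other samples by Euclidean distance to sample i (ties keep index order).
        others = sorted((j for j in range(len(x)) if j != i),
                        key=lambda j: math.dist(xi, x[j]))
        out.append(others[:n_neighbors])
    return out
```

Raising n_neighbors links each sample to more distant points, so the resulting map preserves more of the large-scale arrangement at the cost of fine local detail.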
Note
More documentation to be added.