How-to: asap map

The asap map command makes low-dimensional projections of the design matrix generated by the asap gen_desc command.

Overview of sub-commands

Sub-commands that control which algorithm is used for the dimensionality reduction:

Sub-command    Description

pca            Principal Component Analysis
raw            Just plot the raw coordinates
skpca          Sparse Kernel Principal Component Analysis
tsne           t-SNE
umap           UMAP

asap map

Make 2D maps using dimensionality reduction. This group command is evaluated before the specific sub-commands and performs the shared setup, such as reading the input files.

asap map [OPTIONS] COMMAND [ARGS]...
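The -dm, --design_matrix option below accepts a list such as '[dm1, dm2]' so that several design matrices can be used together. The following is a minimal sketch of one plausible way to combine them, assuming every matrix has one row per sample and that they are concatenated column-wise. The helpers parse_dm_list and combine_design_matrices are hypothetical and not part of ASAP.

```python
import numpy as np


def parse_dm_list(value):
    """Parse a string like '[dm1, dm2]' into ['dm1', 'dm2'] (hypothetical helper)."""
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        value = value[1:-1]
    # Split on commas and drop surrounding whitespace and quotes.
    return [item.strip().strip("'\"") for item in value.split(",") if item.strip()]


def combine_design_matrices(matrices):
    """Concatenate design matrices column-wise (assumption: same number of rows)."""
    n_rows = {m.shape[0] for m in matrices}
    if len(n_rows) != 1:
        raise ValueError("all design matrices must have the same number of samples")
    return np.hstack(matrices)


names = parse_dm_list("[dm1, dm2]")
# Stand-ins for descriptors loaded from files or XYZ tags: 4 samples each.
dm1 = np.ones((4, 3))
dm2 = np.zeros((4, 2))
X = combine_design_matrices([dm1, dm2])
print(names, X.shape)  # ['dm1', 'dm2'] (4, 5)
```

Column-wise stacking keeps one row per sample, which is what the dimensionality-reduction sub-commands consume.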

Options

-f, --fxyz <fxyz>

Input file that contains XYZ coordinates. See https://wiki.fysik.dtu.dk/ase/ase/io/io.html for the list of supported input formats. If a wildcard * is used, all files matching the pattern are read.

-p, --prefix <prefix>

Prefix to be used for the output file.

--only_use_species <only_use_species>

Only use the atomic descriptors of species with the specified atomic number. This is only meaningful together with --use_atomic_descriptors.

-ua, --use_atomic_descriptors, --use_atomic

Use atomic descriptors instead of global ones.

-dm, --design_matrix <design_matrix>

Location of the descriptor matrix file, or the name of the tags in the ASE XYZ file. The value is a list such as '[dm1, dm2]', because several design matrices can be combined at the same time.

-s, --style <style>

Style of the plot.

Default: default

Options: default|journal

-ar, --aspect_ratio <aspect_ratio>

Aspect ratio of the plot.

Default: 2

-a, --annotate <annotate>

Location of tags to annotate the samples.

--adjusttext, --no-adjusttext

Adjust the annotation texts so they do not overlap.

--peratom

Save the per-atom projection.

-ep, --extra-properties <extra_properties>

Additional properties to be read for each frame, in CSV format.

-o, --output <output>

Output file format.

Options: xyz|matrix|none|chemiscope

--keepraw, --no-keepraw

Keep the high-dimensional descriptor when writing the output XYZ file.

-c, --color <color>

Location of a file or name of the properties in the XYZ file. Used to color the scatter plot for all samples (N floats).

-ccol, --color_column <color_column>

The column number used in the color file. Starts from 0.

-clab, --color_label <color_label>

The label for the color bar.

-c0, --color_from_zero

Set the minimum to zero and only plot the excess.

Default: False

-cmap, --colormap <colormap>

Colormap used. Common options: gnuplot, tab10, viridis, bwr, rainbow.

Default: gnuplot

-nbs, --normalized_by_size

Normalize the quantity used for color function by the number of atoms in each frame.

Default: False
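The options -ccol, -nbs and -c0 above describe simple transformations of the per-frame color values. A minimal sketch follows, assuming the color file is a whitespace-separated numeric table and that normalization by size is applied before shifting the minimum to zero. The function prepare_colors and that ordering of the two steps are assumptions for illustration, not ASAP code.

```python
import io

import numpy as np


def prepare_colors(color_file, n_atoms, color_column=0,
                   normalized_by_size=False, color_from_zero=False):
    """Return one color value per frame (hypothetical helper, not ASAP's API)."""
    # -ccol: pick a 0-based column from a whitespace-separated table.
    table = np.loadtxt(color_file, ndmin=2)
    colors = table[:, color_column].astype(float)
    # -nbs: divide each frame's value by its number of atoms.
    if normalized_by_size:
        colors = colors / np.asarray(n_atoms, dtype=float)
    # -c0: shift so that the minimum is zero and only the excess is plotted.
    if color_from_zero:
        colors = colors - colors.min()
    return colors


# Two columns per frame; column 1 holds, e.g., a total energy.
color_file = io.StringIO("0 -10.0\n1 -24.0\n2 -9.0\n")
n_atoms = [2, 4, 3]
c = prepare_colors(color_file, n_atoms, color_column=1,
                   normalized_by_size=True, color_from_zero=True)
print(c)  # [1. 0. 3.]
```

Here the per-atom values are -5, -6 and -3, so shifting by the minimum (-6) leaves only the excess above the lowest frame.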

pca

Principal Component Analysis

asap map pca [OPTIONS]

Options

--scale, --no-scale

Standard scaling of the coordinates.

-d, --dimension <dimension>

Number of dimensions to keep in the output XYZ file.

--axes <axes>

Projection axes along which to plot.
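For reference, here is a minimal NumPy sketch of what a PCA map computes conceptually: optional standard scaling (--scale), projection onto the leading principal components, keeping -d dimensions, and picking two of them for the plot (--axes). It uses an SVD of the centered matrix. The function pca_project is hypothetical, the 0-based axis indices are an assumption, and ASAP's own implementation may differ in details such as sign conventions.

```python
import numpy as np


def pca_project(X, dimension=2, scale=True):
    """Project the rows of X onto its leading principal components (sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each descriptor column
    if scale:
        std = Xc.std(axis=0)
        std[std == 0.0] = 1.0               # avoid dividing constant columns by zero
        Xc = Xc / std                       # standard scaling: unit variance
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dimension].T, s[:dimension]


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))                # 50 samples, 6 descriptor columns
proj, s = pca_project(X, dimension=3)
axes = (0, 1)                               # assumption: plot PC1 against PC2
xy = proj[:, list(axes)]
print(proj.shape, xy.shape)                 # (50, 3) (50, 2)
```

The projected columns are mutually uncorrelated and ordered by decreasing variance, which is why the first two are the usual choice for a 2D map.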

raw

Just plot the raw coordinates

asap map raw [OPTIONS]

Options

--scale, --no-scale

Standard scaling of the coordinates.

-d, --dimension <dimension>

Number of dimensions to keep in the output XYZ file.

--axes <axes>

Projection axes along which to plot.
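The raw sub-command plots descriptor coordinates without any projection. A minimal sketch of that idea, assuming --axes selects 0-based columns of the design matrix and --scale applies standard scaling first; the function raw_coordinates is hypothetical.

```python
import numpy as np


def raw_coordinates(X, axes=(0, 1), scale=False):
    """Pick descriptor columns directly, without any projection (sketch)."""
    X = np.asarray(X, dtype=float)
    if scale:
        std = X.std(axis=0)
        std[std == 0.0] = 1.0               # leave constant columns at zero after centering
        X = (X - X.mean(axis=0)) / std      # standard scaling
    return X[:, list(axes)]


X = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [3.0, 30.0, 5.0]])
print(raw_coordinates(X, axes=(0, 2)))
```

Scaling matters here because raw descriptor columns can live on very different scales; after scaling, columns 0 and 1 of this example become identical.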

skpca

Sparse Kernel Principal Component Analysis

asap map skpca [OPTIONS]

Options

--scale, --no-scale

Standard scaling of the coordinates.

-d, --dimension <dimension>

Number of dimensions to keep in the output XYZ file.

--axes <axes>

Projection axes along which to plot.

-k, --kernel <kernel>

Kernel function used to convert the design matrix into a kernel matrix.

Default: linear

Options: linear|polynomial|cosine

-kp, --kernel_parameter <kernel_parameter>

Parameter used in the kernel function.

-s, --sparse_mode <sparse_mode>

Sparsification method to use.

Default: fps

Options: random|cur|fps|sequential

-n, --n_sparse <n_sparse>

Number of representative samples. Set a negative value to use no sparsification.

Default: 100
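To illustrate how -k, -kp, -s fps and -n fit together, here is a NumPy sketch of sparse kernel PCA using a Nystrom-style approximation: farthest point sampling picks n_sparse representative samples, kernels between all samples and the representatives are computed, and PCA is run in the resulting approximate feature space. The kernel forms (polynomial as (x . y)^p with p the kernel parameter, cosine as the normalized dot product) and the Nystrom construction are assumptions chosen for illustration; ASAP's exact formulas, kernel centering and defaults may differ, and all function names are hypothetical.

```python
import numpy as np


def kernel(A, B, kind="linear", param=2):
    """Kernel matrix between the rows of A and the rows of B (assumed forms)."""
    K = A @ B.T
    if kind == "polynomial":
        K = K ** param  # assumption: (x . y)^p without an additive constant
    elif kind == "cosine":
        K = K / np.outer(np.linalg.norm(A, axis=1), np.linalg.norm(B, axis=1))
    return K


def fps(X, n_sparse, first=0):
    """Farthest point sampling: repeatedly pick the sample farthest from those chosen."""
    chosen = [first]
    min_dist = np.linalg.norm(X - X[first], axis=1)
    for _ in range(n_sparse - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)


def sparse_kpca(X, n_sparse=100, dimension=2, kind="linear", param=2):
    """Nystrom-approximated kernel PCA (a sketch, not ASAP's implementation)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # A non-positive (or too large) n_sparse means: use every sample, no sparsification.
    idx = np.arange(n) if n_sparse <= 0 or n_sparse >= n else fps(X, n_sparse)
    K_mm = kernel(X[idx], X[idx], kind, param)  # representatives x representatives
    K_nm = kernel(X, X[idx], kind, param)       # all samples x representatives
    # Feature map Phi = K_nm K_mm^{-1/2}, dropping near-zero eigenvalues of K_mm.
    w, U = np.linalg.eigh(K_mm)
    keep = w > 1e-10 * w.max()
    Phi = (K_nm @ U[:, keep]) / np.sqrt(w[keep])
    Phi -= Phi.mean(axis=0)                     # center in feature space
    _, _, Vt = np.linalg.svd(Phi, full_matrices=False)
    return Phi @ Vt[:dimension].T, idx


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
proj, idx = sparse_kpca(X, n_sparse=50, dimension=2, kind="cosine")
print(proj.shape, len(idx))  # (200, 2) 50
```

With a linear kernel and no sparsification, this reduces to ordinary PCA on the centered design matrix (up to the sign of each component), which is a useful sanity check. FPS spreads the representatives over the data, so sparse regions of descriptor space are not ignored the way they can be with random selection.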

tsne

t-SNE

asap map tsne [OPTIONS]

Options

--metric <metric>

Controls how distance is computed in the ambient space of the input data. See: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

Default: euclidean

-l, --learning_rate <learning_rate>

Learning rate of the t-SNE optimization, usually in the range [10.0, 1000.0].

Default: 200.0

-e, --early_exaggeration <early_exaggeration>

Controls how tight natural clusters in the original space are in the embedded space.

Default: 12.0

-p, --perplexity <perplexity>

The perplexity is related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can give significantly different results.

Default: 30.0

--pca, --no-pca

Preprocess the data with PCA, reducing it to 50 dimensions. Recommended.

--scale, --no-scale

Standard scaling of the coordinates.

-d, --dimension <dimension>

Number of dimensions to keep in the output XYZ file.

--axes <axes>

Projection axes along which to plot.
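The -p, --perplexity value sets, for each sample, the width of a Gaussian over its neighbors so that the entropy of the resulting conditional distribution equals log(perplexity). Below is a minimal NumPy sketch of that calibration step, using a binary search on the precision beta = 1/(2 sigma^2). It follows the standard t-SNE formulation rather than ASAP's or scikit-learn's code, and conditional_probabilities is a hypothetical name. The optional PCA to 50 dimensions (--pca) would be applied to the design matrix before this step.

```python
import numpy as np


def conditional_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """Row-stochastic P[i, j] = p(j|i) with per-row widths matched to the perplexity."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    target = np.log(perplexity)               # target entropy in nats
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(D[i], i)                # exclude the point itself
        beta, lo, hi = 1.0, 0.0, np.inf
        for _ in range(max_iter):
            p = np.exp(-(d - d.min()) * beta) # shift by the minimum for stability
            p /= p.sum()
            H = -np.sum(p * np.log(np.maximum(p, 1e-300)))
            if abs(H - target) < tol:
                break
            if H > target:                    # too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:                             # too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
P = conditional_probabilities(X, perplexity=30.0)
H = -np.sum(P * np.log(np.where(P > 0, P, 1.0)), axis=1)
print(np.exp(H[:3]))                          # each value close to 30
```

This is why perplexity behaves like an effective number of neighbors: a row whose distribution is spread evenly over k neighbors has perplexity k.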

umap

UMAP

asap map umap [OPTIONS]

Options

-nn, --n_neighbors <n_neighbors>

Controls how UMAP balances local versus global structure in the data.

Default: 10

-md, --min_dist <min_dist>

Controls how tightly UMAP is allowed to pack points together.

Default: 0.1

--metric <metric>

Controls how distance is computed in the ambient space of the input data. See: https://umap-learn.readthedocs.io/en/latest/parameters.html#metric

Default: euclidean

--scale, --no-scale

Standard scaling of the coordinates.

-d, --dimension <dimension>

Number of dimensions to keep in the output XYZ file.

--axes <axes>

Projection axes along which to plot.
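To make -nn, --n_neighbors more concrete: UMAP builds a weighted k-nearest-neighbor graph in which each sample's edge weights are exp(-(d - rho)/sigma), where rho is the distance to its nearest neighbor and sigma is tuned so that the weights sum to log2(k). Below is a simplified NumPy sketch of that step with a Euclidean metric. It is based on the description in the UMAP paper, not on ASAP's or umap-learn's code, and fuzzy_knn_weights is a hypothetical name. The later symmetrization and layout optimization, where --min_dist acts, are omitted.

```python
import numpy as np


def fuzzy_knn_weights(X, n_neighbors=10, tol=1e-5, max_iter=100):
    """Return kNN indices and UMAP-style membership weights for each sample (sketch)."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean distances
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :n_neighbors]
    dist = np.take_along_axis(D, idx, axis=1)
    target = np.log2(n_neighbors)
    W = np.empty_like(dist)
    for i in range(X.shape[0]):
        rho = dist[i, 0]                         # distance to the nearest neighbor
        excess = np.maximum(dist[i] - rho, 0.0)
        lo, hi, sigma = 0.0, np.inf, 1.0
        for _ in range(max_iter):
            w = np.exp(-excess / sigma)
            total = w.sum()
            if abs(total - target) < tol:
                break
            if total > target:                   # weights too large: shrink sigma
                hi = sigma
                sigma = (lo + sigma) / 2.0
            else:                                # weights too small: grow sigma
                lo = sigma
                sigma = sigma * 2.0 if hi == np.inf else (sigma + hi) / 2.0
        W[i] = w
    return idx, W


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
idx, W = fuzzy_knn_weights(X, n_neighbors=10)
print(idx.shape, W.sum(axis=1)[:3])              # (60, 10); each row sums to about log2(10)
```

A small n_neighbors keeps each sample's graph very local, while a larger value lets every sample connect to more of the data, which is the local versus global trade-off described for this option.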

Note

More documentation to be added.