How-to: asap gen_desc

asap gen_desc sub_command is the descriptor generation command. This is in general the first step to do, no matter if you want to map the dataset, perform regression or any other analysis. Type asap gen_desc --help and asap gen_desc sub_command --help for helper strings.

Input

asap gen_desc can read any format that is also supported by ASE.[ase.io](https://wiki.fysik.dtu.dk/ase/ase/io/io.html) is supported. For example: xyz, lammps-data, cif, cell, vasp, res, gromacs, … However, it is most thoroughly tested on extended xyz files (units in angstrom, additional info and cell size in the comment line). It can read in lots of files using a wildcard and some common pattern e.g. H*.cell.

Note

when glob pattern is passed it should be quoted: e.g. asap gen_desc -f *.cell will not work but asap gen_desc -f "*.cell" does.

Note

if the program gets a bit confused about the file format, try supply some additional information using --fxyz_format '{...}' flag.

Output

The code will output two files

  • ${prefix}-desc.xyz, where ${prefix} is determined by the string that follows --prefix flag (default: ASAP). This file is an extended xyz file that contain the design matrix (descriptors for each structure and/or atomic environments).

  • ${prefix}-desc.yaml that contains all the meta information about how the design matrix was generated (i.e. which descriptor, hyper-parameter was used.)

Methods

There are two types of descriptors for atomic structures: the first type (e.g. ACSF, SOAP) starts from atomic descriptors for each atom in the structure, and then all the atomic descriptors associated with a structure are reduced to a global descriptor. The second type (e.g. Coulumb Matrix) directly generates global descriptors.

To reduce the atomic descriptors of all atoms in a structure to a single vector representing the whole strucuture, asap calls a reducer, and there are different options. For example, for a structure A, the most straightforward way to get its global descriptor is to simply take the average of the atomic ones, i.e.

\[\Phi (A) = \dfrac{1}{N_A}\sum_{i \in A}^{N_A} \psi (\mathcal{X}_i).\]

which is the average reducer. Alternatively, one can take the sum. The moment_average or moment_average reducer first take the moment of the atomic descriptors, e.g.

\[\Phi (A) = \dfrac{1}{N_A}\sum_{i \in A}^{N_A} \psi^{z} (\mathcal{X}_i).\]

where z (--zeta/-z) is the moment to take when converting atomic descriptors to global ones.

In addition to these, one can perform the summation or the average operation on the per-element basis, by using the flag --element_wise/-e.

Overview of sub-commands

sub-commands that controls the actual generation of the descriptor matrix:

option

description

acsf

Generate ACSF descriptors

cm

Generate the Coulomb Matrix descriptors

run

Running analysis using input files

soap

Generate SOAP descriptors

asap gen_desc

Descriptor generation command This command function evaluated before the descriptor specific ones, we setup the general stuff here, such as read the files.

asap gen_desc [OPTIONS] COMMAND [ARGS]...

Options

-s, --stride <stride>

Read in the xyz trajectory with X stide. Default: read/compute all frames.

--periodic, --no-periodic

Is the system periodic? If not specified, will infer from the XYZ file.

-i, --in_file, --in <in_file>

The state file that includes a dictionary-like specifications of descriptors to use.

-f, --fxyz <fxyz>

Input file that contains XYZ coordinates. See a list of possible input formats: https://wiki.fysik.dtu.dk/ase/ase/io/io.html If a wildcard * is used, all files matching the pattern is read.

--fxyz_format <fxyz_format>

Additional info for the input file format. e.g. {“format”:”lammps-data”,”units”:”metal”,”style”:”full”}

-p, --prefix <prefix>

Prefix to be used for the output file.

-np, --number_processes, --nprocess <number_processes>

Number of processes when compute the descriptors in parrallel.

Default

1

acsf

Generate ACSF descriptors

asap gen_desc acsf [OPTIONS]

Options

-c, --cutoff <cutoff>

Cutoff radius

-u, --universal_acsf, --uacsf <universal_acsf>

Try out our universal ACSF parameters.

Default

minimal

Options

none|smart|minimal|longrange

--tag <tag>

Tag for the descriptors.

-pa, --peratom

Save the per-atom local descriptors.

Default

False

-e, --element_wise

element-wise operation to get global descriptors from the atomic soap vectors

Default

False

-z, --zeta <zeta>

Moments to take when converting atomic descriptors to global ones.

-r, --reducer_type <reducer_type>

type of operations to get global descriptors from the atomic soap vectors, e.g. [average], [sum], [moment_avg], [moment_sum].

Default

average

cm

Generate the Coulomb Matrix descriptors

asap gen_desc cm [OPTIONS]

Options

--tag <tag>

Tag for the descriptors.

run

Running analysis using input files

asap gen_desc run [OPTIONS]

soap

Generate SOAP descriptors

asap gen_desc soap [OPTIONS]

Options

-c, --cutoff <cutoff>

Cutoff radius

-n, --nmax <nmax>

Maximum radial label

-l, --lmax <lmax>

Maximum angular label (<= 9)

--rbf <rbf>

Radial basis function

Default

gto

Options

gto|polynomial

-sigma, -g, --atom-gaussian-width <atom_gaussian_width>

The width of the Gaussian centered on atoms.

Default

0.5

--crossover, --no-crossover

If to included the crossover of atomic types.

Default

False

-u, --universal_soap, --usoap <universal_soap>

Try out our universal SOAP parameters.

Default

minimal

Options

none|smart|minimal|longrange

--tag <tag>

Tag for the descriptors.

-pa, --peratom

Save the per-atom local descriptors.

Default

False

-e, --element_wise

element-wise operation to get global descriptors from the atomic soap vectors

Default

False

-z, --zeta <zeta>

Moments to take when converting atomic descriptors to global ones.

-r, --reducer_type <reducer_type>

type of operations to get global descriptors from the atomic soap vectors, e.g. [average], [sum], [moment_avg], [moment_sum].

Default

average

Note

More documentation to be added.