asaplib.data package¶

Submodules¶

asaplib.data.design_matrix module¶

Class for storing and handling design matrices

class asaplib.data.design_matrix.Design_Matrix(X=[], y=[], whiten=True, test_ratio=0, random_seed=42, z=[], tags=[])[source]¶

Bases: object

extended design matrix class

Parameters

X (array-like, shape=[n_samples,n_desc]) – Input points.
y (array-like, shape=[n_samples]) – label for every point
test_ratio (float) – ratio of the test fraction
z (array-like, shape=[n_samples]) – additional label for every point
tags (array-like, strings, shape=[n_samples]) – additional tags for each data point

compute_fit(learner, tag=None, store_results=True, plot=True)[source]¶

Fit the design matrix X and the values y using a learner

Parameters

learner (a learner object) – e.g. RidgeRegression needs to have .fit(), .predict(), .get_train_test_error(), .fit_predict_error() methods
tag (str) – The name of this learner

compute_learning_curve(learner, tag=None, lc_points=8, lc_repeats=8, randomseed=42, verbose=True)[source]¶

Fit the learning curve using a learner

Parameters

lc_points (int) – the number of points on the learning curve
lc_repeats (int) – the number of sub-samples to take when computing the learning curve
learner (a learner object) – e.g. RidgeRegression needs to have .fit(), .predict(), .get_train_test_error(), .fit_predict_error() methods
tag (str) – The name of this learner
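
The roles of lc_points and lc_repeats can be illustrated with a small stand-alone sketch. It does not use asaplib: MeanLearner, rmse and learning_curve are hypothetical names, the learner only implements .fit() and .predict(), and the evenly spaced training-set sizes are an assumption rather than the library's documented behaviour.

```python
import random
from statistics import mean


class MeanLearner:
    """Toy learner (hypothetical): always predicts the mean of the training labels."""

    def fit(self, X, y):
        self.mean_ = mean(y)

    def predict(self, X):
        return [self.mean_ for _ in X]


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred)) ** 0.5


def learning_curve(learner, X_train, y_train, X_test, y_test,
                   lc_points=8, lc_repeats=8, randomseed=42):
    """Return a list of (train_size, mean test RMSE) pairs."""
    rng = random.Random(randomseed)
    n = len(X_train)
    # Assumption: evenly spaced training-set sizes, ending at the full training set.
    sizes = sorted({max(1, round(n * (i + 1) / lc_points)) for i in range(lc_points)})
    curve = []
    for size in sizes:
        errors = []
        for _ in range(lc_repeats):
            idx = rng.sample(range(n), size)  # random sub-sample without replacement
            learner.fit([X_train[i] for i in idx], [y_train[i] for i in idx])
            errors.append(rmse(y_test, learner.predict(X_test)))
        curve.append((size, mean(errors)))
    return curve


X = [[float(i)] for i in range(40)]
y = [2.0 * row[0] for row in X]
curve = learning_curve(MeanLearner(), X[:32], y[:32], X[32:], y[32:],
                       lc_points=4, lc_repeats=3)
```

Each entry of curve pairs one training-set size with the test error averaged over lc_repeats random sub-samples of that size.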

get_sparsified_matrix()[source]¶

save_state(filename, mode='yaml')[source]¶: output json or yaml file

sparsify(n_sparse=None, sparse_mode='fps')[source]¶

select representative data points using the design matrix

Parameters

n_sparse (int) – number of representative points. n_sparse == None means 5% of the data; n_sparse < 0 means no sparsification.
sparse_mode (str) – method to use for sparsification: [cur], [fps], or [random]
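
As an illustration of the n_sparse convention and of what farthest point sampling (the usual meaning of fps) does, here is a minimal pure-Python sketch. It is not asaplib's implementation: resolve_n_sparse and farthest_point_sampling are hypothetical helpers, and the Euclidean distance and random starting point are assumptions.

```python
import math
import random


def resolve_n_sparse(n_sparse, n_samples):
    """Hypothetical helper mirroring the documented n_sparse convention."""
    if n_sparse is None:
        return max(1, n_samples // 20)  # None: 5% of the data
    if n_sparse < 0:
        return n_samples                # negative: no sparsification, keep everything
    return min(n_sparse, n_samples)


def farthest_point_sampling(points, n_select, seed=42):
    """Greedy FPS: repeatedly pick the point farthest from those already selected."""
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    selected = [first]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < n_select:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


points = [[float(i), 0.0] for i in range(100)]
n = resolve_n_sparse(None, len(points))
chosen = farthest_point_sampling(points, n)
```

The returned indices identify the representative rows; a CUR or random selection would replace only the body of the selection function.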

asaplib.data.xyz module¶

ASAPXYZ class for handling atomic coordinate input and compute/output

class asaplib.data.xyz.ASAPXYZ(fxyz=None, stride=1, periodic=True, fileformat=None)[source]¶

Bases: object

extended xyz class

Parameters

fxyz (string_like) – the path to the extended xyz file
fmat (string_like) – the name of the descriptors in the extended xyz file
use_atomic_desc (bool) – return the descriptors for each atom, read from the xyz file
stride (int) – the stride when reading the xyz file

compute_atomic_descriptors(desc_spec_dict={}, sbs=[], tag=None, n_process=1)[source]¶

compute the atomic descriptors for selected frames

Parameters

desc_spec_dict (dict of dictionaries) – contains info on the descriptors to use, e.g.

    atomic_desc_dict = {
        "firstsoap": {"type": "SOAP", "species": [1, 6, 7, 8], "cutoff": 2.0, "atom_gaussian_width": 0.2, "n": 4, "l": 4}}

sbs (array, integer) – the index of the subset of structures to compute

compute_global_descriptors(desc_spec_dict={}, sbs=[], keep_atomic=False, tag=None, n_process=1)[source]¶

compute the global descriptors for selected frames

Parameters

desc_spec_dict (dictionaries that specify which global descriptor to use) – e.g.

    {"global_desc1":
        {"type": "CM"}}

    # or

    {"global_desc2":
        {"atomic_descriptor": atomic_desc_dict,
         "reducer_function": reducer_dict}}

    atomic_desc_dict = {
        "firstsoap":
            {"type": "SOAP", "species": [1, 6, 7, 8], "cutoff": 2.0, "atom_gaussian_width": 0.2, "n": 4, "l": 4}}

    reducer_dict = {"first_reducer":
        {"reducer_type": reducer_type, "zeta": zeta, "species": species, "element_wise": element_wise}}

sbs (array, integer) – list of the indexes of the subset
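
The two specification forms can be written out as plain Python dictionaries. This is only a sketch of the nesting: the reducer values ("average", 1, False) are placeholders chosen for illustration and should be checked against asaplib before use, and asapxyz in the final comment stands for a hypothetical ASAPXYZ instance.

```python
# Atomic descriptor specification, as given in the docstring example.
atomic_desc_dict = {
    "firstsoap": {"type": "SOAP", "species": [1, 6, 7, 8], "cutoff": 2.0,
                  "atom_gaussian_width": 0.2, "n": 4, "l": 4}
}

# Placeholder reducer settings (assumed values, not taken from the documentation).
reducer_dict = {
    "first_reducer": {"reducer_type": "average", "zeta": 1,
                      "species": [1, 6, 7, 8], "element_wise": False}
}

# Form 1: a descriptor that is global by construction.
spec_cm = {"global_desc1": {"type": "CM"}}

# Form 2: a global descriptor built by reducing an atomic descriptor.
spec_reduced = {
    "global_desc2": {"atomic_descriptor": atomic_desc_dict,
                     "reducer_function": reducer_dict}
}

# Either dictionary would then be passed as desc_spec_dict, e.g.
# asapxyz.compute_global_descriptors(desc_spec_dict=spec_reduced)
```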

fetch_computed_descriptors(desc_dict_keys=[], sbs=[])[source]¶

Fetch the computed descriptors for selected frames

Parameters

desc_dict_keys (a list (str-like) of keys) – for which computed descriptors to fetch.
sbs (array, integer) –

Returns: desc
Return type: np.matrix [n_frame, n_desc]

get_atomic_descriptors(desc_name_list=[], species_name=None)[source]¶

extract the descriptor array from each frame

Parameters

desc_name_list (a list of strings) – the name of the .info[] in the extended xyz file
species_name (int) – the atomic number of the species selected. Only the descriptors of atoms of the specified species will be returned. species_name=None means all atoms are selected.

Returns

atomic_desc

Return type

np.matrix
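
The species_name filter can be pictured with a short stand-alone sketch. select_by_species is a hypothetical helper operating on plain lists, not the library's code; it only shows the selection rule described above.

```python
def select_by_species(atomic_numbers, atomic_desc, species_name=None):
    """Return the descriptor rows of atoms whose atomic number equals species_name."""
    if species_name is None:
        return list(atomic_desc)  # None selects all atoms
    return [row for z, row in zip(atomic_numbers, atomic_desc) if z == species_name]


# One water-like frame: O, H, H with a two-component descriptor per atom.
atomic_numbers = [8, 1, 1]
atomic_desc = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
hydrogen_desc = select_by_species(atomic_numbers, atomic_desc, species_name=1)
```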

get_atomic_property(y_key=None, extensive=False, sbs=[], species_name=None)[source]¶

extract the property array from each atom

Parameters

y_key (string_like) – the name of the property in the extended xyz file
sbs (array, integer) –
species_name (int) – the atomic number of the species selected. Only the properties of atoms of the specified species will be returned. species_name=None means all atoms are selected.

Returns

y_all

Return type

array [N_atoms]

get_descriptors(desc_name_list=[], use_atomic_desc=False, species_name=None)[source]¶

extract the descriptor array from each frame

Parameters

desc_name_list (a list of strings) – the name of the .info[] in the extended xyz file
use_atomic_desc (bool) – return the descriptors for each atom, read from the xyz file
species_name (int) – the atomic number of the species selected. Only the descriptors of atoms of the specified species will be returned. species_name=None means all atoms are selected.

Returns

desc (np.matrix)
atomic_desc (np.matrix)

get_global_species()[source]¶

get_natom_list()[source]¶

get_natom_list_by_species(species_name=None)[source]¶

get_num_frames()[source]¶

get_property(y_key=None, extensive=False, sbs=[])[source]¶

extract specified property from selected frames

Parameters

y_key (string_like) – the name of the property in the extended xyz file
sbs (array, integer) –

Returns

y_all

Return type

array [N_samples]

get_total_natoms()[source]¶

get_xyz()[source]¶

load_properties(filename, header='infer', prefix='X', **kwargs)[source]¶

Load properties from a CSV file

Read in the CSV file and save the columns to the info dictionary of the frames.

Parameters

filename (str) – Name of the CSV file.
header (int) – Row number of the header. Defaults to use the first row unless explicit names for the columns are given
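
The idea of saving CSV columns to the info dictionary of the frames can be sketched with the standard library alone. load_properties_sketch is a hypothetical function, plain dictionaries stand in for each frame's info dictionary, and the one-row-per-frame layout with numeric columns is an assumption.

```python
import csv
import io


def load_properties_sketch(csv_text, frames_info):
    """Copy each CSV column into the per-frame info dictionaries, one row per frame."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row is the header
    rows = list(reader)
    if len(rows) != len(frames_info):
        raise ValueError("the CSV must contain one row per frame")
    for info, row in zip(frames_info, rows):
        for name, value in row.items():
            info[name] = float(value)  # assumes purely numeric columns
    return frames_info


csv_text = "energy,volume\n-1.5,10.0\n-2.5,12.0\n"
frames_info = [{}, {}]  # stand-ins for the info dictionary of each frame
load_properties_sketch(csv_text, frames_info)
```

After the call, each stand-in dictionary holds one entry per CSV column, keyed by the column name from the header row.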

remove_atomic_descriptors(desc_name_list=[])[source]¶: remove the descriptors

remove_descriptors(desc_name_list=[])[source]¶: remove the descriptors

save_descriptor_acronym_state(filename, mode='yaml')[source]¶

save_state(filename, mode='yaml')[source]¶

set_atomic_descriptors(atomic_desc=None, atomic_desc_name=None, species_name=None)[source]¶

write the descriptor array to the atom object

Parameters: desc (np.matrix, shape=[n_descriptors, n_atoms]) –

set_descriptors(desc=None, desc_name=None)[source]¶

write the descriptor array to the atom object

Parameters: desc (np.matrix, shape=[n_descriptors, n_frames]) –

standardize(sbs=[], symprec=0.01)[source]¶: reduce to primitive cell

symmetrise(sbs=[], symprec=0.01)[source]¶

write(filename, sbs=[], save_acronym=False, wrap_output=True)[source]¶

write the selected frames or all the frames to a xyz file

Parameters

filename (str) –
sbs (array, integer) –

write_atomic_descriptor_matrix(filename, desc_name, sbs=[], comment='')[source]¶

write the selected descriptor matrix in a matrix format to file

Parameters

filename (str) –
desc_name (str) – Name of the properties/descriptors to write
sbs (array, integer) –
comment (str) –

write_chemiscope(filename, sbs=None, save_acronym=False, cutoff=None, wrap_output=True)[source]¶

write the selected frames or all the frames to ChemiScope JSON

Parameters

filename (str) –
sbs (array, integer) –
cutoff (float) – generate cutoff for atomic environments, set to None to disable atomic environments

write_computed_descriptors(filename, desc_dict_keys=[], sbs=[], comment='')[source]¶

write the computed descriptors for selected frames

Parameters

desc_dict_keys (list) – a list (str-like) of keys for which computed descriptors to fetch.
sbs (array, integer) –

Returns: desc
Return type: np.matrix [n_frame, n_desc]

write_descriptor_matrix(filename, desc_name_list, sbs=[], comment='')[source]¶

write the selected descriptor matrix in a matrix format to file

Parameters

filename (str) –
desc_name_list (a list of str) – Names of the properties/descriptors to write
sbs (array, integer) –
comment (str) –
