asaplib.data package

Submodules

asaplib.data.design_matrix module

Class for storing and handling design matrices

class asaplib.data.design_matrix.Design_Matrix(X=[], y=[], whiten=True, test_ratio=0, random_seed=42, z=[], tags=[])[source]

Bases: object

extended design matrix class

Parameters
  • X (array-like, shape=[n_samples,n_desc]) – Input points.

  • y (array-like, shape=[n_samples]) – label for every point

  • test_ratio (float) – ratio of the test fraction

  • z (array-like, shape=[n_samples]) – additional label for every point

  • tags (array-like, strings, shape=[n_samples]) – additional tags for each data point
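The test_ratio and random_seed arguments suggest a reproducible random split into training and test points. The following is a minimal sketch of such a split, not the library's implementation; the helper name split_train_test and the rounding rule for the test-set size are assumptions made for illustration.

```python
import random


def split_train_test(n_samples, test_ratio=0.0, random_seed=42):
    """Return (train_idx, test_idx) as sorted lists of sample indices.

    Hypothetical helper: the rounding of the test-set size is an assumption.
    """
    rng = random.Random(random_seed)      # private RNG, so the split is reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_ratio))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx


train_idx, test_idx = split_train_test(10, test_ratio=0.2)
```

With test_ratio=0 (the default in the signature above) every point ends up in the training set.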

compute_fit(learner, tag=None, store_results=True, plot=True)[source]

Fit the design matrix X and the values y using a learner

Parameters
  • learner (a learner object) – a learner such as RidgeRegression; it must provide the .fit(), .predict(), .get_train_test_error(), and .fit_predict_error() methods

  • tag (str) – The name of this learner
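The learner is duck-typed: any object exposing the four methods named above should qualify. The sketch below checks for those method names and shows a toy learner. The argument lists and return values of the toy methods are assumptions, since the signatures are not documented here, and MeanLearner and missing_learner_methods are hypothetical names.

```python
REQUIRED_METHODS = ("fit", "predict", "get_train_test_error", "fit_predict_error")


def missing_learner_methods(obj):
    """Return the names of the required learner methods that obj lacks."""
    return [name for name in REQUIRED_METHODS
            if not callable(getattr(obj, name, None))]


class MeanLearner:
    """Toy learner that always predicts the training mean (illustration only)."""

    def __init__(self):
        self.mean_ = 0.0

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)

    def predict(self, X):
        return [self.mean_ for _ in X]

    def _mae(self, X, y):
        # Mean absolute error of the current model on (X, y)
        pred = self.predict(X)
        return sum(abs(p - t) for p, t in zip(pred, y)) / len(y)

    def get_train_test_error(self, X_train, y_train, X_test, y_test):
        # Assumed semantics: fit on the training set, report both errors
        self.fit(X_train, y_train)
        return self._mae(X_train, y_train), self._mae(X_test, y_test)

    def fit_predict_error(self, X, y):
        # Assumed semantics: fit and report the error on the same data
        self.fit(X, y)
        return self._mae(X, y)


learner = MeanLearner()
problems = missing_learner_methods(learner)
```

An empty list from missing_learner_methods means the object at least has the right method names; it says nothing about whether the signatures match what compute_fit expects.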

compute_learning_curve(learner, tag=None, lc_points=8, lc_repeats=8, randomseed=42, verbose=True)[source]

Fit the learning curve using a learner

Parameters
  • lc_points (int) – the number of points on the learning curve

  • lc_repeats (int) – the number of sub-samples to take when computing the learning curve

  • learner (a learner object) – a learner such as RidgeRegression; it must provide the .fit(), .predict(), .get_train_test_error(), and .fit_predict_error() methods

  • tag (str) – The name of this learner
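A learning curve of this kind can be sketched as a double loop: for each training-set size (lc_points sizes in total), draw lc_repeats random sub-samples, score each, and average. The code below is an illustration under those assumptions, not the library's code; fit_and_score is a hypothetical callable standing in for the learner, and the choice of training-set sizes is left to the caller.

```python
import random
import statistics


def learning_curve(X, y, fit_and_score, train_sizes, lc_repeats=8, randomseed=42):
    """Return a list of (size, mean_score, std_score), one entry per size.

    fit_and_score(X_sub, y_sub) -> float is a hypothetical stand-in for
    fitting a learner on the sub-sample and returning its error.
    """
    rng = random.Random(randomseed)
    n = len(X)
    curve = []
    for size in train_sizes:
        scores = []
        for _ in range(lc_repeats):
            idx = rng.sample(range(n), size)   # sub-sample without replacement
            scores.append(fit_and_score([X[i] for i in idx],
                                        [y[i] for i in idx]))
        curve.append((size, statistics.mean(scores), statistics.pstdev(scores)))
    return curve


# Toy data and a toy score: the spread of the sub-sampled labels
X = [[float(i)] for i in range(50)]
y = [2.0 * i for i in range(50)]
curve = learning_curve(X, y, lambda Xs, ys: max(ys) - min(ys), [5, 10, 20])
```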

get_sparsified_matrix()[source]
save_state(filename, mode='yaml')[source]

write the state to a JSON or YAML file

sparsify(n_sparse=None, sparse_mode='fps')[source]

select representative data points using the design matrix

Parameters
  • n_sparse (int) – number of representative points. n_sparse == None means 5% of the data; n_sparse < 0 means no sparsification.

  • sparse_mode (str) – method to use for sparsification: cur, fps, or random
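For orientation, farthest point sampling (the fps mode) greedily picks the point farthest from everything selected so far. The sketch below is a generic pure-Python version using Euclidean distance, not the library's implementation. The handling of n_sparse follows the parameter description above, while the rounding of the 5% default and the start argument are assumptions.

```python
import math


def farthest_point_sampling(points, n_sparse=None, start=0):
    """Return the indices of the selected points, in selection order."""
    n = len(points)
    if n_sparse is None:
        n_sparse = max(1, n // 20)        # 5% of the data (rounding is an assumption)
    if n_sparse < 0:
        return list(range(n))             # no sparsification: keep everything
    n_sparse = min(n_sparse, n)
    if n_sparse == 0:
        return []

    selected = [start]
    # Distance from every point to its nearest already-selected point
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_sparse:
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.5, 0.0)]
picked = farthest_point_sampling(pts, n_sparse=3)
```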

asaplib.data.xyz module

ASAPXYZ class for handling atomic coordinate input and compute/output

class asaplib.data.xyz.ASAPXYZ(fxyz=None, stride=1, periodic=True, fileformat=None)[source]

Bases: object

extended xyz class

Parameters
  • fxyz (string_like) – the path to the extended xyz file

  • fmat (string_like) – the name of the descriptors in the extended xyz file

  • use_atomic_desc (bool) – return the descriptors for each atom, read from the xyz file

  • stride (int) – the stride when reading the xyz file
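To illustrate what stride means when reading a multi-frame file, the sketch below parses a plain xyz layout (atom count, comment line, then one line per atom) and keeps every stride-th frame. It is a stand-alone illustration with a hypothetical helper name; it does not reproduce how ASAPXYZ reads files and ignores the key=value metadata that extended xyz comment lines can carry.

```python
def read_xyz_frames(text, stride=1):
    """Parse xyz-formatted text and return every stride-th frame."""
    lines = text.splitlines()
    frames = []
    pos = 0
    frame_index = 0
    while pos < len(lines) and lines[pos].strip():
        natoms = int(lines[pos])
        comment = lines[pos + 1]
        atom_lines = lines[pos + 2: pos + 2 + natoms]
        if frame_index % stride == 0:
            atoms = []
            for line in atom_lines:
                symbol, x, y, z = line.split()[:4]
                atoms.append((symbol, float(x), float(y), float(z)))
            frames.append({"comment": comment, "atoms": atoms})
        pos += 2 + natoms                  # jump to the next frame header
        frame_index += 1
    return frames


xyz_text = """1
frame 0
H 0.0 0.0 0.0
1
frame 1
H 0.0 0.0 0.1
1
frame 2
H 0.0 0.0 0.2
"""
frames = read_xyz_frames(xyz_text, stride=2)
```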

compute_atomic_descriptors(desc_spec_dict={}, sbs=[], tag=None, n_process=1)[source]

compute the atomic descriptors for selected frames

Parameters
  • desc_spec_dict (dict of dictionaries) – contains info on the descriptors to use, e.g.

        atomic_desc_dict = {
            "firstsoap": {"type": 'SOAP', "species": [1, 6, 7, 8], "cutoff": 2.0, "atom_gaussian_width": 0.2, "n": 4, "l": 4}}

  • sbs (array, integer) – the index of the subset of structures to compute

compute_global_descriptors(desc_spec_dict={}, sbs=[], keep_atomic=False, tag=None, n_process=1)[source]

compute the global descriptors for selected frames

Parameters
  • desc_spec_dict (dictionaries that specify which global descriptor to use) – e.g.

        {'global_desc1':
            {"type": 'CM'}}

        # or

        {'global_desc2':
            {'atomic_descriptor': atomic_desc_dict,
             'reducer_function': reducer_dict}}

        atomic_desc_dict = {
            "firstsoap":
                {"type": 'SOAP', "species": [1, 6, 7, 8], "cutoff": 2.0, "atom_gaussian_width": 0.2, "n": 4, "l": 4}}

        reducer_dict = {'first_reducer':
            {'reducer_type': reducer_type, 'zeta': zeta, 'species': species, 'element_wise': element_wise}}

  • sbs (array, integer) – list of the indexes of the subset
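The nested specification can be assembled programmatically. The sketch below builds both forms described above as ordinary Python dictionaries. The helper make_global_spec is hypothetical, and the concrete reducer values ('average', zeta of 1, element_wise of False) are placeholders chosen for illustration, not values confirmed by this page.

```python
def make_global_spec(name, atomic_desc_dict, reducer_dict):
    """Wrap an atomic descriptor and a reducer into a global descriptor spec."""
    return {name: {"atomic_descriptor": atomic_desc_dict,
                   "reducer_function": reducer_dict}}


# Form 1: a descriptor that is global by construction
cm_spec = {"global_desc1": {"type": "CM"}}

# Form 2: an atomic descriptor reduced to one vector per frame
atomic_desc_dict = {
    "firstsoap": {"type": "SOAP", "species": [1, 6, 7, 8], "cutoff": 2.0,
                  "atom_gaussian_width": 0.2, "n": 4, "l": 4},
}
reducer_dict = {
    "first_reducer": {
        "reducer_type": "average",   # placeholder value (assumption)
        "zeta": 1,                   # placeholder value (assumption)
        "species": [1, 6, 7, 8],
        "element_wise": False,       # placeholder value (assumption)
    },
}
soap_spec = make_global_spec("global_desc2", atomic_desc_dict, reducer_dict)
```

Either cm_spec or soap_spec would then be passed as desc_spec_dict.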

fetch_computed_descriptors(desc_dict_keys=[], sbs=[])[source]

Fetch the computed descriptors for selected frames

Parameters
  • desc_dict_keys (a list (str-like) of keys) – for which computed descriptors to fetch.

  • sbs (array, integer) –

Returns

desc

Return type

np.matrix [n_frame, n_desc]

get_atomic_descriptors(desc_name_list=[], species_name=None)[source]

extract the descriptor array from each frame

Parameters
  • desc_name_list (a list of strings) – the name of the .info[] in the extended xyz file

  • species_name (int) –

    the atomic number of the species selected.

    Only the descriptors of atoms of the specified species will be returned.

    species_name=None means all atoms are selected.
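The species selection described above amounts to filtering per-atom rows by atomic number. A minimal sketch follows, using plain lists in place of the np.matrix the method returns; select_by_species is a hypothetical helper, not part of the package.

```python
def select_by_species(atomic_numbers, atomic_desc, species_name=None):
    """Keep the descriptor rows whose atom has the requested atomic number.

    species_name=None keeps every row, mirroring the behaviour described above.
    """
    if species_name is None:
        return [list(row) for row in atomic_desc]
    return [list(row)
            for z, row in zip(atomic_numbers, atomic_desc)
            if z == species_name]


# A water-like frame: one oxygen (8) and two hydrogens (1)
numbers = [8, 1, 1]
desc = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
hydrogen_rows = select_by_species(numbers, desc, species_name=1)
```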

Returns

atomic_desc

Return type

np.matrix

get_atomic_property(y_key=None, extensive=False, sbs=[], species_name=None)[source]

extract the property array from each atom

Parameters
  • y_key (string_like) – the name of the property in the extended xyz file

  • sbs (array, integer) –

  • species_name (int) –

    the atomic number of the species selected.

    Only the properties of atoms of the specified species will be returned. species_name=None means all atoms are selected.

Returns

y_all

Return type

array [N_atoms]

get_descriptors(desc_name_list=[], use_atomic_desc=False, species_name=None)[source]

extract the descriptor array from each frame

Parameters
  • desc_name_list (a list of strings) – the name of the .info[] in the extended xyz file

  • use_atomic_desc (bool) – return the descriptors for each atom, read from the xyz file

  • species_name (int) – the atomic number of the species selected. Only the descriptors of atoms of the specified species will be returned. species_name=None means all atoms are selected.

Returns

  • desc (np.matrix)

  • atomic_desc (np.matrix)

get_global_species()[source]
get_natom_list()[source]
get_natom_list_by_species(species_name=None)[source]
get_num_frames()[source]
get_property(y_key=None, extensive=False, sbs=[])[source]

extract specified property from selected frames

Parameters
  • y_key (string_like) – the name of the property in the extended xyz file

  • sbs (array, integer) –

Returns

y_all

Return type

array [N_samples]

get_total_natoms()[source]
get_xyz()[source]
load_properties(filename, header='infer', prefix='X', **kwargs)[source]

Load properties from a CSV file

Read in the CSV file and save the columns to the info dictionary of the frames.

Parameters
  • filename (str) – Name of the CSV file.

  • header (int) – Row number of the header. Defaults to use the first row unless explicit names for the columns are given
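The description above (one CSV row per frame, one info entry per column) can be sketched with the standard library. This is an illustration, not the library's code: attach_csv_columns and has_header are hypothetical names, each frame's info dictionary is modelled as a plain dict, values are assumed to be numeric, and the prefix-based column names for header-less files (X0, X1, ...) are an assumption about what the prefix argument is for.

```python
import csv
import io


def attach_csv_columns(csv_text, frames_info, has_header=True, prefix="X"):
    """Copy each CSV column into the info dict of the corresponding frame."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if has_header:
        names, rows = rows[0], rows[1:]
    else:
        # No header row: generate column names from the prefix (assumption)
        names = [f"{prefix}{i}" for i in range(len(rows[0]))]
    if len(rows) != len(frames_info):
        raise ValueError("the CSV file needs exactly one row per frame")
    for info, row in zip(frames_info, rows):
        for name, value in zip(names, row):
            info[name] = float(value)      # numeric columns assumed
    return frames_info


frames_info = [{}, {}]
attach_csv_columns("energy,volume\n-1.5,10\n-2.5,12\n", frames_info)
```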

remove_atomic_descriptors(desc_name_list=[])[source]

remove the descriptors

remove_descriptors(desc_name_list=[])[source]

remove the descriptors

save_descriptor_acronym_state(filename, mode='yaml')[source]
save_state(filename, mode='yaml')[source]
set_atomic_descriptors(atomic_desc=None, atomic_desc_name=None, species_name=None)[source]

write the descriptor array to the atom object

Parameters

atomic_desc (np.matrix, shape=[n_descriptors, n_atoms]) –

set_descriptors(desc=None, desc_name=None)[source]

write the descriptor array to the atom object

Parameters

desc (np.matrix, shape=[n_descriptors, n_frames]) –

standardize(sbs=[], symprec=0.01)[source]

reduce to primitive cell

symmetrise(sbs=[], symprec=0.01)[source]
write(filename, sbs=[], save_acronym=False, wrap_output=True)[source]

write the selected frames or all the frames to an xyz file

Parameters
  • filename (str) –

  • sbs (array, integer) –

write_atomic_descriptor_matrix(filename, desc_name, sbs=[], comment='')[source]

write the selected descriptor matrix in a matrix format to file

Parameters
  • filename (str) –

  • desc_name (str) – Name of the properties/descriptors to write

  • sbs (array, integer) –

  • comment (str) –
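As a rough picture of "matrix format", the sketch below writes one row per line with an optional comment header and an optional row subset. It is an assumption-laden illustration rather than the library's writer: write_matrix is a hypothetical name, the "# " comment prefix and number format are arbitrary choices, and an empty sbs is taken to mean all rows because that is the default in the signature.

```python
import io


def write_matrix(handle, matrix, sbs=(), comment=""):
    """Write the selected rows of matrix as whitespace-separated text."""
    rows = [matrix[i] for i in sbs] if len(sbs) > 0 else list(matrix)
    if comment:
        handle.write(f"# {comment}\n")     # header line (format is an assumption)
    for row in rows:
        handle.write(" ".join(f"{value:.8e}" for value in row) + "\n")


buffer = io.StringIO()
write_matrix(buffer, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
             sbs=[0, 2], comment="soap")
text = buffer.getvalue()
```

Passing an open file object instead of io.StringIO writes the same text to disk.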

write_chemiscope(filename, sbs=None, save_acronym=False, cutoff=None, wrap_output=True)[source]

write the selected frames or all the frames to ChemiScope JSON

Parameters
  • filename (str) –

  • sbs (array, integer) –

  • cutoff (float) – generate cutoff for atomic environments, set to None to disable atomic environments

write_computed_descriptors(filename, desc_dict_keys=[], sbs=[], comment='')[source]

write the computed descriptors for selected frames

Parameters
  • desc_dict_keys (list) – a list (str-like) of keys for which computed descriptors to fetch.

  • sbs (array, integer) –

Returns

desc

Return type

np.matrix [n_frame, n_desc]

write_descriptor_matrix(filename, desc_name_list, sbs=[], comment='')[source]

write the selected descriptor matrix in a matrix format to file

Parameters
  • filename (str) –

  • desc_name_list (a list of str.) – Name of the properties/descriptors to write

  • sbs (array, integer) –

  • comment (str) –

Module contents