asaplib.data package
Submodules
asaplib.data.design_matrix module
Class for storing and handling design matrices
-
class asaplib.data.design_matrix.Design_Matrix(X=[], y=[], whiten=True, test_ratio=0, random_seed=42, z=[], tags=[])
Bases: object
Extended design matrix class
- Parameters
X (array-like, shape=[n_samples, n_desc]) – input points
y (array-like, shape=[n_samples]) – label for every point
test_ratio (float) – ratio of the test fraction
z (array-like, shape=[n_samples]) – additional label for every point
tags (array-like, strings, shape=[n_samples]) – additional tags for each data point
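As a rough illustration of what a test ratio and a random seed control, the sketch below splits sample indices into a training set and a test set. It uses only the Python standard library; `split_indices` is a hypothetical helper and not part of asaplib, and the way `Design_Matrix` actually partitions its data may differ.

```python
import random

def split_indices(n_samples, test_ratio=0.0, random_seed=42):
    """Hypothetical helper: shuffle indices and hold out a test fraction."""
    rng = random.Random(random_seed)  # seeded, so the split is reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_ratio))
    # The first n_test shuffled indices form the test set, the rest the training set
    return sorted(indices[n_test:]), sorted(indices[:n_test])

train_idx, test_idx = split_indices(10, test_ratio=0.2, random_seed=42)
```

With `test_ratio=0` (the default in the signature above) the test set in this sketch is empty and every sample is used for training.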
-
compute_fit(learner, tag=None, store_results=True, plot=True)
Fit the design matrix X and the values y using a learner
- Parameters
learner (a learner object) – e.g. RidgeRegression; it needs to have .fit(), .predict(), .get_train_test_error(), and .fit_predict_error() methods
tag (str) – The name of this learner
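The learner is duck-typed: any object exposing the four listed methods should be usable. The toy class below is a sketch of that idea only. The four method names come from the parameter description above, but every argument list and return value here is an assumption, and `MeanLearner` is a hypothetical class that does not exist in asaplib.

```python
class MeanLearner:
    """Toy learner that always predicts the mean of the training targets.

    Only the four public method names are taken from the documentation;
    their signatures and return values are assumptions.
    """

    def __init__(self):
        self.mean_ = 0.0

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def _mae(self, X, y):
        # Mean absolute error of the current model on (X, y)
        pred = self.predict(X)
        return sum(abs(p - t) for p, t in zip(pred, y)) / len(y)

    def get_train_test_error(self, X_train, y_train, X_test, y_test):
        # Assumed contract: fit on the training set, report both errors
        self.fit(X_train, y_train)
        return self._mae(X_train, y_train), self._mae(X_test, y_test)

    def fit_predict_error(self, X_train, y_train, X_test, y_test):
        # Assumed contract: fit on the training set, report the test error only
        self.fit(X_train, y_train)
        return self._mae(X_test, y_test)

learner = MeanLearner()
train_err, test_err = learner.get_train_test_error(
    [[0.0], [1.0]], [1.0, 3.0], [[2.0]], [2.0])
```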
-
compute_learning_curve(learner, tag=None, lc_points=8, lc_repeats=8, randomseed=42, verbose=True)
Fit the learning curve using a learner
- Parameters
lc_points (int) – the number of points on the learning curve
lc_repeats (int) – the number of sub-samples to take when computing the learning curve
learner (a learner object) – e.g. RidgeRegression; it needs to have .fit(), .predict(), .get_train_test_error(), and .fit_predict_error() methods
tag (str) – The name of this learner
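To make the roles of `lc_points` and `lc_repeats` concrete, the sketch below generates the training subsets a learning curve could be evaluated on: a handful of training-set sizes, each with several random sub-samples. The log spacing, the smallest size, and the helper name `learning_curve_subsets` are all assumptions made for illustration; the library may choose its sizes differently.

```python
import math
import random

def learning_curve_subsets(n_train, lc_points=8, lc_repeats=8, randomseed=42):
    """Hypothetical helper: return {train_size: [index subsets]}."""
    rng = random.Random(randomseed)
    # Log-spaced sizes between roughly n_train/20 and n_train (an assumption).
    # Duplicate sizes collapse, so small data sets may yield fewer points.
    smallest = max(1, n_train // 20)
    step = (math.log(n_train) - math.log(smallest)) / max(1, lc_points - 1)
    sizes = sorted({
        int(round(math.exp(math.log(smallest) + i * step)))
        for i in range(lc_points)
    })
    # For each size, draw lc_repeats random sub-samples without replacement
    return {
        size: [rng.sample(range(n_train), size) for _ in range(lc_repeats)]
        for size in sizes
    }

subsets = learning_curve_subsets(200, lc_points=4, lc_repeats=3)
```

Each subset would then be passed to the learner for fitting, and the test errors averaged per training-set size to give one point on the curve.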
-
sparsify(n_sparse=None, sparse_mode='fps')
Select representative data points using the design matrix
- Parameters
n_sparse (int) – number of representative points. n_sparse == None means 5% of the data; n_sparse < 0 means no sparsification.
sparse_mode (str) – method to use for sparsification: [cur], [fps], or [random]
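The following sketch illustrates the `n_sparse` convention together with a generic greedy farthest point sampling (FPS) loop on plain tuples. Both functions are hypothetical helpers written with the standard library only; the rounding used for "5% of the data" and the choice of starting point are assumptions, and asaplib's own FPS implementation is not reproduced here.

```python
import math

def resolve_n_sparse(n_samples, n_sparse=None):
    """Documented convention: None -> 5% of the data, negative -> keep everything."""
    if n_sparse is None:
        return max(1, n_samples // 20)  # rounding rule is an assumption
    if n_sparse < 0:
        return n_samples
    return min(n_sparse, n_samples)

def farthest_point_sampling(points, n_select, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    selected = [start]
    # Distance from every point to its nearest selected point so far
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_select:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return selected

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
chosen = farthest_point_sampling(points, resolve_n_sparse(len(points), 3))
```

FPS favours points that are spread out, so the near-duplicate at `(0.1, 0.0)` is not among the three selected here.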
asaplib.data.xyz module
ASAPXYZ class for handling atomic coordinate input and compute/output
-
class asaplib.data.xyz.ASAPXYZ(fxyz=None, stride=1, periodic=True, fileformat=None)
Bases: object
Extended xyz class
- Parameters
fxyz (string_like) – the path to the extended xyz file
fmat (string_like) – the name of the descriptors in the extended xyz file
use_atomic_desc (bool) – return the descriptors for each atom, read from the xyz file
stride (int) – the stride when reading the xyz file
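As a small illustration of a read stride, the sketch below keeps every `stride`-th frame of a list, starting from the first. It assumes that this is what `stride` means here; `select_frames` is a hypothetical helper and the frames are stand-in strings rather than real structures.

```python
def select_frames(frames, stride=1):
    """Hypothetical helper: keep every stride-th frame, starting from the first."""
    return frames[slice(0, None, stride)]

frames = ["frame_%d" % i for i in range(7)]
kept = select_frames(frames, stride=3)
```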
-
compute_atomic_descriptors(desc_spec_dict={}, sbs=[], tag=None, n_process=1)
Compute the atomic descriptors for selected frames
- Parameters
desc_spec_dict (a list of dictionaries) – contains information on the descriptors to use, e.g.
    atomic_desc_dict = {
        "firstsoap": {"type": 'SOAP', "species": [1, 6, 7, 8], "cutoff": 2.0, "atom_gaussian_width": 0.2, "n": 4, "l": 4}}
sbs (array, integer) – the index of the subset of structures to compute
-
compute_global_descriptors(desc_spec_dict={}, sbs=[], keep_atomic=False, tag=None, n_process=1)
Compute the global descriptors for selected frames
- Parameters
desc_spec_dict (dictionaries) – specifies which global descriptor to use, e.g.
    {'global_desc1': {"type": 'CM'}}
    # or
    {'global_desc2': {'atomic_descriptor': atomic_desc_dict,
                      'reducer_function': reducer_dict}}
  where
    atomic_desc_dict = {
        "firstsoap": {"type": 'SOAP', "species": [1, 6, 7, 8], "cutoff": 2.0, "atom_gaussian_width": 0.2, "n": 4, "l": 4}}
    reducer_dict = {
        'first_reducer': {'reducer_type': reducer_type, 'zeta': zeta, 'species': species, 'element_wise': element_wise}}
sbs (array, integer) – list of the indexes of the subset
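The nested specification is easier to read once assembled in one place. The sketch below builds the second form from the docstring, a global descriptor made of an atomic descriptor plus a reducer. The SOAP settings are copied from the documentation; the four reducer values (`'average'`, `1`, the species list, `False`) are placeholders chosen for illustration and are assumptions, not documented defaults.

```python
# Atomic descriptor specification, copied from the docstring
atomic_desc_dict = {
    "firstsoap": {"type": "SOAP", "species": [1, 6, 7, 8], "cutoff": 2.0,
                  "atom_gaussian_width": 0.2, "n": 4, "l": 4}
}

# Placeholder reducer settings: all four values are assumptions
reducer_dict = {
    "first_reducer": {"reducer_type": "average", "zeta": 1,
                      "species": [1, 6, 7, 8], "element_wise": False}
}

# Global descriptor built from the atomic descriptor and the reducer
desc_spec_dict = {
    "global_desc2": {"atomic_descriptor": atomic_desc_dict,
                     "reducer_function": reducer_dict}
}
```

A dictionary of this shape is what would be passed as `desc_spec_dict`; the outer key (`global_desc2` here) is a user-chosen name for the descriptor.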
-
fetch_computed_descriptors(desc_dict_keys=[], sbs=[])
Fetch the computed descriptors for selected frames
- Parameters
desc_dict_keys (a list (str-like) of keys) – for which computed descriptors to fetch
sbs (array, integer) –
- Returns
desc
- Return type
np.matrix [n_frame, n_desc]
-
get_atomic_descriptors(desc_name_list=[], species_name=None)
Extract the descriptor array from each frame
- Parameters
desc_name_list (a list of strings) – the name of the .info[] in the extended xyz file
species_name (int) – the atomic number of the species selected. Only the descriptors of atoms of the specified species will be returned. species_name=None means all atoms are selected.
- Returns
atomic_desc
- Return type
np.matrix
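The effect of `species_name` can be pictured as a row filter on the per-atom descriptor array. The sketch below shows that filter on plain lists; `filter_by_species` is a hypothetical helper, the descriptor values are made up, and the real method returns an np.matrix rather than a list.

```python
def filter_by_species(atomic_numbers, atomic_desc, species_name=None):
    """Hypothetical helper: keep descriptor rows of atoms with the given atomic number."""
    if species_name is None:
        return list(atomic_desc)  # no filter: every atom is selected
    return [row for z, row in zip(atomic_numbers, atomic_desc) if z == species_name]

# A toy water molecule (O, H, H) with made-up two-component descriptors
atomic_numbers = [8, 1, 1]
atomic_desc = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
hydrogen_desc = filter_by_species(atomic_numbers, atomic_desc, species_name=1)
```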
-
get_atomic_property(y_key=None, extensive=False, sbs=[], species_name=None)
Extract the property array from each atom
- Parameters
y_key (string_like) – the name of the property in the extended xyz file
sbs (array, integer) –
species_name (int) – the atomic number of the species selected. Only the properties of atoms of the specified species will be returned. species_name=None means all atoms are selected.
- Returns
y_all
- Return type
array [N_atoms]
-
get_descriptors(desc_name_list=[], use_atomic_desc=False, species_name=None)
Extract the descriptor array from each frame
- Parameters
desc_name_list (a list of strings) – the name of the .info[] in the extended xyz file
use_atomic_desc (bool) – return the descriptors for each atom, read from the xyz file
species_name (int) – the atomic number of the species selected. Only the descriptors of atoms of the specified species will be returned. species_name=None means all atoms are selected.
- Returns
desc (np.matrix)
atomic_desc (np.matrix)
-
get_property(y_key=None, extensive=False, sbs=[])
Extract the specified property from selected frames
- Parameters
y_key (string_like) – the name of the property in the extended xyz file
sbs (array, integer) –
- Returns
y_all
- Return type
array [N_samples]
-
load_properties(filename, header='infer', prefix='X', **kwargs)
Load properties from a CSV file
Read in the CSV file and save the columns to the info dictionary of the frames.
- Parameters
filename (str) – Name of the CSV file.
header (int) – Row number of the header. Defaults to use the first row unless explicit names for the columns are given
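The idea of saving CSV columns to the info dictionary of each frame can be sketched with the standard library alone. The example below assumes one CSV row per frame, in frame order, with a header row and purely numeric columns; `load_properties_sketch` is a hypothetical helper, and plain dictionaries stand in for the frames' info dictionaries.

```python
import csv
import io

def load_properties_sketch(csv_text, frame_infos):
    """Hypothetical helper: copy each CSV column into the per-frame info dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row is used as the header
    for info, row in zip(frame_infos, reader):
        for name, value in row.items():
            info[name] = float(value)  # assumes purely numeric columns
    return frame_infos

csv_text = "energy,volume\n-1.5,10.0\n-2.5,12.0\n"
frame_infos = load_properties_sketch(csv_text, [{}, {}])
```

After the call, each dictionary holds the values of its own row keyed by column name, which is the layout later lookups by property name (such as `y_key`) would rely on.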
-
set_atomic_descriptors(atomic_desc=None, atomic_desc_name=None, species_name=None)
Write the descriptor array to the atom object
- Parameters
desc (np.matrix, shape=[n_descriptors, n_atoms]) –
-
set_descriptors(desc=None, desc_name=None)
Write the descriptor array to the atom object
- Parameters
desc (np.matrix, shape=[n_descriptors, n_frames]) –
-
write(filename, sbs=[], save_acronym=False, wrap_output=True)
Write the selected frames or all the frames to an xyz file
- Parameters
filename (str) –
sbs (array, integer) –
-
write_atomic_descriptor_matrix(filename, desc_name, sbs=[], comment='')
Write the selected descriptor matrix in a matrix format to file
- Parameters
filename (str) –
desc_name (str) – Name of the properties/descriptors to write
sbs (array, integer) –
comment (str) –
-
write_chemiscope(filename, sbs=None, save_acronym=False, cutoff=None, wrap_output=True)
Write the selected frames or all the frames to ChemiScope JSON
- Parameters
filename (str) –
sbs (array, integer) –
cutoff (float) – generate cutoff for atomic environments, set to None to disable atomic environments
-
write_computed_descriptors(filename, desc_dict_keys=[], sbs=[], comment='')
Write the computed descriptors for selected frames
- Parameters
desc_dict_keys (list) – a list (str-like) of keys for which computed descriptors to fetch
sbs (array, integer) –
- Returns
desc
- Return type
np.matrix [n_frame, n_desc]