matminer.featurizers.site package

Subpackages

Submodules

matminer.featurizers.site.bonding module

Site featurizers based on bonding.

class matminer.featurizers.site.bonding.AverageBondAngle(method)

Bases: BaseFeaturizer

Determines the average bond angle of a specific site with its nearest neighbors using one of pymatgen’s NearNeighbor classes. Neighbors that are adjacent to each other are stored and the angles between them are computed. The ‘average bond angle’ of a site is the mean bond angle between all its nearest neighbors.
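As a rough illustration of the idea, the sketch below averages, for each neighbor, the smallest angle it forms with any other neighbor. Treating "adjacent" as "closest in angle" is an assumption made here for illustration, and the function name and the plain coordinate tuples are hypothetical stand-ins rather than the matminer or pymatgen API.

```python
import math

def average_min_bond_angle(center, neighbors):
    """Average, over neighbors, of the smallest angle to any other neighbor (radians)."""
    vecs = [[n[k] - center[k] for k in range(3)] for n in neighbors]

    def angle(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        # Clamp to [-1, 1] to guard against floating-point round-off.
        return math.acos(max(-1.0, min(1.0, dot / norm)))

    minima = []
    for i, a in enumerate(vecs):
        # Smallest angle between neighbor i and every other neighbor.
        minima.append(min(angle(a, b) for j, b in enumerate(vecs) if j != i))
    return sum(minima) / len(minima)

# Hypothetical octahedral site: six neighbors along +/-x, +/-y, +/-z.
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
mean_angle = average_min_bond_angle((0, 0, 0), octahedron)
```

For the ideal octahedron every adjacent pair is perpendicular, so the result is a right angle. The sketch needs at least two neighbors.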

__init__(method)

Initialize featurizer

Args:
method (NearNeighbor) - subclass under NearNeighbor used to compute nearest neighbors

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(strc, idx)

Get average bond angle of a site and all its nearest neighbors.

Args:

strc (Structure): Pymatgen Structure object
idx (int): index of target site in structure object

Returns:

average bond angle (list)

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.bonding.AverageBondLength(method)

Bases: BaseFeaturizer

Determines the average bond length between one specific site and all its nearest neighbors using one of pymatgen’s NearNeighbor classes. These nearest neighbor calculators return weights related to the proximity of each neighbor to this site. ‘Average bond length’ of a site is the weighted average of the distance between site and all its nearest neighbors.
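The weighted average described above can be sketched in a few lines. The function name and the example distances and weights are hypothetical; in practice both would come from a NearNeighbor calculator.

```python
def weighted_average_bond_length(distances, weights):
    """Weighted mean of neighbor distances; weights need not be normalized."""
    total_weight = sum(weights)
    return sum(d * w for d, w in zip(distances, weights)) / total_weight

# Two close neighbors with full weight and one farther neighbor with half weight.
avg_length = weighted_average_bond_length([2.0, 2.0, 3.0], [1.0, 1.0, 0.5])
```

Here the farther neighbor contributes less, so the average (2.2) stays closer to the short bonds than an unweighted mean would.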

__init__(method)

Initialize featurizer

Args:

method (NearNeighbor) - subclass under NearNeighbor used to compute nearest neighbors

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(strc, idx)

Get weighted average bond length of a site and all its nearest neighbors.

Args:

strc (Structure): Pymatgen Structure object
idx (int): index of target site in structure object

Returns:

average bond length (list)

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.bonding.BondOrientationalParameter(max_l=10, compute_w=False, compute_w_hat=False)

Bases: BaseFeaturizer

Averages of spherical harmonics of local neighbors

Bond Orientational Parameters (BOPs) describe the local environment around an atom by considering the local symmetry of the bonds as computed using spherical harmonics. To create descriptors that are invariant to rotating the coordinate system, we use the average of all spherical harmonics of a certain degree, following the approach of Steinhardt et al. We weight the contribution of each neighbor by the solid angle of the Voronoi tessellation (see Mickel et al., https://aip.scitation.org/doi/abs/10.1063/1.4774084, for further discussion). The weighting scheme makes these descriptors vary smoothly with small distortions of a crystal structure.

In addition to the average spherical harmonics, this class can also compute the W and \hat{W} parameters proposed by Steinhardt et al.

Attributes:

BOOP Q l=<n> - Average spherical harmonic for a certain degree, n.
BOOP W l=<n> - W parameter for a certain degree of spherical harmonic, n.
BOOP What l=<n> - \hat{W} parameter for a certain degree of spherical harmonic, n.

References:

Steinhardt et al., PRB (1983)
Seko et al., PRB (2017)
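A minimal sketch of a weighted Steinhardt Q_l, assuming the standard definition Q_l = sqrt(4π/(2l+1) Σ_m |q_lm|²), with q_lm the weight-averaged spherical harmonics of the bond directions. By the spherical-harmonic addition theorem this equals sqrt(Σ_ij w_i w_j P_l(cos γ_ij)) / Σ_i w_i, so no explicit spherical harmonics are needed. The function names are hypothetical, the weights stand in for Voronoi solid angles, and nothing here is claimed about how matminer evaluates the quantity internally.

```python
import math

def legendre(l, x):
    """Legendre polynomial P_l(x) from the Bonnet recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p

def steinhardt_q(l, bonds, weights=None):
    """Weighted Steinhardt Q_l for bond vectors pointing from the site to its neighbors."""
    if weights is None:
        weights = [1.0] * len(bonds)
    units = []
    for bond in bonds:
        norm = math.sqrt(sum(c * c for c in bond))
        units.append([c / norm for c in bond])
    acc = 0.0
    for u, wu in zip(units, weights):
        for v, wv in zip(units, weights):
            # Cosine of the angle between the two bond directions.
            cos_gamma = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
            acc += wu * wv * legendre(l, cos_gamma)
    # max() guards against a tiny negative sum caused by round-off.
    return math.sqrt(max(acc, 0.0)) / sum(weights)

# Octahedral (simple-cubic-like) environment with equal weights.
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
q4 = steinhardt_q(4, octahedron)
q6 = steinhardt_q(6, octahedron)
```

For this environment the sketch gives Q_4 = sqrt(21)/6 (about 0.764) and Q_6 = sqrt(1/8) (about 0.354), the values usually quoted for a simple cubic first shell.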

__init__(max_l=10, compute_w=False, compute_w_hat=False)

Initialize the featurizer

Args:

max_l (int) - Maximum degree of spherical harmonic to consider
compute_w (bool) - Whether to compute Ws as well
compute_w_hat (bool) - Whether to compute What

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(strc, idx)

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:

x: input data to featurize (type depends on featurizer).

Returns:

(list) one or more features.

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.site.bonding.get_wigner_coeffs(l)

Get the list of non-zero Wigner 3j triplets.

Args:

l (int): Desired l

Returns:
List of tuples that contain:
  • ((int)) m coordinates of the triplet

  • (float) Wigner coefficient

matminer.featurizers.site.chemical module

Site featurizers based on local chemical information, rather than geometry alone.

class matminer.featurizers.site.chemical.ChemicalSRO(nn, includes=None, excludes=None, sort=True)

Bases: BaseFeaturizer

Chemical short range ordering, deviation of local site and nominal structure compositions

Chemical SRO features to evaluate the deviation of local chemistry from the nominal composition of the structure.

A local bonding preference is computed using f_el = N_el/(sum of N_el) - c_el, where N_el is the number of neighbors of a given element type around the target site, sum of N_el is the total over all element types (i.e., the coordination number), and c_el is the fraction of that element in the composition of the entire structure. A positive f_el indicates that “bonding” with the specific element is favored, at least at the target site; a negative f_el indicates that the “bonding” is not favored, at least at the target site.

Note that ChemicalSRO is only featurized for elements identified by “fit” (see following), thus “fit” must be called before “featurize”, or else an error will be raised.
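The f_el formula can be illustrated with plain Python. The function name and the element lists are hypothetical; the real featurizer obtains neighbors from a pymatgen near-neighbor finder and the composition from the structure recorded during fit.

```python
from collections import Counter

def chemical_sro(neighbor_elements, composition):
    """f_el = N_el / (sum of N_el) - c_el for each element in `composition`."""
    counts = Counter(neighbor_elements)
    coordination = len(neighbor_elements)
    return {el: counts.get(el, 0) / coordination - frac
            for el, frac in composition.items()}

# A site in a 50/50 Ni-Al structure whose four neighbors are three Ni and one Al.
f = chemical_sro(["Ni", "Ni", "Ni", "Al"], {"Ni": 0.5, "Al": 0.5})
```

Here f["Ni"] is 0.25 (Ni neighbors are over-represented) and f["Al"] is -0.25. When the composition fractions sum to one, the f_el values sum to zero.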

Features:
CSRO__[nn method]_[element] - The Chemical SRO of a site computed based on neighbors determined with a certain NN-detection method for a certain element.

__init__(nn, includes=None, excludes=None, sort=True)

Initialize the featurizer

Args:
nn (NearestNeighbor): instance of one of pymatgen’s NearestNeighbor classes.
includes (array-like or str): elements included to calculate CSRO.
excludes (array-like or str): elements excluded to calculate CSRO.
sort (bool): whether to sort elements by Mendeleev number.

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get CSRO features of site with given index in input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure.

Returns:

(list of floats): Chemical SRO features for each element.

fit(X, y=None)

Identify the elements to be included in the subsequent featurization by intersecting the elements present in the passed structures with those explicitly included (or excluded) in __init__. Only elements in self.el_list_ will be featurized. In addition, the compositions of the passed structures are stored in the dict self.el_amt_dict_, which avoids recomputing the composition when featurizing multiple sites in the same structure.

Args:

X (array-like): containing Pymatgen structures and sites. Supported forms:

  • 2D array-like object, e.g. [[struct, site], [struct, site], …] or np.array([[struct, site], [struct, site], …])

  • Pandas dataframe, e.g. df[[‘struct’, ‘site’]]

y : unused (added for consistency with overridden method signature)

Returns:

self

static from_preset(preset, **kwargs)

Use one of the standard instances of a given NearNeighbor class.

Args:

preset (str): preset type (“VoronoiNN”, “JmolNN”, “MinimumDistanceNN”, “MinimumOKeeffeNN”, or “MinimumVIRENN”).
**kwargs: arguments passed through to the NearNeighbor class.

Returns:

ChemicalSRO from a preset.

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.chemical.EwaldSiteEnergy(accuracy=None)

Bases: BaseFeaturizer

Compute site energy from Coulombic interactions

User notes:
  • This class uses the charges that are already defined for the structure.

  • Ewald summations can be expensive. If you are evaluating every site in many large structures, run all of the sites for each structure at the same time. We cache the Ewald result for the structure that was run last, so looping over structures in the outer loop and sites in the inner loop is faster than looping over sites and then structures.
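To see why loop order matters with a "last structure" cache, consider the toy class below. It is a hypothetical stand-in, not the real EwaldSiteEnergy: structures are plain tuples and the "expensive" step is a trivial list computation, but the caching pattern matches the one described in the note.

```python
class CachedSiteEnergy:
    """Toy featurizer that caches one expensive per-structure result."""

    def __init__(self):
        self.expensive_calls = 0
        self._cached_structure = None
        self._cached_energies = None

    def _expensive_sum(self, structure):
        # Stand-in for the Ewald summation over the whole structure.
        self.expensive_calls += 1
        return [10.0 * charge for charge in structure]

    def featurize(self, structure, idx):
        # Recompute only when a different structure object is passed in.
        if structure is not self._cached_structure:
            self._cached_energies = self._expensive_sum(structure)
            self._cached_structure = structure
        return [self._cached_energies[idx]]

structures = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]

fast = CachedSiteEnergy()
for s in structures:              # structures outer, sites inner
    for i in range(len(s)):
        fast.featurize(s, i)

slow = CachedSiteEnergy()
for i in range(3):                # sites outer, structures inner
    for s in structures:
        slow.featurize(s, i)
```

With structures in the outer loop the expensive step runs once per structure (2 calls); with sites in the outer loop the cache is invalidated on every call (6 calls).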

Features:

ewald_site_energy - Energy for the site computed from Coulombic interactions

__init__(accuracy=None)
Args:

accuracy (int): Accuracy of Ewald summation, number of decimal places

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(strc, idx)
Args:

strc (Structure): Pymatgen Structure object.
idx (int): index of target site in structure.

Returns:

([float]) - Electrostatic energy of the site

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.chemical.LocalPropertyDifference(data_source=<matminer.utils.data.MagpieData object>, weight='area', properties=('Electronegativity', ), signed=False)

Bases: BaseFeaturizer

Differences in elemental properties between site and its neighboring sites.

Uses the Voronoi tessellation of the structure to determine the neighbors of the site, and assigns each neighbor (n) a weight (A_n) that corresponds to the area of the facet on the tessellation corresponding to that neighbor. The local property difference is then computed by \frac{\sum_n {A_n |p_n - p_0|}}{\sum_n {A_n}}, where p_n is the property (e.g., atomic number) of a neighbor and p_0 is the property of the site. If the signed parameter is set to True, the signed difference of the properties is returned instead of the absolute difference.
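A minimal sketch of the area-weighted difference, with hypothetical inputs in place of a real Voronoi tessellation. The sign convention for the signed variant (neighbor minus site) is an assumption read off the formula above.

```python
def local_property_difference(p_site, neighbor_props, facet_areas, signed=False):
    """Area-weighted mean of |p_n - p_0| (or of p_n - p_0 when signed=True)."""
    total_area = sum(facet_areas)
    diffs = [(p - p_site) if signed else abs(p - p_site) for p in neighbor_props]
    return sum(a * d for a, d in zip(facet_areas, diffs)) / total_area

# Site property 2.0; one neighbor at 3.0 (facet area 1.0), one at 1.0 (facet area 3.0).
absolute = local_property_difference(2.0, [3.0, 1.0], [1.0, 3.0])
with_sign = local_property_difference(2.0, [3.0, 1.0], [1.0, 3.0], signed=True)
```

Both neighbors differ from the site by one unit, so the absolute value is 1.0, while the signed value is -0.5 because the neighbor with the larger facet has the lower property.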

Features:
  • “local property difference in [property]” - Weighted average of differences between an elemental property of a site and that of each of its neighbors, weighted by size of face on Voronoi tessellation

References:

Ward et al., PRB (2017)

__init__(data_source=<matminer.utils.data.MagpieData object>, weight='area', properties=('Electronegativity', ), signed=False)

Initialize the featurizer

Args:
data_source (AbstractData) - Class from which to retrieve elemental properties
weight (str) - What aspect of each Voronoi facet to use to weigh each neighbor (see VoronoiNN)
properties ([str]) - List of properties to use (default=[‘Electronegativity’])
signed (bool) - whether to return absolute difference or signed difference of properties (default=False (absolute difference))

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(strc, idx)

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:

x: input data to featurize (type depends on featurizer).

Returns:

(list) one or more features.

static from_preset(preset)

Create a new LocalPropertyDifference class according to a preset

Args:

preset (str) - Name of preset

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.chemical.SiteElementalProperty(data_source=None, properties=('Number',))

Bases: BaseFeaturizer

Elemental properties of atom on a certain site

Features:

site [property] - Elemental property for this site

References:

Seko et al., PRB (2017)
Schmidt et al., Chem Mater. (2017)

__init__(data_source=None, properties=('Number',))

Initialize the featurizer

Args:

data_source (AbstractData): Tool used to look up elemental properties
properties ([string]): List of properties to use for features

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(strc, idx)

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:

x: input data to featurize (type depends on featurizer).

Returns:

(list) one or more features.

static from_preset(preset)

Create the class with pre-defined settings

Args:

preset (string): Desired preset

Returns:

SiteElementalProperty initialized with desired settings

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.site.external module

Site featurizers requiring external libraries for core functionality.

class matminer.featurizers.site.external.SOAP(rcut, nmax, lmax, sigma, periodic, rbf='gto', crossover=True, compression=None)

Bases: BaseFeaturizer

Smooth overlap of atomic positions (interface via DScribe).

Class for generating a partial power spectrum from Smooth Overlap of Atomic Positions (SOAP). This implementation uses real (tesseral) spherical harmonics as the angular basis set and provides two orthonormalized alternatives for the radial basis functions: spherical primitive Gaussian type orbitals (“gto”) or the polynomial basis set (“polynomial”). By default the faster gto-basis is used. Please see the DScribe SOAP documentation for more details.

Note that SOAP is only featurized for elements identified by “fit” (see following), thus “fit” must be called before “featurize”, or else an error will be raised.

Based originally on the following publications:

“On representing chemical environments”, Albert P. Bartók, Risi Kondor, and Gábor Csányi, Phys. Rev. B 87, 184115 (2013), https://doi.org/10.1103/PhysRevB.87.184115

“Comparing molecules and solids across structural and alchemical space”, Sandip De, Albert P. Bartók, Gábor Csányi and Michele Ceriotti, Phys. Chem. Chem. Phys. 18, 13754 (2016), https://doi.org/10.1039/c6cp00415f

Implementation (and some documentation) originally based on DScribe: https://github.com/SINGROUP/dscribe.

“DScribe: Library of descriptors for machine learning in materials science”, Himanen, L., Jäger, M. O. J., Morooka, E. V., Federici Canova, F., Ranawat, Y. S., Gao, D. Z., Rinke, P. and Foster, A. S., Computer Physics Communications, 106949 (2019), https://doi.org/10.1016/j.cpc.2019.106949

Args:
rcut (float): A cutoff for local region in angstroms. Should be bigger than 1 angstrom.
nmax (int): The number of radial basis functions.
lmax (int): The maximum degree of spherical harmonics.
sigma (float): The standard deviation of the Gaussians used to expand the atomic density.
rbf (str): The radial basis functions to use. The available options are:

  • “gto”: Spherical Gaussian type orbitals defined as g_{nl}(r) = \sum_{n'=1}^{n_\mathrm{max}}\,\beta_{nn'l} r^l e^{-\alpha_{n'l}r^2}

  • “polynomial”: Polynomial basis defined as g_{n}(r) = \sum_{n'=1}^{n_\mathrm{max}}\,\beta_{nn'} (r-r_\mathrm{cut})^{n'+2}

periodic (bool): Determines whether the system is considered to be periodic.
crossover (bool): Determines if crossover of atomic types should be included in the power spectrum. If enabled, the power spectrum is calculated over all unique species combinations Z and Z’. If disabled, the power spectrum does not contain cross-species information and is only run over each unique species Z. Turned on by default to correspond to the original definition. Deprecated in DScribe 2.1 and replaced by the general ‘compression’ parameter (see below).
compression (dict): Contains the options which specify the feature compression to apply. See DScribe documentation for details.

__init__(rcut, nmax, lmax, sigma, periodic, rbf='gto', crossover=True, compression=None)

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:

x: input data to featurize (type depends on featurizer).

Returns:

(list) one or more features.

fit(X, y=None)

Fit the SOAP featurizer to a dataframe.

Args:

X ([SiteCollection]): For example, a list of pymatgen Structures.
y : unused (added for consistency with overridden method signature)

Returns:

self

classmethod from_preset(preset)

Create a SOAP featurizer object from sensible or published presets.

Args:

preset (str): Choose from:
“formation energy”: Preset used for formation energy prediction in the original DScribe paper.

Returns:

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.site.fingerprint module

Site featurizers that fingerprint a site using local geometry.

class matminer.featurizers.site.fingerprint.AGNIFingerprints(directions=(None, 'x', 'y', 'z'), etas=None, cutoff=8)

Bases: BaseFeaturizer

Product integral of RDF and Gaussian window function, from Botu et al.

Integral of the product of the radial distribution function and a Gaussian window function. Originally used by Botu et al. to fit empirical potentials. These features come in two forms: atomic fingerprints and direction-resolved fingerprints.

Atomic fingerprints describe the local environment of an atom and are computed using the function:

A_i(\eta) = \sum\limits_{j \ne i} e^{-(\frac{r_{ij}}{\eta})^2} f(r_{ij})

where i is the index of the atom, j is the index of a neighboring atom, \eta is a scaling parameter (the window width), r_{ij} is the distance between atoms i and j, and f(r) is a cutoff function where f(r) = 0.5[\cos(\frac{\pi r}{R_c}) + 1] if r < R_c and 0 otherwise.

The direction-resolved fingerprints are computed using

V_i^k(\eta) = \sum\limits_{j \ne i} \frac{r_{ij}^k}{r_{ij}} e^{-(\frac{r_{ij}}{\eta})^2} f(r_{ij})

where r_{ij}^k is the k^{th} component of \mathbf{r}_i - \mathbf{r}_j.

TODO: Differentiate between different atom types (maybe as another class)
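The two fingerprint formulas can be sketched for an isolated cluster of points. This is an illustrative, non-periodic toy: the function names are hypothetical and the neighbor positions are plain tuples, whereas the featurizer itself works on periodic pymatgen structures.

```python
import math

def cutoff_fn(r, r_c):
    """f(r) = 0.5 [cos(pi r / R_c) + 1] for r < R_c, and 0 otherwise."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0

def agni_fingerprint(center, others, eta, r_c, direction=None):
    """A_i(eta) when direction is None, else V_i^k(eta) for k in 'x', 'y', 'z'."""
    axis = {"x": 0, "y": 1, "z": 2}
    total = 0.0
    for pos in others:
        # Components of r_i - r_j, following the definition of r_ij^k above.
        d = [center[k] - pos[k] for k in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        if r == 0.0 or r >= r_c:
            continue
        term = math.exp(-((r / eta) ** 2)) * cutoff_fn(r, r_c)
        if direction is not None:
            term *= d[axis[direction]] / r
        total += term
    return total

# Two neighbors placed symmetrically along x at unit distance.
neighbors = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
a_val = agni_fingerprint((0.0, 0.0, 0.0), neighbors, eta=1.0, r_c=2.0)
v_x = agni_fingerprint((0.0, 0.0, 0.0), neighbors, eta=1.0, r_c=2.0, direction="x")
```

In this symmetric arrangement the atomic fingerprint is 2 · e^{-1} · f(1) = e^{-1}, while the x-resolved fingerprint cancels to zero, which is why the direction-resolved form is useful for force-like quantities.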

__init__(directions=(None, 'x', 'y', 'z'), etas=None, cutoff=8)
Args:
directions (iterable): List of directions for the fingerprints. Can be one or more of None, ‘x’, ‘y’, or ‘z’
etas (iterable of floats): List of which window widths to compute
cutoff (float): Cutoff distance (Angstroms)

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:

x: input data to featurize (type depends on featurizer).

Returns:

(list) one or more features.

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.fingerprint.ChemEnvSiteFingerprint(cetypes, strategy, geom_finder, max_csm=8, max_dist_fac=1.41)

Bases: BaseFeaturizer

Resemblance of given sites to ideal environments

Site fingerprint computed from pymatgen’s ChemEnv package that provides resemblance percentages of a given site to ideal environments.

Args:

cetypes ([str]): chemical environments (CEs) to be considered.
strategy (ChemenvStrategy): ChemEnv neighbor-finding strategy.
geom_finder (LocalGeometryFinder): ChemEnv local geometry finder.
max_csm (float): maximum continuous symmetry measure (CSM; default of 8 taken from chemenv). Note that any CSM larger than max_csm will be set to max_csm in order to avoid negative values (i.e., all features are constrained to be between 0 and 1).
max_dist_fac (float): maximum distance factor (default: 1.41).

__init__(cetypes, strategy, geom_finder, max_csm=8, max_dist_fac=1.41)

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get ChemEnv fingerprint of site with given index in input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure struct.

Returns:

(numpy array): resemblance fraction of target site to ideal local environments.

static from_preset(preset)

Use a standard collection of CE types and choose your ChemEnv neighbor-finding strategy.

Args:

preset (str): preset types (“simple” or “multi_weights”).

Returns:

ChemEnvSiteFingerprint object from a preset.

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.fingerprint.CrystalNNFingerprint(op_types, chem_info=None, **kwargs)

Bases: BaseFeaturizer

A local order parameter fingerprint for periodic crystals.

The fingerprint represents the value of various order parameters for the site. The “wt” order parameter describes how consistent a site is with a certain coordination number. The remaining order parameters are computed by multiplying the “wt” for that coordination number with the OP value.

The chem_info parameter can be used to also get chemical descriptors that describe differences in some chemical parameter (e.g., electronegativity) between the central site and the site neighbors.

__init__(op_types, chem_info=None, **kwargs)

Initialize the CrystalNNFingerprint. Use the from_preset() function to use default params.

Args:

op_types (dict): a dict of coordination number (int) to a list of str representing the order parameter types
chem_info (dict): a dict of chemical properties (e.g., atomic mass) to dictionaries that map an element to a value (e.g., chem_info[“Pauling scale”][“O”] = 3.44)
**kwargs: other settings to be passed into CrystalNN class

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get crystal fingerprint of site with given index in input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure.

Returns:

list of weighted order parameters of target site.

static from_preset(preset: Literal['cn', 'ops'], **kwargs)

Use preset parameters to get the fingerprint.

Args:

preset (‘cn’ | ‘ops’): Initializes the featurizer to use coordination number (‘cn’) or structural order parameters like octahedral, tetrahedral (‘ops’).
**kwargs: other settings to be passed into CrystalNN class

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.fingerprint.OPSiteFingerprint(target_motifs=None, dr=0.1, ddr=0.01, ndr=1, dop=0.001, dist_exp=2, zero_ops=True)

Bases: BaseFeaturizer

Local structure order parameters computed from a site’s neighbor env.

For each order parameter, we determine the neighbor shell that complies with the expected coordination number. For example, we find the 4 nearest neighbors for the tetrahedral OP, the 6 nearest for the octahedral OP, and the 8 nearest neighbors for the bcc OP. If we don’t find such a shell, the OP is either set to zero or evaluated with the shell of the next largest observed coordination number.

Args:

target_motifs (dict): target op or motif type where keys are corresponding coordination numbers (e.g., {4: “tetrahedral”}).
dr (float): width for binning neighbors in unit of relative distances (= distance/nearest neighbor distance). The binning is necessary to make the neighbor-finding step robust against small numerical variations in neighbor distances (default: 0.1).
ddr (float): variation of width for finding stable OP values.
ndr (int): number of width variations for each variation direction (e.g., ndr = 0 only uses the input dr, whereas ndr = 1 tests dr - ddr, dr, and dr + ddr).
dop (float): binning width to compute histogram for each OP if ndr > 0.
dist_exp (int): exponent for distance factor to multiply order parameters with that penalizes (large) variations in distances in a given motif. 0 will switch the option off (default: 2).
zero_ops (boolean): set an OP to zero if there is no neighbor shell that complies with the expected coordination number of a given OP (e.g., CN=4 for tetrahedron; default: True).
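The relationship between dr, ddr, and ndr can be made concrete with a small helper. The function name is hypothetical, and extending the pattern symmetrically to ndr > 1 (steps of k · ddr on each side) is an assumption; the description above only spells out the ndr = 0 and ndr = 1 cases.

```python
def dr_candidates(dr, ddr, ndr):
    """Bin widths to test: dr shifted by k * ddr for k = -ndr, ..., ndr."""
    return [dr + k * ddr for k in range(-ndr, ndr + 1)]

only_input = dr_candidates(0.1, 0.01, 0)   # just the input width
three_widths = dr_candidates(0.1, 0.01, 1)  # dr - ddr, dr, dr + ddr
```

With the documented defaults (dr=0.1, ddr=0.01, ndr=1) this yields three widths centered on 0.1.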

__init__(target_motifs=None, dr=0.1, ddr=0.01, ndr=1, dop=0.001, dist_exp=2, zero_ops=True)

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get OP fingerprint of site with given index in input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure.

Returns:

opvals (numpy array): order parameters of target site.

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.fingerprint.VoronoiFingerprint(cutoff=6.5, use_symm_weights=False, symm_weights='solid_angle', stats_vol=None, stats_area=None, stats_dist=None)

Bases: BaseFeaturizer

Voronoi tessellation-based features around target site.

Calculate the following sets of features based on Voronoi tessellation analysis around the target site:

Voronoi indices

n_i denotes the number of i-edged facets, and i is in the range of 3-10. E.g., for a bcc lattice, the Voronoi indices are [0,6,0,8,…]; for an fcc/hcp lattice, the Voronoi indices are [0,12,0,0,…]; for icosahedra, the Voronoi indices are [0,0,12,0,…].

i-fold symmetry indices

Computed as n_i/sum(n_i), with i in the range of 3-10; they reflect the strength of i-fold symmetry in local sites. E.g., for a bcc lattice, the i-fold symmetry indices are [0,6/14,0,8/14,…], indicating that both 4-fold and a stronger 6-fold symmetry are present; for an fcc/hcp lattice, the i-fold symmetry indices are [0,1,0,0,…], indicating that only 4-fold symmetry is present; for icosahedra, the i-fold symmetry indices are [0,0,1,0,…], indicating that only 5-fold symmetry is present.

Weighted i-fold symmetry indices

Computed if use_symm_weights = True.

Voronoi volume

Total volume of the Voronoi polyhedron around the target site.

Voronoi volume statistics of sub_polyhedra formed by each facet + center

stats_vol = [‘mean’, ‘std_dev’, ‘minimum’, ‘maximum’]

Voronoi area

Total area of the Voronoi polyhedron around the target site.

Voronoi area statistics of the facets

stats_area = [‘mean’, ‘std_dev’, ‘minimum’, ‘maximum’]

Voronoi nearest-neighboring distance statistics

stats_dist = [‘mean’, ‘std_dev’, ‘minimum’, ‘maximum’]
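The Voronoi indices and i-fold symmetry indices follow directly from the edge counts of the polyhedron's facets. The sketch below assumes those edge counts are already known (here typed in by hand for a bcc cell); the function names are hypothetical, and the tessellation itself is not computed.

```python
def voronoi_indices(facet_edge_counts, i_min=3, i_max=10):
    """n_i = number of facets with i edges, for i = i_min, ..., i_max."""
    return [sum(1 for n in facet_edge_counts if n == i)
            for i in range(i_min, i_max + 1)]

def i_fold_symmetry_indices(indices):
    """n_i / sum(n_i) for each entry of the Voronoi index vector."""
    total = sum(indices)
    return [n / total for n in indices]

# bcc Voronoi cell (truncated octahedron): 6 four-edged and 8 six-edged facets.
bcc_facets = [4] * 6 + [6] * 8
bcc_indices = voronoi_indices(bcc_facets)
bcc_symmetry = i_fold_symmetry_indices(bcc_indices)
```

The first list position corresponds to i = 3, so the bcc result is [0, 6, 0, 8, 0, 0, 0, 0] with symmetry indices 6/14 at i = 4 and 8/14 at i = 6.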

Args:
cutoff (float): cutoff distance in determining the potential neighbors for Voronoi tessellation analysis. (default: 6.5)
use_symm_weights (bool): whether to use weights to derive weighted i-fold symmetry indices.
symm_weights (str): weights to be used in weighted i-fold symmetry indices. Supported options: ‘solid_angle’, ‘area’, ‘volume’, ‘face_dist’. (default: ‘solid_angle’)
stats_vol (list of str): volume statistics types.
stats_area (list of str): area statistics types.
stats_dist (list of str): neighboring distance statistics types.

__init__(cutoff=6.5, use_symm_weights=False, symm_weights='solid_angle', stats_vol=None, stats_area=None, stats_dist=None)

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get Voronoi fingerprints of site with given index in input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure.

Returns:

(list of floats): Voronoi fingerprints.

  • Voronoi indices
  • i-fold symmetry indices
  • weighted i-fold symmetry indices (if use_symm_weights = True)
  • Voronoi volume
  • Voronoi volume statistics
  • Voronoi area
  • Voronoi area statistics
  • Voronoi dist statistics

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.site.fingerprint.load_cn_motif_op_params()

Load the file for the local env motif parameters into a dictionary.

Returns:

(dict)

matminer.featurizers.site.fingerprint.load_cn_target_motif_op()

Load the file for the

Returns:

(dict)

matminer.featurizers.site.misc module

Miscellaneous site featurizers.

class matminer.featurizers.site.misc.CoordinationNumber(nn=None, use_weights='none')

Bases: BaseFeaturizer

Number of first nearest neighbors of a site.

Determines the number of nearest neighbors of a site using one of pymatgen’s NearNeighbor classes. These nearest neighbor calculators can return weights related to the proximity of each neighbor to this site. It is possible to take these weights into account to prevent the coordination number from changing discontinuously with small perturbations of a structure, either by summing the total weights or using the normalization method presented by Ward et al. (http://link.aps.org/doi/10.1103/PhysRevB.96.014107).

Features:

CN_[method] - Coordination number computed using a certain method for calculating nearest neighbors.

__init__(nn=None, use_weights='none')

Initialize the featurizer

Args:

nn (NearNeighbor) - Method used to determine coordination number
use_weights (string) - Method used to account for weights of neighbors:

‘none’ - Do not use weights when computing coordination number
‘sum’ - Use sum of weights as the coordination number
‘effective’ - Compute the ‘effective coordination number’, which is computed as \frac{(\sum_n w_n)^2}{\sum_n w_n^2}
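The ‘effective’ option can be illustrated with a few lines of plain Python. This is a minimal sketch of the formula only, not the matminer implementation; the function name is hypothetical and the weights are made-up numbers.

```python
def effective_coordination_number(weights):
    """Return (sum_n w_n)^2 / sum_n w_n^2 for a list of neighbor weights."""
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    if sum_w2 == 0:
        return 0.0  # no neighbors (or all-zero weights)
    return sum_w ** 2 / sum_w2


# Four equally weighted neighbors give exactly 4.
equal = effective_coordination_number([1.0, 1.0, 1.0, 1.0])

# A weakly weighted fourth neighbor gives a value between 3 and 4,
# so the result varies smoothly as that neighbor moves away.
uneven = effective_coordination_number([1.0, 1.0, 1.0, 0.1])
```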

citations()

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get coordination number of site with given index in input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure struct.

Returns:

[float] - Coordination number

static from_preset(preset, **kwargs)

Use one of the standard instances of a given NearNeighbor class.

Args:

preset (str): preset type (“VoronoiNN”, “JmolNN”, “MinimumDistanceNN”, “MinimumOKeeffeNN”, or “MinimumVIRENN”).
**kwargs: allow to pass args to the NearNeighbor class.

Returns:

CoordinationNumber from a preset.

implementors()

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.misc.IntersticeDistribution(cutoff=6.5, interstice_types=None, stats=None, radius_type='MiracleRadius')

Bases: BaseFeaturizer

Interstice distribution in the neighboring cluster around an atom site.

The interstices are categorized into distance, area and volume interstices. Each of these metrics is a measure of the relative amount of empty space around each atom as determined using atomic sphere models. The distance interstice is the fraction of a bonding line unoccupied by the atom spheres; the area interstice is the unoccupied area within the triangulated surface formed by atom triplets in the convex hull formed by neighbors; and the volume interstice is the unoccupied portion of a tetrahedron formed between the central atom and neighbor atom triplets. Please refer to the original paper for more details (Wang et al. Nat Commun 10, 5537 (2019)).

For amorphous alloys (metallic glasses), the coordination environments are anisotropic, which can be reflected in the inequality of the interstices present around an atom. To describe the anisotropy, here we derive statistics of the interstices to featurize the interstice distribution around the atom. An alternative is to group the interstices into histograms with fixed bins, in which case the features are the vector of histogram values.

User note: This class is particularly designed for featurizing the site-specific packing heterogeneity in metallic glasses, especially the all-metallic-element ones. If non-metallic elements are present in the structures, the interstice estimates may deviate more from the actual values (although this deviation is systematic, so the interstice estimates can still be used to represent the packing heterogeneity).

Args:

cutoff (float): cutoff distance in determining the potential neighbors for Voronoi tessellation analysis. (default: 6.5)
interstice_types (str or [str]): interstice distribution types, support sub-list of [‘dist’, ‘area’, ‘vol’].
stats ([str]): statistics of distance/area/volume interstices.
radius_type (str): source of radius estimate. (default: “MiracleRadius”)

__init__(cutoff=6.5, interstice_types=None, stats=None, radius_type='MiracleRadius')
static analyze_area_interstice(nn_coords, nn_rs, convex_hull_simplices)

Analyze the area interstices in the neighbor convex hull facets.

Args:

nn_coords (array-like, shape (N, 3)): Nearest Neighbors’ coordinates
nn_rs ([float]): Nearest Neighbors’ radii.
convex_hull_simplices (array-like, shape (M, 3)): Indices of points forming the simplicial facets of convex hull.

Returns:

area_interstice_list ([float]): Area interstice list.

static analyze_dist_interstices(center_r, nn_rs, nn_dists)

Analyze the distance interstices between center atom and neighbors.

Args:

center_r (float): central atom’s radius.
nn_rs ([float]): Nearest Neighbors’ radii.
nn_dists ([float]): Nearest Neighbors’ distances.

Returns:

dist_interstice_list ([float]): Distance interstice list.
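Following the class description, the distance interstice is the fraction of the bonding line not covered by the two atomic spheres. A minimal sketch under that reading, assuming the value is 1 - (r_center + r_neighbor) / distance; the function name is hypothetical and the radii are illustrative numbers, not tabulated values.

```python
def dist_interstices_sketch(center_r, nn_rs, nn_dists):
    """Unoccupied fraction of each center-neighbor bonding line (assumed form)."""
    return [1.0 - (center_r + nn_r) / nn_dist
            for nn_r, nn_dist in zip(nn_rs, nn_dists)]


# Two neighbors at the same distance: the larger one leaves no gap.
values = dist_interstices_sketch(
    center_r=1.0,
    nn_rs=[1.0, 1.5],
    nn_dists=[2.5, 2.5],
)
```

With this form, a value of zero means the two spheres touch and a negative value means they overlap.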

static analyze_vol_interstice(center_coords, nn_coords, center_r, nn_rs, convex_hull_simplices)

Analyze the volume interstices in the tetrahedra formed by center atom and neighbor convex hull triplets.

Args:

center_coords ([float]): Central atomic coordinates.
nn_coords (array-like, shape (N, 3)): Nearest Neighbors’ coordinates
center_r (float): central atom’s radius.
nn_rs ([float]): Nearest Neighbors’ radii.
convex_hull_simplices (array-like, shape (M, 3)): Indices of points forming the simplicial facets of convex hull.

Returns:

volume_interstice_list ([float]): Volume interstice list.

citations()

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get interstice distribution fingerprints of site with given index in input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure.

Returns:

interstice_fps ([float]): Interstice distribution fingerprints.

implementors()

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.site.rdf module

Site featurizers based on distribution functions.

class matminer.featurizers.site.rdf.AngularFourierSeries(bins, cutoff=10.0)

Bases: BaseFeaturizer

Compute the angular Fourier series (AFS), including both angular and radial info.

The AFS is the product of pairwise distance functions (g_n, g_n’) evaluated for two pairs of atoms (sharing the common central site) and the cosine of the angle between the two pairs. The AFS is a 2-dimensional feature (the axes are g_n, g_n’).

Examples of distance functions are square functions, Gaussian, trig functions, and Bessel functions. An example for Gaussian:

lambda d: exp( -(d - d_n)**2 ), where d_n is the coefficient for g_n

See grdf() for a full list of available binning functions.

There are two preset conditions:

gaussian: bin functions are gaussians
histogram: bin functions are rectangular functions

Features:

AFS ([gn], [gn’]) - Angular Fourier Series between binning functions (g1 and g2)
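The definition above can be sketched in plain Python. This is an illustration under stated assumptions rather than the matminer code: the sum runs over ordered pairs of distinct neighbors, the bins use the Gaussian form quoted above, and the names `gaussian_bin` and `angular_fourier_series_sketch` are hypothetical.

```python
import itertools
import math


def gaussian_bin(d_n):
    """Gaussian distance function exp(-(d - d_n)**2) centered at d_n."""
    return lambda d: math.exp(-((d - d_n) ** 2))


def angular_fourier_series_sketch(neighbor_vectors, bins):
    """AFS[n][n'] = sum over neighbor pairs (j, k) of g_n(r_j) * g_n'(r_k) * cos(theta_jk).

    neighbor_vectors are displacements from the central site to each neighbor.
    """
    dists = [math.sqrt(sum(c * c for c in v)) for v in neighbor_vectors]
    afs = [[0.0] * len(bins) for _ in bins]
    # Ordered pairs of distinct neighbors (an assumption of this sketch).
    for j, k in itertools.permutations(range(len(neighbor_vectors)), 2):
        dot = sum(a * b for a, b in zip(neighbor_vectors[j], neighbor_vectors[k]))
        cos_theta = dot / (dists[j] * dists[k])
        for a, g_a in enumerate(bins):
            for b, g_b in enumerate(bins):
                afs[a][b] += g_a(dists[j]) * g_b(dists[k]) * cos_theta
    return afs


bins = [gaussian_bin(1.0), gaussian_bin(2.0)]
# Two neighbors at 90 degrees contribute nothing (cos = 0).
right_angle = angular_fourier_series_sketch([(1, 0, 0), (0, 1, 0)], bins)
# Two neighbors on opposite sides contribute with cos = -1.
opposite = angular_fourier_series_sketch([(1, 0, 0), (-1, 0, 0)], bins)
```

Flattening the nested list row by row gives one value per (g_n, g_n’) combination.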

Args:

bins: ([AbstractPairwise]) a list of binning functions that implement the AbstractPairwise base class
cutoff: (float) maximum distance to look for neighbors. The featurizer will run slowly for large distance cutoffs because the number of neighbor pairs scales as the square of the number of neighbors

__init__(bins, cutoff=10.0)
citations()

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get AFS of the input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure struct.

Returns:

Flattened list of AFS values. The list order is: g_n g_n’

static from_preset(preset, width=0.5, spacing=0.5, cutoff=10)

Preset bin functions for this featurizer. Example use:

>>> AFS = AngularFourierSeries.from_preset('gaussian')
>>> AFS.featurize(struct, idx)

Args:

preset (str): shape of bin (either ‘gaussian’ or ‘histogram’)
width (float): bin width. std dev for gaussian, width for histogram
spacing (float): the spacing between bin centers
cutoff (float): maximum distance to look for neighbors

implementors()

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.rdf.GaussianSymmFunc(etas_g2=None, etas_g4=None, zetas_g4=None, gammas_g4=None, cutoff=6.5)

Bases: BaseFeaturizer

Gaussian symmetry function features suggested by Behler et al.

The functions are based on pair distances and angles and approximate the functional dependence of local energies; they were originally used in the fitting of machine-learning potentials. The symmetry functions can be divided into a set of radial functions (g2 function) and a set of angular functions (g4 function). The number of symmetry functions returned is determined by the parameters etas_g2, etas_g4, zetas_g4 and gammas_g4. See the original paper for more details: “Atom-centered symmetry functions for constructing high-dimensional neural network potentials”, J Behler, J Chem Phys 134, 074106 (2011). The cutoff function is taken as the polynomial form (cosine_cutoff) to give a smoothed truncation. A Fortran and a different Python version can be found in the code Amp: Atomistic Machine-learning Package (https://bitbucket.org/andrewpeterson/amp).

Args:

etas_g2 (list of floats): etas used in radial functions. (default: [0.05, 4., 20., 80.])
etas_g4 (list of floats): etas used in angular functions. (default: [0.005])
zetas_g4 (list of floats): zetas used in angular functions. (default: [1., 4.])
gammas_g4 (list of floats): gammas used in angular functions. (default: [+1., -1.])
cutoff (float): cutoff distance. (default: 6.5)

__init__(etas_g2=None, etas_g4=None, zetas_g4=None, gammas_g4=None, cutoff=6.5)
citations()

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

static cosine_cutoff(rs, cutoff)

Polynomial cutoff function to give a smoothed truncation of the Gaussian symmetry functions.

Args:

rs (ndarray): distances to elements
cutoff (float): cutoff distance.

Returns:

(ndarray) cutoff function.
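For orientation, the cutoff function given in the cited Behler (2011) paper is 0.5 * (cos(pi * r / r_c) + 1) inside the cutoff and zero beyond it. The sketch below assumes that form and works on a plain list instead of an ndarray; the function name is hypothetical and this is not the matminer code itself.

```python
import math


def cosine_cutoff_sketch(rs, cutoff):
    """Smoothly decay from 1 at r = 0 to 0 at r = cutoff; zero beyond the cutoff."""
    return [
        0.5 * (math.cos(math.pi * r / cutoff) + 1.0) if r <= cutoff else 0.0
        for r in rs
    ]


# Distances at zero, half the cutoff, the cutoff, and beyond it.
values = cosine_cutoff_sketch([0.0, 3.25, 6.5, 7.0], cutoff=6.5)
```

The function and its first derivative both go to zero at the cutoff, so a neighbor crossing the cutoff radius does not cause a jump in the features.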

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get Gaussian symmetry function features of site with given index in input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure.

Returns:

(list of floats): Gaussian symmetry function features.

static g2(eta, rs, cutoff)

Gaussian radial symmetry function of the center atom, given an eta parameter.

Args:

eta: radial function parameter.
rs: distances from the central atom to each neighbor
cutoff (float): cutoff distance.

Returns:

(float) Gaussian radial symmetry function.
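A radial symmetry function of this kind can be sketched as a sum of Gaussians in the neighbor distance, each damped by the cutoff function. The version below assumes the exponent is scaled by the squared cutoff, exp(-eta * r^2 / r_c^2), and uses the cosine cutoff from Behler (2011). Both choices are assumptions of this sketch, and the function name is hypothetical.

```python
import math


def g2_sketch(eta, rs, cutoff):
    """Sum over neighbors of exp(-eta * r^2 / cutoff^2) * f_c(r) (assumed form)."""
    total = 0.0
    for r in rs:
        if r > cutoff:
            continue  # neighbors beyond the cutoff do not contribute
        f_c = 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)
        total += math.exp(-eta * r ** 2 / cutoff ** 2) * f_c
    return total


distances = [2.0, 3.0, 7.0]  # the last neighbor lies outside the cutoff
broad = g2_sketch(0.05, distances, cutoff=6.5)
narrow = g2_sketch(80.0, distances, cutoff=6.5)
```

Small eta values respond to neighbors across the whole cutoff sphere, while large values weight only the closest ones, which is why several etas are usually combined.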

static g4(etas, zetas, gammas, neigh_dist, neigh_coords, cutoff)

Gaussian angular symmetry function of the center atom, given a set of eta, zeta and gamma parameters.

Args:

etas ([float]): angular function parameters.
zetas ([float]): angular function parameters.
gammas ([float]): angular function parameters.
neigh_dist (list of [floats]): coordinates of neighboring atoms, with respect to the central atom
cutoff (float): cutoff parameter.

Returns:

(float) Gaussian angular symmetry function for all combinations of eta, zeta, gamma
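For a single (eta, zeta, gamma) combination, the angular function from Behler (2011) can be sketched as below. This is an illustration only: the sum over unordered neighbor pairs, the scaling of the exponent by the squared cutoff, and the function names are assumptions, and the matminer method evaluates all parameter combinations at once.

```python
import itertools
import math


def _f_c(r, cutoff):
    """Cosine cutoff function (assumed form)."""
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0) if r <= cutoff else 0.0


def g4_sketch(eta, zeta, gamma, neigh_coords, cutoff):
    """2^(1-zeta) * sum over pairs of (1 + gamma*cos)^zeta * exp(...) * f_c^3."""
    total = 0.0
    # neigh_coords are neighbor positions relative to the central atom.
    for v_j, v_k in itertools.combinations(neigh_coords, 2):
        r_ij = math.sqrt(sum(c * c for c in v_j))
        r_ik = math.sqrt(sum(c * c for c in v_k))
        r_jk = math.sqrt(sum((a - b) ** 2 for a, b in zip(v_j, v_k)))
        cos_theta = sum(a * b for a, b in zip(v_j, v_k)) / (r_ij * r_ik)
        radial = math.exp(-eta * (r_ij ** 2 + r_ik ** 2 + r_jk ** 2) / cutoff ** 2)
        cutoffs = _f_c(r_ij, cutoff) * _f_c(r_ik, cutoff) * _f_c(r_jk, cutoff)
        total += (1.0 + gamma * cos_theta) ** zeta * radial * cutoffs
    return 2.0 ** (1.0 - zeta) * total


linear = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]  # 180 degree angle at the center
plus = g4_sketch(0.005, 1.0, +1.0, linear, cutoff=6.5)
minus = g4_sketch(0.005, 1.0, -1.0, linear, cutoff=6.5)
```

The sign of gamma selects which angles are emphasized: gamma = +1 peaks at 0 degrees and vanishes at 180 degrees, and gamma = -1 does the opposite.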

implementors()

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.site.rdf.GeneralizedRadialDistributionFunction(bins, cutoff=20.0, mode='GRDF')

Bases: BaseFeaturizer

Compute the general radial distribution function (GRDF) for a site.

The GRDF is a radial measure of crystal order around a site. There are two featurizing modes:

  1. GRDF: (recommended) - n_bins length vector

    In GRDF mode, the GRDF is computed by considering all sites around a central site (i.e., no sites are omitted when computing the GRDF). The features output from this mode will be vectors with length n_bins.

  2. pairwise GRDF: (advanced users) - n_bins x n_sites matrix

    In this mode, GRDFs are still computed around a central site, but only one other site (and its translational equivalents) is used to compute a GRDF (e.g. site 1 with site 2 and the translational equivalents of site 2). This results in a n_sites x n_bins matrix of features. Requires fit for determining the max number of sites for

The GRDF is a generalization of the partial radial distribution function (PRDF). In contrast with the PRDF, the bins of the GRDF are not mutually exclusive and need not carry a constant weight of 1. The PRDF is a special case of the GRDF in which the bins are rectangular functions. Examples of other functions to use with the GRDF are Gaussian, trig, and Bessel functions.

See grdf() for a full list of available binning functions.
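The difference between rectangular and overlapping bins can be seen in a small sketch that works directly on a list of neighbor distances. No normalization is applied here, and the names `histogram_bin`, `gaussian_bin` and `grdf_sketch` are hypothetical stand-ins rather than the matminer binning classes.

```python
import math


def histogram_bin(start, width):
    """Rectangular bin: weight 1 inside [start, start + width), else 0."""
    return lambda d: 1.0 if start <= d < start + width else 0.0


def gaussian_bin(center, width):
    """Gaussian bin: smooth, overlapping weight centered at `center`."""
    return lambda d: math.exp(-(((d - center) / width) ** 2))


def grdf_sketch(distances, bins, cutoff):
    """One feature per bin: sum of bin weights over neighbors within the cutoff."""
    return [sum(g(d) for d in distances if d <= cutoff) for g in bins]


distances = [1.2, 1.3, 2.6]
hist = grdf_sketch(distances, [histogram_bin(s, 1.0) for s in (0.0, 1.0, 2.0)], 10.0)
gauss = grdf_sketch(distances, [gaussian_bin(c, 0.5) for c in (1.0, 2.0, 3.0)], 10.0)
```

With rectangular bins each neighbor is counted in exactly one bin, as in a PRDF. With Gaussian bins every neighbor contributes a fractional weight to every bin.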

There are two preset conditions:

gaussian: bin functions are gaussians
histogram: bin functions are rectangular functions

Args:

bins: ([AbstractPairwise]) List of pairwise binning functions. Each of these functions must implement the AbstractPairwise class.
cutoff: (float) maximum distance to look for neighbors
mode: (str) the featurizing mode. Supported options are: ‘GRDF’ and ‘pairwise_GRDF’

__init__(bins, cutoff=20.0, mode='GRDF')
citations()

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(struct, idx)

Get GRDF of the input structure.

Args:

struct (Structure): Pymatgen Structure object.
idx (int): index of target site in structure struct.

Returns:

Flattened list of GRDF values. For each run mode the list order is:

GRDF: bin#
pairwise GRDF: site2# bin#

The site2# corresponds to a pymatgen site index and bin# corresponds to one of the bin functions.

fit(X, y=None, **fit_kwargs)

Determine the maximum number of sites in X to assign correct feature labels

Args:

X - [list of tuples], training data; tuple values should be (struct, idx)

Returns:

self

static from_preset(preset, width=1.0, spacing=1.0, cutoff=10, mode='GRDF')

Preset bin functions for this featurizer. Example use:

>>> GRDF = GeneralizedRadialDistributionFunction.from_preset('gaussian')
>>> GRDF.featurize(struct, idx)

Args:

preset (str): shape of bin (either ‘gaussian’ or ‘histogram’)
width (float): bin width. std dev for gaussian, width for histogram
spacing (float): the spacing between bin centers
cutoff (float): maximum distance to look for neighbors
mode (str): featurizing mode. either ‘GRDF’ or ‘pairwise_GRDF’

implementors()

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

Module contents