matminer.featurizers.structure package¶

Subpackages¶

matminer.featurizers.structure.tests package

Submodules¶

matminer.featurizers.structure.bonding module¶

Structure featurizers based on bonding.

class matminer.featurizers.structure.bonding.BagofBonds(coulomb_matrix=SineCoulombMatrix(flatten=False), token=' - ')¶

Bases: BaseFeaturizer

Compute a Bag of Bonds vector, as first described by Hansen et al. (2015).

The Bag of Bonds approach is based on creating a uniform-length vector from a Coulomb matrix output. Practically, it represents the Coulombic interactions between each possible set of sites in a structure as a vector.

BagofBonds must be fit to an iterable of structures using the “fit” method before featurization can occur. This is because the bags and the maximum lengths of each bag must be set prior to featurization. We recommend fitting and featurizing on the same data to maintain consistency between generated feature sets. This can be done using the fit_transform method (for lists of structures) or the fit_featurize_dataframe method (for dataframes).
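The fit-then-pad workflow can be illustrated with a minimal pure-Python sketch. It takes a precomputed symmetric matrix per structure (standing in for the `coulomb_matrix` output), bags diagonal entries by species and upper-triangle entries by sorted species pair, sorts each bag in descending order, records the largest bag sizes in a "fit" step, and zero-pads every structure to those sizes. The helper names and toy matrices are hypothetical; this is not matminer's implementation.

```python
from itertools import combinations


def bag_matrix(species, matrix):
    """Group entries of a symmetric matrix into bags keyed by species tuples."""
    bags = {}
    n = len(species)
    # Diagonal entries describe single sites, keyed by a 1-tuple of the species.
    for i in range(n):
        bags.setdefault((species[i],), []).append(matrix[i][i])
    # Upper-triangle entries describe pairs; sorting the key merges A-B with B-A.
    for i, j in combinations(range(n), 2):
        key = tuple(sorted((species[i], species[j])))
        bags.setdefault(key, []).append(matrix[i][j])
    # Sort each bag in descending order so bags line up consistently.
    return {key: sorted(values, reverse=True) for key, values in bags.items()}


def fit_bag_lengths(bagged_structures):
    """'Fit' step: record the largest size of each bag across all structures."""
    lengths = {}
    for bags in bagged_structures:
        for key, values in bags.items():
            lengths[key] = max(lengths.get(key, 0), len(values))
    return lengths


def featurize_bags(bags, lengths):
    """Concatenate bags in a fixed key order, zero-padding each to its fitted length.

    Bags unseen during fit are dropped, and longer bags are truncated, in this sketch.
    """
    vector = []
    for key in sorted(lengths):
        values = bags.get(key, [])[: lengths[key]]
        vector.extend(values + [0.0] * (lengths[key] - len(values)))
    return vector


# Toy input: species labels plus made-up symmetric matrices (not real Coulomb values).
s1 = (["Cs", "Cl"], [[10.0, 2.0], [2.0, 8.0]])
s2 = (["Cs", "Cl", "Cl"], [[10.0, 2.0, 1.5], [2.0, 8.0, 0.5], [1.5, 0.5, 8.0]])
bagged = [bag_matrix(sp, m) for sp, m in (s1, s2)]
lengths = fit_bag_lengths(bagged)
v1 = featurize_bags(bagged[0], lengths)
v2 = featurize_bags(bagged[1], lengths)
```

Because both vectors are padded to the fitted bag lengths, they have the same length even though the two toy structures contain different numbers of sites.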

BagofBonds is based on a method by Hansen et al. “Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space” (2015).

Args:

coulomb_matrix (BaseFeaturizer): A featurizer object containing a “featurize” method which returns a matrix of size nsites x nsites. Good choices are CoulombMatrix() or SineCoulombMatrix(), with the flatten=False parameter set.
token (str): The string used to separate species in a bond, including spaces. The token must contain at least one space and cannot have alphabetic characters in it, and should be padded by spaces. For example, for the bond Cs+ - Cl-, the token is ‘ - ’. This determines how bonds are represented in the dataframe.

__init__(coulomb_matrix=SineCoulombMatrix(flatten=False), token=' - ')¶

bag(s, return_baglens=False)¶

Convert a structure into a bag of bonds, where each bag has no padded zeros. Using this function will give the ‘raw’ bags, which, when concatenated, will have different lengths.

Args:

s (Structure): A pymatgen Structure or IStructure object. May also work with a
return_baglens (bool): If True, returns the bag of bonds as a dictionary with the number of bonds as values in place of the vectors of Coulomb matrix values. If False, calculates Coulomb matrix values and returns ‘raw’ bags.

Returns:

(dict) A bag of bonds, where the keys are sorted tuples of pymatgen Site objects representing bonds or sites, and the values are the Coulomb matrix values for that bag.

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns: ([str]) attribute labels.

featurize(s)¶

Featurizes a structure according to the bag of bonds method. Specifically, each structure is first bagged by flattening the Coulomb matrix for the structure. Then, it is zero-padded according to the maximum number of bonds in each bag, for the set of bags that BagofBonds was fit with.

Args: s (Structure): A pymatgen structure object
Returns: (list): The Bag of Bonds vector for the input structure

fit(X, y=None)¶

Define the bags using a list of structures.

Both the names of the bags (e.g., Cs-Cl) and the maximum lengths of the bags are set with fit.

Args:

X (Series/list): An iterable of pymatgen Structure objects which will be used to determine the allowed bond types and bag lengths.

y : unused (added for consistency with overridden method signature)

Returns:

self

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.bonding.BondFractions(nn=CrystalNN(), bbv=0, no_oxi=False, approx_bonds=False, token=' - ', allowed_bonds=None)¶

Bases: BaseFeaturizer

Compute the fraction of each bond in a structure, based on NearestNeighbors.

For example, in a structure with 2 Li-O bonds and 3 Li-P bonds:

Li-O: 0.4, Li-P: 0.6

Features:

BondFractions must be fit with an iterable of structures before featurization in order to define the allowed bond types (features). To do this, pass a list of allowed_bonds. Otherwise, fit based on a list of structures. If allowed_bonds is defined and BondFractions is also fit, the intersection of the two lists of possible bonds is used.

For dataframes containing structures of various compositions, a unified dataframe is returned which has the collection of all possible bond types gathered from all structures as columns. To approximate bonds based on chemical rules (i.e., for a structure you’d like to featurize that has bonds not in the allowed set), use approx_bonds=True.
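A minimal sketch of the fraction arithmetic, assuming each structure's bonds have already been identified and labeled (in the featurizer this is done with the `nn` object). It reproduces the Li-O/Li-P example, builds the unified columns from fitted bond types (optionally intersected with `allowed_bonds`), and fills absent bond types with a bad-bond value in the spirit of `bbv`. Counting bonds outside the columns toward the denominator is an assumption of this sketch, and the helper names are hypothetical.

```python
from collections import Counter


def unified_columns(fitted_bonds, allowed_bonds=None):
    """Feature columns: bond types seen during fit, intersected with allowed_bonds if given."""
    columns = set(fitted_bonds)
    if allowed_bonds is not None:
        columns &= set(allowed_bonds)
    return sorted(columns)


def bond_fractions(bonds, columns, bbv=0.0):
    """Fraction of each column's bond type among all bonds found in one structure.

    Bond types outside `columns` still count toward the total but get no column;
    columns whose bond type is absent from this structure receive `bbv`.
    """
    counts = Counter(bonds)
    total = sum(counts.values())
    return [counts[label] / total if counts[label] else bbv for label in columns]


# 2 Li-O and 3 Li-P bonds, as in the example above; Ba - O comes from another structure.
columns = unified_columns(["Li - O", "Li - P", "Ba - O"])
fractions = bond_fractions(["Li - O"] * 2 + ["Li - P"] * 3, columns)
```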

BondFractions is based on the “sum over bonds” in the Bag of Bonds approach, based on a method by Hansen et al. “Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space” (2015).

Args:

nn (NearestNeighbors): A pymatgen nearest neighbors derived object. For example, pymatgen.analysis.local_env.VoronoiNN().
bbv (float): The ‘bad bond values’, values substituted for structure-bond combinations which cannot physically exist, but exist in the unified dataframe. For example, if a dataframe contains structures of BaLiP and BaTiO3, determines the value to place in the Li-P column for the BaTiO3 row; by default, is 0.
no_oxi (bool): If True, the featurizer will be agnostic to oxidation states, which prevents oxidation states from differentiating bonds. For example, if True, Ca - O is identical to Ca2+ - O2-, Ca3+ - O-, etc., and all of them will be included in the Ca - O column.
approx_bonds (bool): If True, approximates the fractions of bonds not in allowed_bonds (forbidden bonds) with similar allowed bonds. Chemical rules are used to determine which bonds are most ‘similar’; specifically, the approximate bond chosen is the one that minimizes the Euclidean distance between the 2-tuples of the bonds in Mendeleev number space.
token (str): The string used to separate species in a bond, including spaces. The token must contain at least one space and cannot have alphabetic characters in it, and should be padded by spaces. For example, for the bond Cs+ - Cl-, the token is ‘ - ’. This determines how bonds are represented in the dataframe.
allowed_bonds ([str]): A listlike object containing bond types as strings. For example, Cs - Cl, or Li+ - O2-. Ions and elements will still have distinct bonds if (1) the bonds list originally contained them and (2) no_oxi is False. These must match the token specified.
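The token rules and the effect of `no_oxi` can be sketched as a small label builder: species strings are optionally stripped of their oxidation-state suffix, sorted alphabetically, and joined with the token. The regular expression, the assumed species string format (e.g. "Ca2+"), and the `bond_label` helper are illustrative assumptions, not matminer code.

```python
import re

# Trailing oxidation-state suffix such as "+", "2+", "3-" or "0.5-" (an assumed format).
OXI_SUFFIX = re.compile(r"[0-9.]*[+-]$")


def bond_label(species_a, species_b, token=" - ", no_oxi=False):
    """Build an alphabetically ordered bond label such as 'Ca - O'."""
    # Enforce the documented token rules: at least one space, no letters.
    if " " not in token or any(ch.isalpha() for ch in token):
        raise ValueError("token must contain a space and no alphabetic characters")
    if no_oxi:
        # Drop oxidation states so that Ca2+ - O2- and Ca3+ - O- share one label.
        species_a = OXI_SUFFIX.sub("", species_a)
        species_b = OXI_SUFFIX.sub("", species_b)
    return token.join(sorted((species_a, species_b)))
```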

__init__(nn=CrystalNN(), bbv=0, no_oxi=False, approx_bonds=False, token=' - ', allowed_bonds=None)¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

enumerate_all_bonds(structures)¶

Identify all the unique, possible bond types of all structures present, and create the ‘unified’ bonds list.

Args: structures (list/ndarray): List of pymatgen Structures
Returns: A tuple of unique, possible bond types for an entire list of structures. This tuple is used to form the unified feature labels.

enumerate_bonds(s)¶

Lists out all the bond possibilities in a single structure.

Args: s (Structure): A pymatgen structure
Returns: A list of bond types in ‘Li-O’ form, where the order of the elements in each bond type is alphabetic.

feature_labels()¶

Returns the list of allowed bonds. Throws an error if the featurizer has not been fit.

featurize(s)¶

Quantify the fractions of each bond type in a structure.

For collections of structures, bond types which are not found in a particular structure (e.g., Li-P in BaTiO3) are represented as NaN.

Args:

s (Structure): A pymatgen Structure object

Returns:

(list) The feature list of bond fractions, in the order of the alphabetized corresponding bond names.

fit(X, y=None)¶

Define the bond types allowed to be returned during each featurization. Bonds found during featurization which are not allowed will be omitted from the returned dataframe or matrix.

Fit BondFractions either by passing an iterable of structures to fit (as X) or by defining the bonds explicitly with allowed_bonds in __init__.

Args:

X (Series/list): An iterable of pymatgen Structure objects which will be used to determine the allowed bond types.

y : unused (added for consistency with overridden method signature)

Returns:

self

static from_preset(preset, **kwargs)¶

Use one of the standard instances of a given NearNeighbor class. Pass args to __init__, such as allowed_bonds, using this method as well.

Args: preset (str): preset type (“CrystalNN”, “VoronoiNN”, “JmolNN”, “MinimumDistanceNN”, “MinimumOKeeffeNN”, or “MinimumVIRENN”).
Returns: BondFractions from a preset.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.bonding.GlobalInstabilityIndex(r_cut=4.0, disordered_pymatgen=False)¶

Bases: BaseFeaturizer

The global instability index of a structure.

The default is to use IUCr 2016 bond valence parameters for computing bond valence sums. If the structure has disordered site occupancies or non-integer valences on sites, pymatgen’s bond valence sum method can be used instead.

Note that pymatgen’s bond valence sum method is prone to error unless the correct scale factor is supplied. A scale factor based on testing with perovskites is used here. TODO: Use scipy to optimize scale factor for minimizing GII

Based on the following publication:

‘Structural characterization of R2BaCuO5 (R = Y, Lu, Yb, Tm, Er, Ho, Dy, Gd, Eu and Sm) oxides by X-ray and neutron diffraction’, A. Salinas-Sanchez, J. L. Garcia-Muñoz, J. Rodriguez-Carvajal, R. Saez-Puche, and J. L. Martinez, Journal of Solid State Chemistry, 100, 201-211 (1992), https://doi.org/10.1016/0022-4596(92)90094-C

Args:

r_cut: Float, how far to search for neighbors when computing bond valences
disordered_pymatgen: Boolean, whether to fall back on pymatgen’s bond valence sum method for disordered structures

Features:

The global instability index is the square root of the mean, over all atoms in the unit cell, of the squared differences between the bond valence sums and the formal valences.
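In symbols introduced here for illustration (N sites in the unit cell, V_i the bond valence sum of site i, v_i its formal valence with a sign convention consistent with V_i, and s_ij the bond valence between site i and neighbor j), the definition above reads:

```latex
\mathrm{GII} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(V_i - v_i\right)^{2}},
\qquad
V_i = \sum_{j} s_{ij}
```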

__init__(r_cut=4.0, disordered_pymatgen=False)¶

calc_bv_sum(site_val, site_el, neighbor_list)¶

Computes bond valence sum for site.

Args:

site_val (Integer): valence of site
site_el (String): element name
neighbor_list (List): List of neighboring sites and their distances

calc_gii_iucr(s)¶

Computes global instability index using tabulated bv params.

Args: s: Pymatgen Structure object
Returns: gii: Float, the global instability index

calc_gii_pymatgen(struct, scale_factor=0.965)¶

Calculates global instability index using pymatgen’s bond valence sum.

Args:

struct: Pymatgen Structure object
scale_factor: Float, tunable scale factor for bond valence

Returns: gii: Float, global instability index

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

static compute_bv(params, dist)¶

Compute bond valence from parameters.

Args:

params: Dataframe with Ro and B parameters
dist: Float, distance to neighboring atom

Returns: bv: Float, bond valence
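A minimal sketch of the bond valence arithmetic, assuming the widely used exponential form s = exp((Ro - d) / B), which is what the Ro and B parameters above suggest. It uses a single parameter set for every contact and treats valences as magnitudes; the Ro value in the usage lines is a placeholder rather than a tabulated IUCr entry, and B = 0.37 is the customary default. Function names are hypothetical and do not mirror matminer's signatures.

```python
import math


def bond_valence(r0, b, dist):
    """Bond valence of one contact: s = exp((r0 - dist) / b)."""
    return math.exp((r0 - dist) / b)


def bond_valence_sum(r0, b, neighbor_distances):
    """Sum the bond valences of all neighbors of one site (single parameter set)."""
    return sum(bond_valence(r0, b, d) for d in neighbor_distances)


def global_instability_index(bv_sums, formal_valences):
    """Root-mean-square deviation of bond valence sums from formal valences.

    Both sequences are given as magnitudes (e.g. 2 for O2-) to keep signs consistent.
    """
    squared = [(bvs - val) ** 2 for bvs, val in zip(bv_sums, formal_valences)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical parameters: R0 is a placeholder, B = 0.37 is the customary default.
R0, B = 2.0, 0.37
site_bvs = bond_valence_sum(R0, B, [2.0, 2.0])  # two contacts at exactly R0
```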

feature_labels()¶

Generate attribute names.

Returns: ([str]) attribute labels.

featurize(struct)¶

Get global instability index.

Args: struct: Pymatgen Structure object
Returns: [gii]: Length 1 list with float value

get_bv_params(cation, anion, cat_val, an_val)¶

Look up bond valence parameters from the IUCr table.

Args:

cation: String, cation element
anion: String, anion element
cat_val: Integer, cation formal valence
an_val: Integer, anion formal valence

Returns: bond_val_list: dataframe of bond valence parameters

get_equiv_sites(s, site)¶

Find identical sites from analyzing space group symmetry.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

precheck(struct)¶

Bond valence methods require atom pairs with oxidation states.

Additionally, check if at least the first and last site’s species have an entry in the bond valence parameters.

Args: struct: Pymatgen Structure

class matminer.featurizers.structure.bonding.MinimumRelativeDistances(cutoff=10.0, flatten=True, include_distances=True, include_species=True)¶

Bases: BaseFeaturizer

Determines the relative distance of each site to its closest neighbor.

We use the relative distance, f_ij = r_ij / (r^atom_i + r^atom_j), as a measure rather than the absolute distances, r_ij, to account for the fact that different atoms/species have different sizes. The function uses the valence-ionic radius estimator implemented in Pymatgen.
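The relative-distance definition can be sketched for a small, non-periodic set of points. This deliberately ignores periodic images and the `cutoff` search that the real featurizer performs, and the radii are supplied by hand rather than estimated by pymatgen; `min_relative_distances` is a hypothetical helper.

```python
import math


def min_relative_distances(positions, radii):
    """For each site i, return min over j != i of r_ij / (radius_i + radius_j)."""
    result = []
    for i, (p_i, rad_i) in enumerate(zip(positions, radii)):
        best = math.inf
        for j, (p_j, rad_j) in enumerate(zip(positions, radii)):
            if i != j:
                # Relative distance normalizes the separation by the summed radii.
                best = min(best, math.dist(p_i, p_j) / (rad_i + rad_j))
        result.append(best)
    return result


# Three points with hand-picked radii (illustrative values only).
rel = min_relative_distances([(0, 0, 0), (2, 0, 0), (0, 3, 0)], [0.5, 1.5, 1.0])
```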

The features can be flattened so a uniform-length vector is returned for each material, regardless of the number of sites in each structure. Returning flat output REQUIRES fitting (using self.fit(…)). If fit, structures having fewer sites than the max sites among the fitting structures are extended with NaNs; structures with more sites are truncated. To return non-flat (i.e., requiring further processing) features so that no features are NaN and no distances are truncated, use flatten=False.
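The pad-or-truncate rule for flattened output is simple enough to show directly. In this sketch `max_sites` stands in for the value learned by `fit`, and the helper name is hypothetical.

```python
import math


def flatten_to_length(per_site_values, max_sites):
    """Pad with NaN up to max_sites, or truncate structures with more sites."""
    values = list(per_site_values[:max_sites])
    values.extend([math.nan] * (max_sites - len(values)))
    return values


short = flatten_to_length([1.0, 2.0], 4)          # padded with NaN
long = flatten_to_length([1, 2, 3, 4, 5], 3)      # truncated
```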

Features:

If using flatten=True:

site #{number} min. rel. dist. (float): The minimum relative distance of site {number}
site #{number} specie (str): The string representing the specie at site {number}
site #{number} neighbor specie(s) (str, tuple(str)): The neighbor specie used to determine the minimum relative distance with respect to site {number}. If multiple neighbor sites have equivalent minimum relative distances, all these sites are listed in a tuple.

If using flatten=False:

minimum relative distance of each site ([float]): List of the minimum relative distance for each site. Structures with different numbers of sites will return a different length vector.

Args:

cutoff (float): (absolute) distance up to which tentative closest neighbors (on the basis of relative distances) are to be determined.
flatten (bool): If True, returns a uniform length feature vector for each structure regardless of the number of sites in the structure. If True, you must call .fit() before featurizing.
include_distances (bool): Include the numerical minimum relative distance in the returned features. Only used if flatten=True.
include_species (bool): Include the species for each site and the species of the neighbor (as determined by minimum rel. distance). Only used if flatten=True.

__init__(cutoff=10.0, flatten=True, include_distances=True, include_species=True)¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns: ([str]) attribute labels.

featurize(s)¶

Get minimum relative distances of all sites of the input structure.

Args:

s: Pymatgen Structure object.

Returns:

dists_relative_min: (list of floats) list of all minimum relative distances (i.e., for all sites).

fit(X, y=None)¶

Fit the MRD featurizer to a list of structures.

Args:

X ([Structure]): A list of pymatgen structures.
y : unused (added for consistency with overridden method signature)

Returns: self

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.bonding.StructuralHeterogeneity(weight='area', stats=('minimum', 'maximum', 'range', 'mean', 'avg_dev'))¶

Bases: BaseFeaturizer

Variance in the bond lengths and atomic volumes in a structure

These features are based on several statistics derived from the Voronoi tessellation of a structure. The first set of features relates to the variance in the average bond length across all atoms in the structure. The second relates to the variance of bond lengths between each neighbor of each atom. The final feature is the variance in Voronoi cell sizes across the structure.

We define the ‘average bond length’ of a site as the weighted average of the bond lengths for all neighbors. By default, the weight is the area of the face between the sites.

The ‘neighbor distance variation’ is defined as the weighted mean absolute deviation in bond length for all neighbors of a particular site. As before, the weight is according to face area by default. For this statistic, we divide the mean absolute deviation by the mean neighbor distance for that site.
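The per-site and structure-level quantities above can be sketched from per-neighbor (distance, weight) pairs, with the weight playing the role of the Voronoi face area. This sketch assumes the normalizing "mean neighbor distance" is the same weighted mean used for the average bond length, and that the structure-level statistics are unweighted across sites; the input format and helper names are hypothetical.

```python
def site_bond_stats(neighbors):
    """Return (average bond length, neighbor distance variation) for one site.

    neighbors: list of (distance, weight) pairs; weight stands in for face area.
    """
    total_weight = sum(w for _, w in neighbors)
    mean_length = sum(d * w for d, w in neighbors) / total_weight
    # Weighted mean absolute deviation of the neighbor distances...
    mad = sum(w * abs(d - mean_length) for d, w in neighbors) / total_weight
    # ...normalized by the site's (weighted) mean neighbor distance.
    return mean_length, mad / mean_length


def relative_bond_length_features(avg_lengths):
    """Structure-level ratios of per-site average bond lengths to their mean."""
    mean_avg = sum(avg_lengths) / len(avg_lengths)
    mad = sum(abs(x - mean_avg) for x in avg_lengths) / len(avg_lengths)
    return {
        "mean absolute deviation in relative bond length": mad / mean_avg,
        "max relative bond length": max(avg_lengths) / mean_avg,
        "min relative bond length": min(avg_lengths) / mean_avg,
    }


avg, variation = site_bond_stats([(2.0, 3.0), (3.0, 1.0)])
features = relative_bond_length_features([2.0, 2.5, 3.0])
```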

Features:

mean absolute deviation in relative bond length - Mean absolute deviation in the average bond lengths for all sites, divided by the mean average bond length
max relative bond length - Maximum average bond length, divided by the mean average bond length
min relative bond length - Minimum average bond length, divided by the mean average bond length
[stat] neighbor distance variation - Statistic (e.g., mean) of the neighbor distance variation
mean absolute deviation in relative cell size - Mean absolute deviation in the Voronoi cell volume across all sites in the structure, divided by the mean Voronoi cell volume.

References:

Ward et al. PRB 2017

__init__(weight='area', stats=('minimum', 'maximum', 'range', 'mean', 'avg_dev'))¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns: ([str]) attribute labels.

featurize(strc)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args: x: input data to featurize (type depends on featurizer).
Returns: (list) one or more features.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.structure.composite module¶

Structure featurizers producing more than one kind of structure feature data.

class matminer.featurizers.structure.composite.JarvisCFID(use_cell=True, use_chem=True, use_chg=True, use_rdf=True, use_adf=True, use_ddf=True, use_nn=True)¶

Bases: BaseFeaturizer

Classical Force-Field Inspired Descriptors (CFID) from Jarvis-ML.

Chemo-structural descriptors from five different sub-methods, including pairwise radial, nearest neighbor, bond-angle, dihedral-angle and core-charge distributions. With all descriptors enabled, there are 1,557 features per structure.

Adapted from the nist/jarvis package hosted at: https://github.com/usnistgov/jarvis

Find details at: https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083801

Args/Features:

use_cell (bool): Use structure cell descriptors (4 features, based on DensityFeatures and log volume per atom).
use_chem (bool): Use chemical composition descriptors (438 features)
use_chg (bool): Use core charge descriptors (378 features)
use_adf (bool): Use angular distribution function (179 features x 2, one set of features for each cutoff).
use_rdf (bool): Use radial distribution function (100 features)
use_ddf (bool): Use dihedral angle distribution function (179 features)
use_nn (bool): Use nearest neighbors (100 descriptors)
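The per-block counts listed above add up to the 1,557-feature total stated for the fully enabled featurizer, counting the angular distribution function twice (once per cutoff). A quick arithmetic check:

```python
# Feature counts per descriptor block, as listed in the Args/Features above.
blocks = {
    "cell": 4,
    "chem": 438,
    "chg": 378,
    "adf": 179 * 2,  # one set of features per cutoff
    "rdf": 100,
    "ddf": 179,
    "nn": 100,
}
total = sum(blocks.values())
print(total)  # 1557
```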

__init__(use_cell=True, use_chem=True, use_chg=True, use_rdf=True, use_adf=True, use_ddf=True, use_nn=True)¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns: ([str]) attribute labels.

featurize(s)¶

Get chemo-structural CFID descriptors

Args: s: Structure object
Returns: (np.ndarray) Final descriptors

get_chem(element)¶

Get chemical descriptors for an element

Args: element: element name
Returns: arr: descriptor array value

get_chg(element)¶

Get charge descriptors for an element

Args: element: element name
Returns: arr: descriptor array values

get_distributions(structure, c_size=10.0, max_cut=5.0)¶

Get radial and angular distribution functions

Args:

structure: Structure object
c_size: max. cell size
max_cut: max. bond cut-off for angular distribution

Returns: adfa, adfb, ddf, rdf, bondo

adfa: Angular distribution up to first cut-off
adfb: Angular distribution up to second cut-off
ddf: Dihedral angle distribution up to first cut-off
rdf: Radial distribution function
bondo: Bond order distribution

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.structure.matrix module¶

Structure featurizers generating a matrix for each structure.

Most matrix structure featurizers contain the ability to flatten matrices to be dataframe-friendly.

class matminer.featurizers.structure.matrix.CoulombMatrix(diag_elems=True, flatten=True)¶

Bases: BaseFeaturizer

The Coulomb matrix, a representation of nuclear Coulombic interactions.

Generate the Coulomb matrix, M, of the input structure (or molecule). The Coulomb matrix was put forward by Rupp et al. (Phys. Rev. Lett. 108, 058301, 2012) and is defined by off-diagonal elements M_ij = Z_i*Z_j/|R_i-R_j| and diagonal elements 0.5*Z_i^2.4, where Z_i and R_i denote the nuclear charge and the position of atom i, respectively.
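The definition translates directly into Python for a finite molecule. This sketch evaluates only the formula above (with a `diag_elems` switch mirroring the argument below); it does not handle periodicity or the eigenvalue-based flattening, and the function name is hypothetical.

```python
import math


def coulomb_matrix(charges, positions, diag_elems=True):
    """Coulomb matrix: M_ii = 0.5 * Z_i**2.4, M_ij = Z_i * Z_j / |R_i - R_j|."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term, or 0 when the original definition is switched off.
                m[i][j] = 0.5 * charges[i] ** 2.4 if diag_elems else 0.0
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return m


# Two unit charges 0.74 length units apart (an H2-like toy geometry).
m = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.74, 0.0, 0.0)])
```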

Coulomb Matrix features are flattened (for ML-readiness) by default. Use fit before featurizing to use flattened features. To return the matrix form, set flatten=False.

Args:

diag_elems (bool): flag indicating whether (True, default) to use the original definition of the diagonal elements; if set to False, the diagonal elements are set to 0
flatten (bool): If True, returns a flattened vector based on eigenvalues of the matrix form. Otherwise, returns a matrix object (single feature), which will likely need to be processed further.

__init__(diag_elems=True, flatten=True)¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns: ([str]) attribute labels.

featurize(s)¶

Get Coulomb matrix of input structure.

Args: s: input Structure (or Molecule) object.
Returns: m: (Nsites x Nsites matrix) Coulomb matrix.

fit(X, y=None)¶

Fit the Coulomb Matrix to a list of structures.

Args:

X ([Structure]): A list of pymatgen structures.
y : unused (added for consistency with overridden method signature)

Returns: self

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.matrix.OrbitalFieldMatrix(period_tag=False, flatten=True)¶

Bases: BaseFeaturizer

Representation based on the valence shell electrons of neighboring atoms.

Each atom is described by a 32-element vector (or 39-element vector; see period_tag for details) uniquely representing the valence subshell. A 32x32 (39x39) matrix is formed by multiplying two atomic vectors. An OFM for an atomic environment is the sum of these matrices for each atom the center atom coordinates with, multiplied by a distance function (in this case, 1/r times the weight of the coordinating atom in the Voronoi polyhedra method). The OFM of a structure or molecule is the average of the OFMs for all the sites in the structure.
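A toy version of the construction described above, assuming tiny 3-element one-hot vectors in place of the real 32- or 39-element valence-subshell vectors. Each neighbor contributes the outer product of its vector with the center vector (rows indexed by the neighbor, columns by the center, a convention chosen for this sketch), scaled by weight / r; the structure-level matrix is the count-weighted mean of the site matrices. All names and vectors here are hypothetical.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list."""
    return [[a * b for b in v] for a in u]


def single_site_ofm(center_vec, neighbors):
    """Sum of outer(neighbor_vec, center_vec) * weight / distance over all neighbors.

    neighbors: list of (neighbor_vec, weight, distance) tuples, where weight stands in
    for the Voronoi weight of the coordinating atom.
    """
    size = len(center_vec)
    ofm = [[0.0] * size for _ in range(size)]
    for vec, weight, dist in neighbors:
        term = outer(vec, center_vec)
        for r in range(size):
            for c in range(size):
                ofm[r][c] += term[r][c] * weight / dist
    return ofm


def mean_ofm(ofms, counts):
    """Average site OFMs, each weighted by the number of equivalent sites it represents."""
    size = len(ofms[0])
    total = sum(counts)
    return [
        [sum(m[r][c] * n for m, n in zip(ofms, counts)) / total for c in range(size)]
        for r in range(size)
    ]


# Hypothetical 3-element one-hot vectors standing in for valence-subshell vectors.
A, B = [1, 0, 0], [0, 1, 0]
ofm_a = single_site_ofm(A, [(B, 1.0, 2.0), (B, 1.0, 2.0)])  # two B neighbors at r = 2
ofm_b = single_site_ofm(B, [(A, 0.5, 2.0)])
structure_ofm = mean_ofm([ofm_a, ofm_b], [1, 3])
```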

Args:

period_tag (bool): In the original OFM, an element is represented by a vector of length 32, where each element is 1 or 0, which represents the valence subshell of the element. With period_tag=True, the vector size is increased to 39, where the 7 extra elements represent the period of the element. Note lanthanides are treated as period 6, actinides as period 7. Default False as in the original paper.
flatten (bool): Flatten the avg OFM to a 1024-vector (if period_tag=False) or a 1521-vector (if period_tag=True).

Attributes:

size: Either 32 or 39, the size of the vectors used to describe elements.

Reference:

Pham et al. Sci Tech Adv Mat. 2017, http://dx.doi.org/10.1080/14686996.2017.1378060

__init__(period_tag=False, flatten=True)¶

Initialize the featurizer

Args:

period_tag (bool): In the original OFM, an element is represented by a vector of length 32, where each element is 1 or 0, which represents the valence subshell of the element. With period_tag=True, the vector size is increased to 39, where the 7 extra elements represent the period of the element. Note lanthanides are treated as period 6, actinides as period 7. Default False as in the original paper.

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns: ([str]) attribute labels.

featurize(s)¶

Makes a supercell for structure s (to protect sites from coordinating with themselves), and then finds the mean of the orbital field matrices of each site to characterize a structure

Args:

s (Structure): structure to characterize

Returns:

mean_ofm (size X size matrix): orbital field matrix characterizing s

get_atom_ofms(struct, symm=False)¶

Calls get_single_ofm for every site in struct. If symm=True, get_single_ofm is called for symmetrically distinct sites, and counts is constructed such that ofms[i] occurs counts[i] times in the structure

Args:

struct (Structure): structure to find ofms for
symm (bool): whether to calculate ofm for only symmetrically distinct sites

Returns:

ofms ([size X size matrix] X len(struct)): ofms for struct
if symm: ofms ([size X size matrix] X number of symmetrically distinct sites): ofms for struct
counts: number of identical sites for each ofm

get_mean_ofm(ofms, counts)¶

Averages a list of ofms, weighted by counts.

get_ohv(sp, period_tag)¶

Get the “one-hot-vector” for pymatgen Element sp. This 32 or 39-length vector represents the valence shell of the given element.

Args:

sp (Element): element whose ohv should be returned
period_tag (bool): If true, the vector contains items corresponding to the period of the element

Returns: my_ohv (numpy array length 39 if period_tag, else 32): ohv for sp

get_single_ofm(site, site_dict)¶

Gets the orbital field matrix for a single chemical environment, where site is the center atom whose environment is characterized and site_dict is a dictionary of site : weight, where the weights are the Voronoi Polyhedra weights of the corresponding coordinating sites.

Args:

site (Site): center atom
site_dict (dict of Site:float): chemical environment

Returns: atom_ofm (size X size numpy matrix): ofm for site

get_structure_ofm(struct)¶

Calls get_mean_ofm on the results of get_atom_ofms to give a size X size matrix characterizing a structure.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.matrix.SineCoulombMatrix(diag_elems=True, flatten=True)¶

Bases: BaseFeaturizer

A variant of the Coulomb matrix developed for periodic crystals.

This function generates a variant of the Coulomb matrix developed for periodic crystals by Faber et al. (Int. J. Quantum Chem. 115, 16, 2015). It is identical to the Coulomb matrix, except that the inverse distance function is replaced by the inverse of a sin**2 function of the vector between the sites, which is periodic in the dimensions of the structure lattice. See the paper for details.
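The off-diagonal term of the sine matrix can be written with fractional coordinates: for a fractional difference Δf between two sites, each lattice vector a_k is weighted by sin²(π Δf_k), and the entry is Z_i Z_j divided by the length of the summed vector. The sketch below implements that form as a reading of Faber et al., keeping the same 0.5 Z_i^2.4 diagonal as the Coulomb matrix. Its inputs (fractional coordinates plus a list of three Cartesian lattice vectors) and the function name are assumptions, and it is not matminer's implementation.

```python
import math


def sine_coulomb_matrix(charges, frac_coords, lattice, diag_elems=True):
    """Sine Coulomb matrix for a periodic cell.

    lattice: three lattice vectors [a1, a2, a3], each a 3-tuple in Cartesian units.
    Off-diagonal: Z_i * Z_j / |sum_k a_k * sin^2(pi * df_k)|, with df = f_i - f_j.
    """
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4 if diag_elems else 0.0
                continue
            # sin^2 weights along each lattice direction (periodic in df_k).
            weights = [
                math.sin(math.pi * (frac_coords[i][k] - frac_coords[j][k])) ** 2
                for k in range(3)
            ]
            # Combine the lattice vectors with those weights into one Cartesian vector.
            vec = [sum(weights[k] * lattice[k][x] for k in range(3)) for x in range(3)]
            m[i][j] = charges[i] * charges[j] / math.hypot(*vec)
    return m


cubic = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
half = sine_coulomb_matrix([1, 1], [(0, 0, 0), (0.5, 0, 0)], cubic)
shifted = sine_coulomb_matrix([1, 1], [(0, 0, 0), (1.5, 0, 0)], cubic)
quarter = sine_coulomb_matrix([1, 1], [(0, 0, 0), (0.25, 0, 0)], cubic)
```

Shifting a site by a whole lattice vector leaves every sin²(π Δf_k) unchanged, which is what makes the matrix periodic.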

Coulomb Matrix features are flattened (for ML-readiness) by default. Use fit before featurizing to use flattened features. To return the matrix form, set flatten=False.

Args:

diag_elems (bool): flag indicating whether (True, default) to use the original definition of the diagonal elements; if set to False, the diagonal elements are set to 0
flatten (bool): If True, returns a flattened vector based on eigenvalues of the matrix form. Otherwise, returns a matrix object (single feature), which will likely need to be processed further.

__init__(diag_elems=True, flatten=True)¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns: ([str]) attribute labels.

featurize(s)¶

Args: s (Structure or Molecule): input structure (or molecule)
Returns: (Nsites x Nsites matrix) Sine matrix or

fit(X, y=None)¶

Fit the Sine Coulomb Matrix to a list of structures.

Args:

X ([Structure]): A list of pymatgen structures.
y : unused (added for consistency with overridden method signature)

Returns: self

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.structure.misc module¶

Miscellaneous structure featurizers.

class matminer.featurizers.structure.misc.EwaldEnergy(accuracy=4, per_atom=True)¶

Bases: BaseFeaturizer

Compute the energy from Coulombic interactions.

Note: The energy is computed using charges already defined for the structure.

Features: ewald_energy - Coulomb interaction energy of the structure

__init__(accuracy=4, per_atom=True)¶

Args: accuracy (int): Accuracy of Ewald summation, number of decimal places

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns: ([str]) attribute labels.

featurize(strc)¶

Args: strc (Structure) - Structure being analyzed
Returns: ([float]) - Electrostatic energy of the structure

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.misc.StructureComposition(featurizer=None)¶

Bases: BaseFeaturizer

Features related to the composition of a structure

This class is just a wrapper that calls a composition-based featurizer on the composition of a Structure

Features:

Depends on the featurizer

__init__(featurizer=None)¶

Initialize the featurizer

Args:: featurizer (BaseFeaturizer) - Composition-based featurizer
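The delegation described above can be sketched without matminer or pymatgen. The classes below (ToyStructure, FractionFeaturizer, ToyStructureComposition) are hypothetical stand-ins, not matminer classes. The sketch also assumes that the wrapper forwards fit, feature_labels and featurize to the wrapped composition featurizer, passing it each structure's composition.

```python
from collections import Counter


class ToyStructure:
    """Hypothetical stand-in for a pymatgen Structure: just a list of species."""

    def __init__(self, species):
        self.species = list(species)

    @property
    def composition(self):
        # element -> number of sites (a much-reduced Composition)
        return Counter(self.species)


class FractionFeaturizer:
    """Hypothetical composition featurizer: atomic fraction of each fitted element."""

    def fit(self, compositions, y=None):
        self.elements = sorted({el for comp in compositions for el in comp})
        return self

    def feature_labels(self):
        return [f"frac {el}" for el in self.elements]

    def featurize(self, comp):
        total = sum(comp.values())
        return [comp.get(el, 0) / total for el in self.elements]


class ToyStructureComposition:
    """Wrapper: run a composition featurizer on structure.composition."""

    def __init__(self, featurizer):
        self.featurizer = featurizer

    def fit(self, structures, y=None):
        # Assumed behavior: fit the inner featurizer on the compositions
        self.featurizer.fit([s.composition for s in structures], y)
        return self

    def feature_labels(self):
        return self.featurizer.feature_labels()

    def featurize(self, structure):
        return self.featurizer.featurize(structure.composition)


structures = [ToyStructure(["Na", "Cl"]), ToyStructure(["Fe", "Fe", "O", "O", "O"])]
wrapper = ToyStructureComposition(FractionFeaturizer()).fit(structures)
print(wrapper.feature_labels())          # ['frac Cl', 'frac Fe', 'frac Na', 'frac O']
print(wrapper.featurize(structures[1]))  # [0.0, 0.4, 0.0, 0.6]
```

Because the wrapper only forwards calls, the features and labels are those of the composition featurizer.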

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(strc)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:: x: input data to featurize (type depends on featurizer).
Returns:: (list) one or more features.

fit(X, y=None, **fit_kwargs)¶

Update the parameters of this featurizer based on available data

Args:: X - [list of tuples], training data
Returns:: self

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.misc.XRDPowderPattern(two_theta_range=(0, 127), bw_method=0.05, pattern_length=None, **kwargs)¶

Bases: BaseFeaturizer

1D array representing the powder diffraction pattern of a structure, as calculated by pymatgen. The pattern is smeared and normalized with a Gaussian kernel density estimate (gaussian_kde).

__init__(two_theta_range=(0, 127), bw_method=0.05, pattern_length=None, **kwargs)¶

Initialize the featurizer.

Args:

two_theta_range ([float of length 2]): Tuple giving the range of two_thetas to calculate, in degrees. Defaults to (0, 127). Set to None if you want all diffracted beams within the limiting sphere of radius 2 / wavelength.

bw_method (float): how much to smear the XRD pattern.

pattern_length (int): length of the final array; defaults to one value per degree (i.e., the width of two_theta_range + 1).

**kwargs: any other arguments to pass into pymatgen’s XRDCalculator, such as the type of radiation.
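The smearing step can be illustrated with a standard-library Gaussian kernel density estimate over (2θ, intensity) peaks. This is a sketch, not the featurizer's code. It rests on two assumptions. First, each peak is turned into a pseudo-sample by repeating its 2θ value int(intensity) times. Second, the kernel width is bw_method times the sample standard deviation, which is how scipy's gaussian_kde treats a scalar bw_method for 1-D data. The function name smear_pattern and the peak list are hypothetical. The real featurizer gets its peaks from pymatgen's XRDCalculator.

```python
import math
import statistics


def smear_pattern(peaks, two_theta_range=(0, 127), bw_method=0.05, pattern_length=None):
    """Gaussian-KDE smearing of (two_theta, intensity) peaks onto a regular grid."""
    lo, hi = two_theta_range
    if pattern_length is None:
        # One grid point per degree, endpoints included
        pattern_length = int(abs(hi - lo)) + 1

    # Assumption: treat intensities as counts and repeat each position int(intensity) times
    samples = [tt for tt, intensity in peaks for _ in range(int(intensity))]

    # Scalar bw_method in gaussian_kde: kernel sigma = factor * sample std (ddof=1)
    sigma = bw_method * statistics.stdev(samples)
    norm = 1.0 / (len(samples) * sigma * math.sqrt(2 * math.pi))

    step = (hi - lo) / (pattern_length - 1)
    grid = [lo + i * step for i in range(pattern_length)]
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / sigma) ** 2) for s in samples)
        for x in grid
    ]


# Hypothetical peaks: (2-theta in degrees, relative intensity)
peaks = [(31.7, 100), (45.4, 60), (56.5, 20)]
pattern = smear_pattern(peaks)
print(len(pattern))                 # 128
print(pattern.index(max(pattern)))  # 32
```

With the default one-point-per-degree grid, the output length is fixed by two_theta_range alone. Every structure therefore yields a vector of the same length.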

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(strc)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:: x: input data to featurize (type depends on featurizer).
Returns:: (list) one or more features.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.structure.order module¶

Structure featurizers based on packing or ordering.

class matminer.featurizers.structure.order.ChemicalOrdering(shells=(1, 2, 3), weight='area')¶

Bases: BaseFeaturizer

How much the ordering of species in the structure differs from random

These parameters describe how much the ordering of all species in a structure deviates from random using a Warren-Cowley-like ordering parameter. The first step of this calculation is to determine the nearest neighbor shells of each site. Then, for each shell a degree of order for each type is determined by computing:

\alpha (t,s) = 1 - \frac{\sum_n w_n \delta (t - t_n)}{x_t \sum_n w_n}

where s is the site being evaluated, w_n is the weight associated with neighbor n, t_n is the type of neighbor n, δ(t - t_n) is 1 when t_n = t and 0 otherwise, and x_t is the fraction of type t in the structure. For atoms that are randomly dispersed in a structure, this formula yields 0 for all types. For structures where each site is surrounded only by atoms of another type, this formula yields values of alpha with large magnitude.

The mean absolute value of this parameter across all sites is used as a feature.
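The α(t, s) formula above can be checked with a minimal sketch for a single shell. Neighbor types and weights are supplied directly rather than taken from a Voronoi tessellation. The function names are hypothetical. Averaging |α| over every (site, type) pair is an assumption about how the per-site values are reduced to one feature.

```python
def ordering_parameters(neighbors, fractions):
    """alpha(t, s) for one site s and one shell.

    neighbors: list of (type, weight) pairs for the neighbors of site s
    fractions: dict mapping each type t to its fraction x_t in the structure
    """
    total = sum(w for _, w in neighbors)
    return {
        t: 1 - sum(w for t_n, w in neighbors if t_n == t) / (x_t * total)
        for t, x_t in fractions.items()
    }


def mean_abs_ordering(sites, fractions):
    """Mean |alpha| over every (site, type) pair (this averaging is an assumption)."""
    values = [
        abs(a)
        for neighbors in sites
        for a in ordering_parameters(neighbors, fractions).values()
    ]
    return sum(values) / len(values)


x = {"A": 0.5, "B": 0.5}
# Perfectly ordered (rock-salt-like): every neighbor is of the other type
ordered = [[("B", 1.0)] * 6, [("A", 1.0)] * 6]
# Random-like: each site sees A and B with equal total weight
random_like = [[("A", 1.0), ("B", 1.0)] * 3, [("A", 1.0), ("B", 1.0)] * 3]
print(mean_abs_ordering(ordered, x))      # 1.0
print(mean_abs_ordering(random_like, x))  # 0.0
```

In the ordered case, α is +1 for the site's own type and -1 for the other type. Taking absolute values keeps these from cancelling.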

Features:

mean ordering parameter shell [n] - Mean ordering parameter for atoms in the nth neighbor shell

References:

Ward et al., PRB (2017)

__init__(shells=(1, 2, 3), weight='area')¶

Initialize the featurizer

Args:: shells ([int]) - Which neighbor shells to evaluate. weight (str) - Attribute used to weigh neighbor contributions.

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(strc)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:: x: input data to featurize (type depends on featurizer).
Returns:: (list) one or more features.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.order.DensityFeatures(desired_features=None)¶

Bases: BaseFeaturizer

Calculates density and density-like features

Features:

density
volume per atom (“vpa”)
packing fraction

__init__(desired_features=None)¶

Args:

desired_features: [str] - choose from “density”, “vpa”, “packing fraction”
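The three features for an ordered cell can be sketched with the standard library. The sketch assumes the packing fraction is the total volume of hard spheres on the sites divided by the cell volume. The radii here are chosen by hand so that neighboring spheres touch. For a real structure, the radii the featurizer uses, not these touching-sphere radii, determine the result. The function name density_features is hypothetical.

```python
import math

AMU_TO_G = 1.66053906660e-24  # grams per atomic mass unit
A3_TO_CM3 = 1e-24             # cubic centimetres per cubic angstrom


def density_features(cell_volume, masses, radii):
    """Return (density in g/cm^3, volume per atom in A^3, packing fraction).

    cell_volume: cell volume in cubic angstroms
    masses: atomic mass (amu) of every site in the cell
    radii: sphere radius (angstroms) assumed for every site
    """
    density = sum(masses) * AMU_TO_G / (cell_volume * A3_TO_CM3)
    vpa = cell_volume / len(masses)
    sphere_volume = sum(4 / 3 * math.pi * r**3 for r in radii)
    return density, vpa, sphere_volume / cell_volume


# Conventional FCC copper cell: a = 3.615 A, 4 atoms, radius chosen so spheres touch
a = 3.615
r_touch = a * math.sqrt(2) / 4
rho, vpa, pf = density_features(a**3, [63.546] * 4, [r_touch] * 4)
print(round(rho, 2), round(vpa, 2), round(pf, 4))  # 8.93 11.81 0.7405
```

The packing fraction of 0.7405 is the familiar close-packing limit π/(3√2). This holds only because the radii were chosen so that the spheres touch.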

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(s)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:: x: input data to featurize (type depends on featurizer).
Returns:: (list) one or more features.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

precheck(s: Structure) → bool¶

Precheck a single entry. DensityFeatures does not work for disordered structures. To precheck an entire dataframe (and automatically gather the fraction of structures that will pass the precheck), please use precheck_dataframe.

Args:: s (pymatgen.Structure): The structure to precheck.
Returns:: (bool): If True, s passed the precheck; otherwise, it failed.

class matminer.featurizers.structure.order.MaximumPackingEfficiency¶

Bases: BaseFeaturizer

Maximum possible packing efficiency of this structure

Uses a Voronoi tessellation to determine the largest radius each atom can have before it touches any of its neighbors. Given the maximum radius of each atom, this class computes the maximum packing efficiency of the structure as a feature.

Features:: max packing efficiency - Maximum possible packing efficiency
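The idea can be sketched for an orthorhombic cell by replacing the Voronoi tessellation with a brute-force nearest-neighbor search. The substitution rests on a geometric fact: the Voronoi face closest to a site is the perpendicular bisector with its nearest neighbor. The largest sphere that fits inside a site's Voronoi cell therefore has radius equal to half the nearest-neighbor distance. The function name and the orthorhombic-cell restriction are assumptions of this sketch.

```python
import itertools
import math


def max_packing_efficiency(box, frac_coords):
    """Maximum packing efficiency of an orthorhombic cell.

    box: (a, b, c) edge lengths of the cell
    frac_coords: fractional coordinates of every site, each in [0, 1)
    """
    a, b, c = box
    total_sphere_volume = 0.0
    for i, fi in enumerate(frac_coords):
        d_nn = math.inf
        for j, fj in enumerate(frac_coords):
            # Images beyond +/-1 cell are never closer than the site's own nearest image
            for shift in itertools.product((-1, 0, 1), repeat=3):
                if i == j and shift == (0, 0, 0):
                    continue  # skip the site itself
                d = math.sqrt(sum(((fj[k] + shift[k] - fi[k]) * box[k]) ** 2 for k in range(3)))
                d_nn = min(d_nn, d)
        # Closest Voronoi face = bisector with the nearest neighbor, at distance d_nn / 2
        r_max = d_nn / 2
        total_sphere_volume += 4 / 3 * math.pi * r_max**3
    return total_sphere_volume / (a * b * c)


print(round(max_packing_efficiency((1.0, 1.0, 1.0), [(0, 0, 0)]), 4))                   # 0.5236
print(round(max_packing_efficiency((1.0, 1.0, 1.0), [(0, 0, 0), (0.5, 0.5, 0.5)]), 4))  # 0.6802
```

The two printed values are the textbook packing efficiencies of simple cubic (π/6) and body-centred cubic (π√3/8) lattices.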

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(strc)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:: x: input data to featurize (type depends on featurizer).
Returns:: (list) one or more features.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.order.StructuralComplexity(symprec=0.1)¶

Bases: BaseFeaturizer

Shannon information entropy of a structure.

This descriptor treats a structure as a message and evaluates its structural complexity (S) using the following equation:

S = - v \sum_{i=1}^{k} p_i \log_2 p_i

p_i = m_i / v

where v is the total number of atoms in the unit cell, p_i is the probability mass function (the fraction of atoms in the i-th symmetrically inequivalent site), k is the number of symmetrically inequivalent sites, and m_i is the number of sites belonging to the i-th symmetrically inequivalent site.
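Both features follow directly from the multiplicities m_i. A minimal sketch, assuming the multiplicities have already been obtained from a symmetry analysis at the chosen symprec. The function name structural_complexity is hypothetical.

```python
import math


def structural_complexity(multiplicities):
    """Shannon entropy from the multiplicities m_i of the inequivalent sites.

    Returns (bits per atom, bits per unit cell).
    """
    v = sum(multiplicities)                  # total number of atoms in the cell
    probs = [m / v for m in multiplicities]  # p_i = m_i / v
    # -sum p_i log2 p_i, written as sum p_i log2(1 / p_i) so one site class gives 0.0
    bits_per_atom = sum(p * math.log2(1 / p) for p in probs)
    return bits_per_atom, v * bits_per_atom


print(structural_complexity([4]))     # (0.0, 0.0)
print(structural_complexity([2, 2]))  # (1.0, 4.0)
```

A cell whose atoms are all symmetry-equivalent carries no information. Two equally populated orbits carry one bit per atom.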

Features:

information entropy (bits/atom)
information entropy (bits/unit cell)

Args:

symprec: precision for symmetrizing a structure

__init__(symprec=0.1)¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(struct)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:: x: input data to featurize (type depends on featurizer).
Returns:: (list) one or more features.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.structure.rdf module¶

Structure featurizers implementing radial distribution functions.

class matminer.featurizers.structure.rdf.ElectronicRadialDistributionFunction(cutoff=20, dr=0.05)¶

Bases: BaseFeaturizer

Calculate the inherent electronic radial distribution function (ReDF)

The ReDF is defined according to Willighagen et al., Acta Cryst., 2005, B61, 29-36.

The ReDF is a structure-integral RDF (i.e., summed over all sites) in which the positions of neighboring sites are weighted by electrostatic interactions inferred from atomic partial charges. Atomic charges are obtained from the ValenceIonicRadiusEvaluator class.

WARNING: The ReDF needs oxidation states to work correctly.

Args:

cutoff: (float) distance up to which the ReDF is to be calculated.

dr: (float) width of bins (“x”-axis) of ReDF (default: 0.05 Å).

Attributes:

distances (np.ndarray): The distances at which each bin begins.

__init__(cutoff=20, dr=0.05)¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(s)¶

Get ReDF of input structure.

Args:: s: input Structure object.

Returns: (list) the ReDF

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

precheck(s) → bool¶

Check the structure to ensure the ReDF can be run.

Args:

s (pymatgen.Structure): Structure to precheck

Returns:: (bool)

class matminer.featurizers.structure.rdf.PartialRadialDistributionFunction(cutoff=20.0, bin_size=0.1, include_elems=(), exclude_elems=())¶

Bases: BaseFeaturizer

Compute the partial radial distribution function (PRDF) of a crystal structure

The PRDF of a crystal structure is the radial distribution function broken down for each pair of atom types. The PRDF was proposed as a structural descriptor by Schutt et al. (https://journals.aps.org/prb/abstract/10.1103/PhysRevB.89.205118).

Args:

cutoff: (float) distance up to which to calculate the RDF.

bin_size: (float) size of each bin of the (discrete) RDF.

include_elems: (list of string) list of elements that must be included in the PRDF.

exclude_elems: (list of string) list of elements that should not be included in the PRDF.

Features:

Each feature corresponds to the number density of bonds for a certain pair of elements at a certain range of distances. For example, “Al-Al PRDF r=1.00-1.50” corresponds to the density of Al-Al bonds between 1 and 1.5 distance units. By default, this featurizer generates RDFs for each pair of elements in the training set.
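The core of the PRDF is a histogram of neighbor distances kept separately for each ordered pair of elements. The sketch below computes only those raw counts for an orthorhombic cell, by brute force over periodic images. It leaves out the conversion of counts into densities (dividing by shell volumes and element counts), because the exact normalization matminer applies is not stated here. The function name is hypothetical, and the CsCl-type cell uses an arbitrary unit edge length.

```python
import itertools
import math
from collections import defaultdict


def partial_rdf_counts(box, sites, cutoff, bin_size):
    """Raw neighbor counts per distance bin, kept separately for each element pair.

    box: (a, b, c) edge lengths of an orthorhombic cell
    sites: list of (element, (fa, fb, fc)) with fractional coordinates
    Returns {(element_i, element_j): [count per bin]}; every site acts as a center.
    """
    n_bins = int(round(cutoff / bin_size))
    # Enough periodic images in each direction to cover the cutoff sphere
    reach = [math.ceil(cutoff / length) + 1 for length in box]
    shifts = list(itertools.product(*(range(-r, r + 1) for r in reach)))
    counts = defaultdict(lambda: [0] * n_bins)
    for el_i, fi in sites:
        for el_j, fj in sites:
            for shift in shifts:
                d = math.sqrt(sum(((fj[k] + shift[k] - fi[k]) * box[k]) ** 2 for k in range(3)))
                if 0 < d < cutoff:
                    counts[(el_i, el_j)][min(int(d / bin_size), n_bins - 1)] += 1
    return dict(counts)


# CsCl-type cell with unit edge (arbitrary units)
cscl = [("Cs", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))]
prdf = partial_rdf_counts((1.0, 1.0, 1.0), cscl, cutoff=1.5, bin_size=0.1)
print(sorted(prdf))                                   # [('Cl', 'Cl'), ('Cl', 'Cs'), ('Cs', 'Cl'), ('Cs', 'Cs')]
print(prdf[("Cs", "Cl")][8], prdf[("Cs", "Cs")][10])  # 8 6
```

Each Cs has eight Cl neighbors at √3/2 ≈ 0.87 and six Cs neighbors at 1.0. These land in separate pair histograms even though a total RDF would merge them.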

__init__(cutoff=20.0, bin_size=0.1, include_elems=(), exclude_elems=())¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

compute_prdf(s)¶

Compute the PRDF for a structure

Args:: s: (Structure), structure to be evaluated
Returns:: dist_bins - float, start of each of the bins; prdf - dict, where the keys are pairs of elements (strings) and the values are the radial distribution functions for those pairs of elements

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(s)¶

Get PRDF of the input structure.

Args:

s: Pymatgen Structure object.

Returns:

prdf, dist: (tuple of arrays) the first element is a dictionary where keys are tuples of element names and values are PRDFs.

fit(X, y=None)¶

Define the list of elements to be included in the PRDF. By default, the PRDF will include all of the elements in X.

Args:

X: (numpy array nx1) structures used in the training set. Each entry must be a Pymatgen Structure object.

y: Not used.

fit_kwargs: not used

Returns:

self

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

precheck(s)¶

Precheck that the structure is ordered.

Args:

s: (pymatgen.Structure)

Returns:: (bool): True if passing precheck, False if failing

class matminer.featurizers.structure.rdf.RadialDistributionFunction(cutoff=20.0, bin_size=0.1)¶

Bases: BaseFeaturizer

Calculate the radial distribution function (RDF) of a crystal structure.

Features:

Radial distribution function. Each feature is the “density” of the distribution at a certain radius.

Args:

cutoff: (float) distance in Angstrom up to which to calculate the RDF.

bin_size: (float) size in Angstrom of each bin of the (discrete) RDF.

Attributes:

bin_distances (np.ndarray): The distances each bin represents. Can be used for graphing the RDF.
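The binning behind an RDF can be sketched for an orthorhombic cell with the textbook normalization g(r) = counts(r) / (N ρ V_shell(r)), where ρ = N / V. That normalization is an assumption chosen for illustration; the featurizer's own normalization may differ. The function name radial_distribution is hypothetical.

```python
import itertools
import math


def radial_distribution(box, frac_coords, cutoff, bin_size):
    """Total RDF of an orthorhombic cell with the textbook normalization.

    g(r) = counts(r) / (N * rho * shell_volume(r)), where rho = N / V.
    Returns (bin_starts, counts, g).
    """
    n_bins = int(round(cutoff / bin_size))
    counts = [0] * n_bins
    # Enough periodic images in each direction to cover the cutoff sphere
    reach = [math.ceil(cutoff / length) + 1 for length in box]
    shifts = list(itertools.product(*(range(-r, r + 1) for r in reach)))
    for fi in frac_coords:
        for fj in frac_coords:
            for shift in shifts:
                d = math.sqrt(sum(((fj[k] + shift[k] - fi[k]) * box[k]) ** 2 for k in range(3)))
                if 0 < d < cutoff:
                    counts[min(int(d / bin_size), n_bins - 1)] += 1

    n_sites = len(frac_coords)
    rho = n_sites / (box[0] * box[1] * box[2])
    bin_starts = [i * bin_size for i in range(n_bins)]
    g = [
        count / (n_sites * rho * 4 / 3 * math.pi * ((r0 + bin_size) ** 3 - r0**3))
        for r0, count in zip(bin_starts, counts)
    ]
    return bin_starts, counts, g


# Simple cubic lattice with unit edge: 6 neighbors at 1.0, 12 at sqrt(2)
starts, counts, g = radial_distribution((1.0, 1.0, 1.0), [(0.0, 0.0, 0.0)], cutoff=1.5, bin_size=0.1)
print([i for i, c in enumerate(counts) if c])  # [10, 14]
print(counts[10], counts[14])                  # 6 12
```

Here bin_starts plays the role of the bin_distances attribute, the start of each bin, which is what one would plot g against.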

__init__(cutoff=20.0, bin_size=0.1)¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(s)¶

Get RDF of the input structure.

Args:

s (Structure): Pymatgen Structure object.

Returns:

rdf: (iterable) the first element is the normalized RDF, whereas the second element is the inner radius of the RDF bin.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

precheck(s)¶

Precheck that the structure is ordered.

Args:

s: (pymatgen.Structure)

Returns:: (bool): True if passing precheck, False if failing

matminer.featurizers.structure.rdf.get_rdf_bin_labels(bin_distances, cutoff)¶

Common function for getting bin labels given the distances at which each bin begins and the ending cutoff.

Args:

bin_distances (np.ndarray): The distances at which each bin begins.

cutoff (float): The final cutoff value.

Returns:: [str]: The feature labels for the *RDF featurizers.

matminer.featurizers.structure.sites module¶

Structure featurizers based on aggregating site features.

class matminer.featurizers.structure.sites.PartialsSiteStatsFingerprint(site_featurizer, stats=('mean', 'std_dev'), min_oxi=None, max_oxi=None, covariance=False, include_elems=(), exclude_elems=())¶

Bases: SiteStatsFingerprint

Computes statistics of properties across all sites in a structure, and breaks these down by element. This featurizer first uses a site featurizer class (see site.py for options) to compute features of each site of a specific element in a structure, and then computes features of the entire structure by measuring statistics of each attribute.

Features:

Returns each statistic of each site feature, broken down by element
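The per-element breakdown can be sketched with the standard library. Site features are supplied directly instead of being computed by a site featurizer. The sketch makes three assumptions: std_dev is the population standard deviation, the statistics are unweighted, and every listed element has at least one site. An element missing from a structure would need padding, which this sketch does not handle. The function name partial_site_stats is hypothetical.

```python
from statistics import mean, pstdev


def partial_site_stats(site_elements, site_features, elements, stats=("mean", "std_dev")):
    """Site-feature statistics computed separately for the sites of each element.

    site_elements: element symbol of every site
    site_features: one feature vector (list of floats) per site
    elements: fixed element order for the output (e.g., set when fitting)
    Assumes every listed element has at least one site in the structure.
    """
    funcs = {"mean": mean, "std_dev": pstdev}  # std_dev as population std (assumption)
    out = []
    for el in elements:
        rows = [f for e, f in zip(site_elements, site_features) if e == el]
        for column in zip(*rows):  # one column per site feature
            out.extend(funcs[s](column) for s in stats)
    return out


elements = ["Na", "Cl"]
site_elements = ["Na", "Na", "Cl", "Cl"]
site_features = [[6.0, 1.0], [6.0, 3.0], [6.0, 2.0], [4.0, 2.0]]
print(partial_site_stats(site_elements, site_features, elements))
# [6.0, 0.0, 2.0, 1.0, 5.0, 1.0, 2.0, 0.0]
```

Fixing the element order once, as fit does for the element list, keeps the output vector aligned across structures.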

__init__(site_featurizer, stats=('mean', 'std_dev'), min_oxi=None, max_oxi=None, covariance=False, include_elems=(), exclude_elems=())¶

Args:

site_featurizer (BaseFeaturizer): a site-based featurizer.

stats ([str]): list of weighted statistics to compute for each feature. If stats is None, a list is returned for each feature that contains the calculated feature for each site in the structure. Note: for the nth mode, stat must be ‘n*_mode’ (e.g., stat=‘2nd_mode’).

min_oxi (int): minimum site oxidation state for inclusion (e.g., zero means metals/cations only).

max_oxi (int): maximum site oxidation state for inclusion.

covariance (bool): Whether to compute the covariance of site features.

compute_pssf(s, e)¶

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(s)¶

Get PSSF of the input structure.

Args:

s: Pymatgen Structure object.

Returns:: pssf: 1D array of each element’s ssf

fit(X, y=None)¶

Define the list of elements to be included in the PSSF. By default, the PSSF will include all of the elements in X.

Args:

X: (numpy array nx1) structures used in the training set. Each entry must be a Pymatgen Structure object.

y: Not used.

fit_kwargs: not used

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.sites.SiteStatsFingerprint(site_featurizer, stats=('mean', 'std_dev'), min_oxi=None, max_oxi=None, covariance=False)¶

Bases: BaseFeaturizer

Computes statistics of properties across all sites in a structure.

This featurizer first uses a site featurizer class (see site.py for options) to compute features of each site in a structure, and then computes features of the entire structure by measuring statistics of each attribute. Can optionally compute the statistics of only sites with certain ranges of oxidation states (e.g., only anions).

Features:

Returns each statistic of each site feature
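The aggregation and the oxidation-state filter can be sketched with the standard library. Site features are supplied directly instead of being computed by a site featurizer. The statistics are unweighted, whereas the featurizer computes weighted statistics, and std_dev is taken as the population standard deviation; both are assumptions of this sketch. The function name site_stats_fingerprint is hypothetical.

```python
from statistics import mean, pstdev


def site_stats_fingerprint(site_features, oxidation_states=None, min_oxi=None,
                           max_oxi=None, stats=("mean", "std_dev")):
    """Aggregate per-site feature vectors into one structure-level vector.

    Sites with an oxidation state below min_oxi or above max_oxi are skipped.
    Output order: for each site-feature column, each statistic in `stats`.
    """
    funcs = {"mean": mean, "std_dev": pstdev}  # std_dev as population std (assumption)
    kept = []
    for i, features in enumerate(site_features):
        if oxidation_states is not None:
            oxi = oxidation_states[i]
            if (min_oxi is not None and oxi < min_oxi) or (max_oxi is not None and oxi > max_oxi):
                continue
        kept.append(features)
    return [funcs[s](column) for column in zip(*kept) for s in stats]


# Hypothetical 4-site cell with one feature per site (e.g., a coordination number)
cn = [[6.0], [6.0], [4.0], [4.0]]
oxi = [2, 2, -1, -1]
print(site_stats_fingerprint(cn))                  # [5.0, 1.0]
print(site_stats_fingerprint(cn, oxi, min_oxi=0))  # [6.0, 0.0]
print(site_stats_fingerprint(cn, oxi, max_oxi=0))  # [4.0, 0.0]
```

Setting min_oxi=0 keeps only the cation sites, matching the “zero means metals/cations only” example in the arguments below.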

__init__(site_featurizer, stats=('mean', 'std_dev'), min_oxi=None, max_oxi=None, covariance=False)¶

Args:

site_featurizer (BaseFeaturizer): a site-based featurizer.

stats ([str]): list of weighted statistics to compute for each feature. If stats is None, a list is returned for each feature that contains the calculated feature for each site in the structure. Note: for the nth mode, stat must be ‘n*_mode’ (e.g., stat=‘2nd_mode’).

min_oxi (int): minimum site oxidation state for inclusion (e.g., zero means metals/cations only).

max_oxi (int): maximum site oxidation state for inclusion.

covariance (bool): Whether to compute the covariance of site features.

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(s)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:: x: input data to featurize (type depends on featurizer).
Returns:: (list) one or more features.

fit(X, y=None, **fit_kwargs)¶

Fit the SiteStatsFingerprint using the fitting function of the underlying site featurizer. Only applicable if the site featurizer is fittable. See the “.fit()” method of the site_featurizer used to construct the class for more information.

Args:

X (Iterable):

y (optional, Iterable):

**fit_kwargs: Keyword arguments used by the fit function of the site featurizer class.

Returns:: self (SiteStatsFingerprint)

classmethod from_preset(preset, **kwargs)¶

Create a SiteStatsFingerprint class according to a preset

Args:: preset (str) - Name of preset. kwargs - Options for SiteStatsFingerprint.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.structure.symmetry module¶

Structure featurizers based on symmetry.

class matminer.featurizers.structure.symmetry.Dimensionality(nn_method=CrystalNN())¶

Bases: BaseFeaturizer

Returns dimensionality of structure: 1 means linear chains of atoms OR isolated atoms/no bonds, 2 means layered, 3 means 3D connected structure. This feature is sensitive to bond length tables that you use.

__init__(nn_method=CrystalNN())¶

Args:

nn_method: The nearest-neighbor method used to determine atomic connectivity.

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(s)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:: x: input data to featurize (type depends on featurizer).
Returns:: (list) one or more features.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.structure.symmetry.GlobalSymmetryFeatures(desired_features=None)¶

Bases: BaseFeaturizer

Determines symmetry features, e.g. spacegroup number and crystal system

Features:

Spacegroup number
Crystal system (1 of 7)
Centrosymmetry (has inversion symmetry)
Number of symmetry ops, obtained from the spacegroup

__init__(desired_features=None)¶

all_features = ['spacegroup_num', 'crystal_system', 'crystal_system_int', 'is_centrosymmetric', 'n_symmetry_ops']¶

citations()¶

Citation(s) and reference(s) for this feature.

Returns:

(list) each element should be a string citation, ideally in BibTeX format.

crystal_idx = {'cubic': 1, 'hexagonal': 2, 'monoclinic': 6, 'orthorhombic': 5, 'tetragonal': 4, 'triclinic': 7, 'trigonal': 3}¶
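crystal_system_int appears to be the crystal system mapped through crystal_idx. The sketch below starts from a space-group number and uses the standard International Tables ranges for the seven crystal systems. The featurizer itself works from a symmetry analysis of the structure, so starting from the number is a simplification. The integer encoding is copied from crystal_idx; the function name is hypothetical.

```python
# Upper space-group number of each crystal system (International Tables ranges)
CRYSTAL_SYSTEM_RANGES = [
    (2, "triclinic"),
    (15, "monoclinic"),
    (74, "orthorhombic"),
    (142, "tetragonal"),
    (167, "trigonal"),
    (194, "hexagonal"),
    (230, "cubic"),
]

# Integer encoding copied from the crystal_idx attribute above
CRYSTAL_IDX = {"cubic": 1, "hexagonal": 2, "monoclinic": 6, "orthorhombic": 5,
               "tetragonal": 4, "triclinic": 7, "trigonal": 3}


def crystal_system_features(spacegroup_num):
    """Return (crystal_system, crystal_system_int) for a space-group number 1-230."""
    if not 1 <= spacegroup_num <= 230:
        raise ValueError("space-group numbers run from 1 to 230")
    for upper, system in CRYSTAL_SYSTEM_RANGES:
        if spacegroup_num <= upper:
            return system, CRYSTAL_IDX[system]


print(crystal_system_features(225))  # ('cubic', 1)
print(crystal_system_features(62))   # ('orthorhombic', 5)
```

The encoding in crystal_idx orders the systems from highest symmetry (cubic = 1) to lowest (triclinic = 7). A larger crystal_system_int therefore means lower symmetry.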

feature_labels()¶

Generate attribute names.

Returns:: ([str]) attribute labels.

featurize(s)¶

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:: x: input data to featurize (type depends on featurizer).
Returns:: (list) one or more features.

implementors()¶

List of implementors of the feature.

Returns:

(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).
