matminer.featurizers.composition package

Subpackages

Submodules

matminer.featurizers.composition.alloy module

Composition featurizers specialized for use with alloys.

class matminer.featurizers.composition.alloy.Miedema(struct_types='all', ss_types='min', data_source='Miedema')

Bases: BaseFeaturizer

Formation enthalpies of intermetallic compounds, from Miedema et al.

Calculate the formation enthalpies of the intermetallic compound, solid solution and amorphous phase of a given composition, based on the semi-empirical Miedema model (and some extensions), particularly for transition metal alloys.

Supports elemental, binary and multicomponent alloys. For elemental and binary alloys, the formulation is based on the original works by Miedema et al. in the 1980s; for multicomponent alloys, the formulation is essentially a linear combination of the sub-binary systems. This is reported to work well for ternary alloys, but should be used with care for quaternary and higher-order alloys.

Args:
struct_types (str or [str]): default=’all’

‘inter’: intermetallic compound
‘ss’: solid solution
‘amor’: amorphous phase
‘all’: same as [‘inter’, ‘ss’, ‘amor’]
[‘inter’, ‘ss’]: intermetallic compound and solid solution

ss_types (str or [str]): only for ss, default=’min’

‘fcc’: fcc solid solution
‘bcc’: bcc solid solution
‘hcp’: hcp solid solution
‘no_latt’: solid solution with no specific structure type
‘min’: minimum value over [‘fcc’, ‘bcc’, ‘hcp’, ‘no_latt’]
‘all’: same as [‘fcc’, ‘bcc’, ‘hcp’, ‘no_latt’]
[‘fcc’, ‘bcc’]: fcc and bcc solid solutions

data_source (str): source of dataset, default=’Miedema’

‘Miedema’: ‘Miedema.csv’ placed in “matminer/utils/data_files/”, containing the following model parameters for 73 elements: ‘molar_volume’, ‘electron_density’, ‘electronegativity’, ‘valence_electrons’, ‘a_const’, ‘R_const’, ‘H_trans’, ‘compressibility’, ‘shear_modulus’, ‘melting_point’, ‘structural_stability’. Please see the references for details.

Returns:
(list of floats) Miedema formation enthalpies (eV/atom) for the input struct_types:

- Miedema_deltaH_inter: for intermetallic compound
- Miedema_deltaH_ss: for solid solution; can include ‘fcc’, ‘bcc’, ‘hcp’, ‘no_latt’, ‘min’ based on the input ss_types
- Miedema_deltaH_amor: for amorphous phase

__init__(struct_types='all', ss_types='min', data_source='Miedema')
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

deltaH_chem(elements, fracs, struct)

Chemical term of formation enthalpy.

Args:

elements (list of str): list of elements
fracs (list of floats): list of atomic fractions
struct (str): ‘inter’, ‘ss’ or ‘amor’

Returns:

deltaH_chem (float): chemical term of formation enthalpy

deltaH_elast(elements, fracs)

Elastic term of formation enthalpy.

Args:

elements (list of str): list of elements
fracs (list of floats): list of atomic fractions

Returns:

deltaH_elastic (float): elastic term of formation enthalpy

deltaH_struct(elements, fracs, latt)

Structural term of formation enthalpy, only for solid solution.

Args:

elements (list of str): list of elements
fracs (list of floats): list of atomic fractions
latt (str): ‘fcc’, ‘bcc’, ‘hcp’ or ‘no_latt’

Returns:

deltaH_struct (float): structural term of formation enthalpy

deltaH_topo(elements, fracs)

Topological term of formation enthalpy, only for amorphous phase.

Args:

elements (list of str): list of elements
fracs (list of floats): list of atomic fractions

Returns:

deltaH_topo (float): topological term of formation enthalpy

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Get Miedema formation enthalpies of target structures: inter, amor, ss (can be further divided into ‘min’, ‘fcc’, ‘bcc’, ‘hcp’, ‘no_latt’ for different lattice_types).

Args:

comp: Pymatgen composition object

Returns:

miedema (list of floats): formation enthalpies of target structures

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

precheck(c: Composition) → bool

Precheck a single entry. Miedema does not work for compositions containing any elements for which the Miedema model has no parameters. To precheck an entire dataframe (and automatically gather the fraction of structures that will pass the precheck), please use precheck_dataframe.

Args:

c (pymatgen.Composition): The composition to precheck.

Returns:

(bool): If True, c passed the precheck; otherwise, it failed.

class matminer.featurizers.composition.alloy.WenAlloys

Bases: BaseFeaturizer

Calculate features for alloy properties.

Based on the work:

“Machine learning assisted design of high entropy alloys with desired property” by Wen et al., Acta Materialia 170, 109-117 (2019).

Copyright 2020 Battelle Energy Alliance, LLC ALL RIGHTS RESERVED

Features:

  • Yang omega
  • Yang delta
  • Radii local mismatch
  • Radii gamma
  • Configuration entropy
  • Lambda entropy
  • Electronegativity delta
  • Electronegativity local mismatch
  • VEC mean
  • Mixing enthalpy
  • Mean cohesive energy
  • Interant electrons
  • Shear modulus mean
  • Shear modulus delta
  • Shear modulus local mismatch
  • Shear modulus strength model

__init__()
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

static compute_atomic_fraction(elements, composition)

Get atomic fraction string.

Args:

elements ([pymatgen.Element or str]): List of elements
composition (pymatgen.Composition): Composition

Returns:

(str)

static compute_configuration_entropy(fractions)

Compute the configuration entropy.

R \sum^n_{i=1} c_i \ln{c_i}

where c_i is the fraction of element i and R is the ideal gas constant.

Args:

fractions ([float]): List of element fractions

Returns:

(float) configuration entropy
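
As a rough illustration, the sketch below evaluates the expression R \sum c_i \ln{c_i} using only the standard library. The function name `configuration_entropy_expression`, the constant `R_GAS` and the choice of J/(mol K) units are assumptions made for this example; the sign convention and units of the value matminer actually returns are not stated here.

```python
import math

# Ideal gas constant in J/(mol K); the unit choice is an assumption of this sketch.
R_GAS = 8.314462618


def configuration_entropy_expression(fractions):
    """Evaluate R * sum_i c_i * ln(c_i) exactly as the expression is written."""
    # Terms with c_i == 0 are skipped because c * ln(c) tends to 0 as c tends to 0.
    return R_GAS * sum(c * math.log(c) for c in fractions if c > 0)


# Equimolar four-component alloy: the expression reduces to R * ln(1/4).
equimolar = configuration_entropy_expression([0.25, 0.25, 0.25, 0.25])
```

The ideal entropy of mixing is conventionally quoted as the negative of this expression, so that it is positive for any mixture.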

static compute_delta(variable, fractions)

Compute Yang’s delta parameter for a generic variable.

\sqrt{\sum^n_{i=1} c_i \left( 1 - \frac{v_i}{\bar{v}} \right)^2 }

where c_i and v_i are the fraction and variable of element i, and \bar{v} is the fraction-weighted average of the variable.

Args:

variable (list): List of properties to assess
fractions (list): List of fractions to assess

Returns:

(float) delta
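
The delta expression can be checked by hand with a few lines of plain Python. This is only a sketch of the formula as documented, not matminer's implementation; the function name `yang_delta` is made up for the example.

```python
import math


def yang_delta(variable, fractions):
    """sqrt(sum_i c_i * (1 - v_i / v_bar)^2), with v_bar the fraction-weighted mean."""
    v_bar = sum(c * v for c, v in zip(fractions, variable))
    return math.sqrt(
        sum(c * (1.0 - v / v_bar) ** 2 for c, v in zip(fractions, variable))
    )


# Two elements with property values 1 and 3 in equal amounts, so v_bar = 2.
delta_example = yang_delta([1.0, 3.0], [0.5, 0.5])
```

When every element has the same property value, each term vanishes and delta is zero.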

compute_enthalpy(elements, fractions)

Compute mixing enthalpy.

Args:

elements ([pymatgen.Element or str]): List of elements
fractions ([float]): Fractions of elements in composition

Returns:

(float) H_mixing

static compute_gamma_radii(miracle_radius_stats)
Compute gamma of the radii, based on the solid angles of the atomic packing for the elements with the largest and smallest atomic sizes.

\frac{1 - \sqrt{\frac{(r + r_{min})^2 - r^2}{(r + r_{min})^2}}}{1 - \sqrt{\frac{(r + r_{max})^2 - r^2}{(r + r_{max})^2}}}

where r, r_{min} and r_{max} are the mean, minimum and maximum radii.

Args:

miracle_radius_stats (dict): Dictionary of stats for miracleradius via compute_magpie_summary

Returns:

(float) gamma
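
A minimal sketch of the gamma ratio follows, taking the three radii as plain floats rather than the statistics dictionary the method expects. The function and helper names are hypothetical, and the formula is the ratio of 1 - sqrt(((r + r_x)^2 - r^2) / (r + r_x)^2) evaluated at r_min and at r_max.

```python
import math


def gamma_radii(r_mean, r_min, r_max):
    """Ratio of the solid-angle term for the smallest atom to that for the largest."""

    def solid_angle_term(r_other):
        outer = (r_mean + r_other) ** 2
        return 1.0 - math.sqrt((outer - r_mean ** 2) / outer)

    return solid_angle_term(r_min) / solid_angle_term(r_max)


# Mean radius 1, smallest radius 1, largest radius 2 (arbitrary units).
gamma_example = gamma_radii(1.0, 1.0, 2.0)
```

If all atoms have the same radius, numerator and denominator coincide and gamma equals 1.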

static compute_lambda(yang_delta, entropy)
Args:

yang_delta (float): Yang Solid Solution Delta
entropy (float): Configuration entropy

Returns:

float

static compute_local_mismatch(variable, fractions)

Compute local mismatch of a given variable.

\sum^n_{i=1} \sum^n_{j=1, i \neq j} c_i c_j | v_i - v_j |^2

where c_{i,j} and v_{i,j} are the fraction and variable of elements i and j.

Args:

variable (list): List of properties to assess
fractions (list): List of fractions to assess

Returns:

(float) local mismatch
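
The double sum can be sketched directly as two nested loops. This follows the expression as documented (a sum over ordered pairs i ≠ j of c_i c_j |v_i - v_j|^2) and is not taken from matminer's source; the name `local_mismatch` is an assumption.

```python
def local_mismatch(variable, fractions):
    """Sum over ordered pairs i != j of c_i * c_j * |v_i - v_j|^2."""
    total = 0.0
    n = len(variable)
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = abs(variable[i] - variable[j])
                total += fractions[i] * fractions[j] * diff ** 2
    return total


# Two elements with values 1 and 3 in equal amounts: each ordered pair adds 0.25 * 4.
mismatch_example = local_mismatch([1.0, 3.0], [0.5, 0.5])
```

Because the sum runs over ordered pairs, every unordered pair is counted twice.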

compute_magpie_summary(attribute_name, elements, fractions)

Get limited list of weighted statistics according to magpie data.

Args:

attribute_name (str): Name of magpie attribute to retrieve
elements ([pymatgen.Element or str]): List of elements
fractions ([float]): List of element fractions

Returns:

(dict) Dictionary of element-fraction weighted statistics for attribute.

static compute_strength_local_mismatch_shear(shear_modulus, mean_shear_modulus, fractions)

The local mismatch of the shear values.

\sum^n_{i=1} \frac{c_i \frac{2(G_i - G)}{G_i + G}}{\left( 1 + 0.5 \left| c_i \frac{2(G_i - G)}{G_i + G} \right| \right)}

where c_{i}, G and G_{i} are the fraction, mean shear modulus and shear modulus of element i.

Args:

shear_modulus ([float]): List of shear moduli of elements
mean_shear_modulus (float): Mean of shear moduli
fractions ([float]): List of element fractions in the composition

Returns:

(float) strengthening local mismatch

static compute_weight_fraction(elements, composition)

Get weight fraction string.

Args:

elements ([pymatgen.Element or str]): List of elements
composition (pymatgen.Composition): Composition

Returns:

(str)

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Get elemental property attributes.

Args:

comp: Pymatgen composition object

Returns:

(list): Generated Wen et al. features.

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

precheck(comp)

Precheck (provide an estimate of whether a featurizer will work or not) for a single entry (e.g., a single composition). If the entry fails the precheck, it will most likely fail featurization; if it passes, it is likely (but not guaranteed) to featurize correctly.

Prechecks should be:
  • accurate (but can be good estimates rather than ground truth)

  • fast to evaluate

  • unlikely to be obsolete via changes in the featurizer in the near future

This method should be overridden by any featurizer requiring its use, as by default all entries will pass prechecking. Also, precheck is a good opportunity to throw warnings about long runtimes (e.g., doing nearest neighbors computations on a structure with many thousand sites).

See the documentation for precheck_dataframe for more information.

Args:
*x (Composition, Structure, etc.): Input to be featurized. Can be a single input or multiple inputs.

Returns:

(bool): True, if passes the precheck. False, if fails.

class matminer.featurizers.composition.alloy.YangSolidSolution

Bases: BaseFeaturizer

Mixing thermochemistry and size mismatch terms of Yang and Zhang (2012)

This featurizer returns two different features developed by Yang and Zhang (https://linkinghub.elsevier.com/retrieve/pii/S0254058411009357) to predict whether metal alloys will form metallic glasses, crystalline solid solutions, or intermetallics. The first, Omega, is related to the balance between the mixing entropy and mixing enthalpy of the liquid phase. The second, delta, is related to the atomic size mismatch between the different elements of the material.

Features

  • Yang omega - Mixing thermochemistry feature, Omega
  • Yang delta - Atomic size mismatch term

__init__()
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

compute_delta(comp)

Compute Yang’s delta parameter

\sqrt{\sum^n_{i=1} c_i \left( 1 - \frac{r_i}{\bar{r}} \right)^2 }

where c_i and r_i are the fraction and radius of element i, and \bar{r} is the fraction-weighted average of the radii. We use the radii compiled by Miracle et al. (https://www.tandfonline.com/doi/ref/10.1179/095066010X12646898728200?scroll=top).

Args:

comp (Composition) - Composition to assess

Returns:

(float) delta

compute_omega(comp)

Compute Yang’s mixing thermodynamics descriptor

\frac{T_m \Delta S_{mix}}{ | \Delta H_{mix} | }

where T_m is the average melting temperature, \Delta S_{mix} is the ideal mixing entropy, and \Delta H_{mix} is the average mixing enthalpy of all pairs of elements in the alloy.

Args:

comp (Composition) - Composition to featurize

Returns:

(float) Omega
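
To make the Omega ratio concrete, the sketch below computes it from quantities supplied by the caller. Several choices here are assumptions rather than documented behavior: T_m is taken as the fraction-weighted mean of the elemental melting points, the ideal mixing entropy is evaluated as -R \sum c_i \ln c_i in J/(mol K), and \Delta H_{mix} is passed in directly in J/mol instead of being looked up for each element pair.

```python
import math

# Ideal gas constant in J/(mol K); units are an assumption of this sketch.
R_GAS = 8.314462618


def yang_omega(fractions, melting_points, delta_h_mix):
    """T_m * dS_mix / |dH_mix| for caller-supplied melting points (K) and dH_mix (J/mol)."""
    t_m = sum(c * t for c, t in zip(fractions, melting_points))
    ds_mix = -R_GAS * sum(c * math.log(c) for c in fractions if c > 0)
    return t_m * ds_mix / abs(delta_h_mix)


# Hypothetical equimolar binary: melting points 1000 K and 2000 K, dH_mix = -1000 J/mol.
omega_example = yang_omega([0.5, 0.5], [1000.0, 2000.0], -1000.0)
```

The ratio is undefined when \Delta H_{mix} is exactly zero, so a caller would need to handle that case separately.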

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:

x: input data to featurize (type depends on featurizer).

Returns:

(list) one or more features.

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

precheck(c: Composition) → bool

Precheck a single entry. YangSolidSolution does not work for compositions containing any binary element combinations for which the model has no parameters. We can nearly equivalently approximate this by checking against the unary element list.

To precheck an entire dataframe (and automatically gather the fraction of structures that will pass the precheck), please use precheck_dataframe.

Args:

c (pymatgen.Composition): The composition to precheck.

Returns:

(bool): If True, c passed the precheck; otherwise, it failed.

matminer.featurizers.composition.composite module

Composition featurizers for composite features containing more than 1 category of general-purpose data.

class matminer.featurizers.composition.composite.ElementProperty(data_source, features, stats)

Bases: BaseFeaturizer

Class to calculate elemental property attributes.

To initialize quickly, use the from_preset() method.

Features: Based on the statistics of the data_source chosen, computed by element stoichiometry. The format generally is:

“{data source} {statistic} {property}”

For example:

“PymatgenData range X” # Range of electronegativity from Pymatgen data

For a list of all statistics, see the PropertyStats documentation; for a list of all attributes available for a given data_source, see the documentation for the data sources (e.g., PymatgenData, MagpieData, MatscholarElementData, etc.).

Args:
data_source (AbstractData or str): source from which to retrieve element property data (or use str for preset: “pymatgen”, “magpie”, or “deml”)

features (list of strings): List of elemental properties to use (these must be supported by data_source)

stats (list of strings): a list of weighted statistics to compute for each property (see PropertyStats for available stats)
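
The label format “{data source} {statistic} {property}” can be illustrated without the library. In this sketch the helper names, the ordering of labels (properties in the outer loop, statistics in the inner loop) and the hand-entered Pauling electronegativities for Fe and O are assumptions made for the example.

```python
def element_property_labels(source_name, features, stats):
    """Build labels of the form '{data source} {statistic} {property}'."""
    return [f"{source_name} {stat} {feature}" for feature in features for stat in stats]


def weighted_mean(values, fractions):
    """Composition-weighted mean of a per-element property."""
    return sum(v * c for v, c in zip(values, fractions))


def value_range(values):
    """Difference between the largest and smallest per-element value."""
    return max(values) - min(values)


labels = element_property_labels("PymatgenData", ["X"], ["mean", "range"])

# Fe2O3: atomic fractions 0.4 (Fe) and 0.6 (O), Pauling electronegativities 1.83 and 3.44.
x_values, x_fractions = [1.83, 3.44], [0.4, 0.6]
x_mean = weighted_mean(x_values, x_fractions)
x_range = value_range(x_values)
```

Each label then corresponds to one entry of the feature vector, in the same order.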

__init__(data_source, features, stats)
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Get elemental property attributes

Args:

comp: Pymatgen composition object

Returns:

all_attributes: Specified property statistics of features

classmethod from_preset(preset_name)

Return ElementProperty from a preset string.

Args:

preset_name: (str) can be one of “magpie”, “deml”, “matminer”, “matscholar_el”, or “megnet_el”.

Returns:

ElementProperty based on the preset name.

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.composition.composite.Meredig

Bases: BaseFeaturizer

Class to calculate features as defined in Meredig et al.

Features:

Atomic fraction of each of the first 103 elements, in order of atomic number.

17 statistics of elemental properties:
  • Mean atomic weight of constituent elements
  • Mean periodic table row and column number
  • Mean and range of atomic number
  • Mean and range of atomic radius
  • Mean and range of electronegativity
  • Mean number of valence electrons in each orbital
  • Fraction of total valence electrons in each orbital

__init__()
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Get elemental property attributes

Args:

comp: Pymatgen composition object

Returns:

all_attributes: Specified property statistics of features

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.composition.element module

Composition featurizers for elemental data and stoichiometry.

class matminer.featurizers.composition.element.BandCenter

Bases: BaseFeaturizer

Estimation of absolute position of band center using electronegativity.

Features
  • Band center

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

deml_data = <matminer.utils.data.DemlData object>
feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

(Rough) estimation of absolute position of band center using geometric mean of electronegativity.

Args:

comp (Composition).

Returns:

(float) band center.
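
The geometric-mean step can be sketched as below. The element names “A” and “B” and their electronegativity values are made up, and how the library turns this mean into a band-center value (sign, energy reference, electronegativity scale) is not specified here, so the sketch stops at the mean itself.

```python
import math


def geometric_mean_electronegativity(electronegativity, amounts):
    """Composition-weighted geometric mean, computed in log space for stability."""
    total = sum(amounts.values())
    log_mean = sum(
        n * math.log(electronegativity[el]) for el, n in amounts.items()
    ) / total
    return math.exp(log_mean)


# Hypothetical 1:1 compound AB with electronegativities 4 and 9: sqrt(4 * 9) = 6.
gm_example = geometric_mean_electronegativity({"A": 4.0, "B": 9.0}, {"A": 1, "B": 1})
```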

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

magpie_data = <matminer.utils.data.MagpieData object>
class matminer.featurizers.composition.element.ElementFraction

Bases: BaseFeaturizer

Class to calculate the atomic fraction of each element in a composition.

Generates a vector where each index represents an element in atomic number order.

__init__()
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)
Args:

comp: Pymatgen Composition object

Returns:

vector (list of floats): fraction of each element in a composition
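
The shape of the returned vector can be illustrated with a truncated element list. The ten-element `ELEMENT_ORDER` below is a shortened stand-in chosen for readability; the number of elements the featurizer actually covers is not stated in this section, and the function name is hypothetical.

```python
# First ten elements in atomic-number order; truncated for illustration only.
ELEMENT_ORDER = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne"]


def element_fraction_vector(amounts, element_order=ELEMENT_ORDER):
    """Atomic fraction of each element, positioned by atomic number."""
    total = sum(amounts.values())
    return [amounts.get(el, 0) / total for el in element_order]


# H2O: index 0 (H) holds 2/3 and index 7 (O) holds 1/3; all other entries are zero.
water = element_fraction_vector({"H": 2, "O": 1})
```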

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.composition.element.Stoichiometry(p_list=(0, 2, 3, 5, 7, 10), num_atoms=False)

Bases: BaseFeaturizer

Calculate norms of stoichiometric attributes.

Parameters:

p_list (list of ints): list of norms to calculate
num_atoms (bool): whether to return number of atoms per formula unit

__init__(p_list=(0, 2, 3, 5, 7, 10), num_atoms=False)
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Get stoichiometric attributes.

Args:

comp: Pymatgen composition object
p_list (list of ints)

Returns:

p_norm (list of floats): Lp norm-based stoichiometric attributes. Returns number of atoms if no p-values specified.
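
A small sketch of Lp norms over atomic fractions may help when reading these features. It assumes that p = 0 is interpreted as the number of distinct elements (the usual meaning of a “0-norm”) and that other values of p give (\sum x_i^p)^{1/p}; neither detail is spelled out in this section, and the function name is made up.

```python
def stoichiometry_norms(amounts, p_list=(0, 2, 3, 5, 7, 10)):
    """Lp norms of the atomic fractions of a composition given as {element: amount}."""
    total = sum(amounts.values())
    fractions = [n / total for n in amounts.values() if n > 0]
    norms = []
    for p in p_list:
        if p == 0:
            # Assumed convention: the 0-norm counts the element types present.
            norms.append(float(len(fractions)))
        else:
            norms.append(sum(x ** p for x in fractions) ** (1.0 / p))
    return norms


# Fe2O3 has fractions 0.4 and 0.6, so the 2-norm is sqrt(0.16 + 0.36).
fe2o3_norms = stoichiometry_norms({"Fe": 2, "O": 3}, p_list=(0, 2))
```

For p > 0 the norm equals 1 for a pure element and decreases as the composition is spread over more elements.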

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.composition.element.TMetalFraction

Bases: BaseFeaturizer

Class to calculate fraction of magnetic transition metals in a composition.

Parameters:

data_source (data class): source from which to retrieve element data

Generates: Fraction of magnetic transition metal atoms in a compound

__init__()
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)
Args:

comp: Pymatgen Composition object

Returns:

frac_magn_atoms (single-element list): fraction of magnetic transition metal atoms in a compound

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.composition.ion module

Composition featurizers for compositions with ionic data.

class matminer.featurizers.composition.ion.CationProperty(data_source, features, stats)

Bases: ElementProperty

Features based on properties of cations in a material

Requires that oxidation states have already been determined. Property statistics weighted by composition.

Features: Based on the statistics of the data_source chosen, computed by element stoichiometry. The format generally is:

“{data source} {statistic} {property}”

For example:

“DemlData range magn_moment” # Range of magnetic moment via Deml et al. data

For a list of all statistics, see the PropertyStats documentation; for a list of all attributes available for a given data_source, see the documentation for the data sources (e.g., PymatgenData, MagpieData, MatscholarElementData, etc.).

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Get elemental property attributes

Args:

comp: Pymatgen composition object

Returns:

all_attributes: Specified property statistics of features

classmethod from_preset(preset_name)

Return ElementProperty from a preset string.

Args:

preset_name: (str) can be one of “magpie”, “deml”, “matminer”, “matscholar_el”, or “megnet_el”.

Returns:

ElementProperty based on the preset name.

class matminer.featurizers.composition.ion.ElectronAffinity

Bases: BaseFeaturizer

Calculate average electron affinity times formal charge of anion elements. Note: The formal charges must already be computed before calling featurize. Generates average (electron affinity*formal charge) of anions.

__init__()
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)
Args:

comp: (Composition) Composition to be featurized

Returns:

avg_anion_affin (single-element list): average electron affinity*formal charge of anions

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.composition.ion.ElectronegativityDiff(stats=None)

Bases: BaseFeaturizer

Features from electronegativity differences between anions and cations.

These features are computed by first determining the concentration-weighted average electronegativity of the anions. For example, the average electronegativity of the anions in CaCoSO is equal to 1/2 of that of S and 1/2 of that of O. We then compute the difference between the electronegativity of each cation and the average anion electronegativity.

The feature values are then determined based on the concentration-weighted statistics in the same manner as ElementProperty features. For example, one value could be the mean electronegativity difference over all the anions.

Parameters:

stats: Property statistics to compute

Generates average electronegativity difference between cations and anions
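
The CaCoSO example can be worked through by hand. The sketch below uses Pauling electronegativities entered manually and assumes oxidation states Ca2+, Co2+, S2- and O2-; it also assumes the difference is taken as average anion value minus cation value, which this section does not state. All function and variable names are hypothetical.

```python
# Pauling electronegativities for the elements in the CaCoSO example.
PAULING_X = {"Ca": 1.00, "Co": 1.88, "S": 2.58, "O": 3.44}


def electronegativity_differences(amounts, oxidation_states, x=PAULING_X):
    """Return (average anion X, per-cation differences, cation-weighted mean difference)."""
    anions = [el for el in amounts if oxidation_states[el] < 0]
    cations = [el for el in amounts if oxidation_states[el] > 0]
    anion_total = sum(amounts[el] for el in anions)
    avg_anion_x = sum(amounts[el] * x[el] for el in anions) / anion_total
    # Assumed sign convention: anion average minus cation value.
    diffs = {el: avg_anion_x - x[el] for el in cations}
    cation_total = sum(amounts[el] for el in cations)
    mean_diff = sum(amounts[el] * diffs[el] for el in cations) / cation_total
    return avg_anion_x, diffs, mean_diff


avg_x, diffs, mean_diff = electronegativity_differences(
    {"Ca": 1, "Co": 1, "S": 1, "O": 1},
    {"Ca": 2, "Co": 2, "S": -2, "O": -2},
)
```

With these inputs the average anion electronegativity is the midpoint of the S and O values, matching the 1/2 and 1/2 weighting described in the text.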

__init__(stats=None)
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)
Args:

comp: Pymatgen Composition object

Returns:

en_diff_stats (list of floats): Property stats of electronegativity difference

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.composition.ion.IonProperty(data_source=<matminer.utils.data.PymatgenData object>, fast=False)

Bases: BaseFeaturizer

Ionic property attributes. Similar to ElementProperty.

__init__(data_source=<matminer.utils.data.PymatgenData object>, fast=False)
Args:
data_source - (OxidationStateMixin) - An AbstractData class that supports the get_oxidation_state method.

fast - (boolean) whether to assume elements exist in a single oxidation state, which can dramatically accelerate the calculation of whether an ionic compound is possible, but will miss heterovalent compounds like Fe3O4.
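
Why a single-oxidation-state assumption misses Fe3O4 can be seen with a brute-force charge-balance check. This is a toy sketch, not the algorithm matminer or pymatgen uses; the function name and the restricted oxidation-state lists (Fe: +2, +3; O: -2) are assumptions for the example, and amounts must be integers.

```python
from itertools import combinations_with_replacement, product


def neutral_compound_possible(amounts, oxidation_states, fast=False):
    """Check whether some assignment of oxidation states gives zero net charge."""
    elements = list(amounts)
    if fast:
        # Every atom of an element shares one oxidation state.
        for states in product(*(oxidation_states[el] for el in elements)):
            if sum(q * amounts[el] for q, el in zip(states, elements)) == 0:
                return True
        return False
    # Each atom may take its own state: collect the reachable total charge per element.
    charge_sums = [
        {
            sum(combo)
            for combo in combinations_with_replacement(oxidation_states[el], amounts[el])
        }
        for el in elements
    ]
    return any(sum(total) == 0 for total in product(*charge_sums))


fe3o4 = {"Fe": 3, "O": 4}
states = {"Fe": [2, 3], "O": [-2]}
fast_result = neutral_compound_possible(fe3o4, states, fast=True)
full_result = neutral_compound_possible(fe3o4, states, fast=False)
```

In the fast mode no single Fe state balances the eight negative charges, whereas the mixed assignment Fe2+ plus two Fe3+ does.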

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Ionic character attributes

Args:

comp: (Composition) Composition to be featurized

Returns:

cpd_possible (bool): Indicates if a neutral ionic compound is possible
max_ionic_char (float): Maximum ionic character between two atoms
avg_ionic_char (float): Average ionic character

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.composition.ion.OxidationStates(stats=None)

Bases: BaseFeaturizer

Statistics about the oxidation states of each species. Features are concentration-weighted statistics of the oxidation states.

__init__(stats=None)
Args:

stats - (list of string), which statistics to compute

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:

x: input data to featurize (type depends on featurizer).

Returns:

(list) one or more features.

classmethod from_preset(preset_name)
implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.composition.ion.is_ionic(comp)

Determines whether a compound is an ionic compound.

Looks at the oxidation states of each site and checks if both anions and cations exist

Args:

comp (Composition): Composition to check

Returns:

(bool) Whether the composition describes an ionic compound

matminer.featurizers.composition.orbital module

Composition featurizers for orbital data.

class matminer.featurizers.composition.orbital.AtomicOrbitals

Bases: BaseFeaturizer

Determine HOMO/LUMO features based on a composition.

The highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) are estimated from the atomic orbital energies of the composition. The atomic orbital energies are from NIST: https://www.nist.gov/pml/data/atomic-reference-data-electronic-structure-calculations

Warning: For compositions with inter-species fractions greater than 10,000 (e.g. dilute alloys such as FeC0.00001) the composition will be truncated (to Fe in this example). In such extreme cases, the truncation likely reflects the true physics of the situation (i.e. that the dilute element does not significantly contribute orbital character to the band structure), but the user should be aware of this behavior.

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)
Args:
comp: (Composition) pymatgen Composition object

Returns:

HOMO_character: (str) orbital symbol (‘s’, ‘p’, ‘d’, or ‘f’)
HOMO_element: (str) symbol of element for HOMO
HOMO_energy: (float in eV) absolute energy of HOMO
LUMO_character: (str) orbital symbol (‘s’, ‘p’, ‘d’, or ‘f’)
LUMO_element: (str) symbol of element for LUMO
LUMO_energy: (float in eV) absolute energy of LUMO
gap_AO: (float in eV) the estimated bandgap from HOMO and LUMO energies

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.composition.orbital.ValenceOrbital(orbitals=('s', 'p', 'd', 'f'), props=('avg', 'frac'))

Bases: BaseFeaturizer

Attributes of valence orbital shells

Args:

data_source (data object): source from which to retrieve element data
orbitals (list): orbitals to calculate
props (list): specifies whether to return average number of electrons in each orbital, fraction of electrons in each orbital, or both

__init__(orbitals=('s', 'p', 'd', 'f'), props=('avg', 'frac'))
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Weighted fraction of valence electrons in each orbital

Args:

comp: Pymatgen composition object

Returns:
valence_attributes (list of floats): Average number and/or fraction of valence electrons in specified orbitals
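
The “avg” and “frac” quantities can be sketched for NaCl using hand-entered valence configurations (Na: 3s1; Cl: 3s2 3p5). The lookup table, the function name and the ordering of the returned values (all averages first, then all fractions) are assumptions for this example; the library's data source may count valence electrons differently.

```python
# Hand-entered valence electron counts per orbital; an assumption for this sketch.
VALENCE = {
    "Na": {"s": 1, "p": 0, "d": 0, "f": 0},
    "Cl": {"s": 2, "p": 5, "d": 0, "f": 0},
}


def valence_orbital_features(
    amounts, valence=VALENCE, orbitals=("s", "p", "d", "f"), props=("avg", "frac")
):
    """Composition-weighted average and fraction of valence electrons per orbital."""
    total_atoms = sum(amounts.values())
    avg = [
        sum(amounts[el] * valence[el][orb] for el in amounts) / total_atoms
        for orb in orbitals
    ]
    features = []
    if "avg" in props:
        features.extend(avg)
    if "frac" in props:
        total_electrons = sum(avg)
        features.extend(a / total_electrons for a in avg)
    return features


nacl_features = valence_orbital_features({"Na": 1, "Cl": 1})
```

For NaCl this gives average counts of 1.5 (s) and 2.5 (p) electrons per atom, so the s and p fractions are 0.375 and 0.625.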

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.composition.packing module

Composition featurizers for determining packing characteristics.

class matminer.featurizers.composition.packing.AtomicPackingEfficiency(threshold=0.01, n_nearest=(1, 3, 5), max_types=6)

Bases: BaseFeaturizer

Packing efficiency based on a geometric theory of the amorphous packing of hard spheres.

This featurizer computes two different kinds of features. The first relates to the distance between a composition and the composition of the clusters of atoms expected to be efficiently packed based on a theory from Laws et al. (http://www.nature.com/doifinder/10.1038/ncomms9123). The second corresponds to the packing efficiency of a system if all atoms in the alloy are simultaneously as efficiently packed as possible.

The packing efficiency in these models is based on the Atomic Packing Efficiency (APE), which measures the difference between the ratio of the radii of the central atom to its neighbors and the ideal ratio of a cluster with the same number of atoms that has optimal packing efficiency. If the difference between the ratios is too large, the APE is positive. If the difference is too small, the APE is negative.

Features:
dist from {k} clusters |APE| < {thr} - The distance between an alloy composition and the k clusters that have a packing efficiency below thr from ideal

mean simul. packing efficiency - Mean packing efficiency of all atoms. The packing efficiency is measured with respect to ideal (0)

mean abs simul. packing efficiency - Mean absolute value of the packing efficiencies. Closer to zero is more efficiently packed

References:

[1] K.J. Laws, D.B. Miracle, M. Ferry, A predictive structural model for bulk metallic glasses, Nat. Commun. 6 (2015) 8123. doi:10.1038/ncomms9123.

__init__(threshold=0.01, n_nearest=(1, 3, 5), max_types=6)

Initialize the featurizer

Args:
threshold (float): Threshold to use for determining whether a cluster is efficiently packed.

n_nearest ({int}): Number of nearest clusters to use when considering features

max_types (int): Maximum number of atom types to consider when looking for efficient clusters. The process for finding efficient clusters is very expensive for large numbers of types

citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

compute_nearest_cluster_distance(comp)

Compute the distance between a composition and the nearest efficiently-packed clusters.

Measures the mean L_2 distance between the alloy composition and the k-nearest clusters with Atomic Packing Efficiencies within the user-specified tolerance of 1. k is any of the numbers defined in the “n_nearest” parameter of this class.

If there are fewer than k efficient clusters in the system, we use the maximum distance between any two compositions (1) for the unmatched neighbors.

Args:

comp (Composition): Composition of material to evaluate

Return:

[float] Average distances
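
The averaging and padding rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the featurizer's code: the function name mean_distance_to_nearest_clusters and its pad parameter are hypothetical, the cluster compositions are passed in directly instead of being looked up, and all compositions are assumed to be vectors of atomic fractions in the same element order.

```python
import math


def mean_distance_to_nearest_clusters(comp, clusters, n_nearest=(1, 3, 5), pad=1.0):
    """Mean L2 distance from `comp` to its k nearest clusters, for each k."""
    # L2 distance between composition vectors (same element order assumed).
    dists = sorted(math.dist(comp, c) for c in clusters)
    means = []
    for k in n_nearest:
        nearest = dists[:k]
        # Pad with the documented maximum distance when fewer than k clusters exist.
        nearest += [pad] * (k - len(nearest))
        means.append(sum(nearest) / k)
    return means


# A 50/50 binary alloy with two hypothetical efficiently-packed clusters.
distances = mean_distance_to_nearest_clusters((0.5, 0.5), [(0.5, 0.5), (1.0, 0.0)])
```

With only two clusters available, the values for k = 3 and k = 5 include one and three padded entries respectively, so the mean distance grows with k.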

compute_simultaneous_packing_efficiency(comp)

Compute the packing efficiency of the system when the neighbor shell of each atom has the same composition as the alloy. When this criterion is satisfied, it is possible for every atom in this system to be simultaneously as efficiently-packed as possible.

Args:

comp (Composition): Composition to be assessed

Returns:

(float) Average APE of all atoms

(float) Average deviation of the APE of each atom from ideal (0)
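
One way to read the criterion above: if every neighbor shell has the alloy's composition, the mean neighbor radius is the composition-weighted mean radius, and each element's APE follows from its radius ratio to that mean. The sketch below assumes exactly that. The radii, the small table of ideal ratios, the sign convention of the APE, and the names best_ape and simultaneous_packing_efficiency are all illustrative assumptions, not values or functions taken from matminer.

```python
# Illustrative inputs (assumptions): ideal ratios for a few cluster sizes and
# approximate atomic radii in pm. Neither is taken from matminer's data files.
IDEAL_RATIO = {3: 0.155, 4: 0.225, 6: 0.414, 12: 0.902}
RADIUS_PM = {"Cu": 126.0, "Zr": 158.0}


def best_ape(radius_ratio):
    """Signed APE for the cluster size whose ideal ratio is closest."""
    return min((IDEAL_RATIO[n] / radius_ratio - 1 for n in IDEAL_RATIO), key=abs)


def simultaneous_packing_efficiency(fractions):
    """Return (mean APE, mean |APE|) for a mapping of element -> atomic fraction."""
    # The neighbor shell has the alloy composition, so its mean radius is the
    # composition-weighted mean radius.
    mean_radius = sum(x * RADIUS_PM[el] for el, x in fractions.items())
    apes = {el: best_ape(RADIUS_PM[el] / mean_radius) for el in fractions}
    mean_ape = sum(x * apes[el] for el, x in fractions.items())
    mean_abs_ape = sum(x * abs(apes[el]) for el, x in fractions.items())
    return mean_ape, mean_abs_ape


mean_ape, mean_abs_ape = simultaneous_packing_efficiency({"Cu": 0.5, "Zr": 0.5})
```

The second value can never be smaller in magnitude than the first, because positive and negative deviations cancel in the plain mean but not in the mean of absolute values.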

create_cluster_lookup_tool(elements)

Get the compositions of efficiently-packed clusters in a certain system of elements

Args:

elements ([Element]): Elements in system

Return:
(NearNeighbors): Tool to find nearby clusters in this system. None if there are no efficiently-packed clusters for this combination of elements

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)

Main featurizer function, which has to be implemented in any derived featurizer subclass.

Args:

x: input data to featurize (type depends on featurizer).

Returns:

(list) one or more features.

find_ideal_cluster_size(radius_ratio)

Get the optimal cluster size for a certain radius ratio

Finds the number of nearest neighbors n that minimizes |1 - rp(n)/r|, where rp(n) is the ideal radius ratio for a certain n and r is the actual ratio.

Args:

radius_ratio (float): r / r_{neighbor}

Returns:

(int) number of neighboring atoms that will be the most efficiently packed.

(float) Optimal APE
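
The minimization of |1 - rp(n)/r| can be sketched with a small lookup table. The table below holds classic hard-sphere ratios for a handful of coordination numbers and is an assumption made for illustration; the featurizer computes its own ideal ratios. The sign convention used for the returned APE is likewise assumed.

```python
# Illustrative ideal radius ratios for a few coordination numbers
# (classic hard-sphere geometry values; assumed, not the featurizer's table).
IDEAL_RATIO = {3: 0.155, 4: 0.225, 6: 0.414, 12: 0.902}


def find_ideal_cluster_size(radius_ratio, ideal=IDEAL_RATIO):
    """Return (n, ape) for the n that minimizes |1 - rp(n)/r|."""
    best_n = min(ideal, key=lambda n: abs(1 - ideal[n] / radius_ratio))
    # Signed deviation from ideal; the sign convention here is an assumption.
    ape = ideal[best_n] / radius_ratio - 1
    return best_n, ape


# A central atom slightly smaller than its neighbors.
n_best, ape_best = find_ideal_cluster_size(0.9)
```

For a radius ratio of 0.9, the closest entry in this table is the 12-neighbor cluster, and the resulting APE is close to zero.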

get_ideal_radius_ratio(n_neighbors)

Compute the ideal radius ratio between the central atom and its neighboring atoms for a cluster with a certain number of nearest neighbors.

Based on work by Miracle, Lord, and Ranganathan.

Args:

n_neighbors (int): Number of atoms in 1st NN shell

Return:

(float) ideal radius ratio r / r_{neighbor}

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

matminer.featurizers.composition.thermo module

Composition featurizers for thermodynamic properties.

class matminer.featurizers.composition.thermo.CohesiveEnergy(mapi_key=None)

Bases: BaseFeaturizer

Cohesive energy per atom using elemental cohesive energies and formation energy.

Get the cohesive energy per atom of a compound by combining known elemental cohesive energies with the formation energy of the compound.

Parameters:
mapi_key (str): Materials API key for looking up formation energy by composition alone (if you don’t set the formation energy yourself).

__init__(mapi_key=None)
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp, formation_energy_per_atom=None)
Args:

comp (pymatgen.Composition): A composition

formation_energy_per_atom (float): the formation energy per atom of your compound. If not set, will look up the most stable formation energy from the Materials Project database.
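
The relation between elemental cohesive energies and formation energy can be sketched as below. This assumes the usual sign convention, in which cohesive energy is positive for a bound solid and formation energy is negative for a stable compound; the convention, the function name, and the numeric values are all illustrative assumptions rather than details confirmed by this reference.

```python
# Illustrative elemental cohesive energies in eV/atom (assumed values for
# this sketch; the featurizer reads its own tabulated data).
ELEMENTAL_COHESIVE_EV = {"Na": 1.113, "Cl": 1.40}


def cohesive_energy_per_atom(fractions, formation_energy_per_atom):
    """Combine elemental cohesive energies with a formation energy (eV/atom)."""
    elemental = sum(x * ELEMENTAL_COHESIVE_EV[el] for el, x in fractions.items())
    # Assumed sign convention: cohesive energy is positive for a bound solid,
    # and formation energy is negative for a stable compound.
    return elemental - formation_energy_per_atom


# NaCl with an illustrative formation energy of -2.1 eV/atom.
e_coh = cohesive_energy_per_atom({"Na": 0.5, "Cl": 0.5}, -2.1)
```

Supplying formation_energy_per_atom yourself, as in this sketch, avoids the database lookup that the featurizer otherwise performs.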

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

class matminer.featurizers.composition.thermo.CohesiveEnergyMP(mapi_key=None)

Bases: BaseFeaturizer

Cohesive energy per atom lookup using Materials Project

Parameters:
mapi_key (str): Materials API key for looking up cohesive energy by composition alone.

__init__(mapi_key=None)
citations()

Citation(s) and reference(s) for this feature.

Returns:
(list) each element should be a string citation, ideally in BibTeX format.

feature_labels()

Generate attribute names.

Returns:

([str]) attribute labels.

featurize(comp)
Args:

comp: (str) compound composition, e.g., “NaCl”

implementors()

List of implementors of the feature.

Returns:
(list) each element should either be a string with author name (e.g., “Anubhav Jain”) or a dictionary with required key “name” and other keys like “email” or “institution” (e.g., {“name”: “Anubhav Jain”, “email”: “ajain@lbl.gov”, “institution”: “LBNL”}).

Module contents