matminer.utils package

Subpackages

Submodules

matminer.utils.caching module

Provides utility functions for caching the results of expensive operations, such as determining the nearest neighbors of atoms in a structure.

matminer.utils.caching.get_all_nearest_neighbors(method, structure)

Get the nearest neighbor list of a structure

Args:

method (NearNeighbor) - Method used to compute nearest neighbors
structure (IStructure) - Structure to study

Returns:

Output of method.get_all_nn_info(structure)

matminer.utils.caching.get_nearest_neighbors(method, structure, site_idx)

Get the nearest neighbor list of a particular site in a structure

Args:

method (NearNeighbor) - Method used to compute nearest neighbors
structure (Structure) - Structure to study
site_idx (int) - Index of site to study

Returns:

Output of method.get_nn_info(structure, site_idx)
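
The caching idea behind these two functions can be sketched with functools.lru_cache from the standard library. This is a minimal illustration, not matminer's actual implementation: the CountingNeighborFinder class, the cached_* function names, and the toy one-dimensional "structure" (a tuple of coordinates) are all hypothetical stand-ins for a pymatgen NearNeighbor method and Structure.

```python
from functools import lru_cache


class CountingNeighborFinder:
    """Hypothetical stand-in for a NearNeighbor method; counts how often it runs."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.calls = 0

    def get_all_nn_info(self, structure):
        # "structure" is just a tuple of 1-D coordinates here (an assumption
        # made so the sketch stays self-contained and hashable).
        self.calls += 1
        return [
            [j for j, xj in enumerate(structure) if j != i and abs(xi - xj) <= self.cutoff]
            for i, xi in enumerate(structure)
        ]


@lru_cache(maxsize=100)
def cached_all_nearest_neighbors(method, structure):
    """Compute neighbors once per (method, structure) pair and reuse the result."""
    return method.get_all_nn_info(structure)


def cached_nearest_neighbors(method, structure, site_idx):
    """Look up one site's neighbors from the cached full list."""
    return cached_all_nearest_neighbors(method, structure)[site_idx]


finder = CountingNeighborFinder(cutoff=1.5)
chain = (0.0, 1.0, 2.0, 5.0)
first = cached_nearest_neighbors(finder, chain, 1)
second = cached_nearest_neighbors(finder, chain, 2)
```

Both lookups share a single call to get_all_nn_info, which is the point of caching when many sites of the same structure are queried in turn. Note that lru_cache requires hashable arguments.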

matminer.utils.data module

Utility classes for retrieving elemental properties. Provides a uniform interface to several different elemental property resources including pymatgen and Magpie.

class matminer.utils.data.AbstractData

Bases: object

Abstract class for retrieving elemental properties

All subclasses must implement the get_elemental_property operation. It should return a scalar value (ideally a float), or nan if the property does not exist.

get_elemental_properties(elems, property_name)

Get elemental properties for a list of elements

Args:

elems - ([Element]) list of elements
property_name - (str) property to be retrieved

Returns:

[float], properties of elements

abstract get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element
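
The contract described above (one abstract scalar lookup, a list helper built on top of it, and nan for missing values) can be sketched as follows. The class names ElementalDataSource and ToyAtomicNumberData are hypothetical, and elements are represented as plain symbol strings rather than pymatgen Element objects to keep the example self-contained.

```python
import math
from abc import ABC, abstractmethod


class ElementalDataSource(ABC):
    """Sketch of the interface: one abstract scalar lookup plus a list helper."""

    @abstractmethod
    def get_elemental_property(self, elem, property_name):
        """Return a float for one element, or nan if the property is missing."""

    def get_elemental_properties(self, elems, property_name):
        # The list version simply applies the scalar lookup to each element.
        return [self.get_elemental_property(e, property_name) for e in elems]


class ToyAtomicNumberData(ElementalDataSource):
    """Hypothetical subclass; elements are plain symbol strings in this sketch."""

    _table = {"atomic_number": {"H": 1.0, "O": 8.0, "Fe": 26.0}}

    def get_elemental_property(self, elem, property_name):
        # Unknown properties or elements fall back to nan instead of raising.
        return self._table.get(property_name, {}).get(elem, math.nan)


source = ToyAtomicNumberData()
values = source.get_elemental_properties(["H", "O", "Xx"], "atomic_number")
```

Returning nan rather than raising lets downstream code compute statistics over a composition and handle missing entries in one place.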

class matminer.utils.data.CohesiveEnergyData

Bases: AbstractData

Get the cohesive energy of an element.

Data is extracted from KnowledgeDoor Cohesive Energy Handbook online (http://www.knowledgedoor.com/2/elements_handbook/cohesive_energy.html), which in turn got the data from Introduction to Solid State Physics, 8th Edition, by Charles Kittel (ISBN 978-0-471-41526-8), 2005.

__init__()
get_elemental_property(elem, property_name='cohesive energy')
Args:

elem: (Element) Element of interest
property_name (str): unused, always returns cohesive energy

Returns:

(float): cohesive energy of the element

class matminer.utils.data.DemlData

Bases: OxidationStateDependentData, OxidationStatesMixin

Class to get data from Deml data file. See also: A.M. Deml, R. O’Hayre, C. Wolverton, V. Stevanovic, Predicting density functional theory total energies and enthalpies of formation of metal-nonmetal compounds by linear regression, Phys. Rev. B - Condens. Matter Mater. Phys. 93 (2016).

The meanings of each feature in the data can be found in ./data_files/deml_elementdata.py

__init__()
get_charge_dependent_property(element, charge, property_name)

Retrieve an oxidation-state-dependent elemental property

Args:

element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property

Return:

(float) - Value of property

get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

get_oxidation_states(elem)

Retrieve the possible oxidation states of an element

Args:

elem - (Element), Target element

Returns:

[int] - oxidation states

class matminer.utils.data.IUCrBondValenceData(interpolate_soft=True)

Bases: object

Get empirical bond valence parameters.

Data come from International Union of Crystallography 2016 tables. (https://www.iucr.org/resources/data/datasets/bond-valence-parameters) Both the raw source CIF and cleaned csv file are made accessible here. Within the source CIF, there are citations for every set of parameters.

The copyright notice and disclaimer are reproduced below:

#***************************************************************
# COPYRIGHT NOTICE
# This table may be used and distributed without fee for
# non-profit purposes providing
# 1) that this copyright notice is included and
# 2) no fee is charged for the table and
# 3) details of any changes made in this list by anyone other than
#    the copyright owner are suitably noted in the _audit_update record
# Please consult the copyright owner regarding any other uses.
#
# The copyright is owned by I. David Brown, Brockhouse Institute for
# Materials Research, McMaster University, Hamilton, Ontario Canada.
# idbrown@mcmaster.ca
#
#*****************************DISCLAIMER************************
#
# The values reported here are taken from the literature and
# other sources and the author does not warrant their correctness
# nor accept any responsibility for errors. Users are advised to
# consult the primary sources.
#
#***************************************************************

__init__(interpolate_soft=True)

Load bond valence parameters as pandas dataframe.

If interpolate_soft is True, fill in some missing values for anions such as I, Br, N, S, Se, etc. with the assumption that bond valence parameters of such anions don’t depend on cation oxidation state. This assumption comes from Brese and O’Keeffe, (1991), Acta Cryst. B47, 194, which states “with less electronegative anions, … R is not very different for different oxidation states in general.” In the original data source file, only one set of parameters is usually provided for those less electronegative anions in a 9+ oxidation state, indicating they can be used with all oxidation states.

get_bv_params(cation, anion, cat_val, an_val)

Look up bond valence parameters from the IUCr table.

Args:

cation (Element): cation element
anion (Element): anion element
cat_val (Integer): cation formal oxidation state
an_val (Integer): anion formal oxidation state

Returns:

bond_val_list: dataframe of bond valence parameters

interpolate_soft_anions()

Fill in missing parameters for oxidation states of soft anions.

class matminer.utils.data.MEGNetElementData

Bases: AbstractData

Class to get neural network embeddings of elements. These embeddings were generated using the Materials Graph Network (MEGNet) developed by the MaterialsVirtualLab at U.C. San Diego and described in the publication:

Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and Shyue Ping Ong Chemistry of Materials 2019 31 (9), 3564-3572, https://doi.org/10.1021/acs.chemmater.9b01294

The code for MEGNet can be found at: https://github.com/materialsvirtuallab/megnet

The embeddings were generated by training the MEGNet Graph Network on 60,000 structures from the Materials Project for predicting formation energy, and may be an effective way of applying transfer learning to smaller datasets using crystal-graph-based networks.

The representations are learned during training to predict a specific property, though they may be useful for a range of properties.

__init__()
get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

class matminer.utils.data.MagpieData

Bases: AbstractData, OxidationStatesMixin

Class to get data from Magpie files. See also: L. Ward, A. Agrawal, A. Choudhary, C. Wolverton, A general-purpose machine learning framework for predicting properties of inorganic materials, Npj Comput. Mater. 2 (2016) 16028.

Finding the exact meaning of each of these features can be quite difficult. The feature descriptions are reproduced in ./data_files/magpie_elementdata_feature_descriptions.txt.

__init__()
get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

get_oxidation_states(elem)

Retrieve the possible oxidation states of an element

Args:

elem - (Element), Target element

Returns:

[int] - oxidation states

class matminer.utils.data.MatscholarElementData

Bases: AbstractData

Class to get word embedding vectors of elements. These word embeddings were generated using NLP + Neural Network techniques on more than 3 million scientific abstracts.

The data returned by this class are simply learned representations of the elements, taken from:

Tshitoyan, V., Dagdelen, J., Weston, L. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019). https://doi.org/10.1038/s41586-019-1335-8

__init__()
get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

class matminer.utils.data.MixingEnthalpy

Bases: object

Values of \Delta H^{max}_{AB} for different pairs of elements.

Based on the Miedema model. Tabulated by:

A. Takeuchi, A. Inoue, Classification of Bulk Metallic Glasses by Atomic Size Difference, Heat of Mixing and Period of Constituent Elements and Its Application to Characterization of the Main Alloying Element. Mater. Trans. 46, 2817–2829 (2005).

Attributes:
valid_element_list ([Element]): A list of elements for which the mixing enthalpy parameters are defined (although no guarantees are provided that all combinations of this list will be available).

__init__()
get_mixing_enthalpy(elemA, elemB)

Get the mixing enthalpy between different elements

Args:

elemA (Element): An element
elemB (Element): Second element

Returns:

(float) mixing enthalpy, nan if pair is not in a table

class matminer.utils.data.OxidationStateDependentData

Bases: AbstractData

Abstract class that also includes oxidation-state-dependent properties

abstract get_charge_dependent_property(element, charge, property_name)

Retrieve an oxidation-state-dependent elemental property

Args:

element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property

Return:

(float) - Value of property

get_charge_dependent_property_from_specie(specie, property_name)

Retrieve an oxidation-state-dependent elemental property

Args:

specie - (Specie), Specie of interest
property_name - (string), name of property

Return:

(float) - Value of property

class matminer.utils.data.OxidationStatesMixin

Bases: object

Abstract class interface for retrieving the oxidation states of each element

abstract get_oxidation_states(elem)

Retrieve the possible oxidation states of an element

Args:

elem - (Element), Target element

Returns:

[int] - oxidation states

class matminer.utils.data.PymatgenData(use_common_oxi_states=True)

Bases: OxidationStateDependentData, OxidationStatesMixin

Class to get data from pymatgen. See also: S.P. Ong, W.D. Richards, A. Jain, G. Hautier, M. Kocher, S. Cholia, et al., Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis, Comput. Mater. Sci. 68 (2013) 314-319.

Meanings of each feature can be obtained from the pymatgen.Composition documentation (attributes).

__init__(use_common_oxi_states=True)
get_charge_dependent_property(element, charge, property_name)

Retrieve an oxidation-state-dependent elemental property

Args:

element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property

Return:

(float) - Value of property

get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

get_oxidation_states(elem)

Get the oxidation states of an element

Args:

elem - (Element) target element
common - (boolean), whether to return only the common oxidation states, or all known oxidation states

Returns:

[int] list of oxidation states

matminer.utils.flatten_dict module

matminer.utils.flatten_dict.flatten_dict(nested_dict, lead_key=None, unwind_arrays=True)

Helper function to flatten a nested dictionary. It recursively walks through the nested dictionary and builds dot-notation keys, e.g. converts {"a": {"b": 1, "c": 2}} to {"a.b": 1, "a.c": 2}.

Args:

nested_dict ({}): nested dictionary to flatten
unwind_arrays (bool): whether to flatten lists/tuples with numerically indexed dot notation, defaults to True
lead_key (str): string to prepend to all keys, used primarily for recursion

Returns:

non-nested dictionary
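
The recursive walk described above can be sketched in a few lines. This is an illustrative reimplementation under the name flatten_sketch (hypothetical), not matminer's source; in particular, the exact key format used for list items ("a.c.0", "a.c.1") is an assumption based on the phrase "numerically indexed dot notation".

```python
def flatten_sketch(nested, lead_key=None, unwind_arrays=True):
    """Flatten a nested dict into dot-notation keys (illustrative only)."""
    flat = {}
    for key, value in nested.items():
        # Build the dotted key, prefixing the parent key when there is one.
        full_key = f"{lead_key}.{key}" if lead_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_sketch(value, full_key, unwind_arrays))
        elif unwind_arrays and isinstance(value, (list, tuple)):
            # Treat a sequence like a dict keyed by position: "a.0", "a.1", ...
            indexed = {str(i): item for i, item in enumerate(value)}
            flat.update(flatten_sketch(indexed, full_key, unwind_arrays))
        else:
            flat[full_key] = value
    return flat


result = flatten_sketch({"a": {"b": 1, "c": [10, 20]}})
kept = flatten_sketch({"a": {"c": [10, 20]}}, unwind_arrays=False)
```

With unwind_arrays=False the list is stored whole under its dotted key, which is convenient when a column should hold the sequence as a single value.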

matminer.utils.io module

This module defines functions for writing and reading matminer-related objects.

matminer.utils.io.load_dataframe_from_json(filename, pbar=True, decode=True)

Load pandas dataframe from a json file.

Automatically decodes and instantiates pymatgen objects in the dataframe.

Args:
filename (str): Path to json file. Can be a compressed file (gz and bz2 are supported).
pbar (bool): If true, shows an ASCII progress bar for loading data from disk.
decode (bool): If true, will automatically decode objects (slow, convenient). If false, will return json representations of the objects (fast, inconvenient).

Returns:

(Pandas.DataFrame): A pandas dataframe.

matminer.utils.io.store_dataframe_as_json(dataframe, filename, compression=None, orient='split', pbar=True)

Store pandas dataframe as a json file.

Automatically encodes pymatgen objects as dictionaries.

Args:

dataframe (Pandas.Dataframe): A pandas dataframe.
filename (str): Path to json file.
compression (str or None): A compression mode. Valid options are "gz", "bz2", and None. Defaults to None. If the filename does not end with the correct suffix, it will be added automatically.
orient (str): Determines the format in which the dictionary data is stored. This takes the same set of arguments as the orient option in the pandas.DataFrame.to_dict() function. 'split' is recommended as it is relatively space efficient and preserves the dtype of the index.
pbar (bool): If True, shows a progress bar for encoding objects to compatible json format (normally the rate-limiting step).
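
To make the 'split' layout and the automatic suffix handling concrete, here is a standard-library sketch that does not use matminer or pandas. The helper names store_json and load_json are hypothetical, the table contents are arbitrary placeholder values, and only gzip is handled; the real functions additionally encode and decode pymatgen objects.

```python
import gzip
import json
import os
import tempfile

# A tiny table in "split" layout: column labels, index labels, and rows
# are each stored once, instead of repeating column names for every row.
table = {
    "columns": ["x", "y"],
    "index": [0, 1],
    "data": [[1, 2.5], [3, 4.5]],
}


def store_json(obj, filename, compression=None):
    """Write obj as JSON; add a ".gz" suffix when gzip compression is requested."""
    if compression == "gz":
        if not filename.endswith(".gz"):
            filename += ".gz"
        with gzip.open(filename, "wt", encoding="utf-8") as f:
            json.dump(obj, f)
    else:
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(obj, f)
    return filename


def load_json(filename):
    """Read JSON back, choosing the opener from the file suffix."""
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rt", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = store_json(table, os.path.join(tmp, "table.json"), compression="gz")
    suffix_added = path.endswith(".json.gz")
    restored = load_json(path)
```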

matminer.utils.kernels module

matminer.utils.kernels.gaussian_kernel(arr0, arr1, SIGMA)

Returns a Gaussian kernel of the two arrays for use in KRR or other regressions using the kernel trick.

matminer.utils.kernels.laplacian_kernel(arr0, arr1, SIGMA)

Returns a Laplacian kernel of the two arrays for use in KRR or other regressions using the kernel trick.
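
For reference, the common textbook forms of these two kernels can be written in plain Python as below. The function names with the _sketch suffix are hypothetical, and the normalization shown (2 * sigma^2 for the Gaussian kernel, sigma for the Laplacian kernel) is an assumption about the convention; check the matminer source if the exact scaling matters.

```python
import math


def gaussian_kernel_sketch(arr0, arr1, sigma):
    """exp(-||a - b||_2^2 / (2 * sigma^2)) for two equal-length sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(arr0, arr1))
    return math.exp(-sq_dist / (2.0 * sigma**2))


def laplacian_kernel_sketch(arr0, arr1, sigma):
    """exp(-||a - b||_1 / sigma) for two equal-length sequences."""
    l1_dist = sum(abs(a - b) for a, b in zip(arr0, arr1))
    return math.exp(-l1_dist / sigma)


same = gaussian_kernel_sketch([1.0, 2.0], [1.0, 2.0], sigma=1.0)
gauss = gaussian_kernel_sketch([0.0, 0.0], [1.0, 1.0], sigma=1.0)
lap = laplacian_kernel_sketch([0.0, 0.0], [1.0, 1.0], sigma=2.0)
```

Both kernels equal 1 for identical inputs and decay toward 0 as the inputs move apart; the Laplacian kernel uses the L1 distance and so decays more slowly for large differences in a single coordinate.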

matminer.utils.pipeline module

class matminer.utils.pipeline.DropExcluded(excluded)

Bases: BaseEstimator, TransformerMixin

Transformer for removing unwanted columns from a dataframe. Passes back the remaining columns.

Helper class for making sklearn pipelines with matminer.

Args:

excluded (list of labels): A list of column labels to drop from the dataframe

__init__(excluded)
fit(x, y=None)
set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → DropExcluded

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

x : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for x parameter in fit.

Returns

self : object

The updated object.

set_transform_request(*, df: bool | None | str = '$UNCHANGED$') → DropExcluded

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

df : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for df parameter in transform.

Returns

self : object

The updated object.

transform(df)
class matminer.utils.pipeline.ItemSelector(label)

Bases: BaseEstimator, TransformerMixin

A utility for extracting a column from a DataFrame in a sklearn pipeline, for example in a FeatureUnion pipeline to featurize a dataset.

Helper class for making sklearn pipelines with matminer.

See (http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html)

Args:

label : The label of the column to select.

__init__(label)
fit(x, y=None)
set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → ItemSelector

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

x : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for x parameter in fit.

Returns

self : object

The updated object.

set_transform_request(*, dataframe: bool | None | str = '$UNCHANGED$') → ItemSelector

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

dataframe : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for dataframe parameter in transform.

Returns

self : object

The updated object.

transform(dataframe)

matminer.utils.utils module

matminer.utils.utils.homogenize_multiindex(df, default_key, coerce=False)

Homogenizes a dataframe column index to a 2-level multiindex.

Args:

df (pandas DataFrame): A dataframe
default_key (str): The key to use when a single Index must be converted to a 2-level index. This key is then used as a parent of all keys present in the original 1-level index.
coerce (bool): If True, try to force a 2+ level multiindex to a 2-level multiindex.

Returns:

df (pandas DataFrame): A dataframe with a 2-layer multiindex.

Module contents