matminer.utils package
Subpackages
- matminer.utils.data_files package
- matminer.utils.tests package
  - Submodules
    - matminer.utils.tests.test_caching module
    - matminer.utils.tests.test_data module
    - matminer.utils.tests.test_flatten_dict module
    - matminer.utils.tests.test_io module
  - Module contents
Submodules
matminer.utils.caching module
Provides utility functions for caching the results of expensive operations, such as determining the nearest neighbors of atoms in a structure.
- matminer.utils.caching.get_all_nearest_neighbors(method, structure)
Get the nearest-neighbor list of a structure.
- Args:
method (NearNeighbor) - Method used to compute nearest neighbors
structure (IStructure) - Structure to study
- Returns:
Output of method.get_all_nn_info(structure)
- matminer.utils.caching.get_nearest_neighbors(method, structure, site_idx)
Get the nearest-neighbor list of a particular site in a structure.
- Args:
method (NearNeighbor) - Method used to compute nearest neighbors
structure (Structure) - Structure to study
site_idx (int) - Index of site to study
- Returns:
Output of method.get_nn_info(structure, site_idx)
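The caching idea above can be sketched with the standard library's functools.lru_cache. This is a hypothetical, minimal example, not matminer's implementation: the toy neighbor finder find_neighbors and the tuple-of-coordinates "structure" are assumptions made so the snippet runs without pymatgen. What it does illustrate is that cached arguments must be hashable, which is plausibly why get_all_nearest_neighbors is documented as taking an immutable IStructure.

```python
import math
from functools import lru_cache

# Counts how many times the expensive search really runs (for illustration).
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def find_neighbors(coords, cutoff):
    """Hypothetical stand-in for an expensive nearest-neighbor search.

    coords: tuple of (x, y, z) tuples; it must be hashable for lru_cache.
    Returns one tuple of neighbor indices per site.
    """
    CALLS["count"] += 1
    result = []
    for i, a in enumerate(coords):
        near = tuple(
            j for j, b in enumerate(coords) if j != i and math.dist(a, b) <= cutoff
        )
        result.append(near)
    return tuple(result)


structure = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0))
first = find_neighbors(structure, 1.5)
second = find_neighbors(structure, 1.5)  # same arguments: served from the cache
```

Passing an unhashable argument (for example a list of lists) to a function wrapped with lru_cache raises TypeError, so a cache of this kind only works with immutable inputs.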
matminer.utils.data module
Utility classes for retrieving elemental properties. Provides a uniform interface to several different elemental property resources, including pymatgen and Magpie.
- class matminer.utils.data.AbstractData
Bases: object
Abstract class for retrieving elemental properties.
All classes must implement the get_elemental_property operation. These operations should return scalar values (ideally floats), or NaN if a property does not exist.
- get_elemental_properties(elems, property_name)
Get elemental properties for a list of elements.
- Args:
elems - ([Element]) list of elements
property_name - (str) property to be retrieved
- Returns:
[float], properties of elements
- abstract get_elemental_property(elem, property_name)
Get a given elemental property for a given element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
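The contract above can be illustrated with a toy subclass. This is a sketch under stated assumptions: element symbols are plain strings instead of pymatgen Element objects, the class names ElementalData and TableData are hypothetical, and the default list method is assumed to simply call get_elemental_property once per element. The only data used are the atomic numbers of Fe and O.

```python
import math
from abc import ABC, abstractmethod


class ElementalData(ABC):
    """Hypothetical mirror of the AbstractData contract."""

    @abstractmethod
    def get_elemental_property(self, elem, property_name):
        """Return a scalar (ideally a float), or NaN if unavailable."""

    def get_elemental_properties(self, elems, property_name):
        # Default list version: one scalar lookup per element.
        return [self.get_elemental_property(e, property_name) for e in elems]


class TableData(ElementalData):
    """Toy data source backed by an in-memory dict of dicts."""

    def __init__(self, table):
        self.table = table

    def get_elemental_property(self, elem, property_name):
        # Missing elements or properties yield NaN rather than raising.
        return float(self.table.get(elem, {}).get(property_name, math.nan))


data = TableData({"Fe": {"atomic_number": 26}, "O": {"atomic_number": 8}})
values = data.get_elemental_properties(["Fe", "O", "Xx"], "atomic_number")
```

Returning NaN for unknown entries, instead of raising, lets a featurizer process a whole list of elements and decide later how to handle gaps.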
- class matminer.utils.data.CohesiveEnergyData
Bases: AbstractData
Get the cohesive energy of an element.
Data is extracted from KnowledgeDoor Cohesive Energy Handbook online (http://www.knowledgedoor.com/2/elements_handbook/cohesive_energy.html), which in turn got the data from Introduction to Solid State Physics, 8th Edition, by Charles Kittel (ISBN 978-0-471-41526-8), 2005.
- __init__()
- get_elemental_property(elem, property_name='cohesive energy')
- Args:
elem (Element): Element of interest
property_name (str): Unused; the cohesive energy is always returned
- Returns:
(float): cohesive energy of the element
- class matminer.utils.data.DemlData
Bases: OxidationStateDependentData, OxidationStatesMixin
Class to get data from Deml data file. See also: A.M. Deml, R. O’Hayre, C. Wolverton, V. Stevanovic, Predicting density functional theory total energies and enthalpies of formation of metal-nonmetal compounds by linear regression, Phys. Rev. B - Condens. Matter Mater. Phys. 93 (2016).
The meaning of each feature in the data can be found in ./data_files/deml_elementdata.py.
- __init__()
- get_charge_dependent_property(element, charge, property_name)
Retrieve an oxidation-state-dependent elemental property.
- Args:
element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property
- Return:
(float) - Value of property
- get_elemental_property(elem, property_name)
Get a given elemental property for a given element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- get_oxidation_states(elem)
Retrieve the possible oxidation states of an element.
- Args:
elem - (Element), Target element
- Returns:
[int] - oxidation states
- class matminer.utils.data.IUCrBondValenceData(interpolate_soft=True)
Bases: object
Get empirical bond valence parameters.
Data come from International Union of Crystallography 2016 tables. (https://www.iucr.org/resources/data/datasets/bond-valence-parameters) Both the raw source CIF and cleaned csv file are made accessible here. Within the source CIF, there are citations for every set of parameters.
The copyright notice and disclaimer are reproduced below:
#***************************************************************
# COPYRIGHT NOTICE
# This table may be used and distributed without fee for
# non-profit purposes providing
# 1) that this copyright notice is included and
# 2) no fee is charged for the table and
# 3) details of any changes made in this list by anyone other than
# the copyright owner are suitably noted in the _audit_update record
# Please consult the copyright owner regarding any other uses.
#
# The copyright is owned by I. David Brown, Brockhouse Institute for
# Materials Research, McMaster University, Hamilton, Ontario Canada.
# idbrown@mcmaster.ca
#
#*****************************DISCLAIMER************************
#
# The values reported here are taken from the literature and
# other sources and the author does not warrant their correctness
# nor accept any responsibility for errors. Users are advised to
# consult the primary sources.
#
#***************************************************************
- __init__(interpolate_soft=True)
Load bond valence parameters as a pandas DataFrame.
If interpolate_soft is True, fill in some missing values for anions such as I, Br, N, S, Se, etc. with the assumption that bond valence parameters of such anions don’t depend on cation oxidation state. This assumption comes from Brese and O’Keeffe, (1991), Acta Cryst. B47, 194, which states “with less electronegative anions, … R is not very different for different oxidation states in general.” In the original data source file, only one set of parameters is usually provided for those less electronegative anions in a 9+ oxidation state, indicating they can be used with all oxidation states.
- get_bv_params(cation, anion, cat_val, an_val)
Look up bond valence parameters from the IUCr table.
- Args:
cation (Element): cation element
anion (Element): anion element
cat_val (Integer): cation formal oxidation state
an_val (Integer): anion formal oxidation state
- Returns:
bond_val_list: dataframe of bond valence parameters
- interpolate_soft_anions()
Fill in missing parameters for oxidation states of soft anions.
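Bond valence parameters such as those returned by get_bv_params are conventionally used in Brown's bond valence relation s = exp((R0 - R) / b), where R is the observed bond length and b is commonly 0.37 angstroms. The sketch below assumes that standard form; it does not read matminer's DataFrame (whose column names are not documented here), and the R0 value is an illustrative number, not a tabulated one.

```python
import math


def bond_valence(r0, r, b=0.37):
    """Valence carried by one bond: exp((r0 - r) / b).

    r0, b: bond valence parameters in angstroms (b = 0.37 is the common default).
    r: observed bond length in angstroms.
    """
    return math.exp((r0 - r) / b)


def bond_valence_sum(r0, lengths, b=0.37):
    """Sum of bond valences around one cation, using the same r0 and b for every bond."""
    return sum(bond_valence(r0, r, b) for r in lengths)


# Illustrative r0 only: a bond exactly at r0 carries one valence unit.
r0 = 1.8
single = bond_valence(r0, r0)

# Six equal bonds stretched so that each carries 1/6 of a valence unit.
stretched = r0 + 0.37 * math.log(6)
total = bond_valence_sum(r0, [stretched] * 6)
```

Comparing such a bond valence sum with a cation's formal oxidation state is the usual sanity check for whether a set of parameters and bond lengths is consistent.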
- class matminer.utils.data.MEGNetElementData
Bases: AbstractData
Class to get neural network embeddings of elements. These embeddings were generated using the Materials Graph Network (MEGNet) developed by the Materials Virtual Lab at U.C. San Diego and described in the publication:
Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and Shyue Ping Ong. Chemistry of Materials 2019, 31 (9), 3564-3572. https://doi.org/10.1021/acs.chemmater.9b01294
The code for MEGNet can be found at: https://github.com/materialsvirtuallab/megnet
The embeddings were generated by training the MEGNet Graph Network on 60,000 structures from the Materials Project for predicting formation energy, and may be an effective way of applying transfer learning to smaller datasets using crystal-graph-based networks.
The representations are learned during training to predict a specific property, though they may be useful for a range of properties.
- __init__()
- get_elemental_property(elem, property_name)
Get a given elemental property for a given element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- class matminer.utils.data.MagpieData
Bases: AbstractData, OxidationStatesMixin
Class to get data from Magpie files. See also: L. Ward, A. Agrawal, A. Choudhary, C. Wolverton, A general-purpose machine learning framework for predicting properties of inorganic materials, Npj Comput. Mater. 2 (2016) 16028.
The exact meaning of each feature can be difficult to find; descriptions are reproduced in ./data_files/magpie_elementdata_feature_descriptions.txt.
- __init__()
- get_elemental_property(elem, property_name)
Get a given elemental property for a given element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- get_oxidation_states(elem)
Retrieve the possible oxidation states of an element.
- Args:
elem - (Element), Target element
- Returns:
[int] - oxidation states
- class matminer.utils.data.MatscholarElementData
Bases: AbstractData
Class to get word embedding vectors of elements. These word embeddings were generated by applying natural language processing (NLP) and neural network techniques to more than 3 million scientific abstracts.
The data returned by this class are simply learned representations of the elements, taken from:
Tshitoyan, V., Dagdelen, J., Weston, L. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019). https://doi.org/10.1038/s41586-019-1335-8
- __init__()
- get_elemental_property(elem, property_name)
Get a given elemental property for a given element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- class matminer.utils.data.MixingEnthalpy
Bases: object
Values of $\Delta H^{max}_{AB}$ for different pairs of elements.
- Based on the Miedema model. Tabulated by:
A. Takeuchi, A. Inoue, Classification of Bulk Metallic Glasses by Atomic Size Difference, Heat of Mixing and Period of Constituent Elements and Its Application to Characterization of the Main Alloying Element. Mater. Trans. 46, 2817–2829 (2005).
- Attributes:
- valid_element_list ([Element]): A list of elements for which the mixing enthalpy parameters are defined (although no guarantees are provided that all combinations of this list will be available).
- __init__()
- get_mixing_enthalpy(elemA, elemB)
Get the mixing enthalpy between two elements.
- Args:
elemA (Element): First element
elemB (Element): Second element
- Returns:
(float) mixing enthalpy, or NaN if the pair is not in the table
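Pairwise values like these are commonly combined for multicomponent alloys with the regular-solution form used by Takeuchi and Inoue, dH_mix = sum over pairs i<j of 4 * dH_ij * c_i * c_j. The sketch below assumes that form; the elements "A", "B", "C" and their pair values are placeholders, not entries from the table, and the two functions are hypothetical helpers rather than matminer's API.

```python
import math
from itertools import combinations

# Placeholder pair values in kJ/mol; NOT taken from the Takeuchi-Inoue table.
PAIRS = {frozenset(("A", "B")): -10.0, frozenset(("A", "C")): 4.0}


def get_mixing_enthalpy(elem_a, elem_b):
    """Order-independent pair lookup; NaN when the pair is not tabulated."""
    return PAIRS.get(frozenset((elem_a, elem_b)), math.nan)


def regular_solution_enthalpy(composition):
    """Sum of 4 * dH_ij * c_i * c_j over all distinct element pairs.

    composition: dict mapping element -> atomic fraction (fractions sum to 1).
    Any missing pair propagates NaN into the result.
    """
    total = 0.0
    for a, b in combinations(sorted(composition), 2):
        total += 4.0 * get_mixing_enthalpy(a, b) * composition[a] * composition[b]
    return total


binary = regular_solution_enthalpy({"A": 0.5, "B": 0.5})
ternary = regular_solution_enthalpy({"A": 0.5, "B": 0.25, "C": 0.25})  # B-C missing
```

Keying the table by frozenset makes the lookup symmetric, so get_mixing_enthalpy("B", "A") returns the same value as get_mixing_enthalpy("A", "B"), and NaN propagation mirrors the documented behavior for untabulated pairs.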
- class matminer.utils.data.OxidationStateDependentData
Bases: AbstractData
Abstract class that also includes oxidation-state-dependent properties.
- abstract get_charge_dependent_property(element, charge, property_name)
Retrieve an oxidation-state-dependent elemental property.
- Args:
element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property
- Return:
(float) - Value of property
- get_charge_dependent_property_from_specie(specie, property_name)
Retrieve an oxidation-state-dependent elemental property.
- Args:
specie - (Specie), Specie of interest
property_name - (string), name of property
- Return:
(float) - Value of property
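The relationship between the two lookups can be sketched as follows, assuming that get_charge_dependent_property_from_specie splits the specie into its element and oxidation state and delegates to get_charge_dependent_property. The Specie namedtuple is a hypothetical stand-in for pymatgen's Species (which exposes element and oxi_state attributes), and ChargeTable with its values is a placeholder data source.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a pymatgen Species: element symbol + oxidation state.
Specie = namedtuple("Specie", ["element", "oxi_state"])


class ChargeTable:
    """Toy oxidation-state-dependent data source; all values are placeholders."""

    def __init__(self, table):
        # table maps (element, charge) -> {property_name: value}
        self.table = table

    def get_charge_dependent_property(self, element, charge, property_name):
        # NaN when the (element, charge) pair or the property is missing.
        return self.table.get((element, charge), {}).get(property_name, math.nan)

    def get_charge_dependent_property_from_specie(self, specie, property_name):
        # Split the specie into element and charge, then reuse the main lookup.
        return self.get_charge_dependent_property(
            specie.element, specie.oxi_state, property_name
        )


data = ChargeTable(
    {("Fe", 2): {"some_property": 1.0}, ("Fe", 3): {"some_property": 2.0}}
)
fe3 = data.get_charge_dependent_property_from_specie(Specie("Fe", 3), "some_property")
fe4 = data.get_charge_dependent_property_from_specie(Specie("Fe", 4), "some_property")
```

With this design a concrete subclass only has to implement the (element, charge) lookup; the specie-based convenience method comes for free.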
- class matminer.utils.data.OxidationStatesMixin
Bases: object
Abstract class interface for retrieving the oxidation states of each element.
- abstract get_oxidation_states(elem)
Retrieve the possible oxidation states of an element.
- Args:
elem - (Element), Target element
- Returns:
[int] - oxidation states
- class matminer.utils.data.PymatgenData(use_common_oxi_states=True)
Bases: OxidationStateDependentData, OxidationStatesMixin
Class to get data from pymatgen. See also: S.P. Ong, W.D. Richards, A. Jain, G. Hautier, M. Kocher, S. Cholia, et al., Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis, Comput. Mater. Sci. 68 (2013) 314-319.
The meaning of each feature can be found in the documentation of the attributes of the pymatgen Element class.
- __init__(use_common_oxi_states=True)
- get_charge_dependent_property(element, charge, property_name)
Retrieve an oxidation-state-dependent elemental property.
- Args:
element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property
- Return:
(float) - Value of property
- get_elemental_property(elem, property_name)
Get a given elemental property for a given element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- get_oxidation_states(elem)
Get the oxidation states of an element.
- Args:
elem - (Element) target element
Whether only the common oxidation states or all known oxidation states are returned is controlled by use_common_oxi_states, which is set in __init__.
- Returns:
[int] list of oxidation states
matminer.utils.flatten_dict module
- matminer.utils.flatten_dict.flatten_dict(nested_dict, lead_key=None, unwind_arrays=True)
Helper function to flatten a nested dictionary. It recursively walks through the nested dictionary to build dot-notation keys, e.g. it converts {"a": {"b": 1, "c": 2}} to {"a.b": 1, "a.c": 2}.
- Args:
nested_dict ({}): nested dictionary to flatten
unwind_arrays (bool): whether to flatten lists/tuples with numerically indexed dot notation; defaults to True
lead_key (str): string to prepend to all keys, used primarily for recursion
- Returns:
non-nested dictionary
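A minimal re-implementation of the documented behavior is sketched below. It follows the description above (dot-joined keys, lists and tuples unwound by position when unwind_arrays is True), but edge cases such as empty containers may be handled differently by matminer's own function.

```python
def flatten_dict(nested_dict, lead_key=None, unwind_arrays=True):
    """Flatten nested dicts (and optionally lists/tuples) into dot-notation keys."""
    flat = {}
    for key, value in nested_dict.items():
        # Prefix the key with its parent's key, if there is one.
        full_key = f"{lead_key}.{key}" if lead_key is not None else str(key)
        if isinstance(value, dict):
            flat.update(flatten_dict(value, full_key, unwind_arrays))
        elif unwind_arrays and isinstance(value, (list, tuple)):
            # Treat a sequence as a dict keyed by position: "key.0", "key.1", ...
            as_dict = {str(i): item for i, item in enumerate(value)}
            flat.update(flatten_dict(as_dict, full_key, unwind_arrays))
        else:
            flat[full_key] = value
    return flat


simple = flatten_dict({"a": {"b": 1, "c": 2}})
mixed = flatten_dict({"x": [1, {"y": 2}]})
kept = flatten_dict({"x": [1, 2]}, unwind_arrays=False)
```

Here simple reproduces the example from the docstring, mixed shows a list containing a dict being unwound to "x.0" and "x.1.y", and kept leaves the list intact because unwind_arrays is False.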
matminer.utils.io module
This module defines functions for writing and reading matminer-related objects.
- matminer.utils.io.load_dataframe_from_json(filename, pbar=True, decode=True)
Load a pandas DataFrame from a JSON file.
Automatically decodes and instantiates pymatgen objects in the dataframe.
- Args:
filename (str): Path to the JSON file. Compressed files (gz and bz2) are supported.
pbar (bool): If True, shows an ASCII progress bar while loading data from disk.
decode (bool): If True, automatically decodes objects (slow, convenient). If False, returns JSON representations of the objects (fast, inconvenient).
- Returns:
(pandas.DataFrame): A pandas DataFrame.
- matminer.utils.io.store_dataframe_as_json(dataframe, filename, compression=None, orient='split', pbar=True)
Store a pandas DataFrame as a JSON file.
Automatically encodes pymatgen objects as dictionaries.
- Args:
dataframe (pandas.DataFrame): A pandas DataFrame.
filename (str): Path to the JSON file.
compression (str or None): A compression mode. Valid options are "gz", "bz2", and None. Defaults to None. If the filename does not end with the correct suffix, the suffix is added automatically.
orient (str): Determines the format in which the dictionary data is stored. This takes the same set of arguments as the orient option of the pandas.DataFrame.to_dict() function. 'split' is recommended, as it is relatively space-efficient and preserves the dtype of the index.
pbar (bool): If True, shows a progress bar while encoding objects to a JSON-compatible format (normally the rate-limiting step).
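The file-level behavior described above (the 'split' layout plus optional gz/bz2 compression with automatic suffix handling) can be sketched with the standard library alone. The functions store_split_json and load_split_json are hypothetical illustrations: they do not encode pymatgen objects and are not matminer's implementation. The "columns"/"index"/"data" keys match the layout pandas uses for orient='split'.

```python
import bz2
import gzip
import json
import os
import tempfile

# Map a compression suffix to the function that opens such a file.
OPENERS = {"gz": gzip.open, "bz2": bz2.open}


def store_split_json(columns, index, data, filename, compression=None):
    """Write table parts in a 'split' layout, optionally compressed."""
    if compression not in (None, "gz", "bz2"):
        raise ValueError("compression must be 'gz', 'bz2' or None")
    if compression and not filename.endswith("." + compression):
        filename += "." + compression  # append the missing suffix
    payload = {"columns": columns, "index": index, "data": data}
    opener = OPENERS.get(compression, open)
    with opener(filename, "wt") as f:
        json.dump(payload, f)
    return filename


def load_split_json(filename):
    """Read the file back, choosing the decompressor from the suffix."""
    suffix = filename.rsplit(".", 1)[-1]
    opener = OPENERS.get(suffix, open)
    with opener(filename, "rt") as f:
        return json.load(f)


out_dir = tempfile.mkdtemp()
path = store_split_json(["a"], [0, 1], [[1], [2]],
                        os.path.join(out_dir, "table.json"), compression="gz")
restored = load_split_json(path)
```

Storing the index separately from the data is what lets the 'split' layout restore the index on load, which is why it is the recommended orient.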
matminer.utils.kernels module
- matminer.utils.kernels.gaussian_kernel(arr0, arr1, SIGMA)
Returns a Gaussian kernel of the two arrays for use in kernel ridge regression (KRR) or other regressions using the kernel trick.
- matminer.utils.kernels.laplacian_kernel(arr0, arr1, SIGMA)
Returns a Laplacian kernel of the two arrays for use in KRR or other regressions using the kernel trick.
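The two kernels are sketched below in their standard textbook forms, which are assumed (not confirmed by this page) to be what the functions above compute: Gaussian k(x, y) = exp(-||x - y||_2^2 / (2 sigma^2)) and Laplacian k(x, y) = exp(-||x - y||_1 / sigma). The names gaussian, laplacian and gram_matrix are hypothetical; gram_matrix shows how a kernel is turned into the pairwise matrix that KRR works with.

```python
import math


def _check(x, y):
    # Both feature vectors must have the same dimension.
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")


def gaussian(x, y, sigma):
    """exp(-||x - y||_2^2 / (2 * sigma^2)) for two equal-length sequences."""
    _check(x, y)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def laplacian(x, y, sigma):
    """exp(-||x - y||_1 / sigma) for two equal-length sequences."""
    _check(x, y)
    l1_dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-l1_dist / sigma)


def gram_matrix(rows, kernel, sigma):
    """Pairwise kernel matrix K[i][j] = kernel(rows[i], rows[j], sigma)."""
    return [[kernel(a, b, sigma) for b in rows] for a in rows]


K = gram_matrix([[0.0, 0.0], [1.0, 1.0]], gaussian, 1.0)
```

Both kernels equal 1 for identical inputs and decay with distance; sigma sets the length scale, and the Laplacian's L1 distance makes it less smooth than the Gaussian.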
matminer.utils.pipeline module
- class matminer.utils.pipeline.DropExcluded(excluded)
Bases: BaseEstimator, TransformerMixin
Transformer for removing unwanted columns from a dataframe. Passes back the remaining columns.
Helper class for making sklearn pipelines with matminer.
- Args:
excluded (list of labels): A list of column labels to drop from the dataframe
- __init__(excluded)
- fit(x, y=None)
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → DropExcluded
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- x : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the x parameter in fit.
Returns
- self : object
The updated object.
- set_transform_request(*, df: bool | None | str = '$UNCHANGED$') → DropExcluded
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- df : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the df parameter in transform.
Returns
- self : object
The updated object.
- transform(df)
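In a pandas and scikit-learn setting, DropExcluded's transform step is presumably equivalent to df.drop(columns=excluded). To keep the illustration dependency-free, the sketch below uses a hypothetical DropColumns class that operates on a dict mapping column label to list of values, standing in for a DataFrame, while following the same fit/transform convention.

```python
class DropColumns:
    """Hypothetical stdlib analogue of DropExcluded.

    Works on a dict mapping column label -> list of values instead of a
    pandas DataFrame, and follows the scikit-learn fit/transform convention.
    """

    def __init__(self, excluded):
        self.excluded = excluded

    def fit(self, x, y=None):
        # Nothing to learn; returning self allows fit(...).transform(...) chaining.
        return self

    def transform(self, df):
        # Keep every column whose label is not in the excluded list.
        return {label: col for label, col in df.items() if label not in self.excluded}


table = {"formula": ["NaCl", "MgO"], "feature": [1.0, 2.0], "id": [1, 2]}
features = DropColumns(["formula", "id"]).fit(table).transform(table)
```

In a real workflow, the matminer transformer would be placed as a step in an sklearn.pipeline.Pipeline ahead of the estimator, so identifier and text columns never reach the model.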
- class matminer.utils.pipeline.ItemSelector(label)
Bases: BaseEstimator, TransformerMixin
A utility for extracting a column from a DataFrame in a sklearn pipeline, for example in a FeatureUnion pipeline to featurize a dataset.
Helper class for making sklearn pipelines with matminer.
See (http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html)
- Args:
label : The label of the column to select.
- __init__(label)
- fit(x, y=None)
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → ItemSelector
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- x : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the x parameter in fit.
Returns
- self : object
The updated object.
- set_transform_request(*, dataframe: bool | None | str = '$UNCHANGED$') → ItemSelector
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Parameters
- dataframe : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the dataframe parameter in transform.
Returns
- self : object
The updated object.
- transform(dataframe)
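The FeatureUnion use case described for ItemSelector can be sketched without dependencies. SelectColumn is a hypothetical analogue of ItemSelector operating on a dict of lists in place of a DataFrame, and union is a hypothetical helper that mimics how FeatureUnion places the outputs of several transformers side by side (the real FeatureUnion concatenates 2-D arrays).

```python
class SelectColumn:
    """Hypothetical stdlib analogue of ItemSelector: returns one column."""

    def __init__(self, label):
        self.label = label

    def fit(self, x, y=None):
        # Stateless selector: nothing to learn.
        return self

    def transform(self, dataframe):
        return dataframe[self.label]


def union(transformers, data):
    """Mimic FeatureUnion: combine each transformer's output into per-row features."""
    outputs = [t.fit(data).transform(data) for t in transformers]
    return [list(row) for row in zip(*outputs)]


table = {"a": [1, 2, 3], "b": [10, 20, 30], "c": ["x", "y", "z"]}
rows = union([SelectColumn("a"), SelectColumn("b")], table)
```

Each selector routes a single column into its own branch, which is what allows different featurizers to be applied to different columns before their results are joined.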
matminer.utils.utils module
- matminer.utils.utils.homogenize_multiindex(df, default_key, coerce=False)
Homogenizes a dataframe column index to a 2-level MultiIndex.
- Args:
df (pandas DataFrame): A dataframe
default_key (str): The key to use when a single Index must be converted to a 2-level index. This key is then used as the parent of all keys present in the original 1-level index.
coerce (bool): If True, try to force a 2+ level MultiIndex to a 2-level MultiIndex.
- Returns:
df (pandas DataFrame): A dataframe with a 2-level MultiIndex.
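The 1-level to 2-level conversion can be sketched on plain column labels, with tuples playing the role of MultiIndex entries. homogenize_columns is a hypothetical helper, the default_key "Input Data" is an arbitrary example, and the coerce option is deliberately not modeled: this sketch simply rejects labels deeper than two levels.

```python
def homogenize_columns(columns, default_key):
    """Return 2-level column labels as (parent, child) tuples.

    columns: list of labels; plain labels mean a 1-level index, tuples a MultiIndex.
    """
    if all(isinstance(c, tuple) for c in columns):
        depths = {len(c) for c in columns}
        if depths == {2}:
            return list(columns)  # already 2-level: leave unchanged
        raise ValueError("only 1- or 2-level labels are handled in this sketch")
    # 1-level index: put every label under the same parent key.
    return [(default_key, c) for c in columns]


wrapped = homogenize_columns(["a", "b"], "Input Data")
unchanged = homogenize_columns([("x", "a"), ("y", "b")], "Input Data")
```

With pandas, the 1-level case corresponds to building the new columns with pd.MultiIndex.from_product([[default_key], df.columns]), which yields one (default_key, label) pair per original column.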