matminer.utils package

Subpackages

Submodules

matminer.utils.caching module

Provides utility functions for caching the results of expensive operations, such as determining the nearest neighbors of atoms in a structure

matminer.utils.caching.get_all_nearest_neighbors(method, structure)

Get the nearest neighbor list of a structure

Args:

method (NearNeighbor) - Method used to compute nearest neighbors
structure (IStructure) - Structure to study

Returns:

Output of method.get_all_nn_info(structure)

matminer.utils.caching.get_nearest_neighbors(method, structure, site_idx)

Get the nearest neighbor list of a particular site in a structure

Args:

method (NearNeighbor) - Method used to compute nearest neighbors
structure (Structure) - Structure to study
site_idx (int) - Index of site to study

Returns:

Output of method.get_nn_info(structure, site_idx)
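As a rough illustration of the caching idea behind this module, the sketch below memoizes a whole-structure neighbor computation so that per-site lookups reuse it. CountingMethod, cached_all_neighbors, and cached_site_neighbors are hypothetical names written for this example; this is not matminer's implementation, and the approach assumes that both the method and the structure are hashable.

```python
from functools import lru_cache


class CountingMethod:
    """Hypothetical stand-in for a NearNeighbor method; counts expensive calls."""

    def __init__(self):
        self.calls = 0

    def get_all_nn_info(self, structure):
        self.calls += 1
        # Toy rule: every site is a neighbor of every other site.
        n = len(structure)
        return [[j for j in range(n) if j != i] for i in range(n)]


@lru_cache(maxsize=128)
def cached_all_neighbors(method, structure):
    """Compute the full neighbor list once per (method, structure) pair."""
    return method.get_all_nn_info(structure)


def cached_site_neighbors(method, structure, site_idx):
    """Look up one site's neighbors from the cached full list."""
    return cached_all_neighbors(method, structure)[site_idx]


method = CountingMethod()
structure = ("Na", "Cl", "Na", "Cl")  # a hashable tuple stands in for a structure

first = cached_site_neighbors(method, structure, 0)
second = cached_site_neighbors(method, structure, 1)
print(first, second, method.calls)
```

The second lookup does not trigger another call to get_all_nn_info, which is the point of caching when many sites of the same structure are featurized.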

matminer.utils.data module

Utility classes for retrieving elemental properties. Provides a uniform interface to several different elemental property resources including pymatgen and Magpie.

class matminer.utils.data.AbstractData

Bases: object

Abstract class for retrieving elemental properties

All subclasses must implement get_elemental_property. It should return a scalar value (ideally a float), or nan if the property does not exist.

get_elemental_properties(elems, property_name)

Get elemental properties for a list of elements

Args:

elems - ([Element]) list of elements
property_name - (str) property to be retrieved

Returns:

[float], properties of elements

abstract get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element
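The relationship between the two AbstractData methods can be sketched as follows. ElementDataSketch and ToyData are hypothetical classes written for this example (they do not import or reproduce matminer code), and elements are represented by plain symbol strings rather than pymatgen Element objects.

```python
import math
from abc import ABC, abstractmethod


class ElementDataSketch(ABC):
    """Hypothetical mirror of the interface described above."""

    @abstractmethod
    def get_elemental_property(self, elem, property_name):
        """Return a scalar for one element, or nan if unavailable."""

    def get_elemental_properties(self, elems, property_name):
        # The list version simply applies the scalar lookup to each element.
        return [self.get_elemental_property(e, property_name) for e in elems]


class ToyData(ElementDataSketch):
    """Toy table keyed by element symbol (atomic numbers of H and O)."""

    _table = {"Number": {"H": 1.0, "O": 8.0}}

    def get_elemental_property(self, elem, property_name):
        # Unknown elements or properties fall back to nan.
        return self._table.get(property_name, {}).get(elem, math.nan)


values = ToyData().get_elemental_properties(["H", "O", "Xx"], "Number")
print(values)
```

A subclass only has to supply the scalar lookup; the list-valued method then works unchanged.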

class matminer.utils.data.CohesiveEnergyData(impute_nan=False)

Bases: AbstractData

Get the cohesive energy of an element.

Data is extracted from KnowledgeDoor Cohesive Energy Handbook online (http://www.knowledgedoor.com/2/elements_handbook/cohesive_energy.html), which in turn got the data from Introduction to Solid State Physics, 8th Edition, by Charles Kittel (ISBN 978-0-471-41526-8), 2005.

Args:
impute_nan (bool): if True, the features for the elements that are missing from the data_source or are NaNs are replaced by the average of each feature over the available elements.

__init__(impute_nan=False)
get_elemental_property(elem, property_name='cohesive energy')
Args:

elem (Element): Element of interest
property_name (str): unused, always returns cohesive energy

Returns:

(float): cohesive energy of the element
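The impute_nan option shared by the data classes in this module amounts to mean imputation. A minimal sketch of that idea, assuming a single feature stored as a dictionary keyed by element label, is given below; impute_with_mean is a hypothetical helper and the numbers are illustrative, not taken from any matminer data file.

```python
import math


def impute_with_mean(values):
    """Replace nan entries by the mean of the non-nan entries."""
    known = [v for v in values.values() if not math.isnan(v)]
    mean = sum(known) / len(known)
    return {k: (mean if math.isnan(v) else v) for k, v in values.items()}


# Illustrative numbers only.
feature = {"A": 1.0, "B": 3.0, "C": math.nan}
imputed = impute_with_mean(feature)
print(imputed)
```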

class matminer.utils.data.DemlData(impute_nan=False)

Bases: OxidationStateDependentData, OxidationStatesMixin

Class to get data from Deml data file. See also: A.M. Deml, R. O’Hayre, C. Wolverton, V. Stevanovic, Predicting density functional theory total energies and enthalpies of formation of metal-nonmetal compounds by linear regression, Phys. Rev. B - Condens. Matter Mater. Phys. 93 (2016).

The meaning of each feature in the data can be found in ./data_files/deml_elementdata.py

Args:
impute_nan (bool): if True, the features for the elements that are missing from the data_source or are NaNs are replaced by the average of each feature over the available elements.

__init__(impute_nan=False)
get_charge_dependent_property(element, charge, property_name)

Retrieve an oxidation-state-dependent elemental property

Args:

element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property

Return:

(float) - Value of property

get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

get_oxidation_states(elem)

Retrieve the possible oxidation states of an element

Args:

elem - (Element), Target element

Returns:

[int] - oxidation states

class matminer.utils.data.IUCrBondValenceData(interpolate_soft=True)

Bases: object

Get empirical bond valence parameters.

Data come from International Union of Crystallography 2016 tables. (https://www.iucr.org/resources/data/datasets/bond-valence-parameters) Both the raw source CIF and cleaned csv file are made accessible here. Within the source CIF, there are citations for every set of parameters.

The copyright notice and disclaimer are reproduced below:

#***************************************************************
# COPYRIGHT NOTICE
# This table may be used and distributed without fee for
# non-profit purposes providing
# 1) that this copyright notice is included and
# 2) no fee is charged for the table and
# 3) details of any changes made in this list by anyone other than
# the copyright owner are suitably noted in the _audit_update record
# Please consult the copyright owner regarding any other uses.
#
# The copyright is owned by I. David Brown, Brockhouse Institute for
# Materials Research, McMaster University, Hamilton, Ontario Canada.
# idbrown@mcmaster.ca
#
#*****************************DISCLAIMER************************
#
# The values reported here are taken from the literature and
# other sources and the author does not warrant their correctness
# nor accept any responsibility for errors. Users are advised to
# consult the primary sources.
#
#***************************************************************

__init__(interpolate_soft=True)

Load bond valence parameters as pandas dataframe.

If interpolate_soft is True, fill in some missing values for anions such as I, Br, N, S, Se, etc. with the assumption that bond valence parameters of such anions don’t depend on cation oxidation state. This assumption comes from Brese and O’Keeffe, (1991), Acta Cryst. B47, 194, which states “with less electronegative anions, … R is not very different for different oxidation states in general.” In the original data source file, only one set of parameters is usually provided for those less electronegative anions in a 9+ oxidation state, indicating they can be used with all oxidation states.

get_bv_params(cation, anion, cat_val, an_val)

Look up bond valence parameters from the IUCr table.

Args:

cation (Element): cation element
anion (Element): anion element
cat_val (Integer): cation formal oxidation state
an_val (Integer): anion formal oxidation state

Returns:

bond_val_list: dataframe of bond valence parameters

interpolate_soft_anions()

Fill in missing parameters for oxidation states of soft anions.
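Bond valence parameters of this kind are commonly used in the expression s = exp((R0 - R) / B), where R is the observed bond length. The sketch below evaluates that expression only; bond_valence is a hypothetical helper, the numeric parameters are illustrative rather than values read from the table, and the column names of the dataframe returned by get_bv_params are not assumed here.

```python
import math


def bond_valence(bond_length, r0, b):
    """Bond valence s = exp((r0 - bond_length) / b); lengths in angstroms."""
    return math.exp((r0 - bond_length) / b)


# Illustrative parameters, not values read from the table.
r0, b = 1.60, 0.37
s_at_r0 = bond_valence(1.60, r0, b)   # bond length equal to r0
s_longer = bond_valence(1.80, r0, b)  # a longer bond carries less valence
print(s_at_r0, s_longer)
```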

class matminer.utils.data.MEGNetElementData(impute_nan=False)

Bases: AbstractData

Class to get neural network embeddings of elements. These embeddings were generated using the Materials Graph Network (MEGNet) developed by the MaterialsVirtualLab at U.C. San Diego and described in the publication:

Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and Shyue Ping Ong Chemistry of Materials 2019 31 (9), 3564-3572, https://doi.org/10.1021/acs.chemmater.9b01294

The code for MEGNet can be found at: https://github.com/materialsvirtuallab/megnet

The embeddings were generated by training the MEGNet Graph Network on 60,000 structures from the Materials Project for predicting formation energy, and may be an effective way of applying transfer learning to smaller datasets using crystal-graph-based networks.

The representations are learned during training to predict a specific property, though they may be useful for a range of properties.

Args:
impute_nan (bool): if True, the features for the elements that are missing from the data_source or are NaNs are replaced by the average of each feature over the available elements.

__init__(impute_nan=False)
get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

class matminer.utils.data.MagpieData(impute_nan=False)

Bases: AbstractData, OxidationStatesMixin

Class to get data from Magpie files. See also: L. Ward, A. Agrawal, A. Choudhary, C. Wolverton, A general-purpose machine learning framework for predicting properties of inorganic materials, Npj Comput. Mater. 2 (2016) 16028.

Finding the exact meaning of each of these features can be quite difficult. The available descriptions are reproduced in ./data_files/magpie_elementdata_feature_descriptions.txt.

Args:
impute_nan (bool): if True, the features for the elements that are missing from the data_source or are NaNs are replaced by the average of each feature over the available elements.

__init__(impute_nan=False)
get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

get_oxidation_states(elem)

Retrieve the possible oxidation states of an element

Args:

elem - (Element), Target element

Returns:

[int] - oxidation states

class matminer.utils.data.MatscholarElementData(impute_nan=False)

Bases: AbstractData

Class to get word embedding vectors of elements. These word embeddings were generated using NLP + Neural Network techniques on more than 3 million scientific abstracts.

The data returned by this class are simply learned representations of the elements, taken from:

Tshitoyan, V., Dagdelen, J., Weston, L. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019). https://doi.org/10.1038/s41586-019-1335-8

Args:
impute_nan (bool): if True, the features for the elements that are missing from the data_source or are NaNs are replaced by the average of each feature over the available elements.

__init__(impute_nan=False)
get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

class matminer.utils.data.MixingEnthalpy(impute_nan=False)

Bases: object

Values of \Delta H^{max}_{AB} for different pairs of elements.

Based on the Miedema model. Tabulated by:

A. Takeuchi, A. Inoue, Classification of Bulk Metallic Glasses by Atomic Size Difference, Heat of Mixing and Period of Constituent Elements and Its Application to Characterization of the Main Alloying Element. Mater. Trans. 46, 2817–2829 (2005).

Attributes:
valid_element_list ([Element]): A list of elements for which the mixing enthalpy parameters are defined (although no guarantees are provided that all combinations of this list will be available).

Args:
impute_nan (bool): if True, the features for the elements that are missing from the data_source or are NaNs are replaced by the average of each feature over the available elements.

__init__(impute_nan=False)
get_mixing_enthalpy(elemA, elemB)

Get the mixing enthalpy between different elements

Args:

elemA (Element): An element
elemB (Element): Second element

Returns:

(float) mixing enthalpy, nan if pair is not in a table
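A pair lookup with a nan fallback, as described for get_mixing_enthalpy, could look like the sketch below. It assumes that the result does not depend on the order of the two elements; the table, the element labels, and the value are placeholders invented for this example.

```python
import math

# Hypothetical table; labels and value are placeholders, not source data.
_TABLE = {frozenset(("A", "B")): -1.0}


def get_mixing_enthalpy_sketch(elem_a, elem_b, table=_TABLE):
    """Order-independent pair lookup; nan if the pair is not tabulated."""
    return table.get(frozenset((elem_a, elem_b)), math.nan)


forward = get_mixing_enthalpy_sketch("A", "B")
reverse = get_mixing_enthalpy_sketch("B", "A")
missing = get_mixing_enthalpy_sketch("A", "C")
print(forward, reverse, missing)
```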

class matminer.utils.data.OpticalData(props=None, method='pseudo_inverse', min_wl=0.38, max_wl=0.78, n_wl=401, bins=10, saving_dir='~/.matminer/optical_props/', impute_nan=False)

Bases: AbstractData

Class to use optical data from https://www.refractiveindex.info. The properties are the refractive index n, the extinction coefficient ĸ (measured or computed with DFT), and the reflectivity R as obtained from Fresnel’s equation. By default, data from 380 to 780 nm are used when available, but other ranges can be chosen as well.

In case new data becomes available and needs to be added to the database, it should be added in matminer/utils/data_files/optical_polyanskiy/database, which should then be compressed in the tar.xz format. To add a file for a compound, follow any of the formats of refractiveindex.info.

The database is used to extract:

1) the properties of single elements when available.
2) the pseudo-inverse of the properties of single elements, based on the data for ~200 compounds. These pseudo-inverse contributions correspond to the coefficients of a least-squares fit from the compositions to the properties. This makes it possible to take into account data from different compounds for a given element.

Using the pseudo-inverses (method="pseudo_inverse") instead of the elemental properties (method="exact") leads to better results as far as we have checked. Another possibility is to use method="combined", where the exact values are taken for elements present in pure form in the database, and the pseudo-inverse is taken otherwise.

n, ĸ, and R are spectra. These are composed of n_wl wavelengths, from min_wl to max_wl. We split these spectra into bins (10 by default) where their average values are taken. These averaged values are the final features. The wavelength corresponding to a given bin is its midpoint.
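The binning step can be sketched as below. bin_spectrum is a hypothetical helper; how matminer splits a spectrum whose length is not divisible by the number of bins, and how it defines the bin midpoint, are assumptions of this sketch (chunks differ by at most one point, and the midpoint is the mean of the first and last wavelength in a chunk).

```python
def bin_spectrum(wavelengths, values, bins):
    """Average a spectrum over `bins` contiguous chunks.

    Returns (midpoints, means). Chunk sizes differ by at most one point
    when len(values) is not divisible by bins.
    """
    base, extra = divmod(len(values), bins)
    midpoints, means = [], []
    start = 0
    for i in range(bins):
        stop = start + base + (1 if i < extra else 0)
        chunk = values[start:stop]
        means.append(sum(chunk) / len(chunk))
        midpoints.append((wavelengths[start] + wavelengths[stop - 1]) / 2)
        start = stop
    return midpoints, means


# Grid matching the documented defaults: 401 wavelengths, 0.38 to 0.78 µm.
n_wl, min_wl, max_wl = 401, 0.38, 0.78
wl = [min_wl + i * (max_wl - min_wl) / (n_wl - 1) for i in range(n_wl)]
flat = [1.5] * n_wl  # a constant toy spectrum
mids, means = bin_spectrum(wl, flat, bins=10)
print(len(mids), means[0])
```

Each property therefore contributes one feature per bin, so three properties with 10 bins yield 30 features per element.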

Args:
props: optical properties to include. Should be a list with "refractive" and/or "extinction" and/or "reflectivity".
method: type of values, either "exact", "pseudo_inverse", or "combined".
min_wl: minimum wavelength to include in the spectra (µm).
max_wl: maximum wavelength to include in the spectra (µm).
n_wl: number of wavelengths to include in the spectra.
bins: number of bins to split the spectra.
saving_dir: folder to save the data and csv file used for the featurization. Saving them speeds up the featurization.
impute_nan (bool): if True, the features for the elements that are missing from the data_source or are NaNs are replaced by the average of each feature over the available elements.

__init__(props=None, method='pseudo_inverse', min_wl=0.38, max_wl=0.78, n_wl=401, bins=10, saving_dir='~/.matminer/optical_props/', impute_nan=False)
get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

class matminer.utils.data.OxidationStateDependentData

Bases: AbstractData

Abstract class that also includes oxidation-state-dependent properties

abstract get_charge_dependent_property(element, charge, property_name)

Retrieve an oxidation-state-dependent elemental property

Args:

element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property

Return:

(float) - Value of property

get_charge_dependent_property_from_specie(specie, property_name)

Retrieve an oxidation-state-dependent elemental property

Args:

specie - (Specie), Specie of interest
property_name - (string), name of property

Return:

(float) - Value of property

class matminer.utils.data.OxidationStatesMixin

Bases: object

Abstract class interface for retrieving the oxidation states of each element

abstract get_oxidation_states(elem)

Retrieve the possible oxidation states of an element

Args:

elem - (Element), Target element

Returns:

[int] - oxidation states

class matminer.utils.data.PymatgenData(use_common_oxi_states=True, impute_nan=False)

Bases: OxidationStateDependentData, OxidationStatesMixin

Class to get data from pymatgen. See also: S.P. Ong, W.D. Richards, A. Jain, G. Hautier, M. Kocher, S. Cholia, et al., Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis, Comput. Mater. Sci. 68 (2013) 314-319.

Meanings of each feature can be obtained from the pymatgen.Composition documentation (attributes).

Args:
impute_nan (bool): if True, the features for the elements that are missing from the data_source or are NaNs are replaced by the average of each feature over the available elements.

__init__(use_common_oxi_states=True, impute_nan=False)
get_charge_dependent_property(element, charge, property_name)

Retrieve an oxidation-state-dependent elemental property

Args:

element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property

Return:

(float) - Value of property

get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

get_oxidation_states(elem)

Get the oxidation states of an element

Args:

elem - (Element) target element
common - (boolean), whether to return only the common oxidation states, or all known oxidation states

Returns:

[int] list of oxidation states

class matminer.utils.data.TransportData(props=None, method='pseudo_inverse', alpha=0, saving_dir='~/.matminer/transport_props/', impute_nan=False)

Bases: AbstractData

Class to use transport data from Ricci et al., see An ab initio electronic transport database for inorganic materials. Ricci, F., Chen, W., Aydemir, U., Snyder, G. J., Rignanese, G. M., Jain, A., & Hautier, G. (2017). Scientific data, 4(1), 1-13. https://doi.org/10.1038/sdata.2017.85

The database has been used to extract:

1) the properties of single elements when available. These are stored in matminer/utils/data_files/mp_transport/transport_pure_elems.csv
2) the pseudo-inverse of the properties of single elements. These pseudo-inverse contributions correspond to the coefficients of a least-squares fit from the compositions to the properties. This makes it possible to take into account data from different compounds for a given element.

Using the pseudo-inverses (method="pseudo_inverse") instead of the elemental properties (method="exact") leads to better results as far as we have checked. Another possibility is to use method="combined", where the exact values are taken for elements present in pure form in the database, and the pseudo-inverse is taken otherwise.

For the effective mass, the pseudo-inverse is obtained on 1/(alpha+m), then m is re-obtained for single elements. This is to avoid huge errors coming from the huge spread in data (12 orders of magnitude).
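The transform applied to the effective mass, and its inverse, can be sketched as follows. to_inverse and from_inverse are hypothetical helpers, the masses are toy values, and alpha = 1.0 is chosen only to make the compression visible (the documented default is 0).

```python
def to_inverse(m, alpha=0.0):
    """Transform an effective mass m into 1 / (alpha + m)."""
    return 1.0 / (alpha + m)


def from_inverse(y, alpha=0.0):
    """Recover m from y = 1 / (alpha + m)."""
    return 1.0 / y - alpha


# Toy masses spanning 12 orders of magnitude (illustrative values).
masses = [1e-6, 1.0, 1e6]
alpha = 1.0  # hypothetical choice for this example
transformed = [to_inverse(m, alpha) for m in masses]
recovered = [from_inverse(y, alpha) for y in transformed]
print(transformed, recovered)
```

With a positive alpha the transformed values all lie between 0 and 1, so no single compound dominates the least-squares fit.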

Args:
props: transport properties to include. Should be a (sub)list of ["sigma_p", "sigma_n", "S_p", "S_n", "kappa_p", "kappa_n", "PF_p", "PF_n", "m_p", "m_n"] for the hole (_p) and electron (_n) conductivity (sigma), Seebeck coefficient (S), thermal conductivity (kappa), power factor (PF) and effective mass (m).
method: type of values, either "exact", "pseudo_inverse", or "combined".
alpha: Value used to featurize the effective mass. The values of the effective masses span 12 orders of magnitude, which makes the pseudo-inverse biased. To overcome this, we use 1 / (alpha + m) for the pseudo-inversion. Different values of alpha can be tried; a file is created for each value so that it is not recomputed each time. Defaults to 0, and used only if method != "exact".
saving_dir: folder to save the data and csv file used for the featurization. Saving them speeds up the featurization.
impute_nan (bool): if True, the features for the elements that are missing from the data_source or are NaNs are replaced by the average of each feature over the available elements.

__init__(props=None, method='pseudo_inverse', alpha=0, saving_dir='~/.matminer/transport_props/', impute_nan=False)
get_elemental_property(elem, property_name)

Get a certain elemental property for a certain element.

Args:

elem - (Element) element to be assessed
property_name - (str) property to be retrieved

Returns:

float, property of that element

matminer.utils.flatten_dict module

matminer.utils.flatten_dict.flatten_dict(nested_dict, lead_key=None, unwind_arrays=True)

Helper function to flatten a nested dictionary. Recursively walks through the nested dictionary and builds dot-notation keys, e.g. converts {"a": {"b": 1, "c": 2}} to {"a.b": 1, "a.c": 2}

Args:

nested_dict ({}): nested dictionary to flatten
unwind_arrays (bool): whether to flatten lists/tuples with numerically indexed dot notation, defaults to True
lead_key (str): string to prepend to all keys, used primarily for recursion

Returns:

non-nested dictionary
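The recursion described above can be sketched in a few lines. flatten_sketch is a hypothetical re-implementation written for illustration, not matminer's code; in particular, the exact key format used for list elements ("a.0", "a.1", ...) is an assumption based on the phrase "numerically indexed dot notation".

```python
def flatten_sketch(nested, lead_key=None, unwind_arrays=True):
    """Flatten a nested dict into dot-notation keys (illustrative sketch)."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{lead_key}.{key}" if lead_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_sketch(value, full_key, unwind_arrays))
        elif unwind_arrays and isinstance(value, (list, tuple)):
            # Treat positions as keys: {"a": [x, y]} -> {"a.0": x, "a.1": y}
            indexed = {str(i): item for i, item in enumerate(value)}
            flat.update(flatten_sketch(indexed, full_key, unwind_arrays))
        else:
            flat[full_key] = value
    return flat


simple = flatten_sketch({"a": {"b": 1, "c": 2}})
with_list = flatten_sketch({"a": [10, {"b": 20}]})
kept_list = flatten_sketch({"a": [10, 20]}, unwind_arrays=False)
print(simple, with_list, kept_list)
```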

matminer.utils.io module

This module defines functions for writing and reading matminer related objects

matminer.utils.io.load_dataframe_from_json(filename, pbar=True, decode=True)

Load pandas dataframe from a json file.

Automatically decodes and instantiates pymatgen objects in the dataframe.

Args:
filename (str): Path to json file. Can be a compressed file (gz and bz2 are supported).
pbar (bool): If true, shows an ASCII progress bar for loading data from disk.
decode (bool): If true, will automatically decode objects (slow, convenient). If false, will return json representations of the objects (fast, inconvenient).

Returns:

(Pandas.DataFrame): A pandas dataframe.

matminer.utils.io.store_dataframe_as_json(dataframe, filename, compression=None, orient='split', pbar=True)

Store pandas dataframe as a json file.

Automatically encodes pymatgen objects as dictionaries.

Args:

dataframe (Pandas.Dataframe): A pandas dataframe.
filename (str): Path to json file.
compression (str or None): A compression mode. Valid options are "gz", "bz2", and None. Defaults to None. If the filename does not end with the correct suffix, it will be added automatically.
orient (str): Determines the format in which the dictionary data is stored. This takes the same set of arguments as the orient option of the pandas.DataFrame.to_dict() function. 'split' is recommended as it is relatively space efficient and preserves the dtype of the index.
pbar (bool): If True, shows a progress bar for encoding objects to compatible json format (normally the rate-limiting step).
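To make the 'split' orientation and gz compression concrete, the sketch below writes and reads a small table with only the standard library. It is not matminer's implementation (which also encodes pymatgen objects); the table contents and the file name are made up for this example, and the dictionary layout follows what pandas produces for orient="split".

```python
import gzip
import json
import os
import tempfile

# A two-row table in "split" layout: index, columns, and data are each
# stored once, which keeps the file compact.
table = {
    "index": ["row0", "row1"],
    "columns": ["formula", "n_atoms"],
    "data": [["NaCl", 2], ["SiO2", 3]],
}

# Write gzip-compressed JSON to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "table.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump(table, f)

# Read it back and confirm the round trip.
with gzip.open(path, "rt", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded["columns"], loaded["data"])
```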

matminer.utils.kernels module

matminer.utils.kernels.gaussian_kernel(arr0, arr1, SIGMA)

Returns a Gaussian kernel of the two arrays for use in KRR or other regressions using the kernel trick.

matminer.utils.kernels.laplacian_kernel(arr0, arr1, SIGMA)

Returns a Laplacian kernel of the two arrays for use in KRR or other regressions using the kernel trick.
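For reference, the textbook forms of these two kernels are sketched below in plain Python. The function names are hypothetical, and the normalization (a factor of 2 * sigma^2 for the Gaussian kernel and an L1 distance divided by sigma for the Laplacian kernel) is an assumption; check the matminer source before relying on exact values.

```python
import math


def gaussian_kernel_sketch(arr0, arr1, sigma):
    """exp(-||arr0 - arr1||_2^2 / (2 * sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(arr0, arr1))
    return math.exp(-sq_dist / (2.0 * sigma**2))


def laplacian_kernel_sketch(arr0, arr1, sigma):
    """exp(-||arr0 - arr1||_1 / sigma)"""
    l1_dist = sum(abs(a - b) for a, b in zip(arr0, arr1))
    return math.exp(-l1_dist / sigma)


x, y = [0.0, 1.0], [1.0, 1.0]
g = gaussian_kernel_sketch(x, y, 1.0)   # distance 1 -> exp(-0.5)
l = laplacian_kernel_sketch(x, y, 1.0)  # distance 1 -> exp(-1)
print(g, l)
```

Both kernels equal 1 for identical inputs and decay toward 0 as the arrays move apart, with SIGMA setting the length scale.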

matminer.utils.pipeline module

class matminer.utils.pipeline.DropExcluded(excluded)

Bases: BaseEstimator, TransformerMixin

Transformer for removing unwanted columns from a dataframe. Passes back the remaining columns.

Helper class for making sklearn pipelines with matminer.

Args:

excluded (list of labels): A list of column labels to drop from the dataframe

__init__(excluded)
fit(x, y=None)
set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → DropExcluded

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

x : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for x parameter in fit.

Returns

self : object

The updated object.

set_transform_request(*, df: bool | None | str = '$UNCHANGED$') → DropExcluded

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

df : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for df parameter in transform.

Returns

self : object

The updated object.

transform(df)
class matminer.utils.pipeline.ItemSelector(label)

Bases: BaseEstimator, TransformerMixin

A utility for extracting a column from a DataFrame in a sklearn pipeline, for example in a FeatureUnion pipeline to featurize a dataset.

Helper class for making sklearn pipelines with matminer.

See (http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html)

Args:

label : The label of the column to select.

__init__(label)
fit(x, y=None)
set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → ItemSelector

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

x : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for x parameter in fit.

Returns

self : object

The updated object.

set_transform_request(*, dataframe: bool | None | str = '$UNCHANGED$') → ItemSelector

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters

dataframe : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for dataframe parameter in transform.

Returns

self : object

The updated object.

transform(dataframe)

matminer.utils.utils module

Utility module

matminer.utils.utils.get_elem_in_data(df, as_pure=False)

Looks for all elements present in the compounds forming the index of a dataframe.

Args:

df: DataFrame containing the compound formulas (as strings) as index
as_pure: if True, consider only the pure compounds

Returns:

List of elements (str)
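The idea of collecting the elements present in a list of formula strings can be sketched with a regular expression. elements_in_formulas is a hypothetical helper: it does not validate element symbols, it works on a plain list instead of a dataframe index, and reading as_pure as "only formulas containing a single element" is an assumption of this example.

```python
import re

# One uppercase letter optionally followed by one lowercase letter.
ELEMENT_TOKEN = re.compile(r"[A-Z][a-z]?")


def elements_in_formulas(formulas, as_pure=False):
    """Collect element symbols found in a list of formula strings.

    With as_pure=True, only formulas containing a single element count.
    """
    found = set()
    for formula in formulas:
        symbols = set(ELEMENT_TOKEN.findall(formula))
        if as_pure and len(symbols) != 1:
            continue
        found.update(symbols)
    return sorted(found)


formulas = ["SiO2", "Fe2O3", "Si", "Cu"]
all_elems = elements_in_formulas(formulas)
pure_elems = elements_in_formulas(formulas, as_pure=True)
print(all_elems, pure_elems)
```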

matminer.utils.utils.get_pseudo_inverse(df_init, cols=None)

Computes the pseudo-inverse matrix of a dataframe containing properties for multiple compositions. From a compositions matrix (containing the elemental fractions of each element present in the data), and a property column (or multiple properties gathered in a matrix), the pseudo-inverse column (or matrix if multiple properties are present) is defined by

compositions * pseudo-inverse = property

The pseudo-inverse coefficients are therefore average contributions of each element present in the data to the property of the compounds containing this element (in a least-squares sense). This makes it possible to take many compound-property pairs into consideration.

Note that the pseudo-inverse coefficients do not represent the property of single elements in their pure form: negative values may appear for single elements and for physically-positive properties, but this only reflects the fact that the presence of the element in compounds generally decreases the value of the property.

For elements that are not present in any compound of the data, nan is returned.

Args:
df_init: DataFrame with a column named "Composition" containing compositions (anything that can be turned into a Pymatgen Composition object), and other columns containing properties to be pseudo-inverted.
cols: list of columns of the dataframe giving the features to be pseudo-inverted.

Returns:

DataFrame with the pseudo-inverse coefficients for all elements present in the initial compositions and all properties.
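The relation compositions * pseudo-inverse = property can be illustrated with NumPy on synthetic data. The compositions, element labels (A, B, C), and contributions below are made up so that the recovered coefficients can be checked; this sketch uses numpy.linalg.pinv directly and is not matminer's implementation.

```python
import numpy as np

# Rows: compounds; columns: elemental fractions of (A, B, C).
# Hypothetical compositions chosen for illustration.
compositions = np.array([
    [0.5, 0.5, 0.0],        # AB
    [0.0, 0.5, 0.5],        # BC
    [0.5, 0.0, 0.5],        # AC
    [1 / 3, 1 / 3, 1 / 3],  # ABC
])

# Synthetic property generated from known per-element contributions.
true_contrib = np.array([2.0, -1.0, 4.0])
prop = compositions @ true_contrib

# Least-squares solution of compositions @ coeffs = prop.
coeffs = np.linalg.pinv(compositions) @ prop
print(coeffs)
```

The negative coefficient for B shows how an element can lower the property of the compounds it appears in even when the property itself is positive for every compound.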

matminer.utils.utils.homogenize_multiindex(df, default_key, coerce=False)

Homogenizes a dataframe column index to a 2-level multiindex.

Args:

df (pandas DataFrame): A dataframe
default_key (str): The key to use when a single Index must be converted to a 2-level index. This key is then used as a parent of all keys present in the original 1-level index.
coerce (bool): If True, try to force a 2+ level multiindex to a 2-level multiindex.

Returns:

df (pandas DataFrame): A dataframe with a 2-layer multiindex.

matminer.utils.warnings module

Module contents