matminer.utils package¶
Subpackages¶
- matminer.utils.data_files package
- matminer.utils.tests package
- Submodules
- matminer.utils.tests.test_caching module
- matminer.utils.tests.test_data module
- matminer.utils.tests.test_flatten_dict module
- matminer.utils.tests.test_io module
- matminer.utils.tests.test_utils module
- Module contents
Submodules¶
matminer.utils.caching module¶
Provides utility functions for caching the results of expensive operations, such as determining the nearest neighbors of atoms in a structure
- matminer.utils.caching.get_all_nearest_neighbors(method, structure)¶
Get the nearest neighbor list of a structure
- Args:
method (NearNeighbor) - Method used to compute nearest neighbors
structure (IStructure) - Structure to study
- Returns:
Output of method.get_all_nn_info(structure)
- matminer.utils.caching.get_nearest_neighbors(method, structure, site_idx)¶
Get the nearest neighbor list of a particular site in a structure
- Args:
method (NearNeighbor) - Method used to compute nearest neighbors
structure (Structure) - Structure to study
site_idx (int) - Index of site to study
- Returns:
Output of method.get_nn_info(structure, site_idx)
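The caching idea behind these helpers can be illustrated with a small standard-library sketch. This is not matminer's implementation: `expensive_neighbor_search`, `structure_key`, and `call_count` are hypothetical names, and the "neighbors" are placeholders. The only firm requirement shown is that `functools.lru_cache` needs hashable arguments.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive routine really runs


@lru_cache(maxsize=32)
def expensive_neighbor_search(structure_key: str, site_idx: int) -> tuple:
    """Hypothetical stand-in for a costly nearest-neighbor computation."""
    global call_count
    call_count += 1
    # Placeholder result: the two "neighbors" of a site on a 1D chain.
    return (site_idx - 1, site_idx + 1)


first = expensive_neighbor_search("NaCl", 3)
second = expensive_neighbor_search("NaCl", 3)  # served from the cache
```

The second call returns the stored result without running the function body again, which is the benefit when several featurizers ask for the neighbors of the same structure.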
matminer.utils.data module¶
Utility classes for retrieving elemental properties. Provides a uniform interface to several different elemental property resources, including pymatgen and Magpie.
- class matminer.utils.data.AbstractData¶
Bases:
object
Abstract class for retrieving elemental properties
All subclasses must implement the get_elemental_property operation. It should return a scalar value (ideally a float), or nan if a property does not exist.
- get_elemental_properties(elems, property_name)¶
Get elemental properties for a list of elements
- Args:
elems - ([Element]) list of elements
property_name - (str) property to be retrieved
- Returns:
[float], properties of elements
- abstract get_elemental_property(elem, property_name)¶
Get a certain elemental property for a certain element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
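A minimal sketch of this interface, assuming plain strings stand in for Element objects. `ElementDataSketch` and `ToyMassData` are hypothetical classes written for illustration only; they are not part of matminer.

```python
import math
from abc import ABC, abstractmethod


class ElementDataSketch(ABC):
    """Hypothetical mirror of the interface described above."""

    @abstractmethod
    def get_elemental_property(self, elem, property_name):
        """Return one scalar for one element."""

    def get_elemental_properties(self, elems, property_name):
        # The list version simply delegates to the scalar version.
        return [self.get_elemental_property(e, property_name) for e in elems]


class ToyMassData(ElementDataSketch):
    """Toy source holding atomic masses for two elements only."""

    _table = {"H": {"mass": 1.008}, "O": {"mass": 15.999}}

    def get_elemental_property(self, elem, property_name):
        # Missing elements or properties map to nan rather than raising.
        return self._table.get(elem, {}).get(property_name, math.nan)


values = ToyMassData().get_elemental_properties(["H", "O", "Xx"], "mass")
```

Returning nan for unknown entries keeps the output a list of floats, so downstream numeric code does not have to special-case missing data.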
- class matminer.utils.data.CohesiveEnergyData(impute_nan=False)¶
Bases:
AbstractData
Get the cohesive energy of an element.
Data is extracted from KnowledgeDoor Cohesive Energy Handbook online (http://www.knowledgedoor.com/2/elements_handbook/cohesive_energy.html), which in turn got the data from Introduction to Solid State Physics, 8th Edition, by Charles Kittel (ISBN 978-0-471-41526-8), 2005.
- Args:
impute_nan (bool): if True, features of elements that are missing from the data source, or whose values are NaN, are replaced by the average of that feature over the available elements.
- __init__(impute_nan=False)¶
- get_elemental_property(elem, property_name='cohesive energy')¶
- Args:
elem: (Element) Element of interest
property_name (str): unused, always returns cohesive energy
- Returns:
(float): cohesive energy of the element
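The impute_nan behavior described for this class (and for the other data classes on this page) amounts to mean imputation. The sketch below shows that idea on a made-up table; `impute_with_mean` and the placeholder element names are assumptions for illustration, not the library's code or data.

```python
import math


def impute_with_mean(table):
    """Replace nan entries with the mean of the available values."""
    known = [v for v in table.values() if not math.isnan(v)]
    mean = sum(known) / len(known)
    return {k: (mean if math.isnan(v) else v) for k, v in table.items()}


# Made-up values for three placeholder elements; "C" has no data.
toy = {"A": 1.0, "B": 3.0, "C": math.nan}
imputed = impute_with_mean(toy)
```

Here "C" receives 2.0, the average of the two known values, while the known entries are left unchanged.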
- class matminer.utils.data.DemlData(impute_nan=False)¶
Bases:
OxidationStateDependentData, OxidationStatesMixin
Class to get data from Deml data file. See also: A.M. Deml, R. O’Hayre, C. Wolverton, V. Stevanovic, Predicting density functional theory total energies and enthalpies of formation of metal-nonmetal compounds by linear regression, Phys. Rev. B - Condens. Matter Mater. Phys. 93 (2016).
The meanings of each feature in the data can be found in ./data_files/deml_elementdata.py
- Args:
impute_nan (bool): if True, features of elements that are missing from the data source, or whose values are NaN, are replaced by the average of that feature over the available elements.
- __init__(impute_nan=False)¶
- get_charge_dependent_property(element, charge, property_name)¶
Retrieve an oxidation-state dependent elemental property
- Args:
element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property
- Return:
(float) - Value of property
- get_elemental_property(elem, property_name)¶
Get a certain elemental property for a certain element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- get_oxidation_states(elem)¶
Retrieve the possible oxidation states of an element
- Args:
elem - (Element), Target element
- Returns:
[int] - oxidation states
- class matminer.utils.data.IUCrBondValenceData(interpolate_soft=True)¶
Bases:
object
Get empirical bond valence parameters.
Data come from International Union of Crystallography 2016 tables. (https://www.iucr.org/resources/data/datasets/bond-valence-parameters) Both the raw source CIF and cleaned csv file are made accessible here. Within the source CIF, there are citations for every set of parameters.
The copyright notice and disclaimer are reproduced below:
#***************************************************************
# COPYRIGHT NOTICE
# This table may be used and distributed without fee for
# non-profit purposes providing
# 1) that this copyright notice is included and
# 2) no fee is charged for the table and
# 3) details of any changes made in this list by anyone other than
# the copyright owner are suitably noted in the _audit_update record
# Please consult the copyright owner regarding any other uses.
#
# The copyright is owned by I. David Brown, Brockhouse Institute for
# Materials Research, McMaster University, Hamilton, Ontario Canada.
# idbrown@mcmaster.ca
#
#*****************************DISCLAIMER************************
#
# The values reported here are taken from the literature and
# other sources and the author does not warrant their correctness
# nor accept any responsibility for errors. Users are advised to
# consult the primary sources.
#
#***************************************************************
- __init__(interpolate_soft=True)¶
Load bond valence parameters as pandas dataframe.
If interpolate_soft is True, fill in some missing values for anions such as I, Br, N, S, Se, etc. with the assumption that bond valence parameters of such anions don’t depend on cation oxidation state. This assumption comes from Brese and O’Keeffe, (1991), Acta Cryst. B47, 194, which states “with less electronegative anions, … R is not very different for different oxidation states in general.” In the original data source file, only one set of parameters is usually provided for those less electronegative anions in a 9+ oxidation state, indicating they can be used with all oxidation states.
- get_bv_params(cation, anion, cat_val, an_val)¶
Look up bond valence parameters from the IUCr table.
- Args:
cation (Element): cation element
anion (Element): anion element
cat_val (Integer): cation formal oxidation state
an_val (Integer): anion formal oxidation state
- Returns:
bond_val_list: dataframe of bond valence parameters
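For context, bond valence parameters of this kind are conventionally used in the expression s = exp((R0 - R) / B), where R is the observed bond length. The sketch below assumes a looked-up row supplies R0 and B; `bond_valence` is a hypothetical helper, and the numbers are placeholders rather than entries from the table.

```python
import math


def bond_valence(r_obs: float, r0: float, b: float = 0.37) -> float:
    """Bond valence s = exp((r0 - r_obs) / b), lengths in angstroms."""
    return math.exp((r0 - r_obs) / b)


# Placeholder parameters, NOT taken from the IUCr table.
s_equal = bond_valence(r_obs=1.80, r0=1.80)   # bond at the reference length
s_longer = bond_valence(r_obs=2.00, r0=1.80)  # longer bond, lower valence
```

A bond at exactly the reference length contributes a valence of 1, and the contribution decays exponentially as the bond stretches.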
- interpolate_soft_anions()¶
Fill in missing parameters for oxidation states of soft anions.
- class matminer.utils.data.MEGNetElementData(impute_nan=False)¶
Bases:
AbstractData
Class to get neural network embeddings of elements. These embeddings were generated using the Materials Graph Network (MEGNet) developed by the MaterialsVirtualLab at U.C. San Diego and described in the publication:
Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. Chi Chen, Weike Ye, Yunxing Zuo, Chen Zheng, and Shyue Ping Ong Chemistry of Materials 2019 31 (9), 3564-3572, https://doi.org/10.1021/acs.chemmater.9b01294
The code for MEGNet can be found at: https://github.com/materialsvirtuallab/megnet
The embeddings were generated by training the MEGNet Graph Network on 60,000 structures from the Materials Project for predicting formation energy, and may be an effective way of applying transfer learning to smaller datasets using crystal-graph-based networks.
The representations are learned during training to predict a specific property, though they may be useful for a range of properties.
- Args:
impute_nan (bool): if True, features of elements that are missing from the data source, or whose values are NaN, are replaced by the average of that feature over the available elements.
- __init__(impute_nan=False)¶
- get_elemental_property(elem, property_name)¶
Get a certain elemental property for a certain element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- class matminer.utils.data.MagpieData(impute_nan=False)¶
Bases:
AbstractData, OxidationStatesMixin
Class to get data from Magpie files. See also: L. Ward, A. Agrawal, A. Choudhary, C. Wolverton, A general-purpose machine learning framework for predicting properties of inorganic materials, Npj Comput. Mater. 2 (2016) 16028.
The exact meaning of each of these features can be difficult to track down; the descriptions are reproduced in ./data_files/magpie_elementdata_feature_descriptions.txt.
- Args:
impute_nan (bool): if True, features of elements that are missing from the data source, or whose values are NaN, are replaced by the average of that feature over the available elements.
- __init__(impute_nan=False)¶
- get_elemental_property(elem, property_name)¶
Get a certain elemental property for a certain element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- get_oxidation_states(elem)¶
Retrieve the possible oxidation states of an element
- Args:
elem - (Element), Target element
- Returns:
[int] - oxidation states
- class matminer.utils.data.MatscholarElementData(impute_nan=False)¶
Bases:
AbstractData
Class to get word embedding vectors of elements. These word embeddings were generated using NLP + Neural Network techniques on more than 3 million scientific abstracts.
The data returned by this class are simply learned representations of the elements, taken from:
Tshitoyan, V., Dagdelen, J., Weston, L. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019). https://doi.org/10.1038/s41586-019-1335-8
- Args:
impute_nan (bool): if True, features of elements that are missing from the data source, or whose values are NaN, are replaced by the average of that feature over the available elements.
- __init__(impute_nan=False)¶
- get_elemental_property(elem, property_name)¶
Get a certain elemental property for a certain element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- class matminer.utils.data.MixingEnthalpy(impute_nan=False)¶
Bases:
object
Values of \Delta H^{max}_{AB} for different pairs of elements.
- Based on the Miedema model. Tabulated by:
A. Takeuchi, A. Inoue, Classification of Bulk Metallic Glasses by Atomic Size Difference, Heat of Mixing and Period of Constituent Elements and Its Application to Characterization of the Main Alloying Element. Mater. Trans. 46, 2817–2829 (2005).
- Attributes:
- valid_element_list ([Element]): A list of elements for which the
mixing enthalpy parameters are defined (although no guarantees are provided that all combinations of this list will be available).
- Args:
impute_nan (bool): if True, features of elements that are missing from the data source, or whose values are NaN, are replaced by the average of that feature over the available elements.
- __init__(impute_nan=False)¶
- get_mixing_enthalpy(elemA, elemB)¶
Get the mixing enthalpy between different elements
- Args:
elemA (Element): An element
elemB (Element): Second element
- Returns:
(float) mixing enthalpy, nan if pair is not in a table
- class matminer.utils.data.OpticalData(props=None, method='pseudo_inverse', min_wl=0.38, max_wl=0.78, n_wl=401, bins=10, saving_dir='~/.matminer/optical_props/', impute_nan=False)¶
Bases:
AbstractData
Class to use optical data from https://www.refractiveindex.info. The properties are the refractive index n, the extinction coefficient ĸ (measured or computed with DFT), and the reflectivity R as obtained from Fresnel’s equation. By default, data are considered from 380 to 780 nm where available, but other ranges can be chosen as well.
In case new data becomes available and needs to be added to the database, it should be added in matminer/utils/data_files/optical_polyanskiy/database, which should then be compressed in the tar.xz format. To add a file for a compound, follow any of the formats of refractiveindex.info.
The database is used to extract:
1) the properties of single elements when available;
2) the pseudo-inverse of the properties of single elements, based on the data for ~200 compounds. These pseudo-inverse contributions correspond to the coefficients of a least-squares fit from the compositions to the properties. This makes it possible to take data from different compounds into account for a given element.
Using the pseudo-inverses (method="pseudo_inverse") instead of the elemental properties (method="exact") leads to better results as far as we have checked. Another possibility is to use method="combined", where the exact values are taken for elements present as pure compounds in the database, and the pseudo-inverse is taken if the element is not present in pure form in the database.
n, ĸ, and R are spectra. These are composed of n_wl wavelengths, from min_wl to max_wl. We split these spectra into bins (initially 10) where their average values are taken. These averaged values are the final features. The wavelength corresponding to a given bin is its midpoint.
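The binning step can be sketched as follows. This is an illustration under assumptions, not the library's code: `bin_spectrum` is a hypothetical helper, the chunks are taken as contiguous and near-equal in size, and the spectrum values are made up.

```python
def bin_spectrum(wavelengths, values, bins):
    """Average `values` over `bins` contiguous, near-equal chunks."""
    n = len(values)
    edges = [round(i * n / bins) for i in range(bins + 1)]
    centers, means = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        chunk_w = wavelengths[lo:hi]
        chunk_v = values[lo:hi]
        centers.append((chunk_w[0] + chunk_w[-1]) / 2)  # bin midpoint
        means.append(sum(chunk_v) / len(chunk_v))
    return centers, means


# Toy spectrum: 8 wavelengths (in µm) split into 4 bins of 2 points each.
wl = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
n_values = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 4.0, 0.0]
centers, means = bin_spectrum(wl, n_values, bins=4)
```

Each bin yields one feature (the mean over the bin) labeled by the midpoint wavelength, so a long spectrum collapses to a short, fixed-length feature vector.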
- Args:
props: optical properties to include. Should be a list with “refractive” and/or “extinction” and/or “reflectivity”.
method: type of values, either “exact”, “pseudo_inverse”, or “combined”.
min_wl: minimum wavelength to include in the spectra (µm).
max_wl: maximum wavelength to include in the spectra (µm).
n_wl: number of wavelengths to include in the spectra.
bins: number of bins to split the spectra into.
saving_dir: folder in which to save the data and csv file used for the featurization. Saving them speeds up the featurization.
impute_nan (bool): if True, features of elements that are missing from the data source, or whose values are NaN, are replaced by the average of that feature over the available elements.
- __init__(props=None, method='pseudo_inverse', min_wl=0.38, max_wl=0.78, n_wl=401, bins=10, saving_dir='~/.matminer/optical_props/', impute_nan=False)¶
- get_elemental_property(elem, property_name)¶
Get a certain elemental property for a certain element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- class matminer.utils.data.OxidationStateDependentData¶
Bases:
AbstractData
Abstract class that also includes oxidation-state-dependent properties
- abstract get_charge_dependent_property(element, charge, property_name)¶
Retrieve an oxidation-state dependent elemental property
- Args:
element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property
- Return:
(float) - Value of property
- get_charge_dependent_property_from_specie(specie, property_name)¶
Retrieve an oxidation-state dependent elemental property
- Args:
specie - (Specie), Specie of interest
property_name - (string), name of property
- Return:
(float) - Value of property
- class matminer.utils.data.OxidationStatesMixin¶
Bases:
object
Abstract class interface for retrieving the oxidation states of each element
- abstract get_oxidation_states(elem)¶
Retrieve the possible oxidation states of an element
- Args:
elem - (Element), Target element
- Returns:
[int] - oxidation states
- class matminer.utils.data.PymatgenData(use_common_oxi_states=True, impute_nan=False)¶
Bases:
OxidationStateDependentData, OxidationStatesMixin
Class to get data from pymatgen. See also: S.P. Ong, W.D. Richards, A. Jain, G. Hautier, M. Kocher, S. Cholia, et al., Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis, Comput. Mater. Sci. 68 (2013) 314-319.
Meanings of each feature can be obtained from the pymatgen.Composition documentation (attributes).
- Args:
impute_nan (bool): if True, features of elements that are missing from the data source, or whose values are NaN, are replaced by the average of that feature over the available elements.
- __init__(use_common_oxi_states=True, impute_nan=False)¶
- get_charge_dependent_property(element, charge, property_name)¶
Retrieve an oxidation-state dependent elemental property
- Args:
element - (Element), Target element
charge - (int), Oxidation state
property_name - (string), name of property
- Return:
(float) - Value of property
- get_elemental_property(elem, property_name)¶
Get a certain elemental property for a certain element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
- get_oxidation_states(elem)¶
Get the oxidation states of an element
- Args:
elem - (Element) target element
common - (boolean), whether to return only the common oxidation states, or all known oxidation states
- Returns:
[int] list of oxidation states
- class matminer.utils.data.TransportData(props=None, method='pseudo_inverse', alpha=0, saving_dir='~/.matminer/transport_props/', impute_nan=False)¶
Bases:
AbstractData
Class to use transport data from Ricci et al., see An ab initio electronic transport database for inorganic materials. Ricci, F., Chen, W., Aydemir, U., Snyder, G. J., Rignanese, G. M., Jain, A., & Hautier, G. (2017). Scientific data, 4(1), 1-13. https://doi.org/10.1038/sdata.2017.85
The database has been used to extract:
1) the properties of single elements when available. These are stored in matminer/utils/data_files/mp_transport/transport_pure_elems.csv;
2) the pseudo-inverse of the properties of single elements. These pseudo-inverse contributions correspond to the coefficients of a least-squares fit from the compositions to the properties. This makes it possible to take data from different compounds into account for a given element.
Using the pseudo-inverses (method="pseudo_inverse") instead of the elemental properties (method="exact") leads to better results as far as we have checked. Another possibility is to use method="combined", where the exact values are taken for elements present as pure compounds in the database, and the pseudo-inverse is taken if the element is not present in pure form in the database.
For the effective mass, the pseudo-inverse is obtained on 1/(alpha+m), then m is re-obtained for single elements. This is to avoid huge errors coming from the huge spread in data (12 orders of magnitude).
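The transformation described for the effective mass can be sketched in a few lines. `to_fit_space` and `from_fit_space` are hypothetical helper names, the masses are made up, and alpha=1.0 is chosen only to make the effect visible (the documented default is 0).

```python
def to_fit_space(m: float, alpha: float = 0.0) -> float:
    """Map an effective mass m to the fitted quantity 1 / (alpha + m)."""
    return 1.0 / (alpha + m)


def from_fit_space(y: float, alpha: float = 0.0) -> float:
    """Invert the mapping: recover m from y = 1 / (alpha + m)."""
    return 1.0 / y - alpha


# Two made-up masses twelve orders of magnitude apart.
masses = [1e-6, 1e6]
fitted = [to_fit_space(m, alpha=1.0) for m in masses]
recovered = [from_fit_space(y, alpha=1.0) for y in fitted]
```

With a positive alpha, the fitted quantity is bounded above by 1 / alpha, so very small masses can no longer dominate the least-squares fit, and the original mass is recovered by inverting the same formula.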
- Args:
props: transport properties to include. Should be a (sub)list of [“sigma_p”, “sigma_n”, “S_p”, “S_n”, “kappa_p”, “kappa_n”, “PF_p”, “PF_n”, “m_p”, “m_n”] for the hole (_p) and electron (_n) conductivity (sigma), Seebeck coefficient (S), thermal conductivity (kappa), power factor (PF) and effective mass (m).
method: type of values, either “exact”, “pseudo_inverse”, or “combined”.
alpha: Value used to featurize the effective mass. The values of the effective masses span 12 orders of magnitude, which biases the pseudo-inverse. To overcome this, we use 1 / (alpha + m) for the pseudo-inversion. The value of alpha can be tested. A file is created for each value, so that it is not recomputed each time. Defaults to 0, and is used only if method != “exact”.
saving_dir: folder in which to save the data and csv file used for the featurization. Saving them speeds up the featurization.
impute_nan (bool): if True, features of elements that are missing from the data source, or whose values are NaN, are replaced by the average of that feature over the available elements.
- __init__(props=None, method='pseudo_inverse', alpha=0, saving_dir='~/.matminer/transport_props/', impute_nan=False)¶
- get_elemental_property(elem, property_name)¶
Get a certain elemental property for a certain element.
- Args:
elem - (Element) element to be assessed
property_name - (str) property to be retrieved
- Returns:
float, property of that element
matminer.utils.flatten_dict module¶
- matminer.utils.flatten_dict.flatten_dict(nested_dict, lead_key=None, unwind_arrays=True)¶
Helper function to flatten a nested dictionary. It recursively walks through the nested dictionary and builds dot-notation keys, e.g. converts {“a”: {“b”: 1, “c”: 2}} to {“a.b”: 1, “a.c”: 2}.
- Args:
nested_dict ({}): nested dictionary to flatten
unwind_arrays (bool): whether to flatten lists/tuples with numerically indexed dot notation, defaults to True
lead_key (str): string to prepend to all keys, used primarily for recursion
- Returns:
non-nested dictionary
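The recursion described here can be sketched with the standard library alone. `flatten_sketch` is a hypothetical re-implementation for illustration, and its handling of lists (keys such as "d.0", "d.1") is an assumption about what "numerically indexed dot notation" means; the library's own function may differ in details.

```python
def flatten_sketch(nested, lead_key=None, unwind_arrays=True):
    """Flatten nested dicts (and optionally lists/tuples) to dot keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{lead_key}.{key}" if lead_key is not None else str(key)
        if isinstance(value, dict):
            flat.update(flatten_sketch(value, full_key, unwind_arrays))
        elif unwind_arrays and isinstance(value, (list, tuple)):
            # Treat positions as keys: "d.0", "d.1", ...
            indexed = {str(i): item for i, item in enumerate(value)}
            flat.update(flatten_sketch(indexed, full_key, unwind_arrays))
        else:
            flat[full_key] = value
    return flat


result = flatten_sketch({"a": {"b": 1, "c": 2}, "d": [10, 20]})
kept = flatten_sketch({"d": [10, 20]}, unwind_arrays=False)
```

Passing the accumulated key down as lead_key is what lets one function serve both as the public entry point and as its own recursive step.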
matminer.utils.io module¶
This module defines functions for writing and reading matminer related objects
- matminer.utils.io.load_dataframe_from_json(filename, pbar=True, decode=True)¶
Load pandas dataframe from a json file.
Automatically decodes and instantiates pymatgen objects in the dataframe.
- Args:
filename (str): Path to json file. Compressed files (gz and bz2) are supported.
pbar (bool): If true, shows an ASCII progress bar for loading data from disk.
decode (bool): If true, will automatically decode objects (slow, convenient). If false, will return json representations of the objects (fast, inconvenient).
- Returns:
(Pandas.DataFrame): A pandas dataframe.
- matminer.utils.io.store_dataframe_as_json(dataframe, filename, compression=None, orient='split', pbar=True)¶
Store pandas dataframe as a json file.
Automatically encodes pymatgen objects as dictionaries.
- Args:
dataframe (Pandas.Dataframe): A pandas dataframe.
filename (str): Path to json file.
compression (str or None): A compression mode. Valid options are “gz”, “bz2”, and None. Defaults to None. If the filename does not end with the correct suffix, it will be added automatically.
orient (str): Determines the format in which the dictionary data is stored. This takes the same set of arguments as the orient option in the pandas.DataFrame.to_dict() function. ‘split’ is recommended as it is relatively space efficient and preserves the dtype of the index.
- pbar (bool): If True, shows a progress bar for encoding objects to
compatible json format (normally the rate-limiting step).
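To make the storage format concrete, the sketch below writes and reads a tiny table in the "split" layout with gzip compression, using only the standard library. It does not call matminer: `store_split_json` and `load_split_json` are hypothetical helpers, the table contents are made up, and the suffix handling is an assumption based on the description above.

```python
import gzip
import json
import os
import tempfile

# A tiny made-up table in the "split" layout: columns, index, rows.
table = {
    "columns": ["label", "value"],
    "index": [0, 1],
    "data": [["first", 1.5], ["second", 2.5]],
}


def store_split_json(table, filename, compression=None):
    """Write the table as JSON, appending ".gz" when gzip is requested."""
    if compression == "gz" and not filename.endswith(".gz"):
        filename += ".gz"
    opener = gzip.open if compression == "gz" else open
    with opener(filename, "wt", encoding="utf-8") as handle:
        json.dump(table, handle)
    return filename


def load_split_json(filename):
    """Read the table back, choosing the opener from the suffix."""
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rt", encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = store_split_json(table, os.path.join(tmp, "toy.json"), "gz")
    saved_name = os.path.basename(path)
    restored = load_split_json(path)
```

Keeping columns, index, and data in separate lists avoids repeating column names for every row, which is why the split layout is comparatively compact.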
matminer.utils.kernels module¶
- matminer.utils.kernels.gaussian_kernel(arr0, arr1, SIGMA)¶
Returns a Gaussian kernel of the two arrays for use in KRR or other regressions using the kernel trick.
- matminer.utils.kernels.laplacian_kernel(arr0, arr1, SIGMA)¶
Returns a Laplacian kernel of the two arrays for use in KRR or other regressions using the kernel trick.
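For reference, the textbook forms of these two kernels are sketched below for plain Python lists. The function names are hypothetical, and the exact normalization used by the library (for example the factor of 2 in the Gaussian exponent) is an assumption that should be checked against the source before relying on it.

```python
import math


def gaussian_kernel_sketch(x, y, sigma):
    """exp(-||x - y||_2^2 / (2 sigma^2)) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma**2))


def laplacian_kernel_sketch(x, y, sigma):
    """exp(-||x - y||_1 / sigma) for two equal-length vectors."""
    l1_dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-l1_dist / sigma)


same = gaussian_kernel_sketch([1.0, 2.0], [1.0, 2.0], sigma=1.0)
far_g = gaussian_kernel_sketch([0.0, 0.0], [3.0, 4.0], sigma=5.0)
far_l = laplacian_kernel_sketch([0.0, 0.0], [3.0, 4.0], sigma=7.0)
```

Both kernels equal 1 for identical inputs and decay toward 0 as the inputs move apart, with sigma setting the length scale of that decay.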
matminer.utils.pipeline module¶
- class matminer.utils.pipeline.DropExcluded(excluded)¶
Bases:
BaseEstimator, TransformerMixin
Transformer for removing unwanted columns from a dataframe. Passes back the remaining columns.
Helper class for making sklearn pipelines with matminer.
- Args:
excluded (list of labels): A list of column labels to drop from the dataframe
- __init__(excluded)¶
- fit(x, y=None)¶
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → DropExcluded¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) - Metadata routing for x parameter in fit.
- Returns:
self (object) - The updated object.
- set_transform_request(*, df: bool | None | str = '$UNCHANGED$') → DropExcluded¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
df (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) - Metadata routing for df parameter in transform.
- Returns:
self (object) - The updated object.
- transform(df)¶
- class matminer.utils.pipeline.ItemSelector(label)¶
Bases:
BaseEstimator, TransformerMixin
A utility for extracting a column from a DataFrame in a sklearn pipeline, for example in a FeatureUnion pipeline to featurize a dataset.
Helper class for making sklearn pipelines with matminer.
See (http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html)
- Args:
label : The label of the column to select.
- __init__(label)¶
- fit(x, y=None)¶
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → ItemSelector¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) - Metadata routing for x parameter in fit.
- Returns:
self (object) - The updated object.
- set_transform_request(*, dataframe: bool | None | str = '$UNCHANGED$') → ItemSelector¶
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
dataframe (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) - Metadata routing for dataframe parameter in transform.
- Returns:
self (object) - The updated object.
- transform(dataframe)¶
matminer.utils.utils module¶
Utility module
- matminer.utils.utils.get_elem_in_data(df, as_pure=False)¶
Looks for all elements present in the compounds forming the index of a dataframe.
- Args:
df: DataFrame whose index contains the compound formulas (as strings)
as_pure: if True, consider only the pure compounds
- Returns:
List of elements (str)
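A rough standard-library sketch of this kind of lookup is given below. `elements_in_formulas` is a hypothetical helper, the regular expression only handles simple formulas such as "Fe2O3", and reading "pure compounds" as "formulas containing a single element" is an assumption.

```python
import re


def elements_in_formulas(formulas, as_pure=False):
    """Collect element symbols from simple formula strings."""
    found = set()
    for formula in formulas:
        symbols = set(re.findall(r"[A-Z][a-z]?", formula))
        # With as_pure, keep only formulas made of a single element.
        if as_pure and len(symbols) != 1:
            continue
        found |= symbols
    return sorted(found)


index = ["Fe2O3", "Si", "NaCl", "O2"]
all_elems = elements_in_formulas(index)
pure_elems = elements_in_formulas(index, as_pure=True)
```

In this toy index, only "Si" and "O2" count as pure, so the as_pure result is a subset of the full element list.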
- matminer.utils.utils.get_pseudo_inverse(df_init, cols=None)¶
Computes the pseudo-inverse matrix of a dataframe containing properties for multiple compositions. From a compositions matrix (containing the elemental fractions of each element present in the data), and a property column (or multiple properties gathered in a matrix), the pseudo-inverse column (or matrix if multiple properties are present) is defined by
compositions * pseudo-inverse = property
The pseudo-inverse coefficients are therefore average contributions of each element present in the data to the property of the compounds containing this element (in a least-squares sense). This makes it possible to take many compound-property pairs into consideration.
Note that the pseudo-inverse coefficients do not represent the property of single elements in their pure form: negative values may appear for single elements and for physically-positive properties, but this only reflects the fact that the presence of the element in compounds generally decreases the value of the property.
For elements that are not present in compounds of the data, nan are returned.
- Args:
df_init: DataFrame with a column named “Composition” containing compositions (anything that can be turned into a pymatgen Composition object), and other columns containing properties to be pseudo-inverted.
cols: list of columns of the dataframe giving the features to be pseudo-inverted.
- Returns:
DataFrame with the pseudo-inverse coefficients for all elements present in the initial compositions and all properties.
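The relation compositions * pseudo-inverse = property can be illustrated on a tiny system with two hypothetical elements. The sketch solves the least-squares problem through the normal equations, which matches the pseudo-inverse solution when the composition matrix has full column rank. `least_squares_two_elements` and all numbers are made up for illustration; the library works on DataFrames and arbitrary numbers of elements.

```python
def least_squares_two_elements(fractions, prop):
    """Solve min ||C x - p||^2 for two elements via the normal equations."""
    # Accumulate C^T C (2x2, symmetric) and C^T p (length 2).
    s11 = sum(a * a for a, _ in fractions)
    s12 = sum(a * b for a, b in fractions)
    s22 = sum(b * b for _, b in fractions)
    t1 = sum(a * p for (a, _), p in zip(fractions, prop))
    t2 = sum(b * p for (_, b), p in zip(fractions, prop))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 system.
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


# Rows: fractions of hypothetical elements (A, B) in three compounds.
C = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
p = [2.0, 3.0, 4.0]  # made-up property values
coef_a, coef_b = least_squares_two_elements(C, p)
```

Because the made-up property is exactly linear in composition, the fit recovers 2 for A and 4 for B; with real data the coefficients are only average contributions, as noted above.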
- matminer.utils.utils.homogenize_multiindex(df, default_key, coerce=False)¶
Homogenizes a dataframe column index to a 2-level multiindex.
- Args:
df (pandas DataFrame): A dataframe
default_key (str): The key to use when a single Index must be converted to a 2-level index. This key is then used as a parent of all keys present in the original 1-level index.
coerce (bool): If True, try to force a 2+ level multiindex to a 2-level multiindex.
- Returns:
df (pandas DataFrame): A dataframe with a 2-layer multiindex.
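The role of default_key can be shown without pandas by treating column labels as plain values and tuples. `homogenize_labels` and the label names are hypothetical, and the coerce option is deliberately left out of this sketch.

```python
def homogenize_labels(labels, default_key):
    """Promote plain labels to (default_key, label) pairs."""
    out = []
    for label in labels:
        if isinstance(label, tuple):
            out.append(label)  # already multi-level; left untouched here
        else:
            out.append((default_key, label))
    return out


columns = ["formula", "density"]
two_level = homogenize_labels(columns, default_key="Input Data")
```

Every original label ends up under the same parent key, which is what allows a flat column index to be combined with frames that already use two levels.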