matminer.datasets package

Subpackages

Submodules

matminer.datasets.convenience_loaders module

matminer.datasets.convenience_loaders.load_boltztrap_mp(data_home=None, download_if_missing=True)

Convenience function for loading the boltztrap_mp dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)
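Every convenience loader in this module is named load_<dataset> and accepts data_home and download_if_missing. The sketch below uses that naming pattern to dispatch by dataset name. The helper load_by_name is hypothetical and not part of matminer. The import is deferred so the snippet can be read and imported without matminer installed.

```python
import importlib


def load_by_name(dataset, **kwargs):
    """Call the load_<dataset> convenience loader, forwarding keyword arguments.

    Hypothetical helper for illustration; not part of matminer.
    """
    # Deferred import: matminer is only needed when the helper is called.
    loaders = importlib.import_module("matminer.datasets.convenience_loaders")
    loader = getattr(loaders, f"load_{dataset}", None)
    if loader is None:
        raise ValueError(f"no convenience loader named load_{dataset}")
    return loader(**kwargs)
```

For example, load_by_name("boltztrap_mp", download_if_missing=True) would forward to load_boltztrap_mp with the same keyword argument.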

matminer.datasets.convenience_loaders.load_brgoch_superhard_training(subset='all', drop_suspect=False, data_home=None, download_if_missing=True)

Convenience function for loading the brgoch_superhard_training dataset.

Args:
subset (str): Identifier for subset of data to return:

  • all: all possible columns including metadata, engineered features, and basic descriptors

  • brgoch_features: only features from reference paper and targets

  • basic_descriptors: only composition/structure columns and targets

drop_suspect (bool): Whether to drop values with possibly incorrect elastic data and materials that could not be verified

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_castelli_perovskites(data_home=None, download_if_missing=True)

Convenience function for loading the castelli_perovskites dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_citrine_thermal_conductivity(room_temperature=True, data_home=None, download_if_missing=True)

Convenience function for loading the citrine thermal conductivity dataset.

Args:
room_temperature (bool): Whether or not to only return items with room temperature k_condition. True by default.

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_dielectric_constant(include_metadata=False, data_home=None, download_if_missing=True)

Convenience function for loading the dielectric_constant dataset.

Args:
include_metadata (bool): Whether or not to include the cif, meta, and poscar dataset columns. False by default.

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_double_perovskites_gap(return_lumo=False, data_home=None, download_if_missing=True)

Convenience function for loading the double_perovskites_gap dataset.

Args:
return_lumo (bool): Whether or not to provide LUMO energy dataframe in addition to gap dataframe. Defaults to False.

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame, tuple): the gap dataframe, or a tuple of dataframes (gap and LUMO) if return_lumo is True
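Because the return type depends on return_lumo, calling code may want to normalize it. The sketch below is one way to do that. The helper names are hypothetical, and the (gap, LUMO) ordering of the tuple is an assumption based on the argument description above.

```python
def unpack_gap_result(result):
    """Normalize the loader's return value to a (gap, lumo) pair.

    lumo is None when the loader returned a single dataframe.
    Assumes the tuple, when present, is ordered (gap, lumo).
    """
    if isinstance(result, tuple):
        gap_df, lumo_df = result
        return gap_df, lumo_df
    return result, None


def load_gap_and_lumo():
    """Load both dataframes; matminer is imported only when this is called."""
    from matminer.datasets.convenience_loaders import load_double_perovskites_gap

    return unpack_gap_result(load_double_perovskites_gap(return_lumo=True))
```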

matminer.datasets.convenience_loaders.load_double_perovskites_gap_lumo(data_home=None, download_if_missing=True)

Convenience function for loading the double_perovskites_gap_lumo dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_elastic_tensor(version='2015', include_metadata=False, data_home=None, download_if_missing=True)

Convenience function for loading the elastic_tensor dataset.

Args:
version (str): Version of the elastic_tensor dataset to load (defaults to 2015)

include_metadata (bool): Whether or not to include the cif, meta, and poscar dataset columns. False by default.

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_expt_formation_enthalpy(data_home=None, download_if_missing=True)

Convenience function for loading the expt_formation_enthalpy dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_expt_gap(data_home=None, download_if_missing=True)

Convenience function for loading the expt_gap dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_flla(data_home=None, download_if_missing=True)

Convenience function for loading the flla dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_glass_binary(version='v2', data_home=None, download_if_missing=True)

Convenience function for loading the glass_binary dataset.

Args:
version (str): Version identifier for dataset, see dataset description for explanation of each. Defaults to v2

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_glass_ternary_hipt(system='all', data_home=None, download_if_missing=True)

Convenience function for loading the glass_ternary_hipt dataset.

Args:
system (str, list): return items only from the requested system(s); options are: “CoFeZr”, “CoTiZr”, “CoVZr”, “FeTiNb”

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)
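Since system accepts either a string or a list, a caller may want to validate the value before loading. The sketch below is a caller-side check against the four system names listed above. It is not the loader's internal logic: normalize_systems is a hypothetical helper, and whether the loader itself rejects unknown names is not stated here.

```python
KNOWN_SYSTEMS = ("CoFeZr", "CoTiZr", "CoVZr", "FeTiNb")


def normalize_systems(system="all"):
    """Return the requested systems as a list, validating each name.

    "all" (the documented default) expands to every known system.
    """
    if system == "all":
        return list(KNOWN_SYSTEMS)
    # Accept a single name or any iterable of names.
    systems = [system] if isinstance(system, str) else list(system)
    unknown = [s for s in systems if s not in KNOWN_SYSTEMS]
    if unknown:
        raise ValueError(f"unknown system(s): {unknown}")
    return systems
```

The validated value can then be passed on, for example load_glass_ternary_hipt(system=normalize_systems(["CoFeZr", "CoVZr"])).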

matminer.datasets.convenience_loaders.load_glass_ternary_landolt(processing='all', unique_composition=True, data_home=None, download_if_missing=True)

Convenience function for loading the glass_ternary_landolt dataset.

Args:
processing (str): return only items with a specified processing method; defaults to all, options are sputtering and meltspin

unique_composition (bool): Whether or not to combine compositions with the same formula

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_heusler_magnetic(data_home=None, download_if_missing=True)

Convenience function for loading the heusler magnetic dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_jarvis_dft_2d(drop_nan_columns=None, data_home=None, download_if_missing=True)

Convenience function for loading the jarvis dft 2d dataset.

Args:

drop_nan_columns (list, str): Column or columns to check for NaN values; rows containing NaN there are dropped

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)
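To make the drop_nan_columns semantics concrete, here is a dependency-free sketch of the same idea on a list of dictionaries. It is an illustration only, not matminer's implementation. It assumes that a row is dropped when any of the listed columns holds NaN, which the description above does not spell out.

```python
import math


def drop_rows_with_nan(rows, columns):
    """Keep only rows whose values in `columns` are all present and not NaN.

    `rows` is a list of dicts standing in for dataframe rows;
    `columns` is a single column name or a list of names.
    """
    if isinstance(columns, str):
        columns = [columns]

    def is_missing(value):
        # Treat None and float NaN as missing.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [
        row for row in rows
        if not any(is_missing(row.get(col)) for col in columns)
    ]
```

With a pandas dataframe, the equivalent filtering would typically be df.dropna(subset=columns).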

matminer.datasets.convenience_loaders.load_jarvis_dft_3d(drop_nan_columns=None, data_home=None, download_if_missing=True)

Convenience function for loading the jarvis dft 3d dataset.

Args:

drop_nan_columns (list, str): Column or columns to check for NaN values; rows containing NaN there are dropped

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_jarvis_ml_dft_training(drop_nan_columns=None, data_home=None, download_if_missing=True)

Convenience function for loading the jarvis ml dft training dataset.

Args:

drop_nan_columns (list, str): Column or columns to check for NaN values; rows containing NaN there are dropped

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_m2ax(data_home=None, download_if_missing=True)

Convenience function for loading the m2ax dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_mp(include_structures=False, data_home=None, download_if_missing=True)

Convenience function for loading the materials project dataset.

Args:
include_structures (bool): Whether or not to load the full mp structure data. False by default.

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_phonon_dielectric_mp(data_home=None, download_if_missing=True)

Convenience function for loading the phonon_dielectric_mp dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_piezoelectric_tensor(include_metadata=False, data_home=None, download_if_missing=True)

Convenience function for loading the piezoelectric_tensor dataset.

Args:
include_metadata (bool): Whether or not to include the cif, meta, and poscar dataset columns. False by default.

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_steel_strength(data_home=None, download_if_missing=True)

Convenience function for loading the steel strength dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.convenience_loaders.load_wolverton_oxides(data_home=None, download_if_missing=True)

Convenience function for loading the wolverton oxides dataset.

Args:

data_home (str, None): Where to look for and store the loaded dataset

download_if_missing (bool): Whether or not to download the dataset if it isn’t on disk

Returns: (pd.DataFrame)

matminer.datasets.dataset_retrieval module

matminer.datasets.dataset_retrieval.get_all_dataset_info(dataset_name)
Helper function to get all info for a particular dataset, including:
  • Citation info

  • Bibtex-formatted references

  • Dataset columns and their descriptions

  • The dataset description

  • The number of entries in the dataset

Args:

dataset_name (str): Name of the dataset to get info for

Returns:
output_str (str): All metadata associated with the dataset, in a formatted string.

matminer.datasets.dataset_retrieval.get_available_datasets(print_format='medium', sort_method='alphabetical')

Function for retrieving the datasets available within matminer.

Args:
print_format (None, str): None, “short”, “medium”, or “long”:

  • None: Don’t print anything

  • “short”: only the dataset names

  • “medium”: dataset names and their descriptions

  • “long”: All dataset info associated with the dataset

sort_method (str): By what metric to sort the datasets when retrieving their information:

  • alphabetical: sorts by dataset name

  • num_entries: sorts by number of dataset entries

Returns: (list)
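A minimal usage sketch: passing print_format=None retrieves the list without printing anything, which suits scripts. The wrapper list_datasets is hypothetical. That the returned list holds dataset names is an assumption, since only the return type is given above.

```python
def list_datasets(sort_method="alphabetical"):
    """Return the available datasets without printing them.

    Hypothetical wrapper; matminer is imported only when this is called.
    """
    from matminer.datasets.dataset_retrieval import get_available_datasets

    # print_format=None suppresses the printed summary.
    return get_available_datasets(print_format=None, sort_method=sort_method)
```

Calling list_datasets(sort_method="num_entries") would order the result by the number of entries in each dataset instead.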

matminer.datasets.dataset_retrieval.get_dataset_attribute(dataset_name, attrib_key)

Helper function for getting generic attributes of the dataset

Args:

dataset_name (str): Name of the dataset to query info from

attrib_key (str): Name of attribute to pull

Returns: Dataset attribute

matminer.datasets.dataset_retrieval.get_dataset_citations(dataset_name)

Convenience function for getting dataset citations

Args:

dataset_name (str): name of the dataset being queried

Returns: (list)

matminer.datasets.dataset_retrieval.get_dataset_column_description(dataset_name, dataset_column)

Convenience function for getting dataset column description

Args:

dataset_name (str): name of the dataset being queried

dataset_column (str): name of the column to get description from

Returns: (str)

matminer.datasets.dataset_retrieval.get_dataset_columns(dataset_name)

Convenience function for getting dataset column list

Args:

dataset_name (str): name of the dataset being queried

Returns: (list)

matminer.datasets.dataset_retrieval.get_dataset_description(dataset_name)

Convenience function for getting dataset description

Args:

dataset_name (str): name of the dataset being queried

Returns: (str)

matminer.datasets.dataset_retrieval.get_dataset_num_entries(dataset_name)

Convenience function for getting dataset number of entries

Args:

dataset_name (str): name of the dataset being queried

Returns: (int)

matminer.datasets.dataset_retrieval.get_dataset_reference(dataset_name)

Convenience function for getting dataset reference

Args:

dataset_name (str): name of the dataset being queried

Returns: (str)
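The single-attribute getters above can be combined when a structured summary is more convenient than the formatted string from get_all_dataset_info. The sketch below gathers three of them into a dictionary. The function summarize_dataset and its key names are hypothetical.

```python
def summarize_dataset(dataset_name):
    """Collect a few metadata fields for one dataset into a dict.

    Hypothetical helper; matminer is imported only when this is called.
    """
    from matminer.datasets.dataset_retrieval import (
        get_dataset_columns,
        get_dataset_description,
        get_dataset_num_entries,
    )

    return {
        "name": dataset_name,
        "description": get_dataset_description(dataset_name),
        "columns": get_dataset_columns(dataset_name),
        "num_entries": get_dataset_num_entries(dataset_name),
    }
```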

matminer.datasets.dataset_retrieval.load_dataset(name, data_home=None, download_if_missing=True, pbar=False)

Loads a dataframe containing the dataset specified with the ‘name’ field.

The dataset file is stored in and loaded from data_home if specified; otherwise the directory given by the MATMINER_DATA environment variable is used if set, falling back to matminer/datasets by default.

Args:
name (str): keyword specifying what dataset to load, run matminer.datasets.get_available_datasets() for options

data_home (str): path to folder to look for dataset file

download_if_missing (bool): whether to download the dataset if it is not found on disk

pbar (bool): If true, show progress bar for loading dataset.

Returns: (pd.DataFrame)
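The storage location lookup described for load_dataset (explicit data_home first, then the MATMINER_DATA environment variable, then the package default) can be sketched as below. This illustrates the stated precedence only and is not matminer's code. The function resolve_data_home is hypothetical, and default_dir is a stand-in for the package's own datasets directory.

```python
import os


def resolve_data_home(data_home=None, default_dir=os.path.join("matminer", "datasets")):
    """Pick the dataset directory using the documented precedence.

    1. an explicit data_home argument,
    2. the MATMINER_DATA environment variable, if set,
    3. default_dir (placeholder for the package default).
    """
    if data_home is not None:
        return data_home
    return os.environ.get("MATMINER_DATA", default_dir)
```

Setting MATMINER_DATA once in the shell therefore lets every loader share one cache directory without passing data_home on each call.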

matminer.datasets.utils module

Module contents