matminer.datasets package¶
Submodules¶
matminer.datasets.convenience_loaders module¶
-
matminer.datasets.convenience_loaders.
load_boltztrap_mp
(data_home=None, download_if_missing=True)¶ Convenience function for loading the boltztrap_mp dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
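Every loader in this module takes the same data_home and download_if_missing keywords. The sketch below shows one way to call a loader so that no download is attempted. load_from_cache and the placeholder path are hypothetical and not part of matminer; the helper accepts any loader callable, so the pattern can be tried without the library installed.

```python
def load_from_cache(loader, data_home):
    """Call a convenience loader without allowing a download.

    `loader` is any callable accepting the documented `data_home` and
    `download_if_missing` keywords, e.g. `load_boltztrap_mp`.
    """
    return loader(data_home=data_home, download_if_missing=False)


def example():
    # Imported lazily so the helper above stays usable without matminer.
    from matminer.datasets.convenience_loaders import load_boltztrap_mp

    # "/tmp/matminer_data" is a placeholder path, not a matminer default.
    return load_from_cache(load_boltztrap_mp, "/tmp/matminer_data")
```

Calling example() requires matminer and a dataset file already present under the chosen folder.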
-
matminer.datasets.convenience_loaders.
load_brgoch_superhard_training
(subset='all', drop_suspect=False, data_home=None, download_if_missing=True)¶ Convenience function for loading the brgoch_superhard_training dataset.
- Args:
- subset (str): Identifier for subset of data to return:
- all: all possible columns including metadata, engineered features,
and basic descriptors
- brgoch_features: only features from reference paper and targets
- basic_descriptors: only composition/structure columns and targets
- drop_suspect (bool): Whether to drop values with possibly incorrect
elastic data and materials that could not be verified
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
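The subset identifiers above can be checked before the loader is called. This is a minimal sketch: brgoch_kwargs and BRGOCH_SUBSETS are hypothetical names introduced here, and the check is only an illustration, not something matminer itself is known to perform.

```python
# Subset identifiers as listed in the argument description above.
BRGOCH_SUBSETS = ("all", "brgoch_features", "basic_descriptors")


def brgoch_kwargs(subset="all", drop_suspect=False):
    """Validate `subset` and return keyword arguments for the loader."""
    if subset not in BRGOCH_SUBSETS:
        raise ValueError(f"subset must be one of {BRGOCH_SUBSETS}, got {subset!r}")
    return {"subset": subset, "drop_suspect": bool(drop_suspect)}


def load_verified_basic_descriptors():
    # Imported lazily so brgoch_kwargs can be used without matminer installed.
    from matminer.datasets.convenience_loaders import load_brgoch_superhard_training

    # Composition/structure columns and targets only, with suspect rows dropped.
    return load_brgoch_superhard_training(
        **brgoch_kwargs("basic_descriptors", drop_suspect=True)
    )
```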
-
matminer.datasets.convenience_loaders.
load_castelli_perovskites
(data_home=None, download_if_missing=True)¶ Convenience function for loading the castelli_perovskites dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_citrine_thermal_conductivity
(room_temperature=True, data_home=None, download_if_missing=True)¶ Convenience function for loading the citrine thermal conductivity dataset.
- Args:
- room_temperature (bool): Whether or not to only return items with room
temperature k_condition. True by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_dielectric_constant
(include_metadata=False, data_home=None, download_if_missing=True)¶ Convenience function for loading the dielectric_constant dataset.
- Args:
- include_metadata (bool): Whether or not to include the cif, meta,
and poscar dataset columns. False by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_double_perovskites_gap
(return_lumo=False, data_home=None, download_if_missing=True)¶ Convenience function for loading the double_perovskites_gap dataset.
- Args:
- return_lumo (bool): Whether or not to provide LUMO energy dataframe in
addition to gap dataframe. Defaults to False.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame, or a tuple of (pd.DataFrame, pd.DataFrame) if return_lumo is True)
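Because the return type depends on return_lumo, calling code may want one shape to work with. The sketch below normalizes either result into a pair. unpack_gap_result is a hypothetical helper, and it assumes the gap dataframe comes first in the tuple, which the text above does not state explicitly.

```python
def unpack_gap_result(result):
    """Return (gap, lumo); lumo is None when a single object was returned.

    Assumption: when a tuple is returned, the gap dataframe is the first
    element and the LUMO dataframe the second.
    """
    if isinstance(result, tuple):
        gap, lumo = result
        return gap, lumo
    return result, None


def load_gap_and_lumo():
    # Imported lazily so unpack_gap_result can be used without matminer.
    from matminer.datasets.convenience_loaders import load_double_perovskites_gap

    return unpack_gap_result(load_double_perovskites_gap(return_lumo=True))
```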
-
matminer.datasets.convenience_loaders.
load_double_perovskites_gap_lumo
(data_home=None, download_if_missing=True)¶ Convenience function for loading the double_perovskites_gap_lumo dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_elastic_tensor
(version='2015', include_metadata=False, data_home=None, download_if_missing=True)¶ Convenience function for loading the elastic_tensor dataset.
- Args:
- version (str): Version of the elastic_tensor dataset to load
(defaults to 2015)
- include_metadata (bool): Whether or not to include the cif, meta,
and poscar dataset columns. False by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_expt_formation_enthalpy
(data_home=None, download_if_missing=True)¶ Convenience function for loading the expt_formation_enthalpy dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_expt_gap
(data_home=None, download_if_missing=True)¶ Convenience function for loading the expt_gap dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_flla
(data_home=None, download_if_missing=True)¶ Convenience function for loading the flla dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_glass_binary
(version='v2', data_home=None, download_if_missing=True)¶ Convenience function for loading the glass_binary dataset.
- Args:
- version (str): Version identifier for dataset, see dataset description
for explanation of each. Defaults to v2
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_glass_ternary_hipt
(system='all', data_home=None, download_if_missing=True)¶ Convenience function for loading the glass_ternary_hipt dataset.
- Args:
- system (str, list): Return items only from the requested system(s).
Options are: “CoFeZr”, “CoTiZr”, “CoVZr”, “FeTiNb”. Defaults to “all”.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_glass_ternary_landolt
(processing='all', unique_composition=True, data_home=None, download_if_missing=True)¶ Convenience function for loading the glass_ternary_landolt dataset.
- Args:
- processing (str): Return only items with the specified processing method.
Defaults to all; options are sputtering and meltspin.
- unique_composition (bool): Whether or not to combine compositions with
the same formula
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_heusler_magnetic
(data_home=None, download_if_missing=True)¶ Convenience function for loading the heusler magnetic dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_jarvis_dft_2d
(drop_nan_columns=None, data_home=None, download_if_missing=True)¶ Convenience function for loading the jarvis dft 2d dataset.
- Args:
drop_nan_columns (list, str): Column or columns to drop rows containing NaN values from
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
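The drop_nan_columns argument accepts either one column name or a list of names. The following plain-Python sketch illustrates the row filtering that the description implies, using a list of dicts in place of a dataframe. It is not matminer's implementation, and the column names "x" and "y" are made up for the illustration.

```python
import math


def as_column_list(drop_nan_columns):
    """Normalize None, a single name, or a list of names into a list."""
    if drop_nan_columns is None:
        return []
    if isinstance(drop_nan_columns, str):
        return [drop_nan_columns]
    return list(drop_nan_columns)


def drop_rows_with_nan(rows, drop_nan_columns):
    """Keep only the rows that have no NaN in any of the named columns."""
    columns = as_column_list(drop_nan_columns)

    def is_nan(value):
        return isinstance(value, float) and math.isnan(value)

    return [row for row in rows if not any(is_nan(row.get(c)) for c in columns)]


# Hypothetical rows; "x" and "y" are not real dataset columns.
rows = [
    {"x": 1.0, "y": float("nan")},
    {"x": float("nan"), "y": 2.0},
    {"x": 1.0, "y": 2.0},
]
kept = drop_rows_with_nan(rows, ["x", "y"])  # only the last row survives
```

Passing None leaves every row in place, which matches the default shown in the signature.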
-
matminer.datasets.convenience_loaders.
load_jarvis_dft_3d
(drop_nan_columns=None, data_home=None, download_if_missing=True)¶ Convenience function for loading the jarvis dft 3d dataset.
- Args:
drop_nan_columns (list, str): Column or columns to drop rows containing NaN values from
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_jarvis_ml_dft_training
(drop_nan_columns=None, data_home=None, download_if_missing=True)¶ Convenience function for loading the jarvis ml dft training dataset.
- Args:
drop_nan_columns (list, str): Column or columns to drop rows containing NaN values from
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_m2ax
(data_home=None, download_if_missing=True)¶ Convenience function for loading the m2ax dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_mp
(include_structures=False, data_home=None, download_if_missing=True)¶ Convenience function for loading the materials project dataset.
- Args:
- include_structures (bool): Whether or not to load the full mp
structure data. False by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_phonon_dielectric_mp
(data_home=None, download_if_missing=True)¶ Convenience function for loading the phonon_dielectric_mp dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_piezoelectric_tensor
(include_metadata=False, data_home=None, download_if_missing=True)¶ Convenience function for loading the piezoelectric_tensor dataset.
- Args:
- include_metadata (bool): Whether or not to include the cif, meta,
and poscar dataset columns. False by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_steel_strength
(data_home=None, download_if_missing=True)¶ Convenience function for loading the steel strength dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
-
matminer.datasets.convenience_loaders.
load_wolverton_oxides
(data_home=None, download_if_missing=True)¶ Convenience function for loading the wolverton oxides dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
matminer.datasets.dataset_retrieval module¶
-
matminer.datasets.dataset_retrieval.
get_all_dataset_info
(dataset_name)¶ Helper function to get all info for a particular dataset, including:
Citation info
Bibtex-formatted references
Dataset columns and their descriptions
The dataset description
The number of entries in the dataset
- Args:
dataset_name (str): Name of the dataset to query info for
- Returns:
- output_str (str): All metadata associated with the dataset, in a
formatted string.
-
matminer.datasets.dataset_retrieval.
get_available_datasets
(print_format='medium', sort_method='alphabetical')¶ Function for retrieving the datasets available within matminer.
- Args:
- print_format (None, str): None, “short”, “medium”, or “long”:
None: Don’t print anything
“short”: only the dataset names
“medium”: dataset names and their descriptions
“long”: All dataset info associated with the dataset
- sort_method (str): By what metric to sort the datasets when retrieving
their information.
alphabetical: sorts by dataset name
num_entries: sorts by number of dataset entries
Returns: (list)
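With print_format=None the function prints nothing, so it can serve as a quiet lookup. A minimal sketch follows, assuming the returned list holds dataset names (the docstring only says "list"). list_dataset_names and its getter parameter are hypothetical conveniences added so the wrapper can be exercised without matminer.

```python
def list_dataset_names(sort_method="alphabetical", getter=None):
    """Return the available datasets without printing anything.

    `getter` defaults to matminer's get_available_datasets; any callable
    with the same keywords can be substituted, e.g. for testing.
    """
    if getter is None:
        # Imported lazily so the wrapper can be defined without matminer.
        from matminer.datasets import get_available_datasets as getter
    return getter(print_format=None, sort_method=sort_method)
```

For example, list_dataset_names("num_entries") would request the datasets sorted by number of entries.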
-
matminer.datasets.dataset_retrieval.
get_dataset_attribute
(dataset_name, attrib_key)¶ Helper function for getting generic attributes of the dataset
- Args:
dataset_name (str): Name of the dataset to query info from
attrib_key (str): Name of attribute to pull
Returns: Dataset attribute
-
matminer.datasets.dataset_retrieval.
get_dataset_citations
(dataset_name)¶ Convenience function for getting dataset citations
- Args:
dataset_name (str): name of the dataset being queried
Returns: (list)
-
matminer.datasets.dataset_retrieval.
get_dataset_column_description
(dataset_name, dataset_column)¶ Convenience function for getting dataset column description
- Args:
dataset_name (str): name of the dataset being queried
dataset_column (str): name of the column to get description from
Returns: (str)
-
matminer.datasets.dataset_retrieval.
get_dataset_columns
(dataset_name)¶ Convenience function for getting dataset column list
- Args:
dataset_name (str): name of the dataset being queried
Returns: (list)
-
matminer.datasets.dataset_retrieval.
get_dataset_description
(dataset_name)¶ Convenience function for getting dataset description
- Args:
dataset_name (str): name of the dataset being queried
Returns: (str)
-
matminer.datasets.dataset_retrieval.
get_dataset_num_entries
(dataset_name)¶ Convenience function for getting dataset number of entries
- Args:
dataset_name (str): name of the dataset being queried
Returns: (int)
-
matminer.datasets.dataset_retrieval.
get_dataset_reference
(dataset_name)¶ Convenience function for getting dataset reference
- Args:
dataset_name (str): name of the dataset being queried
Returns: (str)
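The individual getters above can be combined into a structured summary when the formatted string from get_all_dataset_info is not convenient. This is a sketch only: summarize_dataset is a hypothetical helper, and its api parameter (defaulting to the matminer.datasets.dataset_retrieval module) exists so that a stand-in object can be supplied.

```python
def summarize_dataset(dataset_name, api=None):
    """Collect description, entry count, and column descriptions in a dict."""
    if api is None:
        # Imported lazily so the helper can be defined without matminer.
        import matminer.datasets.dataset_retrieval as api
    columns = api.get_dataset_columns(dataset_name)
    return {
        "description": api.get_dataset_description(dataset_name),
        "num_entries": api.get_dataset_num_entries(dataset_name),
        "columns": {
            column: api.get_dataset_column_description(dataset_name, column)
            for column in columns
        },
    }
```

A call such as summarize_dataset("flla") would then need matminer installed; the dataset name is taken from the loaders listed earlier.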
-
matminer.datasets.dataset_retrieval.
load_dataset
(name, data_home=None, download_if_missing=True)¶ Loads a dataframe containing the dataset specified with the ‘name’ field.
Dataset file is stored in and loaded from data_home if specified; otherwise the location given by the MATMINER_DATA environment variable is used if it is set, and matminer/datasets is used by default.
- Args:
- name (str): keyword specifying what dataset to load, run
matminer.datasets.get_available_datasets() for options
data_home (str): path to folder to look for dataset file
- download_if_missing (bool): whether to download the dataset if it is not
found on disk
- Returns: (pd.DataFrame)
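The lookup order for the dataset folder described above (explicit data_home, then the MATMINER_DATA environment variable, then matminer/datasets) can be sketched as follows. resolve_data_home is a hypothetical illustration, not matminer's code; the default is shown as the relative path named in the text, whereas the real location depends on where the package is installed.

```python
import os


def resolve_data_home(data_home=None, environ=None, default="matminer/datasets"):
    """Pick the dataset folder following the documented precedence."""
    environ = os.environ if environ is None else environ
    if data_home is not None:
        # An explicit argument takes priority over everything else.
        return data_home
    # Assumption: an empty MATMINER_DATA value is treated as unset.
    return environ.get("MATMINER_DATA") or default
```

Passing a plain dict as environ makes the precedence easy to check without touching the real environment.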