matminer.datasets package
Subpackages
- matminer.datasets.tests package
- Submodules
- matminer.datasets.tests.base module
- matminer.datasets.tests.test_convenience_loaders module
- matminer.datasets.tests.test_dataset_retrieval module
DataRetrievalTest
DataRetrievalTest.test_get_all_dataset_info()
DataRetrievalTest.test_get_dataset_attribute()
DataRetrievalTest.test_get_dataset_citations()
DataRetrievalTest.test_get_dataset_column_descriptions()
DataRetrievalTest.test_get_dataset_columns()
DataRetrievalTest.test_get_dataset_description()
DataRetrievalTest.test_get_dataset_num_entries()
DataRetrievalTest.test_get_dataset_reference()
DataRetrievalTest.test_load_dataset()
DataRetrievalTest.test_print_available_datasets()
- matminer.datasets.tests.test_datasets module
DataSetsTest
MatbenchDatasetsTest
MatminerDatasetsTest
MatminerDatasetsTest.test_boltztrap_mp()
MatminerDatasetsTest.test_brgoch_superhard_training()
MatminerDatasetsTest.test_castelli_perovskites()
MatminerDatasetsTest.test_citrine_thermal_conductivity()
MatminerDatasetsTest.test_dielectric_constant()
MatminerDatasetsTest.test_double_perovskites_gap()
MatminerDatasetsTest.test_double_perovskites_gap_lumo()
MatminerDatasetsTest.test_elastic_tensor_2015()
MatminerDatasetsTest.test_expt_formation_enthalpy()
MatminerDatasetsTest.test_expt_formation_enthalpy_kingsbury()
MatminerDatasetsTest.test_expt_gap()
MatminerDatasetsTest.test_expt_gap_kingsbury()
MatminerDatasetsTest.test_flla()
MatminerDatasetsTest.test_glass_binary()
MatminerDatasetsTest.test_glass_binary_v2()
MatminerDatasetsTest.test_glass_ternary_hipt()
MatminerDatasetsTest.test_glass_ternary_landolt()
MatminerDatasetsTest.test_heusler_magnetic()
MatminerDatasetsTest.test_jarvis_dft_2d()
MatminerDatasetsTest.test_jarvis_dft_3d()
MatminerDatasetsTest.test_jarvis_ml_dft_training()
MatminerDatasetsTest.test_m2ax()
MatminerDatasetsTest.test_mp_all_20181018()
MatminerDatasetsTest.test_mp_nostruct_20181018()
MatminerDatasetsTest.test_phonon_dielectric_mp()
MatminerDatasetsTest.test_piezoelectric_tensor()
MatminerDatasetsTest.test_ricci_boltztrap_mp_tabular()
MatminerDatasetsTest.test_steel_strength()
MatminerDatasetsTest.test_superconductivity2018()
MatminerDatasetsTest.test_tholander_nitrides_e_form()
MatminerDatasetsTest.test_ucsb_thermoelectrics()
MatminerDatasetsTest.test_wolverton_oxides()
- matminer.datasets.tests.test_utils module
- Module contents
Submodules
matminer.datasets.convenience_loaders module
- matminer.datasets.convenience_loaders.load_boltztrap_mp(data_home=None, download_if_missing=True)
Convenience function for loading the boltztrap_mp dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
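Every convenience loader in this module accepts the same data_home and download_if_missing arguments. As a minimal sketch, the hypothetical helper below (load_offline is not part of matminer) calls any loader against a fixed folder and disables downloading, so the call relies only on files already on disk.

```python
from pathlib import Path


def load_offline(loader, cache_dir):
    """Call a convenience loader without allowing any download.

    `loader` is any function that accepts the shared `data_home` and
    `download_if_missing` keyword arguments, e.g. `load_boltztrap_mp`.
    This helper is hypothetical and only illustrates the shared arguments.
    """
    data_home = str(Path(cache_dir).expanduser())  # expand "~" if present
    return loader(data_home=data_home, download_if_missing=False)


# Usage (requires matminer and a previously downloaded dataset file):
# from matminer.datasets.convenience_loaders import load_boltztrap_mp
# df = load_offline(load_boltztrap_mp, "~/matminer_data")
```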
- matminer.datasets.convenience_loaders.load_brgoch_superhard_training(subset='all', drop_suspect=False, data_home=None, download_if_missing=True)
Convenience function for loading the brgoch_superhard_training dataset.
- Args:
- subset (str): Identifier for the subset of data to return:
- all: all possible columns including metadata, engineered features,
and basic descriptors
- brgoch_features: only features from reference paper and targets
- basic_descriptors: only composition/structure columns and targets
- drop_suspect (bool): Whether to drop values with possibly incorrect
elastic data and materials that could not be verified
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
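To see how the three subset values differ, one option is to compare the number of columns each returns. The sketch below is an illustration only: column_counts_by_subset is a hypothetical helper, and it assumes the loader returns an object with a columns attribute, as a pandas DataFrame does.

```python
# The three subset identifiers listed in the Args above.
SUBSETS = ("all", "brgoch_features", "basic_descriptors")


def column_counts_by_subset(loader, drop_suspect=False):
    """Return {subset: number of columns} for each documented subset.

    `loader` is expected to behave like `load_brgoch_superhard_training`.
    """
    return {
        subset: len(loader(subset=subset, drop_suspect=drop_suspect).columns)
        for subset in SUBSETS
    }


# Usage (requires matminer; the first call may download the dataset):
# from matminer.datasets.convenience_loaders import (
#     load_brgoch_superhard_training,
# )
# print(column_counts_by_subset(load_brgoch_superhard_training))
```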
- matminer.datasets.convenience_loaders.load_castelli_perovskites(data_home=None, download_if_missing=True)
Convenience function for loading the castelli_perovskites dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_citrine_thermal_conductivity(room_temperature=True, data_home=None, download_if_missing=True)
Convenience function for loading the citrine thermal conductivity dataset.
- Args:
- room_temperature (bool): Whether or not to only return items with room
temperature k_condition. True by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_dielectric_constant(include_metadata=False, data_home=None, download_if_missing=True)
Convenience function for loading the dielectric_constant dataset.
- Args:
- include_metadata (bool): Whether or not to include the cif, meta,
and poscar dataset columns. False by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_double_perovskites_gap(return_lumo=False, data_home=None, download_if_missing=True)
Convenience function for loading the double_perovskites_gap dataset.
- Args:
- return_lumo (bool): Whether or not to provide the LUMO energy dataframe in
addition to the gap dataframe. Defaults to False.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame, or tuple of (pd.DataFrame, pd.DataFrame) if return_lumo is True)
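Because the return type depends on return_lumo, calling code may want a uniform shape. The hypothetical helper below is a sketch that assumes the tuple is ordered as (gap dataframe, LUMO dataframe); that ordering is inferred from the argument description, not confirmed here.

```python
def unpack_gap_result(result):
    """Return (gap, lumo) regardless of how the loader was called.

    Assumption: a tuple result is ordered (gap dataframe, LUMO dataframe).
    When only the gap dataframe was returned, lumo is None.
    """
    if isinstance(result, tuple):
        gap, lumo = result
        return gap, lumo
    return result, None


# Usage (requires matminer):
# from matminer.datasets.convenience_loaders import load_double_perovskites_gap
# gap_df, lumo_df = unpack_gap_result(load_double_perovskites_gap(return_lumo=True))
```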
- matminer.datasets.convenience_loaders.load_double_perovskites_gap_lumo(data_home=None, download_if_missing=True)
Convenience function for loading the double_perovskites_gap_lumo dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_elastic_tensor(version='2015', include_metadata=False, data_home=None, download_if_missing=True)
Convenience function for loading the elastic_tensor dataset.
- Args:
- version (str): Version of the elastic_tensor dataset to load
(defaults to 2015)
- include_metadata (bool): Whether or not to include the cif, meta,
and poscar dataset columns. False by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_expt_formation_enthalpy(data_home=None, download_if_missing=True)
Convenience function for loading the expt_formation_enthalpy dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_expt_gap(data_home=None, download_if_missing=True)
Convenience function for loading the expt_gap dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_flla(data_home=None, download_if_missing=True)
Convenience function for loading the flla dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_glass_binary(version='v2', data_home=None, download_if_missing=True)
Convenience function for loading the glass_binary dataset.
- Args:
- version (str): Version identifier for dataset, see dataset description
for explanation of each. Defaults to v2
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_glass_ternary_hipt(system='all', data_home=None, download_if_missing=True)
Convenience function for loading the glass_ternary_hipt dataset.
- Args:
- system (str, list): Return items only from the requested system(s).
Options are “CoFeZr”, “CoTiZr”, “CoVZr”, and “FeTiNb”; defaults to “all”.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
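Since system accepts either a single string or a list, it can help to normalize and check the value before calling the loader. The sketch below is a hypothetical pre-check (normalize_systems is not part of matminer, and matminer's own validation may differ); it uses only the option names listed above.

```python
# System names documented for the glass_ternary_hipt dataset.
HIPT_SYSTEMS = ("CoFeZr", "CoTiZr", "CoVZr", "FeTiNb")


def normalize_systems(system="all"):
    """Turn a `system` argument into a validated list of system names."""
    if system == "all":
        return list(HIPT_SYSTEMS)
    # Accept a single name or any iterable of names.
    requested = [system] if isinstance(system, str) else list(system)
    unknown = [name for name in requested if name not in HIPT_SYSTEMS]
    if unknown:
        raise ValueError(f"unknown system(s): {unknown}")
    return requested


# Usage (requires matminer):
# from matminer.datasets.convenience_loaders import load_glass_ternary_hipt
# df = load_glass_ternary_hipt(system=normalize_systems(["CoFeZr", "CoVZr"]))
```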
- matminer.datasets.convenience_loaders.load_glass_ternary_landolt(processing='all', unique_composition=True, data_home=None, download_if_missing=True)
Convenience function for loading the glass_ternary_landolt dataset.
- Args:
- processing (str): Return only items with the specified processing method.
Defaults to all; options are sputtering and meltspin.
- unique_composition (bool): Whether or not to combine compositions with
the same formula
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_heusler_magnetic(data_home=None, download_if_missing=True)
Convenience function for loading the heusler magnetic dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_jarvis_dft_2d(drop_nan_columns=None, data_home=None, download_if_missing=True)
Convenience function for loading the jarvis dft 2d dataset.
- Args:
drop_nan_columns (list, str): Column or columns from which rows containing NaN values are dropped
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
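The effect of drop_nan_columns can be pictured with plain Python rows. This is a sketch of the described behaviour, not matminer's implementation: it assumes a row is dropped when any of the named columns holds NaN, and the column names and values are made up for illustration.

```python
import math


def drop_nan_rows(rows, drop_nan_columns=None):
    """Drop rows (dicts) that hold NaN in any of the given columns."""
    if drop_nan_columns is None:
        return list(rows)
    # Accept a single column name or a list of names, as the loader does.
    if isinstance(drop_nan_columns, str):
        columns = [drop_nan_columns]
    else:
        columns = list(drop_nan_columns)

    def is_nan(value):
        return isinstance(value, float) and math.isnan(value)

    return [
        row for row in rows
        if not any(is_nan(row[column]) for column in columns)
    ]


# Hypothetical rows; real column names come from the dataset itself.
rows = [
    {"formula": "A", "gap": 1.2, "e_form": -0.5},
    {"formula": "B", "gap": float("nan"), "e_form": -0.1},
    {"formula": "C", "gap": 0.0, "e_form": float("nan")},
]
kept = drop_nan_rows(rows, drop_nan_columns="gap")
```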
- matminer.datasets.convenience_loaders.load_jarvis_dft_3d(drop_nan_columns=None, data_home=None, download_if_missing=True)
Convenience function for loading the jarvis dft 3d dataset.
- Args:
drop_nan_columns (list, str): Column or columns from which rows containing NaN values are dropped
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_jarvis_ml_dft_training(drop_nan_columns=None, data_home=None, download_if_missing=True)
Convenience function for loading the jarvis ml dft training dataset.
- Args:
drop_nan_columns (list, str): Column or columns from which rows containing NaN values are dropped
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_m2ax(data_home=None, download_if_missing=True)
Convenience function for loading the m2ax dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_mp(include_structures=False, data_home=None, download_if_missing=True)
Convenience function for loading the materials project dataset.
- Args:
- include_structures (bool): Whether or not to load the full mp
structure data. False by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_phonon_dielectric_mp(data_home=None, download_if_missing=True)
Convenience function for loading the phonon_dielectric_mp dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_piezoelectric_tensor(include_metadata=False, data_home=None, download_if_missing=True)
Convenience function for loading the piezoelectric_tensor dataset.
- Args:
- include_metadata (bool): Whether or not to include the cif, meta,
and poscar dataset columns. False by default.
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_steel_strength(data_home=None, download_if_missing=True)
Convenience function for loading the steel strength dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
- matminer.datasets.convenience_loaders.load_wolverton_oxides(data_home=None, download_if_missing=True)
Convenience function for loading the wolverton oxides dataset.
- Args:
data_home (str, None): Where to look for and store the loaded dataset
- download_if_missing (bool): Whether or not to download the dataset if
it isn’t on disk
Returns: (pd.DataFrame)
matminer.datasets.dataset_retrieval module
- matminer.datasets.dataset_retrieval.get_all_dataset_info(dataset_name)
- Helper function to get all info for a particular dataset, including:
Citation info
Bibtex-formatted references
Dataset columns and their descriptions
The dataset description
The number of entries in the dataset
- Args:
dataset_name (str): Name of the dataset to query info for
- Returns:
- output_str (str): All metadata associated with the dataset, in a
formatted string.
- matminer.datasets.dataset_retrieval.get_available_datasets(print_format='medium', sort_method='alphabetical')
Function for retrieving the datasets available within matminer.
- Args:
- print_format (None, str): None, “short”, “medium”, or “long”:
None: Don’t print anything
“short”: only the dataset names
“medium”: dataset names and their descriptions
“long”: All dataset info associated with the dataset
- sort_method (str): By what metric to sort the datasets when retrieving
their information.
alphabetical: sorts by dataset name
num_entries: sorts by number of dataset entries
Returns: (list)
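The two sort_method values can be illustrated on a small metadata mapping. This is a sketch, not matminer's code: the dataset names and entry counts are made up, and ordering num_entries from largest to smallest is an assumption that the description above does not state.

```python
def sort_dataset_names(metadata, sort_method="alphabetical"):
    """Order dataset names by one of the two documented sort methods.

    `metadata` maps a dataset name to a dict with a "num_entries" key.
    """
    if sort_method == "alphabetical":
        return sorted(metadata)
    if sort_method == "num_entries":
        # Assumption: largest datasets first.
        return sorted(
            metadata,
            key=lambda name: metadata[name]["num_entries"],
            reverse=True,
        )
    raise ValueError(f"unsupported sort_method: {sort_method!r}")


# Hypothetical metadata for illustration only.
example = {
    "dataset_b": {"num_entries": 10},
    "dataset_a": {"num_entries": 500},
    "dataset_c": {"num_entries": 42},
}
by_name = sort_dataset_names(example)
by_size = sort_dataset_names(example, sort_method="num_entries")
```

With matminer installed, get_available_datasets(print_format=None) returns the list without printing anything.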
- matminer.datasets.dataset_retrieval.get_dataset_attribute(dataset_name, attrib_key)
Helper function for getting generic attributes of the dataset
- Args:
dataset_name (str): Name of the dataset to query info from
attrib_key (str): Name of attribute to pull
Returns: Dataset attribute
- matminer.datasets.dataset_retrieval.get_dataset_citations(dataset_name)
Convenience function for getting dataset citations
- Args:
dataset_name (str): name of the dataset being queried
Returns: (list)
- matminer.datasets.dataset_retrieval.get_dataset_column_description(dataset_name, dataset_column)
Convenience function for getting dataset column description
- Args:
dataset_name (str): name of the dataset being queried
dataset_column (str): name of the column to get description from
Returns: (str)
- matminer.datasets.dataset_retrieval.get_dataset_columns(dataset_name)
Convenience function for getting dataset column list
- Args:
dataset_name (str): name of the dataset being queried
Returns: (list)
- matminer.datasets.dataset_retrieval.get_dataset_description(dataset_name)
Convenience function for getting dataset description
- Args:
dataset_name (str): name of the dataset being queried
Returns: (str)
- matminer.datasets.dataset_retrieval.get_dataset_num_entries(dataset_name)
Convenience function for getting dataset number of entries
- Args:
dataset_name (str): name of the dataset being queried
Returns: (int)
- matminer.datasets.dataset_retrieval.get_dataset_reference(dataset_name)
Convenience function for getting dataset reference
- Args:
dataset_name (str): name of the dataset being queried
Returns: (str)
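The single-field helpers above all take a dataset name, so they can be gathered into one dictionary. The sketch below uses a hypothetical collect_metadata helper that accepts the getter functions as arguments; the commented usage refers only to functions documented in this module.

```python
def collect_metadata(dataset_name, getters):
    """Apply each getter to `dataset_name` and collect results by field.

    `getters` maps a field label to a function of the dataset name.
    """
    return {field: getter(dataset_name) for field, getter in getters.items()}


# Usage (requires matminer):
# from matminer.datasets.dataset_retrieval import (
#     get_dataset_columns,
#     get_dataset_description,
#     get_dataset_num_entries,
# )
# info = collect_metadata(
#     "boltztrap_mp",
#     {
#         "columns": get_dataset_columns,
#         "description": get_dataset_description,
#         "num_entries": get_dataset_num_entries,
#     },
# )
```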
- matminer.datasets.dataset_retrieval.load_dataset(name, data_home=None, download_if_missing=True, pbar=False)
Loads a dataframe containing the dataset specified with the ‘name’ field.
The dataset file is stored in and loaded from data_home if specified; otherwise from the path in the MATMINER_DATA environment variable if set, or from matminer/datasets by default.
- Args:
- name (str): keyword specifying what dataset to load, run
matminer.datasets.get_available_datasets() for options
data_home (str): path to folder to look for dataset file
- download_if_missing (bool): whether to download the dataset if it is not
found on disk
pbar (bool): If true, show progress bar for loading dataset.
- Returns: (pd.DataFrame)
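The storage-location precedence described for load_dataset can be sketched as a small function. This is an illustration of the documented order only, not matminer's implementation: resolve_data_home is a hypothetical name, and the default_dir string stands in for the package's own datasets folder.

```python
import os


def resolve_data_home(data_home=None, default_dir="matminer/datasets", environ=None):
    """Pick the dataset folder using the documented precedence.

    1. an explicit `data_home` argument,
    2. the MATMINER_DATA environment variable,
    3. the default folder (placeholder string here).
    """
    env = os.environ if environ is None else environ
    if data_home is not None:
        return data_home
    return env.get("MATMINER_DATA", default_dir)


# Usage (requires matminer; "boltztrap_mp" is one of the documented datasets):
# from matminer.datasets.dataset_retrieval import load_dataset
# df = load_dataset("boltztrap_mp", data_home=resolve_data_home("/data/matminer"))
```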