Table of Datasets
The table below lists all 45 datasets available in matminer.
Name |
Description |
Entries |
---|---|---|
|
Effective mass and thermoelectric properties of 8924 compounds in The Materials Project database that are calculated by the BoltzTraP software package run on the GGA-PBE or GGA+U density functional theory calculation results |
8924 |
|
2574 materials used for training regressors that predict shear and bulk modulus. |
2574 |
|
18,928 perovskites generated with ABX combinatorics, calculating gllbsc band gap and pbe structure, and also reporting absolute band edge positions and heat of formation. |
18928 |
|
Thermal conductivity of 872 compounds measured experimentally and retrieved from Citrine database from various references |
872 |
|
1,056 structures with dielectric properties, calculated with DFPT-PBE. |
1056 |
|
Band gap of 1306 double perovskites (a_1-b_1-a_2-b_2-O6) calculated using Gritsenko, van Leeuwen, van Lenthe and Baerends potential (gllbsc) in GPAW. |
1306 |
|
Supplementary lumo data of 55 atoms for the double_perovskites_gap dataset. |
55 |
|
1,181 structures with elastic properties calculated with DFT-PBE. |
1181 |
|
Experimental formation enthalpies for inorganic compounds, collected from years of calorimetric experiments |
1276 |
|
Dataset containing experimental standard formation enthalpies for solids |
2135 |
|
Experimental band gap of 6354 inorganic semiconductors. |
6354 |
|
Identical to the matbench_expt_gap dataset, except that Materials Project database IDs (mp-ids) have been associated with each material using the same method as described for the expt_formation_enthalpy_kingsbury dataset |
4604 |
|
3938 structures and computed formation energies from “Crystal Structure Representations for Machine Learning Models of Formation Energies.” |
3938 |
|
Metallic glass formation data for binary alloys, collected from various experimental techniques such as melt-spinning or mechanical alloying |
5959 |
|
Identical to glass_binary dataset, but with duplicate entries merged |
5483 |
|
Metallic glass formation dataset for ternary alloys, collected from the high-throughput sputtering experiments measuring whether it is possible to form a glass using sputtering |
5170 |
|
Metallic glass formation dataset for ternary alloys, collected from “Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys,” a volume of the Landolt–Börnstein collection |
7191 |
|
1153 Heusler alloys with DFT-calculated magnetic and electronic properties |
1153 |
|
Various properties of 636 2D materials computed with the OptB88vdW and TBmBJ functionals taken from the JARVIS DFT database. |
636 |
|
Various properties of 25,923 bulk materials computed with the OptB88vdW and TBmBJ functionals taken from the JARVIS DFT database. |
25923 |
|
Various properties of 24,759 bulk and 2D materials computed with the OptB88vdW and TBmBJ functionals taken from the JARVIS DFT database. |
24759 |
|
Elastic properties of 223 stable M2AX compounds from “A comprehensive survey of M2AX phase elastic properties” by Cover et al |
223 |
|
Matbench v0.1 test dataset for predicting refractive index from structure |
4764 |
|
Matbench v0.1 test dataset for predicting experimental band gap from composition alone |
4604 |
|
Matbench v0.1 test dataset for classifying metallicity from composition alone |
4921 |
|
Matbench v0.1 test dataset for predicting full bulk metallic glass formation ability from chemical formula |
5680 |
|
Matbench v0.1 test dataset for predicting exfoliation energies from crystal structure (computed with the OptB88vdW and TBmBJ functionals) |
636 |
|
Matbench v0.1 test dataset for predicting DFT log10 VRH-average shear modulus from structure |
10987 |
|
Matbench v0.1 test dataset for predicting DFT log10 VRH-average bulk modulus from structure |
10987 |
|
Matbench v0.1 test dataset for predicting DFT formation energy from structure |
132752 |
|
Matbench v0.1 test dataset for predicting DFT PBE band gap from structure |
106113 |
|
Matbench v0.1 test dataset for predicting DFT metallicity from structure |
106113 |
|
Matbench v0.1 test dataset for predicting formation energy from crystal structure |
18928 |
|
Matbench v0.1 test dataset for predicting vibration properties from crystal structure |
1265 |
|
Matbench v0.1 test dataset for predicting steel yield strengths from chemical composition alone |
312 |
|
A complete copy of the Materials Project database as of 10/18/2018 |
83989 |
|
A complete copy of the Materials Project database as of 10/18/2018 |
83989 |
|
Phonon (lattice/atoms vibrations) and dielectric properties of 1296 compounds computed via ABINIT software package in the harmonic approximation based on density functional perturbation theory. |
1296 |
|
941 structures with piezoelectric properties, calculated with DFT-PBE. |
941 |
|
Ab-initio electronic transport database for inorganic materials |
47737 |
|
312 steels with experimental yield strength and ultimate tensile strength, extracted and cleaned (including de-duplicating) from Citrine. |
312 |
|
Dataset of ~16,000 experimental superconductivity records (critical temperatures) from Stanev et al., originally from the Japanese National Institute for Materials Science |
16414 |
|
A challenging data set for quantum machine learning containing a diverse set of 12.8k polymorphs in the Zn-Ti-N, Zn-Zr-N and Zn-Hf-N chemical systems |
12815 |
|
Database of ~1,100 experimental thermoelectric materials from UCSB aggregated from 108 source publications and personal communications |
1093 |
|
4,914 perovskite oxides containing composition data, lattice constants, and formation + vacancy formation energies |
4914 |
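The datasets in this table are usually loaded by name. The following is a minimal sketch, assuming matminer is installed and that dataset names match the section headings under "Dataset info"; it uses matminer's `load_dataset` function, while `EXPECTED_ENTRIES`, `entry_count_matches`, and `load_and_check` are illustrative names that are not part of matminer.

```python
# Entry counts copied from the table above for the first five datasets;
# the keys are the section headings used under "Dataset info".
EXPECTED_ENTRIES = {
    "boltztrap_mp": 8924,
    "brgoch_superhard_training": 2574,
    "castelli_perovskites": 18928,
    "citrine_thermal_conductivity": 872,
    "dielectric_constant": 1056,
}


def entry_count_matches(name, n_rows):
    """Return True if n_rows equals the documented entry count for name."""
    return EXPECTED_ENTRIES.get(name) == n_rows


def load_and_check(name):
    """Load a dataset by name and compare its length with the table.

    Assumes matminer is installed; the first call may download the data.
    """
    from matminer.datasets import load_dataset  # imported lazily on purpose

    df = load_dataset(name)
    return df, entry_count_matches(name, len(df))
```

Calling `load_and_check("boltztrap_mp")` should return the DataFrame together with a flag that is `True` when the loaded data has the 8924 rows listed above.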
Dataset info
boltztrap_mp
Effective mass and thermoelectric properties of 8924 compounds in The Materials Project database that are calculated by the BoltzTraP software package run on the GGA-PBE or GGA+U density functional theory calculation results. The properties are reported at the temperature of 300 Kelvin and the carrier concentration of 1e18 1/cm3.
Number of entries: 8924
Column |
Description |
---|---|
|
Chemical formula of the entry |
|
n-type/conduction band effective mass. Units: m_e where m_e is the mass of an electron; i.e. m_n is a unitless ratio |
|
p-type/valence band effective mass. |
|
Materials Project identifier |
|
n-type thermoelectric power factor in uW/cm2.K, where uW is microwatts; a constant relaxation time of 1e-14 is assumed. |
|
p-type power factor in uW/cm2.K |
|
n-type Seebeck coefficient in micro Volts per Kelvin |
|
p-type Seebeck coefficient in micro Volts per Kelvin |
|
pymatgen Structure object describing the crystal structure of the material |
Reference
Ricci, F. et al. An ab initio electronic transport database for inorganic materials. Sci. Data 4:170085 doi: 10.1038/sdata.2017.85 (2017). Ricci F, Chen W, Aydemir U, Snyder J, Rignanese G, Jain A, Hautier G (2017) Data from: An ab initio electronic transport database for inorganic materials. Dryad Digital Repository. https://doi.org/10.5061/dryad.gn001
Bibtex Formatted Citations
@Article{Ricci2017, author={Ricci, Francesco and Chen, Wei and Aydemir, Umut and Snyder, G. Jeffrey and Rignanese, Gian-Marco and Jain, Anubhav and Hautier, Geoffroy}, title={An ab initio electronic transport database for inorganic materials}, journal={Scientific Data}, year={2017}, month={Jul}, day={04}, publisher={The Author(s)}, volume={4}, pages={170085}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2017.85} }
@misc{dryad_gn001, title = {Data from: An ab initio electronic transport database for inorganic materials}, author = {Ricci, F and Chen, W and Aydemir, U and Snyder, J and Rignanese, G and Jain, A and Hautier, G}, year = {2017}, journal = {Scientific Data}, URL = {https://doi.org/10.5061/dryad.gn001}, doi = {doi:10.5061/dryad.gn001}, publisher = {Dryad Digital Repository} }
brgoch_superhard_training
2574 materials used for training regressors that predict shear and bulk modulus.
Number of entries: 2574
Column |
Description |
---|---|
|
features used in brgoch study compressed to a dictionary |
|
VRH bulk modulus |
|
pymatgen composition object |
|
Chemical formula as a string |
|
materials project id |
|
pymatgen structure object |
|
VRH shear modulus |
|
True if the bulk or shear value did not closely match the Materials Project value (within 5%/1 GPa of MP) at the time of cross-referencing, or if no matching material could be found |
Reference
Machine Learning Directed Search for Ultraincompressible, Superhard Materials Aria Mansouri Tehrani, Anton O. Oliynyk, Marcus Parry, Zeshan Rizvi, Samantha Couper, Feng Lin, Lowell Miyagi, Taylor D. Sparks, and Jakoah Brgoch Journal of the American Chemical Society 2018 140 (31), 9844-9853 DOI: 10.1021/jacs.8b02717
Bibtex Formatted Citations
@article{doi:10.1021/jacs.8b02717, author = {Mansouri Tehrani, Aria and Oliynyk, Anton O. and Parry, Marcus and Rizvi, Zeshan and Couper, Samantha and Lin, Feng and Miyagi, Lowell and Sparks, Taylor D. and Brgoch, Jakoah}, title = {Machine Learning Directed Search for Ultraincompressible, Superhard Materials}, journal = {Journal of the American Chemical Society}, volume = {140}, number = {31}, pages = {9844-9853}, year = {2018}, doi = {10.1021/jacs.8b02717}, note ={PMID: 30010335}, URL = { https://doi.org/10.1021/jacs.8b02717 }, eprint = { https://doi.org/10.1021/jacs.8b02717 } }
castelli_perovskites
18,928 perovskites generated with ABX combinatorics, calculating gllbsc band gap and pbe structure, and also reporting absolute band edge positions and heat of formation.
Number of entries: 18928
Column |
Description |
---|---|
|
similar to vbm but for conduction band |
|
heat of formation in eV. Note that the reference state for oxygen was computed from oxygen’s chemical potential in water vapor, not from oxygen molecules, to reflect the application for which these perovskites were studied. |
|
the thermodynamic work required to add one electron to the body in eV |
|
fermi bandwidth |
|
Chemical formula of the material |
|
electronic band gap in eV calculated via gllbsc functional |
|
boolean indicator for direct gap |
|
magnetic moment in terms of Bohr magneton |
|
crystal structure represented by pymatgen Structure object |
|
absolute value of valence band edge calculated via gllbsc |
Reference
Ivano E. Castelli, David D. Landis, Kristian S. Thygesen, Søren Dahl, Ib Chorkendorff, Thomas F. Jaramillo and Karsten W. Jacobsen (2012) New cubic perovskites for one- and two-photon water splitting using the computational materials repository. Energy Environ. Sci., 2012,5, 9034-9043 https://doi.org/10.1039/C2EE22341D
Bibtex Formatted Citations
@Article{C2EE22341D, author ="Castelli, Ivano E. and Landis, David D. and Thygesen, Kristian S. and Dahl, Søren and Chorkendorff, Ib and Jaramillo, Thomas F. and Jacobsen, Karsten W.", title ="New cubic perovskites for one- and two-photon water splitting using the computational materials repository", journal ="Energy Environ. Sci.", year ="2012", volume ="5", issue ="10", pages ="9034-9043", publisher ="The Royal Society of Chemistry", doi ="10.1039/C2EE22341D", url ="http://dx.doi.org/10.1039/C2EE22341D", abstract ="A new efficient photoelectrochemical cell (PEC) is one of the possible solutions to the energy and climate problems of our time. Such a device requires development of new semiconducting materials with tailored properties with respect to stability and light absorption. Here we perform computational screening of around 19 000 oxides{,} oxynitrides{,} oxysulfides{,} oxyfluorides{,} and oxyfluoronitrides in the cubic perovskite structure with PEC applications in mind. We address three main applications: light absorbers for one- and two-photon water splitting and high-stability transparent shields to protect against corrosion. We end up with 20{,} 12{,} and 15 different combinations of oxides{,} oxynitrides and oxyfluorides{,} respectively{,} inviting further experimental investigation."}
citrine_thermal_conductivity
Thermal conductivity of 872 compounds measured experimentally and retrieved from Citrine database from various references. The reported values are measured at various temperatures of which 295 are at room temperature.
Number of entries: 872
Column |
Description |
---|---|
|
Chemical formula of the dataset entry |
|
units of thermal conductivity |
|
Temperature description of testing conditions |
|
units of testing condition temperature representation |
|
the experimentally measured thermal conductivity in SI units of W/m.K |
Reference
Bibtex Formatted Citations
@misc{Citrine Informatics, title = {Citrination}, howpublished = {\url{https://www.citrination.com/}}, }
dielectric_constant
1,056 structures with dielectric properties, calculated with DFPT-PBE.
Number of entries: 1056
Column |
Description |
---|---|
|
Measure of the conductivity of a material |
|
optional: Description string for structure |
|
electronic contribution to dielectric tensor |
|
Total dielectric tensor incorporating both electronic and ionic contributions |
|
Chemical formula of the material |
|
Materials Project ID of the material |
|
optional, metadata descriptor of the datapoint |
|
Refractive Index |
|
The # of atoms in the unit cell of the calculation. |
|
the average of the eigenvalues of the electronic contribution to the dielectric tensor |
|
the average of the eigenvalues of the total (electronic and ionic) contributions to the dielectric tensor |
|
optional: Poscar metadata |
|
Whether the material is potentially ferroelectric |
|
Integer specifying the crystallographic structure of the material |
|
pandas Series defining the structure of the material |
|
Volume of the unit cell in cubic angstroms. For supercell calculations, this quantity refers to the volume of the full supercell. |
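The two "average of the eigenvalues" columns above can be computed without an eigenvalue solver, because the sum of a matrix's eigenvalues equals its trace. The sketch below shows this for a 3x3 tensor stored as nested lists. The function names are illustrative, and the relation between refractive index and the electronic average (n = sqrt(average)) is a commonly used approximation assumed here, not something stated on this page.

```python
import math


def poly_average(tensor):
    """Average of the eigenvalues of a 3x3 tensor, computed as trace / 3."""
    return (tensor[0][0] + tensor[1][1] + tensor[2][2]) / 3.0


def refractive_index(eps_electronic):
    """Estimate n from the electronic tensor.

    Assumption: n = sqrt(average eigenvalue of the electronic contribution).
    """
    return math.sqrt(poly_average(eps_electronic))


# Illustrative tensor, not a value taken from the dataset.
example = [[4.0, 0.0, 0.0],
           [0.0, 4.0, 0.0],
           [0.0, 0.0, 4.0]]
average = poly_average(example)
n_estimate = refractive_index(example)
```

Off-diagonal components do not change the result, since only the trace enters the average.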
Reference
Petousis, I., Mrdjenovich, D., Ballouz, E., Liu, M., Winston, D., Chen, W., Graf, T., Schladt, T. D., Persson, K. A. & Prinz, F. B. High-throughput screening of inorganic compounds for the discovery of novel dielectric and optical materials. Sci. Data 4, 160134 (2017).
Bibtex Formatted Citations
@Article{Petousis2017, author={Petousis, Ioannis and Mrdjenovich, David and Ballouz, Eric and Liu, Miao and Winston, Donald and Chen, Wei and Graf, Tanja and Schladt, Thomas D. and Persson, Kristin A. and Prinz, Fritz B.}, title={High-throughput screening of inorganic compounds for the discovery of novel dielectric and optical materials}, journal={Scientific Data}, year={2017}, month={Jan}, day={31}, publisher={The Author(s)}, volume={4}, pages={160134}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2016.134} }
double_perovskites_gap
Band gap of 1306 double perovskites (a_1-b_1-a_2-b_2-O6) calculated using Gritsenko, van Leeuwen, van Lenthe and Baerends potential (gllbsc) in GPAW.
Number of entries: 1306
Column |
Description |
---|---|
|
Species occupying the a1 perovskite site |
|
Species occupying the a2 site |
|
Species occupying the b1 site |
|
Species occupying the b2 site |
|
Chemical formula of the entry |
|
electronic band gap (in eV) calculated via gllbsc |
Reference
Dataset discussed in: Pilania, G. et al. Machine learning bandgaps of double perovskites. Sci. Rep. 6, 19375; doi: 10.1038/srep19375 (2016). Dataset sourced from: https://cmr.fysik.dtu.dk/
Bibtex Formatted Citations
@Article{Pilania2016, author={Pilania, G. and Mannodi-Kanakkithodi, A. and Uberuaga, B. P. and Ramprasad, R. and Gubernatis, J. E. and Lookman, T.}, title={Machine learning bandgaps of double perovskites}, journal={Scientific Reports}, year={2016}, month={Jan}, day={19}, publisher={The Author(s)}, volume={6}, pages={19375}, note={Article}, url={http://dx.doi.org/10.1038/srep19375} }
@misc{Computational Materials Repository, title = {Computational Materials Repository}, howpublished = {\url{https://cmr.fysik.dtu.dk/}}, }
double_perovskites_gap_lumo
Supplementary lumo data of 55 atoms for the double_perovskites_gap dataset.
Number of entries: 55
Column |
Description |
---|---|
|
Name of the atom whose lumo is listed |
|
Lowest unoccupied molecular orbital energy level (in eV) |
Reference
Dataset discussed in: Pilania, G. et al. Machine learning bandgaps of double perovskites. Sci. Rep. 6, 19375; doi: 10.1038/srep19375 (2016). Dataset sourced from: https://cmr.fysik.dtu.dk/
Bibtex Formatted Citations
@Article{Pilania2016, author={Pilania, G. and Mannodi-Kanakkithodi, A. and Uberuaga, B. P. and Ramprasad, R. and Gubernatis, J. E. and Lookman, T.}, title={Machine learning bandgaps of double perovskites}, journal={Scientific Reports}, year={2016}, month={Jan}, day={19}, publisher={The Author(s)}, volume={6}, pages={19375}, note={Article}, url={http://dx.doi.org/10.1038/srep19375} }
@misc{Computational Materials Repository, title = {Computational Materials Repository}, howpublished = {\url{https://cmr.fysik.dtu.dk/}}, }
elastic_tensor_2015
1,181 structures with elastic properties calculated with DFT-PBE.
Number of entries: 1181
Column |
Description |
---|---|
|
Lower bound on shear modulus for polycrystalline material |
|
Average of G_Reuss and G_Voigt |
|
Upper bound on shear modulus for polycrystalline material |
|
Lower bound on bulk modulus for polycrystalline material |
|
Average of K_Reuss and K_Voigt |
|
Upper bound on bulk modulus for polycrystalline material |
|
optional: Description string for structure |
|
Tensor describing elastic behavior |
|
measure of directional dependence of the material's elasticity; the metric is always >= 0 |
|
Tensor describing elastic behavior corresponding to IEEE orientation, symmetrized to crystal structure |
|
Tensor describing elastic behavior, unsymmetrized, corresponding to POSCAR conventional standard cell orientation |
|
Chemical formula of the material |
|
optional: Sampling parameter from calculation |
|
Materials Project ID of the material |
|
The # of atoms in the unit cell of the calculation. |
|
Describes lateral response to loading |
|
optional: Poscar metadata |
|
Integer specifying the crystallographic structure of the material |
|
pandas Series defining the structure of the material |
|
Volume of the unit cell in cubic angstroms. For supercell calculations, this quantity refers to the volume of the full supercell. |
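The shear and bulk modulus averages in this table are the arithmetic mean of the Reuss (lower) and Voigt (upper) bounds. A minimal sketch of that calculation follows; the function name `vrh_average` and the bound check are illustrative choices, and the numbers are not dataset values.

```python
def vrh_average(reuss, voigt):
    """Return the average of the Reuss (lower) and Voigt (upper) bounds.

    Raises ValueError if the bounds are given in the wrong order.
    """
    if reuss > voigt:
        raise ValueError("Reuss bound should not exceed the Voigt bound")
    return 0.5 * (reuss + voigt)


# Illustrative moduli in GPa.
g_average = vrh_average(60.0, 80.0)
k_average = vrh_average(100.0, 110.0)
```

The same function applies to both the shear (G) and bulk (K) columns, since the averaging rule described above is identical for the two.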
Reference
Jong, M. De, Chen, W., Angsten, T., Jain, A., Notestine, R., Gamst, A., Sluiter, M., Ande, C. K., Zwaag, S. Van Der, Plata, J. J., Toher, C., Curtarolo, S., Ceder, G., Persson, K. and Asta, M., “Charting the complete elastic properties of inorganic crystalline compounds”, Scientific Data volume 2, Article number: 150009 (2015)
Bibtex Formatted Citations
@Article{deJong2015, author={de Jong, Maarten and Chen, Wei and Angsten, Thomas and Jain, Anubhav and Notestine, Randy and Gamst, Anthony and Sluiter, Marcel and Krishna Ande, Chaitanya and van der Zwaag, Sybrand and Plata, Jose J. and Toher, Cormac and Curtarolo, Stefano and Ceder, Gerbrand and Persson, Kristin A. and Asta, Mark}, title={Charting the complete elastic properties of inorganic crystalline compounds}, journal={Scientific Data}, year={2015}, month={Mar}, day={17}, publisher={The Author(s)}, volume={2}, pages={150009}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2015.9} }
expt_formation_enthalpy
Experimental formation enthalpies for inorganic compounds, collected from years of calorimetric experiments. There are 1,276 entries in this dataset, mostly binary compounds. Matching mpids or oqmdids as well as the DFT-computed formation energies are also added (if any).
Number of entries: 1276
Column |
Description |
---|---|
|
experimental formation enthalpy (in eV/atom) |
|
formation enthalpy from Materials Project (in eV/atom) |
|
formation enthalpy from OQMD (in eV/atom) |
|
chemical formula |
|
materials project id |
|
OQMD id |
|
pearson symbol of the structure |
|
space group of the structure |
Reference
https://www.nature.com/articles/sdata2017162
Bibtex Formatted Citations
@Article{Kim2017, author={Kim, George and Meschel, S. V. and Nash, Philip and Chen, Wei}, title={Experimental formation enthalpies for intermetallic phases and other inorganic compounds}, journal={Scientific Data}, year={2017}, month={Oct}, day={24}, publisher={The Author(s)}, volume={4}, pages={170162}, note={Data Descriptor}, url={https://doi.org/10.1038/sdata.2017.162}}
@misc{kim_meschel_nash_chen_2017, title={Experimental formation enthalpies for intermetallic phases and other inorganic compounds}, url={https://figshare.com/collections/Experimental_formation_enthalpies_for_intermetallic_phases_and_other_inorganic_compounds/3822835/1}, DOI={10.6084/m9.figshare.c.3822835.v1}, abstractNote={The standard enthalpy of formation of a compound is the energy associated with the reaction to form the compound from its component elements. The standard enthalpy of formation is a fundamental thermodynamic property that determines its phase stability, which can be coupled with other thermodynamic data to calculate phase diagrams. Calorimetry provides the only direct method by which the standard enthalpy of formation is experimentally measured. However, the measurement is often a time and energy intensive process. We present a dataset of enthalpies of formation measured by high-temperature calorimetry. The phases measured in this dataset include intermetallic compounds with transition metal and rare-earth elements, metal borides, metal carbides, and metallic silicides. These measurements were collected from over 50 years of calorimetric experiments. The dataset contains 1,276 entries on experimental enthalpy of formation values and structural information. Most of the entries are for binary compounds but ternary and quaternary compounds are being added as they become available. The dataset also contains predictions of enthalpy of formation from first-principles calculations for comparison.}, publisher={figshare}, author={Kim, George and Meschel, Susan and Nash, Philip and Chen, Wei}, year={2017}, month={Oct}}
expt_formation_enthalpy_kingsbury
Dataset containing experimental standard formation enthalpies for solids. Formation enthalpies were compiled primarily from Kim et al., Kubaschewski, and the NIST JANAF tables (see references). Elements, liquids, and gases were excluded. Data were deduplicated such that each material is associated with a single formation enthalpy value. Refer to Wang et al. (see references) for a complete description of the methods used. Materials Project database IDs (mp-ids) were assigned to materials from among computed materials in the Materials Project database (version 2021.03.22) that 1) were not marked ‘theoretical’, 2) had structures matching at least one ICSD material, and 3) were within 200 meV of the DFT-computed stable energy hull (e_above_hull < 0.2 eV). Among these candidates, we chose the mp-id with the lowest e_above_hull that matched the reported spacegroup (where available).
Number of entries: 2135
Column |
Description |
---|---|
|
Chemical formula. |
|
Experimental standard formation enthalpy (298 K), in eV/atom. |
|
Uncertainty reported in the experimental formation energy, in eV/atom. |
|
Description of the material’s crystal structure or space group. |
|
Reference to the original data source. |
|
Materials Project database ID (mp-id) most likely associated with each material. |
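The mp-id assignment described above can be sketched as a filter followed by a minimum search. This is an illustration under stated assumptions, not the authors' code: the candidate dictionary keys, the function name `assign_mp_id`, and the placeholder ids are hypothetical, and returning `None` when a spacegroup is reported but no candidate matches it is an assumption the description does not settle.

```python
def assign_mp_id(candidates, reported_spacegroup=None, hull_cutoff=0.2):
    """Pick the most likely mp-id from a list of candidate dicts.

    Each candidate is assumed to have the keys "mp_id", "theoretical",
    "matches_icsd", "e_above_hull" (eV), and "spacegroup".
    """
    # Criteria 1-3: not theoretical, matches an ICSD structure, near the hull.
    eligible = [
        c for c in candidates
        if not c["theoretical"]
        and c["matches_icsd"]
        and c["e_above_hull"] < hull_cutoff
    ]
    # Restrict to the reported spacegroup only when one is available.
    if reported_spacegroup is not None:
        eligible = [c for c in eligible if c["spacegroup"] == reported_spacegroup]
    if not eligible:
        return None  # assumption: no assignment when nothing qualifies
    return min(eligible, key=lambda c: c["e_above_hull"])["mp_id"]


# Placeholder candidates; the ids are not real Materials Project ids.
candidates = [
    {"mp_id": "mp-A", "theoretical": False, "matches_icsd": True,
     "e_above_hull": 0.05, "spacegroup": "Fm-3m"},
    {"mp_id": "mp-B", "theoretical": False, "matches_icsd": True,
     "e_above_hull": 0.00, "spacegroup": "P6_3/mmc"},
    {"mp_id": "mp-C", "theoretical": True, "matches_icsd": True,
     "e_above_hull": 0.00, "spacegroup": "Fm-3m"},
]
best_any = assign_mp_id(candidates)
best_cubic = assign_mp_id(candidates, reported_spacegroup="Fm-3m")
```

Here the theoretical candidate is excluded even though it lies on the hull, so the spacegroup-restricted choice falls to the remaining cubic entry.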
Reference
Wang, A., Kingsbury, R., McDermott, M., Horton, M., Jain, A., Ong, S.P., Dwaraknath, S., Persson, K. A framework for quantifying uncertainty in DFT energy corrections. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.14593476.v1
Bibtex Formatted Citations
@article{Kim2017,doi={10.1038/sdata.2017.162},url={https://doi.org/10.1038/sdata.2017.162},year={2017},month=oct,publisher={Springer Science and Business Media {LLC}}, volume = {4}, number = {1}, author = {George Kim and S. V. Meschel and Philip Nash and Wei Chen},title ={Experimental formation enthalpies for intermetallic phases and other inorganic compounds},journal={Scientific Data}}
@misc{kim_meschel_nash_chen_2017, title={Experimental formation enthalpies for intermetallic phases and other inorganic compounds}, url={https://springernature.figshare.com/collections/Experimental_formation_enthalpies_for_intermetallic_phases_and_other_inorganic_compounds/3822835/1}, DOI={10.6084/m9.figshare.c.3822835.v1}, publisher={figshare},author={Kim, George and Meschel, Susan and Nash, Philip and Chen, Wei}, year={2017}, month={Oct} }
@book{Kubaschewski1993,author={Kubaschewski, O. and Alcock, C.B. and Spencer, P.J.},edition={6th},isbn={0080418880},publisher={Pergamon Press},title={{Materials Thermochemistry}},year = {1993}}
@misc{NIST,doi = {10.18434/T42S31},url = {http://kinetics.nist.gov/janaf/},author = {Malcolm W. Chase}, title = {NIST-JANAF Thermochemical Tables}, publisher = {National Institute of Standards and Technology}, year = {1998}, url={https://janaf.nist.org}}
@article{RZYMAN2000309,title = {Enthalpies of formation of AlFe: Experiment versus theory},journal = {Calphad},volume = {24},number = {3},pages = {309-318},year = {2000}, issn = {0364-5916},doi = {https://doi.org/10.1016/S0364-5916(01)00007-4}, url = {https://www.sciencedirect.com/science/article/pii/S0364591601000074}, author = {K. Rzyman and Z. Moser and A.P. Miodownik and L. Kaufman and R.E. Watson and M. Weinert}}
@book{CRC2007,asin = {0849304881},author = {{CRC Handbook}},dewey = {530},ean = {9780849304880},edition = 88,interhash = {da6394e1a9c5f450ed705c32ec82bb08},intrahash = {5ff8f541915536461697300e8727f265},isbn = {0849304881},keywords = {crc_handbook},publisher = {CRC Press},title = {CRC Handbook of Chemistry and Physics, 88th Edition}, year = 2007}
@article{Grindy2013,author = {Grindy, Scott and Meredig, Bryce and Kirklin, Scott and Saal, James E. and Wolverton, C.},doi = {10.1103/PhysRevB.87.075150},issn = {10980121},journal = {Physical Review B - Condensed Matter and Materials Physics},number = {7},pages = {1--8},title = {{Approaching chemical accuracy with density functional calculations: Diatomic energy corrections}},volume = {87},year = {2013}}
expt_gap
Experimental band gap of 6354 inorganic semiconductors.
Number of entries: 6354
Column |
Description |
---|---|
|
chemical formula |
|
band gap (in eV) measured experimentally |
Reference
https://pubs.acs.org/doi/suppl/10.1021/acs.jpclett.8b00124
Bibtex Formatted Citations
@article{doi:10.1021/acs.jpclett.8b00124, author = {Zhuo, Ya and Mansouri Tehrani, Aria and Brgoch, Jakoah}, title = {Predicting the Band Gaps of Inorganic Solids by Machine Learning}, journal = {The Journal of Physical Chemistry Letters}, volume = {9}, number = {7}, pages = {1668-1673}, year = {2018}, doi = {10.1021/acs.jpclett.8b00124}, note ={PMID: 29532658}, eprint = { https://doi.org/10.1021/acs.jpclett.8b00124 }}
expt_gap_kingsbury
Identical to the matbench_expt_gap dataset, except that Materials Project database IDs (mp-ids) have been associated with each material using the same method as described for the expt_formation_enthalpy_kingsbury dataset. Columns have also been renamed for consistency with the formation enthalpy data.
Number of entries: 4604
Column |
Description |
---|---|
|
Chemical formula. |
|
Experimentally measured bandgap, in eV. |
|
Materials Project database ID (mp-id) most likely associated with each material. |
Reference
Kingsbury, R., Bartel, C., Dwaraknath, S., Gupta, A., Horton, M., Munro, J., Jain, A., Ong, S.P., Persson, K. Comparison of r$^2$SCAN and SCAN metaGGA functionals via an automated, high-throughput computational workflow. In preparation.
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@article{doi:10.1021/acs.jpclett.8b00124, author = {Zhuo, Ya and Mansouri Tehrani, Aria and Brgoch, Jakoah}, title = {Predicting the Band Gaps of Inorganic Solids by Machine Learning}, journal = {The Journal of Physical Chemistry Letters}, volume = {9}, number = {7}, pages = {1668-1673}, year = {2018}, doi = {10.1021/acs.jpclett.8b00124}, note ={PMID: 29532658}, eprint = { https://doi.org/10.1021/acs.jpclett.8b00124 }}
flla
3938 structures and computed formation energies from “Crystal Structure Representations for Machine Learning Models of Formation Energies.”
Number of entries: 3938
Column |
Description |
---|---|
|
The energy of decomposition of this material into the set of most stable materials at this chemical composition, in eV/atom. |
|
Computed formation energy at 0 K, 0 atm using a reference state of zero for the pure elements. |
|
See formation_energy |
|
Chemical formula of the material |
|
Materials Project ID of the material |
|
The # of atoms in the unit cell of the calculation. |
|
pandas Series defining the structure of the material |
Reference
1) F. Faber, A. Lindmaa, O.A. von Lilienfeld, R. Armiento, “Crystal structure representations for machine learning models of formation energies”, Int. J. Quantum Chem. 115 (2015) 1094–1101. doi:10.1002/qua.24917.
(raw data) 2) Jain, A., Ong, S. P., Hautier, G., Chen, W., Richards, W. D., Dacek, S., Cholia, S., Gunter, D., Skinner, D., Ceder, G. & Persson, K. A. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 11002 (2013).
Bibtex Formatted Citations
@article{doi:10.1002/qua.24917, author = {Faber, Felix and Lindmaa, Alexander and von Lilienfeld, O. Anatole and Armiento, Rickard}, title = {Crystal structure representations for machine learning models of formation energies}, journal = {International Journal of Quantum Chemistry}, volume = {115}, number = {16}, pages = {1094-1101}, keywords = {machine learning, formation energies, representations, crystal structure, periodic systems}, doi = {10.1002/qua.24917}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/qua.24917}, eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/qua.24917}, abstract = {We introduce and evaluate a set of feature vector representations of crystal structures for machine learning (ML) models of formation energies of solids. ML models of atomization energies of organic molecules have been successful using a Coulomb matrix representation of the molecule. We consider three ways to generalize such representations to periodic systems: (i) a matrix where each element is related to the Ewald sum of the electrostatic interaction between two different atoms in the unit cell repeated over the lattice; (ii) an extended Coulomb-like matrix that takes into account a number of neighboring unit cells; and (iii) an ansatz that mimics the periodicity and the basic features of the elements in the Ewald sum matrix using a sine function of the crystal coordinates of the atoms. The representations are compared for a Laplacian kernel with Manhattan norm, trained to reproduce formation energies using a dataset of 3938 crystal structures obtained from the Materials Project. For training sets consisting of 3000 crystals, the generalization error in predicting formation energies of new structures corresponds to (i) 0.49, (ii) 0.64, and (iii) for the respective representations. © 2015 Wiley Periodicals, Inc.} }
@article{doi:10.1063/1.4812323, author = {Jain,Anubhav and Ong,Shyue Ping and Hautier,Geoffroy and Chen,Wei and Richards,William Davidson and Dacek,Stephen and Cholia,Shreyas and Gunter,Dan and Skinner,David and Ceder,Gerbrand and Persson,Kristin A. }, title = {Commentary: The Materials Project: A materials genome approach to accelerating materials innovation}, journal = {APL Materials}, volume = {1}, number = {1}, pages = {011002}, year = {2013}, doi = {10.1063/1.4812323}, URL = {https://doi.org/10.1063/1.4812323}, eprint = {https://doi.org/10.1063/1.4812323} }
glass_binary¶
Metallic glass formation data for binary alloys, collected from various experimental techniques such as melt-spinning or mechanical alloying. This dataset covers all compositions at intervals of 5 at. % in 59 binary systems, for a total of 5959 alloys. The target property is the glass forming ability (GFA), i.e. whether the composition can form monolithic glass: 1 for glass forming, 0 for not fully glass forming.
Number of entries: 5959
Column |
Description |
---|---|
|
chemical formula |
|
Glass forming ability, correlated with the phase column: whether the composition can form monolithic glass. 1: glass forming (“AM”), 0: not fully glass forming (“CR”) |
Reference
https://pubs.acs.org/doi/10.1021/acs.jpclett.7b01046
Bibtex Formatted Citations
@article{doi:10.1021/acs.jpclett.7b01046, author = {Sun, Y. T. and Bai, H. Y. and Li, M. Z. and Wang, W. H.}, title = {Machine Learning Approach for Prediction and Understanding of Glass-Forming Ability}, journal = {The Journal of Physical Chemistry Letters}, volume = {8}, number = {14}, pages = {3434-3439}, year = {2017}, doi = {10.1021/acs.jpclett.7b01046}, note ={PMID: 28697303}, eprint = { https://doi.org/10.1021/acs.jpclett.7b01046 }}
glass_binary_v2¶
Identical to the glass_binary dataset, but with duplicate entries merged. If duplicate entries disagreed in gfa, the merged class defaulted to 1.
Number of entries: 5483
Column |
Description |
---|---|
|
chemical formula |
|
Glass forming ability, correlated with the phase column: whether the composition can form monolithic glass. 1: glass forming (“AM”), 0: not fully glass forming (“CR”) |
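The merge rule described for glass_binary_v2 can be sketched in plain Python. This is a minimal illustration, not the preprocessing code that produced the dataset: the `(formula, gfa)` record layout and the helper name `merge_duplicates` are assumptions, formulas are compared as exact strings, and the sample records are toy placeholders rather than entries from the dataset.

```python
def merge_duplicates(records):
    """Merge entries that share a formula; disagreements in gfa resolve to 1.

    `records` is an iterable of (formula, gfa) pairs with gfa in {0, 1}.
    Hypothetical helper for illustration only.
    """
    merged = {}
    for formula, gfa in records:
        # max() keeps 1 as soon as any duplicate reports "glass forming".
        merged[formula] = max(merged.get(formula, 0), gfa)
    return merged


# Toy placeholder records, not taken from the dataset.
records = [("A50B50", 0), ("A50B50", 1), ("C50D50", 1), ("E90F10", 0), ("E90F10", 0)]
merged = merge_duplicates(records)
```

Here the two conflicting reports for the first placeholder formula collapse to a single entry labeled 1, which is why the merged dataset has fewer rows than glass_binary.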
Reference
https://pubs.acs.org/doi/10.1021/acs.jpclett.7b01046
Bibtex Formatted Citations
@article{doi:10.1021/acs.jpclett.7b01046, author = {Sun, Y. T. and Bai, H. Y. and Li, M. Z. and Wang, W. H.}, title = {Machine Learning Approach for Prediction and Understanding of Glass-Forming Ability}, journal = {The Journal of Physical Chemistry Letters}, volume = {8}, number = {14}, pages = {3434-3439}, year = {2017}, doi = {10.1021/acs.jpclett.7b01046}, note ={PMID: 28697303}, eprint = { https://doi.org/10.1021/acs.jpclett.7b01046 }}
glass_ternary_hipt¶
Metallic glass formation dataset for ternary alloys, collected from high-throughput sputtering experiments that measure whether a glass can be formed by sputtering. The high-throughput experimental data cover the Co-Fe-Zr, Co-Ti-Zr, Co-V-Zr and Fe-Ti-Nb ternary systems.
Number of entries: 5170
Column |
Description |
---|---|
|
Chemical formula of the entry |
|
Glass forming ability: 1 means glass forming and corresponds to AM, 0 means non glass forming and corresponds to CR |
|
AM: amorphous phase or CR: crystalline phase |
|
How the point was processed, always sputtering for this dataset |
|
System of dataset experiment, one of: CoFeZr, CoTiZr, CoVZr, or FeTiNb |
Reference
Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments By Fang Ren, Logan Ward, Travis Williams, Kevin J. Laws, Christopher Wolverton, Jason Hattrick-Simpers, Apurva Mehta Science Advances 13 Apr 2018 : eaaq1566
Bibtex Formatted Citations
@article {Reneaaq1566, author = {Ren, Fang and Ward, Logan and Williams, Travis and Laws, Kevin J. and Wolverton, Christopher and Hattrick-Simpers, Jason and Mehta, Apurva}, title = {Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments}, volume = {4}, number = {4}, year = {2018}, doi = {10.1126/sciadv.aaq1566}, publisher = {American Association for the Advancement of Science}, abstract = {With more than a hundred elements in the periodic table, a large number of potential new materials exist to address the technological and societal challenges we face today; however, without some guidance, searching through this vast combinatorial space is frustratingly slow and expensive, especially for materials strongly influenced by processing. We train a machine learning (ML) model on previously reported observations, parameters from physiochemical theories, and make it synthesis method{\textendash}dependent to guide high-throughput (HiTp) experiments to find a new system of metallic glasses in the Co-V-Zr ternary. Experimental observations are in good agreement with the predictions of the model, but there are quantitative discrepancies in the precise compositions predicted. We use these discrepancies to retrain the ML model. The refined model has significantly improved accuracy not only for the Co-V-Zr system but also across all other available validation data. We then use the refined model to guide the discovery of metallic glasses in two additional previously unreported ternaries. Although our approach of iterative use of ML and HiTp experiments has guided us to rapid discovery of three new glass-forming systems, it has also provided us with a quantitatively accurate, synthesis method{\textendash}sensitive predictor for metallic glasses that improves performance with use and thus promises to greatly accelerate discovery of many new metallic glasses. 
We believe that this discovery paradigm is applicable to a wider range of materials and should prove equally powerful for other materials and properties that are synthesis path{\textendash}dependent and that current physiochemical theories find challenging to predict.}, URL = {http://advances.sciencemag.org/content/4/4/eaaq1566}, eprint = {http://advances.sciencemag.org/content/4/4/eaaq1566.full.pdf}, journal = {Science Advances} }
glass_ternary_landolt¶
Metallic glass formation dataset for ternary alloys, collected from “Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys,” a volume of the Landolt-Börnstein collection. This dataset contains experimental measurements of whether it is possible to form a glass using a variety of processing techniques at thousands of compositions from hundreds of ternary systems. The processing techniques are designated in the “processing” column. The dataset originally contains 7191 experiments; this falls to 6203 after deduplication and to 6118 if multiple data points for one composition are combined. Of the original experiments, 6780 are melt-spinning experiments; this falls to 5800 after deduplication and to 5736 if multiple experimental data points for one composition are combined.
Number of entries: 7191
Column |
Description |
---|---|
|
Chemical formula of the entry |
|
Glass forming ability: 1 means glass forming and corresponds to AM, 0 means not fully glass forming and corresponds to CR, AC or QC |
|
“AM”: amorphous phase. “CR”: crystalline phase. “AC”: amorphous-crystalline composite phase. “QC”: quasi-crystalline phase. Phases obtained from glass producing experiments |
|
processing method, meltspin or sputtering |
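The relationship between the phase labels and the glass forming ability described in the table above can be written as a small lookup. This is a sketch under the assumption that the mapping is exactly the one stated in the column descriptions; the names `PHASE_TO_GFA` and `gfa_from_phase` are hypothetical and not part of matminer.

```python
# Phase labels from the table above; only "AM" counts as glass forming.
PHASE_TO_GFA = {"AM": 1, "CR": 0, "AC": 0, "QC": 0}


def gfa_from_phase(phase):
    """Return 1 for a fully amorphous phase, 0 for CR, AC or QC."""
    key = phase.strip().upper()
    if key not in PHASE_TO_GFA:
        # Fail loudly on labels the table does not define.
        raise ValueError(f"unknown phase label: {phase!r}")
    return PHASE_TO_GFA[key]


labels = [gfa_from_phase(p) for p in ["AM", "CR", "AC", "QC"]]
```

Raising on unknown labels, rather than silently returning 0, avoids misclassifying a mistyped phase as non glass forming.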
Reference
Y. Kawazoe, T. Masumoto, A.-P. Tsai, J.-Z. Yu, T. Aihara Jr. (1997) Y. Kawazoe, J.-Z. Yu, A.-P. Tsai, T. Masumoto (ed.) SpringerMaterials Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys · 1 Introduction Landolt-Börnstein - Group III Condensed Matter 37A (Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys) https://www.springer.com/gp/book/9783540605072 (Springer-Verlag Berlin Heidelberg © 1997) Accessed: 03-09-2019
Bibtex Formatted Citations
@Misc{LandoltBornstein1997:sm_lbs_978-3-540-47679-5_2, author="Kawazoe, Y. and Masumoto, T. and Tsai, A.-P. and Yu, J.-Z. and Aihara Jr., T.", editor="Kawazoe, Y. and Yu, J.-Z. and Tsai, A.-P. and Masumoto, T.", title="Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys {\textperiodcentered} 1 Introduction: Datasheet from Landolt-B{\"o}rnstein - Group III Condensed Matter {\textperiodcentered} Volume 37A: ``Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys'' in SpringerMaterials (https://dx.doi.org/10.1007/10510374{\_}2)", publisher="Springer-Verlag Berlin Heidelberg", note="Copyright 1997 Springer-Verlag Berlin Heidelberg", note="Part of SpringerMaterials", note="accessed 2018-10-23", doi="10.1007/10510374_2", url="https://materials.springer.com/lb/docs/sm_lbs_978-3-540-47679-5_2" }
@Article{Ward2016, author={Ward, Logan and Agrawal, Ankit and Choudhary, Alok and Wolverton, Christopher}, title={A general-purpose machine learning framework for predicting properties of inorganic materials}, journal={Npj Computational Materials}, year={2016}, month={Aug}, day={26}, publisher={The Author(s)}, volume={2}, pages={16028}, note={Article}, url={http://dx.doi.org/10.1038/npjcompumats.2016.28} }
heusler_magnetic¶
1153 Heusler alloys with DFT-calculated magnetic and electronic properties. The 1153 alloys include 576 full, 449 half and 128 inverse Heusler alloys. The data are extracted and cleaned (including de-duplicating) from Citrine.
Number of entries: 1153
Column |
Description |
---|---|
|
Formation energy in eV/atom |
|
Chemical formula of the entry |
|
Full, Half, or Inverse Heusler |
|
Lattice constant |
|
Magnetic moment |
|
Saturation magnetization in emu/cc |
|
Number of electrons per formula unit |
|
Polarization at Fermi level in % |
|
Structure type |
|
Tetragonality, i.e. c/a |
Reference
https://citrination.com/datasets/150561/
Bibtex Formatted Citations
@misc{Citrine Informatics, title = {University of Alabama Heusler database}, howpublished = {\url{https://citrination.com/datasets/150561/}}, }
jarvis_dft_2d¶
Various properties of 636 2D materials computed with the OptB88vdW and TBmBJ functionals taken from the JARVIS DFT database.
Number of entries: 636
Column |
Description |
---|---|
|
A Pymatgen Composition descriptor of the composition of the material |
|
formation energy per atom, in eV/atom |
|
Static dielectric function in x direction calculated with OptB88vdW functional. |
|
Static dielectric function in x direction calculated with TBmBJ functional. |
|
Static dielectric function in y direction calculated with OptB88vdW functional. |
|
Static dielectric function in y direction calculated with TBmBJ functional. |
|
Static dielectric function in z direction calculated with OptB88vdW functional. |
|
Static dielectric function in z direction calculated with TBmBJ functional. |
|
Exfoliation energy (monolayer formation E) in meV/atom |
|
Band gap calculated with OptB88vdW functional, in eV |
|
Band gap calculated with TBmBJ functional, in eV |
|
JARVIS ID |
|
Materials Project ID |
|
A description of the crystal structure of the material |
|
Initial structure description of the crystal structure of the material |
Reference
2D Dataset discussed in: High-throughput Identification and Characterization of Two dimensional Materials using Density functional theory Kamal Choudhary, Irina Kalish, Ryan Beams & Francesca Tavazza Scientific Reports volume 7, Article number: 5179 (2017) Original 2D Data file sourced from: choudhary, kamal; https://orcid.org/0000-0001-9737-8074 (2018): jdft_2d-7-7-2018.json. figshare. Dataset.
Bibtex Formatted Citations
@Article{Choudhary2017, author={Choudhary, Kamal and Kalish, Irina and Beams, Ryan and Tavazza, Francesca}, title={High-throughput Identification and Characterization of Two-dimensional Materials using Density functional theory}, journal={Scientific Reports}, year={2017}, volume={7}, number={1}, pages={5179}, abstract={We introduce a simple criterion to identify two-dimensional (2D) materials based on the comparison between experimental lattice constants and lattice constants mainly obtained from Materials-Project (MP) density functional theory (DFT) calculation repository. Specifically, if the relative difference between the two lattice constants for a specific material is greater than or equal to 5%, we predict them to be good candidates for 2D materials. We have predicted at least 1356 such 2D materials. For all the systems satisfying our criterion, we manually create single layer systems and calculate their energetics, structural, electronic, and elastic properties for both the bulk and the single layer cases. Currently the database consists of 1012 bulk and 430 single layer materials, of which 371 systems are common to bulk and single layer. The rest of calculations are underway. To validate our criterion, we calculated the exfoliation energy of the suggested layered materials, and we found that in 88.9% of the cases the currently accepted criterion for exfoliation was satisfied. Also, using molybdenum telluride as a test case, we performed X-ray diffraction and Raman scattering experiments to benchmark our calculations and understand their applicability and limitations. The data is publicly available at the website http://www.ctcms.nist.gov/{\textasciitilde}knc6/JVASP.html.}, issn={2045-2322}, doi={10.1038/s41598-017-05402-0}, url={https://doi.org/10.1038/s41598-017-05402-0} }
@misc{choudhary__2018, title={jdft_2d-7-7-2018.json}, url={https://figshare.com/articles/jdft_2d-7-7-2018_json/6815705/1}, DOI={10.6084/m9.figshare.6815705.v1}, abstractNote={2D materials}, publisher={figshare}, author={choudhary, kamal and https://orcid.org/0000-0001-9737-8074}, year={2018}, month={Jul}}
jarvis_dft_3d¶
Various properties of 25,923 bulk materials computed with the OptB88vdW and TBmBJ functionals taken from the JARVIS DFT database.
Number of entries: 25923
Column |
Description |
---|---|
|
VRH average calculation of bulk modulus |
|
A Pymatgen Composition descriptor of the composition of the material |
|
formation energy per atom, in eV/atom |
|
Static dielectric function in x direction calculated with OptB88vdW functional. |
|
Static dielectric function in x direction calculated with TBmBJ functional. |
|
Static dielectric function in y direction calculated with OptB88vdW functional. |
|
Static dielectric function in y direction calculated with TBmBJ functional. |
|
Static dielectric function in z direction calculated with OptB88vdW functional. |
|
Static dielectric function in z direction calculated with TBmBJ functional. |
|
Band gap calculated with OptB88vdW functional, in eV |
|
Band gap calculated with TBmBJ functional, in eV |
|
JARVIS ID |
|
Materials Project ID |
|
VRH average calculation of shear modulus |
|
A description of the crystal structure of the material |
|
Initial structure description of the crystal structure of the material |
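The bulk and shear modulus columns above are described as VRH (Voigt-Reuss-Hill) averages. As a worked illustration of what such an average is, the sketch below evaluates the standard closed-form Voigt and Reuss bounds for a cubic crystal and takes their mean. It is not the JARVIS workflow, which would have to handle elastic tensors of arbitrary symmetry; the function name `cubic_vrh` and the input values are assumptions chosen for the example.

```python
def cubic_vrh(c11, c12, c44):
    """Voigt-Reuss-Hill bulk and shear moduli for a cubic crystal.

    Inputs and outputs share the same units (e.g. GPa).
    """
    # For cubic symmetry the Voigt and Reuss bulk moduli coincide.
    k_vrh = (c11 + 2.0 * c12) / 3.0
    # Voigt (uniform strain) upper bound on the shear modulus.
    g_voigt = (c11 - c12 + 3.0 * c44) / 5.0
    # Reuss (uniform stress) lower bound on the shear modulus.
    g_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))
    # Hill average: arithmetic mean of the two bounds.
    g_vrh = 0.5 * (g_voigt + g_reuss)
    return k_vrh, g_vrh


# Illustrative elastic constants, not taken from the dataset.
k, g = cubic_vrh(250.0, 150.0, 100.0)
```

For an elastically isotropic input, where c44 equals (c11 - c12) / 2, the two shear bounds coincide and the Hill average returns c44 itself.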
Reference
3D Dataset discussed in: Elastic properties of bulk and low-dimensional materials using van der Waals density functional Kamal Choudhary, Gowoon Cheon, Evan Reed, and Francesca Tavazza Phys. Rev. B 98, 014107 Original 3D Data file sourced from: choudhary, kamal; https://orcid.org/0000-0001-9737-8074 (2018): jdft_3d.json. figshare. Dataset.
Bibtex Formatted Citations
@article{PhysRevB.98.014107, title = {Elastic properties of bulk and low-dimensional materials using van der Waals density functional}, author = {Choudhary, Kamal and Cheon, Gowoon and Reed, Evan and Tavazza, Francesca}, journal = {Phys. Rev. B}, volume = {98}, issue = {1}, pages = {014107}, numpages = {12}, year = {2018}, month = {Jul}, publisher = {American Physical Society}, doi = {10.1103/PhysRevB.98.014107}, url = {https://link.aps.org/doi/10.1103/PhysRevB.98.014107} }
@misc{choudhary__2018, title={jdft_3d.json}, url={https://figshare.com/articles/jdft_3d-7-7-2018_json/6815699/2}, DOI={10.6084/m9.figshare.6815699.v2}, abstractNote={https://jarvis.nist.gov/ The Density functional theory section of JARVIS (JARVIS-DFT) consists of thousands of VASP based calculations for 3D-bulk, single layer (2D), nanowire (1D) and molecular (0D) systems. Most of the calculations are carried out with optB88vDW functional. JARVIS-DFT includes materials data such as: energetics, diffraction pattern, radial distribution function, band-structure, density of states, carrier effective mass, temperature and carrier concentration dependent thermoelectric properties, elastic constants and gamma-point phonons.}, publisher={figshare}, author={choudhary, kamal and https://orcid.org/0000-0001-9737-8074}, year={2018}, month={Jul}}
jarvis_ml_dft_training¶
Various properties of 24,759 bulk and 2D materials computed with the OptB88vdW and TBmBJ functionals taken from the JARVIS DFT database.
Number of entries: 24759
Column |
Description |
---|---|
|
VRH average calculation of bulk modulus |
|
A descriptor of the composition of the material |
|
Effective electron mass in x direction (BoltzTraP) |
|
Effective electron mass in y direction (BoltzTraP) |
|
Effective electron mass in z direction (BoltzTraP) |
|
exfoliation energy per atom in eV/atom |
|
formation energy per atom, in eV/atom |
|
Static dielectric function in x direction calculated with OptB88vdW functional. |
|
Static dielectric function in x direction calculated with TBmBJ functional. |
|
Static dielectric function in y direction calculated with OptB88vdW functional. |
|
Static dielectric function in y direction calculated with TBmBJ functional. |
|
Static dielectric function in z direction calculated with OptB88vdW functional. |
|
Static dielectric function in z direction calculated with TBmBJ functional. |
|
Band gap calculated with OptB88vdW functional, in eV |
|
Band gap calculated with TBmBJ functional, in eV |
|
Effective hole mass in x direction (BoltzTraP) |
|
Effective hole mass in y direction (BoltzTraP) |
|
Effective hole mass in z direction (BoltzTraP) |
|
JARVIS ID |
|
Materials Project ID |
|
Magnetic moment, in Bohr Magneton |
|
VRH average calculation of shear modulus |
|
A Pymatgen Structure object describing the crystal structure of the material |
Reference
Dataset discussed in: Machine learning with force-field-inspired descriptors for materials: Fast screening and mapping energy landscape Kamal Choudhary, Brian DeCost, and Francesca Tavazza Phys. Rev. Materials 2, 083801
Original Data file sourced from: choudhary, kamal (2018): JARVIS-ML-CFID-descriptors and material properties. figshare. Dataset.
Bibtex Formatted Citations
@article{PhysRevMaterials.2.083801, title = {Machine learning with force-field-inspired descriptors for materials: Fast screening and mapping energy landscape}, author = {Choudhary, Kamal and DeCost, Brian and Tavazza, Francesca}, journal = {Phys. Rev. Materials}, volume = {2}, issue = {8}, pages = {083801}, numpages = {8}, year = {2018}, month = {Aug}, publisher = {American Physical Society}, doi = {10.1103/PhysRevMaterials.2.083801}, url = {https://link.aps.org/doi/10.1103/PhysRevMaterials.2.083801} }
@misc{choudhary_2018, title={JARVIS-ML-CFID-descriptors and material properties}, url={https://figshare.com/articles/JARVIS-ML-CFID-descriptors_and_material_properties/6870101/1}, DOI={10.6084/m9.figshare.6870101.v1}, abstractNote={Classical force-field inspired descriptors (CFID) for more than 25000 materials and their material properties such as bandgap, formation energies, modulus of elasticity etc. See JARVIS-ML: https://jarvis.nist.gov/}, publisher={figshare}, author={choudhary, kamal}, year={2018}, month={Jul}}
m2ax¶
Elastic properties of 223 stable M2AX compounds from “A comprehensive survey of M2AX phase elastic properties” by Cover et al. Calculations are PAW PW91.
Number of entries: 223
Column |
Description |
---|---|
|
Lattice parameter a, in Å (angstrom) |
|
In GPa |
|
Lattice parameter c, in Å (angstrom) |
|
Elastic constants of the M2AX material. These are specific to hexagonal materials. |
|
Elastic constants of the M2AX material. These are specific to hexagonal materials. |
|
Elastic constants of the M2AX material. These are specific to hexagonal materials. |
|
Elastic constants of the M2AX material. These are specific to hexagonal materials. |
|
Elastic constants of the M2AX material. These are specific to hexagonal materials. |
|
distance from the M atom to the A atom |
|
distance from the M atom to the X atom |
|
In GPa |
|
chemical formula |
|
In GPa |
Reference
http://iopscience.iop.org/article/10.1088/0953-8984/21/30/305403/meta
Bibtex Formatted Citations
@article{M F Cover, author={M F Cover and O Warschkow and M M M Bilek and D R McKenzie}, title={A comprehensive survey of M 2 AX phase elastic properties}, journal={Journal of Physics: Condensed Matter}, volume={21}, number={30}, pages={305403}, url={http://stacks.iop.org/0953-8984/21/i=30/a=305403}, year={2009}, abstract={M 2 AX phases are a family of nanolaminate, ternary alloys that are composed of slabs of transition metal carbide or nitride (M 2 X) separated by single atomic layers of a main group element. In this combination, they manifest many of the beneficial properties of both ceramic and metallic compounds, making them attractive for many technological applications. We report here the results of a large scale computational survey of the elastic properties of all 240 elemental combinations using first-principles density functional theory calculations. We found correlations revealing the governing role of the A element and its interaction with the M element on the c axis compressibility and shearability of the material. The role of the X element is relatively minor, with the strongest effect seen in the in-plane constants C 11 and C 12 . We identify several elemental compositions with extremal properties such as W 2 SnC, which has by far the lowest value of C 44 , suggesting potential applications as a...}}
matbench_dielectric¶
Matbench v0.1 test dataset for predicting refractive index from structure. Adapted from the Materials Project database. Entries were removed if they had a formation energy (or energy above the convex hull) of more than 150 meV, a refractive index less than 1, or contained noble gases. Retrieved April 2, 2019. For benchmarking with nested cross validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 4764
Column |
Description |
---|---|
|
Target variable. Refractive index (unitless). |
|
Pymatgen Structure of the material. |
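The three exclusion rules in the matbench_dielectric description can be expressed as a single predicate. The sketch below is an assumption-laden illustration, not the code used to build the dataset: the dictionary keys (`e_form`, `e_above_hull`, `n`, `elements`) are hypothetical, the energy rule is read as "drop the entry if either energy exceeds the threshold", and the sample entries are toy values.

```python
NOBLE_GASES = {"He", "Ne", "Ar", "Kr", "Xe", "Rn"}


def keep_entry(entry, max_energy_ev=0.150):
    """Return True if an entry passes the three filters described above.

    `entry` uses hypothetical keys: "e_form" and "e_above_hull" (in eV),
    "n" (refractive index) and "elements" (iterable of element symbols).
    """
    # Rule 1: formation energy or energy above hull over 150 meV.
    if entry["e_form"] > max_energy_ev or entry["e_above_hull"] > max_energy_ev:
        return False
    # Rule 2: refractive index below 1.
    if entry["n"] < 1.0:
        return False
    # Rule 3: any noble gas among the constituent elements.
    if NOBLE_GASES & set(entry["elements"]):
        return False
    return True


# Toy entries for illustration only.
entries = [
    {"id": "ok", "e_form": -1.2, "e_above_hull": 0.0, "n": 1.8, "elements": ["Si", "O"]},
    {"id": "unstable", "e_form": -0.3, "e_above_hull": 0.2, "n": 2.1, "elements": ["Ti", "O"]},
    {"id": "low_n", "e_form": -0.5, "e_above_hull": 0.0, "n": 0.9, "elements": ["Na", "Cl"]},
    {"id": "noble", "e_form": -0.1, "e_above_hull": 0.0, "n": 1.3, "elements": ["Xe", "F"]},
]
kept = [e["id"] for e in entries if keep_entry(e)]
```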
Reference
Petousis, I., Mrdjenovich, D., Ballouz, E., Liu, M., Winston, D., Chen, W., Graf, T., Schladt, T. D., Persson, K. A. & Prinz, F. B. High-throughput screening of inorganic compounds for the discovery of novel dielectric and optical materials. Sci. Data 4, 160134 (2017).
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@article{Jain2013, author = {Jain, Anubhav and Ong, Shyue Ping and Hautier, Geoffroy and Chen, Wei and Richards, William Davidson and Dacek, Stephen and Cholia, Shreyas and Gunter, Dan and Skinner, David and Ceder, Gerbrand and Persson, Kristin a.}, doi = {10.1063/1.4812323}, issn = {2166532X}, journal = {APL Materials}, number = {1}, pages = {011002}, title = {{The Materials Project: A materials genome approach to accelerating materials innovation}}, url = {http://link.aip.org/link/AMPADS/v1/i1/p011002/s1\&Agg=doi}, volume = {1}, year = {2013} }
@article{Petousis2017, author={Petousis, Ioannis and Mrdjenovich, David and Ballouz, Eric and Liu, Miao and Winston, Donald and Chen, Wei and Graf, Tanja and Schladt, Thomas D. and Persson, Kristin A. and Prinz, Fritz B.}, title={High-throughput screening of inorganic compounds for the discovery of novel dielectric and optical materials}, journal={Scientific Data}, year={2017}, month={Jan}, day={31}, publisher={The Author(s)}, volume={4}, pages={160134}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2016.134} }
matbench_expt_gap¶
Matbench v0.1 test dataset for predicting experimental band gap from composition alone. Retrieved from the supplementary information of Zhuo et al. Deduplicated according to composition, removing compositions with reported band gaps spanning more than a 0.1 eV range; each remaining composition was assigned the experimental value closest to the mean of all values reported for that composition. For benchmarking with nested cross validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 4604
Column |
Description |
---|---|
|
Chemical formula. |
|
Target variable. Experimentally measured gap, in eV. |
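The deduplication procedure described for matbench_expt_gap can be sketched as follows. This is a minimal reading of the description, not the original preprocessing script: the `(formula, gap)` input layout, the function name `deduplicate_gaps`, and the tie-breaking choice (first reported value wins) are assumptions, and the sample reports are toy placeholders.

```python
from collections import defaultdict


def deduplicate_gaps(reports, max_range=0.1):
    """Collapse repeated band-gap reports to one value per composition.

    `reports` is an iterable of (formula, gap_in_eV) pairs.
    """
    by_formula = defaultdict(list)
    for formula, gap in reports:
        by_formula[formula].append(gap)

    result = {}
    for formula, gaps in by_formula.items():
        # Drop compositions whose reports span more than max_range eV.
        if max(gaps) - min(gaps) > max_range:
            continue
        mean = sum(gaps) / len(gaps)
        # Keep the reported value closest to the mean (first one on ties).
        result[formula] = min(gaps, key=lambda g: abs(g - mean))
    return result


# Toy placeholder reports, not taken from the dataset.
reports = [("A", 1.00), ("A", 1.04), ("A", 1.05), ("B", 2.0), ("B", 2.5), ("C", 0.0)]
deduplicated = deduplicate_gaps(reports)
```

Choosing a reported value rather than the mean itself keeps every target in the result an actual measurement.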
Reference
Y. Zhuo, A. Mansouri Tehrani, J. Brgoch (2018) Predicting the Band Gaps of Inorganic Solids by Machine Learning J. Phys. Chem. Lett. 2018, 9, 7, 1668-1673 https://doi.org/10.1021/acs.jpclett.8b00124
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@article{doi:10.1021/acs.jpclett.8b00124, author = {Zhuo, Ya and Mansouri Tehrani, Aria and Brgoch, Jakoah}, title = {Predicting the Band Gaps of Inorganic Solids by Machine Learning}, journal = {The Journal of Physical Chemistry Letters}, volume = {9}, number = {7}, pages = {1668-1673}, year = {2018}, doi = {10.1021/acs.jpclett.8b00124}, note ={PMID: 29532658}, eprint = { https://doi.org/10.1021/acs.jpclett.8b00124 }}
matbench_expt_is_metal¶
Matbench v0.1 test dataset for classifying metallicity from composition alone. Retrieved from the supplementary information of Zhuo et al. Deduplicated according to composition, ensuring no conflicting reports were entered for any composition (i.e., no composition was reported as both metal and nonmetal). For benchmarking with nested cross validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 4921
Column |
Description |
---|---|
|
Chemical formula. |
|
Target variable. 1 if is a metal, 0 if nonmetal. |
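One plausible reading of the conflict check described for matbench_expt_is_metal is sketched below: collect every label reported for a composition and keep the composition only if all reports agree. The source does not say how conflicts were handled in detail, so dropping them is an assumption, as are the function name `deduplicate_labels` and the toy reports.

```python
def deduplicate_labels(reports):
    """Keep one label per composition; drop compositions with conflicts.

    `reports` is an iterable of (formula, is_metal) pairs.
    """
    seen = {}
    for formula, is_metal in reports:
        # Track the distinct labels reported for each composition.
        seen.setdefault(formula, set()).add(int(is_metal))
    # A single distinct label means all reports agreed.
    return {f: next(iter(labels)) for f, labels in seen.items() if len(labels) == 1}


# Toy placeholder reports, not taken from the dataset.
reports = [("A", 1), ("A", 1), ("B", 0), ("C", 1), ("C", 0)]
labels = deduplicate_labels(reports)
```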
Reference
Y. Zhuo, A. Mansouri Tehrani, J. Brgoch (2018) Predicting the Band Gaps of Inorganic Solids by Machine Learning J. Phys. Chem. Lett. 2018, 9, 7, 1668-1673
https://doi.org/10.1021/acs.jpclett.8b00124
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@article{doi:10.1021/acs.jpclett.8b00124, author = {Zhuo, Ya and Mansouri Tehrani, Aria and Brgoch, Jakoah}, title= {Predicting the Band Gaps of Inorganic Solids by Machine Learning}, journal = {The Journal of Physical Chemistry Letters}, volume = {9}, number = {7}, pages = {1668-1673}, year = {2018}, doi = {10.1021/acs.jpclett.8b00124}, note ={PMID: 29532658}, eprint = { https://doi.org/10.1021/acs.jpclett.8b00124 }}
matbench_glass
Matbench v0.1 test dataset for predicting full bulk metallic glass formation ability from chemical formula. Retrieved from “Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys,” a volume of the Landolt–Börnstein collection. Deduplicated according to composition, ensuring no compositions were reported as both GFA and not GFA (i.e., all reports agreed on the classification designation). For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 5680
Column |
Description |
---|---|
|
Chemical formula. |
|
Target variable. Glass-forming ability: 1 means glass forming (amorphous); 0 means not fully glass forming. |
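The deduplication rule in the description can be read as "one row per composition, kept only when every report agrees". The sketch below illustrates that reading under stated assumptions: the helper name `deduplicate` is hypothetical, formulas are compared as plain strings (a real pipeline would likely normalize compositions first), and the toy records are invented, not taken from the dataset. Whether conflicting compositions were dropped or simply never occurred is not stated here.

```python
from collections import defaultdict


def deduplicate(records):
    """Collapse repeated formulas, keeping a formula only if all its labels agree.

    `records` is an iterable of (formula, gfa) pairs, where gfa is 1 or 0.
    """
    labels_by_formula = defaultdict(set)
    for formula, gfa in records:
        labels_by_formula[formula].add(bool(gfa))
    # Dicts preserve insertion order, so first-seen order of formulas is kept.
    return [
        (formula, next(iter(labels)))
        for formula, labels in labels_by_formula.items()
        if len(labels) == 1  # drop formulas reported as both GFA and not GFA
    ]


# Invented toy records for illustration only.
raw = [
    ("Al80Ni10La10", 1),
    ("Al80Ni10La10", 1),  # duplicate report, same label
    ("Cu50Zr50", 1),
    ("Cu50Zr50", 0),      # conflicting report
    ("Fe90B10", 0),
]
clean = deduplicate(raw)
```

In this toy input the duplicated formula collapses to a single row and the formula with conflicting labels is removed.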
Reference
Y. Kawazoe, T. Masumoto, A.-P. Tsai, J.-Z. Yu, T. Aihara Jr. (1997) Y. Kawazoe, J.-Z. Yu, A.-P. Tsai, T. Masumoto (ed.) SpringerMaterials Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys · 1 Introduction Landolt-Börnstein - Group III Condensed Matter 37A (Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys) https://www.springer.com/gp/book/9783540605072 (Springer-Verlag Berlin Heidelberg © 1997) Accessed: 03-09-2019
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@Misc{LandoltBornstein1997:sm_lbs_978-3-540-47679-5_2, author="Kawazoe, Y. and Masumoto, T. and Tsai, A.-P. and Yu, J.-Z. and Aihara Jr., T.", editor="Kawazoe, Y. and Yu, J.-Z. and Tsai, A.-P. and Masumoto, T.", title="Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys {\textperiodcentered} 1 Introduction: Datasheet from Landolt-B{\"o}rnstein - Group III Condensed Matter {\textperiodcentered} Volume 37A: ``Nonequilibrium Phase Diagrams of Ternary Amorphous Alloys'' in SpringerMaterials (https://dx.doi.org/10.1007/10510374{\_}2)", publisher="Springer-Verlag Berlin Heidelberg", note="Copyright 1997 Springer-Verlag Berlin Heidelberg", note="Part of SpringerMaterials", note="accessed 2018-10-23", doi="10.1007/10510374_2", url="https://materials.springer.com/lb/docs/sm_lbs_978-3-540-47679-5_2" }
@Article{Ward2016, author={Ward, Logan and Agrawal, Ankit and Choudhary, Alok and Wolverton, Christopher}, title={A general-purpose machine learning framework for predicting properties of inorganic materials}, journal={Npj Computational Materials}, year={2016}, month={Aug}, day={26}, publisher={The Author(s)}, volume={2}, pages={16028}, note={Article}, url={http://dx.doi.org/10.1038/npjcompumats.2016.28} }
matbench_jdft2d
Matbench v0.1 test dataset for predicting exfoliation energies from crystal structure (computed with the OptB88vdW and TBmBJ functionals). Adapted from the JARVIS DFT database. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 636
Column |
Description |
---|---|
|
Target variable. Exfoliation energy (meV/atom). |
|
Pymatgen Structure of the material. |
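The requirement that row order match the retrieved data follows from how cross-validation folds are defined: splits are lists of row positions, so reordering rows changes which materials land in which fold. The sketch below shows a generic nested k-fold split in plain Python. It is not the official Matbench split: the fold counts, the contiguous unshuffled folds, and the helper names `kfold_indices` and `nested_splits` are all assumptions made for illustration.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) lists of row positions for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def nested_splits(n_samples, outer=5, inner=3):
    """Yield (inner_train, validation, test) row positions for nested CV.

    The outer loop holds out a test fold; the inner loop splits the
    remaining rows again for model selection.
    """
    for outer_train, test in kfold_indices(n_samples, outer):
        for train_pos, val_pos in kfold_indices(len(outer_train), inner):
            inner_train = [outer_train[i] for i in train_pos]
            validation = [outer_train[i] for i in val_pos]
            yield inner_train, validation, test


# 636 matches the entry count listed for this dataset.
splits = list(nested_splits(636))
```

Each yielded triple partitions the row positions, so two users get the same materials in the same folds only if their rows are in the same order.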
Reference
2D dataset discussed in: Kamal Choudhary, Irina Kalish, Ryan Beams & Francesca Tavazza, “High-throughput Identification and Characterization of Two-dimensional Materials using Density functional theory”, Scientific Reports volume 7, Article number: 5179 (2017). Original 2D data file sourced from: Choudhary, Kamal (https://orcid.org/0000-0001-9737-8074) (2018): jdft_2d-7-7-2018.json. figshare. Dataset.
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@Article{Choudhary2017, author={Choudhary, Kamal and Kalish, Irina and Beams, Ryan and Tavazza, Francesca}, title={High-throughput Identification and Characterization of Two-dimensional Materials using Density functional theory}, journal={Scientific Reports}, year={2017}, volume={7}, number={1}, pages={5179}, abstract={We introduce a simple criterion to identify two-dimensional (2D) materials based on the comparison between experimental lattice constants and lattice constants mainly obtained from Materials-Project (MP) density functional theory (DFT) calculation repository. Specifically, if the relative difference between the two lattice constants for a specific material is greater than or equal to 5%, we predict them to be good candidates for 2D materials. We have predicted at least 1356 such 2D materials. For all the systems satisfying our criterion, we manually create single layer systems and calculate their energetics, structural, electronic, and elastic properties for both the bulk and the single layer cases. Currently the database consists of 1012 bulk and 430 single layer materials, of which 371 systems are common to bulk and single layer. The rest of calculations are underway. To validate our criterion, we calculated the exfoliation energy of the suggested layered materials, and we found that in 88.9% of the cases the currently accepted criterion for exfoliation was satisfied. Also, using molybdenum telluride as a test case, we performed X-ray diffraction and Raman scattering experiments to benchmark our calculations and understand their applicability and limitations. The data is publicly available at the website http://www.ctcms.nist.gov/{\textasciitilde}knc6/JVASP.html.}, issn={2045-2322}, doi={10.1038/s41598-017-05402-0}, url={https://doi.org/10.1038/s41598-017-05402-0} }
@misc{choudhary__2018, title={jdft_2d-7-7-2018.json}, url={https://figshare.com/articles/jdft_2d-7-7-2018_json/6815705/1}, DOI={10.6084/m9.figshare.6815705.v1}, abstractNote={2D materials}, publisher={figshare}, author={choudhary, kamal and https://orcid.org/0000-0001-9737-8074}, year={2018}, month={Jul}}
matbench_log_gvrh
Matbench v0.1 test dataset for predicting DFT log10 VRH-average shear modulus from structure. Adapted from Materials Project database. Removed entries with a formation energy (or energy above the convex hull) of more than 150 meV, entries with negative G_Voigt, G_Reuss, G_VRH, K_Voigt, K_Reuss, or K_VRH, entries failing G_Reuss <= G_VRH <= G_Voigt or K_Reuss <= K_VRH <= K_Voigt, and entries containing noble gases. Retrieved April 2, 2019. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 10987
Column |
Description |
---|---|
|
Target variable. Base 10 logarithm of the DFT Voigt-Reuss-Hill average shear modulus in GPa. |
|
Pymatgen Structure of the material. |
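As a worked illustration of the target definition, the Voigt-Reuss-Hill (VRH) average is the arithmetic mean of the Voigt (upper) and Reuss (lower) bounds, and the target is its base 10 logarithm. The sketch below uses only the standard library; the function names and the example moduli are hypothetical and chosen for round numbers, not drawn from the dataset.

```python
import math


def vrh_average(voigt_gpa, reuss_gpa):
    """Hill average: arithmetic mean of the Voigt and Reuss bounds (GPa)."""
    return 0.5 * (voigt_gpa + reuss_gpa)


def to_log10_target(modulus_gpa):
    """Convert a positive modulus in GPa to the log10 target."""
    if modulus_gpa <= 0:
        raise ValueError("log10 is undefined for non-positive moduli")
    return math.log10(modulus_gpa)


def from_log10_target(target):
    """Invert the transform to recover the modulus in GPa."""
    return 10.0 ** target


# Hypothetical bounds chosen so the arithmetic is easy to follow.
g_vrh = vrh_average(110.0, 90.0)       # mean of the two bounds
target = to_log10_target(g_vrh)        # value a model would be trained on
recovered = from_log10_target(target)  # back to GPa for reporting
```

Because the logarithm is undefined for non-positive values, the removal of entries with negative moduli is a precondition for this target.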
Reference
Jong, M. De, Chen, W., Angsten, T., Jain, A., Notestine, R., Gamst, A., Sluiter, M., Ande, C. K., Zwaag, S. Van Der, Plata, J. J., Toher, C., Curtarolo, S., Ceder, G., Persson, K. and Asta, M., “Charting the complete elastic properties of inorganic crystalline compounds”, Scientific Data volume 2, Article number: 150009 (2015)
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@Article{deJong2015, author={de Jong, Maarten and Chen, Wei and Angsten, Thomas and Jain, Anubhav and Notestine, Randy and Gamst, Anthony and Sluiter, Marcel and Krishna Ande, Chaitanya and van der Zwaag, Sybrand and Plata, Jose J. and Toher, Cormac and Curtarolo, Stefano and Ceder, Gerbrand and Persson, Kristin A. and Asta, Mark}, title={Charting the complete elastic properties of inorganic crystalline compounds}, journal={Scientific Data}, year={2015}, month={Mar}, day={17}, publisher={The Author(s)}, volume={2}, pages={150009}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2015.9} }
matbench_log_kvrh
Matbench v0.1 test dataset for predicting DFT log10 VRH-average bulk modulus from structure. Adapted from Materials Project database. Removed entries with a formation energy (or energy above the convex hull) of more than 150 meV, entries with negative G_Voigt, G_Reuss, G_VRH, K_Voigt, K_Reuss, or K_VRH, entries failing G_Reuss <= G_VRH <= G_Voigt or K_Reuss <= K_VRH <= K_Voigt, and entries containing noble gases. Retrieved April 2, 2019. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 10987
Column |
Description |
---|---|
|
Target variable. Base 10 logarithm of the DFT Voigt-Reuss-Hill average bulk modulus in GPa. |
|
Pymatgen Structure of the material. |
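The removal criteria listed in the description can be expressed as a single predicate. The following is a minimal sketch, not the code used to build the dataset: the function name `keep_entry`, its argument layout, and the exact set of noble gas symbols are assumptions made for illustration.

```python
NOBLE_GASES = {"He", "Ne", "Ar", "Kr", "Xe", "Rn"}  # assumed set of symbols


def keep_entry(e_hull_mev, g, k, elements, max_e_hull_mev=150.0):
    """Return True if an entry passes all four removal criteria.

    g and k are (voigt, reuss, vrh) tuples in GPa for the shear and bulk
    moduli; e_hull_mev is the formation energy (or energy above the
    convex hull) in meV; elements is an iterable of element symbols.
    """
    # 1. Stability cutoff.
    if e_hull_mev > max_e_hull_mev:
        return False
    # 2. No negative moduli.
    if any(value < 0 for value in (*g, *k)):
        return False
    # 3. The VRH average must lie between the Reuss and Voigt bounds.
    for voigt, reuss, vrh in (g, k):
        if not (reuss <= vrh <= voigt):
            return False
    # 4. No noble gases.
    return not (set(elements) & NOBLE_GASES)


# Hypothetical entry with well-ordered moduli.
ok = keep_entry(10.0, (110.0, 90.0, 100.0), (160.0, 140.0, 150.0), ["Si", "O"])
```

Checking the ordering for both the shear and bulk moduli reflects the "or" in the description: an entry is removed if either ordering fails.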
Reference
Jong, M. De, Chen, W., Angsten, T., Jain, A., Notestine, R., Gamst, A., Sluiter, M., Ande, C. K., Zwaag, S. Van Der, Plata, J. J., Toher, C., Curtarolo, S., Ceder, G., Persson, K. and Asta, M., “Charting the complete elastic properties of inorganic crystalline compounds”, Scientific Data volume 2, Article number: 150009 (2015)
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@Article{deJong2015, author={de Jong, Maarten and Chen, Wei and Angsten, Thomas and Jain, Anubhav and Notestine, Randy and Gamst, Anthony and Sluiter, Marcel and Krishna Ande, Chaitanya and van der Zwaag, Sybrand and Plata, Jose J. and Toher, Cormac and Curtarolo, Stefano and Ceder, Gerbrand and Persson, Kristin A. and Asta, Mark}, title={Charting the complete elastic properties of inorganic crystalline compounds}, journal={Scientific Data}, year={2015}, month={Mar}, day={17}, publisher={The Author(s)}, volume={2}, pages={150009}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2015.9} }
matbench_mp_e_form
Matbench v0.1 test dataset for predicting DFT formation energy from structure. Adapted from Materials Project database. Removed entries with a formation energy of more than 2.5 eV and those containing noble gases. Retrieved April 2, 2019. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 132752
Column |
Description |
---|---|
|
Target variable. Formation energy in eV as calculated by the Materials Project. |
|
Pymatgen Structure of the material. |
Reference
A. Jain*, S.P. Ong*, G. Hautier, W. Chen, W.D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, K.A. Persson (*=equal contributions) The Materials Project: A materials genome approach to accelerating materials innovation APL Materials, 2013, 1(1), 011002. doi:10.1063/1.4812323
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@article{Jain2013, author = {Jain, Anubhav and Ong, Shyue Ping and Hautier, Geoffroy and Chen, Wei and Richards, William Davidson and Dacek, Stephen and Cholia, Shreyas and Gunter, Dan and Skinner, David and Ceder, Gerbrand and Persson, Kristin a.}, doi = {10.1063/1.4812323}, issn = {2166532X}, journal = {APL Materials}, number = {1}, pages = {011002}, title = {{The Materials Project: A materials genome approach to accelerating materials innovation}}, url = {http://link.aip.org/link/AMPADS/v1/i1/p011002/s1\&Agg=doi}, volume = {1}, year = {2013} }
matbench_mp_gap
Matbench v0.1 test dataset for predicting DFT PBE band gap from structure. Adapted from Materials Project database. Removed entries with a formation energy (or energy above the convex hull) of more than 150 meV and those containing noble gases. Retrieved April 2, 2019. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 106113
Column |
Description |
---|---|
|
Target variable. The band gap as calculated by PBE DFT from the Materials Project, in eV. |
|
Pymatgen Structure of the material. |
Reference
A. Jain*, S.P. Ong*, G. Hautier, W. Chen, W.D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, K.A. Persson (*=equal contributions) The Materials Project: A materials genome approach to accelerating materials innovation APL Materials, 2013, 1(1), 011002. doi:10.1063/1.4812323
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@article{Jain2013, author = {Jain, Anubhav and Ong, Shyue Ping and Hautier, Geoffroy and Chen, Wei and Richards, William Davidson and Dacek, Stephen and Cholia, Shreyas and Gunter, Dan and Skinner, David and Ceder, Gerbrand and Persson, Kristin a.}, doi = {10.1063/1.4812323}, issn = {2166532X}, journal = {APL Materials}, number = {1}, pages = {011002}, title = {{The Materials Project: A materials genome approach to accelerating materials innovation}}, url = {http://link.aip.org/link/AMPADS/v1/i1/p011002/s1\&Agg=doi}, volume = {1}, year = {2013} }
matbench_mp_is_metal
Matbench v0.1 test dataset for predicting DFT metallicity from structure. Adapted from Materials Project database. Removed entries with a formation energy (or energy above the convex hull) of more than 150 meV and those containing noble gases. Retrieved April 2, 2019. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 106113
Column |
Description |
---|---|
|
Target variable. 1 if the compound is a metal, 0 if the compound is not a metal. Metallicity determined with pymatgen. |
|
Pymatgen Structure of the material. |
Reference
A. Jain*, S.P. Ong*, G. Hautier, W. Chen, W.D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, K.A. Persson (*=equal contributions) The Materials Project: A materials genome approach to accelerating materials innovation APL Materials, 2013, 1(1), 011002. doi:10.1063/1.4812323
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@article{Jain2013, author = {Jain, Anubhav and Ong, Shyue Ping and Hautier, Geoffroy and Chen, Wei and Richards, William Davidson and Dacek, Stephen and Cholia, Shreyas and Gunter, Dan and Skinner, David and Ceder, Gerbrand and Persson, Kristin a.}, doi = {10.1063/1.4812323}, issn = {2166532X}, journal = {APL Materials}, number = {1}, pages = {011002}, title = {{The Materials Project: A materials genome approach to accelerating materials innovation}}, url = {http://link.aip.org/link/AMPADS/v1/i1/p011002/s1\&Agg=doi}, volume = {1}, year = {2013} }
matbench_perovskites
Matbench v0.1 test dataset for predicting formation energy from crystal structure. Adapted from an original dataset generated by Castelli et al. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 18928
Column |
Description |
---|---|
|
Target variable. Heat of formation of the entire 5-atom perovskite cell, in eV, as calculated by RPBE GGA-DFT. Note that the reference state for oxygen was computed from oxygen’s chemical potential in water vapor, not as oxygen molecules, to reflect the application for which these perovskites were studied. |
|
Pymatgen Structure of the material. |
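Because the target is reported for the whole 5-atom cell rather than per atom, comparing it with per-atom formation energies from other sources needs a unit conversion. A minimal sketch follows, assuming the column description is taken at face value; the constant and function names are hypothetical, and the input value is invented for illustration.

```python
ATOMS_PER_CELL = 5  # from the column description: "entire 5-atom perovskite cell"


def per_atom(e_form_cell_ev):
    """Convert a per-cell heat of formation (eV) to eV per atom."""
    return e_form_cell_ev / ATOMS_PER_CELL


# Invented value, not taken from the dataset.
e_per_atom = per_atom(2.5)
```

Note that the oxygen reference state described above still differs from the usual molecular-oxygen reference, so a unit conversion alone does not make the values directly comparable with other databases.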
Reference
Ivano E. Castelli, David D. Landis, Kristian S. Thygesen, Søren Dahl, Ib Chorkendorff, Thomas F. Jaramillo and Karsten W. Jacobsen (2012) New cubic perovskites for one- and two-photon water splitting using the computational materials repository. Energy Environ. Sci., 2012,5, 9034-9043 https://doi.org/10.1039/C2EE22341D
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@Article{C2EE22341D, author ="Castelli, Ivano E. and Landis, David D. and Thygesen, Kristian S. and Dahl, Søren and Chorkendorff, Ib and Jaramillo, Thomas F. and Jacobsen, Karsten W.", title ="New cubic perovskites for one- and two-photon water splitting using the computational materials repository", journal ="Energy Environ. Sci.", year ="2012", volume ="5", issue ="10", pages ="9034-9043", publisher ="The Royal Society of Chemistry", doi ="10.1039/C2EE22341D", url ="http://dx.doi.org/10.1039/C2EE22341D", abstract ="A new efficient photoelectrochemical cell (PEC) is one of the possible solutions to the energy and climate problems of our time. Such a device requires development of new semiconducting materials with tailored properties with respect to stability and light absorption. Here we perform computational screening of around 19 000 oxides{,} oxynitrides{,} oxysulfides{,} oxyfluorides{,} and oxyfluoronitrides in the cubic perovskite structure with PEC applications in mind. We address three main applications: light absorbers for one- and two-photon water splitting and high-stability transparent shields to protect against corrosion. We end up with 20{,} 12{,} and 15 different combinations of oxides{,} oxynitrides and oxyfluorides{,} respectively{,} inviting further experimental investigation."}
matbench_phonons
Matbench v0.1 test dataset for predicting vibration properties from crystal structure. Original data retrieved from Petretto et al. Original calculations done via ABINIT in the harmonic approximation based on density functional perturbation theory. Removed entries with a formation energy (or energy above the convex hull) of more than 150 meV. For benchmarking with nested cross-validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 1265
Column |
Description |
---|---|
|
Target variable. Frequency of the highest-frequency optical phonon mode peak, in units of 1/cm; may be used as an estimate of the dominant longitudinal optical phonon frequency. |
|
Pymatgen Structure of the material. |
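The ordering requirement can be illustrated with a minimal sketch. This is not the Matbench splitting code: the function name `seeded_folds`, the seed value, and the round-robin fold assignment are assumptions made for illustration only. The point is that a seeded split is defined on row positions, so it only reproduces the same train/test materials if the rows arrive in the same order.

```python
import random


def seeded_folds(n_rows, n_splits=5, seed=0):
    """Assign row positions 0..n_rows-1 to n_splits folds, reproducibly.

    Hypothetical helper for illustration; not part of matminer or Matbench.
    """
    positions = list(range(n_rows))
    # A fixed seed makes the shuffle, and hence every fold, reproducible.
    random.Random(seed).shuffle(positions)
    # Deal the shuffled positions round-robin into the folds.
    return [sorted(positions[i::n_splits]) for i in range(n_splits)]


# 1265 is the number of entries listed for this dataset.
folds = seeded_folds(n_rows=1265)

# Each fold stores row positions, not material identities. If the rows were
# reordered, the same positions would point at different materials and the
# benchmark folds would no longer be comparable.
```

Calling `seeded_folds` twice with the same arguments returns identical folds, which is why the row order is the only remaining source of variation.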
Reference
Petretto, G. et al. High-throughput density functional perturbation theory phonons for inorganic materials. Sci. Data 5:180065 doi: 10.1038/sdata.2018.65 (2018). Petretto, G. et al. High-throughput density functional perturbation theory phonons for inorganic materials. (2018). figshare. Collection.
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@Article{Petretto2018, author={Petretto, Guido and Dwaraknath, Shyam and P.C. Miranda, Henrique and Winston, Donald and Giantomassi, Matteo and van Setten, Michiel J. and Gonze, Xavier and Persson, Kristin A. and Hautier, Geoffroy and Rignanese, Gian-Marco}, title={High-throughput density-functional perturbation theory phonons for inorganic materials}, journal={Scientific Data}, year={2018}, month={May}, day={01}, publisher={The Author(s)}, volume={5}, pages={180065}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2018.65} }
@misc{petretto_dwaraknath_miranda_winston_giantomassi_rignanese_van setten_gonze_persson_hautier_2018, title={High-throughput Density-Functional Perturbation Theory phonons for inorganic materials}, url={https://figshare.com/collections/High-throughput_Density-Functional_Perturbation_Theory_phonons_for_inorganic_materials/3938023/1}, DOI={10.6084/m9.figshare.c.3938023.v1}, abstractNote={The knowledge of the vibrational properties of a material is of key importance to understand physical phenomena such as thermal conductivity, superconductivity, and ferroelectricity among others. However, detailed experimental phonon spectra are available only for a limited number of materials which hinders the large-scale analysis of vibrational properties and their derived quantities. In this work, we perform ab initio calculations of the full phonon dispersion and vibrational density of states for 1521 semiconductor compounds in the harmonic approximation based on density functional perturbation theory. The data is collected along with derived dielectric and thermodynamic properties. We present the procedure used to obtain the results, the details of the provided database and a validation based on the comparison with experimental data.}, publisher={figshare}, author={Petretto, Guido and Dwaraknath, Shyam and Miranda, Henrique P. C. and Winston, Donald and Giantomassi, Matteo and Rignanese, Gian-Marco and Van Setten, Michiel J. and Gonze, Xavier and Persson, Kristin A and Hautier, Geoffroy}, year={2018}, month={Apr}}
matbench_steels¶
Matbench v0.1 test dataset for predicting steel yield strengths from chemical composition alone. Retrieved from Citrine Informatics. Deduplicated. For benchmarking with nested cross validation, the order of the dataset must be identical to the retrieved data; refer to the Automatminer/Matbench publication for more details.
Number of entries: 312
Column |
Description |
---|---|
|
Chemical formula. |
|
Target variable. Experimentally measured steel yield strengths, in MPa. |
Reference
https://citrination.com/datasets/153092/
Bibtex Formatted Citations
@Article{Dunn2020, author={Dunn, Alexander and Wang, Qi and Ganose, Alex and Dopp, Daniel and Jain, Anubhav}, title={Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm}, journal={npj Computational Materials}, year={2020}, month={Sep}, day={15}, volume={6}, number={1}, pages={138}, abstract={We present a benchmark test suite and an automated machine learning procedure for evaluating supervised machine learning (ML) models for predicting properties of inorganic bulk materials. The test suite, Matbench, is a set of 13{\thinspace}ML tasks that range in size from 312 to 132k samples and contain data from 10 density functional theory-derived and experimental sources. Tasks include predicting optical, thermal, electronic, thermodynamic, tensile, and elastic properties given a material's composition and/or crystal structure. The reference algorithm, Automatminer, is a highly-extensible, fully automated ML pipeline for predicting materials properties from materials primitives (such as composition and crystal structure) without user intervention or hyperparameter tuning. We test Automatminer on the Matbench test suite and compare its predictive power with state-of-the-art crystal graph neural networks and a traditional descriptor-based Random Forest model. We find Automatminer achieves the best performance on 8 of 13 tasks in the benchmark. We also show our test suite is capable of exposing predictive advantages of each algorithm---namely, that crystal graph methods appear to outperform traditional machine learning methods given {\textasciitilde}104 or greater data points. We encourage evaluating materials ML algorithms on the Matbench benchmark and comparing them against the latest version of Automatminer.}, issn={2057-3960}, doi={10.1038/s41524-020-00406-3}, url={https://doi.org/10.1038/s41524-020-00406-3} }
@misc{Citrine Informatics, title = {Mechanical properties of some steels}, howpublished = {\url{https://citrination.com/datasets/153092/}, }
mp_all_20181018¶
A complete copy of the Materials Project database as of 10/18/2018. mp_all files contain structure data for each material while mp_nostruct does not.
Number of entries: 83989
Column |
Description |
---|---|
|
in GPa, average of Voigt, Reuss, and Hill |
|
Formation energy per atom (eV) |
|
The calculated energy above the convex hull, in eV per atom |
|
The ratio of elastic anisotropy. |
|
The chemical formula of the MP entry |
|
The band gap in eV calculated with PBE-DFT functional |
|
A Pymatgen Structure object describing the material crystal structure prior to relaxation |
|
(input): The Materials Project mpid, as a string. |
|
The total magnetization of the unit cell. |
|
in GPa, average of Voigt, Reuss, and Hill |
|
A Pymatgen Structure object describing the material crystal structure |
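The two modulus columns above most likely report the Voigt-Reuss-Hill (VRH) average, which is conventionally the arithmetic mean of the Voigt (upper) and Reuss (lower) bounds. That reading is an assumption about how the values were produced, and the function name and input values below are hypothetical:

```python
def vrh_average(voigt_gpa, reuss_gpa):
    """Voigt-Reuss-Hill average of a modulus, in the same units as the inputs.

    Hypothetical helper: the Hill estimate is taken as the arithmetic mean
    of the Voigt (upper) and Reuss (lower) bounds.
    """
    return 0.5 * (voigt_gpa + reuss_gpa)


# Hypothetical bounds for a bulk modulus, in GPa.
k_vrh = vrh_average(voigt_gpa=100.0, reuss_gpa=90.0)
```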
Reference
A. Jain*, S.P. Ong*, G. Hautier, W. Chen, W.D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, K.A. Persson (*=equal contributions) The Materials Project: A materials genome approach to accelerating materials innovation APL Materials, 2013, 1(1), 011002. doi:10.1063/1.4812323
Bibtex Formatted Citations
@article{Jain2013, author = {Jain, Anubhav and Ong, Shyue Ping and Hautier, Geoffroy and Chen, Wei and Richards, William Davidson and Dacek, Stephen and Cholia, Shreyas and Gunter, Dan and Skinner, David and Ceder, Gerbrand and Persson, Kristin a.}, doi = {10.1063/1.4812323}, issn = {2166532X}, journal = {APL Materials}, number = {1}, pages = {011002}, title = {{The Materials Project: A materials genome approach to accelerating materials innovation}}, url = {http://link.aip.org/link/AMPADS/v1/i1/p011002/s1\&Agg=doi}, volume = {1}, year = {2013} }
mp_nostruct_20181018¶
A complete copy of the Materials Project database as of 10/18/2018. mp_all files contain structure data for each material while mp_nostruct does not.
Number of entries: 83989
Column |
Description |
---|---|
|
in GPa, average of Voigt, Reuss, and Hill |
|
Formation energy per atom (eV) |
|
The calculated energy above the convex hull, in eV per atom |
|
The ratio of elastic anisotropy. |
|
The chemical formula of the MP entry |
|
The band gap in eV calculated with PBE-DFT functional |
|
(input): The Materials Project mpid, as a string. |
|
The total magnetization of the unit cell. |
|
in GPa, average of Voigt, Reuss, and Hill |
Reference
A. Jain*, S.P. Ong*, G. Hautier, W. Chen, W.D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, K.A. Persson (*=equal contributions) The Materials Project: A materials genome approach to accelerating materials innovation APL Materials, 2013, 1(1), 011002. doi:10.1063/1.4812323
Bibtex Formatted Citations
@article{Jain2013, author = {Jain, Anubhav and Ong, Shyue Ping and Hautier, Geoffroy and Chen, Wei and Richards, William Davidson and Dacek, Stephen and Cholia, Shreyas and Gunter, Dan and Skinner, David and Ceder, Gerbrand and Persson, Kristin a.}, doi = {10.1063/1.4812323}, issn = {2166532X}, journal = {APL Materials}, number = {1}, pages = {011002}, title = {{The Materials Project: A materials genome approach to accelerating materials innovation}}, url = {http://link.aip.org/link/AMPADS/v1/i1/p011002/s1\&Agg=doi}, volume = {1}, year = {2013} }
phonon_dielectric_mp¶
Phonon (lattice/atoms vibrations) and dielectric properties of 1296 compounds computed via ABINIT software package in the harmonic approximation based on density functional perturbation theory.
Number of entries: 1296
Column |
Description |
---|---|
|
A target variable of the dataset, electronic contribution to the calculated dielectric constant; unitless. |
|
A target variable of the dataset, total calculated dielectric constant. Unitless: it is a ratio over the dielectric constant at vacuum. |
|
The chemical formula of the material |
|
A target variable of the dataset, the frequency of the last calculated phonon density of states in 1/cm; may be used as an estimation of dominant longitudinal optical phonon frequency, a descriptor. |
|
The Materials Project identifier for the material |
|
A pymatgen Structure object describing the chemical structure of the material |
Reference
Petretto, G. et al. High-throughput density functional perturbation theory phonons for inorganic materials. Sci. Data 5:180065 doi: 10.1038/sdata.2018.65 (2018). Petretto, G. et al. High-throughput density functional perturbation theory phonons for inorganic materials. (2018). figshare. Collection.
Bibtex Formatted Citations
@Article{Petretto2018, author={Petretto, Guido and Dwaraknath, Shyam and P.C. Miranda, Henrique and Winston, Donald and Giantomassi, Matteo and van Setten, Michiel J. and Gonze, Xavier and Persson, Kristin A. and Hautier, Geoffroy and Rignanese, Gian-Marco}, title={High-throughput density-functional perturbation theory phonons for inorganic materials}, journal={Scientific Data}, year={2018}, month={May}, day={01}, publisher={The Author(s)}, volume={5}, pages={180065}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2018.65} }
@misc{petretto_dwaraknath_miranda_winston_giantomassi_rignanese_van setten_gonze_persson_hautier_2018, title={High-throughput Density-Functional Perturbation Theory phonons for inorganic materials}, url={https://figshare.com/collections/High-throughput_Density-Functional_Perturbation_Theory_phonons_for_inorganic_materials/3938023/1}, DOI={10.6084/m9.figshare.c.3938023.v1}, abstractNote={The knowledge of the vibrational properties of a material is of key importance to understand physical phenomena such as thermal conductivity, superconductivity, and ferroelectricity among others. However, detailed experimental phonon spectra are available only for a limited number of materials which hinders the large-scale analysis of vibrational properties and their derived quantities. In this work, we perform ab initio calculations of the full phonon dispersion and vibrational density of states for 1521 semiconductor compounds in the harmonic approximation based on density functional perturbation theory. The data is collected along with derived dielectric and thermodynamic properties. We present the procedure used to obtain the results, the details of the provided database and a validation based on the comparison with experimental data.}, publisher={figshare}, author={Petretto, Guido and Dwaraknath, Shyam and Miranda, Henrique P. C. and Winston, Donald and Giantomassi, Matteo and Rignanese, Gian-Marco and Van Setten, Michiel J. and Gonze, Xavier and Persson, Kristin A and Hautier, Geoffroy}, year={2018}, month={Apr}}
piezoelectric_tensor¶
941 structures with piezoelectric properties, calculated with DFT-PBE.
Number of entries: 941
Column |
Description |
---|---|
|
optional: Description string for structure |
|
Piezoelectric modulus |
|
Chemical formula of the material |
|
Materials Project ID of the material |
|
optional, metadata descriptor of the datapoint |
|
The # of atoms in the unit cell of the calculation. |
|
Tensor describing the piezoelectric properties of the material |
|
Descriptor of crystallographic structure of the material |
|
optional: Poscar metadata |
|
Integer specifying the crystallographic structure of the material |
|
pandas Series defining the structure of the material |
|
Crystallographic direction |
|
Volume of the unit cell in cubic angstroms. For supercell calculations, this quantity refers to the volume of the full supercell. |
Reference
de Jong, M., Chen, W., Geerlings, H., Asta, M. & Persson, K. A. A database to enable discovery and design of piezoelectric materials. Sci. Data 2, 150053 (2015)
Bibtex Formatted Citations
@Article{deJong2015, author={de Jong, Maarten and Chen, Wei and Geerlings, Henry and Asta, Mark and Persson, Kristin Aslaug}, title={A database to enable discovery and design of piezoelectric materials}, journal={Scientific Data}, year={2015}, month={Sep}, day={29}, publisher={The Author(s)}, volume={2}, pages={150053}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2015.53} }
ricci_boltztrap_mp_tabular¶
Ab-initio electronic transport database for inorganic materials. Complex multivariable BoltzTraP simulation data is condensed into tabular form around two main motifs: average eigenvalues at set moderate carrier concentrations and temperatures, and optimal values among all carrier concentrations and temperatures within certain ranges. The averages of the eigenvalues of the conductivity effective mass (mₑᶜᵒⁿᵈ), the Seebeck coefficient (S), the conductivity (σ), the electronic thermal conductivity (κₑ), and the power factor (PF) are reported at a doping level of 10¹⁸ cm⁻³ and a temperature of 300 K for n- and p-type. The maximum values for S, σ, and PF, and the minimum value for κₑ, chosen among the temperatures [100, 1300] K, the doping levels [10¹⁶, 10²¹] cm⁻³, and the doping types, are also reported. The properties that depend on the relaxation time are reported divided by the constant value 10⁻¹⁴. The average of the eigenvalues for all the properties at all the temperatures, doping levels, and doping types are reported in the tables for each entry. Data is indexed by Materials Project ID (mpid).
Number of entries: 47737
Column |
Description |
---|---|
|
Materials project task_id. |
|
Type of DFT functional (GGA=generalized gradient approximation, GGA+U=GGA + U approximation) |
|
If True, crystal is a metal. |
|
Band gap, in eV. |
|
Unit cell volume, in cubic angstrom. |
|
Average eigenvalue of the Seebeck coefficient with hole concentration of 10^18 carriers/cm^3 (p-type) at 300K, in microVolts/Kelvin. |
|
Average eigenvalue of the Seebeck coefficient with electron concentration of 10^18 carriers/cm^3 (n-type) at 300K, in microVolts/Kelvin. |
|
Value of p-type Seebeck coefficient at maximum average eigenvalue of Seebeck coefficient chosen among temperatures 100-1300K, doping levels 10^16-10^21cm^-3. |
|
Temperature corresponding to Sᵉ.p.v [µV/K] (max p-type Seebeck), in Kelvin. |
|
Carrier concentration corresponding to Sᵉ.p.v [µV/K] (max p-type Seebeck), in cm^-3 |
|
Value of n-type Seebeck coefficient at maximum average eigenvalue of Seebeck coefficient chosen among temperatures 100-1300K, doping levels 10^16-10^21cm^-3. |
|
Temperature corresponding to Sᵉ.n.v [µV/K] (max n-type Seebeck), in Kelvin. |
|
Carrier concentration corresponding to Sᵉ.n.v [µV/K] (max n-type Seebeck), in cm^-3 |
|
Average eigenvalue of the conductivity with hole concentration of 10^18 carriers/cm^3 (p-type) at 300K, in 1/Ω/m/s. |
|
Average eigenvalue of the conductivity with electron concentration of 10^18 carriers/cm^3 (n-type) at 300K, in 1/Ω/m/s. |
|
Average eigenvalue of the power factor with hole concentration of 10^18 carriers/cm^3 (p-type) at 300K, in µW/cm/K²/s. |
|
Average eigenvalue of the power factor with electron concentration of 10^18 carriers/cm^3 (n-type) at 300K, in µW/cm/K²/s. |
|
Value of p-type conductivity at maximum average eigenvalue of conductivity chosen among temperatures 100-1300K, doping levels 10^16-10^21cm^-3. |
|
Temperature corresponding to σᵉ.p.T [1/Ω/m/s], in Kelvin. |
|
Carrier concentration corresponding to σᵉ.p.T [1/Ω/m/s], in cm^-3. |
|
Value of n-type conductivity at maximum average eigenvalue of conductivity chosen among temperatures 100-1300K, doping levels 10^16-10^21cm^-3. |
|
Temperature corresponding to σᵉ.n.T [1/Ω/m/s], in Kelvin. |
|
Carrier concentration corresponding to σᵉ.n.T [1/Ω/m/s], in cm^-3. |
|
Value of p-type power factor at maximum average eigenvalue of power factor chosen among temperatures 100-1300K, doping levels 10^16-10^21cm^-3. |
|
Temperature corresponding to PFᵉ.p.v [µW/cm/K²/s], in Kelvin. |
|
Carrier concentration corresponding to PFᵉ.p.v [µW/cm/K²/s], in cm^-3. |
|
Value of n-type power factor at maximum average eigenvalue of power factor chosen among temperatures 100-1300K, doping levels 10^16-10^21cm^-3. |
|
Temperature corresponding to PFᵉ.n.v [µW/cm/K²/s], in Kelvin. |
|
Carrier concentration corresponding to PFᵉ.n.v [µW/cm/K²/s], in cm^-3. |
|
Average eigenvalue of electronic thermal conductivity with hole concentration of 10^18 carriers/cm^3 (p-type) at 300K, in [W/K/m/s]. |
|
Average eigenvalue of electronic thermal conductivity with electron concentration of 10^18 carriers/cm^3 (n-type) at 300K, in [W/K/m/s]. |
|
Value of p-type electronic thermal conductivity at maximum average eigenvalue of electronic thermal conductivity chosen among temperatures 100-1300K, doping levels 10^16-10^21cm^-3. |
|
Temperature corresponding to κₑᵉ.p.v [W/K/m/s], in Kelvin. |
|
Carrier concentration corresponding to κₑᵉ.p.v [W/K/m/s], in cm^-3. |
|
Value of n-type electronic thermal conductivity at maximum average eigenvalue of electronic thermal conductivity chosen among temperatures 100-1300K, doping levels 10^16-10^21cm^-3. |
|
Temperature corresponding to κₑᵉ.n.v [W/K/m/s], in Kelvin. |
|
Carrier concentration corresponding to κₑᵉ.n.v [W/K/m/s], in cm^-3. |
|
Average (ε̄) eigenvalue of conductivity effective mass with hole concentration of 10^18 carriers/cm^3 (p-type) at 300K. |
|
1st eigenvalue of conductivity effective mass with hole concentration of 10^18 carriers/cm^3 (p-type) at 300K. |
|
2nd eigenvalue of conductivity effective mass with hole concentration of 10^18 carriers/cm^3 (p-type) at 300K. |
|
3rd eigenvalue of conductivity effective mass with hole concentration of 10^18 carriers/cm^3 (p-type) at 300K. |
|
Average (ε̄) eigenvalue of conductivity effective mass with electron concentration of 10^18 carriers/cm^3 (n-type) at 300K. |
|
1st eigenvalue of conductivity effective mass with electron concentration of 10^18 carriers/cm^3 (n-type) at 300K. |
|
2nd eigenvalue of conductivity effective mass with electron concentration of 10^18 carriers/cm^3 (n-type) at 300K. |
|
3rd eigenvalue of conductivity effective mass with electron concentration of 10^18 carriers/cm^3 (n-type) at 300K. |
|
Pymatgen structure, taken from Materials Project April 2021 |
|
Formula for composition corresponding to MPID. |
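The per-second units in the table (for example 1/Ω/m/s for conductivity) suggest that the relaxation-time-dependent columns hold values per unit relaxation time. Under that reading, which is an assumption rather than something the table states outright, an absolute value is recovered by multiplying by a chosen relaxation time. The sketch below uses the 10⁻¹⁴ constant from the description, assumed to be in seconds; the function name and the input value are hypothetical.

```python
# Constant relaxation time quoted in the dataset description,
# assumed here to be expressed in seconds.
TAU_REFERENCE_S = 1e-14


def scale_by_relaxation_time(value_per_tau, tau_s=TAU_REFERENCE_S):
    """Convert a per-relaxation-time quantity into an absolute one.

    Hypothetical helper: multiplies a tabulated value (e.g. sigma/tau in
    1/Ohm/m/s) by a relaxation time in seconds (giving sigma in 1/Ohm/m).
    """
    return value_per_tau * tau_s


# Hypothetical tabulated conductivity per relaxation time, in 1/Ohm/m/s.
sigma_per_tau = 2.0e18
sigma = scale_by_relaxation_time(sigma_per_tau)  # in 1/Ohm/m
```

Passing a different `tau_s` shows how strongly the absolute conductivity, power factor, and electronic thermal conductivity depend on the relaxation-time assumption.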
Reference
Ricci, F. et al. An ab initio electronic transport database for inorganic materials. Sci. Data 4:170085 doi: 10.1038/sdata.2017.85 (2017). Ricci F, Chen W, Aydemir U, Snyder J, Rignanese G, Jain A, Hautier G (2017) Data from: An ab initio electronic transport database for inorganic materials. Dryad Digital Repository. https://doi.org/10.5061/dryad.gn001
Bibtex Formatted Citations
@Article{Ricci2017, author={Ricci, Francesco and Chen, Wei and Aydemir, Umut and Snyder, G. Jeffrey and Rignanese, Gian-Marco and Jain, Anubhav and Hautier, Geoffroy}, title={An ab initio electronic transport database for inorganic materials}, journal={Scientific Data}, year={2017}, month={Jul}, day={04}, publisher={The Author(s)}, volume={4}, pages={170085}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2017.85} }
@misc{dryad_gn001, title = {Data from: An ab initio electronic transport database for inorganic materials}, author = {Ricci, F and Chen, W and Aydemir, U and Snyder, J and Rignanese, G and Jain, A and Hautier, G}, year = {2017}, journal = {Scientific Data}, URL = {https://doi.org/10.5061/dryad.gn001}, doi = {doi:10.5061/dryad.gn001}, publisher = {Dryad Digital Repository} }
steel_strength¶
312 steels with experimental yield strength and ultimate tensile strength, extracted and cleaned (including de-duplicating) from Citrine.
Number of entries: 312
Column |
Description |
---|---|
|
weight percent of Al |
|
weight percent of C |
|
weight percent of Co |
|
weight percent of Cr |
|
elongation in % |
|
Chemical formula of the entry |
|
weight percent of Mn |
|
weight percent of Mo |
|
weight percent of N |
|
weight percent of Nb |
|
weight percent of Ni |
|
weight percent of Si |
|
ultimate tensile strength in MPa |
|
weight percent of Ti |
|
weight percent of V |
|
weight percent of W |
|
yield strength in MPa |
Reference
https://citrination.com/datasets/153092/
Bibtex Formatted Citations
@misc{Citrine Informatics, title = {Mechanical properties of some steels}, howpublished = {\url{https://citrination.com/datasets/153092/}, }
superconductivity2018¶
Dataset of ~16,000 experimental superconductivity records (critical temperatures) from Stanev et al., originally from the Japanese National Institute for Materials Science. Does not include structural data. Includes ~300 measurements from materials found without superconductivity (Tc=0). No modifications were made to the core dataset, aside from basic file type change to json for (un)packaging with matminer. Reproduced under the Creative Commons 4.0 license, which can be found here: http://creativecommons.org/licenses/by/4.0/.
Number of entries: 16414
Column |
Description |
---|---|
|
Chemical formula. |
|
Experimental superconducting temperature, in K. |
Reference
https://doi.org/10.1038/s41524-018-0085-8
Bibtex Formatted Citations
@article{Stanev2018, doi = {10.1038/s41524-018-0085-8}, url = {https://doi.org/10.1038/s41524-018-0085-8}, year = {2018}, month = jun, publisher = {Springer Science and Business Media {LLC}}, volume = {4}, number = {1}, author = {Valentin Stanev and Corey Oses and A. Gilad Kusne and Efrain Rodriguez and Johnpierre Paglione and Stefano Curtarolo and Ichiro Takeuchi}, title = {Machine learning modeling of superconducting critical temperature}, journal = {npj Computational Materials} }
@misc{NIMSSuperCon, howpublished={http://supercon.nims.go.jp/index_en.html}, title={SuperCon}, author={National Institute of Materials Science, Materials Information Station}}
tholander_nitrides¶
A challenging data set for quantum machine learning containing a diverse set of 12.8k polymorphs in the Zn-Ti-N, Zn-Zr-N and Zn-Hf-N chemical systems. The phase diagrams of the Ti-Zn-N, Zr-Zn-N, and Hf-Zn-N systems are determined using large-scale high-throughput density functional calculations (DFT-GGA) (PBE). In total 12,815 relaxed structures are shared alongside their energy calculated using the VASP DFT code. The High-Throughput Toolkit was used to manage the calculations. Data adapted and deduplicated from the original data on Zenodo at https://zenodo.org/record/5530535#.YjJ3ZhDMJLQ, published under MIT licence. Collated from separate files of chemical systems and deduplicated according to identical structures matching ht_ids. Prepared in collaboration with Rhys Goodall.
Number of entries: 12815
Column |
Description |
---|---|
|
Human readable identifier for each material. |
|
Unique identifier to track the calculation in httk |
|
A pymatgen structure object representing the structure before relaxation. |
|
A pymatgen structure object representing the structure after relaxation. |
|
The VASP calculated energy per atom for the final structure, in eV/atom |
|
The chemical system represented by the atoms actually contained in the structure |
Reference
https://zenodo.org/record/5530535#.YjJ3ZhDMJLQ
Bibtex Formatted Citations
@article{tholander2016strong, title={Strong piezoelectric response in stable TiZnN2, ZrZnN2, and HfZnN2 found by ab initio high-throughput approach}, author={Tholander, Christopher and Andersson, CBA and Armiento, Rickard and Tasnadi, Ferenc and Alling, Bj{\"o}rn}, journal={Journal of Applied Physics}, volume={120}, number={22}, pages={225102}, year={2016}, publisher={AIP Publishing LLC} }
ucsb_thermoelectrics¶
Database of ~1,100 experimental thermoelectric materials from UCSB aggregated from 108 source publications and personal communications. Downloaded from Citrine. Source UCSB webpage is http://www.mrl.ucsb.edu:8080/datamine/thermoelectric.jsp. See reference for more information on original data aggregation. No duplicate entries are present, but each source may contribute multiple measurements of the same material’s properties at different temperatures or conditions.
Number of entries: 1093
Column |
Description |
---|---|
|
Chemical formula. |
|
Either single crystal, polycrystalline, or nanoparticles. |
|
Brief string describing the synthesis method |
|
Spacegroup number, if available |
|
Electrical resistivity, in ohm.cm |
|
Seebeck coefficient, in microVolts/K, if available |
|
Thermoelectric power factor, conductivity * Seebeck^2, in [W/mK^2] if available |
|
Thermoelectric figure of merit, PF * T / κ (κ being the thermal conductivity), unitless, if available |
|
Thermal conductivity, in W/(m·K), if available |
|
Electrical conductivity, in Siemens/cm, if available |
|
Temperature in Kelvin at which these properties were obtained, if available |
|
Original source of the recording. To cite the aggregator of the data, see the bibtex_refs section of this metadata. |
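The power factor and figure of merit columns are related to the resistivity, Seebeck coefficient, thermal conductivity, and temperature columns by the standard thermoelectric definitions. The sketch below works through the unit conversions for the units listed in this table. The function names and input values are hypothetical, and whether the dataset's own values were derived this way or copied from the source publications is not stated here.

```python
def power_factor(resistivity_ohm_cm, seebeck_uv_per_k):
    """Power factor sigma * S^2 in W/(m K^2).

    Hypothetical helper: converts resistivity from ohm.cm to ohm.m and the
    Seebeck coefficient from microvolts/K to volts/K before combining them.
    """
    conductivity_s_per_m = 1.0 / (resistivity_ohm_cm * 1e-2)
    seebeck_v_per_k = seebeck_uv_per_k * 1e-6
    return conductivity_s_per_m * seebeck_v_per_k ** 2


def figure_of_merit(pf_w_per_m_k2, temperature_k, kappa_w_per_m_k):
    """Dimensionless figure of merit zT = PF * T / kappa."""
    return pf_w_per_m_k2 * temperature_k / kappa_w_per_m_k


# Hypothetical measurement: 1e-3 ohm.cm, 200 microvolts/K, 1.2 W/(m K), 300 K.
pf = power_factor(resistivity_ohm_cm=1e-3, seebeck_uv_per_k=200.0)
zt = figure_of_merit(pf, temperature_k=300.0, kappa_w_per_m_k=1.2)
```

Recomputing these quantities from the raw columns is one way to sanity-check rows where the power factor or figure of merit is missing or looks inconsistent.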
Reference
https://citrination.com/datasets/150557/
Bibtex Formatted Citations
@article{Gaultois2013, doi = {10.1021/cm400893e}, url = {https://doi.org/10.1021/cm400893e}, year = {2013}, month = may, publisher = {American Chemical Society ({ACS})}, volume = {25}, number = {15}, pages = {2911--2920}, author = {Michael W. Gaultois and Taylor D. Sparks and Christopher K. H. Borg and Ram Seshadri and William D. Bonificio and David R. Clarke}, title = {Data-Driven Review of Thermoelectric Materials: Performance and Resource Considerations}, journal = {Chemistry of Materials} }
@misc{Citrine Informatics, title = {UCSB Thermoelectrics Database}, howpublished = {\url{https://citrination.com/datasets/150557/}, }
wolverton_oxides¶
4,914 perovskite oxides containing composition data, lattice constants, and formation + vacancy formation energies. All perovskites are of the form ABO3. Adapted from a dataset presented by Emery and Wolverton.
Number of entries: 4914
Column |
Description |
---|---|
|
Lattice parameter a, in A (angstrom) |
|
Lattice angle alpha, in degrees |
|
The atom in the ‘A’ site of the perovskite. |
|
The atom in the ‘B’ site of the perovskite. |
|
Lattice parameter b, in A (angstrom) |
|
Lattice angle beta, in degrees |
|
Lattice parameter c, in A (angstrom) |
|
Formation energy in eV |
|
Formation energy of oxygen vacancy (eV) |
|
Energy above convex hull, wrt. OQMD db (eV) |
|
Chemical formula of the entry |
|
Lattice angle gamma, in degrees |
|
Bandgap in eV from PBE calculations |
|
Local distortion crystal structure with lowest energy among all considered distortions. |
|
Magnetic moment |
|
Volume per atom (A^3/atom) |
Reference
Emery, A. A. & Wolverton, C. High-throughput DFT calculations of formation energy, stability and oxygen vacancy formation energy of ABO3 perovskites. Sci. Data 4:170153 doi: 10.1038/sdata.2017.153 (2017). Emery, A. A., & Wolverton, C. Figshare http://dx.doi.org/10.6084/m9.figshare.5334142 (2017)
Bibtex Formatted Citations
@Article{Emery2017, author={Emery, Antoine A. and Wolverton, Chris}, title={High-throughput DFT calculations of formation energy, stability and oxygen vacancy formation energy of ABO3 perovskites}, journal={Scientific Data}, year={2017}, month={Oct}, day={17}, publisher={The Author(s)}, volume={4}, pages={170153}, note={Data Descriptor}, url={http://dx.doi.org/10.1038/sdata.2017.153} }
@misc{emery_2017, title={High-throughput DFT calculations of formation energy, stability and oxygen vacancy formation energy of ABO3 perovskites}, url={https://figshare.com/articles/High-throughput_DFT_calculations_of_formation_energy_stability_and_oxygen_vacancy_formation_energy_of_ABO3_perovskites/5334142/1}, DOI={10.6084/m9.figshare.5334142.v1}, abstractNote={ABO3 perovskites are oxide materials that are used for a variety of applications such as solid oxide fuel cells, piezo-, ferro-electricity and water splitting. Due to their remarkable stability with respect to cation substitution, new compounds for such applications potentially await discovery. In this work, we present an exhaustive dataset of formation energies of 5,329 cubic and distorted perovskites that were calculated using first-principles density functional theory. In addition to formation energies, several additional properties such as oxidation states, band gap, oxygen vacancy formation energy, and thermodynamic stability with respect to all phases in the Open Quantum Materials Database are also made publicly available. This large dataset for this ubiquitous crystal structure type contains 395 perovskites that are predicted to be thermodynamically stable, of which many have not yet been experimentally reported, and therefore represent theoretical predictions. The dataset thus opens avenues for future use, including materials discovery in many research-active areas.}, publisher={figshare}, author={Emery, Antoine}, year={2017}, month={Aug}}