MatBench v0.1 benchmark
Overview
Matbench is an ImageNet for materials science: a set of 13 supervised, pre-cleaned, ready-to-use ML tasks for benchmarking and fair comparison. The tasks span a wide range of inorganic materials science applications.
You can find details and results on the benchmark in our paper, "Benchmarking materials property prediction methods: the Matbench test set and Automatminer reference algorithm". Please consider citing this paper if you use Matbench v0.1 for benchmarking, comparison, or prototyping.
Leaderboard
| task name | verified top score (MAE or ROCAUC) | algorithm name, config | is algorithm general purpose? (same config on all problems) |
|---|---|---|---|
|  | 0.299 (unitless) | Automatminer express v1.0.3.2019111 | yes |
|  | 0.416 eV | Automatminer express v1.0.3.2019111 | yes |
|  | 0.92 | Automatminer express v1.0.3.2019111 | yes |
|  | 0.861 | Automatminer express v1.0.3.2019111 | yes |
|  | 38.6 meV/atom | Automatminer express v1.0.3.2019111 | yes |
|  | 0.0849 log(GPa) | Automatminer express v1.0.3.2019111 | yes |
|  | 0.0679 log(GPa) | Automatminer express v1.0.3.2019111 | yes |
|  | 0.0327 eV/atom | MEGNet v0.2.2 | yes, structure only |
|  | 0.228 eV | CGCNN (2019) | yes, structure only |
|  | 0.977 | MEGNet v0.2.2 | yes, structure only |
|  | 0.0417 | MEGNet v0.2.2 | yes, structure only |
|  | 36.9 cm^-1 | MEGNet v0.2.2 | yes, structure only |
|  | 95.2 MPa | Automatminer express v1.0.3.2019111 | yes |
Accessing the ML tasks
There are three ways to access the Matbench problems:
Programmatically, via the matminer datasets repository. Recommended for benchmarking and test usage. See the code examples in the following sections for details on this process.
Interactively, through the Materials Project MPContribs-ML Deployment; links to each dataset are in the table below.
Via static download links (given in table).
Here’s a full list of the 13 datasets in Matbench v0.1:
| task name | target column (unit) | number of samples | task type | links |
|---|---|---|---|---|
|  |  | 4764 | regression |  |
|  |  | 4604 | regression |  |
|  |  | 4921 | classification |  |
|  |  | 5680 | classification |  |
|  |  | 636 | regression |  |
|  |  | 10987 | regression |  |
|  |  | 10987 | regression |  |
|  |  | 132752 | regression |  |
|  |  | 106113 | regression |  |
|  |  | 106113 | classification |  |
|  |  | 18928 | regression |  |
|  |  | 1265 | regression |  |
|  |  | 312 | regression |  |
Getting dataset info
You can get more info (such as the meaning of column names, brief cleaning procedures, etc.) on a dataset with matminer.datasets.get_all_dataset_info:
from matminer.datasets import get_all_dataset_info
# Get dataset info from matminer
info = get_all_dataset_info("matbench_steels")
# Check out the info about the dataset.
print(info)
Dataset: matbench_steels
Description: Matbench v0.1 dataset for predicting steel yield strengths from chemical composition alone. Retrieved from Citrine informatics. Deduplicated.
Columns:
composition: Chemical formula.
yield strength: Target variable. Experimentally measured steel yield strengths, in MPa.
Num Entries: 312
Reference: https://citrination.com/datasets/153092/
Bibtex citations: ['@misc{Citrine Informatics,\ntitle = {Mechanical properties of some steels},\nhowpublished = {\\url{https://citrination.com/datasets/153092/},\n}']
File type: json.gz
Figshare URL: https://ml.materialsproject.org/projects/matbench_steels.json.gz
You can also view all the Matbench datasets on the matminer Dataset Summary page (search for “matbench”).
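The same search can be done in code. The following is a minimal sketch, assuming that matminer's `get_available_datasets` returns the list of dataset names and that every Matbench task name starts with `matbench_` (as `matbench_steels` and `matbench_dielectric` do on this page). The helper `select_matbench` and the fallback list are illustrative only and are not part of matminer.

```python
def select_matbench(names):
    """Keep only dataset names that start with the 'matbench_' prefix."""
    return sorted(name for name in names if name.startswith("matbench_"))


try:
    from matminer.datasets import get_available_datasets

    # print_format=None suppresses the printed summary; names are still returned.
    all_names = get_available_datasets(print_format=None)
except ImportError:
    # Fallback so the sketch still runs without matminer installed:
    # two names taken from this page plus one hypothetical non-Matbench name.
    all_names = ["matbench_steels", "matbench_dielectric", "some_other_dataset"]

matbench_tasks = select_matbench(all_names)
print(matbench_tasks)
```

With matminer installed, the printed list should contain the Matbench task names known to that matminer version.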
(Down)loading datasets
While you can download the zipped json datasets via the download links above, we recommend using matminer’s tools to load datasets. Matminer intelligently manages the dataset downloads in its central folder and provides methods for robustly loading dataframes containing pymatgen primitives such as structures.
You can load the datasets with the matminer.datasets.load_dataset function; the function accepts the dataset name as an argument.
Here’s an example of loading the Matbench task for predicting refractive index (calculated with DFPT) from crystal structure.
from matminer.datasets import load_dataset
# Download and load the dataset
# The dataset is stored locally after being downloaded the first time
df = load_dataset("matbench_dielectric")
# Check out the downloaded dataframe
print(df)
structure n
0 [[4.29304147 2.4785886 1.07248561] S, [4.2930... 1.752064
1 [[3.95051434 4.51121437 0.28035002] K, [4.3099... 1.652859
2 [[-1.78688104 4.79604117 1.53044621] Rb, [-1... 1.867858
3 [[4.51438064 4.51438064 0. ] Mn, [0.133... 2.676887
4 [[-4.36731958 6.8886097 0.50929706] Li, [-2... 1.793232
... ...
4759 [[ 2.79280881 0.12499663 -1.84045389] Ca, [-2... 2.136837
4760 [[0. 5.50363806 3.84192106] O, [4.7662... 2.690619
4761 [[0. 0. 0.] Ba, [ 0.23821924 4.32393487 -0.35... 2.811494
4762 [[0. 0.18884638 0. ] K, [0. ... 1.832887
4763 [[0. 0. 0.] Cs, [2.80639641 2.80639641 2.80639... 2.559279
[4764 rows x 2 columns]
This loads the dataframe in this format:
df (matbench_dielectric)

| structure | n |
|---|---|
|  | 1.752064 |
|  | 1.652859 |
|  | 1.867858 |
| … | … |
Note: Larger datasets will take several minutes to load.
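Once a dataframe is loaded, the inputs and the target column usually need to be separated before modelling. Here is a minimal sketch of that step, using a tiny stand-in dataframe instead of a real download. The helper `split_inputs_and_target` is hypothetical, and the placeholder strings stand in for the pymatgen structure objects that the real `structure` column holds.

```python
import pandas as pd


def split_inputs_and_target(df, target_column):
    """Return (inputs, target) while leaving the row order untouched."""
    inputs = df.drop(columns=[target_column])
    target = df[target_column]
    return inputs, target


# Stand-in for load_dataset("matbench_dielectric"); the strings are placeholders
# for structure objects, and the n values are copied from the printout above.
demo_df = pd.DataFrame(
    {
        "structure": ["<structure 0>", "<structure 1>", "<structure 2>"],
        "n": [1.752064, 1.652859, 1.867858],
    }
)

inputs, target = split_inputs_and_target(demo_df, "n")
print(list(inputs.columns))
print(target.tolist())
```

For the real dataset, the same call would be `split_inputs_and_target(df, "n")` on the dataframe returned by `load_dataset`.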
Benchmarking and reporting your algorithm
Benchmarking on Matbench v0.1 is done exclusively with nested cross validation (NCV). See more details on NCV on the Advanced Usage page and the original publication.
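As a rough illustration of the idea, nested cross validation wraps a model-selection loop (inner) inside a scoring loop (outer), so each outer test fold is never seen during selection. The sketch below uses scikit-learn on synthetic data. The Ridge model, the alpha grid, and the 3-fold inner split are arbitrary assumptions for illustration, not the settings used for the leaderboard; only the outer 5-fold shuffled split with seed 18012019 follows this page.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic regression data standing in for a featurized Matbench task.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=120)

# Inner loop: hyperparameter selection on training data only (assumed settings).
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=inner_cv,
    scoring="neg_mean_absolute_error",
)

# Outer loop: 5 shuffled folds; each test fold is used for scoring only.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=18012019)
neg_mae = cross_val_score(
    search, X, y, cv=outer_cv, scoring="neg_mean_absolute_error"
)
outer_maes = -neg_mae  # scikit-learn reports negated MAE for this scorer
print(outer_maes, outer_maes.mean())
```

Passing the `GridSearchCV` object to `cross_val_score` refits the whole search inside every outer training split, which is what keeps the outer test data out of model selection.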
If you want to evaluate your own algorithm (outside of the Automatminer framework) and compare to the scores on this page, please use the following steps:
Download the dataset programmatically through matminer (instructions above). Note the dataset must be used in the exact order in which it was downloaded.
Generate test folds: Use the scikit-learn KFold (5 splits, shuffled, random seed 18012019) for regression problems and StratifiedKFold (5 splits, shuffled, random seed 18012019) for classification problems.
For each fold:
Train, validate, and select your best model using this fold’s set of training data only. After training and validating, no modifications may be made to the model based on the test set of this fold.
Remove the target variable column from the test set. Use this model to predict the test set. Note: this test data is for reporting only, and cannot be used for validation or training within this fold.
Record the mean MAE or ROC-AUC for each fold’s test set. Save the test fold data.
Save your model.
Post your results for verification. Make a post on the discussion forum with the tag [Matbench] in the title. Once your results are verified, your algorithm will appear on the leaderboard!
If you are benchmarking a general-purpose algorithm, please include results for all Matbench v0.1 datasets.
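The fold generation and scoring steps above can be sketched as follows. This is a minimal illustration on synthetic data, not a reference implementation: the helpers `make_test_folds` and `score_test_folds` are hypothetical names, and the linear and logistic models stand in for whatever model you select within each fold. Rows of `X` and `y` are assumed to be in the order in which the dataset was loaded, since the fixed seed only reproduces the same folds for the same row order.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_absolute_error, roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold

SEED = 18012019  # random seed given in the steps above


def make_test_folds(task_type):
    """Return the 5-fold shuffled splitter for the given task type."""
    if task_type == "regression":
        return KFold(n_splits=5, shuffle=True, random_state=SEED)
    if task_type == "classification":
        return StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
    raise ValueError(f"unknown task type: {task_type!r}")


def score_test_folds(X, y, make_model, task_type):
    """Fit a fresh model per fold and return one test score per fold."""
    scores = []
    for train_idx, test_idx in make_test_folds(task_type).split(X, y):
        model = make_model()
        # Training (and any validation) may only touch the training rows.
        model.fit(X[train_idx], y[train_idx])
        if task_type == "regression":
            preds = model.predict(X[test_idx])
            scores.append(mean_absolute_error(y[test_idx], preds))
        else:
            # ROC-AUC is computed from the predicted probability of class 1.
            proba = model.predict_proba(X[test_idx])[:, 1]
            scores.append(roc_auc_score(y[test_idx], proba))
    return scores


# Synthetic stand-in data (hypothetical), not a Matbench dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_reg = X_demo @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
y_cls = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

mae_per_fold = score_test_folds(X_demo, y_reg, LinearRegression, "regression")
auc_per_fold = score_test_folds(X_demo, y_cls, LogisticRegression, "classification")
print(mae_per_fold, np.mean(mae_per_fold))
print(auc_per_fold, np.mean(auc_per_fold))
```

Each list holds one score per test fold, which is the per-fold record the steps ask you to keep alongside the saved test fold data.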