MatBench benchmark

Overview

MatBench is an ImageNet for materials science: a set of 13 machine learning benchmark problems for fair comparison across a wide range of inorganic materials science applications.

Details on the benchmark are coming soon in a publication we have submitted. Stay tuned for more details on the evaluation procedure, best scores, and more!

For now, you can still access the benchmark datasets. See the “Accessing MatBench” section for more info.

Accessing MatBench

We have made the MatBench benchmark publicly available via the matminer datasets repository (and also via Figshare).

You can download the datasets with the matminer.datasets.load_dataset function. The dataset names follow the pattern matbench_*, where * is the name of the benchmark problem.

For example, the following loads the MatBench benchmark for predicting refractive index (calculated with DFPT) from crystal structure:

from matminer.datasets import load_dataset

# Download and load the dataset
# The dataset is stored locally after being downloaded the first time
df = load_dataset("matbench_dielectric")

# Check out the downloaded dataframe
print(df)

df (matbench_dielectric)

structure            n
<structure object>   1.752064
<structure object>   1.652859
<structure object>   1.867858

Find all the MatBench problem names and info here (search for “matbench”).
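The problem names can also be listed from Python. The sketch below is one possible approach, not an official recipe. It assumes the installed matminer version exports get_available_datasets from matminer.datasets and that passing print_format=None returns the names without printing them. The helper names filter_matbench and list_matbench_datasets are hypothetical and were chosen for this illustration.

```python
def filter_matbench(names):
    """Return the sorted dataset names that start with the 'matbench_' prefix."""
    return sorted(name for name in names if name.startswith("matbench_"))


def list_matbench_datasets():
    """List the MatBench dataset names known to the installed matminer."""
    # Imported lazily so filter_matbench stays usable without matminer installed.
    from matminer.datasets import get_available_datasets

    # Assumption: print_format=None suppresses the printed summary
    # and the call returns a list of dataset name strings.
    return filter_matbench(get_available_datasets(print_format=None))
```

Calling list_matbench_datasets() should then give names that can be passed directly to load_dataset, such as "matbench_dielectric".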

Note: Larger datasets will take several minutes to load.