Automatminer is a tool for automatically creating complete machine learning pipelines for materials science, including automatic featurization with matminer, feature reduction, and an AutoML backend. Put in a materials dataset, get out a machine that predicts materials properties.

How it works

Automatminer automatically decorates a dataset using hundreds of descriptor techniques from matminer’s descriptor library, picks the most useful features for learning, and runs a separate AutoML pipeline. Once a pipeline has been fit, it can be summarized in a text file, saved to disk, or used to make predictions on new materials.


Automatminer uses pandas dataframes for all of its working objects. Put dataframes in, get dataframes out.
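
As a rough sketch of what such an input might look like, the snippet below builds small dataframes with pandas. The "composition" column name follows what we understand to be Automatminer's default input column for chemical formulas. The formulas, the band gap numbers, and the variable names are placeholders made up for illustration, not real data.

```python
import pandas as pd

# Made-up values for illustration only; these are not measured band gaps.
known_df = pd.DataFrame(
    {
        "composition": ["Fe2O3", "TiO2", "GaAs", "NaCl"],
        "band gap": [2.0, 3.0, 1.5, 8.0],
    }
)

# Materials whose target is unknown: same input column, no target column.
unknown_df = pd.DataFrame({"composition": ["ZnO", "SiC"]})

# Hold out part of the labeled data so predictions can be checked later.
train_df = known_df.sample(frac=0.75, random_state=0)
test_df = known_df.drop(train_df.index)

print(train_df.shape, test_df.shape, unknown_df.shape)
```

The labeled dataframe carries both the input column and the target column, while the dataframe of unknown materials carries only the input column.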


Here’s an example of training on known data and extending the model to out-of-sample data.

from automatminer.pipeline import MatPipe

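# Fit a pipeline to training data to predict band gap.
# train_df is a placeholder for a dataframe of known materials
# that includes a "band gap" column.
pipe = MatPipe()
pipe.fit(train_df, "band gap")

# Predict the band gap of some unknown materials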
predicted_df = pipe.predict(unknown_df)
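
Once a pipeline is fit, it can be summarized, saved, and reloaded, as described under "How it works". The following is a minimal sketch, assuming the MatPipe methods from_preset, summarize, save, and load as found in recent Automatminer releases. The helper function, the file names, and the choice of the "express" preset are hypothetical, so check the API documentation for your installed version.

```python
def fit_save_and_reload(train_df, unknown_df, target="band gap", path="mat.pipe"):
    """Fit a pipeline, write a summary, save it, and predict with a reloaded copy."""
    # Imported inside the function so the sketch can be read and defined
    # without automatminer installed; fitting itself can take a long time.
    from automatminer.pipeline import MatPipe

    # Assumption: "express" is one of the available presets.
    pipe = MatPipe.from_preset("express")
    pipe.fit(train_df, target)

    # Write a human-readable summary of the fitted pipeline (hypothetical file name).
    pipe.summarize("pipe_summary.txt")

    # Save the fitted pipeline to disk, then load it back as a new object.
    pipe.save(path)
    reloaded = MatPipe.load(path)

    # The returned dataframe should contain the predictions as an added column.
    return reloaded.predict(unknown_df)
```

Calling fit_save_and_reload(train_df, unknown_df) with dataframes like those in the example above would train a pipeline and return predictions from the reloaded copy, which is one way to confirm that a saved pipeline is usable on another machine.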


Automatminer can work with many kinds of data:

  • both computational and experimental data

  • small (~100 samples) to moderate (~100k samples) sized datasets

  • crystalline datasets

  • composition-only (i.e., unknown phases) datasets

  • datasets containing electronic bandstructures or density of states

Many kinds of target properties:

  • electronic

  • mechanical

  • thermodynamic

  • any other kind of property

And many featurization (descriptor) techniques:

See matminer’s Table of Featurizers for a full (and growing) list.

Automatminer is designed to be easy to use and reproducible

  • Save pipelines which are portable across machines

  • Fit a complete pipeline with 1 line of code

  • Predict on new samples with 1 line of code

  • Presets for easy setup

Automatminer is automatic and accurate

  • No hand tuning required

  • Comparable in accuracy to hand-tuned models in benchmark tests

What’s new?

Track changes to automatminer through the changelog.

Contributing / Contact / Support

Want to see something added or changed? Some ways to get involved are:

  • Help us improve the documentation – tell us where you got stuck and improve the install process for everyone.

  • Let us know if you’d like to see certain features.

  • Point us to areas of the code that are difficult to understand or use.

  • Contribute code! You can do this by forking Automatminer on GitHub and submitting a pull request.

  • Post to our support forum. Don’t be shy, we look forward to feedback!

See our contribution guidelines for more information. For a list of contributors, see our GitHub page.

Citing Automatminer or MatBench

If you find Automatminer or the MatBench benchmarks helpful in your research, please consider citing our publication in npj Computational Materials:

Dunn, A., Wang, Q., Ganose, A., Dopp, D., Jain, A. Benchmarking Materials Property Prediction
Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Computational Materials
6, 138 (2020).

API documentation

Autogenerated API documentation. Beware! Only for the brave.