Automatminer is a tool for automatically creating complete machine learning pipelines for materials science, including automatic featurization with matminer, feature reduction, and an AutoML backend. Put in a materials dataset, get out a machine that predicts materials properties.
How it works
Automatminer automatically decorates a dataset using hundreds of descriptor techniques from matminer’s descriptor library, picks the most useful features for learning, and runs a separate AutoML pipeline. Once a pipeline has been fit, it can be summarized in a text file, saved to disk, or used to make predictions on new materials.
Automatminer uses pandas dataframes for all of its working objects. Put dataframes in, get dataframes out.
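As a rough illustration of the dataframe-in, dataframe-out convention, the sketch below prepares a small table of materials and splits it into rows with a known target and rows without one. The column names (composition, band gap), the helper names, and the numeric values are assumptions made for this example; the values are approximate and purely illustrative, not a curated dataset. pandas is imported inside the helper so the rest of the snippet runs without it.

```python
# Illustrative records: one row per material. The "composition" column name
# and the approximate band gap values (in eV) are assumptions for this sketch.
RECORDS = [
    {"composition": "Si", "band gap": 1.1},
    {"composition": "GaAs", "band gap": 1.4},
    {"composition": "Ge", "band gap": 0.7},
    {"composition": "ZnO", "band gap": 3.3},
    {"composition": "GaN", "band gap": 3.4},
]


def split_records(records, target, n_unknown):
    """Hold out the last n_unknown rows and strip their target value."""
    if not 0 < n_unknown < len(records):
        raise ValueError("n_unknown must leave at least one training row")
    train = records[:-n_unknown]
    unknown = [
        {key: value for key, value in row.items() if key != target}
        for row in records[-n_unknown:]
    ]
    return train, unknown


def to_dataframe(rows):
    """Convert a list of row dicts into a pandas DataFrame."""
    import pandas as pd  # local import: only needed when building frames

    return pd.DataFrame(rows)


train_rows, unknown_rows = split_records(RECORDS, target="band gap", n_unknown=2)
```

Calling to_dataframe(train_rows) would give a frame with both a composition column and a target column, while to_dataframe(unknown_rows) would give a frame with compositions only, which is the shape a prediction input would typically take.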
Here’s an example of training on known data and extending the model to out-of-sample data.
from automatminer.pipeline import MatPipe
# Fit a pipeline to training data to predict band gap
pipe = MatPipe()
pipe.fit(train_df, "band gap")
# Predict the band gap of some unknown materials
predicted_df = pipe.predict(unknown_df)
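Once a pipeline is fit, the summarizing and saving steps mentioned above might look like the sketch below. It assumes that MatPipe provides summarize, save, and load methods that take a filename, as described in the Automatminer user manual; the function name fit_save_and_reload and the default file names are hypothetical. The import sits inside the function so the snippet can be loaded without Automatminer installed.

```python
def fit_save_and_reload(train_df, target,
                        pipe_path="mat.pipe",
                        summary_path="pipe_summary.txt"):
    """Fit a pipeline, write a text summary, save it, and load it back.

    Assumes MatPipe exposes fit, summarize, save, and load as described in
    the Automatminer user manual; check the API docs for your version.
    """
    # Lazy import: Automatminer is a heavy dependency.
    from automatminer.pipeline import MatPipe

    pipe = MatPipe()
    pipe.fit(train_df, target)

    # Write a short, human-readable summary of the fitted pipeline.
    summary = pipe.summarize(filename=summary_path)

    # Persist the fitted pipeline, then restore it from disk.
    pipe.save(pipe_path)
    reloaded = MatPipe.load(pipe_path)
    return reloaded, summary
```

The reloaded pipeline could then be used for prediction in the same way as the original, for example reloaded.predict(unknown_df).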
Overview
Automatminer can work with many kinds of data:
both computational and experimental data
small (~100 samples) to moderate (~100k samples) sized datasets
crystalline datasets
composition-only (i.e., unknown phases) datasets
datasets containing electronic bandstructures or density of states
Many kinds of target properties:
electronic
mechanical
thermodynamic
any other kind of property
And many featurization (descriptor) techniques:
See matminer’s Table of Featurizers for a full (and growing) list.
Automatminer is designed to be easy to use and reproducible:
Save pipelines which are portable across machines
Fit a complete pipeline with 1 line of code
Predict on new samples with 1 line of code
Presets for easy setup
Automatminer is automatic and accurate:
No hand tuning required
Comparable in accuracy to hand-tuned models in benchmark tests
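The presets mentioned above can be sketched as follows. This assumes that MatPipe.from_preset accepts preset names such as "debug", "express", "heavy", and "production"; the tuple below is only a subset chosen for illustration, and the wrapper build_pipe is a hypothetical helper, not part of Automatminer.

```python
# Preset names assumed to be accepted by MatPipe.from_preset; confirm the
# full list against the user manual for the installed version.
KNOWN_PRESETS = ("debug", "express", "heavy", "production")


def build_pipe(preset="express"):
    """Return an unfitted pipeline configured from a named preset."""
    if preset not in KNOWN_PRESETS:
        raise ValueError(
            f"unknown preset {preset!r}; expected one of {KNOWN_PRESETS}"
        )
    # Lazy import: the name check above works without Automatminer installed.
    from automatminer.pipeline import MatPipe

    return MatPipe.from_preset(preset)
```

With a helper like this, fitting and predicting remain one line each: build_pipe("express").fit(train_df, "band gap"), followed by a predict call on new samples.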
User manual
Contributing / Contact / Support
Want to see something added or changed? Some ways to get involved are:
Help us improve the documentation – tell us where you got stuck and improve the install process for everyone.
Let us know if you’d like to see certain features.
Point us to areas of the code that are difficult to understand or use.
Contribute code! You can do this by forking Automatminer on GitHub and submitting a pull request.
Post to our support forum. Don’t be shy, we look forward to feedback!
See our contribution guidelines for more details. For a list of contributors, see our GitHub page.
Citing Automatminer or MatBench
If you find Automatminer or the MatBench benchmarks helpful in your research, please consider citing our publication in npj Computational Materials:
Dunn, A., Wang, Q., Ganose, A., Dopp, D., Jain, A. Benchmarking Materials Property Prediction
Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Computational Materials
6, 138 (2020). https://doi.org/10.1038/s41524-020-00406-3
API documentation
Autogenerated API documentation. Beware! Only for the brave.