Basic Usage

Basic usage of Automatminer requires only one class: MatPipe.

MatPipe works with pandas dataframes as input and output. It can train on training data using its fit method, predict on new data using predict, and run benchmarks using benchmark, all in an automatic, end-to-end fashion.

Materials primitives (e.g., crystal structures) go in one end, and property predictions come out the other. MatPipe handles the intermediate operations such as assigning descriptors, cleaning problematic data, data conversions, imputation, and machine learning.

This is just a quick overview of the basic functionality. For a detailed and comprehensive tutorial, see the Jupyter notebooks in the automatminer directory of the matminer_examples repository.

Initializing a pipeline

The easiest way to initialize a MatPipe is to use a preset.

from automatminer import MatPipe

pipe = MatPipe.from_preset("express")

A preset is a set of options specifying exactly how each of MatPipe’s constituent classes is set up. The “express” preset typically gives results with a moderate degree of accuracy and relatively quick training, so we’ll use it here.

Note: The default MatPipe() is equivalent to MatPipe.from_preset("express"); other presets have different configuration options!

Training a pipeline

MatPipe has similar fit/transform syntax to scikit-learn. Your dataframe might be of the form:

structure             my_property
<structure object>    ...
<structure object>    ...
<structure object>    ...

Here the structure column contains pymatgen Structure objects and the my_property column holds the property you are interested in (the target). Use fit to train, specifying the target column. For a dataframe like the one above, you’d do:

from automatminer import MatPipe

pipe = MatPipe.from_preset("express")

# Fitting pipe on train_df using "my_property" as target
pipe.fit(train_df, "my_property")

The MatPipe is now fit and can be used to make predictions on new data!

Making predictions

Once the pipeline is fit, we can make predictions on out-of-sample data, provided that data has the same input types that our pipeline was trained on. For example:

structure
<structure object>
<structure object>
<structure object>

Use predict to make predictions on new data.

from automatminer import MatPipe

pipe = MatPipe.from_preset("express")
pipe.fit(train_df, "my_property")

# Predicting my_property values of some unknown prediction_df structures
prediction_df = pipe.predict(prediction_df)

The output will be stored in a column called "<your property> predicted".

structure             my_property predicted
<structure object>    ...
<structure object>    ...
<structure object>    ...

Using different presets

You can try out different configurations, such as more intensive featurization routines or quicker training, by initializing MatPipe with a different preset.

The “heavy” preset typically includes more CPU-intensive featurization and longer training times.

from automatminer import MatPipe

pipe = MatPipe.from_preset("heavy")

In contrast, use “debug” if you want very quick predictions.

from automatminer import MatPipe

pipe = MatPipe.from_preset("debug")

Saving your pipeline for later

Once fit, you can save your pipeline as a pickle file:

pipe.save("my_pipeline.p")

To load your file, use the MatPipe.load static method.

pipe = MatPipe.load("my_pipeline.p")

Examine your pipeline


For a human-readable executive summary of your pipeline, use MatPipe.summarize().

summary = pipe.summarize()

The dict returned by summarize specifies the top-level information as strings. An analogy: if your pipeline were a plumbing system, summarize would tell you how long each section of pipe is and the pump model.


To get comprehensive details on a pipeline, use MatPipe.inspect().

details = pipe.inspect()

Inspection specifies all parameters of all Automatminer objects needed to construct the pipeline and all of its internal operations. In contrast to the summary, which provides a more human-interpretable digest, inspection generates the true attribute names and values of each object in the MatPipe hierarchy. The output is typically very long, though still human-readable. An analogy: if your pipeline were a plumbing system, inspect would tell you everything summarize tells you, plus the model numbers of all the bolts, joints, and valves.

Save to a file

Both summarize and inspect accept a filename argument if you’d like to save their outputs to JSON, YAML, or text.

summary = pipe.summarize("my_summary.yaml")
details = pipe.inspect("my_details.json")
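Because the inspect output is typically very long, it can help to flatten it into dotted key paths and search those. The following is a minimal sketch using only the Python standard library. It assumes the saved JSON file holds nested dictionaries; the helper names flatten, load_details, and search are hypothetical and not part of Automatminer.

```python
import json


def flatten(nested, prefix=""):
    """Flatten nested dicts into a {"a.b.c": value} mapping (hypothetical helper)."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            # Leaf values (and empty dicts) are stored as-is.
            flat[path] = value
    return flat


def load_details(filename):
    """Load a JSON file such as one written by pipe.inspect(filename) and flatten it.

    Assumes the file contains a JSON object (nested dictionaries).
    """
    with open(filename) as f:
        return flatten(json.load(f))


def search(flat, term):
    """Return only the entries whose dotted path contains `term`."""
    return {path: value for path, value in flat.items() if term in path}
```

For instance, search(load_details("my_details.json"), "learner") would list every entry whose path mentions "learner", if such a key exists in your file. The same flatten helper can be applied directly to the dict returned by summarize.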

Monitoring the log

The Automatminer log is a powerful tool for determining what is happening within the pipeline in real time. We recommend you monitor it closely as the pipeline runs.

In addition to printing to stdout, Automatminer writes a log file in the current working directory (automatminer.log, timestamped if duplicates exist).

Here’s an example of an automatminer log when fitting on a dataset.

2019-10-11 16:05:41 INFO     Problem type is: regression
2019-10-11 16:05:41 INFO     Fitting MatPipe pipeline to data.
2019-10-11 16:05:41 INFO     AutoFeaturizer: Starting fitting.
2019-10-11 16:05:41 INFO     AutoFeaturizer: Adding compositions from structures.
2019-10-11 16:05:47 INFO     DataCleaner: Handling feature na by max na threshold of 0.01 with method 'drop'.
2019-10-11 16:05:47 INFO     DataCleaner: After handling na: 636 samples, 168 features
2019-10-11 16:05:47 INFO     DataCleaner: Finished fitting.
2019-10-11 16:05:47 INFO     FeatureReducer: Starting fitting.
2019-10-11 16:05:47 INFO     FeatureReducer: 57 features removed due to cross correlation more than 0.95
2019-10-11 16:05:49 INFO     TreeFeatureReducer: Finished tree-based feature reduction of 110 initial features to 13
2019-10-11 16:05:49 INFO     FeatureReducer: Finished fitting.
2019-10-11 16:05:49 INFO     FeatureReducer: Starting transforming.
2019-10-11 16:05:49 INFO     FeatureReducer: Finished transforming.
2019-10-11 16:05:49 INFO     TPOTAdaptor: Starting fitting.
2019-10-11 16:07:50 INFO     TPOTAdaptor: Finished fitting.
2019-10-11 16:07:50 INFO     MatPipe successfully fit.

If you see a WARNING or ERROR, you should inspect the pipeline to make sure everything is configured as intended. If you see a CRITICAL message, something within the pipeline is likely misconfigured and should be looked into in detail!
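For long runs, scanning the log file for these levels can be automated. The sketch below uses only the Python standard library and assumes every log line follows the "date time LEVEL message" layout of the example above; the helper names flagged_entries and scan_log are hypothetical and not part of Automatminer.

```python
import re

# Matches lines such as "2019-10-11 16:05:41 INFO     Problem type is: regression".
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+([A-Z]+)\s+(.*)$")
FLAGGED = {"WARNING", "ERROR", "CRITICAL"}


def flagged_entries(lines):
    """Return (timestamp, level, message) for each WARNING, ERROR, or CRITICAL line."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        # Lines that do not fit the assumed layout are skipped silently.
        if match and match.group(2) in FLAGGED:
            entries.append(match.groups())
    return entries


def scan_log(path="automatminer.log"):
    """Scan a log file on disk; the default path is the file name given above."""
    with open(path) as f:
        return flagged_entries(f)
```

Calling scan_log() after a fit returns an empty list when nothing was flagged. If the log file was timestamped because of duplicates, pass its actual path instead of the default.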

Quick reminders

Default MatPipe configs automatically infer the type of pymatgen object from the dataframe column name, e.g.,

“composition” = pymatgen.Composition,

“structure” = pymatgen.Structure,

“bandstructure” = pymatgen.electronic_structure.bandstructure.BandStructure,

“dos” = pymatgen.electronic_structure.dos.DOS.

Make sure your dataframe has the correct name for its input! If you want to use custom names, see the advanced usage page.
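A quick pre-flight check on column names can catch a misnamed input before a long fit. The sketch below uses only the Python standard library and works on any iterable of column names. It assumes names must match the four listed above exactly (including case), and the helper names find_input_columns and check_input_columns are hypothetical, not part of Automatminer.

```python
# Column names recognized by the default configs, as listed above.
DEFAULT_INPUT_COLUMNS = {
    "composition": "pymatgen.Composition",
    "structure": "pymatgen.Structure",
    "bandstructure": "pymatgen.electronic_structure.bandstructure.BandStructure",
    "dos": "pymatgen.electronic_structure.dos.DOS",
}


def find_input_columns(columns):
    """Return the columns whose names match a default input name (exact match assumed)."""
    return [c for c in columns if c in DEFAULT_INPUT_COLUMNS]


def check_input_columns(columns):
    """Raise ValueError if no column carries a recognized input name."""
    found = find_input_columns(columns)
    if not found:
        expected = ", ".join(sorted(DEFAULT_INPUT_COLUMNS))
        raise ValueError(
            f"No recognized input column in {list(columns)}; expected one of: {expected}"
        )
    return found
```

With a pandas dataframe, the check could be called as check_input_columns(train_df.columns) before fitting.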