automatminer package¶

Subpackages¶

Submodules¶

automatminer.base module¶

Base classes, mixins, and other inheritables.

class automatminer.base.DFTransformer¶

Bases: abc.ABC, sklearn.base.BaseEstimator

A base class to allow easy transformation in the same way as TransformerMixin and BaseEstimator in sklearn, but for pandas dataframes.

When implementing a base class adaptor, make sure to use @check_fitted and @set_fitted if necessary!

abstract fit(df, target, **fit_kwargs)¶

Fits the transformer to a dataframe, given a target.

Parameters

df (pandas.DataFrame) – The pandas dataframe to be fit.
target (str) – the target string specifying the ML target.
fit_kwargs – Keyword parameters for fitting

Returns

(DFTransformer) This object (self)

fit_transform(df, target)¶

Combines the fitting and transformation of a dataframe.

Parameters

df (pandas.DataFrame) – The pandas dataframe to be fit.
target (str) – the target string specifying the ML target.

Returns

The transformed dataframe.

Return type

(pandas.DataFrame)

abstract transform(df, target, **transform_kwargs)¶

Transforms a dataframe.

Parameters

df (pandas.DataFrame) – The pandas dataframe to be transformed.
target (str) – the target string specifying the ML target.
transform_kwargs – Keyword parameters for transforming

Returns

The transformed dataframe.

Return type

(pandas.DataFrame)
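The fit/transform/fit_transform contract described above can be illustrated with a dependency-free sketch. The `SketchTransformer` base and the `MeanCenterer` subclass are hypothetical stand-ins, not automatminer classes. A dict of column lists substitutes for a pandas dataframe so the example runs with the standard library only, and the fitted-state check is a simplified assumption about what a decorator such as @check_fitted might guard against.

```python
from abc import ABC, abstractmethod


class SketchTransformer(ABC):
    """Hypothetical stand-in mirroring the DFTransformer interface."""

    is_fit = False

    @abstractmethod
    def fit(self, df, target, **fit_kwargs):
        """Learn state from df and return self."""

    @abstractmethod
    def transform(self, df, target, **transform_kwargs):
        """Return the transformed table."""

    def fit_transform(self, df, target):
        # Fit, then transform the same data in one call.
        return self.fit(df, target).transform(df, target)


class MeanCenterer(SketchTransformer):
    """Subtracts each feature column's training mean; leaves the target alone."""

    def fit(self, df, target, **fit_kwargs):
        self.means = {
            col: sum(vals) / len(vals) for col, vals in df.items() if col != target
        }
        self.is_fit = True
        return self

    def transform(self, df, target, **transform_kwargs):
        # Simplified fitted-state guard (assumption, not automatminer's code).
        if not self.is_fit:
            raise RuntimeError("Call fit before transform.")
        return {
            col: vals if col == target else [v - self.means[col] for v in vals]
            for col, vals in df.items()
        }


# A dict of column lists stands in for a pandas dataframe.
train = {"density": [1.0, 2.0, 3.0], "band_gap": [0.5, 1.5, 2.5]}
centered = MeanCenterer().fit_transform(train, target="band_gap")
```

Because fit returns self, fit_transform can chain the two calls, which matches the documented return value of fit.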

automatminer.pipeline module¶

The highest level classes for pipelines.

class automatminer.pipeline.MatPipe(autofeaturizer=None, cleaner=None, reducer=None, learner=None)¶

Bases: automatminer.base.DFTransformer

Establish an ML pipeline for transforming compositions, structures, bandstructures, and DOS objects into machine-learned properties.

The pipeline includes:

featurization
ml-preprocessing
automl model fitting and creation

If you are using MatPipe for benchmarking, use the “benchmark” method.

If you have some training data and want to use MatPipe for production predictions (e.g., predicting material properties for which you have no data) use “fit” and “predict”.

The pipeline is transferable. It can be fit on one dataset and used to predict the properties of another. The entire pipeline and all constituent objects can be summarized (via “summarize”) or inspected (via “inspect”) in human-readable formats.

Note: This pipeline should function the same regardless of which “component” classes it is made out of. E.g. the steps for each method should remain the same whether using the TPOTAdaptor class as the learner or using a SinglePipelineAdaptor class as the learner. To use a preset config, use MatPipe.from_preset(preset).

Examples

    from automatminer.pipeline import MatPipe

    # A benchmarking experiment, where all property values are known
    pipe = MatPipe()
    test_predictions = pipe.benchmark(df, "target_property")

    # Creating a pipe with data containing known properties, then predicting
    # on new materials
    pipe = MatPipe()
    pipe.fit(training_df, "target_property")
    predictions = pipe.predict(unknown_df)

    # Getting a MatPipe from preset
    pipe = MatPipe.from_preset("debug")

Parameters

autofeaturizer (AutoFeaturizer) – The autofeaturizer object used to automatically decorate the dataframe with descriptors.
cleaner (DataCleaner) – The data cleaner object used to get a featurized dataframe in ml-ready form.
reducer (FeatureReducer) – The feature reducer object used to select the best features from a “clean” dataframe.
learner (DFMLAdaptor) – The auto ml adaptor object used to actually run an auto-ml pipeline on the clean, reduced, featurized dataframe.

version¶

The automatminer version used for serialization and deserialization.

Type: str

The following attributes are set during fitting. Each has its own set of attributes, which define more specifically how the pipeline works.

pre_fit_df¶

The dataframe on which the pipeline was fit.

Type: pd.DataFrame

post_fit_df¶

The dataframe transformed into the ML-ready form.

Type: pd.DataFrame

ml_type¶

Specifies regression or classification.

Type: str

target¶

The name of the column where target values are held.

Type: str

benchmark(**kwargs)¶

fit(**kwargs)¶

static from_preset(preset='express', **powerups)¶

Get a preset MatPipe from a string using automatminer.presets.get_preset_config.

See get_preset_config for more information.

Parameters

preset (str) –
The preset configuration to use. Current presets are:
- production
- express (recommended for most problems)
- express_single (no AutoML, XGBoost only)
- heavy
- debug
- debug_single (no AutoML, XGBoost only)
powerups (kwargs) –
General upgrades/changes to apply. Current powerups are:
- cache_src (str): The cache source if you want to save
  features.
- n_jobs (int): The number of parallel processes to use when
  running.

inspect(**kwargs)¶

static load(filename, supress_version_mismatch=False)¶

Loads a MatPipe that was saved.

Parameters

filename (str) – The pickled MatPipe object (should have been saved using save).
supress_version_mismatch (bool) – If False, throws an error when there is a version mismatch between a serialized MatPipe and the current Automatminer version. If True, suppresses this error.

Returns

A MatPipe object.

Return type

pipe (MatPipe)
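The version check described for load can be sketched with a small stand-in. Everything here is hypothetical and is not automatminer's implementation: `SketchPipe`, `VersionMismatchError`, and `CURRENT_VERSION` are illustrative names, and the sketch pickles a plain dict rather than the object itself to stay portable. Only the keyword `supress_version_mismatch` is taken from the signature above.

```python
import os
import pickle
import tempfile

CURRENT_VERSION = "2.0"  # hypothetical version of the running library


class VersionMismatchError(Exception):
    """Hypothetical error type; the real exception class is not documented here."""


class SketchPipe:
    """Hypothetical stand-in for a saved pipeline that records its version."""

    def __init__(self, version):
        self.version = version

    def save(self, filename):
        # Pickle a plain dict of state so the file does not depend on
        # where this class is defined.
        with open(filename, "wb") as f:
            pickle.dump({"version": self.version}, f)

    @staticmethod
    def load(filename, supress_version_mismatch=False):
        with open(filename, "rb") as f:
            state = pickle.load(f)
        # Raise on a version mismatch unless the caller suppresses the check.
        if state["version"] != CURRENT_VERSION and not supress_version_mismatch:
            raise VersionMismatchError(
                f"Saved with {state['version']}, running {CURRENT_VERSION}."
            )
        return SketchPipe(state["version"])


path = os.path.join(tempfile.mkdtemp(), "sketch.pipe")
SketchPipe(version="1.0").save(path)

# Default behavior: a mismatch raises.
try:
    SketchPipe.load(path)
    strict_load_failed = False
except VersionMismatchError:
    strict_load_failed = True

# Opting in to suppression returns the older object anyway.
old_pipe = SketchPipe.load(path, supress_version_mismatch=True)
```

Suppressing the check is a deliberate opt-in: an object saved under a different version may still load, but nothing guarantees it behaves the same way.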

predict(**kwargs)¶

save(**kwargs)¶

summarize(**kwargs)¶

transform(df, **transform_kwargs)¶

Transforms a dataframe.

Parameters

df (pandas.DataFrame) – The pandas dataframe to be transformed.
target (str) – the target string specifying the ML target.
transform_kwargs – Keyword parameters for transforming

Returns

The transformed dataframe.

Return type

(pandas.DataFrame)

automatminer.presets module¶

Configurations for MatPipe.

automatminer.presets.get_available_presets()¶

Return all available presets for MatPipes.

Returns: A list of preset names.
Return type: ([str])

automatminer.presets.get_preset_config(preset='express', **powerups)¶

Preset configs for MatPipe.

USER:
“express” - Good for quick benchmarks with moderate accuracy.
“express_single” - Same as express but uses XGB trees as single models instead of automl TPOT. Good for even more express results.
“production” - Used for making production predictions and benchmarks. Balances accuracy and timeliness.
“heavy” - When high accuracy is required, and you have access to (very) powerful computing resources. May be buggier and more difficult to run than production.

DEBUG:
“debug” - Debugging with automl enabled.
“debug_single” - Debugging with a single model.

Parameters

preset (str) – The name of the preset config you’d like to use.
**powerups –
Various modifications as kwargs.
- cache_src (str): A file path. If specified, AutoFeaturizer will use feature caching with a file stored at this location. See AutoFeaturizer’s cache_src argument for more information.
- n_jobs (int): The number of parallel processes to use when running. Particularly important for AutoFeaturizer and TPOTAdaptor.

Return type

dict

Returns

(dict) The desired preset config.
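As a hedged sketch, the two functions in this module might be combined to validate a preset name before building a pipeline. The helpers `check_preset` and `pipe_from_config` are hypothetical names introduced for illustration. The sketch also assumes that the keys of the returned config dict match the MatPipe constructor arguments (autofeaturizer, cleaner, reducer, learner), which this page does not state explicitly. The automatminer imports are deferred so the block loads without the package installed.

```python
def check_preset(preset, available):
    """Return preset if it is in available, else raise a helpful ValueError."""
    if preset not in available:
        raise ValueError(
            f"Unknown preset {preset!r}; choose from {sorted(available)}"
        )
    return preset


def pipe_from_config(preset="express", **powerups):
    """Build a MatPipe from a preset config (requires automatminer)."""
    # Imported lazily so this sketch can be loaded without automatminer.
    from automatminer.pipeline import MatPipe
    from automatminer.presets import get_available_presets, get_preset_config

    check_preset(preset, get_available_presets())
    config = get_preset_config(preset, **powerups)
    # Assumption: the config keys match MatPipe's constructor arguments
    # (autofeaturizer, cleaner, reducer, learner).
    return MatPipe(**config)


# Names taken from the preset list documented for MatPipe.from_preset.
documented = [
    "production", "express", "express_single",
    "heavy", "debug", "debug_single",
]
chosen = check_preset("debug", documented)
```

With automatminer installed, a call such as `pipe_from_config("express", n_jobs=4)` would pass the powerup through to get_preset_config. In practice, MatPipe.from_preset is the documented shortcut for the same result.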


Module contents¶