automatminer package

Subpackages

Submodules

automatminer.base module

Base classes, mixins, and other inheritables.

class automatminer.base.DFTransformer

Bases: abc.ABC, sklearn.base.BaseEstimator

A base class to allow easy transformation in the same way as TransformerMixin and BaseEstimator in sklearn, but for pandas dataframes.

When implementing an adaptor based on this class, make sure to use the @check_fitted and @set_fitted decorators where necessary!
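The fitted-state guard mentioned above can be illustrated with a pair of decorators. This is a minimal, self-contained sketch, not automatminer's own code: set_fitted and check_fitted reuse the names from this page, but their bodies, the is_fit flag, the NotFittedError exception, and the ExampleAdaptor class are assumptions made for illustration.

```python
import functools


class NotFittedError(RuntimeError):
    """Hypothetical error raised when a method needs a fitted object."""


def set_fitted(fit_method):
    """Mark the object as fitted once its fit method returns (assumed behavior)."""
    @functools.wraps(fit_method)
    def wrapper(self, *args, **kwargs):
        result = fit_method(self, *args, **kwargs)
        self.is_fit = True  # assumed attribute name
        return result
    return wrapper


def check_fitted(method):
    """Refuse to run a method until the object has been fitted (assumed behavior)."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, "is_fit", False):
            raise NotFittedError(
                f"{type(self).__name__} must be fit before calling {method.__name__}."
            )
        return method(self, *args, **kwargs)
    return wrapper


class ExampleAdaptor:
    """Hypothetical adaptor showing where each decorator goes."""

    @set_fitted
    def fit(self, df, target):
        self.target = target
        return self

    @check_fitted
    def transform(self, df, target):
        return df
```

Decorating fit with set_fitted and transform with check_fitted makes a transform call on an unfitted object fail loudly instead of silently relying on state that was never learned.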

abstract fit(df, target, **fit_kwargs)

Fits the transformer to a dataframe, given a target.

Parameters
  • df (pandas.DataFrame) – The pandas dataframe to be fit.

  • target (str) – the target string specifying the ML target.

  • fit_kwargs – Keyword parameters for fitting

Returns

(DFTransformer) This object (self)

fit_transform(df, target)

Combines the fitting and transformation of a dataframe.

Parameters
  • df (pandas.DataFrame) – The pandas dataframe to be fit.

  • target (str) – the target string specifying the ML target.

Returns

The transformed dataframe.

Return type

(pandas.DataFrame)

abstract transform(df, target, **transform_kwargs)

Transforms a dataframe.

Parameters
  • df (pandas.DataFrame) – The pandas dataframe to be transformed.

  • target (str) – the target string specifying the ML target.

  • transform_kwargs – Keyword parameters for transforming

Returns

The transformed dataframe.

Return type

(pandas.DataFrame)
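Subclasses only need to implement fit and transform. fit_transform can then chain the two, which is why fit returns self. The sketch below assumes a simplified stand-in base class (SketchDFTransformer, which leaves out the sklearn.base.BaseEstimator parent) and a hypothetical MeanImputer transformer; neither is part of automatminer.

```python
import abc

import pandas as pd


class SketchDFTransformer(abc.ABC):
    """Simplified, hypothetical stand-in for DFTransformer."""

    @abc.abstractmethod
    def fit(self, df, target, **fit_kwargs):
        """Fit to df and return self."""

    @abc.abstractmethod
    def transform(self, df, target, **transform_kwargs):
        """Return a transformed dataframe."""

    def fit_transform(self, df, target):
        # Fit first, then transform the same dataframe.
        return self.fit(df, target).transform(df, target)


class MeanImputer(SketchDFTransformer):
    """Hypothetical transformer: fills missing feature values with training means."""

    def fit(self, df, target, **fit_kwargs):
        # Learn one mean per numeric feature column; the target is excluded.
        self.means_ = df.drop(columns=[target]).mean(numeric_only=True)
        return self

    def transform(self, df, target, **transform_kwargs):
        # fillna with a Series matches the fill values to columns by name,
        # so the target column (absent from means_) is left untouched.
        return df.fillna(self.means_)
```

Usage: MeanImputer().fit_transform(df, "y") returns a copy of df whose missing feature values are replaced by the column means learned during fit.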

automatminer.pipeline module

The highest level classes for pipelines.

class automatminer.pipeline.MatPipe(autofeaturizer=None, cleaner=None, reducer=None, learner=None)

Bases: automatminer.base.DFTransformer

Establish an ML pipeline for transforming compositions, structures, bandstructures, and DOS objects into machine-learned properties.

The pipeline includes:
  • featurization

  • ml-preprocessing

  • automl model fitting and creation

If you are using MatPipe for benchmarking, use the “benchmark” method.

If you have some training data and want to use MatPipe for production predictions (e.g., predicting material properties for which you have no data) use “fit” and “predict”.

The pipeline is transferable. It can be fit on one dataset and used to predict the properties of another. The entire pipeline and all constituent objects can be summarized (via “summarize”) or inspected (via “inspect”) in human readable formats.

Note: This pipeline should function the same regardless of which “component” classes it is made out of. For example, the steps for each method should remain the same whether the learner is a TPOTAdaptor or a SinglePipelineAdaptor.

To use a preset config, use MatPipe.from_preset(preset).
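Because every component exposes the same fit/transform interface, the orchestration logic does not depend on which concrete classes are plugged in. The sketch below is a hypothetical, heavily simplified pipeline (ToyPipe, Identity, and MeanLearner are illustrative names, not automatminer classes) showing that any objects with the assumed methods can be swapped in without changing the fit and predict steps.

```python
import pandas as pd


class Identity:
    """Hypothetical pass-through component (stands in for a featurizer, cleaner, or reducer)."""

    def fit(self, df, target):
        return self

    def transform(self, df, target):
        return df


class MeanLearner:
    """Hypothetical learner: predicts the training mean of the target for every row."""

    def fit(self, df, target):
        self.mean_ = df[target].mean()
        return self

    def predict(self, df, target):
        return pd.Series(self.mean_, index=df.index)


class ToyPipe:
    """Hypothetical, heavily simplified MatPipe-like pipeline.

    Any objects with the assumed fit/transform (and, for the learner,
    fit/predict) methods can be plugged in; the steps below never change.
    """

    def __init__(self, autofeaturizer, cleaner, reducer, learner):
        self.preprocessors = [autofeaturizer, cleaner, reducer]
        self.learner = learner

    def fit(self, df, target):
        self.target = target
        self.pre_fit_df = df
        for step in self.preprocessors:
            # Fit each step on the output of the previous one.
            df = step.fit(df, target).transform(df, target)
        self.post_fit_df = df
        self.learner.fit(df, target)
        return self

    def predict(self, df):
        # Reuse the already-fitted steps; new data may lack the target column.
        for step in self.preprocessors:
            df = step.transform(df, self.target)
        return self.learner.predict(df, self.target)
```

Usage: ToyPipe(Identity(), Identity(), Identity(), MeanLearner()).fit(train_df, "y").predict(new_df) returns one prediction per row of new_df. The attribute names pre_fit_df, post_fit_df, and target echo the MatPipe attributes documented on this page.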

Examples

    # A benchmarking experiment, where all property values are known
    pipe = MatPipe()
    test_predictions = pipe.benchmark(df, "target_property")

    # Creating a pipe with data containing known properties, then predicting
    # on new materials
    pipe = MatPipe()
    pipe.fit(training_df, "target_property")
    predictions = pipe.predict(unknown_df)

    # Getting a MatPipe from preset
    pipe = MatPipe.from_preset("debug")
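A benchmark compares predictions against known property values on rows the model did not see during fitting. As a rough illustration of that idea, here is a generic k-fold loop. kfold_benchmark and mean_baseline are hypothetical helpers, and the sketch makes no claim about how MatPipe.benchmark splits data or what it returns.

```python
import pandas as pd


def kfold_benchmark(df, target, fit_predict, k=5):
    """Hypothetical k-fold benchmark.

    fit_predict(train_df, test_df, target) must return one prediction per
    row of test_df. Returns one dataframe per fold holding the true and
    predicted target values of the held-out rows.
    """
    positions = list(range(len(df)))
    results = []
    for fold in range(k):
        test_pos = positions[fold::k]  # every k-th row, offset by the fold number
        test_set = set(test_pos)
        train_pos = [p for p in positions if p not in test_set]
        train_df, test_df = df.iloc[train_pos], df.iloc[test_pos]
        # Hide the target from the predictor so it cannot peek at the answers.
        preds = fit_predict(train_df, test_df.drop(columns=[target]), target)
        results.append(
            pd.DataFrame(
                {"true": test_df[target].to_numpy(), "predicted": list(preds)},
                index=test_df.index,
            )
        )
    return results


def mean_baseline(train_df, test_df, target):
    # Predict the training mean for every held-out row.
    return [train_df[target].mean()] * len(test_df)
```

In practice the fit_predict function would fit a real model (such as a pipeline) on train_df; the mean baseline only keeps the example self-contained. Folds are taken by row position without shuffling, which is a simplification.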

Parameters
  • autofeaturizer (AutoFeaturizer) – The autofeaturizer object used to automatically decorate the dataframe with descriptors.

  • cleaner (DataCleaner) – The data cleaner object used to get a featurized dataframe in ml-ready form.

  • reducer (FeatureReducer) – The feature reducer object used to select the best features from a “clean” dataframe.

  • learner (DFMLAdaptor) – The AutoML adaptor object used to actually run an AutoML pipeline on the clean, reduced, featurized dataframe.

version

The automatminer version used for serialization and deserialization.

Type

str

The following attributes are set during fitting. Each has its own set of attributes that defines more specifically how the pipeline works.

pre_fit_df

The dataframe on which the pipeline was fit.

Type

pd.DataFrame

post_fit_df

The dataframe transformed into the ML-ready form.

Type

pd.DataFrame

ml_type

Specifies regression or classification.

Type

str

target

The name of the column where target values are held.

Type

str

benchmark(**kwargs)
fit(**kwargs)
static from_preset(preset='express', **powerups)

Get a preset MatPipe from a string using automatminer.presets.get_preset_config.

See get_preset_config for more information.

Parameters
  • preset (str) –

    The preset configuration to use. Current presets are:

    • production

    • express (recommended for most problems)

    • express_single (no AutoML, XGBoost only)

    • heavy

    • debug

    • debug_single (no AutoML, XGBoost only)

  • powerups (kwargs) –

    General upgrades/changes to apply. Current powerups are:

    • cache_src (str): The cache source, if you want to save features.

    • n_jobs (int): The number of parallel processes to use when running.

inspect(**kwargs)
static load(filename, supress_version_mismatch=False)

Loads a MatPipe that was saved.

Parameters
  • filename (str) – The pickled MatPipe object (should have been saved using save).

  • supress_version_mismatch (bool) – If False, throws an error when there is a version mismatch between a serialized MatPipe and the current Automatminer version. If True, suppresses this error.

Returns

A MatPipe object.

Return type

pipe (MatPipe)
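The save/load round trip and the version check described above can be sketched with the standard pickle module. PicklablePipe, VersionMismatchError, and the CURRENT_VERSION string are hypothetical; the keyword keeps this page's spelling, supress_version_mismatch.

```python
import pickle

CURRENT_VERSION = "2020.1.1"  # hypothetical "installed" version string


class VersionMismatchError(Exception):
    """Hypothetical error for a serialized vs. installed version mismatch."""


class PicklablePipe:
    """Hypothetical stand-in for a picklable pipeline object."""

    def __init__(self):
        # Record the version at creation so it travels with the pickle.
        self.version = CURRENT_VERSION

    def save(self, filename):
        with open(filename, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load(filename, supress_version_mismatch=False):
        with open(filename, "rb") as f:
            pipe = pickle.load(f)
        # Refuse mismatched versions unless the caller explicitly opts out.
        if pipe.version != CURRENT_VERSION and not supress_version_mismatch:
            raise VersionMismatchError(
                f"Saved with version {pipe.version}, but {CURRENT_VERSION} is installed."
            )
        return pipe
```

As with any pickle, only load files you trust: unpickling can execute arbitrary code.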

predict(**kwargs)
save(**kwargs)
summarize(**kwargs)
transform(df, **transform_kwargs)

Transforms a dataframe.

Parameters
  • df (pandas.DataFrame) – The pandas dataframe to be transformed.

  • target (str) – the target string specifying the ML target.

  • transform_kwargs – Keyword parameters for transforming

Returns

The transformed dataframe.

Return type

(pandas.DataFrame)

automatminer.presets module

Configurations for MatPipe.

automatminer.presets.get_available_presets()

Return all available presets for MatPipes.

Returns

A list of preset names.

Return type

([str])

automatminer.presets.get_preset_config(preset='express', **powerups)

Preset configs for MatPipe.

User presets:
  • “express” – Good for quick benchmarks with moderate accuracy.

  • “express_single” – Same as express, but uses single XGBoost tree models instead of TPOT AutoML. Good for even faster results.

  • “production” – Used for making production predictions and benchmarks. Balances accuracy and timeliness.

  • “heavy” – For when high accuracy is required and you have access to (very) powerful computing resources. May be buggier and more difficult to run than production.

Debug presets:
  • “debug” – Debugging with AutoML enabled.

  • “debug_single” – Debugging with a single model.

Parameters
  • preset (str) – The name of the preset config you’d like to use.

  • **powerups –

    Various modifications, passed as kwargs:

    • cache_src (str): A file path. If specified, AutoFeaturizer will use feature caching with a file stored at this location. See AutoFeaturizer’s cache_src argument for more information.

    • n_jobs (int): The number of parallel processes to use when running. Particularly important for AutoFeaturizer and TPOTAdaptor.

Return type

dict

Returns

(dict) The desired preset config.

Module contents