automatminer.automl package

Submodules

automatminer.automl.adaptors module

Adaptor classes for using AutoML packages in a Matbench pipeline.

Current adaptor classes are:

TPOTAdaptor: Uses the backend from the AutoML project TPOT, which can be found at https://github.com/EpistasisLab/tpot

class automatminer.automl.adaptors.SinglePipelineAdaptor(regressor, classifier)

Bases: automatminer.automl.base.DFMLAdaptor

For running single models or pipelines in a MatPipe pipeline using the same syntax as the AutoML adaptors.

This adaptor should fit into a MatPipe in a similar fashion to TPOTAdaptor.

Parameters
  • regressor (sklearn Pipeline or BaseEstimator-like) – The object you want to use for machine learning regression. Must implement fit/predict/transform methods analogously to BaseEstimator, but does not need to be a BaseEstimator or Pipeline.

  • classifier (sklearn Pipeline or BaseEstimator-like) – The object you want to use for machine learning classification.

The following unique attributes are set during fitting.
mode

Either AMM_REG_NAME (regression) or AMM_CLF_NAME (classification)

Type

str
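The duck-typing requirement on regressor and classifier, and the role of mode, can be sketched in plain Python. This is only an illustration under stated assumptions: the string values given to AMM_REG_NAME and AMM_CLF_NAME, the two toy model classes, and the pick_model helper are all hypothetical and are not part of automatminer. The toy models implement only fit and predict.

```python
# Assumed values, for illustration only; the real constants live in automatminer.
AMM_REG_NAME = "regression"
AMM_CLF_NAME = "classification"


class MeanRegressor:
    """Duck-typed regressor (hypothetical): predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MajorityClassifier:
    """Duck-typed classifier (hypothetical): predicts the most common label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def pick_model(mode, regressor, classifier):
    """Hypothetical helper: return the model that matches the given mode."""
    if mode == AMM_REG_NAME:
        return regressor
    if mode == AMM_CLF_NAME:
        return classifier
    raise ValueError(f"Unknown mode: {mode!r}")


X = [[0.0], [1.0], [2.0]]

# Regression mode uses the regressor.
reg = pick_model(AMM_REG_NAME, MeanRegressor(), MajorityClassifier())
reg_predictions = reg.fit(X, [1.0, 2.0, 3.0]).predict(X)

# Classification mode uses the classifier.
clf = pick_model(AMM_CLF_NAME, MeanRegressor(), MajorityClassifier())
clf_predictions = clf.fit(X, [True, True, False]).predict(X)
```

Neither toy model inherits from an sklearn class, which is the point of the "BaseEstimator-like" wording: only the method names and call signatures matter.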

property backend
property best_pipeline
property features
fit(**kwargs)

Wrapper for a method to log.

Parameters

operation (str) – The operation to be logging.

Returns

The method result.

Return type

result

property fitted_target
class automatminer.automl.adaptors.TPOTAdaptor(**tpot_kwargs)

Bases: automatminer.automl.base.DFMLAdaptor

A dataframe adaptor for the TPOT classifiers and regressors.

Parameters
  • tpot_kwargs

    All kwargs accepted by a TPOTRegressor/TPOTClassifier or TPOTBase object.

    For example, you can limit the models that TPOT explores by setting config_dict directly. To use only random forest:

    config_dict = {
        'sklearn.ensemble.RandomForestRegressor': {
            'n_estimators': [100],
            'max_features': np.arange(0.05, 1.01, 0.05),
            'min_samples_split': range(2, 21),
            'min_samples_leaf': range(1, 21),
            'bootstrap': [True, False]
        },
    }
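To get a feel for how much a restricted config_dict narrows the search, the random forest restriction can be built with the standard library alone and its grid size counted. This is a rough sketch: the list comprehension is a stand-in for np.arange(0.05, 1.01, 0.05), and grid_sizes is a hypothetical name introduced here for illustration.

```python
from math import prod

# Stand-in for np.arange(0.05, 1.01, 0.05): 0.05, 0.10, ..., 1.00 (20 values)
max_features = [round(0.05 * i, 2) for i in range(1, 21)]

config_dict = {
    "sklearn.ensemble.RandomForestRegressor": {
        "n_estimators": [100],
        "max_features": max_features,
        "min_samples_split": range(2, 21),
        "min_samples_leaf": range(1, 21),
        "bootstrap": [True, False],
    },
}

# Number of distinct hyperparameter combinations for each model in the dict.
grid_sizes = {
    model: prod(len(values) for values in params.values())
    for model, params in config_dict.items()
}

# The dict would then be forwarded through tpot_kwargs (not executed here):
# TPOTAdaptor(config_dict=config_dict)
```

With these ranges the single model contributes 1 × 20 × 19 × 20 × 2 = 15200 combinations.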

The following unique attributes are set during fitting.
mode

Either AMM_REG_NAME (regression) or AMM_CLF_NAME (classification)

Type

str

best_models

The best model names and their scores.

Type

OrderedDict

backend

The TPOT object interface used for ML training.

Type

TPOTBase

models

The raw sklearn-style models output by TPOT.

Type

OrderedDict

from_serialized

Whether the backend is loaded from a serialized instance. If True, the previous full TPOT data will not be available due to pickling problems.

Type

bool

property backend
property best_models
property best_pipeline
deserialize(**kwargs)
property features
fit(**kwargs)

Wrapper for a method to log.

Parameters

operation (str) – The operation to be logging.

Returns

The method result.

Return type

result

property fitted_target
serialize(**kwargs)

automatminer.automl.base module

Base classes for automl.

class automatminer.automl.base.DFMLAdaptor

Bases: automatminer.base.DFTransformer

A base class to adapt from an AutoML backend to a sklearn-style fit/predict scheme and add a few extensions for pandas dataframes.

When implementing an adaptor from this base class, make sure to use @check_fitted and @set_fitted where necessary!

abstract property backend

The AutoML backend object. Does not need to implement any methods for compatibility with higher level classes. If no AutoML backend is present (e.g., SinglePipelineAdaptor), backend = None.

Does not need to be serializable, as matpipe.save will not save backends.

abstract property best_pipeline

The best ML pipeline found by the backend. Can be any type though BaseEstimator is preferred.

1. MUST implement a .predict method unless DFMLAdaptor.predict is overridden!

2. MUST be serializable!

Should be as close to the algorithm as possible, i.e., it should use TPOTClassifier.fitted_pipeline_ instead of calling TPOTClassifier.fit, so that examining the true form of models is more straightforward.

deserialize(**kwargs)
abstract property features

The features being used for machine learning.

Returns

The feature labels

Return type

([str])

abstract property fitted_target

The target (a string) on which the adaptor was fit.

Returns

The fitted target label.

Return type

(str)

predict(**kwargs)
serialize(**kwargs)
transform(df, target)

Transforms a dataframe.

Parameters
  • df (pandas.DataFrame) – The pandas dataframe to be transformed.

  • target (str) – the target string specifying the ML target.

  • transform_kwargs – Keyword parameters for transforming.

Returns

The transformed dataframe.

Return type

(pandas.DataFrame)
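The abstract-property contract described for DFMLAdaptor can be sketched with the standard library. This is a simplified illustration, not the automatminer implementation: AdaptorSketch, ConstantModel, and ConstantAdaptor are hypothetical names, rows are plain dicts rather than a pandas dataframe, and the @check_fitted and @set_fitted decorators are omitted.

```python
from abc import ABC, abstractmethod


class AdaptorSketch(ABC):
    """Hypothetical stand-in for DFMLAdaptor; not the automatminer class."""

    @property
    @abstractmethod
    def backend(self):
        """The AutoML backend, or None if there is none."""

    @property
    @abstractmethod
    def best_pipeline(self):
        """Object with a .predict method."""

    @property
    @abstractmethod
    def features(self):
        """List of feature labels used for learning."""

    @property
    @abstractmethod
    def fitted_target(self):
        """Label of the target the adaptor was fit on."""

    def predict(self, rows):
        # Delegate to the best pipeline, using only the fitted features.
        X = [[row[name] for name in self.features] for row in rows]
        return self.best_pipeline.predict(X)


class ConstantModel:
    """Toy model: always predicts one stored value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


class ConstantAdaptor(AdaptorSketch):
    """Toy adaptor with no AutoML backend, so backend is None."""

    def fit(self, rows, target):
        self._features = [key for key in rows[0] if key != target]
        self._fitted_target = target
        values = [row[target] for row in rows]
        self._best_pipeline = ConstantModel(sum(values) / len(values))
        return self

    @property
    def backend(self):
        return None

    @property
    def best_pipeline(self):
        return self._best_pipeline

    @property
    def features(self):
        return self._features

    @property
    def fitted_target(self):
        return self._fitted_target


rows = [{"density": 1.0, "gap": 2.0}, {"density": 3.0, "gap": 4.0}]
adaptor = ConstantAdaptor().fit(rows, target="gap")
predictions = adaptor.predict(rows)
```

Because the four properties are abstract, a subclass that omits any of them cannot be instantiated, which mirrors how the base class forces every adaptor to expose the same interface.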

Module contents