automatminer package
Subpackages
- automatminer.automl package
- automatminer.featurization package
- automatminer.preprocessing package
- automatminer.tests package
- automatminer.utils package
Submodules
automatminer.base module
Base classes, mixins, and other inheritables.
- class automatminer.base.DFTransformer
Bases: abc.ABC, sklearn.base.BaseEstimator
A base class that allows easy transformation of pandas DataFrames, in the same way as sklearn's TransformerMixin and BaseEstimator.
When implementing an adaptor based on this class, use the @check_fitted and @set_fitted decorators where necessary.
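The decorator guidance above can be illustrated with a minimal sketch. The `check_fitted`, `set_fitted`, `NotFittedError`, and `ToyTransformer` below are hypothetical, simplified stand-ins written for illustration, not automatminer's own implementations; they assume the transformer records its state in an `is_fit` attribute.

```python
import functools


class NotFittedError(Exception):
    """Hypothetical error raised when a method needs a fitted object."""


def set_fitted(method):
    """Hypothetical decorator: mark the object as fitted after `method` succeeds."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        self.is_fit = True
        return result
    return wrapper


def check_fitted(method):
    """Hypothetical decorator: refuse to run `method` until the object is fitted."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, "is_fit", False):
            raise NotFittedError(
                f"{type(self).__name__} must be fit before calling {method.__name__}."
            )
        return method(self, *args, **kwargs)
    return wrapper


class ToyTransformer:
    """Illustrative class showing where each decorator goes."""

    def __init__(self):
        self.is_fit = False

    @set_fitted
    def fit(self, df, target):
        return self

    @check_fitted
    def transform(self, df, target):
        return df
```

Decorating `fit` with `set_fitted` and `transform` with `check_fitted` means a subclass cannot silently transform data with unfitted state.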
- abstract fit(df, target, **fit_kwargs)
Fits the transformer to a dataframe, given a target.
- Parameters
df (pandas.DataFrame) – The pandas dataframe to be fit.
target (str) – the target string specifying the ML target.
fit_kwargs – Keyword parameters for fitting.
- Returns
(DFTransformer) This object (self).
- fit_transform(df, target)
Combines the fitting and transformation of a dataframe.
- Parameters
df (pandas.DataFrame) – The pandas dataframe to be fit.
target (str) – the target string specifying the ML target.
- Returns
The transformed dataframe.
- Return type
(pandas.DataFrame)
- abstract transform(df, target, **transform_kwargs)
Transforms a dataframe.
- Parameters
df (pandas.DataFrame) – The pandas dataframe to be transformed.
target (str) – the target string specifying the ML target.
transform_kwargs – Keyword parameters for transforming.
- Returns
The transformed dataframe.
- Return type
(pandas.DataFrame)
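A minimal sketch of a subclass that fills in the abstract `fit` and `transform` methods, and of how `fit_transform` can combine them. It assumes pandas is installed; `DFTransformerSketch` and `MeanImputer` are hypothetical names, and the sketch omits the `sklearn.base.BaseEstimator` base class to stay self-contained.

```python
from abc import ABC, abstractmethod

import pandas as pd


class DFTransformerSketch(ABC):
    """Hypothetical stand-in mirroring the DFTransformer interface (no sklearn base)."""

    @abstractmethod
    def fit(self, df, target, **fit_kwargs):
        """Learn any state that transform() needs, then return self."""

    @abstractmethod
    def transform(self, df, target, **transform_kwargs):
        """Return a transformed dataframe."""

    def fit_transform(self, df, target):
        # Fit on df, then transform that same df.
        return self.fit(df, target).transform(df, target)


class MeanImputer(DFTransformerSketch):
    """Hypothetical example: fill missing feature values with training means."""

    def fit(self, df, target, **fit_kwargs):
        # Learn the mean of every numeric column except the ML target.
        self.means_ = df.drop(columns=[target]).mean(numeric_only=True)
        return self

    def transform(self, df, target, **transform_kwargs):
        # fillna with a Series fills each column with its own mean; the target
        # column is not in the Series, so it is left untouched.
        return df.fillna(self.means_)
```

Because `fit` returns `self`, `fit_transform` can chain the two calls, and the abstract base cannot be instantiated until both methods are implemented.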
automatminer.pipeline module
The highest level classes for pipelines.
- class automatminer.pipeline.MatPipe(autofeaturizer=None, cleaner=None, reducer=None, learner=None)
Bases: automatminer.base.DFTransformer
Establish an ML pipeline for transforming compositions, structures, bandstructures, and DOS objects into machine-learned properties.
The pipeline includes:
- featurization
- ML preprocessing
- AutoML model fitting and creation
If you are using MatPipe for benchmarking, use the “benchmark” method.
If you have training data and want to use MatPipe for production predictions (e.g., predicting material properties for which you have no data), use “fit” and “predict”.
The pipeline is transferable: it can be fit on one dataset and used to predict the properties of another. The entire pipeline and all of its constituent objects can be summarized (via “summarize”) or inspected (via “inspect”) in human-readable formats.
Note: This pipeline should function the same regardless of which “component” classes it is made of. For example, the steps for each method should remain the same whether the learner is a TPOTAdaptor or a SinglePipelineAdaptor. To use a preset config, use MatPipe.from_preset(preset).
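The note above says the pipeline's steps do not depend on which component classes are plugged in. A minimal sketch of that idea, assuming each component only needs `fit`/`transform` (or `fit`/`predict` for the learner) methods, so any object with those methods can be swapped in. `PipeSketch` and the toy components are hypothetical; the cleaner step is omitted for brevity, and the learner is a trivial mean predictor rather than an AutoML adaptor.

```python
import pandas as pd


class AddSquare:
    """Toy featurizer: decorates the dataframe with an x**2 descriptor."""

    def fit(self, df, target):
        return self

    def transform(self, df, target):
        return df.assign(x_sq=df["x"] ** 2)


class KeepColumns:
    """Toy reducer: keeps a fixed list of feature columns (plus the target, if present)."""

    def __init__(self, columns):
        self.columns = list(columns)

    def fit(self, df, target):
        return self

    def transform(self, df, target):
        # The target column is absent when predicting on new materials.
        keep = self.columns + ([target] if target in df.columns else [])
        return df[keep]


class MeanLearner:
    """Toy learner: predicts the mean target value seen during fitting."""

    def fit(self, df, target):
        self.mean_ = df[target].mean()
        return self

    def predict(self, df):
        return pd.Series(self.mean_, index=df.index)


class PipeSketch:
    """Runs the same steps no matter which component objects are passed in."""

    def __init__(self, featurizer, reducer, learner):
        self.steps = [featurizer, reducer]
        self.learner = learner

    def fit(self, df, target):
        self.target = target
        for step in self.steps:
            df = step.fit(df, target).transform(df, target)
        self.learner.fit(df, target)
        return self

    def predict(self, df):
        # Apply the already-fitted steps, then ask the learner for predictions.
        for step in self.steps:
            df = step.transform(df, self.target)
        return self.learner.predict(df)
```

Replacing `MeanLearner` with any other object that has `fit` and `predict` leaves `PipeSketch` itself unchanged, which is the property the note describes.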
Examples
# A benchmarking experiment, where all property values are known
pipe = MatPipe()
test_predictions = pipe.benchmark(df, "target_property")

# Creating a pipe with data containing known properties, then predicting
# on new materials
pipe = MatPipe()
pipe.fit(training_df, "target_property")
predictions = pipe.predict(unknown_df)

# Getting a MatPipe from preset
pipe = MatPipe.from_preset("debug")
- Parameters
autofeaturizer (AutoFeaturizer) – The autofeaturizer object used to automatically decorate the dataframe with descriptors.
cleaner (DataCleaner) – The data cleaner object used to get a featurized dataframe in ml-ready form.
reducer (FeatureReducer) – The feature reducer object used to select the best features from a “clean” dataframe.
learner (DFMLAdaptor) – The AutoML adaptor object used to run an AutoML pipeline on the clean, reduced, featurized dataframe.
The following attributes are set during fitting. Each has its own set of attributes, which define more specifically how the pipeline works.
- pre_fit_df
The dataframe on which the pipeline was fit.
- Type
pd.DataFrame
- post_fit_df
The dataframe transformed into the ML-ready form.
- Type
pd.DataFrame
- benchmark(**kwargs)
- fit(**kwargs)
- static from_preset(preset='express', **powerups)
Gets a preset MatPipe from a string using automatminer.presets.get_preset_config.
See get_preset_config for more information.
- Parameters
preset (str) –
The preset configuration to use. Current presets are:
- production
- express (recommended for most problems)
- express_single (no AutoML, XGBoost only)
- heavy
- debug
- debug_single (no AutoML, XGBoost only)
powerups (kwargs) –
General upgrades/changes to apply. Current powerups are:
- cache_src (str): The cache source, if you want to save features.
- n_jobs (int): The number of parallel processes to use when running.
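`from_preset` builds a pipeline from the config returned by `get_preset_config`. A minimal sketch of that factory pattern, assuming the config is a dictionary keyed by MatPipe's constructor arguments (`autofeaturizer`, `cleaner`, `reducer`, `learner`) that is unpacked into the constructor. `MatPipeSketch` and `preset_config_sketch` are hypothetical, and components are represented by plain dicts rather than real AutoFeaturizer or adaptor objects.

```python
def preset_config_sketch(preset="express", **powerups):
    """Hypothetical config lookup; components are represented by plain dicts."""
    n_jobs = powerups.get("n_jobs", 1)  # assumed default
    # Mirrors the preset list: *_single presets use XGBoost only, others AutoML (TPOT).
    learner = {"type": "XGBoost" if preset.endswith("_single") else "TPOT", "n_jobs": n_jobs}
    return {
        "autofeaturizer": {"preset": preset, "cache_src": powerups.get("cache_src")},
        "cleaner": {},
        "reducer": {},
        "learner": learner,
    }


class MatPipeSketch:
    """Hypothetical stand-in with MatPipe's four constructor arguments."""

    def __init__(self, autofeaturizer=None, cleaner=None, reducer=None, learner=None):
        self.autofeaturizer = autofeaturizer
        self.cleaner = cleaner
        self.reducer = reducer
        self.learner = learner

    @staticmethod
    def from_preset(preset="express", **powerups):
        # Look up the preset config, then unpack it into the constructor.
        config = preset_config_sketch(preset, **powerups)
        return MatPipeSketch(**config)
```

Keeping the preset logic in a separate function means the powerups are applied in one place, and the pipeline class only needs to accept its components.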
- inspect(**kwargs)
- static load(filename, supress_version_mismatch=False)
Loads a MatPipe that was saved.
- Parameters
- Returns
A MatPipe object.
- Return type
pipe (MatPipe)
- predict(**kwargs)
- save(**kwargs)
- summarize(**kwargs)
- transform(df, **transform_kwargs)
Transforms a dataframe.
- Parameters
df (pandas.DataFrame) – The pandas dataframe to be transformed.
target (str) – the target string specifying the ML target.
transform_kwargs – Keyword parameters for transforming.
- Returns
The transformed dataframe.
- Return type
(pandas.DataFrame)
automatminer.presets module
Configurations for MatPipe.
- automatminer.presets.get_available_presets()
Return all available presets for MatPipes.
- Returns
A list of preset names.
- Return type
([str])
- automatminer.presets.get_preset_config(preset='express', **powerups)
Preset configs for MatPipe.
USER:
- “express”: Good for quick benchmarks with moderate accuracy.
- “express_single”: Same as express, but uses XGBoost trees as single models instead of AutoML (TPOT). Good for even faster results.
- “production”: Used for making production predictions and benchmarks. Balances accuracy and timeliness.
- “heavy”: For when high accuracy is required and you have access to (very) powerful computing resources. May be buggier and more difficult to run than production.
DEBUG:
- “debug”: Debugging with AutoML enabled.
- “debug_single”: Debugging with a single model.
- Parameters
preset (str) – The name of the preset config you’d like to use.
**powerups –
Various modifications as kwargs.
- cache_src (str): A file path. If specified, AutoFeaturizer will use feature caching with a file stored at this location. See AutoFeaturizer’s cache_src argument for more information.
- n_jobs (int): The number of parallel processes to use when running. Particularly important for AutoFeaturizer and TPOTAdaptor.
- Returns
The desired preset config.
- Return type
(dict)