automatminer.featurization package¶

Subpackages¶

automatminer.featurization.tests package

Submodules¶

automatminer.featurization.base module¶

Base classes for sets of featurizers.

class automatminer.featurization.base.FeaturizerSet(exclude=None)¶

Bases: abc.ABC

Abstract class for defining sets of featurizers.

All FeaturizerSets should implement at least four sets of featurizers:

express - The “go-to” set of featurizers

heavy - A more expensive and complete (though not necessarily
better) version of express.

all - All featurizers available for the intended featurization type(s)

debug - An ultra-minimal set of featurizers for debugging purposes.

Each set returned is a list of matminer featurizer objects. The choice of featurizers for a given set is at the discretion of the implementor.

Parameters: exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.

abstract property all¶

All featurizers available for this featurization type. These featurizers are allowed to:

have multiple, highly similar versions of the same featurizer,
not work on standard versions of the input types (e.g., SiteDOS works on the DOS for a single site, not a structure),
return non-vectorized outputs (e.g., matrices, other data types).

Return type: List[~T]

abstract property debug¶

An ultra-minimal set of featurizers for debugging.

Return type: List[~T]

abstract property express¶

A focused set of featurizers which should:

be reasonably fast to featurize
not be prone to errors or NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.

Return type: List[~T]

abstract property heavy¶

A more expensive and complete (though not necessarily better) version of express.

Similar to express, all featurizers selected should return useful learning features. However, the selected featurizers may now:

generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets

Return type: List[~T]
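The interface above can be sketched with a small stand-in. This is an illustration only, not automatminer's implementation: `FeaturizerSetSketch`, `ToyFeaturizerSet`, the three toy featurizer classes, and the `_filter` helper are all hypothetical names, and the exclusion logic simply assumes that `exclude` is matched against class names as the parameter description states.

```python
from abc import ABC, abstractmethod


class FeaturizerSetSketch(ABC):
    """Hypothetical stand-in for the abstract base class described above."""

    def __init__(self, exclude=None):
        # Class names of featurizers to leave out of every returned set.
        self.exclude = list(exclude) if exclude else []

    def _filter(self, featurizers):
        # Drop any featurizer whose class name appears in `exclude`.
        return [f for f in featurizers if type(f).__name__ not in self.exclude]

    @property
    @abstractmethod
    def express(self):
        """The go-to set."""

    @property
    @abstractmethod
    def heavy(self):
        """A more expensive and complete set."""

    @property
    @abstractmethod
    def all(self):
        """Every featurizer for this type."""

    @property
    @abstractmethod
    def debug(self):
        """An ultra-minimal set."""


class FastFeaturizer:
    """Hypothetical cheap featurizer."""


class SlowFeaturizer:
    """Hypothetical expensive featurizer."""


class MatrixFeaturizer:
    """Hypothetical featurizer with a non-vectorized (matrix) output."""


class ToyFeaturizerSet(FeaturizerSetSketch):
    @property
    def debug(self):
        return self._filter([FastFeaturizer()])

    @property
    def express(self):
        return self._filter([FastFeaturizer()])

    @property
    def heavy(self):
        return self._filter([FastFeaturizer(), SlowFeaturizer()])

    @property
    def all(self):
        return self._filter(
            [FastFeaturizer(), SlowFeaturizer(), MatrixFeaturizer()]
        )


# Excluding by class name removes the featurizer from every set.
names = [
    type(f).__name__
    for f in ToyFeaturizerSet(exclude=["SlowFeaturizer"]).all
]
print(names)
```

Because all four properties are abstract, a subclass that omits any of them cannot be instantiated, which is one way to enforce the "at least four sets" requirement.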

automatminer.featurization.core module¶

Classes for automatic featurization and core featurizer functionality.

class automatminer.featurization.core.AutoFeaturizer(cache_src=None, preset=None, featurizers=None, exclude=None, functionalize=False, ignore_cols=None, ignore_errors=True, drop_inputs=True, guess_oxistates=True, multiindex=False, do_precheck=True, n_jobs=None, composition_col='composition', structure_col='structure', bandstructure_col='bandstructure', dos_col='dos')¶

Bases: automatminer.base.DFTransformer

Automatically featurize a dataframe.

Use this object first by calling fit, then by calling transform.

AutoFeaturizer requires you to specify the column names for each type of featurization, or just use the defaults:

“composition”: To use composition features
“structure”: To use structure features
“bandstructure”: To use bandstructure features
“dos”: To use density of states features

The featurizers corresponding to each featurizer type cannot be used if the correct column name is not present.
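As a rough illustration of the column-name rule, the sketch below works out which featurization types are usable for a given set of dataframe columns. The function `usable_featurizer_types` and the `DEFAULT_COLUMNS` mapping are hypothetical helpers written for this example; only the default column names themselves come from the class signature.

```python
# Default column name for each featurization type, taken from the signature.
DEFAULT_COLUMNS = {
    "composition": "composition",
    "structure": "structure",
    "bandstructure": "bandstructure",
    "dos": "dos",
}


def usable_featurizer_types(df_columns, column_names=None):
    """Return the featurization types whose input column is present.

    `column_names` optionally overrides the default column for some types.
    """
    resolved = {**DEFAULT_COLUMNS, **(column_names or {})}
    present = set(df_columns)
    return [ftype for ftype, col in resolved.items() if col in present]


# Only a composition column is present, so only composition features apply.
defaults_only = usable_featurizer_types(["composition", "band_gap"])

# Compositions live in a column called "formula", so the name is overridden.
overridden = usable_featurizer_types(
    ["formula", "structure"], {"composition": "formula"}
)

print(defaults_only)
print(overridden)
```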

Parameters

cache_src (str) – An absolute path to a JSON file holding feature information. If the file exists, features are read from it (indexwise, via loc) instead of featurizing. If the file does not exist, AutoFeaturizer featurizes normally, then saves the features to a new file. Only features (not featurizer input objects) are saved.
preset (str) – “express”, “heavy”, “debug”, or “all”. Selects the preset set of featurizers to apply. See the featurizer sets for specifics of each. Default is “express”. Incompatible with the featurizers arg.
featurizers (dict) – Use this option if you want to manually specify the featurizers to use. Keys are the featurizer types you want applied (e.g., “structure”, “composition”). The corresponding values are lists of featurizer objects you want for each featurizer type.

Example:

    {"composition": [ElementProperty.from_preset("matminer"), EwaldEnergy()],
     "structure": [BagofBonds(), GlobalSymmetryFeatures()]}
exclude ([str]) – Class names of featurizers to exclude. Only used if you use a preset.
ignore_cols ([str]) – Column names to be ignored/removed from any dataframe undergoing fitting or transformation. If columns are not ignored, they may be used later on for learning.
ignore_errors (bool) – If True, each featurizer will ignore all errors during featurization.
drop_inputs (bool) – Drop the columns containing input objects for featurization after they are featurized.
guess_oxistates (bool) – If True, try to decorate sites with oxidation state.
multiindex (bool) – If True, returns a multiindexed dataframe. Not recommended for use in MatPipe.
do_precheck (bool) – Execute a precheck on each featurizer before featurizing with it. See matminer prechecking for more info.
n_jobs (int) – The number of parallel jobs to use during featurization for each featurizer. Default is n_cores.
composition_col (str) – Name of the column containing compositions to be featurized.
structure_col (str) – Name of the column containing structures to be featurized.
bandstructure_col (str) – Name of the column containing bandstructures to be featurized.
dos_col (str) – Name of the column containing density of states objects to be featurized.
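The read-or-compute behaviour described for cache_src can be sketched as follows. This is a simplified model under stated assumptions, not AutoFeaturizer's code: `featurize_with_cache` and `toy_featurize` are hypothetical, rows are a plain dict keyed by index rather than a dataframe, and the JSON layout (index to feature dict) is invented for the example.

```python
import json
import os
import tempfile


def featurize_with_cache(rows, featurize, cache_src=None):
    """Return {index: features}, reading from or writing to a JSON cache."""
    if cache_src and os.path.exists(cache_src):
        with open(cache_src) as f:
            cached = json.load(f)
        # JSON keys are strings, so look each row up by its index as a string.
        return {idx: cached[str(idx)] for idx in rows}

    # No cache file yet: featurize normally.
    features = {idx: featurize(obj) for idx, obj in rows.items()}
    if cache_src:
        # Save only the features, not the input objects.
        with open(cache_src, "w") as f:
            json.dump({str(idx): feats for idx, feats in features.items()}, f)
    return features


calls = []


def toy_featurize(formula):
    # Record each call so cache hits can be told apart from real work.
    calls.append(formula)
    return {"n_chars": len(formula)}


rows = {0: "Fe2O3", 1: "NaCl"}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.json")
    first = featurize_with_cache(rows, toy_featurize, cache_src=path)
    second = featurize_with_cache(rows, toy_featurize, cache_src=path)

# The second call is served from the file, so the featurizer ran only twice.
print(first == second, len(calls))
```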

These attributes are set during fitting

featurizers¶

Same format as the featurizers input dictionary. Values contain the actual objects being used for featurization. Featurizers can be removed if do_precheck=True and the featurizer is not valid for more than a self.min_precheck_frac fraction of the fitting dataset.

Type: dict

features¶

The features generated from the application of all featurizers.

Type: dict

auto_featurizer¶

Whether the featurizers are set automatically or passed by the user.

Type: bool

fitted_input_df¶

The dataframe which was fitted on

Type: pd.DataFrame

converted_input_df¶

The converted dataframe which was fitted on (i.e., strings converted to compositions).

Type: pd.DataFrame

removed_featurizers¶

A list of featurizers removed by prechecking methods, if applicable

Type: [BaseFeaturizer]

Attributes not set during fitting and not specified by arguments

min_precheck_frac¶

The minimum fraction of a featurizer’s input that must be valid (via featurizer.precheck(data)) for the featurizer to be kept.

Type: float
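One way to picture how min_precheck_frac and removed_featurizers relate is the sketch below. It is an assumption-laden illustration: `split_by_precheck` and both toy featurizer classes are hypothetical, the 0.9 threshold is an arbitrary example value rather than the library default, and whether the real comparison is strict or inclusive at the boundary is not stated in this page.

```python
def split_by_precheck(featurizers, inputs, min_precheck_frac=0.9):
    """Split featurizers into (kept, removed) by their precheck pass rate.

    The 0.9 default is an example value only, and the inclusive comparison
    at the boundary is an assumption.
    """
    kept, removed = [], []
    for featurizer in featurizers:
        n_valid = sum(1 for x in inputs if featurizer.precheck(x))
        frac = n_valid / len(inputs) if inputs else 0.0
        if frac >= min_precheck_frac:
            kept.append(featurizer)
        else:
            removed.append(featurizer)
    return kept, removed


class AcceptsAnything:
    """Hypothetical featurizer whose precheck always passes."""

    def precheck(self, formula):
        return True


class NeedsOxygen:
    """Hypothetical featurizer that is only valid for oxygen-containing input."""

    def precheck(self, formula):
        # Toy check on the raw string; a real precheck would inspect elements.
        return "O" in formula


inputs = ["Fe2O3", "NaCl", "Si", "GaAs"]
kept, removed = split_by_precheck([AcceptsAnything(), NeedsOxygen()], inputs)
kept_names = [type(f).__name__ for f in kept]
removed_names = [type(f).__name__ for f in removed]
print(kept_names, removed_names)
```

Here NeedsOxygen passes its precheck on one of four inputs, well under the threshold, so it ends up in the removed list.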

fit(**kwargs)¶

Wrapper for a method to log.

Parameters: operation (str) – The operation to be logging.
Returns: The method result.
Return type: result

transform(**kwargs)¶

Wrapper for a method to log.

Parameters: operation (str) – The operation to be logging.
Returns: The method result.
Return type: result
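The class description says to call fit first and transform second. A minimal sketch of that contract, using a hypothetical `ToyTransformer` that works on lists of dicts rather than dataframes and is not derived from DFTransformer:

```python
class ToyTransformer:
    """Hypothetical transformer following a fit-then-transform contract."""

    def __init__(self):
        self.is_fit = False
        self.feature_names = []

    def fit(self, rows):
        # Learn which columns to produce from the fitting data.
        self.feature_names = sorted({key for row in rows for key in row})
        self.is_fit = True
        return self

    def transform(self, rows):
        if not self.is_fit:
            raise RuntimeError("Call fit before transform.")
        # Emit exactly the columns learned during fit; missing values are None.
        return [
            {name: row.get(name) for name in self.feature_names}
            for row in rows
        ]


train = [{"a": 1, "b": 2}, {"a": 3}]
transformer = ToyTransformer().fit(train)

# Column "c" was never seen during fit, so it is dropped on transform.
out = transformer.transform([{"b": 5, "c": 9}])
print(out)
```

Keeping the learned state on the object is what lets transform apply the same set of columns to new data that was seen during fitting.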

automatminer.featurization.sets module¶

Defines sets of featurizers to be used by automatminer during featurization.

Featurizer sets are classes with attributes containing lists of featurizers. For example, the set of all express structure featurizers could be found with:

StructureFeaturizers().express

class automatminer.featurization.sets.AllFeaturizers(exclude=None)¶

Bases: automatminer.featurization.base.FeaturizerSet

Featurizer set containing all available featurizers.

This class provides subsets for composition, structure, density of states and band structure based featurizers. Additional sets containing all featurizers and the set of express/heavy/etc. featurizers are provided.

Example usage:

composition_featurizers = AllFeaturizers().composition

Parameters: exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.

property all¶

All featurizers available for this featurization type. These featurizers are allowed to:

have multiple, highly similar versions of the same featurizer,
not work on standard versions of the input types (e.g., SiteDOS works on the DOS for a single site, not a structure),
return non-vectorized outputs (e.g., matrices, other data types).

property bandstructure¶: List of all band structure based featurizers.

property composition¶: List of all composition based featurizers.

property debug¶: An ultra-minimal set of featurizers for debugging.

property dos¶: List of all density of states based featurizers.

property express¶

A focused set of featurizers which should:

be reasonably fast to featurize
not be prone to errors or NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.

property heavy¶

A more expensive and complete (though not necessarily better) version of express.

Similar to express, all featurizers selected should return useful learning features. However, the selected featurizers may now:

generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets

property structure¶: List of all structure based featurizers.
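The relationship between the per-type properties and the combined express/heavy/all sets might look like the sketch below. `ToySet` and `ToyAllFeaturizers` are hypothetical, featurizers are represented by placeholder strings, and only two of the four input types are modelled to keep the example short.

```python
class ToySet:
    """Hypothetical per-type set exposing the four standard lists."""

    def __init__(self, express, heavy, all_, debug):
        self.express = express
        self.heavy = heavy
        self.all = all_
        self.debug = debug


class ToyAllFeaturizers:
    """Hypothetical aggregate over per-type sets."""

    def __init__(self, composition_set, structure_set):
        self._sets = {
            "composition": composition_set,
            "structure": structure_set,
        }

    @property
    def composition(self):
        # Per-type property: every composition featurizer.
        return list(self._sets["composition"].all)

    @property
    def structure(self):
        # Per-type property: every structure featurizer.
        return list(self._sets["structure"].all)

    def _combined(self, preset):
        # Concatenate the named preset across all input types.
        combined = []
        for featurizer_set in self._sets.values():
            combined.extend(getattr(featurizer_set, preset))
        return combined

    @property
    def express(self):
        return self._combined("express")

    @property
    def all(self):
        return self._combined("all")


comp = ToySet(
    express=["comp_fast"],
    heavy=["comp_fast", "comp_slow"],
    all_=["comp_fast", "comp_slow"],
    debug=["comp_fast"],
)
struct = ToySet(
    express=["struct_fast"],
    heavy=["struct_fast"],
    all_=["struct_fast", "struct_matrix"],
    debug=["struct_fast"],
)
aggregate = ToyAllFeaturizers(comp, struct)
print(aggregate.express)
print(aggregate.composition)
```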

class automatminer.featurization.sets.BSFeaturizers(exclude=None)¶

Bases: automatminer.featurization.base.FeaturizerSet

Featurizer set containing band structure featurizers.

See the FeaturizerSet documentation for a description of each property (sublist of featurizers).

Example usage:

bs_featurizers = BSFeaturizers().express

Parameters: exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.

property all¶: List of all band structure based featurizers.

property debug¶: An ultra-minimal set of featurizers for debugging.

property express¶

A focused set of featurizers which should:

be reasonably fast to featurize
not be prone to errors or NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.

property heavy¶

A more expensive and complete (though not necessarily better) version of express.

Similar to express, all featurizers selected should return useful learning features. However, the selected featurizers may now:

generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets

class automatminer.featurization.sets.CompositionFeaturizers(exclude=None)¶

Bases: automatminer.featurization.base.FeaturizerSet

Featurizer set containing composition featurizers.

See the FeaturizerSet documentation for a description of each property (sublist of featurizers).

Example usage:

best_featurizers = CompositionFeaturizers().express

Parameters: exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.

property all¶

All featurizers available for this featurization type. These featurizers are allowed to:

have multiple, highly similar versions of the same featurizer,
not work on standard versions of the input types (e.g., SiteDOS works on the DOS for a single site, not a structure),
return non-vectorized outputs (e.g., matrices, other data types).

property debug¶: An ultra-minimal set of featurizers for debugging.

property express¶

A focused set of featurizers which should:

be reasonably fast to featurize
not be prone to errors or NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.

property heavy¶

A more expensive and complete (though not necessarily better) version of express.

Similar to express, all featurizers selected should return useful learning features. However, the selected featurizers may now:

generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets

class automatminer.featurization.sets.DOSFeaturizers(exclude=None)¶

Bases: automatminer.featurization.base.FeaturizerSet

Featurizer set containing density of states featurizers.

See the FeaturizerSet documentation for a description of each property (sublist of featurizers).

Example usage:

dos_featurizers = DOSFeaturizers().express

Density of states featurizers should work on the entire density of states if they are in express or heavy. If they are in “all” they may work on sites or return matrices.

Parameters: exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.

property all¶: List of all density of states based featurizers.

property debug¶: An ultra-minimal set of featurizers for debugging.

property express¶

A focused set of featurizers which should:

be reasonably fast to featurize
not be prone to errors or NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.

property heavy¶

A more expensive and complete (though not necessarily better) version of express.

Similar to express, all featurizers selected should return useful learning features. However, the selected featurizers may now:

generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets

class automatminer.featurization.sets.StructureFeaturizers(exclude=None)¶

Bases: automatminer.featurization.base.FeaturizerSet

Featurizer set containing structure featurizers.

See the FeaturizerSet documentation for a description of each property (sublist of featurizers).

Example usage:

best_featurizers = StructureFeaturizers().express

Parameters: exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.

property all¶

All featurizers available for this featurization type. These featurizers are allowed to:

have multiple, highly similar versions of the same featurizer,
not work on standard versions of the input types (e.g., SiteDOS works on the DOS for a single site, not a structure),
return non-vectorized outputs (e.g., matrices, other data types).

property debug¶: An ultra-minimal set of featurizers for debugging.

property express¶

A focused set of featurizers which should:

be reasonably fast to featurize
not be prone to errors or NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.

property heavy¶

A more expensive and complete (though not necessarily better) version of express.

Similar to express, all featurizers selected should return useful learning features. However, the selected featurizers may now:

generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets

property need_fit¶
