automatminer.featurization package¶
Subpackages¶
Submodules¶
automatminer.featurization.base module¶
Base classes for sets of featurizers.
- 
class 
automatminer.featurization.base.FeaturizerSet(exclude=None)¶ Bases: abc.ABC
Abstract class for defining sets of featurizers.
All FeaturizerSets should implement at least four sets of featurizers:
express - The “go-to” set of featurizers.
heavy - A more expensive and complete (though not necessarily better) version of express.
all - All featurizers available for the intended featurization type(s).
debug - An ultra-minimal set of featurizers for debugging purposes.
Each set returned is a list of matminer featurizer objects. The choice of featurizers for a given set is at the discretion of the implementor.
- Parameters
 exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.
- 
abstract property 
all¶ All featurizers available for this featurization type. These featurizers are allowed to:
have multiple, highly similar versions of the same featurizer,
not work on standard versions of the input types (e.g., SiteDOS works on the DOS for a single site, not the structure),
return non-vectorized outputs (e.g., matrices, other data types).
- Return type
 List[~T]
- 
abstract property 
express¶ A focused set of featurizers which should:
be reasonably fast to featurize
not be prone to errors/NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.
- Return type
 List[~T]
- 
abstract property 
heavy¶ A more expensive and complete (though not necessarily better) version of express.
Similar to express, all featurizers selected should return useful learning features. However the selected featurizers may now:
generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets
- Return type
 List[~T]
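The contract described above (four named sets, each a list of featurizer objects, filtered by exclude) can be sketched with the standard library alone. This is only an illustration: SketchFeaturizerSet, its _filter helper, and the three dummy featurizer classes are hypothetical stand-ins, not automatminer or matminer code.

```python
from abc import ABC, abstractmethod


class SketchFeaturizerSet(ABC):
    """Illustrative stand-in for the FeaturizerSet contract (not automatminer code)."""

    def __init__(self, exclude=None):
        # Class names to drop from every returned set.
        self.exclude = list(exclude) if exclude else []

    def _filter(self, featurizers):
        # Hypothetical helper: drop featurizers whose class name is excluded.
        return [f for f in featurizers if type(f).__name__ not in self.exclude]

    @property
    @abstractmethod
    def express(self): ...

    @property
    @abstractmethod
    def heavy(self): ...

    @property
    @abstractmethod
    def all(self): ...

    @property
    @abstractmethod
    def debug(self): ...


# Dummy classes standing in for matminer featurizer objects.
class FastFeaturizer: pass
class SlowFeaturizer: pass
class MatrixFeaturizer: pass


class ToyFeaturizers(SketchFeaturizerSet):
    @property
    def debug(self):
        return self._filter([FastFeaturizer()])

    @property
    def express(self):
        return self._filter([FastFeaturizer()])

    @property
    def heavy(self):
        return self._filter([FastFeaturizer(), SlowFeaturizer()])

    @property
    def all(self):
        return self._filter([FastFeaturizer(), SlowFeaturizer(), MatrixFeaturizer()])


names = [type(f).__name__ for f in ToyFeaturizers(exclude=["SlowFeaturizer"]).all]
print(names)  # ['FastFeaturizer', 'MatrixFeaturizer']
```

Because all four properties are abstract, a subclass that omits any of them cannot be instantiated, which is one way to enforce the "at least four sets" requirement.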
automatminer.featurization.core module¶
Classes for automatic featurization and core featurizer functionality.
- 
class 
automatminer.featurization.core.AutoFeaturizer(cache_src=None, preset=None, featurizers=None, exclude=None, functionalize=False, ignore_cols=None, ignore_errors=True, drop_inputs=True, guess_oxistates=True, multiindex=False, do_precheck=True, n_jobs=None, composition_col='composition', structure_col='structure', bandstructure_col='bandstructure', dos_col='dos')¶ Bases: automatminer.base.DFTransformer
Automatically featurize a dataframe.
Use this object first by calling fit, then by calling transform.
AutoFeaturizer requires you to specify the column names for each type of featurization, or just use the defaults:
“composition”: To use composition features
“structure”: To use structure features
“bandstructure”: To use bandstructure features
“dos”: To use density of states features
The featurizers corresponding to each featurizer type cannot be used if the correct column name is not present.
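The rule that a featurizer type is only usable when its column is present can be illustrated with a small standalone sketch. The function name usable_featurizer_types and the DEFAULT_COLUMNS mapping are assumptions made for this example; they are not part of automatminer.

```python
# Default column name for each featurizer type, as listed above.
DEFAULT_COLUMNS = {
    "composition": "composition",
    "structure": "structure",
    "bandstructure": "bandstructure",
    "dos": "dos",
}


def usable_featurizer_types(df_columns, column_names=None):
    """Return the featurizer types whose input column is present.

    Hypothetical helper for illustration; not part of automatminer.
    """
    # Overrides play the role of composition_col, structure_col, etc.
    column_names = {**DEFAULT_COLUMNS, **(column_names or {})}
    present = set(df_columns)
    return [ftype for ftype, col in column_names.items() if col in present]


print(usable_featurizer_types(["composition", "band_gap"]))
# ['composition']
print(usable_featurizer_types(["crystal", "band_gap"], {"structure": "crystal"}))
# ['structure']
```

In the second call the structure column has been renamed to "crystal", so only structure featurizers would apply, even though the default name "structure" is absent.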
- Parameters
 cache_src (str) – An absolute path to a json file holding feature information. If file exists, will read features (loc indexwise) from this file instead of featurizing. If this file does not exist, AutoFeaturizer will featurize normally, then save the features to a new file. Only features (not featurizer input objects) will be saved
preset (str) – “express”, “heavy”, “debug”, or “all”. Determines by preset the featurizers that should be applied. See the Featurizer sets for specifics of each. Default is “express”. Incompatible with the featurizers arg.
featurizers (dict) –
Use this option if you want to manually specify the featurizers to use. Keys are the featurizer types you want applied (e.g., “structure”, “composition”). The corresponding values are lists of featurizer objects you want for each featurizer type.
Example
{"composition": [ElementProperty.from_preset("matminer"), EwaldEnergy()],
 "structure": [BagofBonds(), GlobalSymmetryFeatures()]}
exclude ([str]) – Class names of featurizers to exclude. Only used if you use a preset.
ignore_cols ([str]) – Column names to be ignored/removed from any dataframe undergoing fitting or transformation. If columns are not ignored, they may be used later on for learning.
ignore_errors (bool) – If True, each featurizer will ignore all errors during featurization.
drop_inputs (bool) – Drop the columns containing input objects for featurization after they are featurized.
guess_oxistates (bool) – If True, try to decorate sites with oxidation state.
multiindex (bool) – If True, returns a multiindexed dataframe. Not recommended for use in MatPipe.
do_precheck (bool) – Execute a precheck on each featurizer before featurizing with it. See matminer prechecking for more info.
n_jobs (int) – The number of parallel jobs to use during featurization for each featurizer. Default is n_cores.
composition_col (str) – Name of the column containing compositions to be featurized.
structure_col (str) – Name of the column containing structures to be featurized.
bandstructure_col (str) – Name of the column containing bandstructures to be featurized.
dos_col (str) – Name of the column containing density of states objects to be featurized.
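The cache_src behaviour described in the parameter list (read features by index label if the file exists, otherwise featurize and then save only the features) might look roughly like the following. This is a sketch under stated assumptions: featurize_with_cache, toy_featurize, and the JSON layout are invented for illustration and do not reflect automatminer's actual cache format.

```python
import json
import os
import tempfile


def featurize_with_cache(cache_src, index, featurize):
    """Return {index label: features}, reading from cache_src when it exists.

    Hypothetical helper; `featurize` maps an index label to a dict of features.
    """
    if cache_src and os.path.exists(cache_src):
        with open(cache_src) as f:
            cached = json.load(f)
        # Look features up by index label rather than by position.
        return {label: cached[str(label)] for label in index}

    features = {label: featurize(label) for label in index}
    if cache_src:
        # Only features are saved, never the featurizer input objects.
        with open(cache_src, "w") as f:
            json.dump({str(k): v for k, v in features.items()}, f)
    return features


calls = []


def toy_featurize(label):
    # Record each call so we can see when real featurization happens.
    calls.append(label)
    return {"n_chars": len(str(label))}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.json")
    first = featurize_with_cache(path, ["Fe2O3", "NaCl"], toy_featurize)
    second = featurize_with_cache(path, ["NaCl"], toy_featurize)

print(calls)   # ['Fe2O3', 'NaCl']
print(second)  # {'NaCl': {'n_chars': 4}}
```

The second call never invokes toy_featurize, because the cache file already exists and the requested label is read from it.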
- 
These attributes are set during fitting 
- 
featurizers¶ Same format as input dictionary in Args. Values contain the actual objects being used for featurization. Featurizers can be removed if prechecking is enabled (do_precheck=True) and the featurizer is not valid for more than self.min_precheck_frac fraction of the fitting dataset.
- Type
 
- 
fitted_input_df¶ The dataframe which was fitted on
- Type
 pd.DataFrame
- 
converted_input_df¶ The converted dataframe which was fitted on (i.e., strings converted to compositions).
- Type
 pd.DataFrame
- 
removed_featurizers¶ A list of featurizers removed by prechecking methods, if applicable
- Type
 [BaseFeaturizer]
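How prechecking could populate removed_featurizers can be sketched as follows: for each featurizer, compute the fraction of entries that pass precheck and keep the featurizer only if that fraction reaches the threshold. The helper split_by_precheck, the two dummy featurizers, and the 0.9 threshold are assumptions chosen for this example; the only thing taken from the text is that featurizers expose a precheck call and are compared against min_precheck_frac.

```python
def split_by_precheck(featurizers, entries, min_precheck_frac):
    """Split featurizers into (kept, removed) by precheck pass fraction.

    Hypothetical helper: each featurizer is assumed to expose
    precheck(entry) -> bool.
    """
    kept, removed = [], []
    for featurizer in featurizers:
        n_valid = sum(1 for entry in entries if featurizer.precheck(entry))
        frac_valid = n_valid / len(entries) if entries else 0.0
        (kept if frac_valid >= min_precheck_frac else removed).append(featurizer)
    return kept, removed


class AcceptsAll:
    def precheck(self, entry):
        return True


class NeedsOxygen:
    def precheck(self, entry):
        # Toy rule: only formulas containing the letter "O" are valid.
        return "O" in entry


entries = ["Fe2O3", "NaCl", "MgO", "Si"]
kept, removed = split_by_precheck([AcceptsAll(), NeedsOxygen()], entries, 0.9)
print([type(f).__name__ for f in kept])     # ['AcceptsAll']
print([type(f).__name__ for f in removed])  # ['NeedsOxygen']
```

NeedsOxygen passes on only two of the four entries (0.5), which is below the example threshold, so it ends up in the removed list.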
- 
Attributes not set during fitting and not specified by arguments 
- 
min_precheck_frac¶ The minimum fraction of a featurizer’s input that must be valid (via featurizer.precheck(data)) for the featurizer to be kept.
- Type
 
automatminer.featurization.sets module¶
Defines sets of featurizers to be used by automatminer during featurization.
Featurizer sets are classes with attributes containing lists of featurizers. For example, the set of all express structure featurizers could be found with:
StructureFeaturizers().express
- 
class 
automatminer.featurization.sets.AllFeaturizers(exclude=None)¶ Bases: automatminer.featurization.base.FeaturizerSet
Featurizer set containing all available featurizers.
This class provides subsets for composition, structure, density of states and band structure based featurizers. Additional sets containing all featurizers and the set of express/heavy/etc. featurizers are provided.
Example usage:
composition_featurizers = AllFeaturizers().composition
- Parameters
 exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.
- 
property 
all¶ All featurizers available for this featurization type. These featurizers are allowed to:
have multiple, highly similar versions of the same featurizer,
not work on standard versions of the input types (e.g., SiteDOS works on the DOS for a single site, not the structure),
return non-vectorized outputs (e.g., matrices, other data types).
- 
property 
bandstructure¶ List of all band structure based featurizers.
- 
property 
composition¶ List of all composition based featurizers.
- 
property 
debug¶ An ultra-minimal set of featurizers for debugging.
- 
property 
dos¶ List of all density of states based featurizers.
- 
property 
express¶ A focused set of featurizers which should:
be reasonably fast to featurize
not be prone to errors/NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.
- 
property 
heavy¶ A more expensive and complete (though not necessarily better) version of express.
Similar to express, all featurizers selected should return useful learning features. However the selected featurizers may now:
generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets
- 
property 
structure¶ List of all structure based featurizers.
- 
class 
automatminer.featurization.sets.BSFeaturizers(exclude=None)¶ Bases: automatminer.featurization.base.FeaturizerSet
Featurizer set containing band structure featurizers.
See the FeaturizerSet documentation for a description of each property (sublist of featurizers).
Example usage:
bs_featurizers = BSFeaturizers().express
- Parameters
 exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.
- 
property 
all¶ List of all band structure based featurizers.
- 
property 
debug¶ An ultra-minimal set of featurizers for debugging.
- 
property 
express¶ A focused set of featurizers which should:
be reasonably fast to featurize
not be prone to errors/NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.
- 
property 
heavy¶ A more expensive and complete (though not necessarily better) version of express.
Similar to express, all featurizers selected should return useful learning features. However the selected featurizers may now:
generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets
- 
class 
automatminer.featurization.sets.CompositionFeaturizers(exclude=None)¶ Bases: automatminer.featurization.base.FeaturizerSet
Featurizer set containing composition featurizers.
See the FeaturizerSet documentation for a description of each property (sublist of featurizers).
Example usage:
best_featurizers = CompositionFeaturizers().express
- Parameters
 exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.
- 
property 
all¶ All featurizers available for this featurization type. These featurizers are allowed to:
have multiple, highly similar versions of the same featurizer,
not work on standard versions of the input types (e.g., SiteDOS works on the DOS for a single site, not the structure),
return non-vectorized outputs (e.g., matrices, other data types).
- 
property 
debug¶ An ultra-minimal set of featurizers for debugging.
- 
property 
express¶ A focused set of featurizers which should:
be reasonably fast to featurize
not be prone to errors/NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.
- 
property 
heavy¶ A more expensive and complete (though not necessarily better) version of express.
Similar to express, all featurizers selected should return useful learning features. However the selected featurizers may now:
generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets
- 
class 
automatminer.featurization.sets.DOSFeaturizers(exclude=None)¶ Bases: automatminer.featurization.base.FeaturizerSet
Featurizer set containing density of states featurizers.
See the FeaturizerSet documentation for a description of each property (sublist of featurizers).
Example usage:
dos_featurizers = DOSFeaturizers().express
Density of states featurizers should work on the entire density of states if they are in express or heavy. If they are in “all” they may work on sites or return matrices.
- Parameters
 exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.
- 
property 
all¶ List of all density of states based featurizers.
- 
property 
debug¶ An ultra-minimal set of featurizers for debugging.
- 
property 
express¶ A focused set of featurizers which should:
be reasonably fast to featurize
not be prone to errors/NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.
- 
property 
heavy¶ A more expensive and complete (though not necessarily better) version of express.
Similar to express, all featurizers selected should return useful learning features. However the selected featurizers may now:
generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets
- 
class 
automatminer.featurization.sets.StructureFeaturizers(exclude=None)¶ Bases: automatminer.featurization.base.FeaturizerSet
Featurizer set containing structure featurizers.
See the FeaturizerSet documentation for a description of each property (sublist of featurizers).
Example usage:
best_featurizers = StructureFeaturizers().express
- Parameters
 exclude (list of str, optional) – A list of featurizer class names that will be excluded from the set of featurizers returned.
- 
property 
all¶ All featurizers available for this featurization type. These featurizers are allowed to:
have multiple, highly similar versions of the same featurizer,
not work on standard versions of the input types (e.g., SiteDOS works on the DOS for a single site, not the structure),
return non-vectorized outputs (e.g., matrices, other data types).
- 
property 
debug¶ An ultra-minimal set of featurizers for debugging.
- 
property 
express¶ A focused set of featurizers which should:
be reasonably fast to featurize
not be prone to errors/NaNs
provide informative learning features
not include many irrelevant features that make ML expensive
have each featurizer return a vector
allow the recognized type (structure, composition, etc.) as input.
- 
property 
heavy¶ A more expensive and complete (though not necessarily better) version of express.
Similar to express, all featurizers selected should return useful learning features. However the selected featurizers may now:
generate many (thousands+) features
be expensive to featurize (1s+ per item)
be prone to NaNs on certain datasets
- 
property 
need_fit¶