Basic Usage
==================

Basic usage of Automatminer requires only one class: :code:`MatPipe`.

:code:`MatPipe` works with **pandas dataframes as input and output**. It can
train on training data using its :code:`fit` method, predict on new data using
:code:`predict`, and run benchmarks using :code:`benchmark`, all in an
automatic and end-to-end fashion.

Materials primitives (e.g., crystal structures) go in one end, and property
predictions come out the other. :code:`MatPipe` handles the intermediate
operations such as assigning descriptors, cleaning problematic data, data
conversions, imputation, and machine learning.

This is just a quick overview of the basic functionality. For a detailed and
comprehensive tutorial, see the Jupyter notebooks in the automatminer directory
of the matminer_examples repository.


Initializing a pipeline
-----------------------

The easiest way to initialize a :code:`MatPipe` is to use a preset.

.. code-block:: python

    from automatminer import MatPipe

    pipe = MatPipe.from_preset("express")

A preset is a set of options specifying exactly how each of
:code:`MatPipe`'s constituent classes is set up. Typically, the "express"
preset gives results with a moderate degree of accuracy and relatively
quick training, so we'll use it here.

Note: The default :code:`MatPipe()` is equivalent to
:code:`MatPipe.from_preset("express")`; other presets have different
configuration options!


Training a pipeline
---------------------

MatPipe has similar fit/transform syntax to scikit-learn. Your dataframe might
be of the form:

:code:`train_df`

.. list-table::
   :align: left
   :header-rows: 1

   * - :code:`structure`
     - :code:`my_property`
   * - :code:`<structure object>`
     - 0.3819
   * - :code:`<structure object>`
     - -0.1123
   * - :code:`<structure object>`
     - -0.091
   * - ...
     - ...

Here, the structure column contains :code:`pymatgen` :code:`Structure` objects
and the property column holds the property you are interested in (the target).

Use :code:`fit` to train, and specify the target column.
For the dataframe we used above, you'd do:

.. code-block:: python

    from automatminer import MatPipe

    pipe = MatPipe.from_preset("express")

    # Fitting pipe on train_df using "my_property" as target
    pipe.fit(train_df, "my_property")

The MatPipe is now fit and can be used to make predictions on new data!


Making predictions
-------------------

Once the pipeline is fit, we can make predictions on out-of-sample data,
provided that data has the same input types that our pipeline was trained on.
For example:

:code:`prediction_df`

.. list-table::
   :align: left
   :header-rows: 1

   * - :code:`structure`
   * - :code:`<structure object>`
   * - :code:`<structure object>`
   * - :code:`<structure object>`
   * - ...

Use :code:`predict` to predict new data.

.. code-block:: python

    from automatminer import MatPipe

    pipe = MatPipe.from_preset("express")
    pipe.fit(train_df, "my_property")

    # Predicting my_property values of some unknown prediction_df structures
    prediction_df = pipe.predict(prediction_df)

The output will be stored in a column called :code:`"<target> predicted"`.

:code:`prediction_df`

.. list-table::
   :align: left
   :header-rows: 1

   * - :code:`structure`
     - :code:`my_property predicted`
   * - :code:`<structure object>`
     - 0.449
   * - :code:`<structure object>`
     - -0.573
   * - :code:`<structure object>`
     - -0.005
   * - ...
     - ...


Using different presets
-----------------------

You can try out different configurations, such as more intensive featurization
routines or quicker training, by initializing MatPipe with a different preset.

The "heavy" preset typically includes more CPU-intensive featurization and
longer training times.

.. code-block:: python

    from automatminer import MatPipe

    pipe = MatPipe.from_preset("heavy")

In contrast, use "debug" if you want very quick predictions.

.. code-block:: python

    from automatminer import MatPipe

    pipe = MatPipe.from_preset("debug")


Saving your pipeline for later
------------------------------

Once fit, you can save your pipeline as a pickle file:

.. code-block:: python

    pipe.save("my_pipeline.p")

To load your file, use the :code:`MatPipe.load` static method.

.. code-block:: python

    pipe = MatPipe.load("my_pipeline.p")


Examine your pipeline
---------------------

**Summarize**

For a human-readable executive summary of your pipeline, use
:code:`MatPipe.summarize()`.

.. code-block:: python

    summary = pipe.summarize()

The dict returned by :code:`summarize` specifies the top-level information as
strings.

An analogy: if your pipeline were a plumbing system, :code:`summarize` would
tell you how long each section of pipe is and the pump model.

**Inspect**

To get comprehensive details on a pipeline, use :code:`MatPipe.inspect()`.

.. code-block:: python

    details = pipe.inspect()

Inspection specifies all parameters to all Automatminer objects needed to
construct the pipeline and all of its internal operations. In contrast to the
summary, which provides a more human-interpretable digest, inspection generates
the true attribute names and values of each object in the MatPipe hierarchy.
The output is typically very long, though still human readable.

An analogy: if your pipeline were a plumbing system, :code:`inspect` would
tell you everything :code:`summarize` tells you, plus the model numbers of
all the bolts, joints, and valves.

**Save to a file**

Both :code:`summarize` and :code:`inspect` accept a filename argument if you'd
like to save their outputs to JSON, YAML, or text.

.. code-block:: python

    summary = pipe.summarize("my_summary.yaml")
    details = pipe.inspect("my_details.json")


Monitoring the log
------------------

The Automatminer log is a powerful tool for determining what is happening
within the pipeline in real time. We recommend you monitor it closely as the
pipeline runs.

In addition to writing to stdout, Automatminer writes a log file in the current
working directory (:code:`automatminer.log`, timestamped if duplicates exist).

Here's an example of an Automatminer log when fitting on a dataset.

.. code-block::

    2019-10-11 16:05:41 INFO Problem type is: regression
    2019-10-11 16:05:41 INFO Fitting MatPipe pipeline to data.
    2019-10-11 16:05:41 INFO AutoFeaturizer: Starting fitting.
    2019-10-11 16:05:41 INFO AutoFeaturizer: Adding compositions from structures.
    ...
    2019-10-11 16:05:47 INFO DataCleaner: Handling feature na by max na threshold of 0.01 with method 'drop'.
    2019-10-11 16:05:47 INFO DataCleaner: After handling na: 636 samples, 168 features
    2019-10-11 16:05:47 INFO DataCleaner: Finished fitting.
    2019-10-11 16:05:47 INFO FeatureReducer: Starting fitting.
    2019-10-11 16:05:47 INFO FeatureReducer: 57 features removed due to cross correlation more than 0.95
    2019-10-11 16:05:49 INFO TreeFeatureReducer: Finished tree-based feature reduction of 110 initial features to 13
    2019-10-11 16:05:49 INFO FeatureReducer: Finished fitting.
    2019-10-11 16:05:49 INFO FeatureReducer: Starting transforming.
    2019-10-11 16:05:49 INFO FeatureReducer: Finished transforming.
    2019-10-11 16:05:49 INFO TPOTAdaptor: Starting fitting.
    2019-10-11 16:07:50 INFO TPOTAdaptor: Finished fitting.
    2019-10-11 16:07:50 INFO MatPipe successfully fit.

If you see :code:`WARNING` or :code:`ERROR`, you should inspect the pipeline
to make sure everything is configured as intended. If you see a
:code:`CRITICAL`, it is likely that something is misconfigured within the
pipeline and should be looked into in detail!


Quick reminders
---------------

**A quick note**: Default MatPipe configs automatically infer the type of
pymatgen object from the dataframe column name: e.g.,
"composition" = :code:`pymatgen.Composition`,
"structure" = :code:`pymatgen.Structure`,
"bandstructure" = :code:`pymatgen.electronic_structure.bandstructure.BandStructure`,
"dos" = :code:`pymatgen.electronic_structure.dos.DOS`.

**Make sure your dataframe has the correct name for its input!**

If you want to use custom names, see the advanced usage page.
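
Because the default configs rely on column names, it can help to check those names before calling :code:`fit`. The following is a minimal sketch, not part of Automatminer: :code:`DEFAULT_INPUT_COLUMNS`, :code:`find_input_columns`, and :code:`check_input_columns` are hypothetical names introduced here for illustration, and the sketch assumes names must match exactly, including case. It only compares names and does not verify that the column actually holds pymatgen objects.

```python
# Hypothetical helpers, NOT part of automatminer. They only compare column
# names against the default names listed in the reminder above.
# Assumption: matching is exact and case-sensitive.
DEFAULT_INPUT_COLUMNS = {
    "composition": "pymatgen Composition",
    "structure": "pymatgen Structure",
    "bandstructure": "pymatgen BandStructure",
    "dos": "pymatgen DOS",
}


def find_input_columns(columns):
    """Map each column whose name is a default input name to its expected type."""
    return {c: DEFAULT_INPUT_COLUMNS[c] for c in columns if c in DEFAULT_INPUT_COLUMNS}


def check_input_columns(columns):
    """Return the matching columns, or raise ValueError if none match."""
    found = find_input_columns(columns)
    if not found:
        expected = ", ".join(sorted(DEFAULT_INPUT_COLUMNS))
        raise ValueError(
            f"No input column found; expected one of: {expected}. "
            f"Got: {list(columns)}"
        )
    return found


# Works with any iterable of names, e.g. list(train_df.columns).
print(check_input_columns(["structure", "my_property"]))
```

If the check fails for a dataframe whose inputs are stored under another name, renaming the column with pandas (for example :code:`train_df.rename(columns={"crystal": "structure"})`, where "crystal" is a made-up name) is one way to match the defaults.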