matminer.data_retrieval package

Submodules

matminer.data_retrieval.retrieve_AFLOW module

matminer.data_retrieval.retrieve_Citrine module

matminer.data_retrieval.retrieve_MDF module

matminer.data_retrieval.retrieve_MP module

class matminer.data_retrieval.retrieve_MP.MPDataRetrieval(api_key=None)

Bases: matminer.data_retrieval.retrieve_base.BaseDataRetrieval

Retrieves data from the Materials Project database.

If you use this data retrieval class, please additionally cite:

Ong, S.P., Cholia, S., Jain, A., Brafman, M., Gunter, D., Ceder, G., Persson, K.A., 2015. The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles. Computational Materials Science 97, 209–215. https://doi.org/10.1016/j.commatsci.2014.10.037

__init__(api_key=None)
Args:

api_key: (str) Your Materials Project API key, or None if you’ve set up your pymatgen config.

api_link()

The link to comprehensive API documentation or data source.

Returns:

(str): A link to the API documentation for this DataRetrieval class.

get_data(criteria, properties, mp_decode=True, index_mpid=True)
Args:
criteria: (str/dict) see MPRester.query() for a description of this parameter. String examples: "mp-1234", "Fe2O3", "Li-Fe-O", "*2O3". Dict example: {"band_gap": {"$gt": 1}}

properties: (list) see MPRester.query() for a description of this parameter. Example: ["formula", "formation_energy_per_atom"]

mp_decode: (bool) see MPRester.query() for a description of this parameter. Whether to decode to a Pymatgen object where possible.

index_mpid: (bool) Whether to set the material_id as the dataframe index.

Returns ([dict]):

a list of JSON-like dicts that match the criteria and contain the requested properties

get_dataframe(criteria, properties, index_mpid=True, **kwargs)

Gets data from MP in a dataframe format. See api_link for more details.

Args:

criteria (dict): the same as in get_data

properties ([str]): the same properties supported as in get_data, plus: “structure”, “initial_structure”, “final_structure”, “bandstructure” (line mode), “bandstructure_uniform”, “phonon_bandstructure”, “phonon_ddb”, “phonon_dos”. Note that for a long list of compounds, it may take a long time to retrieve some of these objects.

index_mpid (bool): the same as in get_data

kwargs (dict): the same keyword arguments as in get_data

Returns (pandas.DataFrame):
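A minimal usage sketch, assuming matminer is installed and a Materials Project API key is available (or configured in pymatgen). The criteria and property names follow the get_data examples on this page; the helper name fetch_wide_gap_compounds is hypothetical. The import is deferred so that defining the function does not require network access or an installed matminer.

```python
def fetch_wide_gap_compounds(api_key=None):
    """Return a DataFrame of MP entries with band gap > 1 (sketch).

    Hypothetical helper; calling it requires matminer and a valid
    MP API key (or a configured pymatgen config).
    """
    # Deferred import: only needed when the query is actually run
    from matminer.data_retrieval.retrieve_MP import MPDataRetrieval

    mpdr = MPDataRetrieval(api_key=api_key)
    # Dict criteria use MongoDB-style operators, as in the get_data example
    criteria = {"band_gap": {"$gt": 1}}
    # "material_id" is requested explicitly so it is available as the index;
    # "structure" is one of the extra properties get_dataframe supports
    properties = ["material_id", "formula", "formation_energy_per_atom", "structure"]
    return mpdr.get_dataframe(criteria=criteria, properties=properties, index_mpid=True)
```

Requesting heavy objects such as "structure" or "bandstructure" for many compounds can make the call slow, so it is worth narrowing the criteria first.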

try_get_prop_by_material_id(prop, material_id_list, **kwargs)

Call the relevant get_*_by_material_id method. “prop” is a property, such as bandstructure, that is not among the properties supported by get_data but can be retrieved through a dedicated method, for example get_bandstructure_by_material_id.

Args:

prop (str): the name of the property. Options are: “bandstructure”, “dos”, “phonon_dos”, “phonon_bandstructure”, “phonon_ddb”

material_id_list ([str]): list of material_id of compounds

kwargs (dict): other keyword arguments that get_*_by_material_id may have; e.g. line_mode in get_bandstructure_by_material_id

Returns ([target prop object or NaN]):

If the target property is not available for a certain material_id, NaN is returned.
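The NaN fallback described above can be sketched as a dispatch to a get_<prop>_by_material_id method with a per-ID try/except. This illustrates the pattern only and is not matminer's implementation; FakeRester, its get_dos_by_material_id method, and try_get_prop_sketch are hypothetical stand-ins so the example runs offline.

```python
import math


class FakeRester:
    """Hypothetical offline stand-in for an MPRester-like client."""

    def __init__(self, dos_by_id):
        self._dos_by_id = dos_by_id

    def get_dos_by_material_id(self, material_id):
        # Raise for unknown IDs, as a real API call might on missing data
        if material_id not in self._dos_by_id:
            raise KeyError(material_id)
        return self._dos_by_id[material_id]


def try_get_prop_sketch(rester, prop, material_id_list, **kwargs):
    """Return one value per material_id, or NaN where retrieval fails."""
    # Look up the method named after the property, e.g. get_dos_by_material_id
    method = getattr(rester, f"get_{prop}_by_material_id")
    results = []
    for mpid in material_id_list:
        try:
            results.append(method(mpid, **kwargs))
        except Exception:
            # Property unavailable for this material: fall back to NaN
            results.append(math.nan)
    return results


rester = FakeRester({"mp-1": "dos-object-1"})
values = try_get_prop_sketch(rester, "dos", ["mp-1", "mp-2"])
```

The broad except keeps one failed lookup from aborting the whole list; a production version would likely catch narrower error types.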

matminer.data_retrieval.retrieve_MPDS module

matminer.data_retrieval.retrieve_MongoDB module

class matminer.data_retrieval.retrieve_MongoDB.MongoDataRetrieval(coll)

Bases: matminer.data_retrieval.retrieve_base.BaseDataRetrieval

__init__(coll)

Retrieves data from a MongoDB collection into a pandas.DataFrame object.

Args:

coll: A MongoDB collection object
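A usage sketch, assuming pymongo and matminer are installed and a MongoDB server is reachable. The connection URI, the database name "materials_db", the collection name "tasks", and the field names in the query are hypothetical placeholders. Imports are deferred so the function can be defined without a running server.

```python
def fetch_from_mongo(uri="mongodb://localhost:27017"):
    """Build a DataFrame from a MongoDB collection (sketch).

    Database, collection, and field names below are hypothetical.
    """
    from pymongo import MongoClient
    from matminer.data_retrieval.retrieve_MongoDB import MongoDataRetrieval

    with MongoClient(uri) as client:
        coll = client["materials_db"]["tasks"]  # a pymongo Collection object
        mdr = MongoDataRetrieval(coll)
        return mdr.get_dataframe(
            criteria={"nelements": 2},  # pymongo-style filter
            properties=["task_id", "structure.lattice.a"],  # dot-notation allowed
            limit=100,  # 0 would mean no limit
            idx_field="task_id",  # must be unique across returned documents
        )
```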

api_link()

The link to comprehensive API documentation or data source.

Returns:

(str): A link to the API documentation for this DataRetrieval class.

get_dataframe(criteria, properties=None, limit=0, sort=None, idx_field=None, strict=False)
Args:

criteria: (dict) - a pymongo-style query to filter data records

properties: ([str] or None) - a list of str fields to retrieve; dot-notation is allowed (e.g. “structure.lattice.a”). Set to None to try to auto-detect the fields.

limit: (int) - max number of entries. 0 means no limit

sort: (tuple) - pymongo-style sort option

idx_field: (str) - name of field to use as index (must have unique entries)

strict: (bool) - if False, replaces missing values with NaN

Returns (pandas.DataFrame):
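How a dot-notation field such as "structure.lattice.a" maps onto a nested document, and what strict=False implies for missing fields, can be sketched with plain dictionaries. This is an illustrative assumption about the behavior, not matminer's code; the helper get_dotted and the sample documents are hypothetical.

```python
import math


def get_dotted(doc, path, strict=False):
    """Resolve a dot-notation path like "structure.lattice.a" in a nested dict.

    Integer path components index into lists (e.g. "sites.0.label").
    With strict=False a missing path yields NaN; with strict=True it raises.
    """
    value = doc
    for key in path.split("."):
        try:
            if isinstance(value, list):
                value = value[int(key)]
            else:
                value = value[key]
        except (KeyError, IndexError, ValueError, TypeError):
            if strict:
                raise KeyError(path)
            return math.nan
    return value


docs = [
    {"task_id": "t1", "structure": {"lattice": {"a": 3.5}}},
    {"task_id": "t2", "structure": {}},  # lattice missing
]
# One row per document, one column per requested field
rows = [{p: get_dotted(d, p) for p in ["task_id", "structure.lattice.a"]} for d in docs]
```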

matminer.data_retrieval.retrieve_MongoDB.clean_projection(projection)

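Projecting on both a field and one of its sub-fields (e.g. ‘a’ and ‘a.b’) is disallowed in MongoDB, so project inclusively: keep only the broader field, which already contains the sub-field. See unit tests for examples of what this is doing.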

Args:

projection: (list) - list of fields to retrieve; dot-notation is allowed.
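A minimal sketch of the idea, assuming "inclusive" means dropping any sub-path whose ancestor path is also requested. The name clean_projection_sketch is hypothetical and this is not matminer's implementation; the library's unit tests define its exact behavior.

```python
def clean_projection_sketch(projection):
    """Drop fields whose ancestor path is also projected; keep order, no duplicates."""
    fields = list(dict.fromkeys(projection))  # de-duplicate, preserving order
    requested = set(fields)
    cleaned = []
    for field in fields:
        parts = field.split(".")
        # Every proper prefix: "a.b.c" -> {"a", "a.b"}
        ancestors = {".".join(parts[:i]) for i in range(1, len(parts))}
        if ancestors & requested:
            continue  # an ancestor already includes this sub-field
        cleaned.append(field)
    return cleaned


print(clean_projection_sketch(["a", "a.b", "c.d", "c.d.e", "f"]))  # -> ['a', 'c.d', 'f']
```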

matminer.data_retrieval.retrieve_MongoDB.is_int(x)
matminer.data_retrieval.retrieve_MongoDB.remove_ints(projection)

Transforms a string like “a.1.x” to “a.x” - for Mongo projection purposes

Args:

projection: (str) the projection to remove ints from

Returns (str)
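The transformation can be sketched as split, filter, join. Both function names below are hypothetical; is_int_sketch is assumed (not confirmed by this page) to play a role similar to the undocumented is_int.

```python
def is_int_sketch(s):
    """Return True if the string s parses as an integer."""
    try:
        int(s)
        return True
    except ValueError:
        return False


def remove_ints_sketch(projection):
    """Drop integer path components, e.g. "a.1.x" -> "a.x"."""
    return ".".join(part for part in projection.split(".") if not is_int_sketch(part))


print(remove_ints_sketch("a.1.x"))  # -> a.x
```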

matminer.data_retrieval.retrieve_base module

class matminer.data_retrieval.retrieve_base.BaseDataRetrieval

Bases: object

Abstract class to retrieve data from various material APIs while adhering to a quasi-standard format for querying.

## Implementing a new DataRetrieval class

If you have an API which you’d like to incorporate into matminer’s data retrieval tools, using BaseDataRetrieval is the preferred way of doing so. All DataRetrieval classes should subclass BaseDataRetrieval and implement the following:

  • get_dataframe()

  • api_link()

Retrieving data should be done by the user with get_dataframe. Criteria should be a dictionary which will be used to form a query to the database. Properties should be a list which defines the columns that will be returned. While the ‘criteria’ and ‘properties’ arguments may have different valid values depending on the database, they should always have sensible formats and names if possible. For example, the user should be calling this:

    df = MyDataRetrieval().get_dataframe(criteria={'band_gap': 0.0},
                                         properties=['structure'])

…or this:

    df = MyDataRetrieval().get_dataframe(criteria={'band_gap': [0.0, 0.15]},
                                         properties=["density of states"])

NOT this:

    df = MyDataRetrieval().get_dataframe(criteria={'query.bg[0] && band_gap': 0.0},
                                         properties=['Struct.page[Value]'])

The implemented DataRetrieval class should handle the conversion from a ‘sensible’ query to a query fit for the individual API and database.
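As an illustration of that conversion, the sketch below turns 'sensible' criteria of the kind shown in the examples (a scalar for an exact match, a two-element list for a range) into MongoDB-style operators. The target query syntax, the inclusive-range reading, and the function name to_native_query are assumptions chosen for the example; each real API needs its own translation.

```python
def to_native_query(criteria):
    """Translate {'name': value or [lo, hi]} into a MongoDB-style filter."""
    native = {}
    for name, value in criteria.items():
        if isinstance(value, (list, tuple)) and len(value) == 2:
            lo, hi = value
            # Two-element list: assumed to mean an inclusive range
            native[name] = {"$gte": lo, "$lte": hi}
        else:
            # Anything else: exact match
            native[name] = value
    return native


print(to_native_query({"band_gap": [0.0, 0.15], "nelements": 2}))
# -> {'band_gap': {'$gte': 0.0, '$lte': 0.15}, 'nelements': 2}
```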

There may be cases where a ‘sensible’ query is not sufficient to define a query to the API; in this case, use the get_dataframe kwargs sparingly to augment the criteria, properties, or form of the underlying API query.

A method for accessing raw DB data with an API-native query may be provided by overriding get_data. The link to the original API documentation must be provided by overriding api_link().
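A minimal sketch of the required interface, assuming pandas is installed. To keep it runnable without matminer, BaseRetrievalStandIn is a hypothetical stand-in that mirrors the two abstract methods listed above; real code would subclass matminer.data_retrieval.retrieve_base.BaseDataRetrieval instead. InMemoryDataRetrieval, its made-up records, and its placeholder URL are invented for illustration.

```python
from abc import ABC, abstractmethod

import pandas as pd


class BaseRetrievalStandIn(ABC):
    """Hypothetical stand-in mirroring BaseDataRetrieval's required methods."""

    @abstractmethod
    def get_dataframe(self, criteria, properties, **kwargs):
        """Return a DataFrame of properties for records satisfying criteria."""

    @abstractmethod
    def api_link(self):
        """Return a link to the API documentation or data source."""


class InMemoryDataRetrieval(BaseRetrievalStandIn):
    """Toy retriever over made-up records (illustration only)."""

    _records = [
        {"formula": "AB", "band_gap": 0.0},
        {"formula": "A2B", "band_gap": 0.1},
        {"formula": "AB3", "band_gap": 1.2},
    ]

    def get_dataframe(self, criteria, properties, **kwargs):
        def matches(record):
            for name, value in criteria.items():
                if isinstance(value, (list, tuple)):
                    lo, hi = value  # sensible range form [lo, hi], treated as inclusive
                    if not lo <= record[name] <= hi:
                        return False
                elif record[name] != value:
                    return False
            return True

        rows = [{p: r.get(p) for p in properties} for r in self._records if matches(r)]
        # Properties as columns, matching records as rows
        return pd.DataFrame(rows, columns=properties)

    def api_link(self):
        return "https://example.org/api-docs"  # placeholder URL


df = InMemoryDataRetrieval().get_dataframe(criteria={"band_gap": [0.0, 0.15]}, properties=["formula"])
```

Keeping the criteria translation inside get_dataframe means callers only ever see the sensible format, while the class alone knows how the underlying source is queried.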

## Documenting a DataRetrieval class

The class documentation for each DataRetrieval class must contain a brief description of the possible data that can be retrieved with the API source. It should also detail the form of the criteria and properties that can be retrieved with the class, and/or should link to a web page showing this information. The options of the class must all be defined in the __init__ function of the class, and we recommend documenting them using the [Google style](https://google.github.io/styleguide/pyguide.html).

api_link()

The link to comprehensive API documentation or data source.

Returns:

(str): A link to the API documentation for this DataRetrieval class.

get_dataframe(criteria, properties, **kwargs)

Retrieve a dataframe of properties from the database which satisfy criteria.

Args:

criteria (dict): The name of each criterion is the key; the value or range of the criterion is the value.

properties (list): Properties to return from the query matching the criteria. For example, ['structure', 'formula']

Returns:

(pandas DataFrame) The dataframe containing properties as columns and samples as rows.

Module contents