matminer.figrecipes package

Submodules

matminer.figrecipes.plot module

class matminer.figrecipes.plot.PlotlyFig(df=None, mode='offline', title=None, x_title=None, y_title=None, colorbar_title='auto', x_scale='linear', y_scale='linear', ticksize=25, fontscale=1, fontsize=25, fontfamily='Courier', bgcolor='white', fontcolor=None, colorscale='Viridis', height=None, width=None, resolution_scale=None, margins=100, pad=0, username=None, api_key=None, filename='temp-plot', show_offline_plot=True, hovermode='closest', hoverinfo='x+y+text', hovercolor=None)

Bases: object

__init__(df=None, mode='offline', title=None, x_title=None, y_title=None, colorbar_title='auto', x_scale='linear', y_scale='linear', ticksize=25, fontscale=1, fontsize=25, fontfamily='Courier', bgcolor='white', fontcolor=None, colorscale='Viridis', height=None, width=None, resolution_scale=None, margins=100, pad=0, username=None, api_key=None, filename='temp-plot', show_offline_plot=True, hovermode='closest', hoverinfo='x+y+text', hovercolor=None)

Class for making Plotly plots

Args:

Data:
df (DataFrame): A pandas dataframe object which can be used to generate several plots.

mode: (str) The plotting mode. One of:

  1. ‘offline’: creates and saves plots on the local disk

  2. ‘notebook’: embeds plots in an IPython/Jupyter notebook

  3. ‘online’: saves the plot in your online Plotly account

  4. ‘static’: saves a static image of the plot locally

NOTE: Both ‘online’ and ‘static’ modes require either ‘username’ and ‘api_key’ or a Plotly credentials file.

Axes:

title: (str) title of plot

x_title: (str) title of x-axis

y_title: (str) title of y-axis

colorbar_title (str or None): the colorbar (z) title. If set to “auto”, the name of the third column (if pd.Series) is chosen.

x_scale: (str) Sets the x axis scaling type. Select from ‘linear’, ‘log’, ‘date’, ‘category’.

y_scale: (str) Sets the y axis scaling type. Select from ‘linear’, ‘log’, ‘date’, ‘category’.

ticksize: (int) size of ticks in px

Fonts:
fontscale (int/float): The relative scale of the font to the rest of the plot

fontsize: (int) size of text of plot title and axis titles

fontfamily: (str) The HTML font family to use in browser, for example “Arial” or “Times New Roman”. If multiple are passed, the list is an order of preference in case fonts are not found on the system.

Colors:

bgcolor: (str) Sets the background color. For example, “grey”.

fontcolor: (str) Sets all font colors. For example, “black”.

colorscale: (str/list) Sets the colorscale (colormap). See https://plot.ly/python/colorscales/ for details on what data types are acceptable for color maps. String names of colormaps can also be used, e.g., ‘Jet’ or ‘Viridis’. A useful list of Plotly builtins is: Greys, YlGnBu, Greens, YlOrRd, Bluered, RdBu, Reds, Blues, Picnic, Rainbow, Portland, Jet, Hot, Blackbody, Earth, Electric, Viridis.

Formatting:

height: (float) output height (in pixels)

width: (float) output width (in pixels)

resolution_scale: (float) Increase the resolution of the image by scale amount, e.g. 3. Only valid for PNG and JPEG.

margins (float or [float]): Specify the margin (in px) with a list [top, bottom, right, left], or a number which will set all margins.

pad: (float) Sets the amount of padding (in px) between the plotting area and the axis lines

Plotly:

username: (str) plotly account username

api_key: (str) plotly account API key

Offline:

filename: (str) name/filepath of plot file

show_offline_plot: (bool) automatically opens the plot offline

Interactivity:

hovermode: (str) determines the mode of hover interactions. Can be ‘x’/‘y’/‘closest’/False

hoverinfo: (str) Determines displayed information on mouseover. Any combination of “x”, “y”, “z”, “text”, “name” joined with a “+”, OR “all” or “none” or “skip”. Examples: “x”, “y”, “x+y”, “x+y+z”, “all”

hovercolor: (str) The color to set for the hover background. If None, uses the trace color.

Returns: None
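The margins argument above accepts either a single number or a list ordered [top, bottom, right, left]. The following is a minimal sketch of how such a value could be expanded into the t/b/r/l/pad keys that a Plotly layout margin dict uses. The helper name expand_margins is hypothetical and this is not PlotlyFig's own code.

```python
def expand_margins(margins, pad=0):
    """Expand a number or [top, bottom, right, left] into a Plotly-style margin dict.

    Hypothetical helper for illustration only; not part of matminer.
    """
    if isinstance(margins, (int, float)):
        # A single number sets all four margins.
        margins = [margins] * 4
    if len(margins) != 4:
        raise ValueError("margins must be a number or [top, bottom, right, left]")
    top, bottom, right, left = margins
    return {"t": top, "b": bottom, "r": right, "l": left, "pad": pad}


print(expand_margins(100))
print(expand_margins([80, 60, 40, 120], pad=5))
```

Note that the documented list order (top, bottom, right, left) differs from the alphabetical order of the dict keys, so unpacking by position is the step most worth checking.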

Attributes: These are either fields that Plotly’s ‘layout’ cannot work with directly or are managerial values PlotlyFig uses separate from PlotlyDict.

df (DataFrame): The dataframe which can be used to generate multiple plots.

mode (str): The plot mode, specified above in the argument.

show_offline_plot (bool): If True, opens up plot offline.

username (str): The Plotly username

api_key (str): The Plotly api key

resolution_scale (int/float): Scale up the resolution of static images proportionally using this parameter.

layout (dict): The dictionary passed to Plotly which specifies the PlotlyDict ‘layout’ value.

font_style (dict): The general font style, in Plotly syntax.

plot_counter (int): The number appended onto generated offline plots

colorbar_title (str): The title of the colorbar

colorscale (str): See argument documentation above.

hoverinfo (str): See argument documentation above.

ticksize (int): See argument documentation above.
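The hoverinfo format described in the arguments (flags joined with “+”, or one of the standalone keywords) can be checked with a few lines of plain Python. This is a sketch under that reading of the documentation; is_valid_hoverinfo is a hypothetical name and neither matminer nor Plotly is imported or called.

```python
HOVER_FLAGS = {"x", "y", "z", "text", "name"}
HOVER_KEYWORDS = {"all", "none", "skip"}


def is_valid_hoverinfo(hoverinfo):
    """Return True if hoverinfo is a standalone keyword or '+'-joined flags.

    Hypothetical helper for illustration only.
    """
    if hoverinfo in HOVER_KEYWORDS:
        return True
    # Every '+'-separated part must be one of the known flags.
    return all(part in HOVER_FLAGS for part in hoverinfo.split("+"))


for candidate in ("x+y+text", "all", "x+all", "x+w"):
    print(candidate, is_valid_hoverinfo(candidate))
```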

bar(data=None, cols=None, x=None, y=None, labels=None, barmode='group', colors=None, bargap=None, return_plot=False)

Create a bar chart using Plotly.

Can be used with x and y arguments or with a dataframe (passed as ‘data’ or taken from constructor).

Args:
data (DataFrame): The column names will become the ‘x’ axis. The rows will become sets of bars (e.g., 3 rows = 3 sets of bars for each x point).

cols ([str]): A list of strings specifying columns of a DataFrame passed into the constructor to be used as data. Should not be used with ‘data’.

x (list or [list]): A list containing ‘x’ axis values. Can be a list of lists if there is more than one set of bars.

y (list or [list]): A list containing ‘y’ values. Can be a list of lists if there is more than one set of bars (more than one set of data for each ‘x’ axis value).

labels (str or [str]): Defines the label for each set of bars. If str, defines the column of the DataFrame to use for labelling. The column’s entry for a row will be the label for that row. If it is a list of strings, it should be used with x and y, and defines the label for each set of bars.

barmode: Defines how sets of bars are displayed. Can be set to “group” or “stack”.

colors ([str]): The list of colors to use for each set of bars. The length of this list should be equal to the number of rows (sets of bars) present in your data.

bargap (int/float): Separation between bars.

return_plot (bool): Returns the dictionary representation of the figure if True. If False, plots according to self.mode (set with mode in __init__).

Returns:

A Plotly bar chart object.
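To make the x/y/labels form of the arguments concrete, here is a minimal sketch that assembles a Plotly-style figure dictionary (a "data" list of bar traces plus a "layout" with barmode) from plain lists. The function bar_figure is hypothetical and only illustrates the shape of the inputs; it does not call matminer or Plotly and may differ from what PlotlyFig.bar builds internally.

```python
def bar_figure(x, y, labels=None, barmode="group"):
    """Build a Plotly-style figure dict with one bar trace per set of bars.

    Hypothetical helper for illustration only.
    """
    if not isinstance(y[0], (list, tuple)):
        # A flat y list is a single set of bars.
        x, y = [x], [y]
    elif not isinstance(x[0], (list, tuple)):
        # One shared x list is reused for every set of bars.
        x = [x] * len(y)
    labels = labels or [None] * len(y)
    traces = [
        {"type": "bar", "x": list(xs), "y": list(ys), "name": name}
        for xs, ys, name in zip(x, y, labels)
    ]
    return {"data": traces, "layout": {"barmode": barmode}}


fig = bar_figure(["Fe", "Ni"], [[1, 2], [3, 4]],
                 labels=["set A", "set B"], barmode="stack")
print(len(fig["data"]), fig["layout"])
```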

create_plot(fig, return_plot=False)

Creates a plotly plot based on its dictionary representation. The modes of plotting are:

  1. offline: Makes an offline html.

  2. notebook: Embeds in Jupyter notebook

  3. online: Send to Plotly, requires credentials

  4. static: Creates a static image of the plot

  5. return: Returns the dictionary representation of the plot.

Args:

fig: (dictionary) contains data and layout information

return_plot (bool): Returns the dictionary representation of the figure if True. If False, plots according to self.mode (set with mode in __init__).

Returns:

A Plotly Figure object (if return_plot = True)
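The mode handling listed above amounts to a dispatch on the mode string, with return_plot short-circuiting to hand back the figure dictionary. The sketch below shows that control flow only. dispatch_plot and the handlers mapping are hypothetical stand-ins, and the single handler records a call instead of invoking Plotly.

```python
VALID_MODES = ("offline", "notebook", "online", "static")


def dispatch_plot(fig, mode="offline", return_plot=False, handlers=None):
    """Route a figure dict to the handler for `mode`, or return it unchanged.

    Hypothetical helper for illustration only; real handlers would call Plotly.
    """
    if return_plot:
        # Corresponds to the 'return' behaviour: no rendering takes place.
        return fig
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    handler = (handlers or {}).get(mode)
    if handler is None:
        raise NotImplementedError(f"no handler registered for mode {mode!r}")
    handler(fig)
    return None


calls = []
handlers = {"offline": lambda f: calls.append(("offline", len(f["data"])))}
fig = {"data": [{"type": "bar", "x": [1], "y": [2]}], "layout": {}}
dispatch_plot(fig, mode="offline", handlers=handlers)
print(calls)
print(dispatch_plot(fig, return_plot=True) is fig)
```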

heatmap_basic(data=None, x_labels=None, y_labels=None, colorscale=None, colorscale_range=None, annotations_text=None, annotations_font_size=20, annotations_color='white', return_plot=False)

Make a heatmap plot, either using 2D arrays of values, or a dataframe.

Args:
data: (array) an array of arrays. For example, in case of a pandas dataframe ‘df’, data=df.values.tolist(). If None, uses the data frame passed into the constructor.

x_labels: (array) an array of strings to label the heatmap columns

y_labels: (array) an array of strings to label the heatmap rows

colorscale (str/array): See colorscale in __init__.

colorscale_range: (array) Sets the minimum (first array item) and maximum value (second array item) of the colorscale.

annotations_text: (array) an array of arrays, with each value being a string annotation to the corresponding value in ‘data’

annotations_font_size: (int) size of annotation text

annotations_color: (str/array) color of annotation text; accepts similar formats as other color variables

Returns: A Plotly heatmap plot Figure object.

heatmap_df(data=None, cols=None, x_labels=None, x_nqs=6, y_labels=None, y_nqs=4, precision=1, annotation='count', annotation_color='black', colorscale=None, color_range=None, return_plot=False)

A heatmap which can accept a dataframe as input directly.

Args:

data: (dataframe): only the first 3 numerical columns are considered

cols ([str]): A list of strings specifying the columns of the dataframe (either data or self.df) to use. Currently, only 3 columns are supported. Note that the order in cols matters: the first is considered x, the second y and the third z (color).

x_labels ([str]): labels for the categories in x data (first column)

x_nqs (int or None): if x_prop has more unique values than this, x_prop is divided into x_nqs quantiles for better presentation. If x_labels is set, x_nqs is ignored (i.e. x_nqs = len(x_labels)).

y_labels ([str]): similar to x_labels but for the 2nd column in data

y_nqs (int or None): similar to x_nqs but for the 2nd column in data

precision (int): number of decimal places used for binning/display

annotation (str or None): mode of annotation. Options are:

  None: no annotations

  “count”: the number of data points available in each cell is displayed

  “value”: the actual value of the cell is displayed, in addition to the colorbar

annotation_color (str): the color of annotation (text inside cells)

colorscale: see the __init__ doc for colorscale

color_range ([min, max]): the range of numbers included in the colorbar. If any number is outside of this range, it will be forced to min or max, whichever is nearer. Note that if color_range is set, the colorbar ticks will be updated to reflect -min or max+ at the two ends.

return_plot (bool): Returns the dictionary representation of the figure if True. If False, plots according to self.mode (set with mode in __init__).

Returns: A Plotly heatmap plot Figure object.
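Two ideas in the arguments above lend themselves to a short illustration: dividing a column into quantiles (x_nqs, y_nqs) and counting the data points per cell (annotation “count”), and forcing out-of-range values to the ends of color_range. The sketch below uses only the Python standard library. The names quantile_bin, count_cells and clamp_to_range are hypothetical, and matminer's own binning may use different cut points or edge handling.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def quantile_bin(values, nqs):
    """Return a bin index in range(nqs) for each value, using quantile cut points."""
    cuts = quantiles(values, n=nqs)  # yields nqs - 1 cut points
    return [bisect_right(cuts, v) for v in values]


def count_cells(xs, ys, x_nqs, y_nqs):
    """Count how many (x, y) points fall into each (x bin, y bin) cell."""
    return Counter(zip(quantile_bin(xs, x_nqs), quantile_bin(ys, y_nqs)))


def clamp_to_range(values, color_range):
    """Force values outside [min, max] to the nearest end of the range."""
    low, high = color_range
    return [min(max(v, low), high) for v in values]


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [8, 7, 6, 5, 4, 3, 2, 1]
print(count_cells(xs, ys, x_nqs=2, y_nqs=2))
print(clamp_to_range([-5, 0.5, 12], [0, 10]))
```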

histogram(data=None, cols=None, orientation='vertical', histnorm='', n_bins=None, bins=None, colors=None, bargap=0, return_plot=False)

Creates a Plotly histogram. If multiple series of data are available, will create an overlaid histogram.

For n_bins, start, end, size, colors, and bargap, all defaults are Plotly defaults.

Args:
data (DataFrame or list or [list]): A dataframe containing at least one numerical column. Also accepts lists of numerical values or list of lists of numerical values. If None, uses the dataframe passed into the constructor.

cols ([str]): A list of strings specifying the columns of the dataframe to use. Each column will be represented with its own histogram in the overlay.

orientation (str): Determines whether histogram is oriented horizontally or vertically. Use “vertical” or “horizontal”.

histnorm: The technique for creating the plot. Can be “probability density”, “probability”, “density”, or “” (count).

n_bins (int or [int]): The number of bins to include on each plot. If only one number is specified, all histograms will have the same number of bins.

bins (dict or [dict]): specifications of the bins including start, end and size. If n_bins is set, size cannot be set in bins. Also, size is ignored if start or end is not specified. Examples:

  1) bins=None, n_bins = 25

  2) bins={‘start’: 0, ‘end’: 50, ‘size’: 2.0}, n_bins=None

colors (str or list): The list of colors for each histogram (if overlaid). If only one series of data is present or all series should have the same value, a single str determines the color of the bins.

bargap (float or list): The gaps between bars for all histograms shown.

return_plot (bool): Returns the dictionary representation of the figure if True. If False, plots according to self.mode (set with mode in __init__).

Returns:

Plotly histogram figure.
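The two rules stated for n_bins and bins (size cannot be combined with n_bins, and size is ignored unless both start and end are given) can be expressed as a small validation step. The sketch below is one reading of those rules. resolve_bins is a hypothetical name, and the output keys nbinsx and xbins are the Plotly histogram trace attributes assumed here for a vertical histogram.

```python
def resolve_bins(n_bins=None, bins=None):
    """Validate n_bins/bins and return Plotly-style histogram bin settings.

    Hypothetical helper for illustration only.
    """
    bins = dict(bins or {})
    if n_bins is not None and "size" in bins:
        raise ValueError("size cannot be set in bins when n_bins is given")
    if not ("start" in bins and "end" in bins):
        # size is ignored unless both start and end are specified.
        bins.pop("size", None)
    return {"nbinsx": n_bins, "xbins": bins}


print(resolve_bins(n_bins=25))
print(resolve_bins(bins={"start": 0, "end": 50, "size": 2.0}))
```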

parallel_coordinates(data=None, cols=None, line=None, precision=2, colors=None, return_plot=False)

Create a Plotly Parcoords plot from dataframes.

Args:
data (DataFrame or list): A dataframe containing at least one numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.

cols ([str]): A list of strings specifying the columns of the dataframe to use.

colors (str): The name of the column to use for the color bar.

line (dict): plotly line dict with keys such as “color” or “width”

precision (int): the number of decimal places for columns with float data type (2 is recommended for a nice visualization)

return_plot (bool): Returns the dictionary representation of the figure if True. If False, plots according to self.mode (set with mode in __init__).

Returns:

a Plotly parallel coordinates plot.

scatter_matrix(data=None, cols=None, colors=None, marker=None, labels=None, marker_scale=1.0, return_plot=False, default_color='#98AFC7', **kwargs)

Create a Plotly scatter matrix plot from dataframes.

Args:

data (DataFrame or list): A dataframe containing at least one numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.

cols ([str]): A list of strings specifying the columns of the dataframe to use.

colors: (str) name of the column used for colorbar

marker (dict): if size is set, it will override the automatic size

return_plot (bool): Returns the dictionary representation of the figure if True. If False, plots according to self.mode (set with mode in __init__).

labels: see the labels documentation in PlotlyFig.xy

default_color (str): default marker color. Ignored if colors is set. Histograms color is always set by this default_color.

**kwargs: keyword arguments of scatterplot. Forbidden args are ‘size’, ‘color’ and ‘colorscale’ in ‘marker’. See example below

Returns: a Plotly scatter matrix plot

Example for more control over markers:

    from matminer.figrecipes.plot import PlotlyFig
    from matminer.datasets.dataframe_loader import load_elastic_tensor

    # Load the elastic tensor dataset and plot four of its columns
    df = load_elastic_tensor()
    pf = PlotlyFig()
    pf.scatter_matrix(df[['volume', 'G_VRH', 'K_VRH', 'poisson_ratio']],
                      colors='poisson_ratio',
                      text=df['material_id'],
                      marker={'symbol': 'diamond', 'size': 8,
                              'line': {'width': 1, 'color': 'black'}},
                      colormap='Viridis',
                      title='Elastic Properties Scatter Matrix')

set_arguments(**kwargs)

Method to modify some of the layout and PlotlyFig arguments after instantiation.

Allowed arguments: title, x_title, y_title, colorbar_title, filename, mode, api_key, username, show_offline_plot

Args:

**kwargs: the arguments to change; the allowed names are listed above.

Returns: None

setup_labels(labels, data, expected_length=None)
Set the input labels to the appropriate format to support labeling of each data point with one or multiple labels that show upon hovering over the point (Plotly default behavior).

Args:

labels (str or [str] or [list]): see the docs for labels in xy

data (DataFrame or list): A dataframe containing at least one numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.

expected_length (int): the expected length of the rows/labels. This is len(data) if data is dataframe and length of axes

Returns ([list]): list of labels each with the expected length
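For the list-of-lists form of labels, the idea is that the n-th entries of all label lists together annotate the n-th point. The sketch below combines them into one hover string per point and checks expected_length. combine_labels is a hypothetical name, and joining with "<br>" (the line break Plotly hover text understands) is an assumption about presentation rather than the format setup_labels returns.

```python
def combine_labels(labels, expected_length):
    """Merge several per-point label lists into one hover string per point.

    Hypothetical helper for illustration only; expects a list of lists.
    """
    for column in labels:
        if len(column) != expected_length:
            raise ValueError("every label list must have expected_length entries")
    # zip(*labels) regroups the entries point by point.
    return ["<br>".join(str(item) for item in point) for point in zip(*labels)]


labels = [["red", "green", "blue"], ["warm", "mild", "cold"]]
print(combine_labels(labels, expected_length=3))
```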

triangle(data=None, cols=None, sum_of_3=1.0, axes_titles=None, labels=None, markers=None, return_plot=False)

Phase-diagram-type plot for 3 (and only 3) variables that always add up to a fixed number (e.g. 1 or 100%). Regardless of that number, the rows are separately normalized inside Plotly so that they add up to 1, as otherwise a triangle plot does not make sense.

Args:

data: (dataframe): if not set, self.df is used

cols ([str]): A list of strings specifying the 3 columns of the dataframe (either data or self.df) to plot the triangle plot for. Note that the order of the 3 axes is decided based on the order of cols.

sum_of_3 (int/float): scale the sum of cols to this number.

axes_titles ([str]): titles of the 3 axes; this overrides the dataframe column names. Note that if set, axes_titles must be of length 3. Examples:

  [‘A’, ‘B’, ‘X’]

  [‘title 1’, ‘’, ‘title 2’] (i.e. no title for the 2nd axis)

labels (str or [str] or [list]): to set annotation for scatter points the same for all traces. Note that several column names can be simultaneously used as labels, but it is important to understand that when labels is set, it is assumed that all traces have the same length, as the same labels are assigned to all traces.

markers (None or dict): plotly marker dict with keys such as size, symbol, color, line, etc.

return_plot (bool): Returns the dictionary representation of the figure if True.

Returns: A Plotly triangle plot Figure object.
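The row-wise normalization mentioned in the description is simple to state in code: each row of three values is rescaled so that it sums to the target total. The sketch below shows that arithmetic with a hypothetical helper, normalize_rows. It is independent of how Plotly or matminer perform the scaling internally.

```python
def normalize_rows(rows, sum_of_3=1.0):
    """Rescale each (a, b, c) row so that its entries add up to sum_of_3.

    Hypothetical helper for illustration only.
    """
    normalized = []
    for a, b, c in rows:
        total = a + b + c
        if total == 0:
            raise ValueError("cannot normalize a row that sums to zero")
        scale = sum_of_3 / total
        normalized.append((a * scale, b * scale, c * scale))
    return normalized


print(normalize_rows([(2, 1, 1), (1, 1, 2)]))
print(normalize_rows([(1, 1, 2)], sum_of_3=100))
```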

violin(data=None, cols=None, use_colorscale=False, rugplot=False, group_col=None, groups=None, colorscale=None, return_plot=False)

Create a violin plot using Plotly.

Args:
data: (DataFrame/list) A dataframe containing at least one numerical column. Also accepts lists/arrays of numerical values, using columns as separate variables (distributions are down rows). If None, uses the dataframe passed into the constructor.

cols: ([str]) The labels for the columns of the dataframe to be included in the plot. If data is passed as a list/array, pass a list of cols to be used as labels for the violins.

rugplot: (bool) If True, plots the distribution of the data next to the violin with a ‘rugplot’.

group_col: (str) Name of the column containing the group for each row, if it exists. Used only if there is one entry in cols.

groups: ([str]): All group names to be included in the violin plot. Used only if there is one entry in cols.

colorscale: (str/tuple/list/dict) either a plotly scale name (Greys, YlGnBu, Greens, etc.), an rgb or hex color, a color tuple, or a list/dict of colors. The color is representative of the median value of the violin.

use_colorscale: (bool) Only applicable if grouping by another variable. Will implement a colorscale based on the first 2 colors of param colors. This means colors must be a list with at least 2 colors in it (Plotly colorscales are accepted since they map to a list of two rgb colors)

return_plot (bool): Returns the dictionary representation of the figure if True. If False, plots according to self.mode (set with mode in __init__).

Returns: A Plotly violin plot Figure object.

xy(xy_pairs, colors=None, color_range=None, labels=None, limits=None, names=None, sizes=None, modes='markers', markers=None, marker_scale=1.0, lines=None, colorscale=None, showlegends=None, error_bars=None, normalize_size=True, return_plot=False)

Make an XY scatter plot, either using arrays of values, or a dataframe.

Args:
xy_pairs (tuple or [tuple]): each tuple in the list of tuples is a trace on the xy scatter plot. Each tuple contains a pair of x & y lists with the same length. Examples:

  ([1, 2], [2, 4])  # one trace, one tuple

  [([1,2,3], [2,4,6]), ([1,3], [2.5,5.5])]  # 2 traces

  [(df[‘x1’], df[‘y1’]), (df[‘x2’], df[‘y2’])]

  [(‘x1’, ‘y1’), (‘x2’, ‘y2’), (‘x1’, ‘y2’)]  # 3 traces

colors (list or np.ndarray or pd.Series): sets the colors for traces. It can also be used to set the colors of the markers shown in the colorbar (list of numbers); overwrites marker[‘color’] and will override colorscales if trace colors are specified as strings. Examples:

  “red”  # all traces and lines will be red

  ‘GDP’ or df[‘GDP’]  # colorscale based on GDP (if available in self.df or df respectively)

  [“green”, “GDP”]  # trace 1 is green and the markers of trace 2 are colored based on GDP

color_range ([min, max]): the range of numbers included in the colorbar. If any number is outside of this range, it will be forced to min or max, whichever is nearer. Note that if color_range is set, the colorbar ticks will be updated to reflect -min or max+ at the two ends.

labels (str or [str] or [list]): to set annotation for scatter points the same for all traces. Note that several column names can be simultaneously used as labels, but it is important to understand that when labels is set, it is assumed that all traces have the same length, as the same labels are assigned to all traces (if there is more than one trace, of course).

Examples:

  labels = ‘formula’

  [‘material_id’, ‘formula’] (these 2 columns must be available)

  [[‘red’, ‘green’, ‘blue’], [‘warm’, ‘mild’, ‘cold’]]

The last example assumes all xy traces have 3 points: point 1 then has the label (‘red’, ‘warm’), point 2 (‘green’, ‘mild’) and point 3 (‘blue’, ‘cold’).

limits (dict): The x and y limits defining the ranges the plot will show. Should be in the form {‘x’: (lower, higher), ‘y’: (lower, higher)}. Omit either key to prevent limits from being imposed on that axis.

names (str or [str]): list of trace names used for the legend. By default, the column name (or trace if NA) is used if a pd.Series is passed.

sizes (str, float, [float], [list]): Options:

  str: column name in data with a list of numbers used for marker size

  float: a single size used for all traces in xy_pairs

  [float]: list of fixed sizes used for traces (length==len(xy_pairs))

  [list]: list of lists of sizes for each trace in xy_pairs

modes (str or [str]): trace style; can be ‘markers’, ‘lines’ or ‘lines+markers’.

markers (dict or [dict]): gives the ability to fine-tune the marker of each scatter plot individually if a list of dicts is passed. Note that the key “size” is forbidden in markers. Use the sizes arg instead.

lines (dict or [dict]): similar to markers, though only if mode==‘lines’

showlegends (bool or [bool]): indicates whether to show the legend for each trace (or simply turns it on/off for all if not a list)

error_bars ([str or list]): numbers used for error bars in the y direction. String input is interpreted as a dataframe column name.

normalize_size (bool): if True, normalize the size list.

return_plot (bool): Returns the dictionary representation of the figure if True. If False, plots according to self.mode (set with mode in __init__).

Returns: A Plotly Scatter plot Figure object.
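The accepted shapes of xy_pairs (a single tuple, a list of tuples, or tuples of column names) can be normalized into one scatter trace per pair. The sketch below shows that normalization with plain dictionaries. build_traces is a hypothetical name, and the table argument is a dict of lists standing in for the dataframe so that the example runs without pandas, matminer or Plotly.

```python
def build_traces(xy_pairs, table=None, modes="markers"):
    """Turn xy_pairs into a list of Plotly-style scatter trace dicts.

    Hypothetical helper for illustration only; `table` stands in for a dataframe.
    """
    if isinstance(xy_pairs, tuple):
        # A single (x, y) tuple is one trace.
        xy_pairs = [xy_pairs]
    table = table or {}

    def resolve(axis):
        # Strings are looked up as column names; anything else is used as data.
        return list(table[axis]) if isinstance(axis, str) else list(axis)

    traces = []
    for x, y in xy_pairs:
        xs, ys = resolve(x), resolve(y)
        if len(xs) != len(ys):
            raise ValueError("x and y must have the same length")
        traces.append({"type": "scatter", "mode": modes, "x": xs, "y": ys})
    return traces


table = {"x1": [1, 2, 3], "y1": [2, 4, 6], "y2": [1, 4, 9]}
print(build_traces(([1, 2], [2, 4])))
print(build_traces([("x1", "y1"), ("x1", "y2")], table=table, modes="lines"))
```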

Module contents