rocketsled package

Submodules

rocketsled.acq module

Acquisition functions and utilities.

rocketsled.acq.acquire(acq, Y, mu, std, return_means=False)
rocketsled.acq.ei(fmin, mu, std, xi=0.01)

Returns expected improvement values.

Args:
    fmin (float): Minimum value of the objective function known thus far.
    mu (numpy array): Mean value of bootstrapped predictions for each y.
    std (numpy array): Standard deviation of bootstrapped predictions for each y.
    xi (float): Amount of expected improvement; optional hyper-parameter. Default value taken from "Practical Bayesian Optimization" by Daniel Lizotte (2008).

Returns:
    vals (numpy array): Acquisition values.

rocketsled.acq.lcb(fmin, mu, std, kappa=1.96)
Returns lower confidence bound estimates.

Args:
    fmin (float): Minimum value of the objective function known thus far (not used).
    mu (numpy array): Mean value of bootstrapped predictions for each y.
    std (numpy array): Standard deviation of bootstrapped predictions for each y.
    kappa (float): Controls the variance in the prediction, affecting the exploration/exploitation tradeoff.

Returns:
    vals (numpy array): Acquisition values.

rocketsled.acq.pi(fmin, mu, std, xi=0.01)

Returns probability of improvement values.

Args:
    fmin (float): Minimum value of the objective function known thus far.
    mu (numpy array): Mean value of bootstrapped predictions for each y.
    std (numpy array): Standard deviation of bootstrapped predictions for each y.
    xi (float): Amount of expected improvement; optional hyper-parameter. Default value taken from "Practical Bayesian Optimization" by Daniel Lizotte (2008).

Returns:
    vals (numpy array): Acquisition values.
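
All three acquisition functions above share the same call pattern, so a minimal sketch of how they might be evaluated on a set of candidate points is shown below. The signatures follow the entries on this page; the numeric values are illustrative only.

    import numpy as np
    from rocketsled.acq import ei, lcb, pi

    # Bootstrapped mean and standard deviation predictions for four
    # unexplored candidate points (illustrative numbers).
    mu = np.array([1.2, 0.8, 1.5, 0.9])
    std = np.array([0.30, 0.10, 0.60, 0.05])
    fmin = 1.0  # best (lowest) objective value found so far

    print(ei(fmin, mu, std))   # expected improvement per candidate
    print(pi(fmin, mu, std))   # probability of improvement per candidate
    print(lcb(fmin, mu, std))  # lower confidence bound (fmin is unused here)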

rocketsled.acq.ppredict(X, Y, space, model)

Run a split and fit on a random subsample of the entire explored X. Use this fitted model to predict the remaining space. Meant to be run in parallel in combination with joblib’s delayed and Parallel utilities.

Args:
    X ([list]): A list of x vectors, for training.
    Y (list): A list of scalars, for training.
    space ([list]): A list of possible X vectors, yet to be explored. This is the 'test' set.
    model (BaseEstimator object): sklearn estimator object. Must have .fit and .predict methods.

Returns:
    (numpy array): The 1-D array of predicted points for the entire remaining space.

rocketsled.acq.predict(X, Y, space, model, nstraps)

rocketsled.control module

A class to configure, manage, and analyze optimizations. Similar to the LaunchPad for FireWorks.

class rocketsled.control.MissionControl(launchpad, opt_label)

Bases: object

A class for configuring and controlling rocketsled optimization.

Args:
    launchpad (LaunchPad): The launchpad to use for storing optimization information.
    opt_label (str): The name of the collection where Rocketsled should keep optimization data (in the same db as the LaunchPad). Please use a new collection (i.e., one with no other documents present).

__init__(launchpad, opt_label)
configure(wf_creator, dimensions, **kwargs)

Set up the optimization config. Required before using OptTask, but only needs to be done once. To reconfigure, use MissionControl.reset and then use configure again.

Defaults can be found in defaults.yaml.

Args:
    wf_creator (function or str): The function object that creates the workflow based on a unique vector, x. Alternatively, the full string module path to that function, e.g. "mypkg.mymodule.my_wf_creator", which must be importable and found in PYTHONPATH.
    dimensions ([tuple]): Each 2-tuple in the list defines one dimension in the search space in (low, high) format. For categorical or discontinuous dimensions, include all possible categories or values as a list of any length or a tuple of length > 2. Example: dimensions = [(1, 100), (9.293, 18.2838), ("red", "blue", "green")].
    **kwargs: Keyword arguments for defining the optimization. A full list of possible kwargs is given below:

        Optimization data:
            opt_label (str): The label to use for this collection of optimization data.

        Workflow creator function:
            wf_creator_args (list): The positional args to be passed to the wf_creator function alongside the new x vector.
            wf_creator_kwargs (dict): The kwargs to be passed to the wf_creator function alongside the new x vector.

        Predictors (optimization):
            predictor (function or str): A function which, given a list of searched points and unsearched points, returns an optimized guess.
                To use a builtin predictor, pass in one of: 'GaussianProcessRegressor', 'RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'random' (random guess). The default is 'GaussianProcessRegressor'.
                To use a custom predictor, pass in the function object, or alternatively the full string module path to that function, e.g. "mypkg.mymodule.my_predictor", which must be importable and found in PYTHONPATH.
                Example builtin predictor: 'GaussianProcessRegressor'
                Example custom predictor: my_predictor
                Example custom predictor 2: 'my_pkg.my_module.my_predictor'
            predictor_args (list): The positional args to be passed to the model along with a list of points to be searched. For sklearn-based predictors included in OptTask, these positional args are passed to the init method of the chosen model. For custom predictors, these are passed to the chosen predictor function alongside the searched guesses, the output from searched guesses, and an unsearched space to be used with optimization.
            predictor_kwargs (dict): The kwargs to be passed to the model. Similar to predictor_args.
            n_search_pts (int): The number of points to be searched in the search space when choosing the next best point. Choosing more points to search may increase the effectiveness of the optimization but take longer to evaluate. The default is 1000.
            n_train_pts (int): The number of already explored points to be chosen for training. Default is None, meaning all available points will be used for training. Reduce the number of points to decrease training times.
            n_bootstraps (int): The number of times each optimization should sample, train, and predict values when generating uncertainty estimates for prediction. At least 10 data points must be present for bootstrapping. Not used if acq is not specified, a custom predictor is used, or GaussianProcessRegressor is used.
            acq (str): The acquisition function to use. Can be 'ei' for expected improvement, 'pi' for probability of improvement, 'lcb' for lower confidence bound, or None for greedy selection. Only works with builtin predictors.
            space_file (str): The fully specified path of a pickle file containing a list of all possible searchable vectors. For example '/Users/myuser/myfolder/myspace.p'. When loaded, this space_file should be a list of tuples.
            onehot_categorical (bool): If True, preprocesses categorical data (strings) to one-hot encoded binary arrays for use with custom predictor functions. Default False.
            duplicate_check (bool): If True, checks that custom optimizers are not making duplicate guesses; built-in optimizers cannot make duplicate guesses. If the custom predictor suggests a duplicate, OptTask picks a random guess out of the remaining untried space. Default is no duplicate check, and an error is raised if a duplicate is suggested.
            tolerances (list): The tolerance of each feature when duplicate checking. For categorical features, put 'None'. Example: our dimensions are [(1, 100), ['red', 'blue'], (2.0, 20.0)]. We want our first parameter to be a duplicate only if it is exact, and our third parameter to be a duplicate if it is within 1e-6. Then: tolerances=[0, None, 1e-6]
            maximize (bool): If True, maximizes the objective function instead of minimizing. Defaults to False, meaning minimize.

        z-vector features:
            get_z (function or str): The fully-qualified name of a function (or the function object itself) which, given an x vector, returns another vector z which provides extra information to the machine learner. The features defined in z are not used to run the workflow, but are used for learning. If z features are enabled, ONLY z features will be used for learning (x vectors essentially become tags or identifiers only). Examples:
                get_z = 'my_pkg.my_module.my_fun'
                get_z = '/path/to/folder/containing/my_dir/my_module.my_fun'
            get_z_args (list): The positional arguments to be passed to the get_z function alongside x.
            get_z_kwargs (dict): The kwargs to be passed to the get_z function alongside x.
            z_file (str): The filename (pickle file) where OptTask should save/cache z calculations. Specify this argument if calculating z for many points (n_search_pts) is not trivial and will cost time in computing. With this argument specified, each z will only be calculated once. Defaults to None, meaning that all unexplored z are re-calculated each iteration. Example: z_file = '/path/to/z_guesses.p'

        Parallelism:
            enforce_sequential (bool): WARNING: Experimental feature! If True (the default), enforces that RS optimizations are run sequentially, which prevents duplicate guesses from ever being run. If False, allows OptTasks to run optimizations in parallel, which may cause duplicate guesses with high parallelism.
            batch_size (int): The number of jobs to submit per batch for a batch optimization. For example, batch_size=5 will optimize every 5th job, then submit another 5 jobs based on the best 5 predictions (recomputing the acquisition function after each prediction).
            timeout (int): The number of seconds to wait before resetting the lock on the db.

Returns:
    None: If you want to run the OptTask workflow, you'll need to pass in the launchpad and opt_label arguments in your wf_creator.
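
As a usage sketch, a typical configuration might look like the following. The database settings, collection name, predictor choice, and the wf_creator module path are placeholders; only calls documented on this page (MissionControl, configure, reset) are used.

    from fireworks import LaunchPad
    from rocketsled.control import MissionControl

    # Hypothetical database and search space.
    lp = LaunchPad(host="localhost", port=27017, name="rs_example")
    dims = [(1, 100), (9.293, 18.2838), ("red", "blue", "green")]

    mc = MissionControl(launchpad=lp, opt_label="opt_default")
    mc.reset(hard=True)  # only if reusing a collection; irreversible!
    mc.configure(
        wf_creator="my_pkg.my_module.my_wf_creator",  # must be importable
        dimensions=dims,
        predictor="RandomForestRegressor",  # one of the builtin predictors
        acq="ei",                           # expected improvement acquisition
    )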

fetch_matrices(include_reserved=False)

Return the X and Y matrices for this optimization.

Args:
    include_reserved (bool): If True, returns "reserved" guesses (those which have been submitted to the launchpad but have not been successfully run). y values for these guesses are "reserved".

Returns:
    all_x, all_y ([list], [list]): The X (input) matrix has dimensions (n_samples, n_dimensions). The Y (output) matrix has dimensions (n_samples, n_objectives). Only completed entries are retrieved.

plot(show_best=True, show_mean=True, latexify=False, font_family='serif', scale='linear', summarize=True, print_pareto=False)

Visualize the progress of an optimization.

Args:
    show_best (bool): Point out the best point on the legend and on the plot. If there is more than one best point (i.e., multiple equal maxima), show them all. If multiobjective, show the best for each objective, and print the best value and x for each objective.
    show_mean (bool): Show the mean and standard deviation of the guesses as the computations are carried out.
    latexify (bool): Use LaTeX for formatting.
    font_family (str): The font family to use for rendering. Choose from 'serif', 'sans-serif', 'fantasy', 'monospace', or 'cursive'.
    scale (str): Whether to scale the plot's y axis on a log ('log') or 'linear' scale.
    summarize (bool): If True, prints the summary from .summarize to stdout.
    print_pareto (bool): If True, display all Pareto-optimal objective values.

Returns:
    A matplotlib plot object handle.
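
Assuming mc is the configured MissionControl from the sketch above, inspecting a running or finished optimization might look like this (calling .show() on the returned handle is an assumption about the matplotlib object returned by plot):

    # Completed guesses only; shapes are (n_samples, n_dimensions) and
    # (n_samples, n_objectives).
    all_x, all_y = mc.fetch_matrices()

    print(mc.summarize())  # formatted stats about the optimization collection

    plot = mc.plot(scale="log", show_mean=True)  # matplotlib handle
    plot.show()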

reset(hard=False)

Reset (delete) this optimization configuration and/or collection.

Soft reset (hard=False): Delete the configuration, but keep the optimization data. This is useful if you are changing optimizers and want to keep the previous data (recommended).

Hard reset (hard=True): Delete all data from the collection, including optimization data. WARNING - THIS OPTION IS NOT REVERSIBLE!

Args:
    hard (bool): Whether to do a hard or soft reset. If False, deletes only the configuration, leaving the previously stored optimization data. If True, deletes everything from the optimization collection.

Returns:

None

summarize()

Returns stats about the optimization collection and checks consistency of the collection.

Returns:

fmtstr (str): The formatted information from the analysis, to print.

property task

Return a preconfigured OptTask which can be inserted into a workflow. Make sure to run .configure before using this task, otherwise your workflow optimization might not work!

Returns:

OptTask: An OptTask object.

rocketsled.task module

The FireTask for running automatic optimization loops.

Please see the documentation for a comprehensive guide on usage.

class rocketsled.task.OptTask(*args, **kwargs)

Bases: fireworks.core.firework.FireTaskBase

A FireTask for automatically running optimization loops and storing optimization data for complex workflows.

OptTask takes in _x and _y from the fw_spec (input/output of current guess), gathers X (previous guesses input) and y (previous guesses output), and predicts the next best guess.

Required args:
    launchpad (LaunchPad): A FireWorks LaunchPad object, which can be used to define the host/port/name of the db.
    opt_label (string): The name of the collection in which this particular optimization's data will be stored. Multiple collections correspond to multiple independent optimizations.

__init__(*args, **kwargs)
optimize(fw_spec, manager_id)

Run the optimization algorithm.

Args:
    fw_spec (dict): The firework spec.
    manager_id (ObjectId): The MongoDB object id of the manager document.

Returns:
    x (iterable): The current x guess.
    y (iterable): The current y (objective function) value.
    z (iterable): The z vector associated with x.
    all_xz_new ([list] or [tuple]): The predicted next best guess(es), including their associated z vectors.
    n_completed (int): The number of completed guesses/workflows.

pop_lock(manager_id)

Releases the current process lock on the manager doc, and moves waiting processes from the queue to the lock.

Args:

manager_id: The MongoDB ObjectID object of the manager doc.

Returns:

None

required_params = ['launchpad', 'opt_label']
run_task(fw_spec)

FireTask for running an optimization loop.

Args:

fw_spec (dict): the firetask spec. Must contain a ‘_y’ key with a float type field and must contain a ‘_x’ key containing a vector uniquely defining the point in search space.

Returns:

(FWAction) A workflow based on the workflow creator and a new, optimized guess.
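
A minimal, hypothetical wf_creator illustrating how OptTask is typically appended as the last FireTask, with '_x' placed in the spec and '_y' written by the preceding task. The ObjectiveTask class, the toy objective, and the database settings are placeholders.

    from fireworks import Firework, FWAction, LaunchPad, Workflow
    from fireworks.core.firework import FireTaskBase
    from fireworks.utilities.fw_utilities import explicit_serialize

    from rocketsled.task import OptTask

    @explicit_serialize
    class ObjectiveTask(FireTaskBase):
        """Hypothetical task: evaluates y = f(x) and stores it under '_y'."""

        def run_task(self, fw_spec):
            x = fw_spec["_x"]
            y = x[0] ** 2 + x[1]  # toy objective
            return FWAction(update_spec={"_y": y})

    def wf_creator(x):
        """Build a one-Firework workflow for a given x; OptTask runs last."""
        lp = LaunchPad(host="localhost", port=27017, name="rs_example")
        fw = Firework(
            [ObjectiveTask(), OptTask(launchpad=lp, opt_label="opt_default")],
            spec={"_x": x},
            name="rs_iteration",
        )
        return Workflow([fw])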

stash(x, y, z, all_xz_new, n_completed)

Write documents to database after optimization.

Args:
    x (iterable): The current x guess.
    y (iterable): The current y (objective function) value.
    z (iterable): The z vector associated with x.
    all_xz_new ([list] or [tuple]): The predicted next best guess(es), including their associated z vectors.
    n_completed (int): The number of completed guesses/workflows.

Returns:
    opt_id (pymongo InsertedOneResult): The result of the insertion of the new optimization document in the database. If multiple opt_ids are valid (i.e., batch mode is enabled), the last opt_id is returned.

rocketsled.utils module

Utility functions for OptTask.

exception rocketsled.utils.BatchNotReadyError

Bases: rocketsled.utils.RSBaseException

Batch-mode scheme broken

exception rocketsled.utils.DimensionMismatchError

Bases: rocketsled.utils.RSBaseException

Dimensions of the search space are ill-defined or conflicting

class rocketsled.utils.Dtypes

Bases: object

Defines the datatypes available for optimization.

__init__()
exception rocketsled.utils.ExhaustedSpaceError

Bases: rocketsled.utils.RSBaseException

When the search space has been exhausted.

exception rocketsled.utils.NotConfiguredError

Bases: rocketsled.utils.RSBaseException

When rocketsled config doc is broken or not found.

exception rocketsled.utils.ObjectiveError

Bases: rocketsled.utils.RSBaseException

Errors relating to objectives.

exception rocketsled.utils.RSBaseException

Bases: BaseException

Base exception for rocketsled exceptions.

rocketsled.utils.check_dims(dims)

Ensure the dimensions are in the correct format for the optimization.

Dimensions should be a list or tuple of lists or tuples each defining the search space in one dimension. The datatypes used inside each dimension’s definition should be NumPy compatible datatypes.

Continuous numerical dimensions (floats and ranges of ints) should be 2-tuples in the form (low, high). Categorical dimensions or discontinuous numerical dimensions should be exhaustive lists/tuples such as ['red', 'green', 'blue'] or [1.2, 11.5, 15.0].

Args:

dims (list): The dimensions of the search space.

Returns:

([str]): Types of the dimensions in the search space defined by dims.
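
For illustration, checking a small mixed search space might look like the sketch below; the exact type labels in the returned list depend on the implementation.

    from rocketsled.utils import check_dims

    dims = [(1, 100), (9.293, 18.2838), ("red", "blue", "green")]
    print(check_dims(dims))  # one type label per dimension (int / float / categorical)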

rocketsled.utils.convert_native(a)

Convert iterables of non-native types to native types for bson storage in the database. For situations where .tolist() does not work.

Args:
    a (iterable or scalar): Input list of strings, ints, or floats, as either numpy or native types (or others), which will be force-coerced to native types. Also works with scalar entries such as floats, ints, etc.

Returns:

native (list): A list of the data in a, converted to native types.

rocketsled.utils.convert_value_to_native(val, dtypes=<rocketsled.utils.Dtypes object>)

Convert a single value to the native datatype for storage in the opt db.

Args:
    val (int/float/str): Numpy or native implementation of a numeric or categorical dtype.
    dtypes (Dtypes): An instance of the Dtypes class.

Returns:

native (int/float/str): The native python value of val.

rocketsled.utils.deserialize(fun)

Takes a FireWorks-serialized function handle and maps it to a function object.

Args:
    fun (string): A 'module.function' or '/path/to/mod.func' style string specifying the function.

Returns:

(function) The function object defined by fun

rocketsled.utils.get_default_opttask_kwargs()

Get the default configuration kwargs for OptTask.

Args:

None

Returns:

conf_dict (dict): The default kwargs for OptTask

rocketsled.utils.get_len(obj)

A utility function for getting the length of an object.

Args:

obj: An object, optionally iterable.

Returns:

The length of that object if it is a list or tuple, otherwise 1.

rocketsled.utils.is_discrete(dims, criteria='all')

Checks if the search space is discrete.

Args:
    dims ([tuple]): Dimensions of the search space.
    criteria (str/unicode): If 'all', returns bool based on whether ALL dimensions are discrete. If 'any', returns bool based on whether ANY dimension is discrete.

Returns:
    (bool): Whether the search space is discrete, according to the given criteria.

rocketsled.utils.is_duplicate_by_tolerance(x_new, all_x_explored, tolerances)

Duplicate checks with tolerances.

Args:
    x_new (list): The new guess to be duplicate checked.
    all_x_explored ([list]): The list of all explored guesses.
    tolerances (list): The tolerances of each dimension.

Returns:
    True if x_new is a duplicate of a guess in all_x_explored. False if x_new is unique in the space and has yet to be tried.
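
A small illustrative check, using the same tolerance convention described for the tolerances kwarg of MissionControl.configure; the guesses and tolerance values here are made up.

    from rocketsled.utils import is_duplicate_by_tolerance

    explored = [[10, "red", 3.0], [55, "blue", 12.5]]
    guess = [10, "red", 3.0000004]

    # Dim 1 must match exactly, dim 2 is categorical, dim 3 is compared
    # within a 1e-6 tolerance.
    print(is_duplicate_by_tolerance(guess, explored, tolerances=[0, None, 1e-6]))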

rocketsled.utils.latex_float(f)

Convert a floating point number into a LaTeX-formattable string for visualization. Might relocate to viz.py.

Args:

f (float): A floating point number

Returns:

float_str (str): A latex-formatted string representing f.

rocketsled.utils.pareto(all_y, maximize=False)

Returns the indices of Pareto-optimal solutions.

Args:
    all_y ([list]): A list of lists containing values to be evaluated for Pareto-optimality.

Returns:
    list: The indices of the entries which are Pareto-optimal.

rocketsled.utils.random_guess(dimensions)

Returns random new inputs based on the dimensions of the search space. Works with float, integer, and categorical types.

Args:
    dimensions ([tuple]): Defines the dimensions of each parameter. Example: [(1, 50), (-18.939, 22.435), ["red", "green", "blue"]]

Returns:
    random_vector (list): Randomly chosen next params in the search space. Example: [12, 1.9383, "green"]

rocketsled.utils.serialize(fun)

Turn a python function into a string which can later be used to deserialize the function. Only works with importable modules.

Args:

fun (function object): The python function.

Returns:

(str) The full function path as a string.
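
serialize and deserialize are intended to round-trip importable functions; a sketch, where my_pkg.my_module.my_wf_creator is a hypothetical importable function:

    from rocketsled.utils import deserialize, serialize

    from my_pkg.my_module import my_wf_creator  # hypothetical importable function

    path = serialize(my_wf_creator)  # e.g. "my_pkg.my_module.my_wf_creator"
    fn = deserialize(path)           # back to a callable function object
    assert callable(fn)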

rocketsled.utils.split_xz(xz, x_dims, x_only=False, z_only=False)

Split concatenated xz vector into x and z vectors.

Args:
    xz (list): The concatenated xz vector.
    x_dims ([list/tuple]): The dimensions of the x vector.
    x_only (bool): If True, returns only the x vector.
    z_only (bool): If True, returns only the z vector.

Returns:
    x, z (list, list): The separate x and z vectors.
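
A short sketch of splitting a concatenated xz vector, assuming the x entries come first and their count equals the number of x dimensions:

    from rocketsled.utils import split_xz

    x_dims = [(1, 100), (0.0, 20.0)]      # two x dimensions
    xz = [12, 7.5, "extra_feature", 42]   # x entries followed by z entries

    x, z = split_xz(xz, x_dims)           # x -> [12, 7.5], z -> ["extra_feature", 42]
    x_only = split_xz(xz, x_dims, x_only=True)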

Module contents

Rocketsled is an optimization suite “on rails” based on Scikit-learn and FireWorks workflows.