rocketsled package¶
Subpackages¶
Submodules¶
rocketsled.acq module¶
Acquisition functions and utilities.
- rocketsled.acq.acquire(acq, Y, mu, std, return_means=False)¶
- rocketsled.acq.ei(fmin, mu, std, xi=0.01)¶
Returns expected improvement values.
- Args:
- fmin (float): Minimum value of the objective function known thus far.
- mu (numpy array): Mean value of bootstrapped predictions for each y.
- std (numpy array): Standard deviation of bootstrapped predictions for each y.
- xi (float): Amount of expected improvement, optional hyper-parameter. Default value taken from “Practical bayesian optimization” by Daniel Lizotte (2008).
- Returns:
vals (numpy array): Acquisition values.
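As an illustration of what these arguments mean, here is a minimal sketch of the textbook expected improvement formula for a minimization problem. It is not taken from the rocketsled source, and the helper names (normal_pdf, normal_cdf, expected_improvement) are hypothetical. It uses plain Python lists where rocketsled uses numpy arrays.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(fmin, mu, std, xi=0.01):
    """Textbook EI for minimization; returns one value per candidate."""
    vals = []
    for m, s in zip(mu, std):
        improvement = fmin - m - xi
        if s <= 0.0:
            # No predictive uncertainty: the improvement is deterministic.
            vals.append(max(improvement, 0.0))
            continue
        z = improvement / s
        vals.append(improvement * normal_cdf(z) + s * normal_pdf(z))
    return vals


# A candidate predicted below fmin scores higher than one predicted above it.
vals = expected_improvement(fmin=1.0, mu=[0.5, 1.5], std=[0.2, 0.2])
```

Larger xi values demand a bigger margin below fmin before a candidate counts as an improvement, which shifts the search toward exploration.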
- rocketsled.acq.lcb(fmin, mu, std, kappa=1.96)¶
Returns lower confidence bound estimates.
- Args:
- fmin (float): Not used. Minimum value of the objective function known thus far.
- mu (numpy array): Mean value of bootstrapped predictions for each y.
- std (numpy array): Standard deviation of bootstrapped predictions for each y.
- kappa (float): Controls the variance in the prediction, affecting exploration/exploitation.
- Returns:
vals (numpy array): Acquisition values.
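A minimal sketch of the usual lower confidence bound, mu - kappa * std, assuming a minimization problem. The function name is hypothetical, and the sign convention rocketsled uses for its returned acquisition values is not stated above, so treat this only as an illustration of how kappa trades exploration against exploitation.

```python
def lower_confidence_bound(mu, std, kappa=1.96):
    """Optimistic estimate of each candidate's objective (minimization)."""
    return [m - kappa * s for m, s in zip(mu, std)]


# The second candidate has a worse mean but a much larger uncertainty,
# so its lower bound is the smallest and it would be tried next.
bounds = lower_confidence_bound(mu=[1.0, 1.2], std=[0.1, 0.5])
best = min(range(len(bounds)), key=bounds.__getitem__)
```

With kappa=0 the choice reduces to picking the lowest predicted mean; larger kappa favors uncertain regions.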
- rocketsled.acq.pi(fmin, mu, std, xi=0.01)¶
Returns probability of improvement values.
- Args:
- fmin (float): Minimum value of the objective function known thus far.
- mu (numpy array): Mean value of bootstrapped predictions for each y.
- std (numpy array): Standard deviation of bootstrapped predictions for each y.
- xi (float): Amount of expected improvement, optional hyper-parameter. Default value taken from “Practical bayesian optimization” by Daniel Lizotte (2008).
- Returns:
vals (numpy array): Acquisition values.
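For comparison with expected improvement, a minimal sketch of the textbook probability of improvement for minimization follows. The function name is hypothetical and the code is not drawn from the rocketsled source.

```python
import math


def probability_of_improvement(fmin, mu, std, xi=0.01):
    """Probability that each candidate beats fmin by at least xi."""
    vals = []
    for m, s in zip(mu, std):
        if s <= 0.0:
            # Without uncertainty the outcome is certain either way.
            vals.append(1.0 if m < fmin - xi else 0.0)
            continue
        z = (fmin - m - xi) / s
        vals.append(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    return vals


pi_vals = probability_of_improvement(fmin=1.0, mu=[0.5, 0.99, 1.5], std=[0.1, 0.1, 0.1])
```

Unlike expected improvement, this score ignores how large the improvement would be, so it tends to favor small but likely gains.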
- rocketsled.acq.ppredict(X, Y, space, model)¶
Run a split and fit on a random subsample of the entire explored X. Use this fitted model to predict the remaining space. Meant to be run in parallel in combination with joblib’s delayed and Parallel utilities.
- Args:
- X ([list]): A list of x vectors, for training.
- Y (list): A list of scalars, for training.
- space ([list]): A list of possible X vectors, yet to be explored. This is the ‘test’ set.
- model (BaseEstimator object): sklearn estimator object. Must have .fit and .predict methods.
- Returns:
- (numpy array): The 1-D array of predicted points for the entire
remaining space.
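The subsample, fit, and predict cycle described above can be sketched as follows. This is an illustration only: NearestNeighborModel is a hypothetical stand-in for an sklearn estimator, the 80% subsample fraction is an assumption, and the loop runs sequentially rather than through joblib.

```python
import random
import statistics


class NearestNeighborModel:
    """Hypothetical stand-in for an estimator with .fit and .predict."""

    def fit(self, X, Y):
        self.X, self.Y = list(X), list(Y)
        return self

    def predict(self, space):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        preds = []
        for x in space:
            nearest = min(range(len(self.X)), key=lambda i: sq_dist(self.X[i], x))
            preds.append(self.Y[nearest])
        return preds


def bootstrap_predict(X, Y, space, model_factory, nstraps, rng):
    """Fit on random subsamples and summarize predictions over `space`."""
    all_preds = []
    n_sub = max(1, int(0.8 * len(X)))  # assumed subsample fraction
    for _ in range(nstraps):
        idx = rng.sample(range(len(X)), k=n_sub)
        model = model_factory().fit([X[i] for i in idx], [Y[i] for i in idx])
        all_preds.append(model.predict(space))
    # Column-wise mean and spread give mu and std for an acquisition function.
    mu = [statistics.mean(col) for col in zip(*all_preds)]
    std = [statistics.pstdev(col) for col in zip(*all_preds)]
    return mu, std


X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
Y = [0.0, 1.0, 4.0, 9.0, 16.0]
mu, std = bootstrap_predict(X, Y, [[0.4], [3.6]], NearestNeighborModel, 20, random.Random(0))
```

The resulting mu and std have one entry per point in the unexplored space, which is the shape the acquisition functions in this module expect.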
- rocketsled.acq.predict(X, Y, space, model, nstraps)¶
rocketsled.control module¶
A class to configure, manage, and analyze optimizations. Similar to the LaunchPad for FireWorks.
- class rocketsled.control.MissionControl(launchpad, opt_label)¶
Bases:
object
A class for configuring and controlling rocketsled optimization.
- Args:
- launchpad (LaunchPad): The launchpad to use for storing optimization
information.
- opt_label (str): The name of the collection where Rocketsled should
keep optimization data (in the same db as the LaunchPad). Please use a new collection (i.e., one in which no other documents are present).
- __init__(launchpad, opt_label)¶
- configure(wf_creator, dimensions, **kwargs)¶
Set up the optimization config. Required before using OptTask, but only needs to be done once. To reconfigure, use MissionControl.reset and then use configure again.
Defaults can be found in defaults.yaml.
Args:
- wf_creator (function or str): The function object that creates the
workflow based on a unique vector, x. Alternatively, the full string module path to that function, e.g. “mypkg.mymodule.my_wf_creator”, which must be importable and found in PYTHONPATH.
- dimensions ([tuple]): each 2-tuple in the list defines one dimension in
the search space in (low, high) format. For categorical or discontinuous dimensions, include all possible categories or values as a list of any length or a tuple of length>2. Example: dimensions = [(1,100), (9.293, 18.2838), (“red”, “blue”, “green”)].
- **kwargs: Keyword arguments for defining the optimization. A full list
of possible kwargs is given below:
Optimization data:
- opt_label (str): The label to use for this collection of optimization data.
Workflow creator function:
- wf_creator_args (list): The positional args to be passed to the wf_creator function alongside the new x vector.
- wf_creator_kwargs (dict): The kwargs to be passed to the wf_creator function alongside the new x vector.
Predictors (optimization):
- predictor (function or str): A function which, given a list of searched points and unsearched points, returns an optimized guess.
To use a builtin predictor, pass in one of: ‘GaussianProcessRegressor’, ‘RandomForestRegressor’, ‘ExtraTreesRegressor’, ‘GradientBoostingRegressor’, ‘random’ (random guess). The default is ‘GaussianProcessRegressor’.
To use a custom predictor, pass in the function object. Alternatively, pass the full string module path to that function, e.g. “mypkg.mymodule.my_predictor”, which must be importable and found in PYTHONPATH.
Example builtin predictor: ‘GaussianProcessRegressor’
Example custom predictor: my_predictor
Example custom predictor 2: ‘my_pkg.my_module.my_predictor’
- predictor_args (list): the positional args to be passed to the model
along with a list of points to be searched. For sklearn-based predictors included in OptTask, these positional args are passed to the init method of the chosen model. For custom predictors, these are passed to the chosen predictor function alongside the searched guesses, the output from searched guesses, and an unsearched space to be used with optimization.
- predictor_kwargs (dict): the kwargs to be passed to the model.
Similar to predictor_args.
- n_search_pts (int): The number of points to be searched in the
search space when choosing the next best point. Choosing more points to search may increase the effectiveness of the optimization but take longer to evaluate. The default is 1000.
- n_train_pts (int): The number of already explored points to be
chosen for training. Default is None, meaning all available points will be used for training. Reduce the number of points to decrease training times.
- n_bootstraps (int): The number of times each optimization should
sample, train, and predict values when generating uncertainty estimates for prediction. At least 10 data points must be present for bootstrapping. Not used if acq is not specified, a custom predictor is used, or GaussianProcessRegressor is used.
- acq (str): The acquisition function to use. Can be ‘ei’ for expected
improvement, ‘pi’ for probability of improvement, or ‘lcb’ for lower confidence bound, or None for greedy selection. Only works with builtin predictors.
- space_file (str): The fully specified path of a pickle file
containing a list of all possible searchable vectors. For example ‘/Users/myuser/myfolder/myspace.p’. When loaded, this space_file should be a list of tuples.
- onehot_categorical (bool): If True, preprocesses categorical data
(strings) to one-hot encoded binary arrays for use with custom predictor functions. Default False.
- duplicate_check (bool): If True, checks that custom optimizers are
not making duplicate guesses; built-in optimizers cannot make duplicate guesses. If the custom predictor suggests a duplicate, OptTask picks a random guess out of the remaining untried space. Default is no duplicate check, and an error is raised if a duplicate is suggested.
- tolerances (list): The tolerance of each feature when duplicate
checking. For categorical features, put ‘None’. Example: Our dimensions are [(1, 100), [‘red’, ‘blue’], (2.0, 20.0)]. We want our first parameter to be a duplicate only if it is exact, and our third parameter to be a duplicate if it is within 1e-6. Then:
tolerances=[0, None, 1e-6]
- maximize (bool): If True, maximizes the objective function instead
of minimizing. Defaults to False, meaning minimize.
z-vector features:
- get_z (function or str): The fully-qualified name of a function (or the function object itself) which, given an x vector, returns another vector z which provides extra information to the machine learner. The features defined in z are not used to run the workflow, but are used for learning. If z features are enabled, ONLY z features will be used for learning (x vectors essentially become tags or identifiers only). Examples:
get_z = ‘my_pkg.my_module.my_fun’
get_z = ‘/path/to/folder/containing/my_dir/my_module.my_fun’
- get_z_args (list): the positional arguments to be passed to the
get_z function alongside x
- get_z_kwargs (dict): the kwargs to be passed to the get_z function
alongside x
- z_file (str): The filename (pickle file) where OptTask should save/cache
z calculations. Specify this argument if calculating z for many (n_search_pts) points is not trivial and will cost time in computing. With this argument specified, each z will only be calculated once. Defaults to None, meaning that all unexplored z are re-calculated each iteration. Example:
z_file = ‘/path/to/z_guesses.p’
Parallelism:
- enforce_sequential (bool): WARNING: Experimental feature! If True, enforces that RS optimizations are run sequentially (default), which prevents duplicate guesses from ever being run. If False, allows OptTasks to run optimizations in parallel, which may cause duplicate guesses with high parallelism.
- batch_size (int): The number of jobs to submit per batch for a batch
optimization. For example, batch_size=5 will optimize every 5th job, then submit another 5 jobs based on the best 5 predictions (recomputing the acquisition function after each prediction).
- timeout (int): The number of seconds to wait before resetting the
lock on the db.
- Returns:
None: If you want to run the OptTask workflow, you’ll need to pass in the launchpad and opt_label arguments in your wf_creator.
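To make the argument formats concrete, the sketch below assembles a search space and a handful of the keyword arguments listed above. The chosen values (for example n_bootstraps=500) are illustrative assumptions, not recommendations, and the final call is left as a comment because it needs a FireWorks LaunchPad, a database, and a workflow creator that are not defined here.

```python
# Search space: an integer range, a float range, and a categorical dimension.
dimensions = [(1, 100), (9.293, 18.2838), ("red", "blue", "green")]

# A subset of the kwargs described above; values are illustrative only.
config_kwargs = {
    "predictor": "RandomForestRegressor",  # one of the builtin predictors
    "acq": "ei",                           # 'ei', 'pi', 'lcb', or None
    "n_search_pts": 1000,                  # the documented default
    "n_bootstraps": 500,                   # hypothetical value
    "maximize": False,                     # minimize the objective
}

# With a MissionControl instance `mc` and a workflow creator `wf_creator`
# (both assumed to exist), the configuration call would look like:
# mc.configure(wf_creator=wf_creator, dimensions=dimensions, **config_kwargs)
```

Because a builtin predictor other than GaussianProcessRegressor is combined with an acquisition function here, n_bootstraps is one of the settings that takes effect.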
- fetch_matrices(include_reserved=False)¶
Return the X and Y matrices for this optimization.
- Args:
- include_reserved (bool): If True, returns “reserved” guesses (those
which have been submitted to the launchpad but have not been successfully run). y values for these guesses are “reserved”.
- Returns:
- all_x, all_y ([list], [list]): The X (input) matrix has dimensions
n_samples, n_dimensions. The Y (output) matrix has dimensions n_samples, n_objectives. Only completed entries are retrieved.
- plot(show_best=True, show_mean=True, latexify=False, font_family='serif', scale='linear', summarize=True, print_pareto=False)¶
Visualize the progress of an optimization.
- Args:
- show_best (bool): Point out the best point on legend and on plot. If
more than one best point (i.e., multiple equal maxima), show them all. If multiobjective, shows best for each objective, and prints the best value and x for each objective.
- show_mean (bool): Show the mean and standard deviation for the
guesses as the computations are carried out.
- latexify (bool): Use LaTeX for formatting.
- font_family (str): The font family to use for rendering. Choose from ‘serif’, ‘sans-serif’, ‘fantasy’, ‘monospace’, or ‘cursive’.
- scale (str): Whether to scale the plot’s y axis according to log (‘log’) or ‘linear’ scale.
- summarize (bool): If True, prints the summary from .summarize to stdout.
- print_pareto (bool): If True, display all Pareto-optimal objective values.
- Returns:
A matplotlib plot object handle
- reset(hard=False)¶
Reset (delete) this optimization configuration and/or collection.
- Soft reset (hard=False): Delete the configuration, but keep the
optimization data. This is useful if you are changing optimizers and want to keep the previous data (recommended)
- Hard reset (hard=True): Delete all data from the collection, including
optimization data. WARNING - THIS OPTION IS NOT REVERSIBLE!
- Args:
- hard (bool): Whether to do a hard or soft reset. If False, deletes
only the configuration, leaving the previously stored optimization data. If True, deletes everything from the optimization collection.
- Returns:
None
- summarize()¶
Returns stats about the optimization collection and checks consistency of the collection.
- Returns:
fmtstr (str): The formatted information from the analysis, to print.
- property task¶
Return a preconfigured OptTask which can be inserted into a workflow. Make sure to run .configure before using this task, otherwise your workflow optimization might not work!
- Returns:
OptTask: An OptTask object.
rocketsled.task module¶
The FireTask for running automatic optimization loops.
Please see the documentation for a comprehensive guide on usage.
- class rocketsled.task.OptTask(*args, **kwargs)¶
Bases:
fireworks.core.firework.FireTaskBase
A FireTask for automatically running optimization loops and storing optimization data for complex workflows.
OptTask takes in _x and _y from the fw_spec (input/output of current guess), gathers X (previous guesses input) and y (previous guesses output), and predicts the next best guess.
- Required args:
- launchpad (LaunchPad): A Fireworks LaunchPad object, which can be used
to define the host/port/name of the db.
- opt_label (string): Names the collection in which the particular
optimization’s data will be stored. Multiple collections correspond to multiple independent optimizations.
- __init__(*args, **kwargs)¶
- optimize(fw_spec, manager_id)¶
Run the optimization algorithm.
- Args:
- fw_spec (dict): The firework spec.
- manager_id (ObjectId): The MongoDB object id of the manager document.
- Returns:
- x (iterable): The current x guess.
- y (iterable): The current y (objective function) value.
- z (iterable): The z vector associated with x.
- all_xz_new ([list] or [tuple]): The predicted next best guess(es), including their associated z vectors.
- n_completed (int): The number of completed guesses/workflows.
- pop_lock(manager_id)¶
Releases the current process lock on the manager doc, and moves waiting processes from the queue to the lock.
- Args:
manager_id: The MongoDB ObjectID object of the manager doc.
- Returns:
None
- required_params = ['launchpad', 'opt_label']¶
- run_task(fw_spec)¶
FireTask for running an optimization loop.
- Args:
fw_spec (dict): the firetask spec. Must contain a ‘_y’ key with a float type field and must contain a ‘_x’ key containing a vector uniquely defining the point in search space.
- Returns:
(FWAction) A workflow based on the workflow creator and a new, optimized guess.
- stash(x, y, z, all_xz_new, n_completed)¶
Write documents to database after optimization.
- Args:
- x (iterable): The current x guess.
- y (iterable): The current y (objective function) value.
- z (iterable): The z vector associated with x.
- all_xz_new ([list] or [tuple]): The predicted next best guess(es), including their associated z vectors.
- n_completed (int): The number of completed guesses/workflows.
- Returns:
- opt_id (pymongo InsertOneResult): The result of the insertion
of the new optimization document in the database. If multiple opt_ids are valid (i.e., batch mode is enabled), the last opt_id is returned.
rocketsled.utils module¶
Utility functions for OptTask.
- exception rocketsled.utils.BatchNotReadyError¶
Bases:
rocketsled.utils.RSBaseException
Batch-mode scheme broken
- exception rocketsled.utils.DimensionMismatchError¶
Bases:
rocketsled.utils.RSBaseException
Dimensions of the search space are ill-defined or conflicting
- class rocketsled.utils.Dtypes¶
Bases:
object
Defines the datatypes available for optimization.
- __init__()¶
- exception rocketsled.utils.ExhaustedSpaceError¶
Bases:
rocketsled.utils.RSBaseException
When the search space has been exhausted.
- exception rocketsled.utils.NotConfiguredError¶
Bases:
rocketsled.utils.RSBaseException
When rocketsled config doc is broken or not found.
- exception rocketsled.utils.ObjectiveError¶
Bases:
rocketsled.utils.RSBaseException
Errors relating to objectives.
- exception rocketsled.utils.RSBaseException¶
Bases:
BaseException
Base exception for rocketsled exceptions.
- rocketsled.utils.check_dims(dims)¶
Ensure the dimensions are in the correct format for the optimization.
Dimensions should be a list or tuple of lists or tuples each defining the search space in one dimension. The datatypes used inside each dimension’s definition should be NumPy compatible datatypes.
Continuous numerical dimensions (floats and ranges of ints) should be 2-tuples in the form (lower, upper). Categorical dimensions or discontinuous numerical dimensions should be exhaustive lists/tuples such as [‘red’, ‘green’, ‘blue’] or [1.2, 11.5, 15.0].
- Args:
dims (list): The dimensions of the search space.
- Returns:
([str]): Types of the dimensions in the search space defined by dims.
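The format rules above can be illustrated with a small classifier. This is a sketch, not rocketsled's implementation: the function name and the label strings (“int_range”, “float_range”, “numeric_set”, “categorical”) are hypothetical, and the strings that check_dims actually returns may differ.

```python
def classify_dims(dims):
    """Label each dimension by the format rules described above."""
    labels = []
    for dim in dims:
        if not isinstance(dim, (list, tuple)) or len(dim) == 0:
            raise ValueError(f"Each dimension must be a non-empty list or tuple: {dim!r}")
        is_number = [
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in dim
        ]
        if isinstance(dim, tuple) and len(dim) == 2 and all(is_number):
            # A numeric 2-tuple is read as a (lower, upper) range.
            low, high = dim
            if low >= high:
                raise ValueError(f"Lower bound must be below upper bound: {dim!r}")
            all_int = all(isinstance(v, int) for v in dim)
            labels.append("int_range" if all_int else "float_range")
        elif all(is_number):
            # Lists, or tuples longer than 2, enumerate the allowed values.
            labels.append("numeric_set")
        else:
            labels.append("categorical")
    return labels


labels = classify_dims(
    [(1, 100), (9.293, 18.2838), ("red", "blue", "green"), [1.2, 11.5, 15.0]]
)
```

Validating the search space up front like this surfaces a malformed dimension before any workflow is launched.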
- rocketsled.utils.convert_native(a)¶
Convert iterables of non-native types to native types for bson storage in the database. For situations where .tolist() does not work.
- Args:
- a (iterable or scalar): Input list of strings, ints, or
floats, as either numpy or native types (or others), which will be force-coerced to native types. Also works with scalar entries such as floats, ints, etc.
- Returns:
native (list): A list of the data in a, converted to native types.
- rocketsled.utils.convert_value_to_native(val, dtypes=<rocketsled.utils.Dtypes object>)¶
Convert a single value to the native datatype for storage in the opt db.
- Args:
- val (int/float/str): Numpy or native implementation of numeric or categorical dtype.
- dtypes (Dtypes): An instance of the Dtypes class.
- Returns:
native (int/float/str): The native python value of val.
- rocketsled.utils.deserialize(fun)¶
Takes a fireworks serialized function handle and maps to a function object.
- Args:
- fun (string): a ‘module.function’ or ‘/path/to/mod.func’ style string
specifying the function
- Returns:
(function) The function object defined by fun
- rocketsled.utils.get_default_opttask_kwargs()¶
Get the default configuration kwargs for OptTask.
- Args:
None
- Returns:
conf_dict (dict): The default kwargs for OptTask
- rocketsled.utils.get_len(obj)¶
A utility function for getting the length of an object.
- Args:
obj: An object, optionally iterable.
- Returns:
The length of that object if it is a list or tuple, otherwise 1.
- rocketsled.utils.is_discrete(dims, criteria='all')¶
Checks if the search space is discrete.
- Args:
- dims ([tuple]): Dimensions of the search space.
- criteria (str/unicode): If ‘all’, returns bool based on whether ALL dimensions are discrete. If ‘any’, returns bool based on whether ANY dimensions are discrete.
- Returns:
(bool) Whether the search space is discrete according to criteria.
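A minimal sketch of the ‘all’ versus ‘any’ behavior follows. It assumes that a dimension is continuous only when it is a 2-tuple range with at least one float bound, and that integer ranges and enumerated lists count as discrete; that rule and the function name are assumptions, not taken from the rocketsled source.

```python
def is_discrete_sketch(dims, criteria="all"):
    """Check whether all (or any) dimensions are discrete."""

    def dim_is_discrete(dim):
        # Assumption: only a float-bounded 2-tuple range is continuous.
        is_range = isinstance(dim, tuple) and len(dim) == 2
        has_float = any(isinstance(v, float) for v in dim)
        return not (is_range and has_float)

    flags = [dim_is_discrete(dim) for dim in dims]
    if criteria == "all":
        return all(flags)
    if criteria == "any":
        return any(flags)
    raise ValueError(f"criteria must be 'all' or 'any', got {criteria!r}")


mixed = [(1, 100), (9.293, 18.2838), ["red", "blue"]]
all_discrete = is_discrete_sketch(mixed, criteria="all")
any_discrete = is_discrete_sketch(mixed, criteria="any")
```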
- rocketsled.utils.is_duplicate_by_tolerance(x_new, all_x_explored, tolerances)¶
Duplicate checks with tolerances.
- Args:
- x_new (list): The new guess to be duplicate checked.
- all_x_explored ([list]): The list of all explored guesses.
- tolerances (list): The tolerances of each dimension.
- Returns:
True if x_new is a duplicate of a guess in all_x_explored. False if x_new is unique in the space and has yet to be tried.
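A tolerance-based duplicate check can be sketched as below, reusing the tolerances example from the configure documentation ([0, None, 1e-6]). The function name is hypothetical, and treating a difference exactly equal to the tolerance as a duplicate is an assumption.

```python
def is_duplicate_sketch(x_new, all_x_explored, tolerances):
    """Return True if x_new matches an explored guess within tolerances."""
    for x_old in all_x_explored:
        matches = True
        for new, old, tol in zip(x_new, x_old, tolerances):
            if tol is None:
                # Categorical feature: require an exact match.
                same = new == old
            else:
                # Numerical feature: match within the given tolerance.
                same = abs(new - old) <= tol
            if not same:
                matches = False
                break
        if matches:
            return True
    return False


explored = [[5, "red", 3.0]]
tolerances = [0, None, 1e-6]
near_copy = is_duplicate_sketch([5, "red", 3.0 + 5e-7], explored, tolerances)
new_color = is_duplicate_sketch([5, "blue", 3.0], explored, tolerances)
```

Here near_copy counts as a duplicate because only the third feature differs, and by less than 1e-6, while new_color is treated as a new guess.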
- rocketsled.utils.latex_float(f)¶
Convert a floating point number into a LaTeX-formattable string for visualization. Might relocate to viz.py
- Args:
f (float): A floating point number
- Returns:
float_str (str): A latex-formatted string representing f.
- rocketsled.utils.pareto(all_y, maximize=False)¶
Returns the indices of Pareto-optimal solutions.
- Args:
- all_y ([list]): A list of lists containing values to be evaluated for Pareto-optimality.
- Returns:
list - The indices of the entries which are Pareto-optimal
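Pareto-optimality can be illustrated with a simple quadratic-time dominance check. This is a sketch under the usual definition (a point is dominated if another point is at least as good in every objective and strictly better in one); the function name is hypothetical and rocketsled's own algorithm may differ.

```python
def pareto_indices(all_y, maximize=False):
    """Indices of entries in all_y that no other entry dominates."""
    # Flip signs when maximizing so the dominance test always minimizes.
    sign = -1.0 if maximize else 1.0
    ys = [[sign * v for v in y] for y in all_y]

    def dominates(a, b):
        no_worse = all(ai <= bi for ai, bi in zip(a, b))
        strictly_better = any(ai < bi for ai, bi in zip(a, b))
        return no_worse and strictly_better

    return [
        i
        for i, yi in enumerate(ys)
        if not any(dominates(yj, yi) for j, yj in enumerate(ys) if j != i)
    ]


all_y = [[1, 4], [2, 2], [4, 1], [3, 3]]
front_min = pareto_indices(all_y)
front_max = pareto_indices(all_y, maximize=True)
```

When minimizing, [3, 3] is dominated by [2, 2]; when maximizing, the roles reverse and [2, 2] is the entry that drops out.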
- rocketsled.utils.random_guess(dimensions)¶
Returns random new inputs based on the dimensions of the search space. It works with float, integer, and categorical types
- Args:
- dimensions ([tuple]): defines the dimensions of each parameter
example: [(1,50),(-18.939,22.435),[“red”, “green” , “blue”]]
- Returns:
- random_vector (list): randomly chosen next params in the search space
example: [12, 1.9383, “green”]
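A random guess over a mixed search space can be sketched with the standard library. The function name and the optional rng parameter are hypothetical, and the way the dimension type is inferred (integer 2-tuple, float 2-tuple, otherwise categorical) is an assumption based on the dimension format described for configure.

```python
import random


def random_guess_sketch(dimensions, rng=random):
    """Draw one random point from the search space."""
    guess = []
    for dim in dimensions:
        is_range = isinstance(dim, tuple) and len(dim) == 2
        if is_range and all(isinstance(v, int) for v in dim):
            # Integer range: both bounds are inclusive.
            guess.append(rng.randint(dim[0], dim[1]))
        elif is_range and all(isinstance(v, (int, float)) for v in dim):
            # Float range: sample uniformly between the bounds.
            guess.append(rng.uniform(dim[0], dim[1]))
        else:
            # Categorical or enumerated values: pick one entry.
            guess.append(rng.choice(list(dim)))
    return guess


dims = [(1, 50), (-18.939, 22.435), ["red", "green", "blue"]]
guess = random_guess_sketch(dims, rng=random.Random(0))
```

Passing a seeded random.Random instance makes the guesses reproducible, which helps when debugging an optimization loop.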
- rocketsled.utils.serialize(fun)¶
Turn a python function into a string which can later be used to deserialize the function. Only works with importable modules.
- Args:
fun (function object): The python function.
- Returns:
(str) The full function path as a string.
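The round trip between a function object and its ‘module.function’ string can be sketched with importlib. The helper names are hypothetical, and the sketch covers only importable dotted paths, not the ‘/path/to/mod.func’ style that deserialize also accepts.

```python
import importlib
import json


def serialize_sketch(fun):
    """Build a 'module.function' string for an importable function."""
    return f"{fun.__module__}.{fun.__name__}"


def deserialize_sketch(path):
    """Import the module in a 'module.function' string and fetch the function."""
    module_name, _, func_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


path = serialize_sketch(json.dumps)
restored = deserialize_sketch(path)
```

Storing the string rather than the function object is what allows a workflow creator or predictor to be recorded in a database and re-imported later, provided the module is on PYTHONPATH.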
- rocketsled.utils.split_xz(xz, x_dims, x_only=False, z_only=False)¶
Split concatenated xz vector into x and z vectors.
- Args:
- xz (list): The XZ matrix.
- x_dims ([list/tuple]): The dimensions of X.
- x_only (bool): If True, returns only the x vector.
- z_only (bool): If True, returns only the z vector.
- Returns:
x, z (list, list): the separate X and Z matrices.
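Splitting a concatenated vector can be sketched as a slice at the number of x dimensions. This assumes the x entries come first and the z entries follow, which the name xz suggests but the text above does not state; the function name is hypothetical.

```python
def split_xz_sketch(xz, x_dims, x_only=False, z_only=False):
    """Split a concatenated xz vector at len(x_dims)."""
    n_x = len(x_dims)
    x, z = list(xz[:n_x]), list(xz[n_x:])
    if x_only:
        return x
    if z_only:
        return z
    return x, z


x_dims = [(1, 100), ["red", "blue"], (2.0, 20.0)]
x, z = split_xz_sketch([5, "red", 3.0, 0.1, 0.2], x_dims)
```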
Module contents¶
Rocketsled is an optimization suite “on rails” based on Scikit-learn and FireWorks workflows.