Configuration Guide

Example Configurations

The default configuration below can be generated from BOA with the following command:

python -m boa.config --output-path [path to output]
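
If you would rather trigger the same command from a Python script, a minimal sketch follows. It only wraps the command shown above with the standard library; the helper names `build_config_command` and `generate_default_config` and the path `default_config.yaml` are placeholders for illustration, not part of BOA.

```python
import subprocess
import sys
from typing import List


def build_config_command(output_path: str) -> List[str]:
    """Build the argument list for the config-generation command shown above."""
    # sys.executable is the running interpreter, so the command uses the same
    # environment in which BOA is assumed to be installed.
    return [sys.executable, "-m", "boa.config", "--output-path", output_path]


def generate_default_config(output_path: str) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_config_command(output_path), check=True)


# Print the command that would be run, without executing it.
# "default_config.yaml" is a placeholder output path.
print(" ".join(build_config_command("default_config.yaml")))
```

Calling `generate_default_config("default_config.yaml")` would then run the command itself, provided BOA is importable by that interpreter.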

Default Configuration

# ###########
# objective #
# ###########

# Your objective to be optimized by BOA.
# This can be a single objective, a scalarized objective, or a multi-objective (Pareto objective).
# For a single objective, list a single metric in the metrics field.
# For a multi-objective, list multiple metrics in the metrics field.
# For a scalarized objective, list multiple metrics in the metrics field and specify the
# weight for each metric in that metric's weight field.
objective:
  # A list of BOAMetric objects that represent the metrics to be used in the objective.
  metrics:
  - name: metric1  # Name of the metric. This is used to identify the metric in your wrapper script.
    # Metric to be used for optimization. You can list any metric built into BOA.
    # Those metrics can be found here: :mod:`Metrics <boa.metrics.metrics>`.
    # If no metric is specified, a :class:`pass through<.PassThrough>` metric will be used,
    # which means that the metric will be computed by the user and passed to BOA.
    # You can also use any metric from sklearn by passing in the name of the metric
    # and the metric type as `sklearn_metric`.
    # You can also use any metric from Ax's or BoTorch's synthetic metrics modules by
    # passing in the name of the metric and the metric type as `synthetic_metric`.
    metric: RMSE
  # String representation of outcome constraints on metrics.
  # Each constraint bounds a metric (or a linear combination of metrics)
  # from below or above (>= or <=).
  # (ex. ['metric1 >= 0.0', 'metric2 <= 1.0', '2*metric1 + .5*metric2 <= 1.0'])
  outcome_constraints: '...'
  # String representation of objective thresholds for multi-objective optimization.
  # An objective threshold is the threshold an objective metric must meet
  # to contribute to hypervolume calculations. Together, the objective
  # thresholds for all metrics form a reference point.
  # Because the objective thresholds are used to calculate hypervolume, they
  # can only be used for multi-objective optimization.
  # (ex. ['metric1 >= 0.0', 'metric2 <= 1.0'])
  objective_thresholds: '...'
  # A boolean that indicates whether the scalarized objective should be minimized or maximized.
  # Only used for scalarized objectives, because otherwise each metric can have its own minimize flag.
  # Ignored for non-scalarized objectives.
  minimize: '...'

# ############
# parameters #
# ############

# Parameters to optimize over. This can be expressed in two ways. The first is a list of dictionaries, where each
# dictionary represents a parameter. The second is a dictionary of dictionaries, where the key is the name of the
# parameter and the value is the dictionary representing the parameter.

# .. code-block:: yaml

# ## Dictionary of dictionaries
# x1:
#     type: range
#     bounds: [0, 1]
#     value_type: float
# x2:
#     type: range
#     bounds: [0.0, 1.0]  # value_type is inferred from bounds

# .. code-block:: yaml

# ## List of dictionaries
# -   name: x1
#     type: range
#     bounds: [0, 1]
#     value_type: float

# .. code-block:: yaml

# ## Fixed Types
# x3: 4.0  # Fixed type, value is 4.0
# x4:
#     type: fixed
#     value: "some string"  # Fixed type, value is "some string"

# ## Choice Options
# x5:
#     type: choice
#     values: ["a", "b"]
parameters:
  x1:
    type: range
    bounds:
    - 0
    - 1
    value_type: float
  x2:
    type: choice
    values:
    - a
    - b
    - c
  x3: 4.0

# #####################
# generation_strategy #
# #####################

# Your generation strategy is how new trials will be generated, that is, what acquisition function
# will be used to select the next trial, what kernel will be used to model the objective function,
# as well as other options such as max parallelism.

# This is an optional section. If not specified, Ax will choose a generation strategy for you
# based on your objective, parameters, and other options. You can control how Ax chooses
# a generation strategy by passing options under `generation_strategy`.

# See https://ax.dev/tutorials/generation_strategy.html and
# https://ax.dev/api/modelbridge.html#ax.modelbridge.dispatch_utils.choose_generation_strategy
# for specific options.

# If you want to specify your own generation strategy, you can do so by passing a list of
# steps under `generation_strategy.steps`.

# .. code-block:: yaml

# generation_strategy:
#     # Use Ax's SAASBO algorithm, which is particularly well suited for high dimensional problems
#     use_saasbo: true
#     max_parallelism_cap: 10  # Maximum number of trials allowed to run in parallel

# Other options are possible; see
# https://ax.dev/tutorials/generation_strategy.html#1A.-Manually-configured-generation-strategy
# and `Models` in ax.modelbridge.registry.py for more options.
# Some options include SOBOL, GPEI, Thompson, and GPKG (knowledge gradient).
# See https://ax.dev/api/modelbridge.html#ax.modelbridge.generation_node.GenerationStep
# for the specific options you can pass to each step.

# .. code-block:: yaml

# generation_strategy:
#     steps:
#     -   model: SOBOL
#         num_trials: 20
#     -   model: GPEI  # Gaussian Process with Expected Improvement
#         num_trials: -1
#         max_parallelism: 10  # Maximum number of trials allowed to run in parallel
generation_strategy: {}

# ###########
# scheduler #
# ###########
# Settings for a scheduler instance.

# Attributes:
#     max_pending_trials: Maximum number of pending trials the scheduler
#         can have ``STAGED`` or ``RUNNING`` at once, required. If looking
#         to use ``Runner.poll_available_capacity`` as a primary guide for
#         how many trials should be pending at a given time, set this limit
#         to a high number, as an upper bound on number of trials that
#         should not be exceeded.
#     trial_type: Type of trials (1-arm ``Trial`` or multi-arm
#         ``BatchTrial``) that will be deployed using the scheduler. Defaults
#         to 1-arm ``Trial``. NOTE: use ``BatchTrial`` only if you need to
#         evaluate multiple arms *together*, e.g. in an A/B-test
#         influenced by data nonstationarity. For cases where just
#         deploying multiple arms at once is beneficial but the trials
#         are evaluated *independently*, implement the ``run_trials`` method
#         in a scheduler subclass to deploy multiple 1-arm trials at
#         the same time.
#     batch_size: If using ``BatchTrial``, the number of arms to be generated
#         and deployed per trial.
#     total_trials: Limit on number of trials a given ``Scheduler``
#         should run. If no stopping criteria are implemented on
#         a given scheduler, exhaustion of this number of trials
#         will be used as default stopping criterion in
#         ``Scheduler.run_all_trials``. Required to be non-null if
#         using ``Scheduler.run_all_trials`` (not required for
#         ``Scheduler.run_n_trials``).
#     tolerated_trial_failure_rate: Fraction of trials in this
#         optimization that are allowed to fail without the whole
#         optimization ending. Expects value between 0 and 1.
#         NOTE: Failure rate checks begin once
#         ``min_failed_trials_for_failure_rate_check`` trials have
#         failed; after that point, if the ratio of failed trials
#         to total trials run so far exceeds the failure rate,
#         the optimization will halt.
#     min_failed_trials_for_failure_rate_check: The minimum number
#         of trials that must fail in ``Scheduler`` in order to start
#         checking failure rate.
#     log_filepath: File to which optimization logs are written.
#     logging_level: Minimum level of logging statements to log,
#         defaults to ``logging.INFO``.
#     ttl_seconds_for_trials: Optional TTL for all trials created
#         within this ``Scheduler``, in seconds. Trials that remain
#         ``RUNNING`` for more than their TTL seconds will be marked
#         ``FAILED`` once the TTL elapses and may be re-suggested by
#         the Ax optimization models.
#     init_seconds_between_polls: Initial wait between rounds of
#         polling, in seconds. Relevant if using the default
#         wait-for-completed-runs functionality of the base ``Scheduler``
#         (if ``wait_for_completed_trials_and_report_results`` is not
#         overridden). With the default waiting, every time a poll
#         returns that no trial evaluations completed, wait
#         time will increase; once some completed trial evaluations
#         are found, it will reset back to this value. Specify 0
#         to not introduce any wait between polls.
#     min_seconds_before_poll: Minimum number of seconds between
#         beginning to run a trial and the first poll to check
#         trial status.
#     timeout_hours: Number of hours after which the optimization will abort.
#     seconds_between_polls_backoff_factor: The rate at which the poll
#         interval increases.
#     run_trials_in_batches: If True and ``poll_available_capacity`` is
#         implemented to return non-null results, trials will be dispatched
#         in groups via ``run_trials`` instead of one-by-one via ``run_trial``.
#         This can save time, IO calls, or computation in cases where
#         dispatching trials in groups is more efficient than sequential
#         deployment. The size of the groups will be determined as
#         the minimum of ``self.poll_available_capacity()`` and the number
#         of generator runs that the generation strategy is able to produce
#         without more data or reaching its allowed max parallelism limit.
#     debug_log_run_metadata: Whether to log run_metadata for debugging purposes.
#     early_stopping_strategy: A ``BaseEarlyStoppingStrategy`` that determines
#         whether a trial should be stopped given the current state of
#         the experiment. Used in ``should_stop_trials_early``.
#     global_stopping_strategy: A ``BaseGlobalStoppingStrategy`` that determines
#         whether the full optimization should be stopped or not.
#     suppress_storage_errors_after_retries: Whether to fully suppress SQL
#         storage-related errors if encountered, after retrying the call
#         multiple times. Only use if SQL storage is not important for the given
#         use case, since this will only log, but not raise, an exception if
#         it's encountered while saving to DB or loading from it.

# n_trials: Run only this many trials each time the scheduler is launched.
# Unlike `total_trials`, which is a hard limit that still applies after reloading
# the scheduler, `n_trials` trials are run every time you reload the scheduler.
# This makes it easier to reload the scheduler and continue running trials.
scheduler:
  max_pending_trials: '...'
  trial_type: '...'
  batch_size: '...'
  total_trials: '...'
  tolerated_trial_failure_rate: '...'
  min_failed_trials_for_failure_rate_check: '...'
  log_filepath: '...'
  logging_level: '...'
  ttl_seconds_for_trials: '...'
  init_seconds_between_polls: '...'
  min_seconds_before_poll: '...'
  seconds_between_polls_backoff_factor: '...'
  timeout_hours: '...'
  run_trials_in_batches: '...'
  debug_log_run_metadata: '...'
  early_stopping_strategy: '...'
  global_stopping_strategy: '...'
  suppress_storage_errors_after_retries: '...'

# ######
# name #
# ######
name: '...'

# #######################
# parameter_constraints #
# #######################
parameter_constraints: []

# ###############
# model_options #
# ###############
model_options: '...'

# ################
# script_options #
# ################
script_options:
  # Whether to use the config file as the base path for all relative paths.
  # If True, all relative paths will be relative to the config file directory.
  # Defaults to True if not specified.
  # If launched through BOA CLI, this will be set to True automatically.
  # rel_to_config and rel_to_launch cannot both be specified.
  rel_to_config: '...'
  # Whether to use the CLI launch directory as the base path for all relative paths.
  # If True, all relative paths will be relative to the CLI launch directory.
  # Defaults to `rel_to_config` argument if not specified.
  # rel_to_config and rel_to_launch cannot both be specified.
  rel_to_launch: '...'
  # Name of the python wrapper class. Used for python interface only.
  # Defaults to `Wrapper` if not specified.
  wrapper_name: '...'
  # Path to the python wrapper file. Used for python interface only.
  # Defaults to `wrapper.py` if not specified.
  wrapper_path: '...'
  # Path to the working directory. Defaults to `.` (Current working directory) if not specified.
  working_dir: '...'
  # Path to the directory for the output of the experiment.
  # You may specify either this or output_dir in your configuration file.
  experiment_dir: '...'
  # Output directory of the project.
  # If you specify output_dir, then output will be saved in
  # output_dir / experiment_name.
  # Because of this, only one of experiment_dir or output_dir may be specified.
  # If neither experiment_dir nor output_dir is specified, output_dir defaults
  # to whatever pwd returns (or its equivalent on Windows).
  output_dir: '...'
  # Whether to append a timestamp to the output directory to ensure uniqueness.
  # Defaults to `True` if not specified.
  append_timestamp: '...'
  # Shell command to run the model. Used for the language-agnostic interface only.
  # This is the command BOA will run to launch your script.
  # BOA will also pass, as a command line argument, the current trial directory
  # that is being parameterized by BOA.
  # `run_model` is the only required shell command of these 4, because you
  # can also use it to write your config, run your model, set your trial status,
  # and fetch your trial data all in one script if you so choose. The
  # other scripts are provided as a convenience to segment out your logic.
  # This can be either a relative path or an absolute path.
  run_model: '...'
  # Shell command to write your configs out. See `run_model` for more details.
  write_configs: '...'
  # Shell command to set your trial status. See `run_model` for more details.
  set_trial_status: '...'
  # Shell command to fetch your trial data. See `run_model` for more details.
  fetch_trial_data: '...'
  base_path: '...'

# ################
# parameter_keys #
# ################
parameter_keys: '...'

# #############
# config_path #
# #############
config_path: '...'

# ##########
# n_trials #
# ##########
n_trials: '...'
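
The comments in the `parameters` section above describe two equivalent layouts (a list of dictionaries and a dictionary of dictionaries), plus a bare-value shorthand for fixed parameters. The sketch below illustrates how the layouts relate by converting already-parsed YAML into the list-of-dictionaries form. `normalize_parameters` is a hypothetical helper written for this illustration; it is not BOA's own parsing code, and BOA may handle details such as `value_type` inference differently.

```python
from typing import Any, Dict, List, Union

ParamSpec = Dict[str, Any]


def normalize_parameters(
    parameters: Union[List[ParamSpec], Dict[str, Any]],
) -> List[ParamSpec]:
    """Return parameters in list-of-dictionaries form (hypothetical helper)."""
    if isinstance(parameters, list):
        # Already a list of dictionaries; copy so the caller's data is untouched.
        return [dict(spec) for spec in parameters]

    normalized: List[ParamSpec] = []
    for name, spec in parameters.items():
        if isinstance(spec, dict):
            # Dictionary-of-dictionaries entry: the key supplies the name.
            normalized.append({"name": name, **spec})
        else:
            # A bare value such as `x3: 4.0` is shorthand for a fixed parameter.
            normalized.append({"name": name, "type": "fixed", "value": spec})
    return normalized


# The `parameters` section of the default configuration, as parsed YAML.
default_parameters = {
    "x1": {"type": "range", "bounds": [0, 1], "value_type": "float"},
    "x2": {"type": "choice", "values": ["a", "b", "c"]},
    "x3": 4.0,
}

for spec in normalize_parameters(default_parameters):
    print(spec["name"], spec["type"])
```

Normalizing to a single layout up front means any later validation only has to handle one shape, at the cost of an extra pass over the section.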

Configuration Detailed Reference