{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# For tips on running notebooks in Google Colab, see\n# https://codelin.vip/beginner/colab\n%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Multi-Objective NAS with Ax\n===========================\n\n**Authors:** [David Eriksson](https://github.com/dme65), [Max\nBalandat](https://github.com/Balandat), and the Adaptive Experimentation\nteam at Meta.\n\nIn this tutorial, we show how to use [Ax](https://ax.dev/) to run\nmulti-objective neural architecture search (NAS) for a simple neural\nnetwork model on the popular MNIST dataset. While the underlying\nmethodology would typically be used for more complicated models and\nlarger datasets, we opt for a tutorial that is easily runnable\nend-to-end on a laptop in less than 20 minutes.\n\nIn many NAS applications, there is a natural tradeoff between multiple\nobjectives of interest. For instance, when deploying models on-device we\nmay want to maximize model performance (for example, accuracy), while\nsimultaneously minimizing competing metrics like power consumption,\ninference latency, or model size in order to satisfy deployment\nconstraints. Often, we may be able to reduce computational requirements\nor latency of predictions substantially by accepting minimally lower\nmodel performance. Principled methods for exploring such tradeoffs\nefficiently are key enablers of scalable and sustainable AI, and have\nmany successful applications at Meta - see for instance our [case\nstudy](https://research.facebook.com/blog/2021/07/optimizing-model-accuracy-and-latency-using-bayesian-multi-objective-neural-architecture-search/)\non a Natural Language Understanding model.\n\nIn our example here, we will tune the widths of two hidden layers, the\nlearning rate, the dropout probability, the batch size, and the number\nof training epochs. The goal is to trade off performance (accuracy on\nthe validation set) and model size (the number of model parameters).\n\nThis tutorial makes use of the following PyTorch libraries:\n\n-   [PyTorch\n    Lightning](https://github.com/PyTorchLightning/pytorch-lightning)\n    (specifying the model and training loop)\n-   [TorchX](https://github.com/pytorch/torchx) (for running training\n    jobs remotely / asynchronously)\n-   [BoTorch](https://github.com/pytorch/botorch) (the Bayesian\n    Optimization library powering Ax\\'s algorithms)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Defining the TorchX App\n=======================\n\nOur goal is to optimize the PyTorch Lightning training job defined in\n[mnist\\_train\\_nas.py](https://github.com/pytorch/tutorials/tree/main/intermediate_source/mnist_train_nas.py).\nTo do this using TorchX, we write a helper function that takes in the\nvalues of the architecture and hyperparameters of the training job and\ncreates a [TorchX AppDef](https://pytorch.org/torchx/latest/basics.html)\nwith the appropriate settings.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from pathlib import Path\n\nimport torchx\n\nfrom torchx import specs\nfrom torchx.components import utils\n\n\ndef trainer(\n    log_path: str,\n    hidden_size_1: int,\n    hidden_size_2: int,\n    learning_rate: float,\n    epochs: int,\n    dropout: float,\n    batch_size: int,\n    trial_idx: int = -1,\n) -> specs.AppDef:\n\n    # define the log path so we can pass it to the TorchX ``AppDef``\n    if trial_idx >= 0:\n        log_path = Path(log_path).joinpath(str(trial_idx)).absolute().as_posix()\n\n    return utils.python(\n        # command line arguments to the training script\n        \"--log_path\",\n        log_path,\n        \"--hidden_size_1\",\n        str(hidden_size_1),\n        \"--hidden_size_2\",\n        str(hidden_size_2),\n        \"--learning_rate\",\n        str(learning_rate),\n        \"--epochs\",\n        str(epochs),\n        \"--dropout\",\n        str(dropout),\n        \"--batch_size\",\n        str(batch_size),\n        # other config options\n        name=\"trainer\",\n        script=\"mnist_train_nas.py\",\n        image=torchx.version.TORCHX_IMAGE,\n    )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Setting up the Runner\n=====================\n\nAx's [Runner](https://ax.dev/api/core.html#ax.core.runner.Runner)\nabstraction allows writing interfaces to various backends. Ax already\ncomes with Runner for TorchX, and so we just need to configure it. For\nthe purpose of this tutorial we run jobs locally in a fully asynchronous\nfashion.\n\nIn order to launch them on a cluster, you can instead specify a\ndifferent TorchX scheduler and adjust the configuration appropriately.\nFor example, if you have a Kubernetes cluster, you just need to change\nthe scheduler from `local_cwd` to `kubernetes`).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import tempfile\nfrom ax.runners.torchx import TorchXRunner\n\n# Make a temporary dir to log our results into\nlog_dir = tempfile.mkdtemp()\n\nax_runner = TorchXRunner(\n    tracker_base=\"/tmp/\",\n    component=trainer,\n    # NOTE: To launch this job on a cluster instead of locally you can\n    # specify a different scheduler and adjust arguments appropriately.\n    scheduler=\"local_cwd\",\n    component_const_params={\"log_path\": log_dir},\n    cfg={},\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Setting up the `SearchSpace`\n============================\n\nFirst, we define our search space. Ax supports both range parameters of\ntype integer and float as well as choice parameters which can have\nnon-numerical types such as strings. We will tune the hidden sizes,\nlearning rate, dropout, and number of epochs as range parameters and\ntune the batch size as an ordered choice parameter to enforce it to be a\npower of 2.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from ax.core import (\n    ChoiceParameter,\n    ParameterType,\n    RangeParameter,\n    SearchSpace,\n)\n\nparameters = [\n    # NOTE: In a real-world setting, hidden_size_1 and hidden_size_2\n    # should probably be powers of 2, but in our simple example this\n    # would mean that ``num_params`` can't take on that many values, which\n    # in turn makes the Pareto frontier look pretty weird.\n    RangeParameter(\n        name=\"hidden_size_1\",\n        lower=16,\n        upper=128,\n        parameter_type=ParameterType.INT,\n        log_scale=True,\n    ),\n    RangeParameter(\n        name=\"hidden_size_2\",\n        lower=16,\n        upper=128,\n        parameter_type=ParameterType.INT,\n        log_scale=True,\n    ),\n    RangeParameter(\n        name=\"learning_rate\",\n        lower=1e-4,\n        upper=1e-2,\n        parameter_type=ParameterType.FLOAT,\n        log_scale=True,\n    ),\n    RangeParameter(\n        name=\"epochs\",\n        lower=1,\n        upper=4,\n        parameter_type=ParameterType.INT,\n    ),\n    RangeParameter(\n        name=\"dropout\",\n        lower=0.0,\n        upper=0.5,\n        parameter_type=ParameterType.FLOAT,\n    ),\n    ChoiceParameter(  # NOTE: ``ChoiceParameters`` don't require log-scale\n        name=\"batch_size\",\n        values=[32, 64, 128, 256],\n        parameter_type=ParameterType.INT,\n        is_ordered=True,\n        sort_values=True,\n    ),\n]\n\nsearch_space = SearchSpace(\n    parameters=parameters,\n    # NOTE: In practice, it may make sense to add a constraint\n    # hidden_size_2 <= hidden_size_1\n    parameter_constraints=[],\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Setting up Metrics\n==================\n\nAx has the concept of a [Metric](https://ax.dev/api/core.html#metric)\nthat defines properties of outcomes and how observations are obtained\nfor these outcomes. This allows e.g. encoding how data is fetched from\nsome distributed execution backend and post-processed before being\npassed as input to Ax.\n\nIn this tutorial we will use [multi-objective\noptimization](https://ax.dev/tutorials/multiobjective_optimization.html)\nwith the goal of maximizing the validation accuracy and minimizing the\nnumber of model parameters. The latter represents a simple proxy of\nmodel latency, which is hard to estimate accurately for small ML models\n(in an actual application we would benchmark the latency while running\nthe model on-device).\n\nIn our example TorchX will run the training jobs in a fully asynchronous\nfashion locally and write the results to the `log_dir` based on the\ntrial index (see the `trainer()` function above). We will define a\nmetric class that is aware of that logging directory. By subclassing\n[TensorboardCurveMetric](https://ax.dev/api/metrics.html?highlight=tensorboardcurvemetric#ax.metrics.tensorboard.TensorboardCurveMetric)\nwe get the logic to read and parse the TensorBoard logs for free.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from ax.metrics.tensorboard import TensorboardMetric\nfrom tensorboard.backend.event_processing import plugin_event_multiplexer as event_multiplexer\n\nclass MyTensorboardMetric(TensorboardMetric):\n\n    # NOTE: We need to tell the new TensorBoard metric how to get the id /\n    # file handle for the TensorBoard logs from a trial. In this case\n    # our convention is to just save a separate file per trial in\n    # the prespecified log dir.\n    def _get_event_multiplexer_for_trial(self, trial):\n        mul = event_multiplexer.EventMultiplexer(max_reload_threads=20)\n        mul.AddRunsFromDirectory(Path(log_dir).joinpath(str(trial.index)).as_posix(), None)\n        mul.Reload()\n    \n        return mul\n\n    # This indicates whether the metric is queryable while the trial is\n    # still running. We don't use this in the current tutorial, but Ax\n    # utilizes this to implement trial-level early-stopping functionality.\n    @classmethod\n    def is_available_while_running(cls):\n        return False"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we can instantiate the metrics for accuracy and the number of model\nparameters. Here [curve\\_name]{.title-ref} is the name of the metric in\nthe TensorBoard logs, while [name]{.title-ref} is the metric name used\ninternally by Ax. We also specify [lower\\_is\\_better]{.title-ref} to\nindicate the favorable direction of the two metrics.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "val_acc = MyTensorboardMetric(\n    name=\"val_acc\",\n    tag=\"val_acc\",\n    lower_is_better=False,\n)\nmodel_num_params = MyTensorboardMetric(\n    name=\"num_params\",\n    tag=\"num_params\",\n    lower_is_better=True,\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Setting up the `OptimizationConfig`\n===================================\n\nThe way to tell Ax what it should optimize is by means of an\n[OptimizationConfig](https://ax.dev/api/core.html#module-ax.core.optimization_config).\nHere we use a `MultiObjectiveOptimizationConfig` as we will be\nperforming multi-objective optimization.\n\nAdditionally, Ax supports placing constraints on the different metrics\nby specifying objective thresholds, which bound the region of interest\nin the outcome space that we want to explore. For this example, we will\nconstrain the validation accuracy to be at least 0.94 (94%) and the\nnumber of model parameters to be at most 80,000.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from ax.core import MultiObjective, Objective, ObjectiveThreshold\nfrom ax.core.optimization_config import MultiObjectiveOptimizationConfig\n\n\nopt_config = MultiObjectiveOptimizationConfig(\n    objective=MultiObjective(\n        objectives=[\n            Objective(metric=val_acc, minimize=False),\n            Objective(metric=model_num_params, minimize=True),\n        ],\n    ),\n    objective_thresholds=[\n        ObjectiveThreshold(metric=val_acc, bound=0.94, relative=False),\n        ObjectiveThreshold(metric=model_num_params, bound=80_000, relative=False),\n    ],\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Creating the Ax Experiment\n==========================\n\nIn Ax, the\n[Experiment](https://ax.dev/api/core.html#ax.core.experiment.Experiment)\nobject is the object that stores all the information about the problem\nsetup.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from ax.core import Experiment\n\nexperiment = Experiment(\n    name=\"torchx_mnist\",\n    search_space=search_space,\n    optimization_config=opt_config,\n    runner=ax_runner,\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Choosing the Generation Strategy\n================================\n\nA\n[GenerationStrategy](https://ax.dev/api/modelbridge.html#ax.modelbridge.generation_strategy.GenerationStrategy)\nis the abstract representation of how we would like to perform the\noptimization. While this can be customized (if you'd like to do so, see\n[this tutorial](https://ax.dev/tutorials/generation_strategy.html)), in\nmost cases Ax can automatically determine an appropriate strategy based\non the search space, optimization config, and the total number of trials\nwe want to run.\n\nTypically, Ax chooses to evaluate a number of random configurations\nbefore starting a model-based Bayesian Optimization strategy.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "total_trials = 48  # total evaluation budget\n\nfrom ax.modelbridge.dispatch_utils import choose_generation_strategy\n\ngs = choose_generation_strategy(\n    search_space=experiment.search_space,\n    optimization_config=experiment.optimization_config,\n    num_trials=total_trials,\n  )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Configuring the Scheduler\n=========================\n\nThe `Scheduler` acts as the loop control for the optimization. It\ncommunicates with the backend to launch trials, check their status, and\nretrieve results. In the case of this tutorial, it is simply reading and\nparsing the locally saved logs. In a remote execution setting, it would\ncall APIs. The following illustration from the Ax [Scheduler\ntutorial](https://ax.dev/tutorials/scheduler.html) summarizes how the\nScheduler interacts with external systems used to run trial evaluations:\n\n![image](https://pytorch.org/tutorials/_static/img/ax_scheduler_illustration.png)\n\nThe `Scheduler` requires the `Experiment` and the `GenerationStrategy`.\nA set of options can be passed in via `SchedulerOptions`. Here, we\nconfigure the number of total evaluations as well as\n`max_pending_trials`, the maximum number of trials that should run\nconcurrently. In our local setting, this is the number of training jobs\nrunning as individual processes, while in a remote execution setting,\nthis would be the number of machines you want to use in parallel.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from ax.service.scheduler import Scheduler, SchedulerOptions\n\nscheduler = Scheduler(\n    experiment=experiment,\n    generation_strategy=gs,\n    options=SchedulerOptions(\n        total_trials=total_trials, max_pending_trials=4\n    ),\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Running the optimization\n========================\n\nNow that everything is configured, we can let Ax run the optimization in\na fully automated fashion. The Scheduler will periodically check the\nlogs for the status of all currently running trials, and if a trial\ncompletes the scheduler will update its status on the experiment and\nfetch the observations needed for the Bayesian optimization algorithm.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "scheduler.run_all_trials()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Evaluating the results\n======================\n\nWe can now inspect the result of the optimization using helper functions\nand visualizations included with Ax.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "First, we generate a dataframe with a summary of the results of the\nexperiment. Each row in this dataframe corresponds to a trial (that is,\na training job that was run), and contains information on the status of\nthe trial, the parameter configuration that was evaluated, and the\nmetric values that were observed. This provides an easy way to sanity\ncheck the optimization.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from ax.service.utils.report_utils import exp_to_df\n\ndf = exp_to_df(experiment)\ndf.head(10)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can also visualize the Pareto frontier of tradeoffs between the\nvalidation accuracy and the number of model parameters.\n\n```{=html}\n<div style=\"background-color: #6bcebb; color: #fff; font-weight: 700; padding-left: 10px; padding-top: 5px; padding-bottom: 5px\"><strong>TIP:</strong></div>\n```\n```{=html}\n<div style=\"background-color: #f3f4f7; padding-left: 10px; padding-top: 10px; padding-bottom: 10px; padding-right: 10px\">\n```\n```{=html}\n<p>Ax uses Plotly to produce interactive plots, which allow you todo things like zoom, crop, or hover in order to view detailsof components of the plot. Try it out, and take a look at the<a href=\"https://ax.dev/tutorials/visualizations.html\">visualization tutorial</a>if you'd like to learn more).</p>\n```\n```{=html}\n</div>\n```\nThe final optimization results are shown in the figure below where the\ncolor corresponds to the iteration number for each trial. We see that\nour method was able to successfully explore the trade-offs and found\nboth large models with high validation accuracy as well as small models\nwith comparatively lower validation accuracy.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from ax.service.utils.report_utils import _pareto_frontier_scatter_2d_plotly\n\n_pareto_frontier_scatter_2d_plotly(experiment)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To better understand what our surrogate models have learned about the\nblack box objectives, we can take a look at the leave-one-out cross\nvalidation results. Since our models are Gaussian Processes, they not\nonly provide point predictions but also uncertainty estimates about\nthese predictions. A good model means that the predicted means (the\npoints in the figure) are close to the 45 degree line and that the\nconfidence intervals cover the 45 degree line with the expected\nfrequency (here we use 95% confidence intervals, so we would expect them\nto contain the true observation 95% of the time).\n\nAs the figures below show, the model size (`num_params`) metric is much\neasier to model than the validation accuracy (`val_acc`) metric.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from ax.modelbridge.cross_validation import compute_diagnostics, cross_validate\nfrom ax.plot.diagnostic import interact_cross_validation_plotly\nfrom ax.utils.notebook.plotting import init_notebook_plotting, render\n\ncv = cross_validate(model=gs.model)  # The surrogate model is stored on the ``GenerationStrategy``\ncompute_diagnostics(cv)\n\ninteract_cross_validation_plotly(cv)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can also make contour plots to better understand how the different\nobjectives depend on two of the input parameters. In the figure below,\nwe show the validation accuracy predicted by the model as a function of\nthe two hidden sizes. The validation accuracy clearly increases as the\nhidden sizes increase.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from ax.plot.contour import interact_contour_plotly\n\ninteract_contour_plotly(model=gs.model, metric_name=\"val_acc\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Similarly, we show the number of model parameters as a function of the\nhidden sizes in the figure below and see that it also increases as a\nfunction of the hidden sizes (the dependency on `hidden_size_1` is much\nlarger).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "interact_contour_plotly(model=gs.model, metric_name=\"num_params\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Acknowledgments\n===============\n\nWe thank the TorchX team (in particular Kiuk Chung and Tristan Rice) for\ntheir help with integrating TorchX with Ax.\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}