{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://codelin.vip/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(beta) Running the compiled optimizer with an LR Scheduler\n==========================================================\n\n**Author:** [Michael Lazos](https://github.com/mlazos)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The optimizer is a key algorithm for training any deep learning model.\nIn this example, we will show how to pair the optimizer, which has been\ncompiled using `torch.compile`, with an LR scheduler to accelerate\ntraining convergence.\n\n```{=html}\n
<div class=\"admonition note\"><p class=\"admonition-title\">NOTE:</p><p>This tutorial requires PyTorch 2.3.0 or later.</p></div>\n```\n" ] },
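{ "cell_type": "markdown", "metadata": {}, "source": [ "Before running the rest of the notebook, you may want to verify the installed\nPyTorch version against the requirement above. The next cell is a minimal\nsketch added here for convenience (it is not part of the original tutorial) and\nonly inspects `torch.__version__`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\n\n# Minimal sketch (not part of the original tutorial): warn if the installed\n# PyTorch build looks older than the 2.3.0 release this tutorial requires.\nmajor_minor = tuple(int(p) for p in torch.__version__.split(\"+\")[0].split(\".\")[:2])\nif major_minor < (2, 3):\n    print(f\"Found PyTorch {torch.__version__}; this tutorial expects 2.3.0 or later.\")" ] },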
{ "cell_type": "markdown", "metadata": {}, "source": [ "Model Setup\n===========\n\nFor this example, we\\'ll use a simple sequence of linear layers.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\n\n# Create a simple model\nmodel = torch.nn.Sequential(\n *[torch.nn.Linear(1024, 1024, bias=False, device=\"cuda\") for _ in range(10)]\n)\ninput = torch.rand(1024, device=\"cuda\")\n\n# run forward pass\noutput = model(input)\n\n# run backward to populate the grads for our optimizer below\noutput.sum().backward()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting up and running the compiled optimizer with an LR Scheduler\n==================================================================\n\nIn this section, we\\'ll use the Adam optimizer with the LinearLR scheduler\nand create a helper function to wrap the `step()` call for each of them\nin `torch.compile()`.\n\n```{=html}\n
<div class=\"admonition note\"><p class=\"admonition-title\">NOTE:</p><p>torch.compile is only supported on CUDA devices that have a compute capability of 7.0 or higher.</p></div>
\n```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# exit cleanly if we are on a device that doesn't support ``torch.compile``\nif torch.cuda.get_device_capability() < (7, 0):\n print(\"Exiting because torch.compile is not supported on this device.\")\n import sys\n sys.exit(0)\n\n# !!! IMPORTANT !!! Wrap the lr in a Tensor if we are pairing\n# the optimizer with an LR Scheduler.\n# Without this, torch.compile will recompile as the value of the LR\n# changes.\nopt = torch.optim.Adam(model.parameters(), lr=torch.tensor(0.01))\nsched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)\n\n@torch.compile(fullgraph=False)\ndef fn():\n opt.step()\n sched.step()\n\n\n# Warmup runs to compile the function\nfor _ in range(5):\n fn()\n print(opt.param_groups[0][\"lr\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extension: What happens with a non-tensor LR?\n=============================================\n\nFor the curious, we will show how to peek into what happens with\n`torch.compile` when we don\\'t wrap the LR in a tensor.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# No longer wrap the LR in a tensor here\nopt = torch.optim.Adam(model.parameters(), lr=0.01)\nsched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)\n\n@torch.compile(fullgraph=False)\ndef fn():\n opt.step()\n sched.step()\n\n# Setup logging to view recompiles\ntorch._logging.set_logs(recompiles=True)\n\n# Warmup runs to compile the function\n# We will now recompile on each iteration\n# as the value of the lr is mutated.\nfor _ in range(5):\n fn()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this example, we can see that we recompile the optimizer a few\ntimes due to the guard failure on the `lr` in `param_groups[0]`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Conclusion\n==========\n\nIn this tutorial we showed how to pair an optimizer compiled with\n`torch.compile` with an LR scheduler to accelerate training convergence.\nWe used a model consisting of a simple sequence of linear layers with\nthe Adam optimizer paired with a LinearLR scheduler to demonstrate the\nLR changing across iterations.\n\nSee also:\n\n- [Compiled optimizer\n tutorial](https://pytorch.org/tutorials/recipes/compiling_optimizer.html) -\n an introduction to the compiled optimizer.\n- [Compiling the optimizer with\n PT2](https://dev-discuss.pytorch.org/t/compiling-the-optimizer-with-pt2/1669) -\n deeper technical details on the compiled optimizer.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }