{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Learn the Basics](intro.html) \\|\\|\n[Quickstart](quickstart_tutorial.html) \\|\\|\n[Tensors](tensorqs_tutorial.html) \\|\\| [Datasets &\nDataLoaders](data_tutorial.html) \\|\\|\n[Transforms](transforms_tutorial.html) \\|\\| [Build\nModel](buildmodel_tutorial.html) \\|\\| **Autograd** \\|\\|\n[Optimization](optimization_tutorial.html) \\|\\| [Save & Load\nModel](saveloadrun_tutorial.html)\n\nAutomatic Differentiation with `torch.autograd`\n===============================================\n\nWhen training neural networks, the most frequently used algorithm is\n**back propagation**. In this algorithm, parameters (model weights) are\nadjusted according to the **gradient** of the loss function with respect\nto the given parameter.\n\nTo compute those gradients, PyTorch has a built-in differentiation\nengine called `torch.autograd`. It supports automatic computation of\ngradient for any computational graph.\n\nConsider the simplest one-layer neural network, with input `x`,\nparameters `w` and `b`, and some loss function. It can be defined in\nPyTorch in the following manner:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\n\nx = torch.ones(5) # input tensor\ny = torch.zeros(3) # expected output\nw = torch.randn(5, 3, requires_grad=True)\nb = torch.randn(3, requires_grad=True)\nz = torch.matmul(x, w)+b\nloss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tensors, Functions and Computational graph\n==========================================\n\nThis code defines the following **computational graph**:\n\n![](https://pytorch.org/tutorials/_static/img/basics/comp-graph.png)\n\nIn this network, `w` and `b` are **parameters**, which we need to\noptimize. Thus, we need to be able to compute the gradients of loss\nfunction with respect to those variables. In order to do that, we set\nthe `requires_grad` property of those tensors.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n
NOTE:
\n```\n```{=html}\n
\n```\n```{=html}\n

You can set the value of requires_grad when creating a tensor, or later by using the x.requires_grad_(True) method.
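As a minimal illustration (using a hypothetical tensor t, not defined in this tutorial), t = torch.randn(3) followed by t.requires_grad_(True) leaves t.requires_grad equal to True, just as torch.randn(3, requires_grad=True) would.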

\n```\n```{=html}\n
\n```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A function that we apply to tensors to construct computational graph is\nin fact an object of class `Function`. This object knows how to compute\nthe function in the *forward* direction, and also how to compute its\nderivative during the *backward propagation* step. A reference to the\nbackward propagation function is stored in `grad_fn` property of a\ntensor. You can find more information of `Function` [in the\ndocumentation](https://pytorch.org/docs/stable/autograd.html#function).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(f\"Gradient function for z = {z.grad_fn}\")\nprint(f\"Gradient function for loss = {loss.grad_fn}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Computing Gradients\n===================\n\nTo optimize weights of parameters in the neural network, we need to\ncompute the derivatives of our loss function with respect to parameters,\nnamely, we need $\\frac{\\partial loss}{\\partial w}$ and\n$\\frac{\\partial loss}{\\partial b}$ under some fixed values of `x` and\n`y`. To compute those derivatives, we call `loss.backward()`, and then\nretrieve the values from `w.grad` and `b.grad`:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "loss.backward()\nprint(w.grad)\nprint(b.grad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n
NOTE:
\n```\n```{=html}\n
\n```\n```{=html}\n

We can only obtain the grad properties for the leaf nodes of the computational graph, which have the requires_grad property set to True. For all other nodes in our graph, gradients will not be available. In addition, we can only perform gradient calculations using backward once on a given graph, for performance reasons. If we need to do several backward calls on the same graph, we need to pass retain_graph=True to the backward call.

\n```\n```{=html}\n
\n```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Disabling Gradient Tracking\n===========================\n\nBy default, all tensors with `requires_grad=True` are tracking their\ncomputational history and support gradient computation. However, there\nare some cases when we do not need to do that, for example, when we have\ntrained the model and just want to apply it to some input data, i.e. we\nonly want to do *forward* computations through the network. We can stop\ntracking computations by surrounding our computation code with\n`torch.no_grad()` block:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "z = torch.matmul(x, w)+b\nprint(z.requires_grad)\n\nwith torch.no_grad():\n z = torch.matmul(x, w)+b\nprint(z.requires_grad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to achieve the same result is to use the `detach()` method\non the tensor:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "z = torch.matmul(x, w)+b\nz_det = z.detach()\nprint(z_det.requires_grad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are reasons you might want to disable gradient tracking:\n\n: - To mark some parameters in your neural network as **frozen\n parameters**.\n - To **speed up computations** when you are only doing forward\n pass, because computations on tensors that do not track\n gradients would be more efficient.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More on Computational Graphs\n============================\n\nConceptually, autograd keeps a record of data (tensors) and all executed\noperations (along with the resulting new tensors) in a directed acyclic\ngraph (DAG) consisting of\n[Function](https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)\nobjects. In this DAG, leaves are the input tensors, roots are the output\ntensors. By tracing this graph from roots to leaves, you can\nautomatically compute the gradients using the chain rule.\n\nIn a forward pass, autograd does two things simultaneously:\n\n- run the requested operation to compute a resulting tensor\n- maintain the operation's *gradient function* in the DAG.\n\nThe backward pass kicks off when `.backward()` is called on the DAG\nroot. `autograd` then:\n\n- computes the gradients from each `.grad_fn`,\n- accumulates them in the respective tensor's `.grad` attribute\n- using the chain rule, propagates all the way to the leaf tensors.\n\n```{=html}\n
NOTE:
\n```\n```{=html}\n
\n```\n```{=html}\n

An important thing to note is that the graph is recreated from scratch; after each .backward() call, autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size and operations at every iteration if needed.
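For instance (a hypothetical sketch, not part of the original example), a forward pass that computes z = torch.matmul(x, w) + b on even iterations and (torch.matmul(x, w) + b) ** 2 on odd ones records whichever branch actually ran, and each backward() call differentiates that freshly built graph.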

\n```\n```{=html}\n
\n```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optional Reading: Tensor Gradients and Jacobian Products\n========================================================\n\nIn many cases, we have a scalar loss function, and we need to compute\nthe gradient with respect to some parameters. However, there are cases\nwhen the output function is an arbitrary tensor. In this case, PyTorch\nallows you to compute so-called **Jacobian product**, and not the actual\ngradient.\n\nFor a vector function $\\vec{y}=f(\\vec{x})$, where\n$\\vec{x}=\\langle x_1,\\dots,x_n\\rangle$ and\n$\\vec{y}=\\langle y_1,\\dots,y_m\\rangle$, a gradient of $\\vec{y}$ with\nrespect to $\\vec{x}$ is given by **Jacobian matrix**:\n\n$$\\begin{aligned}\nJ=\\left(\\begin{array}{ccc}\n \\frac{\\partial y_{1}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{1}}{\\partial x_{n}}\\\\\n \\vdots & \\ddots & \\vdots\\\\\n \\frac{\\partial y_{m}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{n}}\n \\end{array}\\right)\n\\end{aligned}$$\n\nInstead of computing the Jacobian matrix itself, PyTorch allows you to\ncompute **Jacobian Product** $v^T\\cdot J$ for a given input vector\n$v=(v_1 \\dots v_m)$. This is achieved by calling `backward` with $v$ as\nan argument. The size of $v$ should be the same as the size of the\noriginal tensor, with respect to which we want to compute the product:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "inp = torch.eye(4, 5, requires_grad=True)\nout = (inp+1).pow(2).t()\nout.backward(torch.ones_like(out), retain_graph=True)\nprint(f\"First call\\n{inp.grad}\")\nout.backward(torch.ones_like(out), retain_graph=True)\nprint(f\"\\nSecond call\\n{inp.grad}\")\ninp.grad.zero_()\nout.backward(torch.ones_like(out), retain_graph=True)\nprint(f\"\\nCall after zeroing gradients\\n{inp.grad}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that when we call `backward` for the second time with the same\nargument, the value of the gradient is different. This happens because\nwhen doing `backward` propagation, PyTorch **accumulates the\ngradients**, i.e. the value of computed gradients is added to the `grad`\nproperty of all leaf nodes of computational graph. If you want to\ncompute the proper gradients, you need to zero out the `grad` property\nbefore. In real-life training an *optimizer* helps us to do this.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n
NOTE:
\n```\n```{=html}\n
\n```\n```{=html}\n

Previously we were calling the backward() function without parameters. This is essentially equivalent to calling backward(torch.tensor(1.0)), which is a useful way to compute the gradients in the case of a scalar-valued function, such as the loss during neural network training.
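As a quick check (a sketch, not part of the original tutorial), calling backward(torch.tensor(1.0)) on a freshly computed scalar loss yields the same w.grad and b.grad as calling backward() with no arguments, because the gradient of a scalar with respect to itself is 1.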

\n```\n```{=html}\n
\n```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "------------------------------------------------------------------------\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Further Reading\n===============\n\n- [Autograd\n Mechanics](https://pytorch.org/docs/stable/notes/autograd.html)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }