{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neural Networks\n===============\n\nNeural networks can be constructed using the `torch.nn` package.\n\nNow that you had a glimpse of `autograd`, `nn` depends on `autograd` to\ndefine models and differentiate them. An `nn.Module` contains layers,\nand a method `forward(input)` that returns the `output`.\n\nFor example, look at this network that classifies digit images:\n\n![convnet](https://pytorch.org/tutorials/_static/img/mnist.png)\n\nIt is a simple feed-forward network. It takes the input, feeds it\nthrough several layers one after the other, and then finally gives the\noutput.\n\nA typical training procedure for a neural network is as follows:\n\n- Define the neural network that has some learnable parameters (or\n weights)\n- Iterate over a dataset of inputs\n- Process input through the network\n- Compute the loss (how far is the output from being correct)\n- Propagate gradients back into the network's parameters\n- Update the weights of the network, typically using a simple update\n rule: `weight = weight - learning_rate * gradient`\n\nDefine the network\n------------------\n\nLet's define this network:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass Net(nn.Module):\n\n def __init__(self):\n super(Net, self).__init__()\n # 1 input image channel, 6 output channels, 5x5 square convolution\n # kernel\n self.conv1 = nn.Conv2d(1, 6, 5)\n self.conv2 = nn.Conv2d(6, 16, 5)\n # an affine operation: y = Wx + b\n self.fc1 = nn.Linear(16 * 5 * 5, 120) # 5*5 from image dimension \n self.fc2 = nn.Linear(120, 84)\n self.fc3 = nn.Linear(84, 10)\n\n def forward(self, input):\n # Convolution layer C1: 1 input image channel, 6 output channels,\n # 5x5 square convolution, it uses RELU activation function, and\n # outputs a Tensor with size (N, 6, 28, 28), where N is the size of the batch\n c1 = F.relu(self.conv1(input))\n # Subsampling layer S2: 2x2 grid, purely functional,\n # this layer does not have any parameter, and outputs a (N, 6, 14, 14) Tensor\n s2 = F.max_pool2d(c1, (2, 2))\n # Convolution layer C3: 6 input channels, 16 output channels,\n # 5x5 square convolution, it uses RELU activation function, and\n # outputs a (N, 16, 10, 10) Tensor\n c3 = F.relu(self.conv2(s2))\n # Subsampling layer S4: 2x2 grid, purely functional,\n # this layer does not have any parameter, and outputs a (N, 16, 5, 5) Tensor\n s4 = F.max_pool2d(c3, 2)\n # Flatten operation: purely functional, outputs a (N, 400) Tensor\n s4 = torch.flatten(s4, 1)\n # Fully connected layer F5: (N, 400) Tensor input,\n # and outputs a (N, 120) Tensor, it uses RELU activation function\n f5 = F.relu(self.fc1(s4))\n # Fully connected layer F6: (N, 120) Tensor input,\n # and outputs a (N, 84) Tensor, it uses RELU activation function\n f6 = F.relu(self.fc2(f5))\n # Gaussian layer OUTPUT: (N, 84) Tensor input, and\n # outputs a (N, 10) Tensor\n output = self.fc3(f6)\n return output\n\n\nnet = Net()\nprint(net)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You just have to define the `forward` function, and the `backward`\nfunction (where gradients are computed) is automatically 
defined for you\nusing `autograd`. You can use any of the Tensor operations in the\n`forward` function.\n\nThe learnable parameters of a model are returned by `net.parameters()`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "params = list(net.parameters())\nprint(len(params))\nprint(params[0].size()) # conv1's .weight" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let\\'s try a random 32x32 input. Note: the expected input size of this net\n(LeNet) is 32x32. To use this net on the MNIST dataset, please resize\nthe images from the dataset to 32x32.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "input = torch.randn(1, 1, 32, 32)\nout = net(input)\nprint(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zero the gradient buffers of all parameters and backpropagate with random\ngradients:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "net.zero_grad()\nout.backward(torch.randn(1, 10))" ] },
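{ "cell_type": "markdown", "metadata": {}, "source": [ "The `net.parameters()` call above returns the tensors themselves; if you also want to see each parameter together with its name and shape, a quick sketch like the one below works. It uses `net.named_parameters()`, another standard `nn.Module` method:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# A small sketch: list every learnable parameter of the network by name and shape.\nfor name, param in net.named_parameters():\n    print(name, tuple(param.size()))" ] },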
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n<div class=\"alert alert-info\"><strong>NOTE:</strong> <code>torch.nn</code> only supports mini-batches. The entire <code>torch.nn</code> package only supports inputs that are a mini-batch of samples, and not a single sample. For example, <code>nn.Conv2d</code> will take in a 4D Tensor of <code>nSamples x nChannels x Height x Width</code>. If you have a single sample, just use <code>input.unsqueeze(0)</code> to add a fake batch dimension.</div>
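\n```\n\nFor instance, if you only have a single 32x32 image, a minimal sketch of adding the fake batch dimension could look like this (`img` here is just an assumed name for a single-image tensor of shape `(1, 32, 32)`):\n\n``` {.python}\nimg = torch.randn(1, 32, 32)   # one 1-channel 32x32 image, no batch dimension\nbatch = img.unsqueeze(0)       # shape becomes (1, 1, 32, 32): a mini-batch of one\nout = net(batch)\n```\n\n```{=html}\n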
\n```\nBefore proceeding further, let\\'s recap all the classes you've seen so\nfar.\n\n**Recap:**\n\n: - `torch.Tensor` - A *multi-dimensional array* with support for\n autograd operations like `backward()`. Also *holds the gradient*\n w.r.t. the tensor.\n - `nn.Module` - Neural network module. *Convenient way of\n encapsulating parameters*, with helpers for moving them to GPU,\n exporting, loading, etc.\n - `nn.Parameter` - A kind of Tensor, that is *automatically\n registered as a parameter when assigned as an attribute to a*\n `Module`.\n - `autograd.Function` - Implements *forward and backward\n definitions of an autograd operation*. Every `Tensor` operation\n creates at least a single `Function` node that connects to\n functions that created a `Tensor` and *encodes its history*.\n\n**At this point, we covered:**\n\n: - Defining a neural network\n - Processing inputs and calling backward\n\n**Still Left:**\n\n: - Computing the loss\n - Updating the weights of the network\n\nLoss Function\n=============\n\nA loss function takes the (output, target) pair of inputs, and computes\na value that estimates how far away the output is from the target.\n\nThere are several different [loss\nfunctions](https://pytorch.org/docs/nn.html#loss-functions) under the nn\npackage . A simple loss is: `nn.MSELoss` which computes the mean-squared\nerror between the output and the target.\n\nFor example:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "output = net(input)\ntarget = torch.randn(10) # a dummy target, for example\ntarget = target.view(1, -1) # make it the same shape as output\ncriterion = nn.MSELoss()\n\nloss = criterion(output, target)\nprint(loss)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, if you follow `loss` in the backward direction, using its\n`.grad_fn` attribute, you will see a graph of computations that looks\nlike this:\n\n``` {.sh}\ninput -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n -> flatten -> linear -> relu -> linear -> relu -> linear\n -> MSELoss\n -> loss\n```\n\nSo, when we call `loss.backward()`, the whole graph is differentiated\nw.r.t. 
the neural net parameters, and all Tensors in the graph that have\n`requires_grad=True` will have their `.grad` Tensor accumulated with the\ngradient.\n\nFor illustration, let us follow a few steps backward:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(loss.grad_fn) # MSELoss\nprint(loss.grad_fn.next_functions[0][0]) # Linear\nprint(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Backprop\n========\n\nTo backpropagate the error all we have to do is to `loss.backward()`.\nYou need to clear the existing gradients though, else gradients will be\naccumulated to existing gradients.\n\nNow we shall call `loss.backward()`, and have a look at conv1\\'s bias\ngradients before and after the backward.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "net.zero_grad() # zeroes the gradient buffers of all parameters\n\nprint('conv1.bias.grad before backward')\nprint(net.conv1.bias.grad)\n\nloss.backward()\n\nprint('conv1.bias.grad after backward')\nprint(net.conv1.bias.grad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we have seen how to use loss functions.\n\n**Read Later:**\n\n> The neural network package contains various modules and loss functions\n> that form the building blocks of deep neural networks. A full list\n> with documentation is [here](https://pytorch.org/docs/nn).\n\n**The only thing left to learn is:**\n\n> - Updating the weights of the network\n\nUpdate the weights\n==================\n\nThe simplest update rule used in practice is the Stochastic Gradient\nDescent (SGD):\n\n``` {.python}\nweight = weight - learning_rate * gradient\n```\n\nWe can implement this using simple Python code:\n\n``` {.python}\nlearning_rate = 0.01\nfor f in net.parameters():\n f.data.sub_(f.grad.data * learning_rate)\n```\n\nHowever, as you use neural networks, you want to use various different\nupdate rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable\nthis, we built a small package: `torch.optim` that implements all these\nmethods. Using it is very simple:\n\n``` {.python}\nimport torch.optim as optim\n\n# create your optimizer\noptimizer = optim.SGD(net.parameters(), lr=0.01)\n\n# in your training loop:\noptimizer.zero_grad() # zero the gradient buffers\noutput = net(input)\nloss = criterion(output, target)\nloss.backward()\noptimizer.step() # Does the update\n```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n
<div class=\"alert alert-info\"><strong>NOTE:</strong> Observe how gradient buffers had to be manually set to zero using <code>optimizer.zero_grad()</code>. This is because gradients are accumulated as explained in the Backprop section.</div>\n```\n" ] }
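, { "cell_type": "markdown", "metadata": {}, "source": [ "To tie everything together, here is a minimal, purely illustrative sketch of the full training step repeated a few times on *random* inputs and targets. It only shows the loop structure (zero the gradients, forward pass, loss, backward pass, optimizer step); a real training loop would iterate over an actual dataset instead of `torch.randn`, and the learning rate and number of steps below are arbitrary choices:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch.optim as optim\n\n# Illustrative only: random inputs and targets stand in for a real dataset.\noptimizer = optim.SGD(net.parameters(), lr=0.01)\ncriterion = nn.MSELoss()\n\nfor step in range(3):\n    input = torch.randn(1, 1, 32, 32)   # fake mini-batch with one 32x32 image\n    target = torch.randn(1, 10)         # fake target with the same shape as the output\n    optimizer.zero_grad()               # clear gradients accumulated from the previous step\n    output = net(input)\n    loss = criterion(output, target)\n    loss.backward()\n    optimizer.step()\n    print(step, loss.item())" ] }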
], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }