{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neural Networks\n===============\n\nNeural networks can be constructed using the `torch.nn` package.\n\nNow that you had a glimpse of `autograd`, `nn` depends on `autograd` to\ndefine models and differentiate them. An `nn.Module` contains layers,\nand a method `forward(input)` that returns the `output`.\n\nFor example, look at this network that classifies digit images:\n\n![convnet](https://pytorch.org/tutorials/_static/img/mnist.png)\n\nIt is a simple feed-forward network. It takes the input, feeds it\nthrough several layers one after the other, and then finally gives the\noutput.\n\nA typical training procedure for a neural network is as follows:\n\n- Define the neural network that has some learnable parameters (or\n weights)\n- Iterate over a dataset of inputs\n- Process input through the network\n- Compute the loss (how far is the output from being correct)\n- Propagate gradients back into the network's parameters\n- Update the weights of the network, typically using a simple update\n rule: `weight = weight - learning_rate * gradient`\n\nDefine the network\n------------------\n\nLet's define this network:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass Net(nn.Module):\n\n def __init__(self):\n super(Net, self).__init__()\n # 1 input image channel, 6 output channels, 5x5 square convolution\n # kernel\n self.conv1 = nn.Conv2d(1, 6, 5)\n self.conv2 = nn.Conv2d(6, 16, 5)\n # an affine operation: y = Wx + b\n self.fc1 = nn.Linear(16 * 5 * 5, 120) # 5*5 from image dimension \n self.fc2 = nn.Linear(120, 84)\n self.fc3 = nn.Linear(84, 10)\n\n def forward(self, input):\n # Convolution layer C1: 1 input image channel, 6 output channels,\n # 5x5 square convolution, it uses RELU activation function, and\n # outputs a Tensor with size (N, 6, 28, 28), where N is the size of the batch\n c1 = F.relu(self.conv1(input))\n # Subsampling layer S2: 2x2 grid, purely functional,\n # this layer does not have any parameter, and outputs a (N, 6, 14, 14) Tensor\n s2 = F.max_pool2d(c1, (2, 2))\n # Convolution layer C3: 6 input channels, 16 output channels,\n # 5x5 square convolution, it uses RELU activation function, and\n # outputs a (N, 16, 10, 10) Tensor\n c3 = F.relu(self.conv2(s2))\n # Subsampling layer S4: 2x2 grid, purely functional,\n # this layer does not have any parameter, and outputs a (N, 16, 5, 5) Tensor\n s4 = F.max_pool2d(c3, 2)\n # Flatten operation: purely functional, outputs a (N, 400) Tensor\n s4 = torch.flatten(s4, 1)\n # Fully connected layer F5: (N, 400) Tensor input,\n # and outputs a (N, 120) Tensor, it uses RELU activation function\n f5 = F.relu(self.fc1(s4))\n # Fully connected layer F6: (N, 120) Tensor input,\n # and outputs a (N, 84) Tensor, it uses RELU activation function\n f6 = F.relu(self.fc2(f5))\n # Gaussian layer OUTPUT: (N, 84) Tensor input, and\n # outputs a (N, 10) Tensor\n output = self.fc3(f6)\n return output\n\n\nnet = Net()\nprint(net)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You just have to define the `forward` function, and the `backward`\nfunction (where gradients are computed) is automatically 
defined for you\nusing `autograd`. You can use any of the Tensor operations in the\n`forward` function.\n\nThe learnable parameters of a model are returned by `net.parameters()`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "params = list(net.parameters())\nprint(len(params))\nprint(params[0].size()) # conv1's .weight" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let\\'s try a random 32x32 input. Note: the expected input size of this net\n(LeNet) is 32x32. To use this net on the MNIST dataset, please resize\nthe images from the dataset to 32x32.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "input = torch.randn(1, 1, 32, 32)\nout = net(input)\nprint(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zero the gradient buffers of all parameters and backpropagate with random\ngradients:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "net.zero_grad()\nout.backward(torch.randn(1, 10))" ] },
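{ "cell_type": "markdown", "metadata": {}, "source": [ "The `net.parameters()` call above returns the tensors themselves; if you also want to see each parameter together with its name and shape, a quick sketch like the one below works. It uses `net.named_parameters()`, another standard `nn.Module` method:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# A small sketch: list every learnable parameter of the network by name and shape.\nfor name, param in net.named_parameters():\n    print(name, tuple(param.size()))" ] },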
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n<div class=\"alert alert-info\"><strong>NOTE:</strong> <code>torch.nn</code> only supports mini-batches. The entire <code>torch.nn</code> package only supports inputs that are a mini-batch of samples, and not a single sample. For example, <code>nn.Conv2d</code> will take in a 4D Tensor of <code>nSamples x nChannels x Height x Width</code>. If you have a single sample, just use <code>input.unsqueeze(0)</code> to add a fake batch dimension.</div>
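\n```\n\nFor instance, if you only have a single 32x32 image, a minimal sketch of adding the fake batch dimension could look like this (`img` here is just an assumed name for a single-image tensor of shape `(1, 32, 32)`):\n\n``` {.python}\nimg = torch.randn(1, 32, 32)   # one 1-channel 32x32 image, no batch dimension\nbatch = img.unsqueeze(0)       # shape becomes (1, 1, 32, 32): a mini-batch of one\nout = net(batch)\n```\n\n```{=html}\n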
\n```\nBefore proceeding further, let\\'s recap all the classes you've seen so\nfar.\n\n**Recap:**\n\n: - `torch.Tensor` - A *multi-dimensional array* with support for\n autograd operations like `backward()`. Also *holds the gradient*\n w.r.t. the tensor.\n - `nn.Module` - Neural network module. *Convenient way of\n encapsulating parameters*, with helpers for moving them to GPU,\n exporting, loading, etc.\n - `nn.Parameter` - A kind of Tensor, that is *automatically\n registered as a parameter when assigned as an attribute to a*\n `Module`.\n - `autograd.Function` - Implements *forward and backward\n definitions of an autograd operation*. Every `Tensor` operation\n creates at least a single `Function` node that connects to\n functions that created a `Tensor` and *encodes its history*.\n\n**At this point, we covered:**\n\n: - Defining a neural network\n - Processing inputs and calling backward\n\n**Still Left:**\n\n: - Computing the loss\n - Updating the weights of the network\n\nLoss Function\n=============\n\nA loss function takes the (output, target) pair of inputs, and computes\na value that estimates how far away the output is from the target.\n\nThere are several different [loss\nfunctions](https://pytorch.org/docs/nn.html#loss-functions) under the nn\npackage . A simple loss is: `nn.MSELoss` which computes the mean-squared\nerror between the output and the target.\n\nFor example:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "output = net(input)\ntarget = torch.randn(10) # a dummy target, for example\ntarget = target.view(1, -1) # make it the same shape as output\ncriterion = nn.MSELoss()\n\nloss = criterion(output, target)\nprint(loss)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, if you follow `loss` in the backward direction, using its\n`.grad_fn` attribute, you will see a graph of computations that looks\nlike this:\n\n``` {.sh}\ninput -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n -> flatten -> linear -> relu -> linear -> relu -> linear\n -> MSELoss\n -> loss\n```\n\nSo, when we call `loss.backward()`, the whole graph is differentiated\nw.r.t. 
the neural net parameters, and all Tensors in the graph that have\n`requires_grad=True` will have their `.grad` Tensor accumulated with the\ngradient.\n\nFor illustration, let us follow a few steps backward:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(loss.grad_fn) # MSELoss\nprint(loss.grad_fn.next_functions[0][0]) # Linear\nprint(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Backprop\n========\n\nTo backpropagate the error all we have to do is to `loss.backward()`.\nYou need to clear the existing gradients though, else gradients will be\naccumulated to existing gradients.\n\nNow we shall call `loss.backward()`, and have a look at conv1\\'s bias\ngradients before and after the backward.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "net.zero_grad() # zeroes the gradient buffers of all parameters\n\nprint('conv1.bias.grad before backward')\nprint(net.conv1.bias.grad)\n\nloss.backward()\n\nprint('conv1.bias.grad after backward')\nprint(net.conv1.bias.grad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we have seen how to use loss functions.\n\n**Read Later:**\n\n> The neural network package contains various modules and loss functions\n> that form the building blocks of deep neural networks. A full list\n> with documentation is [here](https://pytorch.org/docs/nn).\n\n**The only thing left to learn is:**\n\n> - Updating the weights of the network\n\nUpdate the weights\n==================\n\nThe simplest update rule used in practice is the Stochastic Gradient\nDescent (SGD):\n\n``` {.python}\nweight = weight - learning_rate * gradient\n```\n\nWe can implement this using simple Python code:\n\n``` {.python}\nlearning_rate = 0.01\nfor f in net.parameters():\n f.data.sub_(f.grad.data * learning_rate)\n```\n\nHowever, as you use neural networks, you want to use various different\nupdate rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable\nthis, we built a small package: `torch.optim` that implements all these\nmethods. Using it is very simple:\n\n``` {.python}\nimport torch.optim as optim\n\n# create your optimizer\noptimizer = optim.SGD(net.parameters(), lr=0.01)\n\n# in your training loop:\noptimizer.zero_grad() # zero the gradient buffers\noutput = net(input)\nloss = criterion(output, target)\nloss.backward()\noptimizer.step() # Does the update\n```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n
<div class=\"alert alert-info\"><strong>NOTE:</strong> Observe how gradient buffers had to be manually set to zero using <code>optimizer.zero_grad()</code>. This is because gradients are accumulated as explained in the Backprop section.</div>\n```\n" ] }
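, { "cell_type": "markdown", "metadata": {}, "source": [ "To tie everything together, here is a minimal, purely illustrative sketch of the full training step repeated a few times on *random* inputs and targets. It only shows the loop structure (zero the gradients, forward pass, loss, backward pass, optimizer step); a real training loop would iterate over an actual dataset instead of `torch.randn`, and the learning rate and number of steps below are arbitrary choices:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch.optim as optim\n\n# Illustrative only: random inputs and targets stand in for a real dataset.\noptimizer = optim.SGD(net.parameters(), lr=0.01)\ncriterion = nn.MSELoss()\n\nfor step in range(3):\n    input = torch.randn(1, 1, 32, 32)   # fake mini-batch with one 32x32 image\n    target = torch.randn(1, 10)         # fake target with the same shape as the output\n    optimizer.zero_grad()               # clear gradients accumulated from the previous step\n    output = net(input)\n    loss = criterion(output, target)\n    loss.backward()\n    optimizer.step()\n    print(step, loss.item())" ] }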
], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }