{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neural Networks\n===============\n\nNeural networks can be constructed using the `torch.nn` package.\n\nNow that you had a glimpse of `autograd`, `nn` depends on `autograd` to\ndefine models and differentiate them. An `nn.Module` contains layers,\nand a method `forward(input)` that returns the `output`.\n\nFor example, look at this network that classifies digit images:\n\n\n\nIt is a simple feed-forward network. It takes the input, feeds it\nthrough several layers one after the other, and then finally gives the\noutput.\n\nA typical training procedure for a neural network is as follows:\n\n- Define the neural network that has some learnable parameters (or\n weights)\n- Iterate over a dataset of inputs\n- Process input through the network\n- Compute the loss (how far is the output from being correct)\n- Propagate gradients back into the network's parameters\n- Update the weights of the network, typically using a simple update\n rule: `weight = weight - learning_rate * gradient`\n\nDefine the network\n------------------\n\nLet's define this network:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass Net(nn.Module):\n\n def __init__(self):\n super(Net, self).__init__()\n # 1 input image channel, 6 output channels, 5x5 square convolution\n # kernel\n self.conv1 = nn.Conv2d(1, 6, 5)\n self.conv2 = nn.Conv2d(6, 16, 5)\n # an affine operation: y = Wx + b\n self.fc1 = nn.Linear(16 * 5 * 5, 120) # 5*5 from image dimension \n self.fc2 = nn.Linear(120, 84)\n self.fc3 = nn.Linear(84, 10)\n\n def forward(self, input):\n # Convolution layer C1: 1 input image channel, 6 output channels,\n # 5x5 square convolution, it uses RELU activation function, and\n # outputs a Tensor with size (N, 6, 28, 28), where N is the size of the batch\n c1 = F.relu(self.conv1(input))\n # Subsampling layer S2: 2x2 grid, purely functional,\n # this layer does not have any parameter, and outputs a (N, 6, 14, 14) Tensor\n s2 = F.max_pool2d(c1, (2, 2))\n # Convolution layer C3: 6 input channels, 16 output channels,\n # 5x5 square convolution, it uses RELU activation function, and\n # outputs a (N, 16, 10, 10) Tensor\n c3 = F.relu(self.conv2(s2))\n # Subsampling layer S4: 2x2 grid, purely functional,\n # this layer does not have any parameter, and outputs a (N, 16, 5, 5) Tensor\n s4 = F.max_pool2d(c3, 2)\n # Flatten operation: purely functional, outputs a (N, 400) Tensor\n s4 = torch.flatten(s4, 1)\n # Fully connected layer F5: (N, 400) Tensor input,\n # and outputs a (N, 120) Tensor, it uses RELU activation function\n f5 = F.relu(self.fc1(s4))\n # Fully connected layer F6: (N, 120) Tensor input,\n # and outputs a (N, 84) Tensor, it uses RELU activation function\n f6 = F.relu(self.fc2(f5))\n # Gaussian layer OUTPUT: (N, 84) Tensor input, and\n # outputs a (N, 10) Tensor\n output = self.fc3(f6)\n return output\n\n\nnet = Net()\nprint(net)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You just have to define the `forward` function, and the `backward`\nfunction (where gradients are computed) is automatically defined for you\nusing `autograd`. 
You can use any of the Tensor operations in the\n`forward` function.\n\nThe learnable parameters of a model are returned by `net.parameters()`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "params = list(net.parameters())\nprint(len(params))\nprint(params[0].size())  # conv1's .weight" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try a random 32x32 input. Note: the expected input size of this net\n(LeNet) is 32x32. To use this net on the MNIST dataset, please resize the\nimages from the dataset to 32x32 (a sketch of one way to do this appears\nbelow).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "input = torch.randn(1, 1, 32, 32)\nout = net(input)\nprint(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now zero the gradient buffers of all parameters and backpropagate with\nrandom gradients:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "net.zero_grad()\nout.backward(torch.randn(1, 10))" ] }
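, { "cell_type": "markdown", "metadata": {}, "source": [ "As noted above, this net expects 32x32 inputs while MNIST images are 28x28.\nThe next cell is a minimal sketch (not part of the original code) of one way\nto resize the dataset with `torchvision` transforms; it assumes `torchvision`\nis installed and uses a hypothetical `./data` download directory.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Sketch only: assumes torchvision is available; downloads MNIST to ./data\nfrom torchvision import datasets, transforms\n\ntransform = transforms.Compose([\n    transforms.Resize((32, 32)),  # 28x28 MNIST digits -> 32x32\n    transforms.ToTensor(),        # PIL image -> (1, 32, 32) float tensor\n])\nmnist = datasets.MNIST(root=\"./data\", train=True,\n                       download=True, transform=transform)\nimg, label = mnist[0]\nprint(img.size())  # torch.Size([1, 32, 32])" ] }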
, { "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n<div class=\"alert alert-info\"><h4>Note</h4>\n```\n\n`torch.nn` only supports mini-batches. The entire `torch.nn` package only\nsupports inputs that are a mini-batch of samples, and not a single sample.\n\nFor example, `nn.Conv2d` will take in a 4D Tensor of\n`nSamples x nChannels x Height x Width`.\n\nIf you have a single sample, just use `input.unsqueeze(0)` to add a fake\nbatch dimension.\n\n```{=html}\n</div>\n```\n" ] }
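, { "cell_type": "markdown", "metadata": {}, "source": [ "For instance, a minimal illustrative sketch of feeding one 32x32 sample to `net`:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "single = torch.randn(1, 32, 32)  # one sample: nChannels x Height x Width\nbatched = single.unsqueeze(0)    # add a fake batch dimension -> (1, 1, 32, 32)\nprint(net(batched).size())       # torch.Size([1, 10])" ] }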
, { "cell_type": "markdown", "metadata": {}, "source": [ "Observe how the gradient buffers had to be manually set to zero with\n`net.zero_grad()` above (an optimizer's `optimizer.zero_grad()` serves the\nsame purpose). This is because gradients are accumulated, as explained in\nthe Backprop section.
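\n\nTo make this concrete, here is a minimal sketch (reusing the `net` and\n`input` defined above; the learning rate is an arbitrary illustrative value)\nshowing gradients accumulating over two backward passes, a manual application\nof the `weight = weight - learning_rate * gradient` rule from the training\nprocedure above, and the buffers being cleared:\n\n```python\n# Sketch only: why the gradient buffers must be cleared between passes\nnet.zero_grad()\nout = net(input)                             # input comes from the earlier cell\nout.backward(torch.randn(1, 10), retain_graph=True)\ng1 = net.conv1.bias.grad.clone()             # gradient after one backward pass\nout.backward(torch.randn(1, 10))             # a second pass adds to the same buffers\nprint(torch.equal(net.conv1.bias.grad, g1))  # almost surely False: gradients accumulated\n\n# The simple update rule from the training procedure, applied manually\nlearning_rate = 0.01\nwith torch.no_grad():\n    for p in net.parameters():\n        p -= learning_rate * p.grad          # weight = weight - learning_rate * gradient\n\nnet.zero_grad()                              # clear the buffers before the next pass\n```\n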