{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "What is [torch.nn]{.title-ref} *really*?\n========================================\n\n**Authors:** Jeremy Howard, [fast.ai](https://www.fast.ai). Thanks to\nRachel Thomas and Francisco Ingham.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We recommend running this tutorial as a notebook, not a script. To\ndownload the notebook (`.ipynb`) file, click the link at the top of the\npage.\n\nPyTorch provides the elegantly designed modules and classes\n[torch.nn](https://pytorch.org/docs/stable/nn.html) ,\n[torch.optim](https://pytorch.org/docs/stable/optim.html) ,\n[Dataset](https://pytorch.org/docs/stable/data.html?highlight=dataset#torch.utils.data.Dataset)\n, and\n[DataLoader](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader)\nto help you create and train neural networks. In order to fully utilize\ntheir power and customize them for your problem, you need to really\nunderstand exactly what they\\'re doing. To develop this understanding,\nwe will first train basic neural net on the MNIST data set without using\nany features from these models; we will initially only use the most\nbasic PyTorch tensor functionality. Then, we will incrementally add one\nfeature from `torch.nn`, `torch.optim`, `Dataset`, or `DataLoader` at a\ntime, showing exactly what each piece does, and how it works to make the\ncode either more concise, or more flexible.\n\n**This tutorial assumes you already have PyTorch installed, and are\nfamiliar with the basics of tensor operations.** (If you\\'re familiar\nwith Numpy array operations, you\\'ll find the PyTorch tensor operations\nused here nearly identical).\n\nMNIST data setup\n================\n\nWe will use the classic\n[MNIST](https://yann.lecun.com/exdb/mnist/index.html) dataset, which\nconsists of black-and-white images of hand-drawn digits (between 0 and\n9).\n\nWe will use [pathlib](https://docs.python.org/3/library/pathlib.html)\nfor dealing with paths (part of the Python 3 standard library), and will\ndownload the dataset using\n[requests](http://docs.python-requests.org/en/master/). We will only\nimport modules when we use them, so you can see exactly what\\'s being\nused at each point.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from pathlib import Path\nimport requests\n\nDATA_PATH = Path(\"data\")\nPATH = DATA_PATH / \"mnist\"\n\nPATH.mkdir(parents=True, exist_ok=True)\n\nURL = \"https://github.com/pytorch/tutorials/raw/main/_static/\"\nFILENAME = \"mnist.pkl.gz\"\n\nif not (PATH / FILENAME).exists():\n        content = requests.get(URL + FILENAME).content\n        (PATH / FILENAME).open(\"wb\").write(content)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This dataset is in numpy array format, and has been stored using pickle,\na python-specific format for serializing data.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import pickle\nimport gzip\n\nwith gzip.open((PATH / FILENAME).as_posix(), \"rb\") as f:\n        ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding=\"latin-1\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Each image is 28 x 28, and is being stored as a flattened row of length\n784 (=28x28). Let\\'s take a look at one; we need to reshape it to 2d\nfirst.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from matplotlib import pyplot\nimport numpy as np\n\npyplot.imshow(x_train[0].reshape((28, 28)), cmap=\"gray\")\n# ``pyplot.show()`` only if not on Colab\ntry:\n    import google.colab\nexcept ImportError:\n    pyplot.show()\nprint(x_train.shape)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "PyTorch uses `torch.tensor`, rather than numpy arrays, so we need to\nconvert our data.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import torch\n\nx_train, y_train, x_valid, y_valid = map(\n    torch.tensor, (x_train, y_train, x_valid, y_valid)\n)\nn, c = x_train.shape\nprint(x_train, y_train)\nprint(x_train.shape)\nprint(y_train.min(), y_train.max())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Neural net from scratch (without `torch.nn`)\n============================================\n\nLet\\'s first create a model using nothing but PyTorch tensor operations.\nWe\\'re assuming you\\'re already familiar with the basics of neural\nnetworks. (If you\\'re not, you can learn them at\n[course.fast.ai](https://course.fast.ai)).\n\nPyTorch provides methods to create random or zero-filled tensors, which\nwe will use to create our weights and bias for a simple linear model.\nThese are just regular tensors, with one very special addition: we tell\nPyTorch that they require a gradient. This causes PyTorch to record all\nof the operations done on the tensor, so that it can calculate the\ngradient during back-propagation *automatically*!\n\nFor the weights, we set `requires_grad` **after** the initialization,\nsince we don\\'t want that step included in the gradient. (Note that a\ntrailing `_` in PyTorch signifies that the operation is performed\nin-place.)\n\n```{=html}\n<div style=\"background-color: #54c7ec; color: #fff; font-weight: 700; padding-left: 10px; padding-top: 5px; padding-bottom: 5px\"><strong>NOTE:</strong></div>\n```\n```{=html}\n<div style=\"background-color: #f3f4f7; padding-left: 10px; padding-top: 10px; padding-bottom: 10px; padding-right: 10px\">\n```\n```{=html}\n<p>We are initializing the weights here with<a href=\"http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf\">Xavier initialisation</a>(by multiplying with <code>1/sqrt(n)</code>).</p>\n```\n```{=html}\n</div>\n```\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import math\n\nweights = torch.randn(784, 10) / math.sqrt(784)\nweights.requires_grad_()\nbias = torch.zeros(10, requires_grad=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Thanks to PyTorch\\'s ability to calculate gradients automatically, we\ncan use any standard Python function (or callable object) as a model! So\nlet\\'s just write a plain matrix multiplication and broadcasted addition\nto create a simple linear model. We also need an activation function, so\nwe\\'ll write [log\\_softmax]{.title-ref} and use it. Remember: although\nPyTorch provides lots of prewritten loss functions, activation\nfunctions, and so forth, you can easily write your own using plain\npython. PyTorch will even create fast accelerator or vectorized CPU code\nfor your function automatically.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def log_softmax(x):\n    return x - x.exp().sum(-1).log().unsqueeze(-1)\n\ndef model(xb):\n    return log_softmax(xb @ weights + bias)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In the above, the `@` stands for the matrix multiplication operation. We\nwill call our function on one batch of data (in this case, 64 images).\nThis is one *forward pass*. Note that our predictions won\\'t be any\nbetter than random at this stage, since we start with random weights.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "bs = 64  # batch size\n\nxb = x_train[0:bs]  # a mini-batch from x\npreds = model(xb)  # predictions\npreds[0], preds.shape\nprint(preds[0], preds.shape)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As you see, the `preds` tensor contains not only the tensor values, but\nalso a gradient function. We\\'ll use this later to do backprop.\n\nLet\\'s implement negative log-likelihood to use as the loss function\n(again, we can just use standard Python):\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def nll(input, target):\n    return -input[range(target.shape[0]), target].mean()\n\nloss_func = nll"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let\\'s check our loss with our random model, so we can see if we improve\nafter a backprop pass later.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "yb = y_train[0:bs]\nprint(loss_func(preds, yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let\\'s also implement a function to calculate the accuracy of our model.\nFor each prediction, if the index with the largest value matches the\ntarget value, then the prediction was correct.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def accuracy(out, yb):\n    preds = torch.argmax(out, dim=1)\n    return (preds == yb).float().mean()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let\\'s check the accuracy of our random model, so we can see if our\naccuracy improves as our loss improves.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(accuracy(preds, yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can now run a training loop. For each iteration, we will:\n\n-   select a mini-batch of data (of size `bs`)\n-   use the model to make predictions\n-   calculate the loss\n-   `loss.backward()` updates the gradients of the model, in this case,\n    `weights` and `bias`.\n\nWe now use these gradients to update the weights and bias. We do this\nwithin the `torch.no_grad()` context manager, because we do not want\nthese actions to be recorded for our next calculation of the gradient.\nYou can read more about how PyTorch\\'s Autograd records operations\n[here](https://pytorch.org/docs/stable/notes/autograd.html).\n\nWe then set the gradients to zero, so that we are ready for the next\nloop. Otherwise, our gradients would record a running tally of all the\noperations that had happened (i.e. `loss.backward()` *adds* the\ngradients to whatever is already stored, rather than replacing them).\n\n```{=html}\n<div style=\"background-color: #6bcebb; color: #fff; font-weight: 700; padding-left: 10px; padding-top: 5px; padding-bottom: 5px\"><strong>TIP:</strong></div>\n```\n```{=html}\n<div style=\"background-color: #f3f4f7; padding-left: 10px; padding-top: 10px; padding-bottom: 10px; padding-right: 10px\">\n```\n```{=html}\n<p>You can use the standard python debugger to step through PyTorchcode, allowing you to check the various variable values at each step.Uncomment <code>set_trace()</code> below to try it out.</p>\n```\n```{=html}\n</div>\n```\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from IPython.core.debugger import set_trace\n\nlr = 0.5  # learning rate\nepochs = 2  # how many epochs to train for\n\nfor epoch in range(epochs):\n    for i in range((n - 1) // bs + 1):\n        #         set_trace()\n        start_i = i * bs\n        end_i = start_i + bs\n        xb = x_train[start_i:end_i]\n        yb = y_train[start_i:end_i]\n        pred = model(xb)\n        loss = loss_func(pred, yb)\n\n        loss.backward()\n        with torch.no_grad():\n            weights -= weights.grad * lr\n            bias -= bias.grad * lr\n            weights.grad.zero_()\n            bias.grad.zero_()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "That\\'s it: we\\'ve created and trained a minimal neural network (in this\ncase, a logistic regression, since we have no hidden layers) entirely\nfrom scratch!\n\nLet\\'s check the loss and accuracy and compare those to what we got\nearlier. We expect that the loss will have decreased and accuracy to\nhave increased, and they have.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(loss_func(model(xb), yb), accuracy(model(xb), yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Using `torch.nn.functional`\n===========================\n\nWe will now refactor our code, so that it does the same thing as before,\nonly we\\'ll start taking advantage of PyTorch\\'s `nn` classes to make it\nmore concise and flexible. At each step from here, we should be making\nour code one or more of: shorter, more understandable, and/or more\nflexible.\n\nThe first and easiest step is to make our code shorter by replacing our\nhand-written activation and loss functions with those from\n`torch.nn.functional` (which is generally imported into the namespace\n`F` by convention). This module contains all the functions in the\n`torch.nn` library (whereas other parts of the library contain classes).\nAs well as a wide range of loss and activation functions, you\\'ll also\nfind here some convenient functions for creating neural nets, such as\npooling functions. (There are also functions for doing convolutions,\nlinear layers, etc, but as we\\'ll see, these are usually better handled\nusing other parts of the library.)\n\nIf you\\'re using negative log likelihood loss and log softmax\nactivation, then Pytorch provides a single function `F.cross_entropy`\nthat combines the two. So we can even remove the activation function\nfrom our model.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import torch.nn.functional as F\n\nloss_func = F.cross_entropy\n\ndef model(xb):\n    return xb @ weights + bias"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Note that we no longer call `log_softmax` in the `model` function.\nLet\\'s confirm that our loss and accuracy are the same as before:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(loss_func(model(xb), yb), accuracy(model(xb), yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Refactor using `nn.Module`\n==========================\n\nNext up, we\\'ll use `nn.Module` and `nn.Parameter`, for a clearer and\nmore concise training loop. We subclass `nn.Module` (which itself is a\nclass and able to keep track of state). In this case, we want to create\na class that holds our weights, bias, and method for the forward step.\n`nn.Module` has a number of attributes and methods (such as\n`.parameters()` and `.zero_grad()`) which we will be using.\n\n```{=html}\n<div style=\"background-color: #54c7ec; color: #fff; font-weight: 700; padding-left: 10px; padding-top: 5px; padding-bottom: 5px\"><strong>NOTE:</strong></div>\n```\n```{=html}\n<div style=\"background-color: #f3f4f7; padding-left: 10px; padding-top: 10px; padding-bottom: 10px; padding-right: 10px\">\n```\n```{=html}\n<p><code>nn.Module</code> (uppercase M) is a PyTorch specific concept, and is aclass we'll be using a lot. <code>nn.Module</code> is not to be confused with the Pythonconcept of a (lowercase <code>m</code>) <a href=\"https://docs.python.org/3/tutorial/modules.html\">module</a>,which is a file of Python code that can be imported.</p>\n```\n```{=html}\n</div>\n```\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from torch import nn\n\nclass Mnist_Logistic(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))\n        self.bias = nn.Parameter(torch.zeros(10))\n\n    def forward(self, xb):\n        return xb @ self.weights + self.bias"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Since we\\'re now using an object instead of just using a function, we\nfirst have to instantiate our model:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model = Mnist_Logistic()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we can calculate the loss in the same way as before. Note that\n`nn.Module` objects are used as if they are functions (i.e they are\n*callable*), but behind the scenes Pytorch will call our `forward`\nmethod automatically.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(loss_func(model(xb), yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Previously for our training loop we had to update the values for each\nparameter by name, and manually zero out the grads for each parameter\nseparately, like this:\n\n``` {.python}\nwith torch.no_grad():\n    weights -= weights.grad * lr\n    bias -= bias.grad * lr\n    weights.grad.zero_()\n    bias.grad.zero_()\n```\n\nNow we can take advantage of model.parameters() and model.zero\\_grad()\n(which are both defined by PyTorch for `nn.Module`) to make those steps\nmore concise and less prone to the error of forgetting some of our\nparameters, particularly if we had a more complicated model:\n\n``` {.python}\nwith torch.no_grad():\n    for p in model.parameters(): p -= p.grad * lr\n    model.zero_grad()\n```\n\nWe\\'ll wrap our little training loop in a `fit` function so we can run\nit again later.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def fit():\n    for epoch in range(epochs):\n        for i in range((n - 1) // bs + 1):\n            start_i = i * bs\n            end_i = start_i + bs\n            xb = x_train[start_i:end_i]\n            yb = y_train[start_i:end_i]\n            pred = model(xb)\n            loss = loss_func(pred, yb)\n\n            loss.backward()\n            with torch.no_grad():\n                for p in model.parameters():\n                    p -= p.grad * lr\n                model.zero_grad()\n\nfit()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let\\'s double-check that our loss has gone down:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(loss_func(model(xb), yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Refactor using `nn.Linear`\n==========================\n\nWe continue to refactor our code. Instead of manually defining and\ninitializing `self.weights` and `self.bias`, and calculating\n`xb  @ self.weights + self.bias`, we will instead use the Pytorch class\n[nn.Linear](https://pytorch.org/docs/stable/nn.html#linear-layers) for a\nlinear layer, which does all that for us. Pytorch has many types of\npredefined layers that can greatly simplify our code, and often makes it\nfaster too.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class Mnist_Logistic(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.lin = nn.Linear(784, 10)\n\n    def forward(self, xb):\n        return self.lin(xb)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We instantiate our model and calculate the loss in the same way as\nbefore:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model = Mnist_Logistic()\nprint(loss_func(model(xb), yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We are still able to use our same `fit` method as before.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fit()\n\nprint(loss_func(model(xb), yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Refactor using `torch.optim`\n============================\n\nPytorch also has a package with various optimization algorithms,\n`torch.optim`. We can use the `step` method from our optimizer to take a\nforward step, instead of manually updating each parameter.\n\nThis will let us replace our previous manually coded optimization step:\n\n``` {.python}\nwith torch.no_grad():\n    for p in model.parameters(): p -= p.grad * lr\n    model.zero_grad()\n```\n\nand instead use just:\n\n``` {.python}\nopt.step()\nopt.zero_grad()\n```\n\n(`optim.zero_grad()` resets the gradient to 0 and we need to call it\nbefore computing the gradient for the next minibatch.)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from torch import optim"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We\\'ll define a little function to create our model and optimizer so we\ncan reuse it in the future.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def get_model():\n    model = Mnist_Logistic()\n    return model, optim.SGD(model.parameters(), lr=lr)\n\nmodel, opt = get_model()\nprint(loss_func(model(xb), yb))\n\nfor epoch in range(epochs):\n    for i in range((n - 1) // bs + 1):\n        start_i = i * bs\n        end_i = start_i + bs\n        xb = x_train[start_i:end_i]\n        yb = y_train[start_i:end_i]\n        pred = model(xb)\n        loss = loss_func(pred, yb)\n\n        loss.backward()\n        opt.step()\n        opt.zero_grad()\n\nprint(loss_func(model(xb), yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Refactor using Dataset\n======================\n\nPyTorch has an abstract Dataset class. A Dataset can be anything that\nhas a `__len__` function (called by Python\\'s standard `len` function)\nand a `__getitem__` function as a way of indexing into it. [This\ntutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html)\nwalks through a nice example of creating a custom\n`FacialLandmarkDataset` class as a subclass of `Dataset`.\n\nPyTorch\\'s\n[TensorDataset](https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html#TensorDataset)\nis a Dataset wrapping tensors. By defining a length and way of indexing,\nthis also gives us a way to iterate, index, and slice along the first\ndimension of a tensor. This will make it easier to access both the\nindependent and dependent variables in the same line as we train.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from torch.utils.data import TensorDataset"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Both `x_train` and `y_train` can be combined in a single\n`TensorDataset`, which will be easier to iterate over and slice.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "train_ds = TensorDataset(x_train, y_train)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Previously, we had to iterate through minibatches of `x` and `y` values\nseparately:\n\n``` {.python}\nxb = x_train[start_i:end_i]\nyb = y_train[start_i:end_i]\n```\n\nNow, we can do these two steps together:\n\n``` {.python}\nxb,yb = train_ds[i*bs : i*bs+bs]\n```\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model, opt = get_model()\n\nfor epoch in range(epochs):\n    for i in range((n - 1) // bs + 1):\n        xb, yb = train_ds[i * bs: i * bs + bs]\n        pred = model(xb)\n        loss = loss_func(pred, yb)\n\n        loss.backward()\n        opt.step()\n        opt.zero_grad()\n\nprint(loss_func(model(xb), yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Refactor using `DataLoader`\n===========================\n\nPyTorch\\'s `DataLoader` is responsible for managing batches. You can\ncreate a `DataLoader` from any `Dataset`. `DataLoader` makes it easier\nto iterate over batches. Rather than having to use\n`train_ds[i*bs : i*bs+bs]`, the `DataLoader` gives us each minibatch\nautomatically.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from torch.utils.data import DataLoader\n\ntrain_ds = TensorDataset(x_train, y_train)\ntrain_dl = DataLoader(train_ds, batch_size=bs)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Previously, our loop iterated over batches `(xb, yb)` like this:\n\n``` {.python}\nfor i in range((n-1)//bs + 1):\n    xb,yb = train_ds[i*bs : i*bs+bs]\n    pred = model(xb)\n```\n\nNow, our loop is much cleaner, as `(xb, yb)` are loaded automatically\nfrom the data loader:\n\n``` {.python}\nfor xb,yb in train_dl:\n    pred = model(xb)\n```\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model, opt = get_model()\n\nfor epoch in range(epochs):\n    for xb, yb in train_dl:\n        pred = model(xb)\n        loss = loss_func(pred, yb)\n\n        loss.backward()\n        opt.step()\n        opt.zero_grad()\n\nprint(loss_func(model(xb), yb))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Thanks to PyTorch\\'s `nn.Module`, `nn.Parameter`, `Dataset`, and\n`DataLoader`, our training loop is now dramatically smaller and easier\nto understand. Let\\'s now try to add the basic features necessary to\ncreate effective models in practice.\n\nAdd validation\n==============\n\nIn section 1, we were just trying to get a reasonable training loop set\nup for use on our training data. In reality, you **always** should also\nhave a [validation\nset](https://www.fast.ai/2017/11/13/validation-sets/), in order to\nidentify if you are overfitting.\n\nShuffling the training data is\n[important](https://www.quora.com/Does-the-order-of-training-data-matter-when-training-neural-networks)\nto prevent correlation between batches and overfitting. On the other\nhand, the validation loss will be identical whether we shuffle the\nvalidation set or not. Since shuffling takes extra time, it makes no\nsense to shuffle the validation data.\n\nWe\\'ll use a batch size for the validation set that is twice as large as\nthat for the training set. This is because the validation set does not\nneed backpropagation and thus takes less memory (it doesn\\'t need to\nstore the gradients). We take advantage of this to use a larger batch\nsize and compute the loss more quickly.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "train_ds = TensorDataset(x_train, y_train)\ntrain_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)\n\nvalid_ds = TensorDataset(x_valid, y_valid)\nvalid_dl = DataLoader(valid_ds, batch_size=bs * 2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We will calculate and print the validation loss at the end of each\nepoch.\n\n(Note that we always call `model.train()` before training, and\n`model.eval()` before inference, because these are used by layers such\nas `nn.BatchNorm2d` and `nn.Dropout` to ensure appropriate behavior for\nthese different phases.)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model, opt = get_model()\n\nfor epoch in range(epochs):\n    model.train()\n    for xb, yb in train_dl:\n        pred = model(xb)\n        loss = loss_func(pred, yb)\n\n        loss.backward()\n        opt.step()\n        opt.zero_grad()\n\n    model.eval()\n    with torch.no_grad():\n        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)\n\n    print(epoch, valid_loss / len(valid_dl))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create fit() and get\\_data()\n============================\n\nWe\\'ll now do a little refactoring of our own. Since we go through a\nsimilar process twice of calculating the loss for both the training set\nand the validation set, let\\'s make that into its own function,\n`loss_batch`, which computes the loss for one batch.\n\nWe pass an optimizer in for the training set, and use it to perform\nbackprop. For the validation set, we don\\'t pass an optimizer, so the\nmethod doesn\\'t perform backprop.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def loss_batch(model, loss_func, xb, yb, opt=None):\n    loss = loss_func(model(xb), yb)\n\n    if opt is not None:\n        loss.backward()\n        opt.step()\n        opt.zero_grad()\n\n    return loss.item(), len(xb)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`fit` runs the necessary operations to train our model and compute the\ntraining and validation losses for each epoch.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\ndef fit(epochs, model, loss_func, opt, train_dl, valid_dl):\n    for epoch in range(epochs):\n        model.train()\n        for xb, yb in train_dl:\n            loss_batch(model, loss_func, xb, yb, opt)\n\n        model.eval()\n        with torch.no_grad():\n            losses, nums = zip(\n                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]\n            )\n        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)\n\n        print(epoch, val_loss)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`get_data` returns dataloaders for the training and validation sets.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def get_data(train_ds, valid_ds, bs):\n    return (\n        DataLoader(train_ds, batch_size=bs, shuffle=True),\n        DataLoader(valid_ds, batch_size=bs * 2),\n    )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, our whole process of obtaining the data loaders and fitting the\nmodel can be run in 3 lines of code:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\nmodel, opt = get_model()\nfit(epochs, model, loss_func, opt, train_dl, valid_dl)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can use these basic 3 lines of code to train a wide variety of\nmodels. Let\\'s see if we can use them to train a convolutional neural\nnetwork (CNN)!\n\nSwitch to CNN\n=============\n\nWe are now going to build our neural network with three convolutional\nlayers. Because none of the functions in the previous section assume\nanything about the model form, we\\'ll be able to use them to train a CNN\nwithout any modification.\n\nWe will use PyTorch\\'s predefined\n[Conv2d](https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d) class\nas our convolutional layer. We define a CNN with 3 convolutional layers.\nEach convolution is followed by a ReLU. At the end, we perform an\naverage pooling. (Note that `view` is PyTorch\\'s version of Numpy\\'s\n`reshape`)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class Mnist_CNN(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)\n        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)\n        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)\n\n    def forward(self, xb):\n        xb = xb.view(-1, 1, 28, 28)\n        xb = F.relu(self.conv1(xb))\n        xb = F.relu(self.conv2(xb))\n        xb = F.relu(self.conv3(xb))\n        xb = F.avg_pool2d(xb, 4)\n        return xb.view(-1, xb.size(1))\n\nlr = 0.1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "[Momentum](https://cs231n.github.io/neural-networks-3/#sgd) is a\nvariation on stochastic gradient descent that takes previous updates\ninto account as well and generally leads to faster training.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model = Mnist_CNN()\nopt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n\nfit(epochs, model, loss_func, opt, train_dl, valid_dl)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Using `nn.Sequential`\n=====================\n\n`torch.nn` has another handy class we can use to simplify our code:\n[Sequential](https://pytorch.org/docs/stable/nn.html#torch.nn.Sequential)\n. A `Sequential` object runs each of the modules contained within it, in\na sequential manner. This is a simpler way of writing our neural\nnetwork.\n\nTo take advantage of this, we need to be able to easily define a\n**custom layer** from a given function. For instance, PyTorch doesn\\'t\nhave a [view]{.title-ref} layer, and we need to create one for our\nnetwork. `Lambda` will create a layer that we can then use when defining\na network with `Sequential`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class Lambda(nn.Module):\n    def __init__(self, func):\n        super().__init__()\n        self.func = func\n\n    def forward(self, x):\n        return self.func(x)\n\n\ndef preprocess(x):\n    return x.view(-1, 1, 28, 28)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The model created with `Sequential` is simple:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model = nn.Sequential(\n    Lambda(preprocess),\n    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    nn.AvgPool2d(4),\n    Lambda(lambda x: x.view(x.size(0), -1)),\n)\n\nopt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n\nfit(epochs, model, loss_func, opt, train_dl, valid_dl)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Wrapping `DataLoader`\n=====================\n\nOur CNN is fairly concise, but it only works with MNIST, because:\n\n:   -   It assumes the input is a 28\\*28 long vector\n    -   It assumes that the final CNN grid size is 4\\*4 (since that\\'s\n        the average pooling kernel size we used)\n\nLet\\'s get rid of these two assumptions, so our model works with any 2d\nsingle channel image. First, we can remove the initial Lambda layer by\nmoving the data preprocessing into a generator:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def preprocess(x, y):\n    return x.view(-1, 1, 28, 28), y\n\n\nclass WrappedDataLoader:\n    def __init__(self, dl, func):\n        self.dl = dl\n        self.func = func\n\n    def __len__(self):\n        return len(self.dl)\n\n    def __iter__(self):\n        for b in self.dl:\n            yield (self.func(*b))\n\ntrain_dl, valid_dl = get_data(train_ds, valid_ds, bs)\ntrain_dl = WrappedDataLoader(train_dl, preprocess)\nvalid_dl = WrappedDataLoader(valid_dl, preprocess)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Next, we can replace `nn.AvgPool2d` with `nn.AdaptiveAvgPool2d`, which\nallows us to define the size of the *output* tensor we want, rather than\nthe *input* tensor we have. As a result, our model will work with any\nsize input.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model = nn.Sequential(\n    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n    nn.ReLU(),\n    nn.AdaptiveAvgPool2d(1),\n    Lambda(lambda x: x.view(x.size(0), -1)),\n)\n\nopt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let\\'s try it out:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fit(epochs, model, loss_func, opt, train_dl, valid_dl)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Using your\n[Accelerator](https://pytorch.org/docs/stable/torch.html#accelerators)\n\\-\\-\\-\\-\\-\\-\\-\\-\\-\\-\\-\\-\\-\\--\n\nIf you\\'re lucky enough to have access to an accelerator such as CUDA\n(you can rent one for about \\$0.50/hour from most cloud providers) you\ncan use it to speed up your code. First check that your accelerator is\nworking in Pytorch:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# If the current accelerator is available, we will use it. Otherwise, we use the CPU.\ndevice = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else \"cpu\"\nprint(f\"Using {device} device\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let\\'s update `preprocess` to move batches to the accelerator:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def preprocess(x, y):\n    return x.view(-1, 1, 28, 28).to(device), y.to(device)\n\n\ntrain_dl, valid_dl = get_data(train_ds, valid_ds, bs)\ntrain_dl = WrappedDataLoader(train_dl, preprocess)\nvalid_dl = WrappedDataLoader(valid_dl, preprocess)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we can move our model to the accelerator.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model.to(device)\nopt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You should find it runs faster now:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fit(epochs, model, loss_func, opt, train_dl, valid_dl)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Closing thoughts\n================\n\nWe now have a general data pipeline and training loop which you can use\nfor training many types of models using Pytorch. To see how simple\ntraining a model can now be, take a look at the [mnist\\_sample\nnotebook](https://github.com/fastai/fastai_dev/blob/master/dev_nb/mnist_sample.ipynb).\n\nOf course, there are many things you\\'ll want to add, such as data\naugmentation, hyperparameter tuning, monitoring training, transfer\nlearning, and so forth. These features are available in the fastai\nlibrary, which has been developed using the same design approach shown\nin this tutorial, providing a natural next step for practitioners\nlooking to take their models further.\n\nWe promised at the start of this tutorial we\\'d explain through example\neach of `torch.nn`, `torch.optim`, `Dataset`, and `DataLoader`. So\nlet\\'s summarize what we\\'ve seen:\n\n> -   `torch.nn`:\n>     -   `Module`: creates a callable which behaves like a function,\n>         but can also contain state(such as neural net layer weights).\n>         It knows what `Parameter` (s) it contains and can zero all\n>         their gradients, loop through them for weight updates, etc.\n>     -   `Parameter`: a wrapper for a tensor that tells a `Module` that\n>         it has weights that need updating during backprop. Only\n>         tensors with the [requires\\_grad]{.title-ref} attribute set\n>         are updated\n>     -   `functional`: a module(usually imported into the `F` namespace\n>         by convention) which contains activation functions, loss\n>         functions, etc, as well as non-stateful versions of layers\n>         such as convolutional and linear layers.\n> -   `torch.optim`: Contains optimizers such as `SGD`, which update the\n>     weights of `Parameter` during the backward step\n> -   `Dataset`: An abstract interface of objects with a `__len__` and a\n>     `__getitem__`, including classes provided with Pytorch such as\n>     `TensorDataset`\n> -   `DataLoader`: Takes any `Dataset` and creates an iterator which\n>     returns batches of data.\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}