{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is [torch.nn]{.title-ref} *really*?\n========================================\n\n**Authors:** Jeremy Howard, [fast.ai](https://www.fast.ai). Thanks to\nRachel Thomas and Francisco Ingham.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We recommend running this tutorial as a notebook, not a script. To\ndownload the notebook (`.ipynb`) file, click the link at the top of the\npage.\n\nPyTorch provides the elegantly designed modules and classes\n[torch.nn](https://pytorch.org/docs/stable/nn.html) ,\n[torch.optim](https://pytorch.org/docs/stable/optim.html) ,\n[Dataset](https://pytorch.org/docs/stable/data.html?highlight=dataset#torch.utils.data.Dataset)\n, and\n[DataLoader](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader)\nto help you create and train neural networks. In order to fully utilize\ntheir power and customize them for your problem, you need to really\nunderstand exactly what they\\'re doing. To develop this understanding,\nwe will first train basic neural net on the MNIST data set without using\nany features from these models; we will initially only use the most\nbasic PyTorch tensor functionality. Then, we will incrementally add one\nfeature from `torch.nn`, `torch.optim`, `Dataset`, or `DataLoader` at a\ntime, showing exactly what each piece does, and how it works to make the\ncode either more concise, or more flexible.\n\n**This tutorial assumes you already have PyTorch installed, and are\nfamiliar with the basics of tensor operations.** (If you\\'re familiar\nwith Numpy array operations, you\\'ll find the PyTorch tensor operations\nused here nearly identical).\n\nMNIST data setup\n================\n\nWe will use the classic\n[MNIST](https://yann.lecun.com/exdb/mnist/index.html) dataset, which\nconsists of black-and-white images of hand-drawn digits (between 0 and\n9).\n\nWe will use [pathlib](https://docs.python.org/3/library/pathlib.html)\nfor dealing with paths (part of the Python 3 standard library), and will\ndownload the dataset using\n[requests](http://docs.python-requests.org/en/master/). 
We will only\nimport modules when we use them, so you can see exactly what\\'s being\nused at each point.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from pathlib import Path\nimport requests\n\nDATA_PATH = Path(\"data\")\nPATH = DATA_PATH / \"mnist\"\n\nPATH.mkdir(parents=True, exist_ok=True)\n\nURL = \"https://github.com/pytorch/tutorials/raw/main/_static/\"\nFILENAME = \"mnist.pkl.gz\"\n\nif not (PATH / FILENAME).exists():\n    content = requests.get(URL + FILENAME).content\n    (PATH / FILENAME).open(\"wb\").write(content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This dataset is in NumPy array format, and has been stored using pickle,\na Python-specific format for serializing data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pickle\nimport gzip\n\nwith gzip.open((PATH / FILENAME).as_posix(), \"rb\") as f:\n    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding=\"latin-1\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each image is 28 x 28, and is stored as a flattened row of length\n784 (=28x28). Let\\'s take a look at one; we need to reshape it to 2d\nfirst.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from matplotlib import pyplot\nimport numpy as np\n\npyplot.imshow(x_train[0].reshape((28, 28)), cmap=\"gray\")\n# ``pyplot.show()`` only if not on Colab\ntry:\n    import google.colab\nexcept ImportError:\n    pyplot.show()\nprint(x_train.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyTorch uses `torch.tensor`, rather than NumPy arrays, so we need to\nconvert our data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\n\nx_train, y_train, x_valid, y_valid = map(\n    torch.tensor, (x_train, y_train, x_valid, y_valid)\n)\nn, c = x_train.shape\nprint(x_train, y_train)\nprint(x_train.shape)\nprint(y_train.min(), y_train.max())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neural net from scratch (without `torch.nn`)\n============================================\n\nLet\\'s first create a model using nothing but PyTorch tensor operations.\nWe\\'re assuming you\\'re already familiar with the basics of neural\nnetworks. (If you\\'re not, you can learn them at\n[course.fast.ai](https://course.fast.ai)).\n\nPyTorch provides methods to create random or zero-filled tensors, which\nwe will use to create our weights and bias for a simple linear model.\nThese are just regular tensors, with one very special addition: we tell\nPyTorch that they require a gradient. This causes PyTorch to record all\nof the operations done on the tensor, so that it can calculate the\ngradient during back-propagation *automatically*!\n\nFor the weights, we set `requires_grad` **after** the initialization,\nsince we don\\'t want that step included in the gradient. (Note that a\ntrailing `_` in PyTorch signifies that the operation is performed\nin-place.)\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n<div class=\"alert alert-info\"><h4>Note</h4><p>We are initializing the weights here with Xavier initialisation (by multiplying with <code>1/sqrt(n)</code>).</p></div>\n```\n" ] },
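{ "cell_type": "markdown", "metadata": {}, "source": [ "As a minimal sketch of the initialization described in the note above (the shapes\nassume the flattened 784-pixel MNIST inputs and 10 output classes; the names\n`weights` and `bias` are illustrative):\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import math\n\n# Scale a standard normal init by 1/sqrt(n) with n = 784 inputs\n# (the Xavier-style scaling from the note above), then enable\n# gradient tracking in-place (note the trailing ``_``)\nweights = torch.randn(784, 10) / math.sqrt(784)\nweights.requires_grad_()\nbias = torch.zeros(10, requires_grad=True)" ] },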
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n<div class=\"alert alert-info\"><h4>Note</h4><p>You can use the standard Python debugger to step through PyTorch code, allowing you to check the various variable values at each step. Uncomment <code>set_trace()</code> below to try it out.</p></div>\n```\n" ] },
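{ "cell_type": "markdown", "metadata": {}, "source": [ "For example (a hypothetical placeholder loop, shown only to illustrate where a\nbreakpoint could go):\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from IPython.core.debugger import set_trace\n\nfor epoch in range(2):  # placeholder loop; substitute your real training loop\n    # set_trace()  # uncomment to pause here and inspect variables\n    pass" ] },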
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n<div class=\"alert alert-info\"><h4>Note</h4><p><code>nn.Module</code> (uppercase M) is a PyTorch-specific concept, and is a class we'll be using a lot. <code>nn.Module</code> is not to be confused with the Python concept of a (lowercase <code>m</code>) module, which is a file of Python code that can be imported.</p></div>\n```\n" ] },
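{ "cell_type": "markdown", "metadata": {}, "source": [ "For illustration only (a minimal sketch rather than a full model), a subclass of\n`nn.Module` declares its parameters in `__init__` and its computation in\n`forward`:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import math\nfrom torch import nn\n\nclass MnistLogistic(nn.Module):  # illustrative name\n    def __init__(self):\n        super().__init__()\n        # nn.Parameter marks a tensor as a learnable weight of the module\n        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))\n        self.bias = nn.Parameter(torch.zeros(10))\n\n    def forward(self, xb):\n        # a simple linear model: batch @ weights + bias\n        return xb @ self.weights + self.bias\n\nmodel = MnistLogistic()\nprint(model(x_train[0:64]).shape)  # expect torch.Size([64, 10])" ] }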