{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Learn the Basics](intro.html) \\|\\|\n[Quickstart](quickstart_tutorial.html) \\|\\|\n[Tensors](tensorqs_tutorial.html) \\|\\| [Datasets &\nDataLoaders](data_tutorial.html) \\|\\|\n[Transforms](transforms_tutorial.html) \\|\\| [Build\nModel](buildmodel_tutorial.html) \\|\\| **Autograd** \\|\\|\n[Optimization](optimization_tutorial.html) \\|\\| [Save & Load\nModel](saveloadrun_tutorial.html)\n\nAutomatic Differentiation with `torch.autograd`\n===============================================\n\nWhen training neural networks, the most frequently used algorithm is\n**back propagation**. In this algorithm, parameters (model weights) are\nadjusted according to the **gradient** of the loss function with respect\nto the given parameter.\n\nTo compute those gradients, PyTorch has a built-in differentiation\nengine called `torch.autograd`. It supports automatic computation of\ngradient for any computational graph.\n\nConsider the simplest one-layer neural network, with input `x`,\nparameters `w` and `b`, and some loss function. It can be defined in\nPyTorch in the following manner:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\n\nx = torch.ones(5) # input tensor\ny = torch.zeros(3) # expected output\nw = torch.randn(5, 3, requires_grad=True)\nb = torch.randn(3, requires_grad=True)\nz = torch.matmul(x, w)+b\nloss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tensors, Functions and Computational graph\n==========================================\n\nThis code defines the following **computational graph**:\n\n\n\nIn this network, `w` and `b` are **parameters**, which we need to\noptimize. Thus, we need to be able to compute the gradients of loss\nfunction with respect to those variables. In order to do that, we set\nthe `requires_grad` property of those tensors.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n
class=\"alert alert-info\"><h4>Note</h4>\n```\n\nYou can set the value of `requires_grad` when creating a tensor, or later by using the `x.requires_grad_(True)` method.\n\n```{=html}\n</div>\n```\n" ] },
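{ "cell_type": "markdown", "metadata": {}, "source": [ "For example (a reconstructed sketch, not part of the original notebook; the tensor names `a` and `t` are arbitrary), both approaches from the note above enable gradient tracking:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\n\n# Enable gradient tracking when the tensor is created\na = torch.randn(5, 3, requires_grad=True)\nprint(a.requires_grad)   # True\n\n# ... or enable it later, in place, with requires_grad_()\nt = torch.ones(5)\nprint(t.requires_grad)   # False\nt.requires_grad_(True)\nprint(t.requires_grad)   # True" ] },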
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n<div class=\"alert alert-info\"><h4>Note</h4>\n```\n\n- We can only obtain the `grad` properties for the leaf nodes of the computational graph, which have `requires_grad` property set to `True`. For all other nodes in our graph, gradients will not be available.\n- We can only perform gradient calculations using `backward` once on a given graph, for performance reasons. If we need to do several `backward` calls on the same graph, we need to pass `retain_graph=True` to the `backward` call.\n\n```{=html}\n</div>\n```\n" ] },
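{ "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of both points (reconstructed, not part of the original notebook), reusing the `x`, `y`, `w` and `b` tensors defined earlier: gradients are stored only on the leaf tensors `w` and `b`, and a second `backward` call is only possible because the first one keeps the graph alive with `retain_graph=True`:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Rebuild the forward pass so this cell runs on a fresh graph\nz = torch.matmul(x, w) + b\nloss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)\n\n# Keep the graph so that a second backward call is allowed below\nloss.backward(retain_graph=True)\nprint(w.grad)      # populated: w is a leaf tensor with requires_grad=True\nprint(b.grad)      # populated: b is a leaf tensor with requires_grad=True\nprint(z.is_leaf)   # False: z is an intermediate node, so z.grad is not stored\n\n# Second call on the same graph; gradients accumulate into w.grad and b.grad\nloss.backward()" ] },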
{ "cell_type": "markdown", "metadata": {}, "source": [ "```{=html}\n<div class=\"alert alert-info\"><h4>Note</h4>\n```\n\nAn important thing to note is that the graph is recreated from scratch; after each `.backward()` call, autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size and operations at every iteration if needed.\n\n```{=html}\n</div>\n```\n" ] },
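{ "cell_type": "markdown", "metadata": {}, "source": [ "To make this concrete, here is a small reconstructed sketch (not part of the original notebook; the names `w3`, `inp` and `out` are arbitrary): because a fresh graph is built on every forward pass, the operations can change between iterations and `backward` still works each time:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "w3 = torch.randn(5, 3, requires_grad=True)\n\nfor step in range(3):\n    inp = torch.ones(5)\n    out = torch.matmul(inp, w3)\n    # A new graph is recorded on each iteration, so the operations may differ freely\n    if step % 2 == 0:\n        out = out.sin()\n    else:\n        out = out * 2\n    out.sum().backward()   # consumes the freshly built graph every time\n    print(step, w3.grad.norm())" ] },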
{ "cell_type": "markdown", "metadata": {}, "source": [ "Previously we were calling the `backward()` function without parameters. This is essentially equivalent to calling `backward(torch.tensor(1.0))`, which is a useful way to compute the gradients in case of a scalar-valued function, such as the loss during neural network training.\n" ] },
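{ "cell_type": "markdown", "metadata": {}, "source": [ "The equivalence can be checked directly. The following is a reconstructed sketch (not part of the original notebook), reusing `x`, `y`, `w` and `b` from above and comparing the gradients produced by the two calls:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Gradients from backward() with no arguments\nw.grad = None\nb.grad = None\nloss = torch.nn.functional.binary_cross_entropy_with_logits(torch.matmul(x, w) + b, y)\nloss.backward()\ngrad_no_args = w.grad.clone()\n\n# Gradients from backward(torch.tensor(1.0)) on a fresh graph\nw.grad = None\nb.grad = None\nloss = torch.nn.functional.binary_cross_entropy_with_logits(torch.matmul(x, w) + b, y)\nloss.backward(torch.tensor(1.0))\n\nprint(torch.allclose(grad_no_args, w.grad))   # True: both calls give the same gradients" ] }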