{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "[Learn the Basics](intro.html) \\|\\|\n[Quickstart](quickstart_tutorial.html) \\|\\|\n[Tensors](tensorqs_tutorial.html) \\|\\| [Datasets &\nDataLoaders](data_tutorial.html) \\|\\|\n[Transforms](transforms_tutorial.html) \\|\\| [Build\nModel](buildmodel_tutorial.html) \\|\\|\n[Autograd](autogradqs_tutorial.html) \\|\\| **Optimization** \\|\\| [Save &\nLoad Model](saveloadrun_tutorial.html)\n\nOptimizing Model Parameters\n===========================\n\nNow that we have a model and data it\\'s time to train, validate and test\nour model by optimizing its parameters on our data. Training a model is\nan iterative process; in each iteration the model makes a guess about\nthe output, calculates the error in its guess (*loss*), collects the\nderivatives of the error with respect to its parameters (as we saw in\nthe [previous section](autograd_tutorial.html)), and **optimizes** these\nparameters using gradient descent. For a more detailed walkthrough of\nthis process, check out this video on [backpropagation from\n3Blue1Brown](https://www.youtube.com/watch?v=tIeHLnjs5U8).\n\nPrerequisite Code\n-----------------\n\nWe load the code from the previous sections on [Datasets &\nDataLoaders](data_tutorial.html) and [Build\nModel](buildmodel_tutorial.html).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import torch\nfrom torch import nn\nfrom torch.utils.data import DataLoader\nfrom torchvision import datasets\nfrom torchvision.transforms import ToTensor\n\ntraining_data = datasets.FashionMNIST(\n    root=\"data\",\n    train=True,\n    download=True,\n    transform=ToTensor()\n)\n\ntest_data = datasets.FashionMNIST(\n    root=\"data\",\n    train=False,\n    download=True,\n    transform=ToTensor()\n)\n\ntrain_dataloader = DataLoader(training_data, batch_size=64)\ntest_dataloader = DataLoader(test_data, batch_size=64)\n\nclass NeuralNetwork(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.flatten = nn.Flatten()\n        self.linear_relu_stack = nn.Sequential(\n            nn.Linear(28*28, 512),\n            nn.ReLU(),\n            nn.Linear(512, 512),\n            nn.ReLU(),\n            nn.Linear(512, 10),\n        )\n\n    def forward(self, x):\n        x = self.flatten(x)\n        logits = self.linear_relu_stack(x)\n        return logits\n\nmodel = NeuralNetwork()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Hyperparameters\n===============\n\nHyperparameters are adjustable parameters that let you control the model\noptimization process. Different hyperparameter values can impact model\ntraining and convergence rates ([read\nmore](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html)\nabout hyperparameter tuning)\n\nWe define the following hyperparameters for training:\n\n:   -   **Number of Epochs** - the number of times to iterate over the\n        dataset\n    -   **Batch Size** - the number of data samples propagated through\n        the network before the parameters are updated\n    -   **Learning Rate** - how much to update models parameters at each\n        batch/epoch. Smaller values yield slow learning speed, while\n        large values may result in unpredictable behavior during\n        training.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "learning_rate = 1e-3\nbatch_size = 64\nepochs = 5"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Optimization Loop\n=================\n\nOnce we set our hyperparameters, we can then train and optimize our\nmodel with an optimization loop. Each iteration of the optimization loop\nis called an **epoch**.\n\nEach epoch consists of two main parts:\n\n:   -   **The Train Loop** - iterate over the training dataset and try\n        to converge to optimal parameters.\n    -   **The Validation/Test Loop** - iterate over the test dataset to\n        check if model performance is improving.\n\nLet\\'s briefly familiarize ourselves with some of the concepts used in\nthe training loop. Jump ahead to see the\n`full-impl-label`{.interpreted-text role=\"ref\"} of the optimization\nloop.\n\nLoss Function\n-------------\n\nWhen presented with some training data, our untrained network is likely\nnot to give the correct answer. **Loss function** measures the degree of\ndissimilarity of obtained result to the target value, and it is the loss\nfunction that we want to minimize during training. To calculate the loss\nwe make a prediction using the inputs of our given data sample and\ncompare it against the true data label value.\n\nCommon loss functions include\n[nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss)\n(Mean Square Error) for regression tasks, and\n[nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss)\n(Negative Log Likelihood) for classification.\n[nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)\ncombines `nn.LogSoftmax` and `nn.NLLLoss`.\n\nWe pass our model\\'s output logits to `nn.CrossEntropyLoss`, which will\nnormalize the logits and compute the prediction error.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Initialize the loss function\nloss_fn = nn.CrossEntropyLoss()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Optimizer\n=========\n\nOptimization is the process of adjusting model parameters to reduce\nmodel error in each training step. **Optimization algorithms** define\nhow this process is performed (in this example we use Stochastic\nGradient Descent). All optimization logic is encapsulated in the\n`optimizer` object. Here, we use the SGD optimizer; additionally, there\nare many [different\noptimizers](https://pytorch.org/docs/stable/optim.html) available in\nPyTorch such as ADAM and RMSProp, that work better for different kinds\nof models and data.\n\nWe initialize the optimizer by registering the model\\'s parameters that\nneed to be trained, and passing in the learning rate hyperparameter.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Inside the training loop, optimization happens in three steps:\n\n:   -   Call `optimizer.zero_grad()` to reset the gradients of model\n        parameters. Gradients by default add up; to prevent\n        double-counting, we explicitly zero them at each iteration.\n    -   Backpropagate the prediction loss with a call to\n        `loss.backward()`. PyTorch deposits the gradients of the loss\n        w.r.t. each parameter.\n    -   Once we have our gradients, we call `optimizer.step()` to adjust\n        the parameters by the gradients collected in the backward pass.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Full Implementation {#full-impl-label}\n===================\n\nWe define `train_loop` that loops over our optimization code, and\n`test_loop` that evaluates the model\\'s performance against our test\ndata.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def train_loop(dataloader, model, loss_fn, optimizer):\n    size = len(dataloader.dataset)\n    # Set the model to training mode - important for batch normalization and dropout layers\n    # Unnecessary in this situation but added for best practices\n    model.train()\n    for batch, (X, y) in enumerate(dataloader):\n        # Compute prediction and loss\n        pred = model(X)\n        loss = loss_fn(pred, y)\n\n        # Backpropagation\n        loss.backward()\n        optimizer.step()\n        optimizer.zero_grad()\n\n        if batch % 100 == 0:\n            loss, current = loss.item(), batch * batch_size + len(X)\n            print(f\"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]\")\n\n\ndef test_loop(dataloader, model, loss_fn):\n    # Set the model to evaluation mode - important for batch normalization and dropout layers\n    # Unnecessary in this situation but added for best practices\n    model.eval()\n    size = len(dataloader.dataset)\n    num_batches = len(dataloader)\n    test_loss, correct = 0, 0\n\n    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode\n    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True\n    with torch.no_grad():\n        for X, y in dataloader:\n            pred = model(X)\n            test_loss += loss_fn(pred, y).item()\n            correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n\n    test_loss /= num_batches\n    correct /= size\n    print(f\"Test Error: \\n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \\n\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We initialize the loss function and optimizer, and pass it to\n`train_loop` and `test_loop`. Feel free to increase the number of epochs\nto track the model\\'s improving performance.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "loss_fn = nn.CrossEntropyLoss()\noptimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)\n\nepochs = 10\nfor t in range(epochs):\n    print(f\"Epoch {t+1}\\n-------------------------------\")\n    train_loop(train_dataloader, model, loss_fn, optimizer)\n    test_loop(test_dataloader, model, loss_fn)\nprint(\"Done!\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Further Reading\n===============\n\n-   [Loss\n    Functions](https://pytorch.org/docs/stable/nn.html#loss-functions)\n-   [torch.optim](https://pytorch.org/docs/stable/optim.html)\n-   [Warmstart Training a\n    Model](https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html)\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}