{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optional: Data Parallelism\n==========================\n\n**Authors**: [Sung Kim](https://github.com/hunkim) and [Jenny\nKang](https://github.com/jennykang)\n\nIn this tutorial, we will learn how to use multiple GPUs using\n`DataParallel`.\n\nIt\\'s very easy to use GPUs with PyTorch. You can put the model on a\nGPU:\n\n``` {.python}\ndevice = torch.device(\"cuda:0\")\nmodel.to(device)\n```\n\nThen, you can copy all your tensors to the GPU:\n\n``` {.python}\nmytensor = my_tensor.to(device)\n```\n\nPlease note that just calling `my_tensor.to(device)` returns a new copy\nof `my_tensor` on GPU instead of rewriting `my_tensor`. You need to\nassign it to a new tensor and use that tensor on the GPU.\n\nIt\\'s natural to execute your forward, backward propagations on multiple\nGPUs. However, Pytorch will only use one GPU by default. You can easily\nrun your operations on multiple GPUs by making your model run parallelly\nusing `DataParallel`:\n\n``` {.python}\nmodel = nn.DataParallel(model)\n```\n\nThat\\'s the core behind this tutorial. We will explore it in more detail\nbelow.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imports and parameters\n======================\n\nImport PyTorch modules and define parameters.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nimport torch.nn as nn\nfrom torch.utils.data import Dataset, DataLoader\n\n# Parameters and DataLoaders\ninput_size = 5\noutput_size = 2\n\nbatch_size = 30\ndata_size = 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Device\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dummy DataSet\n=============\n\nMake a dummy (random) dataset. You just need to implement the getitem\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class RandomDataset(Dataset):\n\n def __init__(self, size, length):\n self.len = length\n self.data = torch.randn(length, size)\n\n def __getitem__(self, index):\n return self.data[index]\n\n def __len__(self):\n return self.len\n\nrand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),\n batch_size=batch_size, shuffle=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simple Model\n============\n\nFor the demo, our model just gets an input, performs a linear operation,\nand gives an output. However, you can use `DataParallel` on any model\n(CNN, RNN, Capsule Net etc.)\n\nWe\\'ve placed a print statement inside the model to monitor the size of\ninput and output tensors. 
When the model is wrapped in `DataParallel`, `forward` runs once per\nGPU on one such chunk, so please pay attention to the difference\nbetween the sizes printed inside the model and outside of it.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class Model(nn.Module):\n    # Our model\n\n    def __init__(self, input_size, output_size):\n        super(Model, self).__init__()\n        self.fc = nn.Linear(input_size, output_size)\n\n    def forward(self, input):\n        output = self.fc(input)\n        print(\"\\tIn Model: input size\", input.size(),\n              \"output size\", output.size())\n\n        return output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create Model and DataParallel\n=============================\n\nThis is the core part of the tutorial. First, we need to make a model\ninstance and check if we have multiple GPUs. If we have multiple GPUs,\nwe can wrap our model using `nn.DataParallel`. Then we can put our model\non the GPU by calling `model.to(device)`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "model = Model(input_size, output_size)\nif torch.cuda.device_count() > 1:\n    print(\"Let's use\", torch.cuda.device_count(), \"GPUs!\")\n    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs\n    model = nn.DataParallel(model)\n\nmodel.to(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the Model\n=============\n\nNow we can see the sizes of the input and output tensors.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for data in rand_loader:\n    input = data.to(device)\n    output = model(input)\n    print(\"Outside: input size\", input.size(),\n          \"output_size\", output.size())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Results\n=======\n\nIf you have no GPU or only one GPU, then when we feed the model a batch\nof 30 inputs, it gets 30 inputs and produces 30 outputs, as expected.\n\n
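Note that no matter how many GPUs you have, the output you receive\noutside the model is always the full batch: `DataParallel` gathers the\nper-GPU outputs back into a single tensor on the first device (the\ndefault `output_device`) before returning them. As a small sketch of\nour own (assuming at least one CUDA device, and reusing `model`,\n`device`, and `input_size` from above):\n\n``` {.python}\nif torch.cuda.is_available():\n    out = model(torch.randn(8, input_size).to(device))\n    # The gathered result is a single tensor on the first GPU:\n    print(out.size(), out.device)  # torch.Size([8, 2]) cuda:0\n```\n\n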
But if you have multiple\nGPUs, then you can get results like this.\n\n2 GPUs\n------\n\nIf you have 2, you will see:\n\n``` {.bash}\n# on 2 GPUs\nLet's use 2 GPUs!\n In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])\n In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])\nOutside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])\n In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])\n In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])\nOutside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])\n In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])\n In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])\nOutside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])\n In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])\n In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])\nOutside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])\n```\n\n3 GPUs\n------\n\nIf you have 3 GPUs, you will see:\n\n``` {.bash}\nLet's use 3 GPUs!\n In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])\n In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])\n In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])\nOutside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])\n In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])\n In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])\n In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])\nOutside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])\n In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])\n In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])\n In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])\nOutside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])\nOutside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])\n```\n\n8 GPUs\n------\n\nIf you have 8, you will see:\n\n``` {.bash}\nLet's use 8 GPUs!\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\nOutside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([2, 
5]) output size torch.Size([2, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\nOutside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])\n In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])\nOutside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])\n In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])\n In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])\n In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])\n In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])\n In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])\nOutside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])\n```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Summary\n=======\n\n`DataParallel` splits your data automatically and sends jobs to\nmultiple models on several GPUs. After each model finishes its job,\n`DataParallel` collects and merges the results before returning them to\nyou.\n\nFor more information, please check out\n[https://pytorch.org/tutorials/beginner/former\\\\_torchies/parallelism\\\\_tutorial.html](https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html).\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }