{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://codelin.vip/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Prototype) MaskedTensor Overview\n=================================\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial is designed to serve as a starting point for using\nMaskedTensors and discuss its masking semantics.\n\nMaskedTensor serves as an extension to `torch.Tensor`{.interpreted-text\nrole=\"class\"} that provides the user with the ability to:\n\n- use any masked semantics (for example, variable length tensors,\n nan\\* operators, etc.)\n- differentiation between 0 and NaN gradients\n- various sparse applications (see tutorial below)\n\nFor a more detailed introduction on what MaskedTensors are, please find\nthe [torch.masked\ndocumentation](https://pytorch.org/docs/master/masked.html).\n\nUsing MaskedTensor\n==================\n\nIn this section we discuss how to use MaskedTensor including how to\nconstruct, access, the data and mask, as well as indexing and slicing.\n\nPreparation\n-----------\n\nWe\\'ll begin by doing the necessary setup for the tutorial:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nfrom torch.masked import masked_tensor, as_masked_tensor\nimport warnings\n\n# Disable prototype warnings and such\nwarnings.filterwarnings(action='ignore', category=UserWarning)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Construction\n============\n\nThere are a few different ways to construct a MaskedTensor:\n\n- The first way is to directly invoke the MaskedTensor class\n- The second (and our recommended way) is to use\n `masked.masked_tensor`{.interpreted-text role=\"func\"} and\n `masked.as_masked_tensor`{.interpreted-text role=\"func\"} factory\n functions, which are analogous to `torch.tensor`{.interpreted-text\n role=\"func\"} and `torch.as_tensor`{.interpreted-text role=\"func\"}\n\nThroughout this tutorial, we will be assuming the import line: [from\ntorch.masked import masked\\_tensor]{.title-ref}.\n\nAccessing the data and mask\n===========================\n\nThe underlying fields in a MaskedTensor can be accessed through:\n\n- the `MaskedTensor.get_data`{.interpreted-text role=\"meth\"} function\n- the `MaskedTensor.get_mask`{.interpreted-text role=\"meth\"} function.\n Recall that `True` indicates \\\"specified\\\" or \\\"valid\\\" while\n `False` indicates \\\"unspecified\\\" or \\\"invalid\\\".\n\nIn general, the underlying data that is returned may not be valid in the\nunspecified entries, so we recommend that when users require a Tensor\nwithout any masked entries, that they use\n`MaskedTensor.to_tensor`{.interpreted-text role=\"meth\"} (as shown above)\nto return a Tensor with filled values.\n\nIndexing and slicing\n====================\n\n`MaskedTensor`{.interpreted-text role=\"class\"} is a Tensor subclass,\nwhich means that it inherits the same semantics for indexing and slicing\nas `torch.Tensor`{.interpreted-text role=\"class\"}. 
Below are some\nexamples of common indexing and slicing patterns:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = torch.arange(24).reshape(2, 3, 4)\nmask = data % 2 == 0\n\nprint(\"data:\\n\", data)\nprint(\"mask:\\n\", mask)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# float is used for cleaner visualization when being printed\nmt = masked_tensor(data.float(), mask)\n\nprint(\"mt[0]:\\n\", mt[0])\nprint(\"mt[:, :, 2:4]:\\n\", mt[:, :, 2:4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why is MaskedTensor useful?\n===========================\n\nBecause `MaskedTensor`{.interpreted-text role=\"class\"} treats specified\nand unspecified values as first-class citizens instead of an\nafterthought (with filled values, NaNs, and so on), it is able to solve\nseveral of the shortcomings that regular Tensors cannot; indeed,\n`MaskedTensor`{.interpreted-text role=\"class\"} was born in large part\ndue to these recurring issues.\n\nBelow, we will discuss some of the most common issues that are still\nunresolved in PyTorch today and illustrate how\n`MaskedTensor`{.interpreted-text role=\"class\"} can solve these problems.\n\nDistinguishing between 0 and NaN gradient\n-----------------------------------------\n\nOne issue that `torch.Tensor`{.interpreted-text role=\"class\"} runs into\nis the inability to distinguish between gradients that are undefined\n(NaN) and gradients that are actually 0. Because PyTorch does not have a\nway of marking a value as specified/valid vs. unspecified/invalid, it is\nforced to rely on NaN or 0 (depending on the use case), leading to\nunreliable semantics since many operations aren\\'t meant to handle NaN\nvalues properly. What is even more confusing is that, depending on the\norder of operations, the gradient can vary (for example, depending on\nhow early in the chain of operations a NaN value manifests).\n\n`MaskedTensor`{.interpreted-text role=\"class\"} is the perfect solution\nfor this!\n\n### torch.where\n\nIn [Issue 10729](https://github.com/pytorch/pytorch/issues/10729), we\nsee a case where the order of operations matters when using\n`torch.where`{.interpreted-text role=\"func\"}, because it is hard to tell\nwhether a 0 in the gradient is a true 0 or a placeholder for an\nundefined gradient. Therefore, we remain consistent and mask out the\nresults:\n\nCurrent result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = torch.tensor([-10., -5, 0, 5, 10, 50, 60, 70, 80, 90, 100], requires_grad=True, dtype=torch.float)\ny = torch.where(x < 0, torch.exp(x), torch.ones_like(x))\ny.sum().backward()\nx.grad" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`MaskedTensor`{.interpreted-text role=\"class\"} result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = torch.tensor([-10., -5, 0, 5, 10, 50, 60, 70, 80, 90, 100])\nmask = x < 0\nmx = masked_tensor(x, mask, requires_grad=True)\nmy = masked_tensor(torch.ones_like(x), ~mask, requires_grad=True)\ny = torch.where(mask, torch.exp(mx), my)\ny.sum().backward()\nmx.grad" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The gradient here is only provided to the selected subset. 
Effectively,\nthis changes the gradient of [where]{.title-ref} to mask out elements\ninstead of setting them to zero.\n\n### Another torch.where\n\n[Issue 52248](https://github.com/pytorch/pytorch/issues/52248) is\nanother example.\n\nCurrent result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = torch.randn((), requires_grad=True)\nb = torch.tensor(False)\nc = torch.ones(())\nprint(\"torch.where(b, a/0, c):\\n\", torch.where(b, a/0, c))\nprint(\"torch.autograd.grad(torch.where(b, a/0, c), a):\\n\", torch.autograd.grad(torch.where(b, a/0, c), a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`MaskedTensor`{.interpreted-text role=\"class\"} result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = masked_tensor(torch.randn(()), torch.tensor(True), requires_grad=True)\nb = torch.tensor(False)\nc = torch.ones(())\nprint(\"torch.where(b, a/0, c):\\n\", torch.where(b, a/0, c))\nprint(\"torch.autograd.grad(torch.where(b, a/0, c), a):\\n\", torch.autograd.grad(torch.where(b, a/0, c), a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This issue is similar (and even links to the next issue below) in that\nit expresses frustration with unexpected behavior caused by the\ninability to differentiate between \\\"no gradient\\\" and \\\"zero\ngradient\\\", which in turn makes working with other ops difficult to\nreason about.\n\n### When using mask, x/0 yields NaN grad\n\nIn [Issue 4132](https://github.com/pytorch/pytorch/issues/4132), the\nuser proposes that [x.grad]{.title-ref} should be [\\[0, 1\\]]{.title-ref}\ninstead of [\\[nan, 1\\]]{.title-ref}, whereas\n`MaskedTensor`{.interpreted-text role=\"class\"} makes this very clear by\nmasking out the gradient altogether.\n\nCurrent result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = torch.tensor([1., 1.], requires_grad=True)\ndiv = torch.tensor([0., 1.])\ny = x/div # => y is [inf, 1]\nmask = (div != 0) # => mask is [False, True]\ny[mask].backward()\nx.grad" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`MaskedTensor`{.interpreted-text role=\"class\"} result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = torch.tensor([1., 1.], requires_grad=True)\ndiv = torch.tensor([0., 1.])\ny = x/div # => y is [inf, 1]\nmask = (div != 0) # => mask is [False, True]\nloss = as_masked_tensor(y, mask)\nloss.sum().backward()\nx.grad" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `torch.nansum`{.interpreted-text role=\"func\"} and `torch.nanmean`{.interpreted-text role=\"func\"}\n\nIn [Issue 67180](https://github.com/pytorch/pytorch/issues/67180), the\ngradient isn\\'t calculated properly (a longstanding issue), whereas\n`MaskedTensor`{.interpreted-text role=\"class\"} handles it correctly.\n\nCurrent result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = torch.tensor([1., 2., float('nan')])\nb = torch.tensor(1.0, requires_grad=True)\nc = a * b\nc1 = torch.nansum(c)\nbgrad1, = torch.autograd.grad(c1, b, retain_graph=True)\nbgrad1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"`MaskedTensor`{.interpreted-text role=\"class\"} result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = torch.tensor([1., 2., float('nan')])\nb = torch.tensor(1.0, requires_grad=True)\nmt = masked_tensor(a, ~torch.isnan(a))\nc = mt * b\nc1 = torch.sum(c)\nbgrad1, = torch.autograd.grad(c1, b, retain_graph=True)\nbgrad1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Safe Softmax\n============\n\nSafe softmax is another great example of [an\nissue](https://github.com/pytorch/pytorch/issues/55056) that arises\nfrequently. In a nutshell, if there is an entire batch that is \\\"masked\nout\\\" or consists entirely of padding (which, in the softmax case,\ntranslates to being set [-inf]{.title-ref}), then this will result in\nNaNs, which can lead to training divergence.\n\nLuckily, `MaskedTensor`{.interpreted-text role=\"class\"} has solved this\nissue. Consider this setup:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = torch.randn(3, 3)\nmask = torch.tensor([[True, False, False], [True, False, True], [False, False, False]])\nx = data.masked_fill(~mask, float('-inf'))\nmt = masked_tensor(data, mask)\nprint(\"x:\\n\", x)\nprint(\"mt:\\n\", mt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, we want to calculate the softmax along [dim=0]{.title-ref}.\nNote that the second column is \\\"unsafe\\\" (i.e. entirely masked out), so\nwhen the softmax is calculated, the result will yield [0/0 =\nnan]{.title-ref} since [exp(-inf) = 0]{.title-ref}. However, what we\nwould really like is for the gradients to be masked out since they are\nunspecified and would be invalid for training.\n\nPyTorch result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x.softmax(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`MaskedTensor`{.interpreted-text role=\"class\"} result:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mt.softmax(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Implementing missing torch.nan\\* operators\n==========================================\n\nIn [Issue 61474](https://github.com/pytorch/pytorch/issues/61474), there\nis a request to add additional operators to cover the various\n[torch.nan\\*]{.title-ref} applications, such as `torch.nanmax`,\n`torch.nanmin`, etc.\n\nIn general, these problems lend themselves more naturally to masked\nsemantics, so instead of introducing additional operators, we propose\nusing `MaskedTensor`{.interpreted-text role=\"class\"} instead. 
Since\n[nanmean has already\nlanded](https://github.com/pytorch/pytorch/issues/21987), we can use it\nas a comparison point:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = torch.arange(16).float()\ny = x * x.fmod(4)\nz = y.masked_fill(y == 0, float('nan')) # we want to get the mean of y when ignoring the zeros" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"y:\\n\", y)\n# z is just y with the zeros replaced with nan's\nprint(\"z:\\n\", z)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"y.mean():\\n\", y.mean())\nprint(\"z.nanmean():\\n\", z.nanmean())\n# MaskedTensor successfully ignores the 0's\nprint(\"torch.mean(masked_tensor(y, y != 0)):\\n\", torch.mean(masked_tensor(y, y != 0)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above example, we\\'ve constructed a [y]{.title-ref} and would\nlike to calculate the mean of the series while ignoring the zeros.\n[torch.nanmean]{.title-ref} can be used to do this, but we don\\'t have\nimplementations for the rest of the [torch.nan\\*]{.title-ref}\noperations. `MaskedTensor`{.interpreted-text role=\"class\"} solves this\nissue by being able to use the base operation, and we already have\nsupport for the other operations listed in the issue. For example:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "torch.argmin(masked_tensor(y, y != 0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indeed, when the 0\\'s are ignored, the minimum value is the 1 at index\n1, so the returned index is 1.\n\n`MaskedTensor`{.interpreted-text role=\"class\"} can also support\nreductions when the data is fully masked out, which is equivalent to the\ncase above when the data Tensor is completely `nan`. `nanmean` would\nreturn `nan` (an ambiguous return value), while MaskedTensor would more\naccurately indicate a masked-out result.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = torch.empty(16).fill_(float('nan'))\nprint(\"x:\\n\", x)\nprint(\"torch.nanmean(x):\\n\", torch.nanmean(x))\nprint(\"torch.nanmean via maskedtensor:\\n\", torch.mean(masked_tensor(x, ~torch.isnan(x))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a similar problem to safe softmax, where [0/0 = nan]{.title-ref}\nwhen what we really want is an undefined value.\n\nConclusion\n==========\n\nIn this tutorial, we\\'ve introduced what MaskedTensors are, demonstrated\nhow to use them, and motivated their value through a series of examples\nand issues that they\\'ve helped resolve.\n\nFurther Reading\n===============\n\nTo continue learning more, see our [MaskedTensor Sparsity\ntutorial](https://pytorch.org/tutorials/prototype/maskedtensor_sparsity.html),\nwhich shows how MaskedTensor enables sparsity and the different storage\nformats we currently support.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }