{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://codelin.vip/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Prototype) MaskedTensor Advanced Semantics\n===========================================\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before working on this tutorial, please make sure to review our\n[MaskedTensor Overview tutorial\n\\]{.title-ref}.\n\nThe purpose of this tutorial is to help users understand how some of the\nadvanced semantics work and how they came to be. We will focus on two\nparticular ones:\n\n*. Differences between MaskedTensor and \\`NumPy\\'s MaskedArray\n\\\\`\\_\\_*.\nReduction semantics\n\nPreparation\n===========\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nfrom torch.masked import masked_tensor\nimport numpy as np\nimport warnings\n\n# Disable prototype warnings and such\nwarnings.filterwarnings(action='ignore', category=UserWarning)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MaskedTensor vs NumPy\\'s MaskedArray\n====================================\n\nNumPy\\'s `MaskedArray` has a few fundamental semantics differences from\nMaskedTensor.\n\n*. Their factory function and basic definition inverts the mask (similar to \\`\\`torch.nn.MHA\\`\\`); that is, MaskedTensor uses \\`\\`True\\`\\` to denote \\\"specified\\\" and \\`\\`False\\`\\` to denote \\\"unspecified\\\", or \\\"valid\\\"/\\\"invalid\\\", whereas NumPy does the opposite. We believe that our mask definition is not only more intuitive, but it also aligns more with the existing semantics in PyTorch as a whole.*. Intersection semantics. 
In NumPy, if either of two elements is masked out, the resulting element is\nmasked out as well -- in practice, they [apply the logical_or\noperator](https://github.com/numpy/numpy/blob/68299575d8595d904aff6f28e12d21bf6428a4ba/numpy/ma/core.py#L1016-L1024).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = torch.arange(5.)\nmask = torch.tensor([True, True, False, True, False])\nnpm0 = np.ma.masked_array(data.numpy(), (~mask).numpy())\nnpm1 = np.ma.masked_array(data.numpy(), (mask).numpy())\n\nprint(\"npm0:\\n\", npm0)\nprint(\"npm1:\\n\", npm1)\nprint(\"npm0 + npm1:\\n\", npm0 + npm1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Meanwhile, MaskedTensor does not support addition or binary operators\nwith masks that don't match -- to understand why, please see the\nsection on reduction semantics below.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mt0 = masked_tensor(data, mask)\nmt1 = masked_tensor(data, ~mask)\nprint(\"mt0:\\n\", mt0)\nprint(\"mt1:\\n\", mt1)\n\ntry:\n    mt0 + mt1\nexcept ValueError as e:\n    print(\"mt0 + mt1 failed. Error: \", e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, if this behavior is desired, MaskedTensor does support these\nsemantics by giving access to the data and masks and conveniently\nconverting a MaskedTensor to a Tensor with masked values filled in using\n`to_tensor`. 
For example:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "t0 = mt0.to_tensor(0)\nt1 = mt1.to_tensor(0)\nmt2 = masked_tensor(t0 + t1, mt0.get_mask() & mt1.get_mask())\n\nprint(\"t0:\\n\", t0)\nprint(\"t1:\\n\", t1)\nprint(\"mt2 (t0 + t1):\\n\", mt2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the mask is [mt0.get\\_mask() & mt1.get\\_mask()]{.title-ref}\nsince `MaskedTensor`{.interpreted-text role=\"class\"}\\'s mask is the\ninverse of NumPy\\'s.\n\nReduction Semantics\n===================\n\nRecall in [MaskedTensor\\'s Overview\ntutorial](https://pytorch.org/tutorials/prototype/maskedtensor_overview.html)\nwe discussed \\\"Implementing missing torch.nan\\* ops\\\". Those are\nexamples of reductions \\-- operators that remove one (or more)\ndimensions from a Tensor and then aggregate the result. In this section,\nwe will use reduction semantics to motivate our strict requirements\naround matching masks from above.\n\nFundamentally, :class:\\`MaskedTensor\\`s perform the same reduction\noperation while ignoring the masked out (unspecified) values. 
By way of\nexample:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = torch.arange(12, dtype=torch.float).reshape(3, 4)\nmask = torch.randint(2, (3, 4), dtype=torch.bool)\nmt = masked_tensor(data, mask)\n\nprint(\"data:\\n\", data)\nprint(\"mask:\\n\", mask)\nprint(\"mt:\\n\", mt)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, the different reductions (all on `dim=1`):\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"torch.sum:\\n\", torch.sum(mt, 1))\nprint(\"torch.mean:\\n\", torch.mean(mt, 1))\nprint(\"torch.prod:\\n\", torch.prod(mt, 1))\nprint(\"torch.amin:\\n\", torch.amin(mt, 1))\nprint(\"torch.amax:\\n\", torch.amax(mt, 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of note, the value under a masked-out element is not guaranteed to have\nany specific value, especially if the row or column is entirely masked\nout (the same is true for normalizations). For more details on masked\nsemantics, see this\n[RFC](https://github.com/pytorch/rfcs/pull/27).\n\nNow, we can revisit the question: why do we enforce the invariant that\nmasks must match for binary operators? In other words, why don't we use\nthe same semantics as `np.ma.masked_array`? 
Consider the following\nexample:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data0 = torch.arange(10.).reshape(2, 5)\ndata1 = torch.arange(10.).reshape(2, 5) + 10\nmask0 = torch.tensor([[True, True, False, False, False], [False, False, False, True, True]])\nmask1 = torch.tensor([[False, False, False, True, True], [True, True, False, False, False]])\nnpm0 = np.ma.masked_array(data0.numpy(), (mask0).numpy())\nnpm1 = np.ma.masked_array(data1.numpy(), (mask1).numpy())\n\nprint(\"npm0:\", npm0)\nprint(\"npm1:\", npm1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's try addition:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"(npm0 + npm1).sum(0):\\n\", (npm0 + npm1).sum(0))\nprint(\"npm0.sum(0) + npm1.sum(0):\\n\", npm0.sum(0) + npm1.sum(0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sum and addition should clearly be associative, but with NumPy's\nsemantics, they are not, which can certainly be confusing for the user.\n\n`MaskedTensor`, on the other hand, will\nsimply not allow this operation since `mask0 != mask1`. 
That\nbeing said, if the user wishes, there are ways around this (for example,\nfilling in the MaskedTensor's undefined elements with 0 values using\n`to_tensor`, as shown below), but the\nuser must now be more explicit with their intentions.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mt0 = masked_tensor(data0, ~mask0)\nmt1 = masked_tensor(data1, ~mask1)\n\n(mt0.to_tensor(0) + mt1.to_tensor(0)).sum(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Conclusion\n==========\n\nIn this tutorial, we have learned about the different design decisions\nbehind MaskedTensor and NumPy's MaskedArray, as well as reduction\nsemantics. In general, MaskedTensor is designed to avoid ambiguity and\nconfusing semantics (for example, we try to preserve the associative\nproperty amongst binary operations), which in turn can require the\nuser to be more intentional with their code at times, but we believe\nthis to be the better approach. If you have any thoughts on this, please\n[let us know](https://github.com/pytorch/pytorch/issues)!\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }