{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://codelin.vip/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyTorch Numeric Suite Tutorial\n==============================\n\nIntroduction\n------------\n\nQuantization is good when it works, but it's difficult to know what\\'s\nwrong when it doesn\\'t satisfy the accuracy we expect. Debugging the\naccuracy issue of quantization is not easy and time consuming.\n\nOne important step of debugging is to measure the statistics of the\nfloat model and its corresponding quantized model to know where are they\ndiffer most. We built a suite of numeric tools called PyTorch Numeric\nSuite in PyTorch quantization to enable the measurement of the\nstatistics between quantized module and float module to support\nquantization debugging efforts. Even for the quantized model with good\naccuracy, PyTorch Numeric Suite can still be used as the profiling tool\nto better understand the quantization error within the model and provide\nthe guidance for further optimization.\n\nPyTorch Numeric Suite currently supports models quantized through both\nstatic quantization and dynamic quantization with unified APIs.\n\nIn this tutorial we will first use ResNet18 as an example to show how to\nuse PyTorch Numeric Suite to measure the statistics between static\nquantized model and float model in eager mode. Then we will use LSTM\nbased sequence model as an example to show the usage of PyTorch Numeric\nSuite for dynamic quantized model.\n\nNumeric Suite for Static Quantization\n-------------------------------------\n\n### Setup\n\nWe'll start by doing the necessary imports:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\nimport torch\nimport torch.nn as nn\nimport torchvision\nfrom torchvision import models, datasets\nimport torchvision.transforms as transforms\nimport os\nimport torch.quantization\nimport torch.quantization._numeric_suite as ns\nfrom torch.quantization import (\n default_eval_fn,\n default_qconfig,\n quantize,\n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we load the pretrained float ResNet18 model, and quantize it into\nqmodel. We cannot compare two arbitrary models, only a float model and\nthe quantized model derived from it can be compared.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "float_model = torchvision.models.quantization.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1, quantize=False)\nfloat_model.to('cpu')\nfloat_model.eval()\nfloat_model.fuse_model()\nfloat_model.qconfig = torch.quantization.default_qconfig\nimg_data = [(torch.rand(2, 3, 10, 10, dtype=torch.float), torch.randint(0, 1, (2,), dtype=torch.long)) for _ in range(2)]\nqmodel = quantize(float_model, default_eval_fn, [img_data], inplace=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Compare the weights of float and quantized models\n====================================================\n\nThe first thing we usually want to compare are the weights of quantized\nmodel and float model. 
" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def compute_error(x, y):\n    # SQNR in dB: ratio of signal power to quantization-noise power\n    Ps = torch.norm(x)      # power of the original (float) signal\n    Pn = torch.norm(x - y)  # power of the quantization error\n    return 20 * torch.log10(Ps / Pn)\n\nfor key in wt_compare_dict:\n    print(key, compute_error(wt_compare_dict[key]['float'], wt_compare_dict[key]['quantized'].dequantize()))" ] },
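{ "cell_type": "markdown", "metadata": {}, "source": [ "Since `wt_compare_dict` is a plain dictionary, we can also post-process\nit for profiling. As a small sketch (not part of the Numeric Suite\nAPI), the hypothetical snippet below ranks the layers by weight SQNR so\nthat the layers with the largest quantization error surface first.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Hypothetical profiling helper: rank layers by weight SQNR (ascending),\n# so the most damaged layers come first.\nsqnr_by_layer = {\n    key: compute_error(value['float'], value['quantized'].dequantize()).item()\n    for key, value in wt_compare_dict.items()\n}\nfor key, sqnr in sorted(sqnr_by_layer.items(), key=lambda kv: kv[1]):\n    print(f\"{key}: {sqnr:.2f} dB\")" ] },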
{ "cell_type": "markdown", "metadata": {}, "source": [ "As another example, `wt_compare_dict` can also be used to plot\nhistograms of the weights of the floating point and quantized models.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nf = wt_compare_dict['conv1.weight']['float'].flatten()\nplt.hist(f, bins=100)\nplt.title(\"Floating point model weights of conv1\")\nplt.show()\n\nq = wt_compare_dict['conv1.weight']['quantized'].flatten().dequantize()\nplt.hist(q, bins=100)\nplt.title(\"Quantized model weights of conv1\")\nplt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "2. Compare floating point and quantized models at corresponding locations\n======================================================================\n\nThe second tool allows for comparison of weights and activations\nbetween float and quantized models at corresponding locations for the\nsame input, as shown in the figure below. Red arrows indicate the\nlocations of the comparison.\n\n![](https://pytorch.org/tutorials/_static/img/compare_output.png)\n\nWe call `compare_model_outputs()` from PyTorch Numeric Suite to get the\nactivations in the float model and the quantized model at corresponding\nlocations for the given input data. This API returns a dict with module\nnames as keys. Each entry is itself a dict with two keys 'float' and\n'quantized', containing the activations.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = img_data[0][0]\n\n# Takes in the floating point and quantized models as well as input data, and returns a dict with keys\n# corresponding to the quantized module names; each entry is a dictionary with two keys 'float' and\n# 'quantized', containing the activations of the floating point and quantized models at matching locations.\nact_compare_dict = ns.compare_model_outputs(float_model, qmodel, data)\n\nprint('keys of act_compare_dict:')\nprint(act_compare_dict.keys())\n\nprint(\"\\nkeys of act_compare_dict entry for conv1's output:\")\nprint(act_compare_dict['conv1.stats'].keys())\nprint(act_compare_dict['conv1.stats']['float'][0].shape)\nprint(act_compare_dict['conv1.stats']['quantized'][0].shape)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This dict can be used to compare and compute the quantization error of\nthe activations of the float and quantized models as follows.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for key in act_compare_dict:\n    print(key, compute_error(act_compare_dict[key]['float'][0], act_compare_dict[key]['quantized'][0].dequantize()))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "If we want to do the comparison for more than one input, we can do the\nfollowing: prepare the model by attaching the logger to both the\nfloating point and quantized modules if they are in the `white_list`.\nThe default logger is `OutputLogger`, and the default `white_list` is\n`DEFAULT_NUMERIC_SUITE_COMPARE_MODEL_OUTPUT_WHITE_LIST`.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ns.prepare_model_outputs(float_model, qmodel)\n\nfor data in img_data:\n    float_model(data[0])\n    qmodel(data[0])\n\n# Find the matching activations between the floating point and quantized modules, and return a dict with keys\n# corresponding to quantized module names; each entry is a dictionary with two keys 'float'\n# and 'quantized', containing the matching floating point and quantized activations logged by the logger\nact_compare_dict = ns.get_matching_activations(float_model, qmodel)" ] },
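{ "cell_type": "markdown", "metadata": {}, "source": [ "With several inputs logged, each entry of `act_compare_dict` should\nhold one activation per forward call, as the single-input example above\nsuggests. Under that assumption, a minimal sketch to aggregate the\nSQNR over the logged calls for each module:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# A minimal sketch: average the per-call SQNR of each module's activations.\n# Assumes each entry holds one logged activation per forward call.\nfor key in act_compare_dict:\n    float_acts = act_compare_dict[key]['float']\n    quant_acts = act_compare_dict[key]['quantized']\n    errors = [compute_error(f, q.dequantize()) for f, q in zip(float_acts, quant_acts)]\n    print(key, sum(errors) / len(errors))" ] },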
{ "cell_type": "markdown", "metadata": {}, "source": [ "The default logger used in the above APIs is `OutputLogger`, which logs\nthe outputs of the modules. We can inherit from the base `Logger` class\nand create our own logger to perform different functionality. For\nexample, we can make a new `MyOutputLogger` class as below.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class MyOutputLogger(ns.Logger):\n    r\"\"\"Customized logger class\n    \"\"\"\n\n    def __init__(self):\n        super(MyOutputLogger, self).__init__()\n\n    def forward(self, x):\n        # Custom functionalities\n        # ...\n        return x" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We can then pass this logger into the above APIs, for example:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = img_data[0][0]\nact_compare_dict = ns.compare_model_outputs(float_model, qmodel, data, logger_cls=MyOutputLogger)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "or:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ns.prepare_model_outputs(float_model, qmodel, MyOutputLogger)\nfor data in img_data:\n    float_model(data[0])\n    qmodel(data[0])\nact_compare_dict = ns.get_matching_activations(float_model, qmodel)" ] },
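{ "cell_type": "markdown", "metadata": {}, "source": [ "As a concrete illustration, here is a hypothetical `StatsOutputLogger`\nthat records only per-call summary statistics instead of full tensors,\nwhich can save memory when logging many inputs. This sketch assumes\nthat the base `Logger` exposes a `stats` dictionary keyed the same way\n`OutputLogger` keys it.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class StatsOutputLogger(ns.Logger):\n    r\"\"\"Hypothetical logger that stores per-call summary statistics\n    (min/max/mean) instead of full tensors, to reduce memory use.\n    \"\"\"\n\n    def __init__(self):\n        super(StatsOutputLogger, self).__init__()\n        self.stats[\"tensor_val\"] = []  # assumed: same key OutputLogger uses\n\n    def forward(self, x):\n        # Dequantize first so min/max/mean work for both models\n        t = x.dequantize() if x.is_quantized else x\n        self.stats[\"tensor_val\"].append((t.min().item(), t.max().item(), t.mean().item()))\n        return x" ] },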
{ "cell_type": "markdown", "metadata": {}, "source": [ "3. Compare a module in a quantized model with its floating point equivalent, with the same input data\n==================================================================================================\n\nThe third tool allows for comparing a quantized module in a model with\nits floating point counterpart, feeding both of them the same input and\ncomparing their outputs, as shown below.\n\n![](https://pytorch.org/tutorials/_static/img/compare_stub.png)\n\nIn practice we call `prepare_model_with_stubs()` to swap the quantized\nmodule that we want to compare with the Shadow module, as illustrated\nbelow:\n\n![](https://pytorch.org/tutorials/_static/img/shadow.png)\n\nThe Shadow module takes the quantized module, the float module, and a\nlogger as input, and creates an internal forward path so that the float\nmodule shadows the quantized module while sharing the same input\ntensor.\n\nThe logger is customizable; the default logger is `ShadowLogger`, and\nit saves the outputs of the quantized module and the float module,\nwhich can be used to compute the module-level quantization error.\n
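\nTo make the mechanism concrete, here is a minimal sketch of the Shadow\nidea (not the actual torch implementation): the float module runs on a\ndequantized copy of the input that feeds the quantized module, and the\nlogger records both outputs.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class SimpleShadow(nn.Module):\n    r\"\"\"Illustrative sketch of the Shadow idea, not the actual torch\n    implementation.\n    \"\"\"\n\n    def __init__(self, q_module, float_module, logger_cls):\n        super(SimpleShadow, self).__init__()\n        self.orig_module = q_module\n        self.shadow_module = float_module\n        self.logger = logger_cls()\n\n    def forward(self, x):\n        output = self.orig_module(x)\n        shadow_input = x.dequantize() if x.is_quantized else x\n        shadow_output = self.shadow_module(shadow_input)\n        # ShadowLogger-style loggers take both outputs: forward(x, y)\n        self.logger(output, shadow_output)\n        return output" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "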
\nNote that before each call of `compare_model_outputs()` and\n`compare_model_stub()` we need clean float and quantized models. This\nis because these APIs modify the float and quantized models in place,\nwhich will cause unexpected results if one is called right after the\nother.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "float_model = torchvision.models.quantization.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1, quantize=False)\nfloat_model.to('cpu')\nfloat_model.eval()\nfloat_model.fuse_model()\nfloat_model.qconfig = torch.quantization.default_qconfig\nimg_data = [(torch.rand(2, 3, 10, 10, dtype=torch.float), torch.randint(0, 1, (2,), dtype=torch.long)) for _ in range(2)]\nqmodel = quantize(float_model, default_eval_fn, [img_data], inplace=False)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "In the following example we call `compare_model_stub()` from PyTorch\nNumeric Suite to compare the `QuantizableBasicBlock` module with its\nfloating point equivalent. This API returns a dict with keys\ncorresponding to module names and each entry being a dictionary with\ntwo keys 'float' and 'quantized', containing the output tensors of the\nquantized module and its matching float shadow module.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = img_data[0][0]\nmodule_swap_list = [torchvision.models.quantization.resnet.QuantizableBasicBlock]\n\n# Takes in the floating point and quantized models as well as input data, and returns a dict with keys\n# corresponding to module names; each entry is a dictionary with two keys 'float' and\n# 'quantized', containing the output tensors of the quantized module and its matching floating point shadow module.\nob_dict = ns.compare_model_stub(float_model, qmodel, module_swap_list, data)\n\nprint('keys of ob_dict:')\nprint(ob_dict.keys())\n\nprint(\"\\nkeys of ob_dict entry for layer1.0's output:\")\nprint(ob_dict['layer1.0.stats'].keys())\nprint(ob_dict['layer1.0.stats']['float'][0].shape)\nprint(ob_dict['layer1.0.stats']['quantized'][0].shape)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This dict can then be used to compare and compute the module-level\nquantization error.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for key in ob_dict:\n    print(key, compute_error(ob_dict[key]['float'][0], ob_dict[key]['quantized'][0].dequantize()))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "If we want to do the comparison for more than one input, we can do the\nfollowing.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ns.prepare_model_with_stubs(float_model, qmodel, module_swap_list, ns.ShadowLogger)\nfor data in img_data:\n    qmodel(data[0])\nob_dict = ns.get_logger_dict(qmodel)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The default logger used in the above APIs is `ShadowLogger`, which logs\nthe outputs of the quantized module and its matching float shadow\nmodule. We can inherit from the base `Logger` class and create our own\nlogger to perform different functionality. For example, we can make a\nnew `MyShadowLogger` class as below.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class MyShadowLogger(ns.Logger):\n    r\"\"\"Customized logger class\n    \"\"\"\n\n    def __init__(self):\n        super(MyShadowLogger, self).__init__()\n\n    def forward(self, x, y):\n        # Custom functionalities\n        # ...\n        return x" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We can then pass this logger into the above APIs, for example:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = img_data[0][0]\nob_dict = ns.compare_model_stub(float_model, qmodel, module_swap_list, data, logger_cls=MyShadowLogger)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "or:\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ns.prepare_model_with_stubs(float_model, qmodel, module_swap_list, MyShadowLogger)\nfor data in img_data:\n    qmodel(data[0])\nob_dict = ns.get_logger_dict(qmodel)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Numeric Suite for Dynamic Quantization\n======================================\n\nNumeric Suite APIs are designed in such a way that they work for both\ndynamically and statically quantized models. We will use a model with\nboth LSTM and Linear modules to demonstrate the usage of Numeric Suite\non a dynamically quantized model. This is the same model used in the\ntutorial on dynamic quantization of an LSTM word language model \\[1\\].\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Setup\n=====\n\nFirst we define the model as below. Notice that within this model only\nthe `nn.LSTM` and `nn.Linear` modules will be quantized dynamically;\n`nn.Embedding` will remain a floating point module after quantization.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class LSTMModel(nn.Module):\n    \"\"\"Container module with an encoder, a recurrent module, and a decoder.\"\"\"\n\n    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):\n        super(LSTMModel, self).__init__()\n        self.encoder = nn.Embedding(ntoken, ninp)\n        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)\n        self.decoder = nn.Linear(nhid, ntoken)\n\n        self.init_weights()\n\n        self.nhid = nhid\n        self.nlayers = nlayers\n\n    def init_weights(self):\n        initrange = 0.1\n        self.encoder.weight.data.uniform_(-initrange, initrange)\n        self.decoder.bias.data.zero_()\n        self.decoder.weight.data.uniform_(-initrange, initrange)\n\n    def forward(self, input, hidden):\n        emb = self.encoder(input)\n        output, hidden = self.rnn(emb, hidden)\n        decoded = self.decoder(output)\n        return decoded, hidden\n\n    def init_hidden(self, bsz):\n        weight = next(self.parameters())\n        return (weight.new_zeros(self.nlayers, bsz, self.nhid),\n                weight.new_zeros(self.nlayers, bsz, self.nhid))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Then we create the `float_model` and quantize it into `qmodel`.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ntokens = 10\n\nfloat_model = LSTMModel(\n    ntoken=ntokens,\n    ninp=512,\n    nhid=256,\n    nlayers=5,\n)\n\nfloat_model.eval()\n\nqmodel = torch.quantization.quantize_dynamic(\n    float_model, {nn.LSTM, nn.Linear}, dtype=torch.qint8\n)" ] },
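{ "cell_type": "markdown", "metadata": {}, "source": [ "To confirm which modules were converted, we can print the quantized\nmodel: the embedding should remain a float `nn.Embedding`, while the\nLSTM and Linear modules are replaced by their dynamically quantized\ncounterparts.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(qmodel)" ] },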
{ "cell_type": "markdown", "metadata": {}, "source": [ "1. Compare the weights of float and quantized models\n====================================================\n\nWe first call `compare_weights()` from PyTorch Numeric Suite to get a\ndictionary `wt_compare_dict` with keys corresponding to module names\nand each entry being a dictionary with two keys 'float' and\n'quantized', containing the float and quantized weights.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "wt_compare_dict = ns.compare_weights(float_model.state_dict(), qmodel.state_dict())" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Once we get `wt_compare_dict`, it can be used to compare and compute\nthe quantization error of the weights of the float and quantized models\nas follows.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for key in wt_compare_dict:\n    if wt_compare_dict[key]['quantized'].is_quantized:\n        print(key, compute_error(wt_compare_dict[key]['float'], wt_compare_dict[key]['quantized'].dequantize()))\n    else:\n        print(key, compute_error(wt_compare_dict[key]['float'], wt_compare_dict[key]['quantized']))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The `inf` value in the `encoder.weight` entry above appears because the\nencoder module is not quantized, so its weights are identical in the\nfloating point and quantized models and the quantization noise is zero.\n\n2. Compare floating point and quantized models at corresponding locations\n======================================================================\n\nThen we call `compare_model_outputs()` from PyTorch Numeric Suite to\nget the activations in the float model and the quantized model at\ncorresponding locations for the given input data. This API returns a\ndict with module names as keys. Each entry is itself a dict with two\nkeys 'float' and 'quantized', containing the activations. Notice that\nthis sequence model has two inputs, and we can pass both inputs into\n`compare_model_outputs()` and `compare_model_stub()`.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "input_ = torch.randint(ntokens, (1, 1), dtype=torch.long)\nhidden = float_model.init_hidden(1)\n\nact_compare_dict = ns.compare_model_outputs(float_model, qmodel, input_, hidden)\nprint(act_compare_dict.keys())" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This dict can be used to compare and compute the quantization error of\nthe activations of the float and quantized models as follows. The LSTM\nmodule in this model has two outputs; in this example we compute the\nerror of the first output.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for key in act_compare_dict:\n    print(key, compute_error(act_compare_dict[key]['float'][0][0], act_compare_dict[key]['quantized'][0][0]))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "3. Compare a module in a quantized model with its floating point equivalent, with the same input data\n==================================================================================================\n\nNext we compare the LSTM and Linear modules with their floating point\nequivalents using `compare_model_stub()`. 
We reset the models\nfirst.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "float_model = LSTMModel(\n    ntoken=ntokens,\n    ninp=512,\n    nhid=256,\n    nlayers=5,\n)\nfloat_model.eval()\n\nqmodel = torch.quantization.quantize_dynamic(\n    float_model, {nn.LSTM, nn.Linear}, dtype=torch.qint8\n)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Then we call `compare_model_stub()` from PyTorch Numeric Suite to\ncompare the LSTM and Linear modules with their floating point\nequivalents. This API returns a dict with keys corresponding to module\nnames and each entry being a dictionary with two keys 'float' and\n'quantized', containing the output tensors of the quantized module and\nits matching float shadow module.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "module_swap_list = [nn.Linear, nn.LSTM]\nob_dict = ns.compare_model_stub(float_model, qmodel, module_swap_list, input_, hidden)\nprint(ob_dict.keys())" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This dict can then be used to compare and compute the module-level\nquantization error.\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for key in ob_dict:\n    print(key, compute_error(ob_dict[key]['float'][0], ob_dict[key]['quantized'][0]))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "An SQNR of 40 dB is high; in this situation we have very good numerical\nalignment between the floating point and quantized models.\n\nConclusion\n==========\n\nIn this tutorial, we demonstrated how to use PyTorch Numeric Suite to\nmeasure and compare the statistics between a quantized model and its\nfloat counterpart in eager mode, with unified APIs for both static\nquantization and dynamic quantization.\n\nThanks for reading! As always, we welcome any feedback, so please\ncreate an issue [here](https://github.com/pytorch/pytorch/issues) if\nyou have any.\n\nReferences\n==========\n\n\\[1\\] [Dynamic Quantization on an LSTM Word Language\nModel](https://pytorch.org/tutorials/advanced/dynamic_quantization_tutorial.html).\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }