{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://codelin.vip/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Model Freezing in TorchScript\n=============================\n\n```{=html}\n
<div class=\"admonition warning\">\n<p class=\"admonition-title\">WARNING:</p>\n<p>TorchScript is no longer in active development.</p>\n</div>
\n```\nIn this tutorial, we introduce the syntax for *model freezing* in\nTorchScript. Freezing is the process of inlining PyTorch module\nparameters and attribute values into the TorchScript internal\nrepresentation. Parameter and attribute values are treated as final\nand cannot be modified in the resulting frozen module.\n\nBasic Syntax\n------------\n\nModel freezing can be invoked using the API below:\n\n> `torch.jit.freeze(mod, preserved_attrs=None, optimize_numerics=True) -> ScriptModule`\n\nNote that the input module can be the result of either scripting or\ntracing. See the\n[torch.jit.freeze documentation](https://pytorch.org/docs/stable/generated/torch.jit.freeze.html)\nfor details.\n\nNext, we demonstrate how freezing works using an example:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch, time\n\nclass Net(torch.nn.Module):\n    def __init__(self):\n        super(Net, self).__init__()\n        self.conv1 = torch.nn.Conv2d(1, 32, 3, 1)\n        self.conv2 = torch.nn.Conv2d(32, 64, 3, 1)\n        self.dropout1 = torch.nn.Dropout2d(0.25)\n        # After flatten the activations are 2D, so use plain Dropout here\n        # (Dropout2d expects spatial, 4D input).\n        self.dropout2 = torch.nn.Dropout(0.5)\n        self.fc1 = torch.nn.Linear(9216, 128)\n        self.fc2 = torch.nn.Linear(128, 10)\n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = torch.nn.functional.relu(x)\n        x = self.conv2(x)\n        x = torch.nn.functional.max_pool2d(x, 2)\n        x = self.dropout1(x)\n        x = torch.flatten(x, 1)\n        x = self.fc1(x)\n        x = torch.nn.functional.relu(x)\n        x = self.dropout2(x)\n        x = self.fc2(x)\n        output = torch.nn.functional.log_softmax(x, dim=1)\n        return output\n\n    @torch.jit.export\n    def version(self):\n        return 1.0\n\nnet = torch.jit.script(Net())\nnet.eval()  # freezing requires eval mode; torch.jit.freeze raises otherwise\nfnet = torch.jit.freeze(net)\n\nprint(net.conv1.weight.size())\nprint(net.conv1.bias)\n\ntry:\n    print(fnet.conv1.bias)\n    # without exception handling, prints:\n    # RuntimeError: __torch__.z.___torch_mangle_3.Net does not have a field\n    # with name 'conv1'\nexcept RuntimeError:\n    print(\"field 'conv1' is inlined. It does not exist in 'fnet'\")\n\ntry:\n    fnet.version()\n    # without exception handling, prints:\n    # RuntimeError: __torch__.z.___torch_mangle_3.Net does not have a field\n    # with name 'version'\nexcept RuntimeError:\n    print(\"method 'version' is removed from 'fnet'. Only 'forward' is preserved by default\")\n\n# Preserve 'version' explicitly by listing it in the second argument.\nfnet2 = torch.jit.freeze(net, [\"version\"])\n\nprint(fnet2.version())\n\nB = 1\nwarmup = 1\niters = 1000\ninput = torch.rand(B, 1, 28, 28)\n\nstart = time.time()\nfor i in range(warmup):\n    net(input)\nend = time.time()\nprint(\"Scripted - Warm up time: {0:7.4f}\".format(end-start), flush=True)\n\nstart = time.time()\nfor i in range(warmup):\n    fnet(input)\nend = time.time()\nprint(\"Frozen - Warm up time: {0:7.4f}\".format(end-start), flush=True)\n\nstart = time.time()\nfor i in range(iters):\n    input = torch.rand(B, 1, 28, 28)\n    net(input)\nend = time.time()\nprint(\"Scripted - Inference: {0:5.2f}\".format(end-start), flush=True)\n\nstart = time.time()\nfor i in range(iters):\n    input = torch.rand(B, 1, 28, 28)\n    fnet(input)\nend = time.time()\nprint(\"Frozen - Inference time: {0:5.2f}\".format(end-start), flush=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On my machine, I measured the time:\n\n- Scripted - Warm up time: 0.0107\n- Frozen - Warm up time: 0.0048\n- Scripted - Inference: 1.35\n- Frozen - Inference time: 1.17\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our example, warm-up time measures the first run of each model. The\nfrozen model warms up in less than half the time of the scripted model\n(0.0048s vs. 0.0107s). On some more complex models, we have observed\neven larger warm-up speedups. Freezing achieves this speedup because it\ndoes, ahead of time, some of the work TorchScript would otherwise do\nduring the first couple of runs; the graph comparison below shows this\ninlining directly.\n\nInference time measures execution time after the model is warmed up.\nAlthough we observed significant variation in execution time, the\nfrozen model is often about 15% faster than the scripted model. When\nthe input is larger, we observe a smaller speedup because execution is\ndominated by tensor operations.\n" ] },
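{ "cell_type": "markdown", "metadata": {}, "source": [ "To see the inlining at work, we can compare the TorchScript graphs of\nthe scripted and frozen modules. The sketch below peeks at TorchScript\ninternals (the `graph` attribute and the `prim::GetAttr` node kind), so\ntreat it as illustrative rather than a stable API: the scripted graph\nfetches parameters and submodules with `prim::GetAttr` nodes, while the\nfrozen graph has baked their values in as constants.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Count attribute lookups in each graph. The scripted module fetches its\n# parameters and submodules via prim::GetAttr nodes; freezing inlines those\n# values as constants, so the count should drop to (near) zero.\ndef count_getattr(graph):\n    return sum(1 for node in graph.nodes() if node.kind() == \"prim::GetAttr\")\n\nprint(\"prim::GetAttr nodes in scripted graph:\", count_getattr(net.graph))\nprint(\"prim::GetAttr nodes in frozen graph:\", count_getattr(fnet.graph))" ] },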
{ "cell_type": "markdown", "metadata": {}, "source": [ "Conclusion\n==========\n\nIn this tutorial, we learned about model freezing. Freezing is a useful\ntechnique for optimizing models for inference, and it can also\nsignificantly reduce TorchScript warm-up time.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }