{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# For tips on running notebooks in Google Colab, see\n# https://codelin.vip/beginner/colab\n%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Deploying a Seq2Seq Model with TorchScript\n==========================================\n\n**Author:** [Matthew Inkawhich](https://github.com/MatthewInkawhich)\n\n```{=html}\n<div style=\"background-color: #e94f3b; color: #fff; font-weight: 700; padding-left: 10px; padding-top: 5px; padding-bottom: 5px\"><strong>WARNING:</strong></div>\n```\n```{=html}\n<div style=\"background-color: #f3f4f7; padding-left: 10px; padding-top: 10px; padding-bottom: 10px; padding-right: 10px\">\n```\n```{=html}\n<p>TorchScript is no longer in active development.</p>\n```\n```{=html}\n</div>\n```\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This tutorial will walk through the process of transitioning a\nsequence-to-sequence model to TorchScript using the TorchScript API. The\nmodel that we will convert is the chatbot model from the [Chatbot\ntutorial](https://pytorch.org/tutorials/beginner/chatbot_tutorial.html).\nYou can either treat this tutorial as a \"Part 2\" to the Chatbot tutorial\nand deploy your own pretrained model, or you can start with this\ndocument and use a pretrained model that we host. In the latter case,\nyou can reference the original Chatbot tutorial for details regarding\ndata preprocessing, model theory and definition, and model training.\n\nWhat is TorchScript?\n====================\n\nDuring the research and development phase of a deep learning-based\nproject, it is advantageous to interact with an **eager**, imperative\ninterface like PyTorch's. This gives users the ability to write\nfamiliar, idiomatic Python, allowing for the use of Python data\nstructures, control flow operations, print statements, and debugging\nutilities. Although the eager interface is a beneficial tool for\nresearch and experimentation applications, when it comes time to deploy\nthe model in a production environment, having a **graph**-based model\nrepresentation is very beneficial. A deferred graph representation\nallows for optimizations such as out-of-order execution, and the ability\nto target highly optimized hardware architectures. Also, a graph-based\nrepresentation enables framework-agnostic model exportation. PyTorch\nprovides mechanisms for incrementally converting eager-mode code into\nTorchScript, a statically analyzable and optimizable subset of Python\nthat Torch uses to represent deep learning programs independently from\nthe Python runtime.\n\nThe API for converting eager-mode PyTorch programs into TorchScript is\nfound in the `torch.jit` module. This module has two core modalities for\nconverting an eager-mode model to a TorchScript graph representation:\n**tracing** and **scripting**. The `torch.jit.trace` function takes a\nmodule or function and a set of example inputs. It then runs the example\ninput through the function or module while tracing the computational\nsteps that are encountered, and outputs a graph-based function that\nperforms the traced operations. **Tracing** is great for straightforward\nmodules and functions that do not involve data-dependent control flow,\nsuch as standard convolutional neural networks. However, if a function\nwith data-dependent if statements and loops is traced, only the\noperations called along the execution route taken by the example input\nwill be recorded. In other words, the control flow itself is not\ncaptured. To convert modules and functions containing data-dependent\ncontrol flow, a **scripting** mechanism is provided. The\n`torch.jit.script` function/decorator takes a module or function and\ndoes not requires example inputs. Scripting then explicitly converts the\nmodule or function code to TorchScript, including all control flows. One\ncaveat with using scripting is that it only supports a subset of Python,\nso you might need to rewrite the code to make it compatible with the\nTorchScript syntax.\n\nFor all details relating to the supported features, see the [TorchScript\nlanguage reference](https://pytorch.org/docs/master/jit.html). To\nprovide the maximum flexibility, you can also mix tracing and scripting\nmodes together to represent your whole program, and these techniques can\nbe applied incrementally.\n\n![](https://pytorch.org/tutorials/_static/img/chatbot/pytorch_workflow.png){.align-center}\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Acknowledgments\n===============\n\nThis tutorial was inspired by the following sources:\n\n1)  Yuan-Kuei Wu\\'s pytorch-chatbot implementation:\n    <https://github.com/ywk991112/pytorch-chatbot>\n2)  Sean Robertson\\'s practical-pytorch seq2seq-translation example:\n    <https://github.com/spro/practical-pytorch/tree/master/seq2seq-translation>\n3)  FloydHub\\'s Cornell Movie Corpus preprocessing code:\n    <https://github.com/floydhub/textutil-preprocess-cornell-movie-corpus>\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Prepare Environment\n===================\n\nFirst, we will import the required modules and set some constants. If\nyou are planning on using your own model, be sure that the `MAX_LENGTH`\nconstant is set correctly. As a reminder, this constant defines the\nmaximum allowed sentence length during training and the maximum length\noutput that the model is capable of producing.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport re\nimport os\nimport unicodedata\nimport numpy as np\n\ndevice = torch.device(\"cpu\")\n\n\nMAX_LENGTH = 10  # Maximum sentence length\n\n# Default word tokens\nPAD_token = 0  # Used for padding short sentences\nSOS_token = 1  # Start-of-sentence token\nEOS_token = 2  # End-of-sentence token"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Model Overview\n==============\n\nAs mentioned, the model that we are using is a\n[sequence-to-sequence](https://arxiv.org/abs/1409.3215) (seq2seq) model.\nThis type of model is used in cases when our input is a variable-length\nsequence, and our output is also a variable length sequence that is not\nnecessarily a one-to-one mapping of the input. A seq2seq model is\ncomprised of two recurrent neural networks (RNNs) that work\ncooperatively: an **encoder** and a **decoder**.\n\n![](https://pytorch.org/tutorials/_static/img/chatbot/seq2seq_ts.png){.align-center}\n\nImage source:\n<https://jeddy92.github.io/JEddy92.github.io/ts_seq2seq_intro/>\n\nEncoder\n-------\n\nThe encoder RNN iterates through the input sentence one token\n(e.g.\u00a0word) at a time, at each time step outputting an \"output\" vector\nand a \"hidden state\" vector. The hidden state vector is then passed to\nthe next time step, while the output vector is recorded. The encoder\ntransforms the context it saw at each point in the sequence into a set\nof points in a high-dimensional space, which the decoder will use to\ngenerate a meaningful output for the given task.\n\nDecoder\n-------\n\nThe decoder RNN generates the response sentence in a token-by-token\nfashion. It uses the encoder's context vectors, and internal hidden\nstates to generate the next word in the sequence. It continues\ngenerating words until it outputs an *EOS\\_token*, representing the end\nof the sentence. We use an [attention\nmechanism](https://arxiv.org/abs/1409.0473) in our decoder to help it to\n\"pay attention\" to certain parts of the input when generating the\noutput. For our model, we implement [Luong et\nal.](https://arxiv.org/abs/1508.04025)'s \"Global attention\" module, and\nuse it as a submodule in our decode model.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Data Handling\n=============\n\nAlthough our models conceptually deal with sequences of tokens, in\nreality, they deal with numbers like all machine learning models do. In\nthis case, every word in the model's vocabulary, which was established\nbefore training, is mapped to an integer index. We use a `Voc` object to\ncontain the mappings from word to index, as well as the total number of\nwords in the vocabulary. We will load the object later before we run the\nmodel.\n\nAlso, in order for us to be able to run evaluations, we must provide a\ntool for processing our string inputs. The `normalizeString` function\nconverts all characters in a string to lowercase and removes all\nnon-letter characters. The `indexesFromSentence` function takes a\nsentence of words and returns the corresponding sequence of word\nindexes.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class Voc:\n    def __init__(self, name):\n        self.name = name\n        self.trimmed = False\n        self.word2index = {}\n        self.word2count = {}\n        self.index2word = {PAD_token: \"PAD\", SOS_token: \"SOS\", EOS_token: \"EOS\"}\n        self.num_words = 3  # Count SOS, EOS, PAD\n\n    def addSentence(self, sentence):\n        for word in sentence.split(' '):\n            self.addWord(word)\n\n    def addWord(self, word):\n        if word not in self.word2index:\n            self.word2index[word] = self.num_words\n            self.word2count[word] = 1\n            self.index2word[self.num_words] = word\n            self.num_words += 1\n        else:\n            self.word2count[word] += 1\n\n    # Remove words below a certain count threshold\n    def trim(self, min_count):\n        if self.trimmed:\n            return\n        self.trimmed = True\n        keep_words = []\n        for k, v in self.word2count.items():\n            if v >= min_count:\n                keep_words.append(k)\n\n        print('keep_words {} / {} = {:.4f}'.format(\n            len(keep_words), len(self.word2index), len(keep_words) / len(self.word2index)\n        ))\n        # Reinitialize dictionaries\n        self.word2index = {}\n        self.word2count = {}\n        self.index2word = {PAD_token: \"PAD\", SOS_token: \"SOS\", EOS_token: \"EOS\"}\n        self.num_words = 3 # Count default tokens\n        for word in keep_words:\n            self.addWord(word)\n\n\n# Lowercase and remove non-letter characters\ndef normalizeString(s):\n    s = s.lower()\n    s = re.sub(r\"([.!?])\", r\" \\1\", s)\n    s = re.sub(r\"[^a-zA-Z.!?]+\", r\" \", s)\n    return s\n\n\n# Takes string sentence, returns sentence of word indexes\ndef indexesFromSentence(voc, sentence):\n    return [voc.word2index[word] for word in sentence.split(' ')] + [EOS_token]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define Encoder\n==============\n\nWe implement our encoder's RNN with the `torch.nn.GRU` module which we\nfeed a batch of sentences (vectors of word embeddings) and it internally\niterates through the sentences one token at a time calculating the\nhidden states. We initialize this module to be bidirectional, meaning\nthat we have two independent GRUs: one that iterates through the\nsequences in chronological order, and another that iterates in reverse\norder. We ultimately return the sum of these two GRUs' outputs. Since\nour model was trained using batching, our `EncoderRNN` model's `forward`\nfunction expects a padded input batch. To batch variable-length\nsentences, we allow a maximum of *MAX\\_LENGTH* tokens in a sentence, and\nall sentences in the batch that have less than *MAX\\_LENGTH* tokens are\npadded at the end with our dedicated *PAD\\_token* tokens. To use padded\nbatches with a PyTorch RNN module, we must wrap the forward pass call\nwith `torch.nn.utils.rnn.pack_padded_sequence` and\n`torch.nn.utils.rnn.pad_packed_sequence` data transformations. Note that\nthe `forward` function also takes an `input_lengths` list, which\ncontains the length of each sentence in the batch. This input is used by\nthe `torch.nn.utils.rnn.pack_padded_sequence` function when padding.\n\nTorchScript Notes:\n------------------\n\nSince the encoder's `forward` function does not contain any\ndata-dependent control flow, we will use **tracing** to convert it to\nscript mode. When tracing a module, we can leave the module definition\nas-is. We will initialize all models towards the end of this document\nbefore we run evaluations.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class EncoderRNN(nn.Module):\n    def __init__(self, hidden_size, embedding, n_layers=1, dropout=0):\n        super(EncoderRNN, self).__init__()\n        self.n_layers = n_layers\n        self.hidden_size = hidden_size\n        self.embedding = embedding\n\n        # Initialize GRU; the ``input_size`` and ``hidden_size`` parameters are both set to 'hidden_size'\n        #   because our input size is a word embedding with number of features == hidden_size\n        self.gru = nn.GRU(hidden_size, hidden_size, n_layers,\n                          dropout=(0 if n_layers == 1 else dropout), bidirectional=True)\n\n    def forward(self, input_seq, input_lengths, hidden=None):\n        # type: (Tensor, Tensor, Optional[Tensor]) -> Tuple[Tensor, Tensor]\n        # Convert word indexes to embeddings\n        embedded = self.embedding(input_seq)\n        # Pack padded batch of sequences for RNN module\n        packed = torch.nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)\n        # Forward pass through GRU\n        outputs, hidden = self.gru(packed, hidden)\n        # Unpack padding\n        outputs, _ = torch.nn.utils.rnn.pad_packed_sequence(outputs)\n        # Sum bidirectional GRU outputs\n        outputs = outputs[:, :, :self.hidden_size] + outputs[:, : ,self.hidden_size:]\n        # Return output and final hidden state\n        return outputs, hidden"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define Decoder's Attention Module\n=================================\n\nNext, we'll define our attention module (`Attn`). Note that this module\nwill be used as a submodule in our decoder model. Luong et al.\u00a0consider\nvarious \"score functions\", which take the current decoder RNN output and\nthe entire encoder output, and return attention \"energies\". This\nattention energies tensor is the same size as the encoder output, and\nthe two are ultimately multiplied, resulting in a weighted tensor whose\nlargest values represent the most important parts of the query sentence\nat a particular time-step of decoding.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Luong attention layer\nclass Attn(nn.Module):\n    def __init__(self, method, hidden_size):\n        super(Attn, self).__init__()\n        self.method = method\n        if self.method not in ['dot', 'general', 'concat']:\n            raise ValueError(self.method, \"is not an appropriate attention method.\")\n        self.hidden_size = hidden_size\n        if self.method == 'general':\n            self.attn = nn.Linear(self.hidden_size, hidden_size)\n        elif self.method == 'concat':\n            self.attn = nn.Linear(self.hidden_size * 2, hidden_size)\n            self.v = nn.Parameter(torch.FloatTensor(hidden_size))\n\n    def dot_score(self, hidden, encoder_output):\n        return torch.sum(hidden * encoder_output, dim=2)\n\n    def general_score(self, hidden, encoder_output):\n        energy = self.attn(encoder_output)\n        return torch.sum(hidden * energy, dim=2)\n\n    def concat_score(self, hidden, encoder_output):\n        energy = self.attn(torch.cat((hidden.expand(encoder_output.size(0), -1, -1), encoder_output), 2)).tanh()\n        return torch.sum(self.v * energy, dim=2)\n\n    def forward(self, hidden, encoder_outputs):\n        # Calculate the attention weights (energies) based on the given method\n        if self.method == 'general':\n            attn_energies = self.general_score(hidden, encoder_outputs)\n        elif self.method == 'concat':\n            attn_energies = self.concat_score(hidden, encoder_outputs)\n        elif self.method == 'dot':\n            attn_energies = self.dot_score(hidden, encoder_outputs)\n\n        # Transpose max_length and batch_size dimensions\n        attn_energies = attn_energies.t()\n\n        # Return the softmax normalized probability scores (with added dimension)\n        return F.softmax(attn_energies, dim=1).unsqueeze(1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define Decoder\n==============\n\nSimilarly to the `EncoderRNN`, we use the `torch.nn.GRU` module for our\ndecoder's RNN. This time, however, we use a unidirectional GRU. It is\nimportant to note that unlike the encoder, we will feed the decoder RNN\none word at a time. We start by getting the embedding of the current\nword and applying a\n[dropout](https://pytorch.org/docs/stable/nn.html?highlight=dropout#torch.nn.Dropout).\nNext, we forward the embedding and the last hidden state to the GRU and\nobtain a current GRU output and hidden state. We then use our `Attn`\nmodule as a layer to obtain the attention weights, which we multiply by\nthe encoder's output to obtain our attended encoder output. We use this\nattended encoder output as our `context` tensor, which represents a\nweighted sum indicating what parts of the encoder's output to pay\nattention to. From here, we use a linear layer and softmax normalization\nto select the next word in the output sequence.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# TorchScript Notes:\n# ~~~~~~~~~~~~~~~~~~~~~~\n#\n# Similarly to the ``EncoderRNN``, this module does not contain any\n# data-dependent control flow. Therefore, we can once again use\n# **tracing** to convert this model to TorchScript after it\n# is initialized and its parameters are loaded.\n#\n\nclass LuongAttnDecoderRNN(nn.Module):\n    def __init__(self, attn_model, embedding, hidden_size, output_size, n_layers=1, dropout=0.1):\n        super(LuongAttnDecoderRNN, self).__init__()\n\n        # Keep for reference\n        self.attn_model = attn_model\n        self.hidden_size = hidden_size\n        self.output_size = output_size\n        self.n_layers = n_layers\n        self.dropout = dropout\n\n        # Define layers\n        self.embedding = embedding\n        self.embedding_dropout = nn.Dropout(dropout)\n        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout))\n        self.concat = nn.Linear(hidden_size * 2, hidden_size)\n        self.out = nn.Linear(hidden_size, output_size)\n\n        self.attn = Attn(attn_model, hidden_size)\n\n    def forward(self, input_step, last_hidden, encoder_outputs):\n        # Note: we run this one step (word) at a time\n        # Get embedding of current input word\n        embedded = self.embedding(input_step)\n        embedded = self.embedding_dropout(embedded)\n        # Forward through unidirectional GRU\n        rnn_output, hidden = self.gru(embedded, last_hidden)\n        # Calculate attention weights from the current GRU output\n        attn_weights = self.attn(rnn_output, encoder_outputs)\n        # Multiply attention weights to encoder outputs to get new \"weighted sum\" context vector\n        context = attn_weights.bmm(encoder_outputs.transpose(0, 1))\n        # Concatenate weighted context vector and GRU output using Luong eq. 5\n        rnn_output = rnn_output.squeeze(0)\n        context = context.squeeze(1)\n        concat_input = torch.cat((rnn_output, context), 1)\n        concat_output = torch.tanh(self.concat(concat_input))\n        # Predict next word using Luong eq. 6\n        output = self.out(concat_output)\n        output = F.softmax(output, dim=1)\n        # Return output and final hidden state\n        return output, hidden"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define Evaluation\n=================\n\nGreedy Search Decoder\n---------------------\n\nAs in the chatbot tutorial, we use a `GreedySearchDecoder` module to\nfacilitate the actual decoding process. This module has the trained\nencoder and decoder models as attributes, and drives the process of\nencoding an input sentence (a vector of word indexes), and iteratively\ndecoding an output response sequence one word (word index) at a time.\n\nEncoding the input sequence is straightforward: simply forward the\nentire sequence tensor and its corresponding lengths vector to the\n`encoder`. It is important to note that this module only deals with one\ninput sequence at a time, **NOT** batches of sequences. Therefore, when\nthe constant **1** is used for declaring tensor sizes, this corresponds\nto a batch size of 1. To decode a given decoder output, we must\niteratively run forward passes through our decoder model, which outputs\nsoftmax scores corresponding to the probability of each word being the\ncorrect next word in the decoded sequence. We initialize the\n`decoder_input` to a tensor containing an *SOS\\_token*. After each pass\nthrough the `decoder`, we *greedily* append the word with the highest\nsoftmax probability to the `decoded_words` list. We also use this word\nas the `decoder_input` for the next iteration. The decoding process\nterminates either if the `decoded_words` list has reached a length of\n*MAX\\_LENGTH* or if the predicted word is the *EOS\\_token*.\n\nTorchScript Notes:\n------------------\n\nThe `forward` method of this module involves iterating over the range of\n$[0, max\\_length)$ when decoding an output sequence one word at a time.\nBecause of this, we should use **scripting** to convert this module to\nTorchScript. Unlike with our encoder and decoder models, which we can\ntrace, we must make some necessary changes to the `GreedySearchDecoder`\nmodule in order to initialize an object without error. In other words,\nwe must ensure that our module adheres to the rules of the TorchScript\nmechanism, and does not utilize any language features outside of the\nsubset of Python that TorchScript includes.\n\nTo get an idea of some manipulations that may be required, we will go\nover the diffs between the `GreedySearchDecoder` implementation from the\nchatbot tutorial and the implementation that we use in the cell below.\nNote that the lines highlighted in red are lines removed from the\noriginal implementation and the lines highlighted in green are new.\n\n![](https://pytorch.org/tutorials/_static/img/chatbot/diff.png){.align-center}\n\n### Changes:\n\n-   Added `decoder_n_layers` to the constructor arguments\n    -   This change stems from the fact that the encoder and decoder\n        models that we pass to this module will be a child of\n        `TracedModule` (not `Module`). Therefore, we cannot access the\n        decoder's number of layers with `decoder.n_layers`. Instead, we\n        plan for this, and pass this value in during module\n        construction.\n-   Store away new attributes as constants\n    -   In the original implementation, we were free to use variables\n        from the surrounding (global) scope in our\n        `GreedySearchDecoder`'s `forward` method. However, now that we\n        are using scripting, we do not have this freedom, as the\n        assumption with scripting is that we cannot necessarily hold on\n        to Python objects, especially when exporting. An easy solution\n        to this is to store these values from the global scope as\n        attributes to the module in the constructor, and add them to a\n        special list called `__constants__` so that they can be used as\n        literal values when constructing the graph in the `forward`\n        method. An example of this usage is on NEW line 19, where\n        instead of using the `device` and `SOS_token` global values, we\n        use our constant attributes `self._device` and\n        `self._SOS_token`.\n-   Enforce types of `forward` method arguments\n    -   By default, all parameters to a TorchScript function are assumed\n        to be Tensor. If we need to pass an argument of a different\n        type, we can use function type annotations as introduced in [PEP\n        3107](https://www.python.org/dev/peps/pep-3107/). In addition,\n        it is possible to declare arguments of different types using\n        Mypy-style type annotations (see\n        [doc](https://pytorch.org/docs/master/jit.html#types)).\n-   Change initialization of `decoder_input`\n    -   In the original implementation, we initialized our\n        `decoder_input` tensor with `torch.LongTensor([[SOS_token]])`.\n        When scripting, we are not allowed to initialize tensors in a\n        literal fashion like this. Instead, we can initialize our tensor\n        with an explicit torch function such as `torch.ones`. In this\n        case, we can easily replicate the scalar `decoder_input` tensor\n        by multiplying 1 by our SOS\\_token value stored in the constant\n        `self._SOS_token`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class GreedySearchDecoder(nn.Module):\n    def __init__(self, encoder, decoder, decoder_n_layers):\n        super(GreedySearchDecoder, self).__init__()\n        self.encoder = encoder\n        self.decoder = decoder\n        self._device = device\n        self._SOS_token = SOS_token\n        self._decoder_n_layers = decoder_n_layers\n\n    __constants__ = ['_device', '_SOS_token', '_decoder_n_layers']\n\n    def forward(self, input_seq : torch.Tensor, input_length : torch.Tensor, max_length : int):\n        # Forward input through encoder model\n        encoder_outputs, encoder_hidden = self.encoder(input_seq, input_length)\n        # Prepare encoder's final hidden layer to be first hidden input to the decoder\n        decoder_hidden = encoder_hidden[:self._decoder_n_layers]\n        # Initialize decoder input with SOS_token\n        decoder_input = torch.ones(1, 1, device=self._device, dtype=torch.long) * self._SOS_token\n        # Initialize tensors to append decoded words to\n        all_tokens = torch.zeros([0], device=self._device, dtype=torch.long)\n        all_scores = torch.zeros([0], device=self._device)\n        # Iteratively decode one word token at a time\n        for _ in range(max_length):\n            # Forward pass through decoder\n            decoder_output, decoder_hidden = self.decoder(decoder_input, decoder_hidden, encoder_outputs)\n            # Obtain most likely word token and its softmax score\n            decoder_scores, decoder_input = torch.max(decoder_output, dim=1)\n            # Record token and score\n            all_tokens = torch.cat((all_tokens, decoder_input), dim=0)\n            all_scores = torch.cat((all_scores, decoder_scores), dim=0)\n            # Prepare current token to be next decoder input (add a dimension)\n            decoder_input = torch.unsqueeze(decoder_input, 0)\n        # Return collections of word tokens and scores\n        return all_tokens, all_scores"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Evaluating an Input\n===================\n\nNext, we define some functions for evaluating an input. The `evaluate`\nfunction takes a normalized string sentence, processes it to a tensor of\nits corresponding word indexes (with batch size of 1), and passes this\ntensor to a `GreedySearchDecoder` instance called `searcher` to handle\nthe encoding/decoding process. The searcher returns the output word\nindex vector and a scores tensor corresponding to the softmax scores for\neach decoded word token. The final step is to convert each word index\nback to its string representation using `voc.index2word`.\n\nWe also define two functions for evaluating an input sentence. The\n`evaluateInput` function prompts a user for an input, and evaluates it.\nIt will continue to ask for another input until the user enters 'q' or\n'quit'.\n\nThe `evaluateExample` function simply takes a string input sentence as\nan argument, normalizes it, evaluates it, and prints the response.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def evaluate(searcher, voc, sentence, max_length=MAX_LENGTH):\n    ### Format input sentence as a batch\n    # words -> indexes\n    indexes_batch = [indexesFromSentence(voc, sentence)]\n    # Create lengths tensor\n    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])\n    # Transpose dimensions of batch to match models' expectations\n    input_batch = torch.LongTensor(indexes_batch).transpose(0, 1)\n    # Use appropriate device\n    input_batch = input_batch.to(device)\n    lengths = lengths.to(device)\n    # Decode sentence with searcher\n    tokens, scores = searcher(input_batch, lengths, max_length)\n    # indexes -> words\n    decoded_words = [voc.index2word[token.item()] for token in tokens]\n    return decoded_words\n\n\n# Evaluate inputs from user input (``stdin``)\ndef evaluateInput(searcher, voc):\n    input_sentence = ''\n    while(1):\n        try:\n            # Get input sentence\n            input_sentence = input('> ')\n            # Check if it is quit case\n            if input_sentence == 'q' or input_sentence == 'quit': break\n            # Normalize sentence\n            input_sentence = normalizeString(input_sentence)\n            # Evaluate sentence\n            output_words = evaluate(searcher, voc, input_sentence)\n            # Format and print response sentence\n            output_words[:] = [x for x in output_words if not (x == 'EOS' or x == 'PAD')]\n            print('Bot:', ' '.join(output_words))\n\n        except KeyError:\n            print(\"Error: Encountered unknown word.\")\n\n# Normalize input sentence and call ``evaluate()``\ndef evaluateExample(sentence, searcher, voc):\n    print(\"> \" + sentence)\n    # Normalize sentence\n    input_sentence = normalizeString(sentence)\n    # Evaluate sentence\n    output_words = evaluate(searcher, voc, input_sentence)\n    output_words[:] = [x for x in output_words if not (x == 'EOS' or x == 'PAD')]\n    print('Bot:', ' '.join(output_words))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Load Pretrained Parameters\n==========================\n\nNo, let\\'s load our model!\n\nUse hosted model\n----------------\n\nTo load the hosted model:\n\n1)  Download the model\n    [here](https://download.pytorch.org/models/tutorials/4000_checkpoint.tar).\n2)  Set the `loadFilename` variable to the path to the downloaded\n    checkpoint file.\n3)  Leave the `checkpoint = torch.load(loadFilename)` line uncommented,\n    as the hosted model was trained on CPU.\n\nUse your own model\n------------------\n\nTo load your own pretrained model:\n\n1)  Set the `loadFilename` variable to the path to the checkpoint file\n    that you wish to load. Note that if you followed the convention for\n    saving the model from the chatbot tutorial, this may involve\n    changing the `model_name`, `encoder_n_layers`, `decoder_n_layers`,\n    `hidden_size`, and `checkpoint_iter` (as these values are used in\n    the model path).\n2)  If you trained the model on a CPU, make sure that you are opening\n    the checkpoint with the `checkpoint = torch.load(loadFilename)`\n    line. If you trained the model on a GPU and are running this\n    tutorial on a CPU, uncomment the\n    `checkpoint = torch.load(loadFilename, map_location=torch.device('cpu'))`\n    line.\n\nTorchScript Notes:\n------------------\n\nNotice that we initialize and load parameters into our encoder and\ndecoder models as usual. If you are using tracing\nmode(`torch.jit.trace`) for some part of your models, you must call\n`.to(device)` to set the device options of the models and `.eval()` to\nset the dropout layers to test mode **before** tracing the models.\n[TracedModule]{.title-ref} objects do not inherit the `to` or `eval`\nmethods. Since in this tutorial we are only using scripting instead of\ntracing, we only need to do this before we do evaluation (which is the\nsame as we normally do in eager mode).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "save_dir = os.path.join(\"data\", \"save\")\ncorpus_name = \"cornell movie-dialogs corpus\"\n\n# Configure models\nmodel_name = 'cb_model'\nattn_model = 'dot'\n#attn_model = 'general'``\n#attn_model = 'concat'\nhidden_size = 500\nencoder_n_layers = 2\ndecoder_n_layers = 2\ndropout = 0.1\nbatch_size = 64\n\n# If you're loading your own model\n# Set checkpoint to load from\ncheckpoint_iter = 4000"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Sample code to load from a checkpoint:\n\n``` {.python}\nloadFilename = os.path.join(save_dir, model_name, corpus_name,\n                         '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size),\n                         '{}_checkpoint.tar'.format(checkpoint_iter))\n```\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# If you're loading the hosted model\nloadFilename = 'data/4000_checkpoint.tar'\n\n# Load model\n# Force CPU device options (to match tensors in this tutorial)\ncheckpoint = torch.load(loadFilename, map_location=torch.device('cpu'))\nencoder_sd = checkpoint['en']\ndecoder_sd = checkpoint['de']\nencoder_optimizer_sd = checkpoint['en_opt']\ndecoder_optimizer_sd = checkpoint['de_opt']\nembedding_sd = checkpoint['embedding']\nvoc = Voc(corpus_name)\nvoc.__dict__ = checkpoint['voc_dict']\n\n\nprint('Building encoder and decoder ...')\n# Initialize word embeddings\nembedding = nn.Embedding(voc.num_words, hidden_size)\nembedding.load_state_dict(embedding_sd)\n# Initialize encoder & decoder models\nencoder = EncoderRNN(hidden_size, embedding, encoder_n_layers, dropout)\ndecoder = LuongAttnDecoderRNN(attn_model, embedding, hidden_size, voc.num_words, decoder_n_layers, dropout)\n# Load trained model parameters\nencoder.load_state_dict(encoder_sd)\ndecoder.load_state_dict(decoder_sd)\n# Use appropriate device\nencoder = encoder.to(device)\ndecoder = decoder.to(device)\n# Set dropout layers to ``eval`` mode\nencoder.eval()\ndecoder.eval()\nprint('Models built and ready to go!')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Convert Model to TorchScript\n============================\n\nEncoder\n-------\n\nAs previously mentioned, to convert the encoder model to TorchScript, we\nuse **scripting**. The encoder model takes an input sequence and a\ncorresponding lengths tensor. Therefore, we create an example input\nsequence tensor `test_seq`, which is of appropriate size (MAX\\_LENGTH,\n1), contains numbers in the appropriate range $[0, voc.num\\_words)$, and\nis of the appropriate type (int64). We also create a `test_seq_length`\nscalar which realistically contains the value corresponding to how many\nwords are in the `test_seq`. The next step is to use the\n`torch.jit.trace` function to trace the model. Notice that the first\nargument we pass is the module that we want to trace, and the second is\na tuple of arguments to the module's `forward` method.\n\nDecoder\n-------\n\nWe perform the same process for tracing the decoder as we did for the\nencoder. Notice that we call forward on a set of random inputs to the\ntraced\\_encoder to get the output that we need for the decoder. This is\nnot required, as we could also simply manufacture a tensor of the\ncorrect shape, type, and value range. This method is possible because in\nour case we do not have any constraints on the values of the tensors\nbecause we do not have any operations that could fault on out-of-range\ninputs.\n\nGreedySearchDecoder\n-------------------\n\nRecall that we scripted our searcher module due to the presence of\ndata-dependent control flow. In the case of scripting, we do necessary\nlanguage changes to make sure the implementation complies with\nTorchScript. We initialize the scripted searcher the same way that we\nwould initialize an unscripted variant.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "### Compile the whole greedy search model to TorchScript model\n# Create artificial inputs\ntest_seq = torch.LongTensor(MAX_LENGTH, 1).random_(0, voc.num_words).to(device)\ntest_seq_length = torch.LongTensor([test_seq.size()[0]]).to(device)\n# Trace the model\ntraced_encoder = torch.jit.trace(encoder, (test_seq, test_seq_length))\n\n### Convert decoder model\n# Create and generate artificial inputs\ntest_encoder_outputs, test_encoder_hidden = traced_encoder(test_seq, test_seq_length)\ntest_decoder_hidden = test_encoder_hidden[:decoder.n_layers]\ntest_decoder_input = torch.LongTensor(1, 1).random_(0, voc.num_words)\n# Trace the model\ntraced_decoder = torch.jit.trace(decoder, (test_decoder_input, test_decoder_hidden, test_encoder_outputs))\n\n### Initialize searcher module by wrapping ``torch.jit.script`` call\nscripted_searcher = torch.jit.script(GreedySearchDecoder(traced_encoder, traced_decoder, decoder.n_layers))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Print Graphs\n============\n\nNow that our models are in TorchScript form, we can print the graphs of\neach to ensure that we captured the computational graph appropriately.\nSince TorchScript allow us to recursively compile the whole model\nhierarchy and inline the `encoder` and `decoder` graph into a single\ngraph, we just need to print the [scripted\\_searcher]{.title-ref} graph\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print('scripted_searcher graph:\\n', scripted_searcher.graph)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run Evaluation\n==============\n\nFinally, we will run evaluation of the chatbot model using the\nTorchScript models. If converted correctly, the models will behave\nexactly as they would in their eager-mode representation.\n\nBy default, we evaluate a few common query sentences. If you want to\nchat with the bot yourself, uncomment the `evaluateInput` line and give\nit a spin.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Use appropriate device\nscripted_searcher.to(device)\n# Set dropout layers to ``eval`` mode\nscripted_searcher.eval()\n\n# Evaluate examples\nsentences = [\"hello\", \"what's up?\", \"who are you?\", \"where am I?\", \"where are you from?\"]\nfor s in sentences:\n    evaluateExample(s, scripted_searcher, voc)\n\n# Evaluate your input by running\n# ``evaluateInput(traced_encoder, traced_decoder, scripted_searcher, voc)``"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Save Model\n==========\n\nNow that we have successfully converted our model to TorchScript, we\nwill serialize it for use in a non-Python deployment environment. To do\nthis, we can simply save our `scripted_searcher` module, as this is the\nuser-facing interface for running inference against the chatbot model.\nWhen saving a Script module, use script\\_module.save(PATH) instead of\ntorch.save(model, PATH).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "scripted_searcher.save(\"scripted_chatbot.pth\")"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}