{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# For tips on running notebooks in Google Colab, see\n# https://pytorch.org/tutorials/beginner/colab\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recurrent DQN: Training recurrent policies\n==========================================\n\n**Author**: [Vincent Moens](https://github.com/vmoens)\n\n```{=html}\n
The `~torchrl.envs.transforms.StepCounter` transform is optional: since the goal of the CartPole task is to make trajectories as long as possible, counting the steps can help us track the performance of our policy.
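As a minimal sketch (assuming the standard `GymEnv` backend and the `CartPole-v1` task), appending the transform could look like this:

```python
from torchrl.envs import GymEnv, TransformedEnv
from torchrl.envs.transforms import StepCounter

# Append StepCounter to CartPole: it writes a "step_count" entry into the
# output tensordict that we can use to monitor trajectory lengths.
env = TransformedEnv(GymEnv("CartPole-v1"), StepCounter())
rollout = env.rollout(3)
print(rollout["step_count"])  # step index at each point of the trajectory
```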
The class supports almost all LSTM features such as dropout or multi-layered LSTMs. However, to respect TorchRL's conventions, this LSTM must have the `batch_first` attribute set to `True`, which is **not** the default in PyTorch. Our `~torchrl.modules.LSTMModule` changes this default behavior, so we're good with a native call.

Also, the LSTM cannot have its `bidirectional` attribute set to `True`, as this wouldn't be usable in online settings. In this case, the default value is the correct one.
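A minimal sketch of building such a module, with illustrative sizes and tensordict keys (the `"embed"` key is an assumption standing in for the features produced by an upstream encoder):

```python
from torchrl.modules import LSTMModule

# LSTMModule wraps nn.LSTM with batch_first=True under the hood and
# reads/writes its recurrent states through the tensordict.
lstm = LSTMModule(
    input_size=32,    # illustrative feature size from an upstream module
    hidden_size=128,  # illustrative hidden size
    in_key="embed",   # where the input features are read from
    out_key="embed",  # where the output features are written to
)
print(lstm.in_keys)   # also lists the recurrent-state and "is_init" entries
print(lstm.out_keys)
```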
TorchRL also provides a wrapper class, `torchrl.modules.QValueActor`, that wraps a module in a Sequential together with a `~torchrl.modules.tensordict_module.QValueModule`, as we are doing explicitly here. There is little advantage to doing so and the process is less transparent, but the end results will be similar to what we do here.
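For contrast, a minimal sketch of the explicit construction described above (the module sizes, keys, and action space are illustrative):

```python
from tensordict.nn import TensorDictModule as Mod, TensorDictSequential as Seq
from torchrl.modules import MLP, QValueModule

# Map the 128 LSTM features to one action value per action (2 for CartPole),
# then let QValueModule pick the greedy action from the "action_value" entry.
mlp = MLP(in_features=128, out_features=2, num_cells=[64])
policy = Seq(
    Mod(mlp, in_keys=["embed"], out_keys=["action_value"]),
    QValueModule(action_space="one_hot"),
)
```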
For the sake of efficiency, we're only running a few thousand iterations here. In a real setting, the total number of frames should be set to 1M.
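As an illustrative sketch of that budget (reusing the `env` and `policy` names from the sketches above; the frame counts are placeholders for a short run):

```python
from torchrl.collectors import SyncDataCollector

# A short run for demonstration; in a real setting, total_frames would be
# on the order of 1_000_000.
collector = SyncDataCollector(
    env,
    policy,
    frames_per_batch=50,
    total_frames=200,
)
```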