Summary
We view reinforcement learning as a generic sequence modeling problem and investigate how much of the usual machinery of reinforcement learning algorithms can be replaced with the tools that have found widespread use in large-scale language modeling.
The core of our approach is the Trajectory Transformer, which treats states, actions, and rewards interchangeably as tokens in a single discrete sequence, together with a set of beam-search-based planners.
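To make the sequence-modeling view concrete, the sketch below flattens a trajectory into one token stream by discretizing each state dimension, action dimension, and reward into the same vocabulary, so all three are treated uniformly by the model. The function names and the uniform-width discretizer are illustrative assumptions, not the paper's exact implementation.

```python
def discretize(x, low, high, n_bins):
    """Map a scalar to one of n_bins uniform-width bins over [low, high].
    (Assumed discretization scheme, for illustration only.)"""
    frac = (x - low) / (high - low)
    return min(n_bins - 1, max(0, int(frac * n_bins)))

def trajectory_to_tokens(states, actions, rewards, low=-1.0, high=1.0, n_bins=100):
    """Interleave per-timestep state dims, action dims, and reward into a
    single flat token sequence: (s_1..s_d, a_1..a_k, r) for each step."""
    tokens = []
    for s, a, r in zip(states, actions, rewards):
        tokens += [discretize(v, low, high, n_bins) for v in s]
        tokens += [discretize(v, low, high, n_bins) for v in a]
        tokens.append(discretize(r, low, high, n_bins))
    return tokens

# A toy 2-step trajectory with 2 state dims and 1 action dim:
states  = [[0.1, -0.2], [0.3, 0.0]]
actions = [[0.5], [-0.5]]
rewards = [0.9, 0.1]
print(trajectory_to_tokens(states, actions, rewards))
```

Once trajectories are tokenized this way, a standard autoregressive Transformer can be trained on them, and planning reduces to searching over continuations of the token stream.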