## Offline Reinforcement Learning as One Big Sequence Modeling Problem

**Trajectory Transformer** vs. **Single-Step Model**

*Long-horizon predictions of the Trajectory Transformer compared to those of a feedforward single-step dynamics model.*
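A toy numerical sketch of why long-horizon prediction stresses single-step models: a small per-step error compounds when the model is rolled out on its own outputs. The linear dynamics and the 0.01 bias below are illustrative assumptions, not anything from the actual experiments.

```python
def rollout(step_fn, x0, horizon):
    """Autoregressively apply a one-step model to its own predictions."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(step_fn(xs[-1]))
    return xs

# Hypothetical scalar dynamics: the learned model is nearly perfect per step,
# but its small bias compounds over a 100-step rollout.
true_step = lambda x: 0.99 * x             # ground-truth dynamics
learned_step = lambda x: 0.99 * x + 0.01   # one-step model with a tiny bias

true_traj = rollout(true_step, 1.0, 100)
model_traj = rollout(learned_step, 1.0, 100)
final_error = abs(model_traj[-1] - true_traj[-1])
# One-step error is 0.01, but the 100-step rollout error is over 0.5.
```

The single-step error is two orders of magnitude smaller than the rollout error, which is the failure mode the comparison above highlights.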

**Summary**

**Transformers as dynamics models**

*Attention patterns of the Trajectory Transformer, showing (left) a discovered Markovian strategy and (right) an approach with action smoothing.*
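To treat dynamics modeling as sequence modeling, a trajectory must first become a single stream of discrete tokens. The sketch below illustrates one way to do this, interleaving uniformly discretized state dimensions, action dimensions, and rewards; the bounds, vocabulary size, and interleaving order are assumptions for illustration rather than the exact scheme used in the paper.

```python
import numpy as np

def discretize(x, low, high, vocab_size):
    """Map each continuous value to an integer token via uniform binning."""
    bins = np.linspace(low, high, vocab_size + 1)[1:-1]  # interior bin edges
    return np.digitize(x, bins)

def trajectory_to_tokens(states, actions, rewards,
                         low=-1.0, high=1.0, vocab_size=100):
    """Flatten a trajectory into one token stream:
    [s_0 dims..., a_0 dims..., r_0, s_1 dims..., a_1 dims..., r_1, ...]"""
    tokens = []
    for s, a, r in zip(states, actions, rewards):
        tokens.extend(discretize(s, low, high, vocab_size).tolist())
        tokens.extend(discretize(a, low, high, vocab_size).tolist())
        tokens.append(int(discretize(np.array([r]), low, high, vocab_size)[0]))
    return tokens
```

A standard autoregressive Transformer can then be trained on these streams exactly as a language model is trained on text, with no RL-specific machinery in the architecture.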

**Beam search as trajectory optimizer**

- Decoding a Trajectory Transformer with unmodified beam search gives rise to a model-based imitative method that optimizes for entire predicted trajectories to match those of an expert policy.
- Conditioning trajectories on a future desired state alongside previously-encountered states yields a goal-reaching method.
- Replacing log-probabilities from the sequence model with reward predictions yields a model-based planning method that is surprisingly effective despite lacking many of the components usually required to make planning with learned models work well.
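The planning variant in the last bullet can be sketched as follows. This is a minimal illustration on a toy discrete problem: the `transition` and `reward` arguments stand in for the learned model's predictions, and the only change from vanilla beam search is that candidates are ranked by cumulative predicted reward instead of log-probability.

```python
def plan_with_beam_search(start, actions, transition, reward,
                          horizon, beam_width):
    """Return (return, final_state, action_sequence) maximizing
    cumulative predicted reward over the planning horizon."""
    # Each beam entry: (cumulative_reward, state, action_sequence)
    beams = [(0.0, start, [])]
    for _ in range(horizon):
        candidates = []
        for ret, state, seq in beams:
            for a in actions:
                nxt = transition(state, a)
                candidates.append((ret + reward(state, a), nxt, seq + [a]))
        # Keep the top-`beam_width` partial trajectories by return --
        # exactly where vanilla beam search would rank by log-probability.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return max(beams, key=lambda b: b[0])
```

For example, on a chain MDP where moving right earns reward 1, the planner recovers the always-move-right action sequence.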

*Goal-reaching trajectories navigating from a start state to a goal.*


**Related Publication**

*e.g.*, in the humanoid predictions above) and allows them to evaluate their policies in image-based environments (*e.g.*, Atari). We encourage you to check out their work as well.