- Multi-step Dyna is based on a multi-step model called the lambda-model.
- “The lambda-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online.”
- Multi-step Dyna uses the lambda-model to generate predictions k steps into the future and applies TD to these simulations.
- In the paper, they extend tabular multi-step beta-models to linear function approximation.
- The linear model is updated by gradient descent at each time step.
- “Given a situation, multi-step Dyna figures out the sequences of the results in one step, two steps, etc, through many ‘dreams’[cringe] (i.e. imaginary or model-based experiences) that are connected together; the input to one (dream) being the output from the previous.”
- It seems like the k-step model is just the one-step model applied iteratively.
- They say the 1-step model is somehow optimal, but I’m not grokking this point at the moment.
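
To make the "one-step model applied iteratively" reading concrete, here is a rough sketch of how I picture it. This is my own toy reconstruction, not the paper's algorithm: it covers only a linear one-step transition model `F` trained by gradient descent on squared prediction error, then chained k times. It does not implement the lambda-model or the reward model, and the names `update_linear_model` and `k_step_prediction`, the one-hot features, and the cyclic toy chain are all assumptions for illustration.

```python
import numpy as np

def update_linear_model(F, phi, phi_next, alpha):
    """One gradient-descent step on 0.5 * ||phi_next - F @ phi||^2."""
    error = phi_next - F @ phi
    return F + alpha * np.outer(error, phi)

def k_step_prediction(F, phi, k):
    """Apply the one-step model k times, feeding each output back in as the next input."""
    pred = phi.copy()
    for _ in range(k):
        pred = F @ pred
    return pred

# Hypothetical toy problem: a deterministic cycle over n states with one-hot features,
# where state i always moves to state (i + 1) % n.
n = 4
true_F = np.roll(np.eye(n), 1, axis=0)  # column j is the one-hot vector for state j + 1

F = np.zeros((n, n))
state = 0
for _ in range(200):
    phi = np.eye(n)[state]
    next_state = (state + 1) % n
    phi_next = np.eye(n)[next_state]
    F = update_linear_model(F, phi, phi_next, alpha=0.5)  # online update at every time step
    state = next_state

# "Dream" two steps ahead from state 0 by chaining the learned one-step model.
two_step = k_step_prediction(F, np.eye(n)[0], k=2)
```

With one-hot features, each update only touches the column of `F` for the current state, so the learned matrix approaches `true_F` and the two-step dream from state 0 puts its weight on state 2. With real (non-tabular) features, the chained predictions would presumably compound the one-step model's errors, which may be part of what motivates a dedicated multi-step model.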
[WordPress ate half of what I wrote again. The other paper (same year at NIPS) is much easier to read.]