Dyna(k): A Multi-Step Dyna Planning. Yao, Sutton, Bhatnagar, Diao, Szepesvari. Workshop on Abstraction in RL. 2009

  1. Multi-step Dyna is based on a multi-step model, called the lambda-model.
    1. “The lambda-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online.”
  2. Multi-step Dyna uses the lambda-model to generate predictions k steps into the future and applies TD to these simulations.
  3. In the paper they extend tabular multi-step beta-models to linear function approximation.
  4. The linear model is updated by gradient descent at each time step.
  5. “Given a situation, multi-step Dyna figures out the sequences of the results in one step, two steps, etc, through many ‘dreams’ [cringe] (i.e. imaginary or model-based experiences) that are connected together; the input to one (dream) being the output from the previous.”
  6. It seems like the k-step model is a one-step model that is applied iteratively.
  7. They say the 1-step model is somehow optimal, but I’m not grokking this point at the moment.
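
To make points 4 to 6 concrete for myself, here is a minimal sketch of how I read them: a one-step linear model, updated by a gradient (LMS) step, then applied iteratively to chain k "dreams" together, with TD(0) run on each imagined transition. This is my assumption of the general shape, not the paper's lambda-model or its actual update rule. The names (`update_model`, `rollout`, `td_plan`), the matrix `F`, the reward vector `b`, and the toy two-state chain are all made up for illustration.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mat_vec(F, x):
    return [dot(row, x) for row in F]

def update_model(F, b, phi, r, phi_next, alpha=0.1):
    """One gradient (LMS) step on a one-step linear model (assumed form):
    F phi should predict the next feature vector, b . phi the reward."""
    err = [pn - p for pn, p in zip(phi_next, mat_vec(F, phi))]
    F_new = [[f + alpha * e * x for f, x in zip(row, phi)]
             for row, e in zip(F, err)]
    r_err = r - dot(b, phi)
    b_new = [bi + alpha * r_err * x for bi, x in zip(b, phi)]
    return F_new, b_new

def rollout(F, b, phi, k):
    """Apply the one-step model k times; the output of each imagined
    step is the input to the next."""
    steps = []
    for _ in range(k):
        r = dot(b, phi)
        phi_next = mat_vec(F, phi)
        steps.append((phi, r, phi_next))
        phi = phi_next
    return steps

def td_plan(theta, F, b, phi, k, alpha=0.1, gamma=0.9):
    """Run linear TD(0) on each imagined transition of a k-step rollout."""
    theta = list(theta)
    for x, r, x_next in rollout(F, b, phi, k):
        delta = r + gamma * dot(theta, x_next) - dot(theta, x)
        theta = [t + alpha * delta * xi for t, xi in zip(theta, x)]
    return theta

# Toy chain with tabular features: state 0 -> state 1 (absorbing),
# reward 1 for leaving state 0 and 0 afterwards.
F = [[0, 0], [1, 1]]
b = [1, 0]
theta = td_plan([0.0, 0.0], F, b, phi=[1, 0], k=3, alpha=0.5)
```

The chaining in `rollout` is the part that corresponds to "the input to one (dream) being the output from the previous"; everything else is ordinary linear TD.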
[WordPress ate half of what I wrote again. The other paper (same year, at NIPS) is much easier to read.]