- Multi-step Dyna is based on a multi-step model called the lambda-model.
- “The lambda-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online.”

- Multi-step Dyna uses the lambda-model to generate predictions k steps into the future and applies TD to these simulations
- In the paper they extend tabular multi-step beta-models to linear function approximation
- The linear model is updated by gradient descent at each time step
- “Given a situation, multi-step Dyna figures out the sequences of the results in one step, two steps, etc, through many ‘dreams’ [cringe] (i.e. imaginary or model-based experiences) that are connected together; the input to one (dream) being the output from the previous.”
- It seems like the k-step model is a one-step model that is applied iteratively
- They say the 1-step model is somehow optimal, but I’m not grokking this point at the moment.
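
To make the "one-step model applied iteratively" reading concrete, here is a rough sketch of how I picture it. It is not the paper's algorithm: it uses the usual linear-Dyna convention of a matrix `F` that predicts the next feature vector and a vector `b` that predicts the reward, and it leaves the lambda-model out entirely. All names (`F`, `b`, `theta`, `alpha`, `gamma`, and the three functions) are my own assumptions.

```python
import numpy as np

def update_model(F, b, phi, r, phi_next, alpha):
    """One gradient-descent step on the one-step linear model, in place.

    Assumed model form: phi_next ~ F @ phi and r ~ b @ phi.
    """
    F += alpha * np.outer(phi_next - F @ phi, phi)
    b += alpha * (r - b @ phi) * phi

def k_step_projection(F, b, phi, k, gamma):
    """Chain the one-step model k times ("dreams" feeding into each other).

    Returns the predicted discounted reward collected over the k imagined
    steps and the predicted feature vector after those k steps.
    """
    reward_sum = 0.0
    discount = 1.0
    for _ in range(k):
        reward_sum += discount * (b @ phi)  # imagined reward at this step
        phi = F @ phi                       # output becomes the next input
        discount *= gamma
    return reward_sum, phi

def planning_update(theta, F, b, phi, k, gamma, alpha):
    """TD update of the linear value weights theta on one imagined
    k-step transition starting from feature vector phi (in place)."""
    reward_sum, phi_k = k_step_projection(F, b, phi, k, gamma)
    td_error = reward_sum + gamma ** k * (theta @ phi_k) - theta @ phi
    theta += alpha * td_error * phi
    return td_error
```

With one-hot features this reduces to a tabular model, which is an easy way to sanity-check it: `F` becomes a transition matrix acting on state indicators and `b` a per-state reward table.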

[WordPress ate half of what I wrote again. The other paper (same year at NIPS) is much easier to read.]
