- Multi-step Dyna is based on a multi-step model, called the lambda-model.
- “The lambda-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online.”

- Multi-step Dyna uses the lambda-model to generate predictions k steps into the future and applies TD to these simulations.
- In the paper they extend tabular multi-step beta-models to linear function approximation.
- The linear model is updated by gradient descent at each time step.
- “Given a situation, multi-step Dyna figures out the sequences of the results in one step, two steps, etc, through many ‘dreams’ [cringe] (i.e. imaginary or model-based experiences) that are connected together; the input to one (dream) being the output from the previous.”
- It seems like the k-step model is a one-step model that is applied iteratively.
- They say the 1-step model is somehow optimal, but I’m not grokking this point at the moment.
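
To make the "one-step model applied iteratively" reading concrete, here is a rough sketch of how I picture it. It is not the paper's lambda-model. It assumes a linear one-step model (a matrix `F` predicting the next feature vector and a vector `b` predicting the reward), learned by gradient descent on the squared prediction error, then chained k times so that each "dream" feeds the next, with a TD update on the simulated result. All the names (`update_model`, `k_step_rollout`, `planning_td_update`) are mine, made up for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(F, x):
    """Multiply matrix F (list of rows) by vector x."""
    return [dot(row, x) for row in F]


def update_model(F, b, phi, reward, phi_next, alpha):
    """One gradient-descent step on the squared one-step prediction errors.

    F predicts the next feature vector, b predicts the immediate reward.
    Both are modified in place. (Assumed model form, not taken from the paper.)
    """
    pred = matvec(F, phi)
    reward_err = reward - dot(b, phi)
    for i, row in enumerate(F):
        feat_err = phi_next[i] - pred[i]
        for j in range(len(row)):
            row[j] += alpha * feat_err * phi[j]
    for j in range(len(b)):
        b[j] += alpha * reward_err * phi[j]


def k_step_rollout(F, b, phi, k, gamma):
    """Chain the one-step model k times: the output of one step is the input to the next.

    Returns the discounted sum of predicted rewards, the predicted
    k-step feature vector, and gamma**k.
    """
    total, discount = 0.0, 1.0
    for _ in range(k):
        total += discount * dot(b, phi)
        phi = matvec(F, phi)
        discount *= gamma
    return total, phi, discount


def planning_td_update(theta, F, b, phi, k, gamma, alpha):
    """Apply a TD update to the linear value weights theta using a simulated k-step outcome."""
    ret, phi_k, discount = k_step_rollout(F, b, phi, k, gamma)
    delta = ret + discount * dot(theta, phi_k) - dot(theta, phi)
    for j in range(len(theta)):
        theta[j] += alpha * delta * phi[j]
    return delta
```

With one-hot features this reduces to the tabular case, which is an easy way to sanity-check it: train `F` and `b` on a couple of transitions, then call `planning_td_update` repeatedly and watch `theta` settle on the state values.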

[WordPress ate half of what I wrote again. The other paper (same year at NIPS) is much easier to read.]
