- Referenced in PATH INTEGRAL POLICY IMPROVEMENT WITH COVARIANCE MATRIX ADAPTATION as an early application of cross-entropy to continuous spaces
**Although this paper works in discrete action spaces.**
- The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions, where a discrete action is assigned to each basis function (the type and number of BFs are specified a priori)
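A rough sketch of how such a policy might compute an action (the names and the max-activation rule here are illustrative assumptions; the paper fixes the type and number of BFs a priori and optimizes their placement and the per-BF actions):

```python
import numpy as np

def rbf_policy_action(state, centers, widths, actions):
    """Return the discrete action attached to the most activated RBF.

    centers: (N, d) basis-function centers; widths: (N, d) radii;
    actions: length-N array of discrete action indices, one per BF.
    (Illustrative sketch, not the paper's exact parameterization.)
    """
    diffs = (state - centers) / widths               # normalized offsets
    activations = np.exp(-np.sum(diffs ** 2, axis=1))  # Gaussian activations
    return actions[np.argmax(activations)]           # action of dominant BF
```

Under this reading, the optimizer's search space is just the BF centers/widths plus one discrete action label per BF.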
- Compare against the optimization algorithm DIRECT (interesting)
- Cites work showing that value-function approximation generally either requires expert design, or leverages basis functions because they have nice theoretical properties (but poor actual performance).
**Attempts have been made to adaptively form basis functions for VF approximation, but this can lead to convergence issues.** **Also cites work saying that for policy search methods, the functions defining the policy are generally either ad hoc or require expert design.**
- Funny that the algorithm can decide where to put the BFs and which action each one selects, yet it only selects from a discrete action set. It seems trivial to move to continuous actions from here (we may see why that is tough in a minute)
- Their test domains are the double integrator, bicycle, and HIV
- It is compared against LSPI and fuzzy Q, as well as DIRECT optimization of the basis functions
**Actor-critic methods perform both gradient-based policy optimization and value function approximation.**
- Gradient methods also assume that reaching a local optimum is good enough, but in some cases there are many local optima that are not good
- This is particularly problematic when the policy representation is rich (many RBFs) as opposed to frugal (few linear functions)

- There are other cited methods for finding basis functions for VF approximation that they use as inspiration for doing the same for policies
**Convergence of cross-entropy is not guaranteed, although in practice it generally converges.**
- For discrete optimization, however, the probability of reaching the optimum can be made arbitrarily close to 1 by using an arbitrarily small smoothing parameter
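For reference, a minimal generic cross-entropy sketch (not the paper's exact procedure) showing where the smoothing parameter `alpha` enters: a smaller `alpha` updates the sampling distribution more conservatively, which is the knob behind the discrete-case convergence claim above. All names and defaults here are illustrative:

```python
import numpy as np

def cross_entropy_maximize(score, dim, iters=50, pop=100, elite_frac=0.1,
                           alpha=0.7, init_std=1.0, rng=None):
    """Generic cross-entropy search over R^dim (illustrative sketch).

    Each iteration samples a population from a Gaussian, keeps the
    elite fraction by score, and smooths the Gaussian's mean/std
    toward the elite statistics with smoothing parameter alpha.
    init_std should be large enough to cover the search region.
    """
    rng = np.random.default_rng(rng)
    mean, std = np.zeros(dim), np.full(dim, init_std)
    n_elite = max(1, int(elite_frac * pop))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([score(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # top scorers
        # smoothed distribution update (alpha = 1 means no smoothing)
        mean = alpha * elite.mean(axis=0) + (1 - alpha) * mean
        std = alpha * elite.std(axis=0) + (1 - alpha) * std
    return mean
```

For example, `cross_entropy_maximize(lambda x: -(x[0] - 3.0) ** 2, dim=1, init_std=5.0)` should settle near 3.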

- Argue here that the policy is easier to represent than the value function in many cases
- Compared to value-function approximators with equally spaced basis functions, CE required fewer BFs, but this is natural – did they compare against VFAs that use adaptive basis functions (which they cited)?
- Adaptive basis functions allow it to work better in high-dimensional spaces

- They give a shout-out to doing continuous-action work with the algorithm
