- Referenced in *Path Integral Policy Improvement with Covariance Matrix Adaptation* as an early application of cross-entropy to continuous spaces
- **Although this paper works in discrete action spaces.**
- The algorithm searches for the best closed-loop policy that can be represented with a given number of basis functions, where a discrete action is assigned to each basis function (the type and number of BFs are specified a priori)
- Compared against the optimization algorithm DIRECT (interesting)
- Cites work saying that value-function approximation generally requires expert design, or leverages basis functions because they have nice theoretical properties (but poor actual performance)
- **Attempts have been made to adaptively form basis functions for VF approximation, but this can lead to convergence issues.**
- **Also cites work saying that for policy search methods, the functions that define the policy are generally either ad hoc or require expert design.**
- Funny that the algorithm can decide where to put the BFs and which action to assign to each, but it only selects actions from a discrete set. It seems trivial to move to continuous actions from here (we may see why that is tough in a minute)
- Their test domains are the double integrator, bicycle, and HIV
- It is compared against LSPI and fuzzy Q, as well as DIRECT optimization of the basis functions
- **Actor-critic methods perform both gradient-based policy optimization and value-function approximation.**
- Gradient methods also assume that reaching a local optimum is good enough, but in some cases there are many local optima that are not good
- This is particularly problematic when the policy representation is rich (many RBFs) as opposed to frugal (few linear functions)
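The policy class described above can be sketched in a few lines. This is a hypothetical illustration of the setup, not the paper's implementation: each Gaussian RBF has a center, a width, and an assigned discrete action, and the policy executes the action of the most active basis function at the current state. All names, shapes, and parameter values here are assumptions.

```python
import numpy as np

def rbf_activations(state, centers, widths):
    """Gaussian RBF activations for one state (state: (d,), centers: (n, d))."""
    sq_dists = np.sum((centers - state) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * widths ** 2))

def policy(state, centers, widths, actions):
    """Return the discrete action assigned to the most active basis function."""
    return actions[np.argmax(rbf_activations(state, centers, widths))]

# Toy usage: 3 RBFs on a 1-D state space, each assigned an action from {-1, 0, +1}.
centers = np.array([[-1.0], [0.0], [1.0]])
widths = np.array([0.5, 0.5, 0.5])
actions = np.array([-1, 0, 1])
print(policy(np.array([0.9]), centers, widths, actions))  # → 1 (nearest BF is at 1.0)
```

Under this representation, the search space the optimizer works over is just the BF centers, widths, and the per-BF action labels, which is what makes a black-box method like cross-entropy applicable.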

- There are other cited methods for finding basis functions for VF approximation that they use as inspiration for doing the same for policies
- **Convergence of cross-entropy is not guaranteed, although in practice it generally converges.**
- For discrete optimization, though, the probability of reaching the optimum can be made arbitrarily close to 1 by using an arbitrarily small smoothing parameter
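A minimal cross-entropy method sketch makes the role of the smoothing parameter concrete: with a smoothing weight alpha below 1, the sampling distribution is only partially updated toward the elite samples each iteration, which slows collapse of the distribution. The objective, sample sizes, and alpha value below are all my own assumptions for illustration, not the paper's settings.

```python
import numpy as np

def cross_entropy_maximize(f, mu, sigma, n_samples=100, n_elite=10,
                           alpha=0.7, n_iters=50, seed=0):
    """Maximize f with a Gaussian cross-entropy method; alpha is the smoothing parameter."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        # Sample candidate solutions from the current Gaussian.
        x = rng.normal(mu, sigma, size=(n_samples, mu.size))
        scores = np.array([f(xi) for xi in x])
        elite = x[np.argsort(scores)[-n_elite:]]  # keep the best n_elite samples
        # Smoothed update: alpha < 1 blends the new elite fit with the old parameters.
        mu = alpha * elite.mean(axis=0) + (1 - alpha) * mu
        sigma = alpha * elite.std(axis=0) + (1 - alpha) * sigma
    return mu

# Toy usage: maximize -||x - 3||^2 in 2-D; the mean should converge near [3, 3].
best = cross_entropy_maximize(lambda x: -np.sum((x - 3.0) ** 2),
                              mu=np.zeros(2), sigma=np.ones(2) * 2.0)
print(np.round(best, 1))
```

In the paper's setting, the vector being optimized would encode the basis-function parameters and action assignments rather than this toy quadratic.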

- Argue here that the policy is easier to represent than the value function in many cases
- Compared to value-function approximators with equally spaced basis functions, CE required fewer BFs, but this is to be expected. Did they compare to VFAs that use adaptive basis functions (which they cited)?
- Adaptive basis functions allow the method to work better in high-dimensional spaces

- They give a shout-out to extending the algorithm to continuous actions
