Cross-Entropy Optimization of Control Policies with Adaptive Basis Functions. Busoniu, Ernst, De Schutter, Babuska. IEEE Transactions on Systems, Man, and Cybernetics. 2011. | Ari Weinstein's Research

Note that this paper works only in discrete action spaces.

The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions, where a discrete action is assigned to each basis function (the type and number of BFs are specified a priori).
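A minimal sketch of what such a policy could look like, assuming Gaussian RBFs and a "most active BF wins" rule for choosing the action (the exact parameterization in the paper may differ; the names and toy numbers here are illustrative):

```python
import numpy as np

def rbf_policy(state, centers, widths, actions):
    """Return the discrete action assigned to the most active basis function.

    centers: (N, d) RBF centers; widths: (N,) RBF widths;
    actions: (N,) one discrete action index per basis function.
    """
    # Gaussian activation of each basis function at the current state
    dists = np.sum((centers - state) ** 2, axis=1)
    activations = np.exp(-dists / (2 * widths ** 2))
    return actions[np.argmax(activations)]

# toy usage: 3 BFs in a 2-D state space, each tied to one of two actions
centers = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
widths = np.array([0.5, 0.5, 0.5])
actions = np.array([0, 1, 0])
a = rbf_policy(np.array([0.9, 1.1]), centers, widths, actions)
```

Under this reading, CE optimization would search jointly over the centers, widths, and per-BF action assignments.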

They compare against the optimization algorithm DIRECT (interesting).

Has citations saying that value-function approximation generally requires expert design, or leverages basis functions because they have nice theoretical properties (but poor actual performance). Attempts have been made to adaptively form basis functions for VF approximation, but this can lead to convergence issues.

Also has citations saying that for policy-search methods, the functions that define the policy are generally either ad hoc or require expert design.

Funny that the algorithm can decide where to put the BFs and what action to assign to each, but only selects actions from a discrete set. Seems trivial to move to continuous actions from here (we may see why that is tough in a minute).

Their test domains are the double integrator, bicycle, and HIV

It is compared against LSPI and fuzzy Q-iteration, as well as DIRECT optimization of the basis functions.

Actor-critic methods perform both gradient-based policy optimization as well as value function approximation

Gradient methods also assume that reaching a local optimum is good enough, but in some cases there are many local optima that are poor.

This is particularly problematic when the policy representation is rich (many RBFs) as opposed to frugal (few linear functions)

There are other cited methods for finding basis functions for VF approximation that they use as inspiration for doing the same for policies

Convergence of cross-entropy optimization is not guaranteed, although in practice it generally converges.

For discrete optimization, though, the probability of reaching the optimum can be made arbitrarily close to 1 by using an arbitrarily small smoothing parameter.
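To make the role of the smoothing parameter concrete, here is a generic cross-entropy sketch for a discrete search space (not the paper's exact algorithm; function names, the independence assumption across variables, and all hyperparameter values are my own illustrative choices). The distribution is refit to the elite samples each iteration and then blended with the old distribution by a factor alpha; a smaller alpha means more conservative updates.

```python
import numpy as np

def cross_entropy_discrete(score, n_vars, n_vals, n_samples=100, n_elite=10,
                           alpha=0.7, iters=50, rng=None):
    """Generic cross-entropy search over discrete assignments.

    Maintains an independent categorical distribution per variable,
    refits it to the elite samples, and smooths the update with alpha.
    """
    rng = np.random.default_rng(rng)
    probs = np.full((n_vars, n_vals), 1.0 / n_vals)
    best, best_score = None, -np.inf
    for _ in range(iters):
        # sample candidate assignments from the current distribution
        samples = np.stack([rng.choice(n_vals, size=n_samples, p=probs[v])
                            for v in range(n_vars)], axis=1)
        scores = np.array([score(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        if scores.max() > best_score:
            best_score = scores.max()
            best = samples[scores.argmax()]
        # refit to the elites, then smooth toward the old distribution
        for v in range(n_vars):
            counts = np.bincount(elite[:, v], minlength=n_vals)
            probs[v] = alpha * counts / n_elite + (1 - alpha) * probs[v]
    return best, best_score

# toy usage: maximize the number of ones in a 5-bit string
best, best_score = cross_entropy_discrete(lambda s: s.sum(),
                                          n_vars=5, n_vals=2, rng=0)
```

With alpha near 0 the distribution changes very slowly, which keeps mass on every discrete option for longer and is what drives the near-1 convergence guarantee mentioned above.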

Argue here that the policy is easier to represent than the value function in many cases

Compared to value-function approximators with equally spaced basis functions, CE required fewer BFs, but this is natural. Did they compare to VFAs that use adaptive basis functions (which they cited)?

Adaptive basis functions allow the algorithm to work better in high-dimensional spaces.

They give a nod to extending the algorithm to continuous actions.