Cross-Entropy Optimization of Control Policies with Adaptive Basis Functions. Busoniu, Ernst, De Schutter, Babuska. IEEE Transactions on Systems, Man, and Cybernetics. 2011.

  1. Referenced in PATH INTEGRAL POLICY IMPROVEMENT WITH COVARIANCE MATRIX ADAPTATION as an early application of cross-entropy to continuous spaces
  2. Although this paper works in discrete action spaces.
  3. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions, where a discrete action is assigned to each basis function (the type and number of BFs are specified a priori)
  4. They compare against the optimization algorithm DIRECT (interesting)
  5. Cites work saying that value-function approximation generally requires expert design, or leverages basis functions because they have nice theoretical properties (but poor practical performance).  Attempts have been made to adaptively form basis functions for VF approximation, but this can lead to convergence issues.
  6. Also cites work saying that for policy-search methods, the functions that define the policy are generally either ad hoc or require expert design
  7. It is funny that the algorithm can decide where to put the BFs and which action to assign to each, but it only selects actions from a discrete set.  It seems trivial to move to continuous actions from here (the reason that is hard may become clear in a minute)
  8. Their test domains are the double integrator, bicycle, and HIV
  9. It is compared against LSPI and fuzzy Q-iteration, as well as DIRECT optimization of the basis functions
  10. Actor-critic methods perform both gradient-based policy optimization as well as value function approximation
  11. Gradient methods also assume that reaching a local optimum is good enough, but in some cases there are many poor local optima
    1. This is particularly problematic when the policy representation is rich (many RBFs) as opposed to frugal (few linear functions)
  12. There are other cited methods for finding basis functions for VF approximation that they use as inspiration for doing the same for policies
  13. Convergence of cross-entropy is not guaranteed, although in practice it generally converges
    1. For discrete optimization, though, the probability of reaching the optimum can be made arbitrarily close to 1 by using an arbitrarily small smoothing parameter
  14. Argue here that the policy is easier to represent than the value function in many cases
  15. Compared to value-function approximators with equally spaced basis functions, CE required fewer BFs, but this is natural – did they compare to VFAs that use adaptive basis functions (they cited such work)?
    1. Adaptive basis functions allows it to work better in high dimensional spaces
  16. They mention extending the algorithm to continuous actions as future work
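
To make the discussion concrete, here is a minimal sketch of the generic cross-entropy optimization loop the paper builds on, applied to a continuous parameter vector (not the paper's discrete-action, adaptive-BF variant; the function names, hyperparameters, and toy objective here are my own). Note the smoothing parameter alpha, which is the knob the convergence remarks above refer to:

```python
import numpy as np

def cross_entropy_method(score_fn, dim, n_samples=50, n_elite=10,
                         n_iters=30, alpha=0.7, seed=0):
    """Generic cross-entropy optimization of a parameter vector.

    score_fn : maps a parameter vector to a scalar score (higher is better).
    alpha    : smoothing parameter; the sampling distribution is blended
               with its previous value (smaller alpha = more conservative
               updates, which helps avoid premature convergence).
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)   # Gaussian sampling distribution, per dimension
    std = np.ones(dim)
    for _ in range(n_iters):
        # Sample candidate parameter vectors from the current distribution.
        samples = rng.normal(mean, std, size=(n_samples, dim))
        scores = np.array([score_fn(s) for s in samples])
        # Keep the n_elite best candidates.
        elites = samples[np.argsort(scores)[-n_elite:]]
        # Smoothed update of the distribution toward the elite set.
        mean = alpha * elites.mean(axis=0) + (1 - alpha) * mean
        std = alpha * elites.std(axis=0) + (1 - alpha) * std
    return mean

# Toy objective: negative squared distance to the optimum (1, -2).
target = np.array([1.0, -2.0])
best = cross_entropy_method(lambda x: -np.sum((x - target) ** 2), dim=2)
```

In the paper's setting, the parameter vector would encode BF locations/shapes plus a discrete action index per BF, and score_fn would be the simulated return of the resulting closed-loop policy.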
