Reinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach. Theodorou, Buchli, and Schaal. ICRA 2010

  1. Designed for use in continuous action, high dimensional spaces
  2. Math is based on control theory
  3. Claims to be simple, with low risk of numerical divergence (it performs no matrix inversions or gradient estimations). There are also no learning rates to set
  4. Empirically works better than gradient-based methods; there are empirical results for LittleDog
  5. Mentions limitations of value-based, rollout, and approximate policy iteration
    1. In particular, says rollout methods have too many tuning parameters
  6. “This approach make[s] an appealing theoretical connection between value function approximation using the stochastic HJB equations and direct policy learning by approximating the path integral, i.e., by solving a statistical inference problem from sample rollouts.”
  7. The method here has no tuning params aside from exploration noise
  8. It looks like you still need to know a lot about the structure of the problem, and the policy parameterization is still linear
  9. The algorithm also requires a noiseless rollout
  10. “In our case we use a special case of parameterized policies in the form of Dynamic Movement Primitives… Essentially, these policies code a learnable point attractor for a movement from y_t_0 to the goal g… The DMP equations are obviously of the form of our control system, just with a row vector as control transition matrix”
  11. Compares against other policy search methods that perform gradient computations.
    1. Also mentions PoWER, which is not a gradient method, and is based on EM, but since it requires a special property for the reward function, it was not applicable to this setting
  12. I am not familiar with most of the comparison algorithms aside from REINFORCE. It is strange, however, that the other two comparison algorithms are basically as bad as or worse than REINFORCE, which is commonly considered an extremely inefficient algorithm. Comparing to REINFORCE is like comparing to Q-Learning
    1. So while PI^2 does well relative to these (seems to be by an order of magnitude), it makes me wonder how much effort was spent tuning the other algorithms.
    2. The paper says PI^2 didn’t need parameter tuning while the others did for each domain, so I can be sympathetic if the others weren’t tuned so well
  13. The test domains are, however, extremely high dimensional (50-DOF)
  14. In the LittleDog experiment they did learning from demonstration
  15. Says each degree of freedom in the LittleDog experiment was represented by a DMP with 50 basis functions
    1. I don’t know what a DMP is.
    2. I guess the basis functions are what the linear parameterization is applied to?
  16. This paper doesn’t introduce the path integral approach, but presents a more general method than the original algorithm. It would be interesting to see how different this is from the original.
  17. I don’t grok what the algorithm is doing yet; I would probably need to invest some time to figure out what is going on.  Maybe Videolectures will help.
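To make points 3, 7, and 10 concrete: as I understand it, the core PI^2 update is just a probability-weighted average of the exploration noise over rollouts, with weights from a softmax over rollout costs. A rough sketch under my own simplifications (the paper weights per timestep and projects through the basis functions; the function and variable names here are made up):

```python
import numpy as np

def pi2_update(theta, epsilons, costs, lam=0.1):
    """One simplified PI^2-style parameter update.

    theta:    (d,) current policy parameters
    epsilons: (K, d) exploration noise added in each of K rollouts
    costs:    (K,) total cost of each rollout
    lam:      softmax temperature (tied to the exploration noise in the paper)
    """
    # Normalize costs to [0, 1] so lam has a consistent scale across domains.
    s = (costs - costs.min()) / (costs.max() - costs.min() + 1e-10)
    # Softmax over negative cost: low-cost rollouts get high weight.
    w = np.exp(-s / lam)
    w /= w.sum()
    # The update is a probability-weighted average of the noise:
    # no gradient estimate, no matrix inversion, no learning rate.
    return theta + w @ epsilons
```

This would be where the "no tuning parameters aside from exploration noise" claim comes from: the effective step size is set implicitly by the weighting rather than by a learning rate.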
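On points 10 and 15: going by the quote above and the standard DMP formulation, a 1-D DMP is a critically damped spring pulling y toward the goal g, plus a forcing term that is linear in the learnable parameters; the basis functions (Gaussians of a decaying phase variable) are what that linear parameterization applies to. A rough sketch with made-up constants, not the paper's exact equations:

```python
import numpy as np

def dmp_rollout(g, y0, theta, n_basis=10, dt=0.01, alpha=25.0):
    """Integrate a simplified 1-D Dynamic Movement Primitive for 1 second."""
    beta = alpha / 4.0                    # critically damped spring gains
    centers = np.linspace(0.0, 1.0, n_basis)
    widths = np.full(n_basis, float(n_basis ** 2))

    y, yd, x = float(y0), 0.0, 1.0        # position, velocity, phase
    traj = [y]
    for _ in range(int(1.0 / dt)):
        # Normalized Gaussian basis functions of the phase variable x;
        # the forcing term f is linear in theta, as PI^2 requires.
        psi = np.exp(-widths * (x - centers) ** 2)
        f = (psi @ theta) / psi.sum() * x * (g - y0)
        ydd = alpha * (beta * (g - y) - yd) + f
        yd += ydd * dt
        y += yd * dt
        x -= 8.0 * x * dt                 # phase decays, killing f near the goal
        traj.append(y)
    return np.array(traj)
```

With theta = 0 the forcing term vanishes and the spring alone converges to g; learning shapes the transient on the way there. In the LittleDog experiment each degree of freedom would be one such DMP with 50 basis functions, and the concatenated theta is what PI^2 updates.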

2 thoughts on “Reinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach. Theodorou, Buchli, and Schaal. ICRA 2010”

  1. SoloGen says:

    What’s the title of the paper?!

  2. Ari Weinstein says:

    Sorry, I started posting unfinished notes on things and then re-editing later because WordPress kept deleting my drafts. It’s updated now.
