Finite horizon exploration for path integral control problems. Bert Kappen. On-line Trading of Exploration and Exploitation, NIPS Workshop ’06 – Talk

Based on this videolectures talk.

  1. Noise can have a significant influence on the solution to a control problem.  In general, finding an optimal solution is intractable.
  2. There are, however, some settings where the solution is tractable:
    1. LQR domains (but the domain must exhibit a particular form of smoothness)
    2. Deterministic control (no noise)
  3. That assumes the environment is known; what about the case where it is not (one must learn/explore)?
  4. Can work in the following control settings:
    1. Finite-horizon path control (minimize the cost of a fixed-length path); may be time dependent
    2. Infinite-horizon control (this is general RL); must find a global policy
  5. Path integral control for finite-horizon tasks
    1. Continuous time and state space
  6. In this method there is a gradient
  7. Influence of noise decays?
  8. Empirical results are from a non-smooth domain; the result uses importance sampling?
  9. There is no Bellman equation to solve
  10. The exploration horizon is proportional to the time to go and to optimism (how much value could possibly exist in the remaining part of the trajectory): exploration increases with both
  11. Do random exploration that can help figure out the optimal control (around minute 22)
  12. In path integral control, exploration and exploitation are totally independent
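
The method in items 5–12 avoids the Bellman equation by estimating the optimal control from sampled noisy rollouts. As a minimal sketch of that idea, assuming a 1-D point with pure-noise dynamics, a quadratic terminal cost, and a temperature `lam` (all illustrative choices, not details given in the talk): sample noise trajectories, weight each by `exp(-cost / lam)`, and take the weighted average of the first-step noise as the control.

```python
import numpy as np

rng = np.random.default_rng(0)

def terminal_cost(x, target=1.0):
    # Quadratic terminal cost; an illustrative choice, not from the talk.
    return (x - target) ** 2

def rollout(x0, noise, dt=0.1):
    # Uncontrolled 1-D rollout dx = dW: the terminal state is just the
    # start plus the accumulated noise increments.
    return x0 + np.sqrt(dt) * noise.sum(axis=-1)

def path_integral_control(x0, horizon=20, samples=2000, lam=1.0):
    # Monte Carlo path-integral estimate: weight each sampled path by
    # exp(-cost / lam) (a soft-min over paths) and average the
    # first-step noise under those weights to get the control.
    noise = rng.standard_normal((samples, horizon))
    costs = terminal_cost(rollout(x0, noise))
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return float(weights @ noise[:, 0])

u = path_integral_control(0.0)  # control pushing the state toward the target
```

Note that no value function or policy is computed anywhere: exploration (the random rollouts) is generated independently of exploitation (the weighted average), which is the independence claimed in item 12.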
