Finite horizon exploration for path integral control problems. Bert Kappen. On-line Trading of Exploration and Exploitation, NIPS Workshop ’06 – Talk


These notes are based on a videolectures talk.

  1. Noise can have a significant influence on the solution to a control problem. In general, finding an optimal solution is intractable.
  2. There are, however, some settings where the solution is tractable:
    1. LQR domains (though the domain must exhibit a particular form of smoothness)
    2. Deterministic control (no noise)
  3. That assumes the environment is known, but what about the case where it is not (it must be learned/explored)?
  4. The method can work in the following control settings:
    1. Finite path control (minimize the cost of a fixed-length path), may be time-dependent
    2. Infinite horizon control (this is general RL); must find a global policy
  5. Path integral control for finite horizon tasks
    1. Continuous time and space
  6. In this method there is a gradient
  7. Influence of noise decays?
  8. Empirical results are from a nonsmooth domain; the result uses importance sampling?
  9. No Bellman equation needs to be solved
  10. The exploration horizon is proportional to time-to-go and optimism (how much value could possibly exist in the remaining part of the trajectory) – exploration increases with horizon time and optimism
  11. Do random exploration that can help figure out the optimal control (around minute 22)
  12. In path integral control, exploration and exploitation are totally independent
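As a rough illustration (not taken from the talk itself), the basic path integral recipe behind points 8, 11, and 12 can be sketched for a toy problem: sample noisy, uncontrolled rollouts, weight each path by exp(-S/λ) where S is its accumulated cost, and read the control off the weighted average of the noise. The 1D dynamics dx = u dt + σ dW, the quadratic state cost x², and all parameter values below are illustrative assumptions, not the lecture's exact setup.

```python
import numpy as np

def pi_first_control(x0, n_samples=2000, horizon=20, dt=0.05,
                     lam=1.0, sigma=1.0, seed=0):
    """Monte Carlo sketch of path-integral control for dx = u dt + sigma dW
    with state cost x^2 (drive the state toward 0). Illustrative only."""
    rng = np.random.default_rng(seed)
    # Sample uncontrolled rollouts: pure Brownian motion from x0.
    noise = rng.normal(size=(n_samples, horizon))
    paths = x0 + sigma * np.sqrt(dt) * np.cumsum(noise, axis=1)
    # Path cost S = sum of x(t)^2 dt along each rollout.
    costs = np.sum(paths ** 2, axis=1) * dt
    # Exponential weights exp(-S / lambda), shifted for numerical stability.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # The optimal initial control is proportional to the weighted mean of
    # the first Brownian increment (increment-to-rate conversion via 1/dt).
    return sigma * np.sqrt(dt) * np.dot(w, noise[:, 0]) / dt
```

Note that the rollouts here are purely random (point 11): exploration noise is generated independently of any exploitation policy, and the exponential reweighting alone recovers a control that pushes the state toward the low-cost region.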