Based on this videolectures talk.
- Noise can have a significant influence on the solution to a control problem. In general, finding an optimal solution is intractable.
- There are, however, some settings where the problem is tractable:
  - LQR domains (but the domain must exhibit a particular form of smoothness)
  - Deterministic control (no noise)
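As an illustration of the LQR case above, here is a minimal sketch of finite-horizon discrete-time LQR solved by the backward Riccati recursion. The double-integrator dynamics and cost weights are illustrative assumptions, not details from the talk.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Q_final, T):
    """Return feedback gains K[t] so that u_t = -K[t] @ x_t is optimal."""
    P = Q_final
    gains = []
    for _ in range(T):
        # Riccati backward step: K = (R + B'PB)^{-1} B'PA
        K_t = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K_t)
        gains.append(K_t)
    return gains[::-1]  # gains[t] applies at time step t

# Illustrative double integrator with dt = 0.1 (assumed, not from the talk)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
T = 50

K = lqr_gains(A, B, Q, R, Q_final=10 * np.eye(2), T=T)
x = np.array([1.0, 0.0])
for t in range(T):
    x = A @ x - B @ (K[t] @ x)  # closed-loop rollout drives x toward 0
```

Because the dynamics are linear and the cost quadratic, the recursion gives the exact optimal time-varying policy; this is the special structure that makes the LQR setting tractable.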
- That assumes the environment is known; what about the case where it is not (the agent must learn/explore)?
- This can work in the following control settings:
  - Finite-path control (minimize the cost of a fixed-length path); the policy may be time dependent
  - Infinite-horizon control (this is general RL); a global policy must be found
- Path integral control for finite-horizon tasks
  - Continuous time and space
  - In this method there is a gradient
  - The influence of noise decays?
  - Empirical results are from a nonsmooth domain; the result uses importance sampling?
  - No Bellman equation needs to be solved
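The points above can be sketched as a sample-based path integral update (in the PI^2/MPPI style): sample noisy rollouts, weight each by an exponentiated negative cost (the importance-sampling-flavored softmin), and average the noise into the control sequence. No Bellman equation is solved. The 1-D point-mass dynamics, cost, and temperature below are illustrative assumptions, not specifics from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, lam, sigma = 20, 200, 1.0, 0.5   # horizon, rollouts, temperature, noise
u = np.zeros(T)                        # open-loop control sequence

def rollout_cost(u_seq):
    """Quadratic cost for toy dynamics x' = x + 0.1*u, starting at x = 1."""
    x, c = 1.0, 0.0
    for u_t in u_seq:
        x = x + 0.1 * u_t
        c += x**2 + 0.01 * u_t**2
    return c

for _ in range(30):                    # iterate the update a few times
    eps = sigma * rng.standard_normal((K, T))
    costs = np.array([rollout_cost(u + eps[k]) for k in range(K)])
    # Softmin weights: low-cost rollouts dominate the averaged noise
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    u = u + w @ eps                    # path-integral-style update

final_cost = rollout_cost(u)
```

The exponential weighting is where importance sampling enters: rollouts are drawn from the current (noisy) policy and reweighted toward the optimal path distribution.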
- The exploration horizon is proportional to the time to go and to optimism (which says how much value could possibly exist in the remaining part of the trajectory) – exploration increases with horizon time and optimism
- Random exploration can help figure out the optimal control (around minute 22)
- In path integral control, exploration and exploitation are totally independent
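A toy sketch of the exploration schedule described above, where the exploration noise scale grows with the time to go and with optimism. The linear scaling rule and the `base` parameter are illustrative assumptions, not the talk's exact formula.

```python
def exploration_scale(t, T, optimism, base=0.1):
    """Exploration noise scale at step t of a T-step horizon.

    More remaining time and more optimism (a bound on how much value
    might still be attainable) both justify more exploration.
    """
    time_to_go = T - t
    return base * time_to_go * optimism

T = 10
scales = [exploration_scale(t, T, optimism=0.5) for t in range(T)]
# Exploration shrinks toward zero as the horizon runs out
```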