- Basic idea: a method for refining policies via online search, computing a policy for the current query state
- Method is designed to be anytime
- Based on receding-horizon search: search to a fixed depth, execute an action, then replan at every step
- As planning time increases, the search horizon is expanded (a minimal replanning-loop sketch appears after these notes)
- Idea of lookahead pathologies: a lookahead is pathological if looking deeper increases the chance of choosing a suboptimal action
- Goal is to do something like sparse sampling (SS), but more efficiently (see the SS sketch after these notes)
- Discusses LRTA* and LAO*
- Says that in sparse sampling, increasing the horizon H while holding the sampling width C constant eventually increases the error rate, so iterative-deepening approaches will eventually cause a poor decision to be made
- Says it is important to balance C and H
- They sample full trajectories instead of building a complete SS tree
- Horizon is dynamically controlled by the estimated global sampling error of the tree
- The tree's estimated error is based on confidence intervals computed from how many times each (s, a, d) node is sampled (the sample count isn't constant as in SS); errors have to be propagated up through the tree (an illustrative sketch appears after these notes)
- Shows an empirical example where performance degrades as the horizon increases
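
A minimal sketch of the receding-horizon control loop described above, assuming placeholder hooks `plan(state, horizon)` and `env.step(state, action)` that are not from the paper:

```python
# Minimal sketch of receding-horizon replanning (placeholder hooks, not the
# paper's API): plan to a fixed depth from the current state, execute only the
# first action, then replan from the resulting state.

def receding_horizon_control(env, plan, initial_state, horizon, num_steps):
    """Act for num_steps, replanning from the current state at every step."""
    state = initial_state
    total_reward = 0.0
    for _ in range(num_steps):
        action = plan(state, horizon)             # depth-limited lookahead from the query state
        state, reward = env.step(state, action)   # execute only the first action
        total_reward += reward
    return total_reward
```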
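
For reference, a sketch of the sparse-sampling estimator that the notes compare against, with sampling width C and horizon H; the generative-model hooks `sample(s, a) -> (next_state, reward)` and `actions(s)` are assumptions of this sketch, not the paper's interface:

```python
# Sparse sampling (SS) sketch: recursively sample C successors per action down
# to depth H. Note the tree size grows as (|A| * C)^H, which is why C and H
# must be balanced under a fixed sample budget.

def ss_value(sample, actions, state, C, H, gamma=0.95):
    """Estimate the H-step value of `state` by recursive sampling."""
    if H == 0:
        return 0.0
    best = float("-inf")
    for a in actions(state):                      # assumes at least one action per state
        total = 0.0
        for _ in range(C):
            next_state, reward = sample(state, a)
            total += reward + gamma * ss_value(sample, actions, next_state, C, H - 1, gamma)
        best = max(best, total / C)
    return best

def ss_action(sample, actions, state, C, H, gamma=0.95):
    """Pick the root action with the highest sampled one-step backup."""
    def q(a):
        total = 0.0
        for _ in range(C):
            s2, r = sample(state, a)
            total += r + gamma * ss_value(sample, actions, s2, C, H - 1, gamma)
        return total / C
    return max(actions(state), key=q)
```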
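
One way the trajectory-sampling and confidence-interval ideas could fit together; the Hoeffding-style half-width, the crude root-level error aggregation, and the deepening rule below are illustrative assumptions, not the paper's exact equations:

```python
import math
import random
from collections import defaultdict

# Illustrative sketch: sample full trajectories (not a complete SS tree), keep a
# visit count and running mean per (state, action, depth) node, and deepen the
# horizon only when the confidence-interval error estimate at the root is small.
# States are assumed hashable; sample(s, a) -> (s', r) and actions(s) are
# assumed generative-model hooks.

def plan_with_adaptive_horizon(sample, actions, root, gamma=0.95, tol=0.05,
                               max_horizon=50, num_rounds=20, batch_size=100):
    stats = defaultdict(lambda: [0, 0.0])    # (s, a, d) -> [visit count, mean return]

    def half_width(n, v_max=1.0):
        # Hoeffding-style CI half-width; shrinks as 1/sqrt(n). v_max bounds the return.
        return v_max if n == 0 else 1.96 * v_max / math.sqrt(n)

    horizon = 1
    for _ in range(num_rounds):               # anytime: more rounds -> better estimates
        for _ in range(batch_size):
            s, ret, discount, path = root, 0.0, 1.0, []
            for d in range(horizon):
                a = random.choice(actions(s))
                s2, r = sample(s, a)
                path.append((s, a, d))
                ret += discount * r
                discount *= gamma
                s = s2
            for key in path:                   # update running means along the trajectory
                n, mean = stats[key]
                stats[key] = [n + 1, mean + (ret - mean) / (n + 1)]

        # Crude stand-in for propagating errors up the tree: take the widest CI
        # over the root-level (s, a, 0) nodes as the global error estimate.
        root_error = max(half_width(stats[(root, a, 0)][0]) for a in actions(root))
        if root_error < tol and horizon < max_horizon:
            horizon += 1                       # estimates look reliable, so look deeper

    # Return the root action with the best estimated return so far.
    return max(actions(root), key=lambda a: stats[(root, a, 0)][1])
```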