On-line search for solving Markov Decision Processes via heuristic sampling. Péret, Garcia

Basic idea is to refine policies via online search techniques, finding a policy for the query state only

Method is designed to be anytime

Based on receding horizon search; search to a fixed depth, but then replan at every step

But as planning time increases, the horizon is expanded
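A minimal sketch of the receding-horizon idea with an anytime, expanding horizon. The toy chain MDP, function names, and parameters here are all hypothetical illustrations, not the paper's actual algorithm; the environment is deterministic for clarity, so the per-node sampling width is only meaningful in a genuinely stochastic MDP.

```python
def simulate(state, action, n_states=4):
    """Toy chain MDP (hypothetical): action 1 moves right, reward 1 on reaching the end."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == n_states - 1 else 0.0)

def q_estimate(state, action, depth, gamma=0.9, width=4):
    """Depth-limited lookahead: average `width` sampled transitions per (s, a, d)."""
    if depth == 0:
        return 0.0
    total = 0.0
    for _ in range(width):
        nxt, r = simulate(state, action)
        total += r + gamma * max(q_estimate(nxt, a, depth - 1, gamma, width) for a in (0, 1))
    return total / width

def receding_horizon_action(state, horizon):
    """Plan to a fixed depth, return only the first action; replan at every step."""
    return max((0, 1), key=lambda a: q_estimate(state, a, horizon))

def act_anytime(state, max_horizon=3):
    """Anytime loop: while planning time remains, expand the horizon and replan."""
    best = 0
    for h in range(1, max_horizon + 1):
        best = receding_horizon_action(state, h)
    return best
```

In this toy chain the reward is only visible at depth 3, so shallow horizons tie both actions at zero and only the deeper replan finds the right move.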

Idea of lookahead pathologies; a lookahead is pathological if looking deeper increases the chance of choosing a suboptimal action

Goal is to do something like sparse sampling, but more efficiently

Discusses LRTA* and LAO*

Says that in sparse sampling, increasing the horizon H while holding the sampling width C constant eventually increases the error rate, so iterative-deepening methods will eventually cause a poor decision to be made

Says it is important to balance C and H

They sample full trajectories instead of building a complete SS tree.
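A rough sketch of trajectory-based sampling, assuming the same hypothetical toy chain MDP as a stand-in simulator. Epsilon-greedy action selection and the Monte-Carlo-style return backup below are simplifications standing in for the paper's heuristic sampling control and tree backup, not its actual method.

```python
import random

def simulate(state, action, n_states=4):
    """Toy chain MDP (hypothetical): action 1 moves right, reward 1 on reaching the end."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == n_states - 1 else 0.0)

def sample_trajectory(q, counts, root, horizon, actions=(0, 1), gamma=0.9, eps=0.5):
    """Sample one root-to-leaf trajectory instead of expanding a full SS tree,
    updating per-(s, a, d) Q estimates and visit counts incrementally."""
    path, s = [], root
    for d in range(horizon):
        if random.random() < eps:
            a = random.choice(actions)  # explore
        else:
            a = max(actions, key=lambda a: q.get((s, a, d), 0.0))  # exploit
        nxt, r = simulate(s, a)
        path.append((s, a, d, r))
        s = nxt
    # Back the sampled return up the trajectory with an incremental mean update.
    ret = 0.0
    for s, a, d, r in reversed(path):
        ret = r + gamma * ret
        key = (s, a, d)
        counts[key] = counts.get(key, 0) + 1
        q[key] = q.get(key, 0.0) + (ret - q.get(key, 0.0)) / counts[key]
```

Because each run touches only one path of (s, a, d) nodes, visit counts vary across the tree, which is exactly what the confidence-interval error estimate below the surface exploits.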

Horizon is dynamically controlled by the estimated global sampling error of the tree

Estimated error of the tree is based on confidence intervals, computed from how many times each (s, a, d) is sampled (the sample count isn't constant, as it is in SS). Errors have to be propagated up through the tree
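One way to picture the propagation, using a generic 1/sqrt(n) confidence-interval half-width as a stand-in; the paper's exact bound and propagation rule may differ, and the node layout here is hypothetical.

```python
import math

def ci_half_width(n, c=1.0):
    """Confidence-interval half-width for a mean from n samples
    (generic 1/sqrt(n) form; the paper's exact bound may differ)."""
    return float('inf') if n == 0 else c / math.sqrt(n)

def propagated_error(node, gamma=0.9):
    """Node error = its own sampling error plus the discounted
    worst-case error among its children, propagated to the root."""
    local = ci_half_width(node['n'])
    children = node.get('children', [])
    if not children:
        return local
    return local + gamma * max(propagated_error(ch, gamma) for ch in children)

# Example: a leaf sampled 100 times under a root sampled 25 times.
leaf = {'n': 100}
root = {'n': 25, 'children': [leaf]}
# propagated_error(root) = 1/5 + 0.9 * (1/10) = 0.29
```

An unsampled node has infinite error, so the global estimate stays pessimistic until every reachable (s, a, d) on the frontier has been visited at least once.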

Show empirical example where performance degrades with increased horizon