On-line search for solving MDPs via Heuristic Sampling. Peret, Garcia. ECAI 2004

Deals with sparse-sampling style algorithms

Discusses lookahead pathologies, which is the case where (even with discounting, it seems), increasing the horizon H, while keeping the sample-width C constant eventually increases the error – this means a simple iterative deepening approach will eventually lead to poor decisions.

I was familiar with this, but only in the context where tree-search is being performed and one (imperfect) evaluated value (not actual bounds) of the state is used in the leaf nodes.

The have some demonstrations of this in action

The idea in the paper is to make a good trade off between horizon and width by maintaining an estimation of the global sampling error on the tree.

The estimation is a heuristic which indicates whether some state/actions need additional sampling

Uses confidence intervals to estimate error of only the “finite sampling” (is this just the possible error on the expected value?). Assumes gaussian distribution