On-line search for solving MDPs via Heuristic Sampling. Peret, Garcia. ECAI 2004

  1. Deals with sparse-sampling style algorithms
  2. Discusses lookahead pathologies, which is the case where (even with discounting, it seems), increasing the horizon H, while keeping the sample-width C constant eventually increases the error – this means a simple iterative deepening approach will eventually lead to poor decisions.
    1. I was familiar with this, but only in the context where tree-search is being performed and one (imperfect) evaluated value (not actual bounds) of the state is used in the leaf nodes.
  3. The have some demonstrations of this in action
  4. The idea in the paper is to make a good trade off between horizon and width by maintaining an estimation of the global sampling error on the tree.
    1. The estimation is a heuristic which indicates whether some state/actions need additional sampling
  5. Uses confidence intervals to estimate error of only the “finite sampling” (is this just the possible error on the expected value?).  Assumes gaussian distribution
  6. The errors are also propagated through the tree

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: