On-line search for solving Markov Decision Processes via heuristic sampling. Peret, Garcia

  1. Basic idea is a method of refining policies via online search techniques, finding a policy for a query state
  2. Method is designed to be anytime
  3. Based on receding horizon search; search to a fixed depth, but then replan at every step
    1. But as planning increases the horizon is expanded
  4. Idea of lookahead pathologies; a lookahead is pathological if looking deeper increases the chance of choosing a suboptimal action
  5. Goal is to do something like sparse sampling, but more efficiently
  6. Discuss LRTA* and LAO*
  7. Says that in sparse sampling, increasing H while holding C constantly eventually increases the error rate, so that iterative deepening methods will eventually cause a poor decision to be made
  8. Say it is important to balance C and H
  9. They sample full trajectories instead of building an complete SS tree.
  10. Horizon is dynamically controlled by the estimated global sampling error of the tree
    1. Estimated error of tree is based on use of confidence intervals, which is estimated based on how many times each (s,a,d) is sampled (isn’t constant like in SS). Errors have to be propagated through the tree
  11. Show empirical example where performance degrades with increased horizon

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: