Monte-Carlo Exploration for Deterministic Planning. Nakhost, Muller. IJCAI 2009.

  1. MC methods lead to breakthroughs in Go and general game playing
  2. Introduces planner ARVAND
  3. Local search are the most effective techniques for satisficing planning
    1. But they have trouble dealing with local minima and plateaus where the heuristic function is flat in a large neighborhood
  4. Generally, these types of search algorithms are greedy, so a heuristic that isn’t very good can lead search to areas that are ultimately unpromising
  5. Mentions exploration/exploitation tradeoff
  6. Monte-Carlo Random Walk (MRW) is based on two things:
    1. Action generation is orders of magnitude faster than state-of-the-art heuristic evaluation, so it often works well (in terms of wall-clock time) to try and move further and then evaluate the result with the heuristic less frequently
    2. Results from games (Go) show that exploration is also important
  7. Mention randomization in motion planning, and some other sutff
  8. Mention other methods for escaping local minima/plateaus (FF, and something called enforced hill climbing/EHC)
    1. An extension to EHC, Marvin, keeps track of sequences that lead to exits from plateaus and attempts to apply them again later
  9. Arvand may use heuristics generated by FF, so can be like FF with randomized exploration
  10. Variants of MRW may move away from chance exploration to bias action selection to what was effective previously
  11. # of random walks, how long to make them are parameters
  12. Randomized exploration runs into trouble if the #actions is huge (>1000), or many random trajectories lead to dead ends.  Pruning is important in these cases
    1. They discuss something that has a couple of names, which comes down to softmax action selection
    2. In this paper, ARVAND starts with chance random walks, but falls back on these if that doesn’t make progress
  13. Empirical results discuss performance of this algorithm and competitors thoroughly, but I don’t have the background to make sense of it
  14. “To the best of our knowledge, this is the first successful use of Monte-Carlo search as the main search method of a classical planner. Contrary to greedy local search methods that try to immediately exploit their local knowledge, MRW planning explores the local search space by using random walks.”
Tagged , ,

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: