Category Archives: Constraint Satisfaction

Monte-Carlo Exploration for Deterministic Planning. Nakhost, Muller. IJCAI 2009.

  1. MC methods lead to breakthroughs in Go and general game playing
  2. Introduces planner ARVAND
  3. Local search are the most effective techniques for satisficing planning
    1. But they have trouble dealing with local minima and plateaus where the heuristic function is flat in a large neighborhood
  4. Generally, these types of search algorithms are greedy, so a heuristic that isn’t very good can lead search to areas that are ultimately unpromising
  5. Mentions exploration/exploitation tradeoff
  6. Monte-Carlo Random Walk (MRW) is based on two things:
    1. Action generation is orders of magnitude faster than state-of-the-art heuristic evaluation, so it often works well (in terms of wall-clock time) to try and move further and then evaluate the result with the heuristic less frequently
    2. Results from games (Go) show that exploration is also important
  7. Mention randomization in motion planning, and some other sutff
  8. Mention other methods for escaping local minima/plateaus (FF, and something called enforced hill climbing/EHC)
    1. An extension to EHC, Marvin, keeps track of sequences that lead to exits from plateaus and attempts to apply them again later
  9. Arvand may use heuristics generated by FF, so can be like FF with randomized exploration
  10. Variants of MRW may move away from chance exploration to bias action selection to what was effective previously
  11. # of random walks, how long to make them are parameters
  12. Randomized exploration runs into trouble if the #actions is huge (>1000), or many random trajectories lead to dead ends.  Pruning is important in these cases
    1. They discuss something that has a couple of names, which comes down to softmax action selection
    2. In this paper, ARVAND starts with chance random walks, but falls back on these if that doesn’t make progress
  13. Empirical results discuss performance of this algorithm and competitors thoroughly, but I don’t have the background to make sense of it
  14. “To the best of our knowledge, this is the first successful use of Monte-Carlo search as the main search method of a classical planner. Contrary to greedy local search methods that try to immediately exploit their local knowledge, MRW planning explores the local search space by using random walks.”
Tagged , ,

Towards a Second Generation Random Walk Planner: An Experimental Exploration. Nakhost, Muller. IJCAI 2013

  1. Focuses on best-first planners that use randomization to break out of regions of the search space that have a flat heuristic value
  2. Four major insights claimed
    1. Better to evaluate states frequently as opposed to just at the end of search
    2. Better to allow parameters to intelligently adjust on the fly
    3. Better to bias action selection based on current state as opposed to the simple heuristic of how useful it was throughout the entire history
    4. “Even simple forms of random walk planning can compete with GBFS.”
  3. Sat-searchers work almost exclusively based on heuristics.  IPC 2011 had 25 of 27 entries in that category.
    1. Because they use heuristics they don’t really explore; when heuristics are bad lack of search driven by other methods means that solution quality is poor
  4. The same authors, in 2009 introduced Monte-Carlo Random Walks, which run finite-length random walks, and then choose one course of action based on leaf evaluation
  5. Mention RRTs, as well as Roamer.
  6. The planner they discuss is called Arvand-2013, built on top of something called Fast Downward algorithm
  7. A fair amount of discussion of parameter values that worked well for particular different domains from IPC
  8. <Too results specific based on IPC domains, skipping>

The Roamer Planner Random-Walk Assisted Best-First Search. IPPC Competition 2011.

<Achieved 5th of 27 planners in the area it competed in the IPPC>

  1. Motivation is that when doing best-first search, algorithm may hit plateaus, where the believed best solution doesn’t change for a long period of time
    1. Is a problem in hardcore search/optimization like SAT/CSP
  2. Goal then, is to find what is called an “exit state,” which is a state that will cause a change in the heuristic value
  3. The algorithm presented here, ROAMER, uses random search to help find an exit state when a plateau is hit
  4. Description contains mention of some search methods I don’t know about <more of use to SAT/CSP community than RL probably>, but seems to be based on an iterative version of A*
  5. Cite a paper that introduces MC random exploration for search
  6. Basically these style planners do some form of heuristic BFS
  7. In the case that the heuristic is perfect, a planner will expand only the path from start to goal, but even a small error in the heuristic function it may have to expand exponential number of states to find goal
  8. In the formulation heuristic values are positive and decrease as search deepens until the goal is found with value 0
    1. In cases considered, the heuristic values of s, s’ (its successor) are the same, so there is no “gradient”
  9. <The premise seems a little ironic here – a standard claim in optimization is that when you reach a plateau, there is no gradient to follow, and then the optimization enters a random walk and you are in trouble.  Here, they say when you reach a plateau, get out of it through a random walk.>
  10. Reasons to do random walk:
    1. Better at jumping out of local minima
    2. Doesn’t require use of heuristic function, so can be cheaper to run
    3. Basically memory free, so does not add to cost of bestFS
  11. In SAT/CSP heuristic depends on the number of features not satisfied; in most cases changing the statement won’t satisfy an additional feature, so the heuristic value of both states are equivalent
  12. Mention tabu search and WalkSAT <only familiar with the first>
  13. When dealing with problems that have more helpful heuristics the random search is less useful.
  14. Best-first drives the search, but random search is invoked when deemed useful
  15. Basically if a plateau is detected (when to report a plateau is a parameter), does random walks (to a finite horizon) until a change in heuristic value occurs