<Achieved 5th of 27 planners in the area it competed in the IPPC>
- Motivation is that when doing best-first search, algorithm may hit plateaus, where the believed best solution doesn’t change for a long period of time
- Is a problem in hardcore search/optimization like SAT/CSP
- Goal then, is to find what is called an “exit state,” which is a state that will cause a change in the heuristic value
- The algorithm presented here, ROAMER, uses random search to help find an exit state when a plateau is hit
- Description contains mention of some search methods I don’t know about <more of use to SAT/CSP community than RL probably>, but seems to be based on an iterative version of A*
- Cite a paper that introduces MC random exploration for search
- Basically these style planners do some form of heuristic BFS
- In the case that the heuristic is perfect, a planner will expand only the path from start to goal, but even a small error in the heuristic function it may have to expand exponential number of states to find goal
- In the formulation heuristic values are positive and decrease as search deepens until the goal is found with value 0
- In cases considered, the heuristic values of s, s’ (its successor) are the same, so there is no “gradient”
- <The premise seems a little ironic here – a standard claim in optimization is that when you reach a plateau, there is no gradient to follow, and then the optimization enters a random walk and you are in trouble. Here, they say when you reach a plateau, get out of it through a random walk.>
- Reasons to do random walk:
- Better at jumping out of local minima
- Doesn’t require use of heuristic function, so can be cheaper to run
- Basically memory free, so does not add to cost of bestFS
- In SAT/CSP heuristic depends on the number of features not satisfied; in most cases changing the statement won’t satisfy an additional feature, so the heuristic value of both states are equivalent
- Mention tabu search and WalkSAT <only familiar with the first>
- When dealing with problems that have more helpful heuristics the random search is less useful.
- Best-first drives the search, but random search is invoked when deemed useful
- Basically if a plateau is detected (when to report a plateau is a parameter), does random walks (to a finite horizon) until a change in heuristic value occurs