Learning Inadmissible Heuristics During Search. Jordan T. Thayer, Austin Dionne, Wheeler Ruml

  1. Goal is to transform admissible heuristics to more effective (faster solve times with potential suboptimal results) heuristics online.  This has been done before, but only offline
  2. The method used to change heuristic is TD.  John mentioned this feels like dealing with the Bellman residual – first thing I thought of too.
  3. Use 4 features: cost at arriving at node n from root, estimate of cost-to-go from n to terminal state, depth(n), the number of transitions needed to get from root to n, and d(n) the estimated number of transitions needed to get from n to goal
  4. Doesn’t seem to be a very significant performance difference between online and offline methods tested
  5. They ran experiments on 8-tile puzzle.  Also, vacuum world, which is close to my heart.
  6. Actual CPU time using the real heuristic is better than the learned one, after that, offline LMS(?) is best.  The comparisons, are a bit apples and oranges though it seems
  7. Extremely wide error bars, hard to draw conclusions from the empirical results
  8. They give an example where the proposed approach fails because of the single step updates
  9. Talk about search along multiple heuristics, and that this approach doesn’t function directly in that setting

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: