- Goal is to transform admissible heuristics into more effective heuristics (faster solve times at the cost of potentially suboptimal solutions) online. This has been done before, but only offline.
- The method used to adjust the heuristic is temporal-difference (TD) learning. John mentioned this feels like dealing with the Bellman residual – first thing I thought of too.
- Uses 4 features: g(n), the cost of arriving at node n from the root; h(n), the estimated cost-to-go from n to a terminal state; depth(n), the number of transitions needed to get from the root to n; and d(n), the estimated number of transitions needed to get from n to the goal. (A rough sketch of a single-step TD update over these features follows these notes.)
- Doesn’t seem to be a very significant performance difference between online and offline methods tested
- They ran experiments on the 8-tile puzzle, and also on vacuum world, which is close to my heart.
- In actual CPU time, the real heuristic beats the learned ones; after that, offline LMS(?) is best. The comparisons seem a bit apples-and-oranges, though.
- Extremely wide error bars make it hard to draw conclusions from the empirical results.
- They give an example where the proposed approach fails because of the single-step updates.
- They discuss search with multiple heuristics, and note that this approach doesn't apply directly in that setting.
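
A minimal sketch of the kind of online correction I understood from the above: a linear combination of the four features, adjusted with a single-step TD / LMS-style update. The `Node` record, the learning rate, and the exact target construction are my assumptions for illustration, not taken from the paper.

```python
from collections import namedtuple

# Hypothetical node record carrying the four features noted above.
Node = namedtuple("Node", ["g", "h", "depth", "d"])

def features(node):
    # The four features: g(n), h(n), depth(n), d(n).
    return [node.g, node.h, node.depth, node.d]

def predict(weights, node):
    # Learned heuristic: a weighted combination of the features.
    return sum(w * f for w, f in zip(weights, features(node)))

def td_update(weights, parent, best_child, step_cost, lr=0.01):
    """Single-step update: move the parent's estimate toward the
    one-step lookahead target (step cost + best child's estimate)."""
    target = step_cost + predict(weights, best_child)  # Bellman-style backup
    error = target - predict(weights, parent)          # TD / Bellman residual
    for i, f in enumerate(features(parent)):
        weights[i] += lr * error * f                   # LMS gradient step
    return weights

# Example usage with made-up numbers:
w = [0.0, 1.0, 0.0, 0.0]                  # start by trusting the base heuristic h
parent = Node(g=3, h=10, depth=3, d=8)
child = Node(g=4, h=11, depth=4, d=9)     # h looks too optimistic at the parent
w = td_update(w, parent, child, step_cost=1)
```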