- Goal is to transform admissible heuristics into more effective heuristics (faster solve times at the cost of potentially suboptimal solutions) online. This has been done before, but only offline.
- The method used to adjust the heuristic is temporal-difference (TD) learning. John mentioned this feels like dealing with the Bellman residual – first thing I thought of too.
- Uses 4 features: g(n), the cost of arriving at node n from the root; h(n), the estimated cost-to-go from n to a terminal state; depth(n), the number of transitions needed to get from the root to n; and d(n), the estimated number of transitions needed to get from n to the goal. (A rough sketch of a single-step TD update over these features follows these notes.)
- Doesn’t seem to be a very significant performance difference between online and offline methods tested
- They ran experiments on the 8-tile puzzle, and also on vacuum world, which is close to my heart.
- In actual CPU time, the real heuristic beats the learned ones; after that, offline LMS(?) is best. The comparisons seem a bit apples-and-oranges, though.
- Extremely wide error bars make it hard to draw conclusions from the empirical results.
- They give an example where the proposed approach fails because of the single-step updates.
- They discuss search with multiple heuristics, and note that this approach doesn't apply directly in that setting.
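
A minimal sketch of the kind of online correction I understood from the above: a linear combination of the four features, adjusted with a single-step TD / LMS-style update. The `Node` record, the learning rate, and the exact target construction are my assumptions for illustration, not taken from the paper.

```python
from collections import namedtuple

# Hypothetical node record carrying the four features noted above.
Node = namedtuple("Node", ["g", "h", "depth", "d"])

def features(node):
    # The four features: g(n), h(n), depth(n), d(n).
    return [node.g, node.h, node.depth, node.d]

def predict(weights, node):
    # Learned heuristic: a weighted combination of the features.
    return sum(w * f for w, f in zip(weights, features(node)))

def td_update(weights, parent, best_child, step_cost, lr=0.01):
    """Single-step update: move the parent's estimate toward the
    one-step lookahead target (step cost + best child's estimate)."""
    target = step_cost + predict(weights, best_child)  # Bellman-style backup
    error = target - predict(weights, parent)          # TD / Bellman residual
    for i, f in enumerate(features(parent)):
        weights[i] += lr * error * f                   # LMS gradient step
    return weights

# Example usage with made-up numbers:
w = [0.0, 1.0, 0.0, 0.0]                  # start by trusting the base heuristic h
parent = Node(g=3, h=10, depth=3, d=8)
child = Node(g=4, h=11, depth=4, d=9)     # h looks too optimistic at the parent
w = td_update(w, parent, child, step_cost=1)
```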