- The paper makes a strange point in the abstract, which is perhaps a red flag. On the other hand, I think I only came across it because Remi referenced it on the RL newsgroup. “We suggest also, a functional analogy between the proposed sampling from worst trajectories heuristic and the role of dreams (e.g. nightmares)…” Whaaaa?
- Not sure it's refereed as-is; the paper is from arXiv and says "preprint submitted to Knowledge-Based Systems".
- Basic idea is to do cheap planning for cases when A* is too expensive, or when there is uncertainty
- Dyna-H is basically Dyna with a heuristic added (it uses epsilon-greedy exploration, as described in the paper)
- Does an approximate tree search and refines the results iteratively; it can work off incomplete (I suppose also incorrect?) models of the environment, unlike A*, which requires a correct model
- The algorithm tries to generate runs through the MDP from the worst trajectories (worst according to the heuristic, which acts as a form of shaping)
- Beats Dyna-Q handily, which is to be expected.
- The class of trajectories considered by the heuristic (cartesian, i.e. straight-line distance) is different from what is actually possible (Manhattan moves on the grid); in this case it's OK.
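A rough sketch of how I read the Dyna-H idea: ordinary Dyna-style learning (epsilon-greedy acting, Q-learning updates, a learned model), where planning rollouts through the learned model deliberately follow the *worst* action according to a distance-to-goal heuristic. This is my own minimal reconstruction, not the paper's algorithm: the grid world, the names `GridWorld` and `dyna_h`, the Euclidean heuristic, the rollout scheme, the 500-step episode cap, and all parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical 4-connected grid world (an assumption; not the paper's domain).
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


class GridWorld:
    def __init__(self, width, height, goal, walls=frozenset()):
        self.width, self.height, self.goal, self.walls = width, height, goal, walls

    def step(self, state, action):
        dx, dy = ACTIONS[action]
        nxt = (state[0] + dx, state[1] + dy)
        in_bounds = 0 <= nxt[0] < self.width and 0 <= nxt[1] < self.height
        if not in_bounds or nxt in self.walls:
            nxt = state  # bumping into an edge or wall leaves the agent in place
        reward = 1.0 if nxt == self.goal else 0.0
        return nxt, reward, nxt == self.goal


def euclidean(a, b):
    # Assumed heuristic: straight-line distance to the goal.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def dyna_h(env, start, episodes=50, planning_steps=20, rollout_len=5,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0, max_steps=500):
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    model = {}              # learned deterministic model: (s, a) -> (s', r, done)

    def greedy(s):
        best = max(q[(s, a)] for a in range(n_actions))
        return rng.choice([a for a in range(n_actions) if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update, used for both real and simulated experience.
        target = r if done else r + gamma * max(q[(s2, b)] for b in range(n_actions))
        q[(s, a)] += alpha * (target - q[(s, a)])

    steps_per_episode = []
    for _ in range(episodes):
        s, done, steps = start, False, 0
        while not done and steps < max_steps:  # hypothetical cap on episode length
            # Acting: epsilon-greedy in the real environment.
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            s2, r, done = env.step(s, a)
            update(s, a, r, s2, done)
            model[(s, a)] = (s2, r, done)
            s, steps = s2, steps + 1

            # Planning: short rollouts through the learned model from a random seen state.
            seen_states = list({ms for (ms, _) in model})
            for _ in range(planning_steps // rollout_len):
                ps = rng.choice(seen_states)
                for _ in range(rollout_len):
                    known = [b for b in range(n_actions) if (ps, b) in model]
                    if not known:
                        break
                    # "Worst" action per the heuristic: modeled successor farthest from the goal.
                    pa = max(known, key=lambda b: euclidean(model[(ps, b)][0], env.goal))
                    ps2, pr, pdone = model[(ps, pa)]
                    update(ps, pa, pr, ps2, pdone)
                    if pdone:
                        break
                    ps = ps2
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Usage: `q, steps = dyna_h(GridWorld(5, 5, goal=(4, 4)), start=(0, 0))`. Replacing the `max(...)` in the planning loop with `rng.choice(known)` turns the rollouts into undirected, roughly Dyna-Q-style planning, which is the natural baseline for comparison.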
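One plausible reason the cartesian/Manhattan mismatch is harmless: assuming "cartesian" means straight-line (Euclidean) distance and the agent moves on a 4-connected grid, the straight-line distance never exceeds the Manhattan distance, so the heuristic is optimistic (it never overestimates the grid path length). A quick exhaustive check on a small grid, with hypothetical helper names:

```python
import itertools
import math


def euclidean(a, b):
    # Straight-line ("cartesian") distance between two grid cells.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def manhattan(a, b):
    # Length of the shortest 4-connected path on an obstacle-free grid.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def heuristic_never_overestimates(n):
    # Check every pair of cells on an n x n grid.
    cells = list(itertools.product(range(n), repeat=2))
    return all(euclidean(a, b) <= manhattan(a, b) for a in cells for b in cells)


print(heuristic_never_overestimates(6))                        # True
print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))    # 5.0 7
```

With obstacles the true path can only get longer than the Manhattan distance, so the inequality still holds.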