Dyna-H: a heuristic planning RL algorithm applied to RPG strategy decision systems. Santos, Martin H., Lopez, Botella

  1. Paper makes a strange point in the abstract, so perhaps a red flag.  On the other hand, I think I saw it because it was referenced by Remi on the RL newsgroup.  “We suggest also, a functional analogy between the proposed sampling from worst trajectories heuristic and the role of dreams (e.g. nightmares)…” Whaaaa?
    1. Not sure it's refereed as-is; the paper is from arXiv and says preprint submitted to Knowledge-Based Systems
  2. Basic idea is to do cheap planning for cases when A* is too expensive, or when there is uncertainty
  3. Dyna-H is basically Dyna with a heuristic added (uses epsilon-greedy exploration as described)
  4. Does an approximate tree search and refines results iteratively; can work off of incomplete (I suppose also incorrect?) models of the environment, unlike A*, which requires a correct model
  5. Algorithm tries to generate runs through the MDP from the worst trajectories, as ranked by the heuristic (which acts like shaping)
  6. Beats Dyna-Q handily, which is to be expected.
  7. The class of trajectories considered by the heuristic (Cartesian) is different from the class that is actually possible (Manhattan); in this case it's OK.
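
To make the notes above concrete, here is a rough sketch of how I read the idea: ordinary Dyna-Q, except that the planning loop walks simulated trajectories by picking the action the heuristic rates worst (largest straight-line distance to the goal), and falls back to a random remembered transition when the learned model has no entry. This is not the paper's pseudocode. The grid world, the reward of 1 at the goal, and every name (`step`, `heuristic`, `worst_action`, `dyna_h`) and parameter value are assumptions made up for illustration.

```python
import math
import random
from collections import defaultdict

# Moves the agent can actually make (Manhattan geometry).
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(state, action, size, goal):
    """Hypothetical deterministic grid world: clip at walls, reward 1 at the goal."""
    x = min(max(state[0] + action[0], 0), size - 1)
    y = min(max(state[1] + action[1], 0), size - 1)
    nxt = (x, y)
    return nxt, (1.0 if nxt == goal else 0.0), nxt == goal


def heuristic(state, goal):
    """Straight-line distance to the goal (a different geometry from the moves)."""
    return math.dist(state, goal)


def worst_action(state, goal):
    """Index of the action whose naive successor (state + offset) is farthest from the goal."""
    def predicted(a):
        return (state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1])
    return max(range(len(ACTIONS)), key=lambda a: heuristic(predicted(a), goal))


def q_update(q, s, a, r, s2, done, alpha, gamma):
    """One-step Q-learning backup."""
    future = 0.0 if done else gamma * max(q[(s2, b)] for b in range(len(ACTIONS)))
    q[(s, a)] += alpha * (r + future - q[(s, a)])


def greedy(q, s, rng):
    """Greedy action with random tie-breaking."""
    best = max(q[(s, a)] for a in range(len(ACTIONS)))
    return rng.choice([a for a in range(len(ACTIONS)) if q[(s, a)] == best])


def dyna_h(size=6, goal=(5, 5), episodes=30, planning_steps=20,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    model = {}  # learned model: (state, action) -> (reward, next_state, done)
    steps_per_episode = []
    for _ in range(episodes):
        s, done, steps = (0, 0), False, 0
        while not done and steps < 10_000:
            # Act in the real environment with epsilon-greedy exploration.
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q, s, rng)
            s2, r, done = step(s, ACTIONS[a], size, goal)
            q_update(q, s, a, r, s2, done, alpha, gamma)
            model[(s, a)] = (r, s2, done)
            # Planning: follow the heuristically worst actions through the model.
            ps = s
            for _ in range(planning_steps):
                pa = worst_action(ps, goal)
                if (ps, pa) not in model:
                    # Model is incomplete here: replay a random remembered transition.
                    ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                q_update(q, ps, pa, pr, ps2, pdone, alpha, gamma)
                ps = ps2
            s = s2
            steps += 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `dyna_h()` returns the Q-table and the number of real steps taken in each episode. Replacing the `worst_action` branch with a purely random pick from `model` gives plain Dyna-Q, which is the comparison the paper reports.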
