Learning to act using real-time dynamic programming. Barto, Bradtke, Singh

  • The (A)RTDP / (adaptive) real-time dynamic programming paper
  • Bradtke’s thesis (Incremental Dynamic Programming for On-Line Adaptive Optimal Control) contains a better and more concise explanation of the algorithms
  • Related to an algorithm called LRTA* (Learning Real-Time A*), but RTDP allows for uncertainty in the transitions
  • Similar to value iteration, but selectively updates values to save computation, and is adapted to motion-planning-style domains (action penalties with terminal goal states – a less general setting than arbitrary MDPs)
  • ARTDP is basically the same, but doesn’t assume access to a model of the MDP the way RTDP does.  Instead, ARTDP builds a maximum-likelihood model of the MDP from experience and plans with that
  • This has convergence guarantees, but only in the limit
  • Paper also discusses Q-Learning and some modifications to it
    • An “offline” variant that works from a generative model
  • They show RTDP beating up on Q-Learning, but it would probably be more valid to compare to something else (perhaps Dyna, or Queue-Dyna if it existed at the time)
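To make the "selective updates" point concrete, here is a minimal sketch (my own toy construction, not from the paper) of RTDP on a deterministic chain in the action-penalty/goal-state setting the paper targets: values start at an admissible zero heuristic, and only the states visited along greedy trajectories receive Bellman backups.

```python
# Hypothetical toy MDP: states 0..N-1 on a chain, goal at N-1,
# actions move left (-1) or right (+1), each step costs 1.
N = 10
GOAL = N - 1

def successor(s, a):
    # Deterministic transitions, clipped to the chain.
    return max(0, min(N - 1, s + a))

def rtdp(trials=50):
    # Admissible (optimistic) initial cost-to-go estimates: all zero.
    V = [0.0] * N
    for _ in range(trials):
        s, steps = 0, 0
        while s != GOAL and steps < 4 * N:
            # One-step lookahead: Q(s, a) = step cost + V(s').
            qs = {a: 1.0 + V[successor(s, a)] for a in (-1, +1)}
            a = min(qs, key=qs.get)   # greedy action
            V[s] = qs[a]              # Bellman backup on the visited state only
            s = successor(s, a)
            steps += 1
    return V

V = rtdp()
```

Because the heuristic is admissible and backups happen along greedy trials, `V` converges to the true cost-to-go (here, `N - 1 - s` for state `s`) without ever sweeping the full state space the way plain value iteration does.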
