Interpolation-based Q-Learning. Szepesvari, Smart


It's written by Csaba, and isn't 100% relevant to what I'm working on right now, so I'm just giving it a cursory read-through.

  • Addresses learning in continuous state spaces
  • Uses local function approximators (FAs)
  • The algorithm converges to the Bellman fixed point if the FA satisfies certain interpolation properties
  • The main results assume a stationary policy; under that assumption Θ (the FA's parameter vector) converges to the value at which the FA satisfies the Bellman equation
  • A multi-stage method is presented whose estimates converge to the optimal Q
    • This allows the fixed-policy requirement to be relaxed
  • The convergence proof given for interpolative Q-learning is said to be general enough to apply to other RL techniques that use interpolative FAs
  • I believe it uses basis points and needs a method for selecting them (a rough sketch of what such an update might look like follows this list)
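To make the idea concrete for myself, here's a minimal sketch of what an interpolative Q-learning update could look like. This is my own toy construction, not the paper's algorithm: I assume a 1-D state space, a small fixed set of hand-picked basis points (the paper leaves basis-point selection open), and piecewise-linear "hat function" weights, which do satisfy the kind of interpolation property the convergence result needs (non-negative, sum to 1, and exactly 1 at their own basis point). Names like `basis_points`, `theta`, and `interp_weights` are mine.

```python
import numpy as np

# Toy setup (my assumption, not from the paper): 1-D state space in [0, 1],
# a small discrete action set, and hand-picked basis points.
basis_points = np.linspace(0.0, 1.0, 11)          # K basis points
n_actions = 2
theta = np.zeros((len(basis_points), n_actions))  # one parameter per (basis point, action)

def interp_weights(s):
    """Piecewise-linear ("hat function") weights over the basis points.

    Non-negative, sum to 1, and each weight is exactly 1 at its own basis
    point -- the local-interpolation property the notes above refer to."""
    w = np.zeros(len(basis_points))
    s = np.clip(s, basis_points[0], basis_points[-1])
    i = np.searchsorted(basis_points, s)
    if i == 0:
        w[0] = 1.0
    else:
        left, right = basis_points[i - 1], basis_points[i]
        t = (s - left) / (right - left)
        w[i - 1], w[i] = 1.0 - t, t
    return w

def q_value(s, a):
    """Q(s, a) as an interpolation of the basis-point parameters."""
    return interp_weights(s) @ theta[:, a]

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One interpolative Q-learning step: spread the TD error over the
    basis points in proportion to their interpolation weights."""
    target = r + gamma * max(q_value(s_next, b) for b in range(n_actions))
    td_error = target - q_value(s, a)
    theta[:, a] += alpha * interp_weights(s) * td_error
```

The point of the sketch is just that the learned parameters live on the basis points, and each observed transition only nudges the parameters of nearby basis points, which is what makes the approximator "local".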