Interpolation-based Q-Learning. Szepesvari, Smart

It's written in Csaba and isn’t 100% relevant to what I’m working on right now, so I’m just giving it a cursory read-through

  • Addresses continuous state learning
  • Uses local function approximators (FAs)
  • Algorithm converges to Bellman fixed point if FA satisfies certain interpolation properties
  • Main results assume a stationary policy; Θ (the FA's parameters) then converges to the optimal value, such that the FA satisfies the Bellman equation
  • Multi-stage method is presented that yields estimates that converge to optimal Q
    • Allows requirement of fixed policy to be relaxed
  • Proof given for interpolative Q-Learning is said to be general enough to apply to other RL techniques where interpolative FAs are used
  • I believe it uses basis points and needs a method for selecting them
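
To make the bullets above concrete, here is a minimal sketch of what an interpolative FA and a Q-learning-style update on it might look like. It is not taken from the paper: the 1-D state space, the uniform grid of basis points, the linear-interpolation weights, and the names interp_weights, q_values and ibq_update are all my own assumptions for illustration. The only property it tries to capture is that the weights are non-negative and sum to one, so Q at a basis point is exactly the parameter stored there.

```python
import numpy as np

def interp_weights(x, basis):
    """Linear-interpolation weights of a scalar state x over sorted basis points.

    Assumed (hypothetical) choice of interpolator: weights are non-negative,
    sum to 1, and only the two neighbouring basis points are non-zero (local FA).
    """
    x = float(np.clip(x, basis[0], basis[-1]))
    w = np.zeros(len(basis))
    # Index of the basis point at or to the left of x, kept inside the grid.
    j = int(np.searchsorted(basis, x, side="right")) - 1
    j = min(j, len(basis) - 2)
    t = (x - basis[j]) / (basis[j + 1] - basis[j])
    w[j], w[j + 1] = 1.0 - t, t
    return w

def q_values(theta, w):
    """Q(x, .) as an interpolation of the per-basis-point parameters theta."""
    return w @ theta  # shape: (n_actions,)

def ibq_update(theta, basis, x, a, r, y, alpha=0.1, gamma=0.9):
    """One Q-learning-style update for the transition (x, a, r, y).

    Each parameter for action a moves toward the bootstrapped target,
    scaled by its interpolation weight at x (an assumed form of the update).
    """
    w = interp_weights(x, basis)
    target = r + gamma * np.max(q_values(theta, interp_weights(y, basis)))
    theta[:, a] += alpha * w * (target - theta[:, a])
    return theta

# Usage: 5 basis points on [0, 1], 2 actions, parameters initialised to zero.
basis = np.linspace(0.0, 1.0, 5)
theta = np.zeros((len(basis), 2))
theta = ibq_update(theta, basis, x=0.25, a=0, r=1.0, y=0.5)
```

Because x = 0.25 sits exactly on a basis point here, only that point's parameter for action 0 changes. For states between basis points the update is shared between the two neighbours, which is where the choice of basis points starts to matter.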
