Ideas from Fitted Q-iteration in continuous action-space MDPs: András Antos, Csaba Szepesvári, Rémi Munos

Paper outlines a modified fitted Q-iteration algorithm for stochastic MDPs with continuous state and action spaces
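A minimal sketch of the fitted Q-iteration loop, assuming discretized candidate actions and a linear-in-features least-squares regressor. Both are stand-ins for illustration: the paper handles a genuinely continuous action space and a general regression function class, and approximates the greedy step by policy search rather than enumeration.

```python
import numpy as np

def fitted_q_iteration(transitions, actions, gamma=0.9, n_iters=50):
    """Batch fitted Q-iteration sketch (illustrative, not the paper's exact method).

    transitions: list of (s, a, r, s_next) with scalar states/actions.
    actions: finite grid of candidate actions -- a discretization of the
             continuous action space, assumed here for simplicity.
    """
    def features(s, a):
        # Simple polynomial features standing in for the paper's
        # function class over state-action pairs.
        return np.array([1.0, s, a, s * a, s * s, a * a])

    w = np.zeros(6)  # weights of the fitted Q-function
    X = np.array([features(s, a) for (s, a, r, s2) in transitions])
    for _ in range(n_iters):
        # Bellman targets under the current fitted Q:
        # r + gamma * max_b Q(s', b), with the max taken over the action grid
        y = np.array([
            r + gamma * max(features(s2, b) @ w for b in actions)
            for (s, a, r, s2) in transitions
        ])
        # Regression step: refit Q to the targets by least squares
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```

Each iteration is one regression problem, which is why the analysis hinges on how well the chosen function class can represent the Bellman backups.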

Requires a fixed class of candidate policies to select from; the greedy action-selection step is approximated by searching over this class

Requires some smoothness assumptions (e.g., Lipschitz continuity); the authors point out that, in general, some smoothness is probably required to solve continuous MDPs at all

Policy search generally done by some gradient method

Quite a few (9) assumptions are made, but I don’t understand them all well

Main contributions:

First finite-time bounds for continuous-state and action-space RL that uses value functions

First analysis of fitted Q-iteration, an algorithm that has proved useful in a number of cases, even when used with non-averagers, for which no previous theoretical analysis existed

Cool. Author(s)?