- Paper outlines a modified fitted Q-iteration algorithm for stochastic MDPs with continuous state and action spaces
- Requires a predefined set of candidate policies to select from
- Requires some smoothness assumptions (e.g., Lipschitz continuity); the paper points out that, in general, some smoothness is probably required to solve continuous MDPs at all
- Policy search is generally done via some gradient method
- Quite a few (9) assumptions are made, but I don’t understand them all well
- Main contributions:
    - First finite-time bounds for continuous-state and action-space RL that uses value functions
    - First analysis of fitted Q-iteration, an algorithm that has proved useful in a number of cases, even when used with non-averagers, for which no previous theoretical analysis existed
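A minimal sketch of the batch fitted Q-iteration loop the paper analyzes, under stated assumptions: this toy version discretizes the action space and regresses per-action linear models on quadratic features, whereas the paper keeps actions continuous and selects among policies by search. The MDP (1-D state, drift dynamics, quadratic cost) is invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95
# Discretized action grid (stand-in for the paper's continuous action space)
actions = np.linspace(-1.0, 1.0, 5)

def step(s, a):
    # Toy dynamics and reward (not from the paper): state drifts with the
    # action plus noise; reward prefers states near 0
    s2 = np.clip(s + 0.1 * a + 0.01 * rng.standard_normal(s.shape), 0.0, 1.0)
    return s2, -s ** 2

# Collect a fixed batch of transitions (s, a, r, s') -- FQI is a batch method
n = 5000
S = rng.uniform(0.0, 1.0, n)
A = rng.choice(actions, n)
S2, R = step(S, A)

def features(s):
    # Quadratic features [1, s, s^2]
    return np.stack([np.ones_like(s), s, s ** 2], axis=-1)

def q_values(theta, s):
    # Q(s, a_i) for every action on the grid; one linear model per action
    return features(s) @ theta.T

theta = np.zeros((len(actions), 3))
for _ in range(50):  # fitted Q-iteration: regress onto Bellman backups
    targets = R + gamma * q_values(theta, S2).max(axis=1)
    for i, a in enumerate(actions):
        mask = A == a
        theta[i], *_ = np.linalg.lstsq(features(S[mask]), targets[mask],
                                       rcond=None)

greedy = lambda s: actions[q_values(theta, np.atleast_1d(s)).argmax()]
```

With quadratic features the least-squares fit is a non-averager, which is exactly the regime the paper covers that earlier analyses did not.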

## Ideas from Fitted Q-iteration in continuous action-space MDPs: Andras Antos, Csaba Szepesvari, Remi Munos

**Tagged:** 2007, Andras Antos, Csaba Szepesvari, Fitted Q-iteration in continuous action-space MDPs, NIPS, Remi Munos
