- Is concerned with multidimensional continuous state and action spaces
- Can just discretize s,a dimensions and get guarantees as to upper bounds on the approximation error.
- Lipshitz smoothness is assumed.
- They define a problem instance in terms of <T, R, γ, ε>, s.t. given these things, the result should be an estimated value function that is ε close to optimal.
- Assume the program partitions the space into hypercubes of side length h, so they have volume h^n for a dimension n.
- Concerned with the computational complexity, and not the sample complexity, as the MDP is assumed known already so alot of it isn’t so relevant – I won’t be paying close attention to the computational issues.
- Their bounds are tight, they do this by having two different MDPs that are as different as possible given the assumptions and data already gathered.
- They split cells into those that are well sampled, and those that are badly sampled, like Rmax.