Brief notes from a very light read – look at the authors and you know that you can either read this paper in 5 minutes, or 5 weeks

- Theoretical results encompass both LARS-TD and LC-TD; the common fixed-point problem they both solve is what the authors call Lasso-TD.
- The methods' underlying operator forms a contraction mapping, so they converge
- A Lasso-TD solution always exists (although the coefficient vector may not be unique), and every solution yields the same value function
- Performance bounds are derived: “estimation error with the rate O(||α||_1 n^{-1/4}).” Additional assumptions improve this to O(||α||_0 n^{-1/2})
- A theorem connects the “… prediction performance of Lasso-TD (which is an L1-regularized fixed point problem) and the L1-norm of the best approximation of V in F_n.” (A toy sketch of this fixed point follows the list.)
- As the number of features *d* increases relative to the number of samples *n*, the performance of vanilla LSTD gets continually worse, while Lasso-TD only degrades as ~O(log *d*)
- “… in order for Lasso-TD to be effective, a sparse approximation of the value function must exist.”
- They go on to discuss some properties of an MDP that would make such a sparse approximation exist (notes below)
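
As an aside, here is a minimal sketch of what that L1-regularized fixed point looks like in code. This is a naive fixed-point iteration of my own using an off-the-shelf Lasso solver, not the LARS-TD or LC-TD algorithms the paper analyzes, and all names (`phi`, `phi_next`, `beta`, …) are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_td(phi, phi_next, rewards, gamma=0.95, beta=0.01,
             n_iters=200, tol=1e-6):
    """Toy Lasso-TD: iterate the L1-penalized regression onto the
    Bellman backup until the coefficients stop moving.

    phi, phi_next : (n, d) features at s_t and s_{t+1}
    rewards       : (n,) observed rewards
    Returns sparse weights theta with V(s) ~ phi(s) @ theta.
    """
    d = phi.shape[1]
    theta = np.zeros(d)
    lasso = Lasso(alpha=beta, fit_intercept=False, max_iter=10_000)
    for _ in range(n_iters):
        # Bellman backup of the current value estimate
        targets = rewards + gamma * (phi_next @ theta)
        # L1-penalized projection back onto span(phi)
        theta_new = lasso.fit(phi, targets).coef_.copy()
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```

If the penalized projection composed with the Bellman backup is a contraction (the convergence claim above), this iteration settles on the unique value function even when the coefficient vector itself is not unique.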

- Domains with smooth rewards and dynamics will have smooth value functions.
- “From approximation theory, we know that piecewise-smooth functions can be accurately approximated by localized features, such as wavelets (…). As a result, if we define *F* as the linear function space spanned by a set of *d* wavelets, any value function is likely to be well-approximated by functions in *F* with a relatively small number of non-zero coefficients.” (Toy example below.)
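
To make that concrete, a toy example of my own (not from the paper): a piecewise-smooth target on [0, 1] fit with Haar wavelet features, where the Lasso should select only a handful of the *d* coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

def haar_features(x, levels=6):
    """Constant feature + Haar wavelets psi_{j,k} up to `levels`
    (d = 2**levels columns in total)."""
    cols = [np.ones_like(x)]
    for j in range(levels):
        for k in range(2 ** j):
            lo, mid, hi = k / 2 ** j, (k + 0.5) / 2 ** j, (k + 1) / 2 ** j
            psi = np.where((x >= lo) & (x < mid), 1.0,
                  np.where((x >= mid) & (x < hi), -1.0, 0.0))
            cols.append(2 ** (j / 2) * psi)   # L2-normalized on [0, 1]
    return np.stack(cols, axis=1)

x = np.random.default_rng(0).uniform(0, 1, 500)
v = np.where(x < 0.5, np.sin(4 * x), 2.0 - x)   # piecewise smooth, jump at 0.5
F = haar_features(x)                            # d = 64 features
fit = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000).fit(F, v)
print((np.abs(fit.coef_) > 1e-8).sum(), "of", F.shape[1], "coefficients nonzero")
```

The jump at 0.5 only “costs” the few wavelets whose support straddles it, which is exactly why localized features keep the representation sparse.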

- Also mentions LSTD-RP (LSTD with random projections <on my reading list>).
- In the general unrestricted case, LSTD-RP works better than Lasso-TD.
- When sparsity does exist, however, Lasso-TD works better. (Rough sketch of LSTD-RP below.)
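
For reference, a rough sketch of what I understand LSTD-RP to be: project the *d* features down to *m* << *d* with a random Gaussian matrix, then run plain LSTD in the projected space. The projection scale, the small ridge term, and all names here are my own choices, not from either paper.

```python
import numpy as np

def lstd_rp(phi, phi_next, rewards, gamma=0.95, m=20, seed=0, ridge=1e-6):
    """Toy LSTD-RP: vanilla LSTD on randomly projected features."""
    n, d = phi.shape
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=1.0 / np.sqrt(m), size=(d, m))   # random projection
    z, z_next = phi @ P, phi_next @ P                     # (n, m) projected features
    A = z.T @ (z - gamma * z_next) + ridge * np.eye(m)    # LSTD system in R^m
    b = z.T @ rewards
    theta = np.linalg.solve(A, b)
    return P @ theta    # lift back so that V(s) ~ phi(s) @ (P @ theta)
```

Because estimation happens in *m* dimensions rather than *d*, its behavior doesn't hinge on a sparse solution existing, which fits the “general unrestricted case” comparison above.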
