Brief notes from a very light read – look at the authors and you know that you can either read this paper in 5 minutes, or 5 weeks

- Theoretical results encompass both LARS-TD and LC-TD; the common fixed-point problem they both solve is what the authors call Lasso-TD.
- The methods' underlying operator forms a contraction mapping, so they converge
- A Lasso-TD solution always exists (although the coefficient vector may not be unique), and every solution yields the same value function
- Performance bounds are derived: “estimation error with the rate O(||α||_1 n^{-1/4}).” Additional assumptions improve this to O(||α||_0 n^{-1/2})
- A theorem connects the “… prediction performance of Lasso-TD (which is an L1-regularized fixed point problem) and the L1-norm of the best approximation of V in F_n.” (A toy sketch of this fixed point follows the list.)
- As the number of features *d* increases relative to the number of samples *n*, the performance of vanilla LSTD gets continually worse, while Lasso-TD only degrades as ~O(log *d*)
- “… in order for Lasso-TD to be effective, a sparse approximation of the value function must exist.”
- They go on to discuss some properties of an MDP that would make such a sparse approximation exist (notes below)
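
As an aside, here is a minimal sketch of what that L1-regularized fixed point looks like in code. This is a naive fixed-point iteration of my own using an off-the-shelf Lasso solver, not the LARS-TD or LC-TD algorithms the paper analyzes, and all names (`phi`, `phi_next`, `beta`, …) are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_td(phi, phi_next, rewards, gamma=0.95, beta=0.01,
             n_iters=200, tol=1e-6):
    """Toy Lasso-TD: iterate the L1-penalized regression onto the
    Bellman backup until the coefficients stop moving.

    phi, phi_next : (n, d) features at s_t and s_{t+1}
    rewards       : (n,) observed rewards
    Returns sparse weights theta with V(s) ~ phi(s) @ theta.
    """
    d = phi.shape[1]
    theta = np.zeros(d)
    lasso = Lasso(alpha=beta, fit_intercept=False, max_iter=10_000)
    for _ in range(n_iters):
        # Bellman backup of the current value estimate
        targets = rewards + gamma * (phi_next @ theta)
        # L1-penalized projection back onto span(phi)
        theta_new = lasso.fit(phi, targets).coef_.copy()
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new
    return theta
```

If the penalized projection composed with the Bellman backup is a contraction (the convergence claim above), this iteration settles on the unique value function even when the coefficient vector itself is not unique.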

- Domains with smooth rewards and dynamics will have smooth value functions.
- “From approximation theory, we know that piecewise-smooth functions can be accurately approximated by localized features, such as wavelets (…). As a result, if we define *F* as the linear function space spanned by a set of *d* wavelets, any value function is likely to be well-approximated by functions in *F* with a relatively small number of non-zero coefficients.” (Toy example below.)
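
To make that concrete, a toy example of my own (not from the paper): a piecewise-smooth target on [0, 1] fit with Haar wavelet features, where the Lasso should select only a handful of the *d* coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

def haar_features(x, levels=6):
    """Constant feature + Haar wavelets psi_{j,k} up to `levels`
    (d = 2**levels columns in total)."""
    cols = [np.ones_like(x)]
    for j in range(levels):
        for k in range(2 ** j):
            lo, mid, hi = k / 2 ** j, (k + 0.5) / 2 ** j, (k + 1) / 2 ** j
            psi = np.where((x >= lo) & (x < mid), 1.0,
                  np.where((x >= mid) & (x < hi), -1.0, 0.0))
            cols.append(2 ** (j / 2) * psi)   # L2-normalized on [0, 1]
    return np.stack(cols, axis=1)

x = np.random.default_rng(0).uniform(0, 1, 500)
v = np.where(x < 0.5, np.sin(4 * x), 2.0 - x)   # piecewise smooth, jump at 0.5
F = haar_features(x)                            # d = 64 features
fit = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000).fit(F, v)
print((np.abs(fit.coef_) > 1e-8).sum(), "of", F.shape[1], "coefficients nonzero")
```

The jump at 0.5 only “costs” the few wavelets whose support straddles it, which is exactly why localized features keep the representation sparse.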

- Also mentions LSTD-RP (LSTD with random projections <on my reading list>).
- In the general unrestricted case, LSTD-RP works better than Lasso-TD.
- When sparsity does exist, however, Lasso-TD works better. (Rough sketch of LSTD-RP below.)
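
For reference, a rough sketch of what I understand LSTD-RP to be: project the *d* features down to *m* << *d* with a random Gaussian matrix, then run plain LSTD in the projected space. The projection scale, the small ridge term, and all names here are my own choices, not from either paper.

```python
import numpy as np

def lstd_rp(phi, phi_next, rewards, gamma=0.95, m=20, seed=0, ridge=1e-6):
    """Toy LSTD-RP: vanilla LSTD on randomly projected features."""
    n, d = phi.shape
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=1.0 / np.sqrt(m), size=(d, m))   # random projection
    z, z_next = phi @ P, phi_next @ P                     # (n, m) projected features
    A = z.T @ (z - gamma * z_next) + ridge * np.eye(m)    # LSTD system in R^m
    b = z.T @ rewards
    theta = np.linalg.solve(A, b)
    return P @ theta    # lift back so that V(s) ~ phi(s) @ (P @ theta)
```

Because estimation happens in *m* dimensions rather than *d*, its behavior doesn't hinge on a sparse solution existing, which fits the “general unrestricted case” comparison above.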
