Another cursory read through

- Considers case where #features is larger than #samples
- Analyzes LSTD when features are random projections
- Theoretical paper that provides performance bounds
- LSTD finds the value function of a fixed policy (that is why algorithms like LSTD with L1 regularization are paired with some form of policy iteration, often LSPI)
- Shows the impact of errors during subsequent steps of DP
- Performance bound given for the resulting LSPI algorithm
- Tradeoff is a reduction in estimation (testing?) error but an increase in approximation (training?) error
- Works off of data from a trajectory
- Cost of LSTD-RP is O(*d*^{3} + *ndD*), where *d* is the dimension of the projected space, *D* is the dimension of the original space, and *n* is the number of samples in the trajectory
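Since everything above builds on batch LSTD for a fixed policy, a minimal sketch of that computation may help; the function name, arguments, and toy data are illustrative, not from the paper:

```python
import numpy as np

def lstd(phi, phi_next, rewards, gamma=0.95):
    """Least-Squares Temporal Difference for a fixed policy (sketch).

    Solves A w = b with A = Phi^T (Phi - gamma * Phi'), b = Phi^T r,
    so that V(s) ~= phi(s) . w approximates the policy's value function.

    phi      : (n, D) features of states visited along a trajectory
    phi_next : (n, D) features of the successor states
    rewards  : (n,) observed rewards
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # lstsq is more robust than a plain inverse when A is ill-conditioned
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w
```

With tabular (one-hot) features and exact expected next-state features, this recovers the exact value function of the chain, which is a quick sanity check on the normal equations.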

- Vanilla LSTD has cost of O(*D*^{3} + *nD*^{2})
- Analysis of performance shows *d* should be √*n*
- In this case, LSTD-RP cost is O(*n*^{3/2}*D*), which is better than O(*D*^{3}) when *n* < *D*
- On the other hand, cost of prediction in LSTD-RP is O(*dD*), whereas in LSTD it's O(*D*)
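A rough sketch of the LSTD-RP pipeline under these cost figures: project the *D*-dimensional features down to *d* ≈ √*n* with a zero-centered Gaussian matrix, then run ordinary LSTD in the projected space. Names, scaling, and data here are my own illustration, not the paper's code:

```python
import numpy as np

def lstd_rp(phi, phi_next, rewards, gamma=0.95, seed=None):
    """LSTD with Gaussian random projections (illustrative sketch).

    phi, phi_next : (n, D) high-dimensional features (D may exceed n)
    Projects to d = ceil(sqrt(n)) dimensions, the choice suggested by
    the analysis, then solves the d-dimensional LSTD system:
    roughly O(d^3 + n d D) overall instead of O(D^3 + n D^2).
    """
    rng = np.random.default_rng(seed)
    n, D = phi.shape
    d = int(np.ceil(np.sqrt(n)))
    # zero-centered Gaussian projection, scaled to preserve norms in expectation
    Pi = rng.standard_normal((D, d)) / np.sqrt(d)
    psi, psi_next = phi @ Pi, phi_next @ Pi         # O(n d D)
    A = psi.T @ (psi - gamma * psi_next)            # d x d system
    b = psi.T @ rewards
    w, *_ = np.linalg.lstsq(A, b, rcond=None)       # O(d^3)
    return Pi, w  # predict with phi(s) @ Pi @ w: O(dD) per state
```

Note the prediction cost tradeoff from the bullets above: each query must first be projected (O(*dD*)) before the O(*d*) dot product with *w*.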

- They give a horribly complex error bound
- Assume random projections drawn from 0-centered Gaussian
- There is also a simple bound given for L2 regularized LSTD
- A crazily complex bound is given for LSTD-RP with LSPI
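For reference, the L2-regularized variant they bound amounts to adding a ridge term to the LSTD system; a minimal sketch, with λ and the data as placeholders of my choosing:

```python
import numpy as np

def lstd_l2(phi, phi_next, rewards, gamma=0.95, lam=0.1):
    """L2-regularized LSTD (sketch): solve (A + lam * I) w = b.

    The ridge term keeps the system well-posed in the regime this
    paper cares about, where the number of features D exceeds the
    number of samples n and the unregularized A is rank-deficient.
    """
    n, D = phi.shape
    A = phi.T @ (phi - gamma * phi_next) + lam * np.eye(D)
    b = phi.T @ rewards
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w
```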
