LSTD With Random Projections. Ghavamzadeh, Lazaric, Maillard, Munos. NIPS 2010


Another cursory read-through.

  1. Considers the case where the number of features (D) is larger than the number of samples (n)
  2. Analyzes LSTD when the features are random projections of the original high-dimensional features (a sketch of this follows the list)
  3. Theoretical paper that provides performance bounds
  4. LSTD finds the value function of a fixed policy (that is why algorithms like LSTD with L1 regularization are paired with some form of policy iteration, often LSPI; see the policy-iteration sketch after this list)
  5. Show the impact of errors during subsequent steps of DP
  6. Performance bound given for the resulting LSPI algorithm
  7. The tradeoff is a reduction in estimation error (the error from having only finitely many samples, roughly the "testing" part) at the cost of an increase in approximation error (the error from the smaller projected function space, roughly the "training" part)
  8. Works off of data from a trajectory
  9. Cost of LSTD-RP is O(d³ + ndD)
    1. d is the dimension of the projected space, D is the dimension of the original space, n is the number of samples in the trajectory
  10. Vanilla LSTD has cost O(D³ + nD²)
  11. Analysis of performance shows d should be O(√n)
    1. In this case, the LSTD-RP cost is O(n^(3/2) D), which is better than O(D³) when n < D
    2. On the other hand, the cost of prediction in LSTD-RP is O(dD), whereas in LSTD it is O(D)
  12. They give a horribly complex error bound on the value function estimated by LSTD-RP
  13. Assumes the entries of the random projection matrix are drawn from a zero-mean Gaussian
  14. There is also a simple bound given for L2-regularized LSTD
  15. A crazily complex bound is given for LSTD-RP combined with LSPI
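
To make the setup concrete, here is a minimal sketch of LSTD-RP in numpy, based only on the notes above: project the D-dimensional features down to d dimensions with a zero-mean Gaussian matrix, then run vanilla LSTD in the projected space. The function names and the ridge term lam are my own illustrative choices, not from the paper, and the 1/d variance is the standard Johnson-Lindenstrauss scaling (check the paper for the exact scaling it assumes).

```python
import numpy as np

def lstd_rp(Phi, rewards, gamma, d, lam=1e-6, seed=0):
    """LSTD with Gaussian random projections -- a sketch, not the paper's code.

    Phi:     (n+1, D) features for states s_0..s_n along one trajectory
    rewards: (n,)     rewards r_0..r_{n-1}
    gamma:   discount factor
    d:       dimension of the projected space (analysis suggests d ~ sqrt(n))
    lam:     small ridge term for numerical stability (my choice, not the paper's)
    """
    D = Phi.shape[1]
    rng = np.random.default_rng(seed)

    # Zero-mean Gaussian projection; variance 1/d keeps inner products
    # roughly preserved in expectation (Johnson-Lindenstrauss style).
    P = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, D))

    # Project all features: this is the O(ndD) part of the cost.
    Psi = Phi @ P.T

    # Standard LSTD system in the projected space:
    #   A = sum_t psi_t (psi_t - gamma * psi_{t+1})^T,   b = sum_t psi_t r_t
    A = Psi[:-1].T @ (Psi[:-1] - gamma * Psi[1:])
    b = Psi[:-1].T @ rewards

    # Solving the d x d system is the O(d^3) part of the cost.
    theta = np.linalg.solve(A + lam * np.eye(d), b)
    return theta, P

def predict_value(phi, theta, P):
    """V_hat(s) = <theta, P phi(s)>; O(dD) per state vs O(D) for vanilla LSTD."""
    return (P @ phi) @ theta
```

The lam ridge term above is only there for numerical stability, but it also hints at where the L2-regularized LSTD of point 14 differs from the vanilla algorithm: regularize the n < D system instead of (or in addition to) projecting it.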
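
And since point 4 notes that LSTD only evaluates a fixed policy, here is a sketch of the policy-iteration wrapper (LSPI-style, using the Q-function variant LSTD-Q so the greedy policy can be read off the estimate). The API here (phi_sa, the transition batch format, n_iter) is hypothetical; the paper analyzes the resulting algorithm rather than prescribing an implementation.

```python
import numpy as np

def lspi(phi_sa, transitions, gamma, n_actions, n_iter=20):
    """Policy iteration wrapped around an LSTD-style evaluation step (a sketch).

    phi_sa(s, a): feature vector in R^k for a state-action pair
    transitions:  list of (s, a, r, s') samples
    """
    s0, a0, _, _ = transitions[0]
    k = phi_sa(s0, a0).shape[0]
    theta = np.zeros(k)

    for _ in range(n_iter):
        # Policy improvement: act greedily w.r.t. the current Q estimate.
        def pi(s):
            return max(range(n_actions), key=lambda a: float(phi_sa(s, a) @ theta))

        # Policy evaluation: LSTD-Q for the greedy policy on the same batch.
        A = np.zeros((k, k))
        b = np.zeros(k)
        for s, a, r, s2 in transitions:
            phi = phi_sa(s, a)
            A += np.outer(phi, phi - gamma * phi_sa(s2, pi(s2)))
            b += r * phi
        theta = np.linalg.solve(A + 1e-6 * np.eye(k), b)

    return theta
```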