Linear Complementarity for Regularized Policy Evaluation and Improvement. Johns, Painter-Wakefield, Parr. NIPS 2010

Based on a very rapid read-through before I go home

  1. Also for doing L1 regularization for feature selection
  2. Here, the problem is formulated as “… a linear complementarity problem (LCP).” <I don’t know what that means off the top of my head>
  3. Similar to LARS-TD, but has a few advantages
    1. Easier to apply off-the shelf tools
    2. Can be warm started
  4. LCPs can describe any convex quadratic problem, even though the basic setup is pretty simple
  5. Turns out this approach is mathematically very similar to LARS but this method is computationally cheaper
    1. <Confusing they say that> Their approach is more expensive than LARS
  6. Also, LARQ (their proposal) can get “stuck” and its difficult to correct. “This occurs when the greedy policy for a particular β is not well defined.  In such cases, the algorithm attempts to switch to a new policy immediately following a policy change.  This problem is not unique to LARQ.  Looping is possible with most approximate policy iteration algorithms.   What makes it particularly troublesome for LARQ is that there are few satisfying ways of addressing this issue without sacrificing the invariants.”
  7. They present some changes that attempt to work around these issues (for example it can’t get stuck), and call this algorithm LC-MPI
  8. <Only got to here>

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: