- Deals with developing policies in very high dimensional state spaces
- Proposes a linear dimensionality reduction (projection?) algorithm that discovers predictive projections
- A predictive projection is a prediction of future states by nearest-neighbor learning
- Consider robotics where state information is camera input – this raw state is too high D to work in natively
- Want something that can reduce the dimension planning has to take place it
- In the projected space, the idea is that the same action in two similar projected states should produce a similar outcome
- Work here is based on Gradient-based distance metric learning
- More specifically, based on something called Neighborhood Components Analysis (NCA) which minimizes error for nearest-neighbor classification
- Projection is made to maintain accuracy on estimates of future states.
- Problem is that because it is a least-squares (I think) metric, noise and outliers cause problems, so another trick has to be used
- “The predictive projections algorithm as described above may not perform well in cases where the effects of different actions are restricted to specific state dimensions.”
- They use LSPI to do learn the policy, test on Lagoudakis’ pendulum
- Other stuff, skipping. I think I found the wrong paper…