Predictive Projections. Sprague. IJCAI 09.

  1. Algorithm for finding linear “projections in which accurate predictions of future states can be made using simple nearest neighbor style learning.”
  2. Experiments on simulated pendulum and real roomba robot
  3. Considers video data
  4. Based on earlier work (by Tesauro among others) in “which nearest neighbor learning is recast in a probabilistic framework that allows for gradient based optimization of the distance metric.  These existing algorithms discover projections of the training data under which nearby points are likely to have the same class label or similar regression targets.”
    1. Neighborhood Components Analysis (NCA) – for nearest neighbor classification
  5. “It would be difficult to directly find an A that minimizes the k-nearest-neighbor classification error rate, because the number of errors will be a highly discontinuous function of A; very small changes in A may change the set of nearest neighbors for some points. The innovation behind the NCA algorithm is to recast nearest neighbor learning in a probabilistic framework. In this framework, expected error is a continuous, differentiable function of A and thus may be minimized using gradient based techniques.”
  6. NCA was originally designed for classification but was later extended to regression
  7. A minor tweak is needed to get NCA for regression to work well on predicting vectors when the problem is stochastic
  8. Here they use conjugate gradient to do the optimization
  9. The approach can scale well; a paper from ’07 was able to run on 60k training points
  10. “Following this gradient acts to reduce the objective in two different ways. It adjusts A to be more predictive by increasing the probability that a neighbor will be chosen if it successfully predicts the next state. It also adjusts A to be more predictable, by moving target states together whenever there is a high probability they will be chosen to predict each other.”
  11. Important to whiten data first
  12. To deal with actions, partition data by action but constrain to use same projection <how is this different than just throwing out the action data>
  13. Only parameters to the algorithm are size of projection matrix A and its initial values
  14. “The predictive projections algorithm as described above may not perform well in cases where the effects of different actions are restricted to specific state dimensions. Since there is no explicit penalty for failing to predict some dimensions, the algorithm may minimize the objective function by finding an A which is not full rank, thus accurately predicting some dimensions while discarding others.”
    1. Although there are potential fixes for this
  15. For the applications this method is coupled with LSPI (along with RBFs)
  16. The algorithm works well with Lagoudakis and Parr’s LSPI
  17. For the roomba experiment they put a camera on top – all the robot needs to do is to continue to move forwards without hitting the wall
    1. <Seems like not a terribly hard task: just turn when the wall takes up most of the view?>
  18. temp
  19. Image is 20×20: 1200-d once color is factored in
    1. Then this is reduced via PCA to 25d
    2. Then their approach takes that to 2d (A is initialized to the first 2 principal components)
  20. Their approach basically recovers translation and rotation, while PCA recovers amount of wall present and lighting (their approach can use this to make a good policy, but that can’t be done linearly from what PCA produces)
  21. “To our knowledge, the proto-value function framework has not been applied to the type of noisy, high dimensional control problems addressed in this paper. It seems likely that the neighborhood calculations required for constructing the diffusion model could be dominated by noise dimensions, particularly in very noisy tasks such as the modified pendulum domain described above. In that case, the PVF approach and predictive projections would be complementary: The PP algorithm could find a low dimensional state projection that contains relevant state information, and the PVF algorithm could then be used to discover a set of appropriate basis functions in that space”
  22. “Another closely related project is the basis iteration algorithm described in [Sprague, 2007]. This algorithm also uses gradient based metric learning to discover an appropriate projection, but it focuses directly on finding a metric that allows for accurate estimation of the optimal value function. It accomplishes this by iterating value function estimation with updates to the projection matrix. This algorithm has the advantage of incorporating reward information, but it depends on starting with an initial projection that enables a reasonable estimate of the optimal value function.”
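
To make the notes on the probabilistic nearest-neighbor idea concrete, here is a rough sketch of what such an objective might look like. It is not the paper's code. It assumes the NCA-regression-style form hinted at in the quotes above: point j is picked as the neighbor of point i with probability proportional to exp(-||A x_i - A x_j||^2), and the error is the expected squared distance between the projected next states. The function names, the toy data (one predictable dimension plus one pure-noise dimension), and the two hand-picked projections are all made up for illustration. No gradient step is shown; the sketch only evaluates the objective.

```python
import math
import random


def project(A, x):
    """Apply the linear projection A (a list of rows) to the vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def neighbor_probs(A, X):
    """Soft nearest-neighbor weights under projection A.

    P[i][j] is proportional to exp(-||A x_i - A x_j||^2), with P[i][i] = 0,
    and every row sums to 1. (Assumed form, following NCA.)
    """
    Z = [project(A, x) for x in X]
    P = []
    for i, zi in enumerate(Z):
        d = [sq_dist(zi, zj) for zj in Z]
        # Subtract the smallest off-diagonal distance for numerical stability.
        d_min = min(dj for j, dj in enumerate(d) if j != i)
        w = [0.0 if j == i else math.exp(-(dj - d_min)) for j, dj in enumerate(d)]
        total = sum(w)
        P.append([wj / total for wj in w])
    return P


def expected_error(A, X, X_next):
    """Average expected squared distance between projected next states."""
    P = neighbor_probs(A, X)
    T = [project(A, x) for x in X_next]
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += P[i][j] * sq_dist(T[i], T[j])
    return total / n


# Toy data (hypothetical): dimension 0 evolves deterministically (s' = 0.9 s),
# dimension 1 is independent Gaussian noise at both time steps.
rng = random.Random(0)
n = 60
signal = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
X = [[s, rng.gauss(0.0, 1.0)] for s in signal]
X_next = [[0.9 * s, rng.gauss(0.0, 1.0)] for s in signal]

A_signal = [[1.0, 0.0]]  # 1-d projection that keeps the predictable dimension
A_noise = [[0.0, 1.0]]   # 1-d projection that keeps the noise dimension

err_signal = expected_error(A_signal, X, X_next)
err_noise = expected_error(A_noise, X, X_next)
print("objective, signal projection:", err_signal)
print("objective, noise projection: ", err_noise)
```

On this toy data the projection onto the predictable dimension should score lower than the projection onto the noise dimension, which is the behavior a gradient-based optimizer over A would exploit. Note that under this assumed form shrinking A toward zero also lowers the objective, which fits the quoted caveat about A dropping dimensions; the two projections compared here have the same scale, so the comparison is fair.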
