
algorithm for linear “projections in which accurate predictions of future states can be made using simple nearest neighbor style learning.”
 Experiments on simulated pendulum and real roomba robot
 Considers video data
 Based on earlier work (by Tesauro among others) in “which nearest neighbor learning is recast in a probabilistic framework that allows for gradient based optimization of the distance metric. These existing algorithms discover projections of the training data under which nearby points are likely to have the same class label or similar regression targets.”
 Neighborhood Components Analysis (NCA) – for nearest neighbor classification
 “It would be difficult to directly find an A that minimizes the knearestneighbor classification error rate, because the number of errors will be a highly discontinuous function of A; very small changes in A may change the set of nearest neighbors for some points. The innovation behind the NCA algorithm is to recast nearest neighbor learning in a probabilistic framework. In this framework, expected error is a continuous, differentiable function of A and thus may be minimized using gradient based techniques.”
 NCA was originally designed for classification but was later extended to regression
 A minor tweak is needed to get NCA for regression to work well on predicting vectors when the problem is stochastic
 Here they use conjugate gradient to do the optimization
 The approach can scale well; a paper from ’07 was able to run on 60k training points
 “Following this gradient acts to reduce the objective in two different ways. It adjusts A to be more predictive by increasing the probability that a neighbor will be chosen if it successfully predicts the next state. It also adjusts A to be more predictable, by moving target states together whenever there is a high probability they will be chosen to predict each other.”
 Important to whiten data first
 To deal with actions, partition data by action but constrain to use same projection <how is this different than just throwing out the action data>
 Only parameters to the algorithm are size of projection matrix A and its initial values
 “The predictive projections algorithm as described above may not perform well in cases where the effects of different actions are restricted to specific state dimensions. Since there is no explicit penalty for failing to predict some dimensions, the algorithm may minimize the objective function by finding an A which is not full rank, thus accurately predicting some dimensions while discarding others.”
 Although there is a potential fixes for this
 For the applications this method is coupled with LSPI (along with RBFs)
 Algorithm deals well with Lagoudakis+Parrs’ LSPI
 For the roomba experiment they put a camera on top – all the robot needs to do is to continue to move forwards without hitting the wall
 <Seems like not a terribly hard task just turn when the wall takes up most of the view?>
 Image is 20×20: 1200d once color is factored in
 Then this is reduced via PCA to 25d
 Then their approach takes that to 2d (A is initialized to the first 2 principal components)
 Their approach basically recovers translation and rotation, while PCA recovers amount of wall present and lighting (their approach can use this to make a good policy, but that can’t be done linearly from what PCA produces)
 “To our knowledge, the protovalue function framework has not been applied to the type of noisy, high dimensional control problems addressed in this paper. It seems likely that the neighborhood calculations required for constructing the diffusion model could be dominated by noise dimensions, particularly in very noisy tasks such as the modified pendulum domain described above. In that case, the PVF approach and
predictive projections would be complementary: The PP algorithm could find a low dimensional state projection that contains relevant state information, and the PVF algorithm could then be used to discover a set of appropriate basis functions in that space”
 “Another closely related project is the basis iteration algorithm described in [Sprague, 2007]. This algorithm also uses gradient based metric learning to discover an appropriate projection, but it focuses directly on finding a metric that allows for accurate estimation of the optimal value function. It accomplishes this by iterating value function estimation with updates to the projection matrix. This algorithm has the advantage of incorporating reward information, but it depends on starting with an initial projection that enables a reasonableestimate of the optimal value function.”
Advertisements