Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach. Atkeson, Morimoto. NIPS 2002.

  1. The paper reduces computation time by using updates that change the first and second derivatives of the value function and the first derivative of the policy
  2. Value function is represented by the values of a library of policies
  3. Also discusses how to extend the approach to periodic tasks (the version restricted to goal states is covered in an earlier paper), as well as how to do model learning at the same time.  I’m reading this one because it addresses tasks with hybrid state, such as walking or hopping
  4. Also proposes segmenting trajectories at discontinuities in the dynamics, which lead to discontinuities in the value function and policies
  5. Looks like they project to a lower-dimensional space at the points where the discontinuities occur; why this is possible is not clear to me
  6. They also search for discontinuities outside the areas where they are already known to exist
  7. They use locally weighted regression to construct the value function
  8. They discuss using Taylor approximations of the dynamics.
  9. “Given a trajectory, one can integrate the value function and its first and second spatial derivatives backwards in time to compute an improved value function and policy”
  10. All the equations are in terms of linear algebra and assume smooth dynamics, which is why trajectories must be partitioned wherever the dynamics are nonsmooth
  11. In tasks with a goal, or in regulator tasks, trajectories are grown backward from the goal points
  12. They mention a number of different ways of providing initial trajectories, for example manually guided ones, or ones generated by a policy
  13. After the initial set is given, more trajectories are added as follows:
    1. Use the global policy if available
    2. Use the local policy from the nearest point (in the library?) of the same type of dynamics
    3. Use the local value function estimate and derivatives from that nearest point
  14. Eh, like the last Atkeson paper I read, this one is very hand-wavy and therefore not usable.  Stopping here.
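
To make note 7 a bit more concrete, here is a minimal sketch of locally weighted regression in one dimension, in plain Python. It is not taken from the paper: the function name `lwr_predict`, the Gaussian kernel, and the `bandwidth` parameter are all assumptions chosen for illustration. It fits a weighted line around the query point and returns both the fitted value and the slope, roughly the kind of local value-plus-derivative estimate the notes refer to.

```python
import math


def lwr_predict(xs, ys, x_query, bandwidth=0.5):
    """Locally weighted linear regression at x_query (1-D inputs).

    Hypothetical helper for illustration only.
    Returns (value, slope) of the weighted least-squares line at x_query.
    """
    # Gaussian kernel: samples near the query get weight close to 1,
    # distant samples get weight close to 0.
    ws = [math.exp(-0.5 * ((x - x_query) / bandwidth) ** 2) for x in xs]
    sw = sum(ws)
    if sw == 0.0:
        raise ValueError("no samples close enough to the query point")

    # Weighted means of inputs and targets.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw

    # Weighted variance and covariance give the local slope.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0

    # Evaluate the local line at the query point.
    value = my + slope * (x_query - mx)
    return value, slope


# Usage: samples of some scalar function, queried between sample points.
sample_x = [0.25 * i for i in range(9)]
sample_y = [2.0 * x + 1.0 for x in sample_x]
value, slope = lwr_predict(sample_x, sample_y, 0.3)
```

A smaller `bandwidth` makes the fit more local, at the cost of relying on fewer samples; in higher dimensions the same idea needs a weighted least-squares solve instead of the closed-form slope used here.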
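
The backward integration quoted in note 9 can be illustrated, under strong simplifying assumptions, with a discrete-time scalar backward sweep. This is the standard Riccati recursion for a linear-quadratic problem, not the paper's own equations: I assume scalar state and control, local first-order (Taylor) models `x_next = a*x + b*u` at each trajectory point, a per-step cost `q*x**2 + r*u**2`, and a quadratic value function `p[k] * x**2`. The name `backward_sweep` and all parameters are hypothetical.

```python
def backward_sweep(a_list, b_list, q, r, p_final):
    """Sweep a quadratic value model backward along a trajectory.

    a_list[k], b_list[k]: assumed local linear dynamics at step k.
    Returns (p, gains) where the value at step k is p[k] * x**2
    (so its second derivative is 2 * p[k]) and the local policy
    is u = -gains[k] * x.
    """
    n = len(a_list)
    p = [0.0] * (n + 1)
    gains = [0.0] * n
    p[n] = p_final  # value model at the end of the trajectory

    # Walk from the last step back to the first.
    for k in range(n - 1, -1, -1):
        a, b = a_list[k], b_list[k]
        denom = r + b * b * p[k + 1]
        # Minimizing the one-step cost plus next-step value over u
        # gives a linear feedback gain.
        gains[k] = a * b * p[k + 1] / denom
        # Substituting that control back in gives the updated value model.
        p[k] = q + a * a * p[k + 1] - (a * b * p[k + 1]) ** 2 / denom
    return p, gains


# Usage: a 50-step trajectory with the same local model at every step.
p, gains = backward_sweep([1.0] * 50, [1.0] * 50, q=1.0, r=1.0, p_final=0.0)
```

Each backward step produces both an improved local value model and a local policy, which is the pattern the quote describes; the general case replaces the scalars with matrices and adds first-derivative terms for trajectories that do not pass through the origin.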
