Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach. Atkeson, Morimoto. NIPS 2002.

The paper reduces computation time by using updates that change the first and second derivatives of the value function and the first derivative of the policy

Value function is represented by the values of a library of policies

Also discusses how to extend the approach to periodic tasks (the version for tasks with goal states is covered in an earlier paper), as well as how to do model learning at the same time. I’m reading this one because it addresses tasks with hybrid state, such as walking or hopping

Also proposes to segment trajectories at discontinuities in the dynamics, since these lead to discontinuities in the value function and policies
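A minimal sketch of what segmenting at discontinuities could look like, assuming each sample along a trajectory already carries a discrete label for the type of dynamics in effect (e.g. "flight" vs. "stance" for a hopper). The function name, the labels, and the sample format are my own assumptions, not the paper's.

```python
from itertools import groupby

def segment_by_mode(samples):
    """Split a trajectory into runs of constant dynamics type.

    samples: list of (state, mode) pairs in time order (hypothetical format).
    Returns a list of (mode, [states]) segments, one per contiguous run.
    """
    return [
        (mode, [state for state, _ in group])
        for mode, group in groupby(samples, key=lambda pair: pair[1])
    ]

# Made-up hopper-like trajectory: the mode changes mark the discontinuities.
trajectory = [(0.0, "flight"), (0.1, "flight"), (0.2, "stance"),
              (0.3, "stance"), (0.4, "flight")]
segments = segment_by_mode(trajectory)
```

Each segment then has smooth dynamics internally, so derivative-based updates can be applied within a segment and handled specially at the boundaries.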

Looks like they do projections to a lower-dimensional space at the points where the discontinuities occur, because it is possible for some reason

They also search for discontinuities outside the areas where they are already known to exist

They use locally weighted regression to construct the value function
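For reference, a minimal 1-D sketch of locally weighted linear regression applied to stored (state, value) samples. The Gaussian kernel, the bandwidth, and the function name are my assumptions; the paper's actual regression setup may differ.

```python
import math

def lwr_predict(query, xs, vs, bandwidth=0.5):
    """Locally weighted linear fit v ~ a + b * (x - query) around a query point.

    xs, vs: sampled states and their values (1-D, illustrative only).
    Returns (a, b): the local value estimate and its slope at the query.
    """
    # Gaussian weights: nearby samples count more than distant ones.
    w = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    d = [x - query for x in xs]
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * vi for wi, vi in zip(w, vs))
    t1 = sum(wi * di * vi for wi, di, vi in zip(w, d, vs))
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        # Degenerate fit (e.g. a single sample): fall back to the weighted mean.
        return t0 / s0, 0.0
    # Solve the 2x2 weighted normal equations for intercept and slope.
    a = (s2 * t0 - s1 * t1) / det
    b = (s0 * t1 - s1 * t0) / det
    return a, b

# Samples from an exactly linear value function v = 2x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
vs = [2.0 * x + 1.0 for x in xs]
value, slope = lwr_predict(1.0, xs, vs)
```

Because the fit is redone around every query, the result is a nonparametric estimate: there is no single global set of coefficients.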

Discuss doing Taylor approximations of dynamics.
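A first-order Taylor approximation of the dynamics can be sketched with finite differences as below. This assumes a discrete-time model x_next = f(x, u); the pendulum-like step function is a made-up example system, and central differencing is my choice rather than anything the paper specifies.

```python
import math

def pendulum_step(x, u, dt=0.01):
    """Hypothetical example dynamics: x = [theta, omega], u = [torque]."""
    theta, omega = x
    return [theta + dt * omega, omega + dt * (-math.sin(theta) + u[0])]

def linearize(f, x0, u0, eps=1e-5):
    """First-order Taylor model f(x, u) ~ f0 + A (x - x0) + B (u - u0).

    A and B are Jacobians estimated by central differences and returned
    as lists of rows.
    """
    def jacobian(g, z0):
        cols = []
        for j in range(len(z0)):
            zp, zm = list(z0), list(z0)
            zp[j] += eps
            zm[j] -= eps
            gp, gm = g(zp), g(zm)
            cols.append([(p - m) / (2 * eps) for p, m in zip(gp, gm)])
        # cols[j] holds d f / d z_j; transpose so rows index outputs.
        return [list(row) for row in zip(*cols)]

    f0 = f(x0, u0)
    A = jacobian(lambda x: f(x, u0), x0)
    B = jacobian(lambda u: f(x0, u), u0)
    return f0, A, B

f0, A, B = linearize(pendulum_step, [0.0, 0.0], [0.0])
```

Around the hanging equilibrium this should give A close to [[1, dt], [-dt, 1]] and B close to [[0], [dt]].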

“Given a trajectory, one can integrate the value function and its first and second spatial derivatives backwards in time to compute an improved value function and policy”

All the equations are in terms of linear algebra and assume smooth dynamics, which is why trajectories must be partitioned at nonsmooth areas
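To see what propagating the value function's derivatives backward along a trajectory looks like, here is a scalar, discrete-time sketch in the style of an iLQR/DDP backward pass. This is a swapped-in analogue: the paper integrates in continuous time, and the dictionary keys and function name are my own. It does show why smoothness matters, since every step uses derivatives of the dynamics (fx, fu) and of the one-step cost.

```python
def backward_pass(steps, vx_end=0.0, vxx_end=0.0):
    """Propagate Vx and Vxx backward along a scalar trajectory.

    steps: per-time-step dicts (forward time order) with local derivatives
           fx, fu of the dynamics and lx, lu, lxx, luu, lux of the cost.
    Returns a list of (Vx, Vxx, k, K) in forward time order, where the
    local policy is u = u_nominal + k + K * (x - x_nominal).
    """
    vx, vxx = vx_end, vxx_end
    out = []
    for s in reversed(steps):
        # Expand the one-step cost plus the next-step value to second order.
        qx = s["lx"] + s["fx"] * vx
        qu = s["lu"] + s["fu"] * vx
        qxx = s["lxx"] + s["fx"] ** 2 * vxx
        quu = s["luu"] + s["fu"] ** 2 * vxx
        qux = s["lux"] + s["fu"] * s["fx"] * vxx
        k = -qu / quu          # feedforward correction to the action
        K = -qux / quu         # first derivative of the policy w.r.t. state
        vx = qx - qux * qu / quu
        vxx = qxx - qux * qux / quu
        out.append((vx, vxx, k, K))
    out.reverse()
    return out

# Toy check: x' = x + u with cost 0.5 * (x^2 + u^2), trajectory at the origin.
step = dict(fx=1.0, fu=1.0, lx=0.0, lu=0.0, lxx=1.0, luu=1.0, lux=0.0)
result = backward_pass([step] * 50)
vx0, vxx0, k0, K0 = result[0]
```

On this toy problem the recursion reduces to the discrete Riccati iteration, so Vxx at the start of a long horizon approaches (1 + sqrt(5)) / 2 and the gain K approaches its negative reciprocal.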

In domains with a goal, or in regulator tasks, trajectories are grown backward from the goal points

They mention a number of different ways of providing initial trajectories, for example, manually guided ones, or according to a policy

After the initial set is given, more trajectories are added using the following method:

Use the global policy if available

Use the local policy from the nearest point (in the library?) of the same type of dynamics

Use the local value function estimate and derivatives from that nearest point

…
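My reading of the nearest-point steps above, as a sketch: find the closest stored point with the same type of dynamics, then evaluate its local linear policy and local quadratic value model at the query state. The entry layout (keys x, u, K, V, Vx, Vxx, mode), the Euclidean distance, and the function names are all assumptions of mine; the notes above do not pin any of this down.

```python
def nearest_entry(library, x, mode):
    """Closest stored point (squared Euclidean distance) with matching dynamics type."""
    candidates = [e for e in library if e["mode"] == mode]
    if not candidates:
        return None
    return min(candidates,
               key=lambda e: sum((a - b) ** 2 for a, b in zip(x, e["x"])))

def local_policy_and_value(entry, x):
    """Evaluate u = u_i + K dx and V = V_i + Vx . dx + 0.5 dx' Vxx dx."""
    dx = [a - b for a, b in zip(x, entry["x"])]
    u = [ui + sum(kij * dj for kij, dj in zip(row, dx))
         for ui, row in zip(entry["u"], entry["K"])]
    lin = sum(g * d for g, d in zip(entry["Vx"], dx))
    quad = sum(di * sum(h * dj for h, dj in zip(row, dx))
               for di, row in zip(dx, entry["Vxx"]))
    return u, entry["V"] + lin + 0.5 * quad

# Made-up two-point library with two dynamics types.
library = [
    {"mode": "stance", "x": [0.0, 0.0], "u": [1.0], "K": [[-2.0, -0.5]],
     "V": 3.0, "Vx": [1.0, 0.0], "Vxx": [[2.0, 0.0], [0.0, 2.0]]},
    {"mode": "flight", "x": [0.1, 0.1], "u": [0.0], "K": [[0.0, 0.0]],
     "V": 1.0, "Vx": [0.0, 0.0], "Vxx": [[1.0, 0.0], [0.0, 1.0]]},
]
query = [0.2, 0.0]
entry = nearest_entry(library, query, "stance")
u, v = local_policy_and_value(entry, query)
```

Note that the flight point is geometrically closer to the query here, but it is skipped because its dynamics type does not match.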

Eh, like the last paper of Atkeson I read, the paper is very hand-wavy and therefore not usable. Stopping here.