- The paper reduces the necessary computation time by using updates that propagate the first and second derivatives of the value function and the first derivative of the policy
- Value function is represented by the values of a library of policies
- Also discusses how to extend the approach to periodic tasks (doing so with goal states was covered in an earlier paper), as well as how to do model learning at the same time. I’m reading this one because it addresses tasks with hybrid dynamics such as walking or hopping
- Also proposes segmenting trajectories at dynamics discontinuities, since these lead to discontinuities in the value function and policies
- Looks like they project to a lower-dimensional space at the points where the discontinuities occur; presumably this is possible because the states at a discontinuity (e.g. at a contact event) lie on a lower-dimensional manifold
- They also search for discontinuities outside the regions where they are known in advance to exist
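
The segmentation idea can be sketched minimally: split a trajectory wherever the discrete dynamics mode changes (e.g. a contact event), so that each segment has smooth dynamics. The mode-detection function here is a made-up stand-in for whatever contact/mode logic the paper actually uses.

```python
def segment_at_discontinuities(states, mode_of):
    """Split a state trajectory into segments of constant dynamics mode.

    states : list of states in time order
    mode_of: maps a state to a discrete dynamics mode (e.g. flight vs. stance);
             a hypothetical stand-in for the paper's discontinuity detection.
    """
    segments = [[states[0]]]
    for prev, cur in zip(states, states[1:]):
        if mode_of(cur) != mode_of(prev):
            segments.append([])  # discontinuity: start a new smooth segment
        segments[-1].append(cur)
    return segments

# Toy 1-D hopper: "stance" when height <= 0, "flight" otherwise.
heights = [0.5, 0.2, -0.1, -0.2, 0.1, 0.4]
segs = segment_at_discontinuities(heights, lambda h: h <= 0)
# → [[0.5, 0.2], [-0.1, -0.2], [0.1, 0.4]]
```

Each resulting segment can then be treated with the smooth-dynamics machinery below.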
- They use locally weighted regression to construct the value function
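
A generic locally weighted regression sketch of the value-function construction: a Gaussian kernel weights nearby library points more heavily in a local linear fit. The kernel and bandwidth choices here are illustrative assumptions, not the paper's exact weighting scheme.

```python
import numpy as np

def lwr_value(x_query, X, V, bandwidth=1.0):
    """Locally weighted regression estimate of the value at x_query.

    X: (n, d) array of library states; V: (n,) array of their values.
    Fits a weighted linear model around x_query (generic LWR sketch).
    """
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))       # Gaussian kernel weights
    A = np.hstack([np.ones((len(X), 1)), X])       # design matrix [1, x]
    W = np.diag(w)
    # Weighted least squares: solve for [bias, slope]
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ V)
    return np.array([1.0, *x_query]) @ beta
```

Because the fit is local and linear, the estimate also yields a local gradient of the value function (the slope part of `beta`), which matters for the derivative-propagating updates described above.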
- Discusses taking Taylor approximations of the dynamics along each trajectory.
- “Given a trajectory, one can integrate the value function and its first and second spatial derivatives backwards in time to compute an improved value function and policy”
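
The quoted backward integration reads like a Riccati/DDP-style backward pass: Taylor-expand the dynamics and cost around the trajectory, then propagate the value function's Hessian (and a linear feedback policy) backward in time. A minimal discrete-time sketch for linearized dynamics x' = A x + B u with quadratic costs — the matrices are illustrative stand-ins, and this is the standard LQR recursion rather than the paper's exact update:

```python
import numpy as np

def backward_pass(A, B, Q, R, Qf, T):
    """Propagate the value-function Hessian S and feedback gains K backward
    in time for linearized dynamics x' = A x + B u, stage cost
    x^T Q x + u^T R u, and terminal cost x^T Qf x (discrete Riccati sweep)."""
    S = Qf                      # value-function Hessian at the final time
    gains = []
    for _ in range(T):
        # Minimizing u^T R u + (Ax+Bu)^T S (Ax+Bu) gives linear policy u = -K x
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        # Improved value-function Hessian one step earlier in time
        S = Q + A.T @ S @ (A - B @ K)
        gains.append(K)
    gains.reverse()             # gains[t] is the policy at time t
    return S, gains
```

The smooth-dynamics assumption is what makes this sweep valid, which is presumably why trajectories are partitioned at the nonsmooth points and the sweep is run per segment.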
- All the equations are in terms of linear algebra and assume smooth dynamics, which is why trajectories must be partitioned at the nonsmooth points
- In domains with a goal, or in regulator tasks, trajectories are grown backward from the goal points
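
The backward growth can be sketched as repeatedly stepping the system backward in time from the goal; `inverse_step` is a hypothetical stand-in for integrating the model backward (or choosing predecessor states that reach the current one):

```python
def grow_trajectory_backward(x_goal, inverse_step, n_steps):
    """Grow a trajectory backward from a goal state.

    inverse_step: maps a state to a predecessor state one time step earlier
                  (a stand-in for backward integration of the model).
    Returns the trajectory in forward time order, ending at the goal.
    """
    traj = [x_goal]
    x = x_goal
    for _ in range(n_steps):
        x = inverse_step(x)
        traj.append(x)
    traj.reverse()
    return traj

# Toy example: forward dynamics x' = 0.5*x, so the backward step doubles x.
traj = grow_trajectory_backward(1.0, lambda x: 2.0 * x, 3)
# → [8.0, 4.0, 2.0, 1.0]
```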
- They mention a number of different ways of providing initial trajectories, for example, manually guided ones, or according to a policy
- After the initial set is given, more trajectories are added as follows:
- Use the global policy if available
- Use the local policy from the nearest point (in the library?) of the same type of dynamics
- Use the local value function estimate and derivatives from that nearest point
- …

- Eh, like the last Atkeson paper I read, this one is very hand-wavy and therefore not usable. Stopping here.