Results on swimmer are “easy” for the algorithm because it's easy to find the manifold

Domains with cyclic actions tend to lie on a ring-type manifold (a high-dimensional cylinder), which closes on itself when the gait loops back to a point it has already visited
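
A toy illustration of the ring structure, not taken from the talk: the gait below is a hypothetical one in which each joint angle is a phase-shifted sinusoid (`gait_state` and its parameters are made up for this sketch). The trajectory returns to its starting state after one period, and the singular values show that the loop occupies only two directions of the three-dimensional joint space.

```python
import numpy as np

def gait_state(phase, n_joints=3):
    """Hypothetical periodic gait: each joint angle is a phase-shifted sinusoid."""
    offsets = np.arange(n_joints) * (2 * np.pi / n_joints)
    return np.sin(phase + offsets)

# Sample one full gait cycle.
phases = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
states = np.stack([gait_state(p) for p in phases])  # shape (200, 3)

# The trajectory closes: after one full period the state is back where it started.
loop_gap = np.linalg.norm(gait_state(2 * np.pi) - gait_state(0.0))

# Singular values of the centered data show how many directions the loop occupies.
centered = states - states.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)

print("gap after one period:", loop_gap)
print("singular values:", singular_values)
```

In a real locomotion domain the loop would be noisier and embedded in many more dimensions, but the same check (does the state come back, and how few directions does it use) is one way to see why such a manifold is easy to find.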

Manifolds work best in domains with constraints; the approach gives up performance on arbitrary MDPs

Works with a reward function, not just path planning

Control works by fitting a locally quadratic value function and planning a path through state space that gives good reward
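
A minimal sketch of what fitting a locally quadratic value function could look like. The notes do not say how the fit is done, so ordinary least squares is an assumption here, and `fit_local_quadratic` and the synthetic data are hypothetical. The model is V(x) ≈ c + g·dx + ½ dxᵀ H dx with dx = x − center; the planning step is not shown.

```python
import numpy as np

def fit_local_quadratic(states, values, center):
    """Least-squares fit of V(x) ~ c + g.dx + 0.5 dx^T H dx around `center`."""
    dx = states - center
    n, d = dx.shape
    iu = np.triu_indices(d)
    # Quadratic features: products dx_i * dx_j for i <= j.
    quad = dx[:, iu[0]] * dx[:, iu[1]]
    features = np.hstack([np.ones((n, 1)), dx, quad])
    coef, *_ = np.linalg.lstsq(features, values, rcond=None)
    c = coef[0]
    g = coef[1:1 + d]
    # Rebuild the symmetric Hessian: off-diagonal coefficients equal H_ij,
    # diagonal coefficients equal 0.5 * H_ii, so adding the transpose fixes both.
    H = np.zeros((d, d))
    H[iu] = coef[1 + d:]
    H = H + H.T
    return c, g, H

# Synthetic check: sample states near a point and recover a known quadratic.
rng = np.random.default_rng(0)
c_true = 1.5
g_true = np.array([0.3, -0.2])
H_true = np.array([[2.0, 0.5], [0.5, 1.0]])
center = np.array([0.1, -0.4])
states = center + 0.1 * rng.standard_normal((50, 2))
dx = states - center
values = c_true + dx @ g_true + 0.5 * np.einsum("ni,ij,nj->n", dx, H_true, dx)

c, g, H = fit_local_quadratic(states, values, center)
print("recovered Hessian:\n", H)
```

With noisy value estimates the fit would only be trustworthy close to `center`, which is consistent with the point that robustness depends on how smooth the domain is.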

Robustness of the controller depends on the smoothness of the domain

Needs to be “bootstrapped” with a reasonably good policy in order to learn in some domains:

In the swimmer domain, things are smooth enough that it can start out acting randomly and improve from there

In the walker domain it has to start with a policy that can at least stand upright