Animating the Dead: Computational Necromancy with Reinforcement Learning. Bill Smart, et al. 2008

Notes from his talk at MSR.

  1. Do manifold learning
  2. Give up guarantees of global optimality
  3. Results on the swimmer are “easy” for the algorithm because it's easy to find the manifold
    1. Domains with cyclic actions tend to lie on a ring-type manifold (a high-dimensional cylinder) that connects back on itself when the gait loops back to a point it has visited before
  4. Manifolds work best in domains with constraints; the approach gives up performance on arbitrary MDPs
  5. Works with a reward function, not just path planning
  6. Control works by fitting a locally quadratic value function and planning a path through state space that gives good reward
  7. Robustness of the controller depends on the smoothness of the domain
  8. Needs to be “bootstrapped” with a reasonably good policy to learn in some domains:
    1. In the swimmer domain things are smooth enough that it can start out acting randomly and improve from there
    2. In the walker domain it has to start with a policy that can at least stand upright
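Point 6 above is the heart of the control scheme: fit a local quadratic model of the value function, then plan a path through state space toward high-value regions. A minimal sketch of that general idea, not the talk's actual algorithm — the toy 2-D domain, feature map, goal state, and step size below are all illustrative assumptions:

```python
import numpy as np

def quad_features(s):
    """Quadratic feature map for a 2-D state [x, y]."""
    x, y = s
    return np.array([x * x, y * y, x * y, x, y, 1.0])

def fit_quadratic_value(states, values):
    """Least-squares fit of V(s) ≈ w · quad_features(s)."""
    X = np.array([quad_features(s) for s in states])
    w, *_ = np.linalg.lstsq(X, values, rcond=None)
    return w

def value_gradient(w, s):
    """Analytic gradient of the fitted quadratic at state s."""
    a, b, c, d, e, _ = w
    x, y = s
    return np.array([2 * a * x + c * y + d, 2 * b * y + c * x + e])

# Toy domain (assumed): value is highest at a goal state (1, 2).
goal = np.array([1.0, 2.0])
rng = np.random.default_rng(0)
samples = rng.uniform(-3, 3, size=(50, 2))
values = np.array([-np.sum((s - goal) ** 2) for s in samples])

w = fit_quadratic_value(samples, values)

# "Plan a path": gradient ascent on the fitted value function.
state = np.array([0.0, 0.0])
for _ in range(100):
    state = state + 0.2 * value_gradient(w, state)

print(state)  # ends near the high-value region around (1, 2)
```

Because the toy value function is itself quadratic, the fit is exact and the planned path converges to the goal; in the talk's setting the quadratic fit is only local, which is why robustness hinges on the smoothness of the domain (point 7).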
