- Working on high-D continuous RL
- Builds a model with sparse Gaussian processes, and then does local (re)planning “by solving it as a constrained optimization problem”
- Use MPC/optimal-control methods dating back to around '04, revisited here because they can now run fast enough for real-time control
- Test in “extended” cart-pole <all this means here is the start state is randomized> and quadcopter
- Don’t try to do MCTS, because it is expensive. Instead use gradient optimization
- Instead of the usual O(n^3) cost for GPs, this has O(m^2 n), where m < n (m = number of inducing points) <see the sparse-GP sketch at the end of these notes>
- “However, as only the immediately preceding time steps are coupled through the equality constraints induced by the dynamics model, the stage-wise nature of such model-predictive control problems result in a block-diagonal structure in the Karush-Kuhn-Tucker optimality conditions that admit efficient solution. There has recently been several highly optimized convex solvers for such stage-wise problems, on both linear (Wang and Boyd 2010) and linear time-varying (LTV) (Ferreau et al. 2013; Domahidi et al. 2012) dynamics models.”
- Looks like the type of control they use has to linearize the model locally
- “For the tasks in this paper we only use quadratic objectives, linear state-action constraints and ignore second order approximations.”
- Use an off-the-shelf convex solver for the MPC optimization <see the MPC sketch at the end of these notes>
- Use warm starts for replanning
- The optimization converges in a handful of steps
- <Say they didn’t need to do exploration at all for the tasks they considered, but it looks like they have a pure random action period at first>
- Although the cart-pole is a simple task, they learn it in less than 5 episodes
- <But why no error bars, especially when this experiment probably takes a few seconds to run? This is crazy for a paper from 2015; it is probably fine, but it makes me wonder whether it sometimes fails to get a good policy>
- Use some domain knowledge to make learning the dynamics for the quadcopter a lower-dimensional problem
- 8D state, 2D action
- For the quadcopter, the training data apparently comes from a real quadcopter(?); it is fed in, and the learned controller is then run in simulation
- “By combining sparse Gaussian process models with recent efficient stage-wise solvers from approximate optimal control we showed that it is feasible to solve challenging problems in real-time.”
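
A minimal sketch (my own, not the paper's code) of the subset-of-regressors flavor of sparse GP regression behind the O(m^2 n) point above; the RBF kernel, inducing-point locations, noise level, and toy data are made-up illustration values:

```python
# Sketch of a subset-of-regressors sparse GP: with m inducing points the dominant
# cost is forming Kmn @ Kmn.T, i.e. O(m^2 n), versus O(n^3) for inverting the
# full n x n kernel matrix. Kernel/hyperparameters here are arbitrary placeholders.
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row sets A (a x d) and B (b x d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sparse_gp_mean(X, y, Z, Xstar, noise=1e-1):
    """Predictive mean at Xstar using m inducing inputs Z, with m << n."""
    Kmm = rbf(Z, Z)                       # m x m
    Kmn = rbf(Z, X)                       # m x n
    A = noise**2 * Kmm + Kmn @ Kmn.T      # m x m, built in O(m^2 n)
    b = Kmn @ y                           # m
    Ksm = rbf(Xstar, Z)                   # s x m
    return Ksm @ np.linalg.solve(A, b)    # only m x m systems; n x n never formed

# Toy usage: n = 500 training points, m = 20 inducing points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(500)
Z = np.linspace(-3, 3, 20)[:, None]
print(sparse_gp_mean(X, y, Z, np.array([[0.5]])))
```

The key point is that the n x n kernel matrix is never formed or inverted; everything factors through the m inducing points, which is what buys the O(m^2 n) scaling.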
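
And a minimal sketch (again mine, using a generic cvxpy formulation rather than the specialized stage-wise solvers the paper cites) of the kind of receding-horizon problem the MPC bullets describe: quadratic objective, linearized time-varying dynamics as equality constraints, box limits on the action, and warm-started re-solves. All dimensions, matrices, and limits are made up; the A_t, B_t stand in for a local linearization of the learned GP dynamics.

```python
import numpy as np
import cvxpy as cp

n_x, n_u, H = 4, 1, 15                                  # state dim, action dim, horizon
rng = np.random.default_rng(0)
A_t = [np.eye(n_x) + 0.01 * rng.standard_normal((n_x, n_x)) for _ in range(H)]
B_t = [0.1 * rng.standard_normal((n_x, n_u)) for _ in range(H)]
Q, R = np.eye(n_x), 0.1 * np.eye(n_u)

x = cp.Variable((n_x, H + 1))
u = cp.Variable((n_u, H))
x0 = cp.Parameter(n_x)                                  # measured state, updated each step

cost, constraints = 0, [x[:, 0] == x0]
for t in range(H):
    cost += cp.quad_form(x[:, t], Q) + cp.quad_form(u[:, t], R)   # quadratic stage cost
    constraints += [x[:, t + 1] == A_t[t] @ x[:, t] + B_t[t] @ u[:, t],  # LTV dynamics
                    cp.abs(u[:, t]) <= 1.0]                              # action limits
prob = cp.Problem(cp.Minimize(cost), constraints)

# Receding-horizon loop: only the first action is applied; warm_start reuses the
# previous solution, which is why replanning converges in a handful of iterations.
state = np.array([0.1, 0.0, 0.2, 0.0])
for step in range(3):
    x0.value = state
    prob.solve(warm_start=True)
    action = u.value[:, 0]
    state = A_t[0] @ state + B_t[0] @ action            # stand-in for the real/simulated plant
```

A generic solver like this ignores the block-diagonal KKT structure of the stage-wise problem; the specialized solvers in the quote above exploit it, which is what makes the per-step replanning cheap enough for real-time control.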