- Working on high-D continuous RL
- Builds a model with sparse Gaussian processes, and then does local (re)planning “by solving it as a constrained optimization problem”
- Uses MPC/control methods that date back to ’04, revisited here and now fast enough for real-time control
- Test in “extended” cart-pole <all this means here is the start state is randomized> and quadcopter
- Don’t try to do MCTS, because it is expensive. Instead use gradient optimization
- Instead of the normal O(n^3) cost for GPs, this has O(m^2n), where m < n
- “However, as only the immediately preceding time steps are coupled through the equality constraints induced by the dynamics model, the stage-wise nature of such model-predictive control problems result in a block-diagonal structure in the Karush-Kuhn-Tucker optimality conditions that admit efficient solution. There has recently been several highly optimized convex solvers for such stage-wise problems, on both linear (Wang and Boyd 2010) and linear-time-varying (LTV) (Ferreau et al. 2013; Domahidi et al. 2012) dynamics models.”
- Looks like the type of control they use has to linearize the model locally
- “For the tasks in this paper we only use quadratic objectives, linear state-action constraints and ignore second order approximations.”
- Use an off-the-shelf convex solver for doing the MPC optimization
- Use warm starts for replanning
- The optimization converges in a handful of steps
- <Say they didn’t need to do exploration at all for the tasks they considered, but it looks like they have a pure random action period at first>
- Although the cart-pole is a simple task, they learn it in less than 5 episodes
- <But why no error bars, especially when this experiment probably takes a few seconds to run? This is crazy in a paper from 2015. Although it is probably fine, it makes me wonder if it sometimes fails to get a good policy>
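
To make the replanning loop in the notes above concrete, here is a minimal sketch of receding-horizon control with a quadratic objective, box constraints on the action, and warm starts. Everything in it is an assumption for illustration: the double-integrator matrices `A` and `B` stand in for a locally linearized learned model, and plain projected gradient descent with backtracking stands in for the off-the-shelf stage-wise convex solver the paper uses. The backward pass in `cost_and_grad` is where the stage-wise coupling shows up, since each step only talks to its neighbor.

```python
import numpy as np

# Hypothetical linear model (double integrator, dt = 0.1); a stand-in for
# a locally linearized learned dynamics model, not the paper's model.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.diag([1.0, 0.1])   # quadratic state cost
R = 0.01                  # quadratic action cost
U_MAX = 1.0               # box constraint on the action
H = 20                    # planning horizon

def rollout(x0, U):
    """Simulate the model forward; returns states x_0..x_H."""
    X = [x0]
    for u in U:
        X.append(A @ X[-1] + B @ u)
    return np.array(X)

def cost_and_grad(x0, U):
    """Quadratic cost and its gradient w.r.t. the action sequence."""
    X = rollout(x0, U)
    cost = sum(x @ Q @ x for x in X[1:]) + R * np.sum(U ** 2)
    grad = np.zeros_like(U)
    lam = np.zeros(2)  # adjoint: sensitivity of the cost to x_{t+1}
    for t in reversed(range(len(U))):
        lam = 2 * Q @ X[t + 1] + A.T @ lam
        grad[t] = 2 * R * U[t] + B.T @ lam
    return cost, grad

def plan(x0, U_init, iters=20):
    """Projected gradient descent with backtracking, started from U_init."""
    U = np.clip(U_init, -U_MAX, U_MAX)
    cost, grad = cost_and_grad(x0, U)
    for _ in range(iters):
        step = 1.0
        while step > 1e-8:
            U_new = np.clip(U - step * grad, -U_MAX, U_MAX)
            new_cost, new_grad = cost_and_grad(x0, U_new)
            if new_cost < cost:
                U, cost, grad = U_new, new_cost, new_grad
                break
            step *= 0.5
        else:
            break  # no improving step found; treat as converged
    return U

x = np.array([1.0, 0.0])   # start 1 unit from the origin, at rest
U = np.zeros((H, 1))
applied = []
for k in range(60):
    U = plan(x, U)
    applied.append(float(U[0, 0]))
    x = A @ x + B @ U[0]             # apply only the first action
    U = np.vstack([U[1:], U[-1:]])   # warm start: shift the old plan by one step
```

The shift in the last line is the warm start: the previous plan, moved forward one step, is usually close to the new optimum, which is one plausible reason replanning needs only a handful of iterations.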

- Use some domain knowledge to make learning the dynamics for the quadcopter a lower-dimensional problem
- 8D state, 2D action

- For the quadcopter, training data from a real quadcopter(?) is fed in and then it is run in simulation
- “By combining sparse Gaussian process models with recent efficient stage-wise solvers from approximate optimal control we showed that it is feasible to solve challenging problems in real-time.”
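
To see where the O(m^2n) cost mentioned earlier comes from, here is a rough sketch of one standard inducing-point approximation (the subset-of-regressors predictive mean). It is not necessarily the sparse GP variant the paper uses; the RBF kernel, the fixed hyperparameters, the evenly spaced inducing points `Z`, and the toy sine data are all assumptions for illustration.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def sparse_gp_mean(X, y, Z, X_star, noise=0.1, jitter=1e-6):
    """Predictive mean using m inducing points Z instead of all n data points."""
    K_uu = rbf(Z, Z)   # m x m
    K_uf = rbf(Z, X)   # m x n
    # Forming K_uf @ K_uf.T is the O(m^2 n) step; no n x n matrix is ever built.
    M = noise ** 2 * K_uu + K_uf @ K_uf.T + jitter * np.eye(len(Z))
    w = np.linalg.solve(M, K_uf @ y)   # O(m^3) solve on an m x m system
    return rbf(X_star, Z) @ w

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))                   # n = 500 training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)    # noisy toy targets
Z = np.linspace(-3, 3, 10)[:, None]                     # m = 10 inducing points
X_star = np.linspace(-2.5, 2.5, 50)[:, None]
pred = sparse_gp_mean(X, y, Z, X_star)
```

A full GP would factor the 500 x 500 kernel matrix; here the only linear solve is on a 10 x 10 system, and the data enter through a single m x n product.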
