## Learning, Inference, and Control for Robotics and Sustainable Energy. J. Zico Kolter. Talk

1. Gives the example of LittleDog: the kinematics are easy to model, but a full model is difficult on uncertain terrain. Similar example with driving on poor surfaces
2. Can make the same argument about wind turbines (hard to model) or what happens in the home (hard to control)
3. But data is easy to get
4. 2 parts: data-driven learning and control for dynamic tasks, and data-driven control for sustainable energy

Part 1

1. Says it's very difficult to build an accurate model of robot dynamics from pure physics, so it is better to use data to (help) build the model
2. Says many planning problems can be helped just by looking at the sign of the derivative terms. Do gradient descent using only that
3. Uses this form of policy gradient to teach LittleDog to climb steps in about 5 minutes
4. Same issue with drift-parking. Can't use a dynamics model based purely on physics because it misses particular bits, so you should combine it with observed data
5. Idea is that the dynamics are hard to model, but the maneuver is repeatable over short horizons. So parts of it can be open-loop
6. Result is multi-model LQR:
   1. Use prediction errors over the data to estimate model variance
   2. Use a variance-aware method (new iterative LQR method) to compute optimal controls
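
A toy sketch of the sign-of-derivative idea from item 2, as I understood it. This is not the speaker's code. It assumes a single control input `u`, and that the only thing known about the system is that the outcome increases with `u`. `real_system`, `signed_derivative_search`, the step size, and the target are all made up for illustration.

```python
import math


def real_system(u):
    # Hypothetical stand-in for the real robot. The learner never uses this
    # formula; it only observes the output. The one assumption is that the
    # outcome increases with the control, i.e. sign(dx/du) = +1.
    return 2.0 * u + 0.5 * math.tanh(u)


def signed_derivative_search(target, u=0.0, derivative_sign=1.0, step=0.2, iters=50):
    """Tune one control input using only the sign of dx/du."""
    for _ in range(iters):
        error = real_system(u) - target  # observed on the real system
        # The true gradient of 0.5 * error**2 w.r.t. u is error * dx/du.
        # Here dx/du is replaced by its assumed sign.
        u -= step * error * derivative_sign
    return u


u_star = signed_derivative_search(target=3.0)
```

The magnitude of the update comes entirely from the observed error, so no dynamics model is needed beyond the sign. The step size still has to be small enough for the real (unknown) slope, which is an assumption baked into this toy.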

Part 2

1. Generating energy from wind turbines with a data-driven control approach (control the pitch of the blades)
2. The models we have for wind dynamics are not accurate, and only hold in very restricted conditions. They really suck
3. Because of this, online optimization is important. Go about doing stochastic optimization
4. Care about data efficiency, satisfied with a local optimum
5. “Trust region policy search” – use second-order (Hessian) info to optimize. Update param values with a trust-region step
   1. Need to estimate the Hessian, which is difficult, but can do importance sampling on previous results to reduce the # of samples needed
   2. Hessian may be indefinite, so use a trust-region solver, which fits a polynomial only locally as opposed to globally. This can be solved exactly
   3. Use something based on the variance of the Gaussian used for sampling to pick the region
6. Beats REINFORCE badly (but even back in the day REINFORCE was known to be a very sample-inefficient algo); independent of that, it does quickly climb up to the optimal region
7. For home energy: idea is to use the power consumption of the entire home, measured at the incoming power meter, instead of monitoring each outlet independently
8. Uses HMMs to model whether each device is on or off. Problem is current algs can’t deal with input sizes as large as what occurs in a home
9. Do spectral clustering on the data to identify what is actually happening in the house
10. Need a new alg to do tractable inference. It is a convex approximate inference method that can be solved quickly for hundreds of thousands of variables
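
A toy sketch of why the trust-region step in item 5 can be solved exactly even when the Hessian is indefinite. It assumes a single scalar parameter and a minimization convention (for maximizing power, flip the signs). `trust_region_step` and its arguments are my own names, not from the talk, and nothing here estimates the gradient or Hessian from samples.

```python
def trust_region_step(g, h, radius):
    """Exactly minimize m(p) = g*p + 0.5*h*p*p subject to |p| <= radius.

    g: estimated gradient, h: estimated (scalar) Hessian, radius: region size.
    """
    if h > 0:
        p = -g / h  # unconstrained minimizer of a convex quadratic
        if abs(p) <= radius:
            return p

    # Either the model is not convex (h <= 0) or the unconstrained minimizer
    # falls outside the region. In both cases the constrained minimum sits
    # on the boundary, so it is enough to compare the two endpoints.
    def model(p):
        return g * p + 0.5 * h * p * p

    return min((-radius, radius), key=model)


newton_like = trust_region_step(g=2.0, h=4.0, radius=1.0)   # interior step
indefinite = trust_region_step(g=2.0, h=-1.0, radius=1.0)   # boundary step
clipped = trust_region_step(g=2.0, h=0.5, radius=1.0)       # boundary step
```

With a negative `h` a plain Newton step would head toward a maximum of the local model. The bounded version still returns a well-defined descent step. In more than one dimension the same subproblem needs an eigendecomposition or similar, which this sketch leaves out.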