- Gives the example of LittleDog: the kinematics are easy to model, but a full model is difficult on uncertain terrain. Similar example with driving on poor surfaces
- Can make the same argument about wind turbines (hard to model) or what happens in the home (hard to control)
- But data is easy to get
- Two parts: data-driven learning and control for dynamic tasks, and data-driven control for sustainable energy

Part 1

- Says it's very difficult to build an accurate model of robot dynamics from pure physics, so it's better to use data to (help) build the model
- Says many planning problems can be helped just by looking at the sign of the derivative terms, and doing gradient descent using only that
- Uses this form of policy gradient to teach LittleDog to climb steps in about 5 minutes
- Same issue with drift parking. Can't use a dynamics model based purely on physics because it misses particular effects, so you should combine it with observed data
- Idea is that the dynamics are hard to model, but the maneuver is repeatable over short horizons, so parts of it can be run open-loop
- Result is multi-model LQR:
- Use prediction errors over the data to estimate model variance
- Use a variance-aware method (a new iterative LQR method) to compute optimal controls
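
The signed-derivative idea from the notes above can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional plant, `true_plant`, whose output always increases with the control, so the optimizer is told only the sign of dy/du (here +1) and never its magnitude. The plant, the quadratic cost, and the step size `lr` are illustrative assumptions, not details from the talk.

```python
import math


def true_plant(u):
    # Hypothetical "real system": unknown to the optimizer, but known to be
    # monotonically increasing in u, so dy/du > 0 everywhere.
    return 2.0 * u + 0.5 * math.tanh(u)


def signed_derivative_descent(target, sign_dy_du=1.0, lr=0.1, steps=200):
    """Minimise 0.5 * (y - target)**2 using only the sign of dy/du."""
    u = 0.0
    for _ in range(steps):
        y = true_plant(u)            # observe the outcome on the real system
        err = y - target             # d(cost)/dy for the quadratic cost
        # The chain rule would need err * dy/du; the unknown dy/du is
        # replaced by its sign.
        u -= lr * err * sign_dy_du
    return u
```

Calling `signed_derivative_descent(3.0)` drives the plant output toward 3.0 without ever evaluating the true derivative; only observed outcomes and the assumed sign are used.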

Part 2

- Generating energy from wind turbines with a data-driven control approach (controlling the pitch of the blades)
- The models we have for wind dynamics are not accurate and only hold in very restricted conditions. They really suck
- Because of this, online optimization is important. Goes about it with stochastic optimization
- Cares about data efficiency, satisfied with a local optimum
- “Trust region policy search”: uses second-order (Hessian) info to optimize. Updates parameter values with a trust-region step
- Need to estimate the Hessian, which is difficult, but can do importance sampling on previous results to reduce the # of samples needed
- Hessian may be indefinite, so use a trust region solver, which fits a polynomial only locally as opposed to globally. This can be solved exactly
- Picks the region size using something based on the variance of the Gaussian used for sampling
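
The trust-region step with a possibly indefinite Hessian can be sketched as follows. This is a minimal illustration, not the solver from the talk: it assumes the Hessian has already been eigendecomposed, so `g` is the gradient expressed in the eigenbasis and `h` holds the eigenvalues, and it ignores the degenerate "hard case" where the gradient has no component along the most negative eigenvalue. The function name and arguments are hypothetical.

```python
import math


def trust_region_step(g, h, radius, iters=200):
    """Minimise g.p + 0.5 * sum(h_i * p_i**2) subject to ||p|| <= radius.

    g: gradient in the Hessian's eigenbasis; h: eigenvalues (may be negative).
    """
    def step(lam):
        return [-gi / (hi + lam) for gi, hi in zip(g, h)]

    def norm(p):
        return math.sqrt(sum(pi * pi for pi in p))

    h_min = min(h)
    # Interior case: positive-definite model and the Newton step fits.
    if h_min > 0:
        p = step(0.0)
        if norm(p) <= radius:
            return p
    # Boundary case: find the shift lam > max(0, -h_min) with ||p(lam)|| = radius.
    # ||p(lam)|| decreases monotonically in lam on this range.
    lo = max(0.0, -h_min) + 1e-12 * (1.0 + abs(h_min))
    hi = lo + 1.0
    while norm(step(hi)) > radius:   # grow the bracket until the step fits
        hi = lo + 2.0 * (hi - lo)
    for _ in range(iters):           # bisection on the shift
        mid = 0.5 * (lo + hi)
        if norm(step(mid)) > radius:
            lo = mid
        else:
            hi = mid
    return step(hi)
```

With `g = [1.0, 1.0]`, `h = [2.0, -1.0]` and `radius = 1.0`, the model is unbounded below along the second direction, yet the returned step stays on the trust-region boundary. That is the reason a trust-region solve remains well defined when a plain Newton step is not.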

- Beats REINFORCE badly (though even back in the day REINFORCE was known to be a very sample-inefficient algorithm); independent of that, it does quickly climb up to the optimal region
- For the home, the idea is to use the power consumption of the entire home, measured at the incoming power meter, instead of monitoring each outlet independently
- Uses HMMs to model whether each device is on or off. Problem is current algorithms can’t deal with input sizes as large as what occurs in a home
- Do spectral clustering on the data to identify what is actually happening in the house
- Needs a new algorithm to do tractable inference. It is a convex approximate inference method that can be solved quickly for hundreds of thousands of variables
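
To make the scaling problem concrete, here is a minimal sketch of exact on/off inference from a single aggregate meter reading. It is not the convex approximate method from the talk: it runs plain Viterbi over the joint state of all devices, which has 2^n states for n devices and is therefore only usable for a handful of them. The per-device wattages `watts`, the switching probability `p_switch`, and the Gaussian noise level `sigma` are all illustrative assumptions.

```python
import itertools
import math


def viterbi_disaggregate(readings, watts, p_switch=0.05, sigma=10.0):
    """Most likely on/off state of every device at every time step."""
    # Joint state space: one on/off bit per device, 2**n combinations.
    states = list(itertools.product([0, 1], repeat=len(watts)))

    def log_emit(y, s):
        # Aggregate reading = sum of active device loads + Gaussian noise.
        mean = sum(w * on for w, on in zip(watts, s))
        return -0.5 * ((y - mean) / sigma) ** 2

    def log_trans(a, b):
        # Each device flips independently with probability p_switch.
        flips = sum(p != q for p, q in zip(a, b))
        return (flips * math.log(p_switch)
                + (len(a) - flips) * math.log(1.0 - p_switch))

    score = {s: log_emit(readings[0], s) for s in states}  # uniform prior
    back = []
    for y in readings[1:]:
        new_score, ptr = {}, {}
        for b in states:
            best = max(states, key=lambda a: score[a] + log_trans(a, b))
            new_score[b] = score[best] + log_trans(best, b) + log_emit(y, b)
            ptr[b] = best
        score = new_score
        back.append(ptr)

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

For example, `viterbi_disaggregate([0, 2, 101, 99, 160, 158, 61, 0], watts=[100.0, 60.0])` recovers which of two hypothetical devices is on at each step. The inner loops touch every pair of joint states, so the cost per time step grows as 4^n, which is why exact inference breaks down at the scale of a real home.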
