Simulation-Based LQR-Trees with Input and State Constraints. Reist, Tedrake. ICRA 2010

Seems like this paper presents a method to estimate the size of the LQR funnels via simulation, instead of sum-of-squares (SoS) optimization.

It is easier to implement

The algorithm is inspired by RRTs: "Randomized motion planning demonstrated that it can find feasible trajectories for nonlinear systems in nonconvex, high dimensional search problems."

They compare this to the work of Atkeson, which uses trajectories as a sparse representation of the global value function (this is the approach used by Tassa and Erez)

There, trajectories are added based on how well two adjacent trajectories agree on the value function between them

There, a nearest-neighbor lookup selects which trajectory's controller to apply; here the lookup is time-indexed along the trajectory instead

The algorithm here isn’t interested in optimality or estimating the global value function; the goal is to produce a scalable algorithm that produces feasible policies

Says that Atkeson's approach doesn't scale well: on cart-pole it uses trees with anywhere from 20,000 to 200,000 nodes, equivalent to roughly 12 to 21 discretizations per dimension of the 4-D state space, so the representation isn't much sparser than a dense grid
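Quick arithmetic check on that nodes-per-dimension equivalence (assuming the 4-D cart-pole state space; the helper name is mine):

```python
# Sanity check: how many grid points per dimension match the reported
# tree sizes on a 4-D cart-pole state space?

def nodes_for_grid(points_per_dim, dims=4):
    """Total nodes in a uniform grid at the given per-dimension resolution."""
    return points_per_dim ** dims

print(nodes_for_grid(12))  # 12^4 = 20,736  ~ the 20,000-node tree
print(nodes_for_grid(21))  # 21^4 = 194,481 ~ the 200,000-node tree
```

So the quoted tree sizes really do bracket a 12-to-21-points-per-dimension grid.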

This method works in discrete time, whereas the original LQR-Trees formulation is in continuous time

They use time-invariant LQR (TILQR) for planning; as far as I can tell this is just regular LQR about a fixed point, i.e. a single constant gain, rather than the time-varying gains you would get along a trajectory
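A minimal sketch of what discrete-time TILQR would compute under that reading: iterate the Riccati recursion to a steady-state cost-to-go P and use one constant gain K everywhere. The double-integrator A, B below are illustrative choices of mine, not from the paper.

```python
import numpy as np

def tilqr(A, B, Q, R, iters=500):
    """Steady-state discrete-time LQR: iterate the Riccati recursion
    until P converges, then return the constant gain K."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)   # K = (R + B'PB)^-1 B'PA
        P = Q + A.T @ P @ (A - B @ K)               # Riccati update
    return K, P

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])  # discrete double integrator
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[1.0]])
K, P = tilqr(A, B, Q, R)
# closed loop x_{t+1} = (A - B K) x_t should be stable
```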

The regions are constructed by falsification

Seems like what they do is apply the nominal trajectory's policy from sampled states farther and farther away, and check when those simulated trajectories no longer end up in the next funnel; they also run a second check at each state along the trajectory (presumably against the input and state constraints)

Funnel estimates only shrink and don't expand: they start out covering the entire space and then shrink as failing simulations are found
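A toy sketch of that shrink-by-falsification idea, with my own illustrative dynamics and shrink rule (not the paper's actual algorithm): the funnel is a hyper-ellipsoid {x : (x - x0)^T S (x - x0) <= rho}, and whenever a rollout from a sampled state inside it fails to reach the goal, rho is shrunk to just exclude that starting state.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_funnel(x, x0, S, rho):
    """Ellipsoidal funnel membership test: (x - x0)^T S (x - x0) <= rho."""
    d = x - x0
    return float(d @ S @ d) <= rho

def falsify(step, goal_check, x0, S, rho, n_samples=200, horizon=200):
    """Shrink rho whenever a state the funnel currently claims fails to
    reach the goal under closed-loop dynamics `step`; never grow it."""
    for _ in range(n_samples):
        x_init = x0 + rng.normal(scale=1.5, size=x0.shape)
        if not in_funnel(x_init, x0, S, rho):
            continue  # only test states inside the current funnel
        x = x_init
        for _ in range(horizon):
            x = step(x)
        if not goal_check(x):
            d = x_init - x0
            rho = min(rho, 0.99 * float(d @ S @ d))  # exclude the failure
    return rho

# Stand-in closed loop: dx/dt = -x + x^3 has basin of attraction (-1, 1),
# so the funnel around x0 = 0 should shrink to roughly rho <= 1.
step = lambda x: np.clip(x + 0.05 * (-x + x ** 3), -10.0, 10.0)
goal_check = lambda x: float(np.abs(x[0])) < 0.1
x0, S, rho_init = np.zeros(1), np.eye(1), 4.0
rho_final = falsify(step, goal_check, x0, S, rho_init)
```

The funnel starts too optimistic (rho = 4) and the failing rollouts pull it down toward the true basin, which matches the shrink-only picture above.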

Empirical results on swing-up and cart-pole

Cart-pole took 32 hours to finish, although they run it until 5,000 trajectories complete successfully, which is a large number

Swing-up takes only half an hour, so cart-pole takes 64 times as long; going from the 2-D pendulum state space to the 4-D cart-pole, the curse of dimensionality really hurts.

They mention they are still using hyper-ellipsoids for coverage, which is probably not optimal
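One way to see why ellipsoidal coverage degrades with dimension (a side calculation of mine, not from the paper): the fraction of an axis-aligned box filled by its inscribed ellipsoid is the n-ball-to-n-cube volume ratio, which drops fast as n grows.

```python
import math

def ball_box_ratio(n):
    """Volume of the unit n-ball divided by its bounding cube [-1, 1]^n."""
    return math.pi ** (n / 2) / (math.gamma(n / 2 + 1) * 2 ** n)

for n in (2, 4, 6):
    print(n, ball_box_ratio(n))
# In 4-D (cart-pole's state dimension) the inscribed ellipsoid fills
# only about 31% of its bounding box, so ellipsoidal funnels are bound
# to leave gaps between them.
```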