Simulation-Based LQR-Trees with Input and State Constraints. Reist, Tedrake. ICRA 2010

  1. The paper presents a method to compute the size of the LQR funnels using a simulation-based approach rather than the SoS (sum of squares) optimization used in the original LQR-Trees work.
    1. This is easier to implement than SoS verification
  2. The algorithm is inspired by RRTs: “Randomized motion planning demonstrated that it can find feasible trajectories for nonlinear systems in nonconvex, high dimensional search problems”
  3. They compare this to the work of Atkeson, which uses trajectories as a sparse representation of the global value function (the approach also used by Tassa and Erez)
    1. There, trajectories are added based on how well two adjacent trajectories agree on the value function between them
    2. There, nearest-neighbor lookup selects which trajectory to follow; here, lookup is time-dependent along the trajectory
  4. The algorithm here isn’t interested in optimality or estimating the global value function; the goal is to produce a scalable algorithm that produces feasible policies
  5. Says that Atkeson’s approach doesn’t scale well: for cart-pole it uses trees with anywhere from 20,000 to 200,000 nodes, equivalent to 12 to 21 discretizations per dimension of the 4-dimensional state space, so the trajectory representation is not much sparser than a grid
  6. This method works in discrete time, whereas the original LQR-Trees formulation works in continuous time
  7. They use Time-Invariant LQR (TILQR) for planning; the difference from regular (finite-horizon) LQR is that TILQR linearizes about a fixed point and solves the algebraic Riccati equation once, giving a single constant gain rather than time-varying gains along a trajectory
  8. The regions are constructed by falsification
  9. Concretely, they apply the nominal policy from initial states progressively further from the trajectory and observe when the simulated trajectories no longer end up in the next funnel; they also run a separate containment check at each state along the trajectory
  10. Funnels only shrink and don’t expand: they start out covering the entire space and shrink down as failures are found
  11. Empirical results on swing-up and cart-pole
  12. Cart-pole took 32 hours to run, though the stopping criterion is 5000 trajectories finishing successfully, which is a large number
    1. Swing-up takes only half an hour, so cart-pole takes 64 times as long, which shows how badly the curse of dimensionality can hurt when you do learning.
  13. They mention they are still using hyper-ellipsoids for coverage, which is probably not the optimal shape
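On note 7 above: TILQR can be sketched concretely. This is not code from the paper; the double integrator, the CARE solver via the Hamiltonian's stable eigenspace, and all names are illustrative assumptions (in practice `scipy.linalg.solve_continuous_are` does the same job):

```python
import numpy as np

def tilqr(A, B, Q, R):
    """Time-invariant LQR: solve the continuous-time algebraic Riccati
    equation via the stable eigenspace of the Hamiltonian matrix and
    return the constant gain K and quadratic cost matrix S."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    H = np.block([[A, -B @ Rinv @ B.T],
                  [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stable = V[:, w.real < 0]               # n stable eigenvectors
    X1, X2 = stable[:n, :], stable[n:, :]
    S = np.real(X2 @ np.linalg.inv(X1))     # solves the CARE
    K = Rinv @ B.T @ S                      # feedback law u = -K x
    return K, S

# Double integrator: x = [position, velocity], u = acceleration.
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
K, S = tilqr(A, B, np.eye(2), np.array([[1.]]))
print(K)   # -> [[1.  1.7320508...]], the known closed-form gain [1, sqrt(3)]
```

Because the linearization point is fixed, the gain `K` is computed once and held constant, whereas time-varying LQR along a trajectory re-linearizes at every knot point and stores a gain per time step.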
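The shrink-only falsification loop of notes 8–10 can also be sketched on a toy problem. Everything here is an assumption for illustration, not the paper's setup: a double integrator with a saturated LQR controller stands in for cart-pole, the funnel is the hyper-ellipsoid {x : xᵀSx ≤ ρ}, and ρ starts large and only shrinks when a sampled simulation fails to reach the goal in time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's closed-loop simulations: a double
# integrator stabilized by a saturated LQR controller.  The system,
# the gain K, the metric S, and the input limit are all assumptions.
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
K = np.array([[1., np.sqrt(3.)]])                      # constant LQR gain
S = np.array([[np.sqrt(3.), 1.], [1., np.sqrt(3.)]])   # funnel metric
U_MAX = 0.5                                            # input constraint

def simulate(x0, dt=0.01, T=20.0):
    """Forward-simulate the saturated closed loop from x0 (Euler)."""
    x = np.array(x0, dtype=float)
    for _ in range(int(T / dt)):
        u = np.clip(-K @ x, -U_MAX, U_MAX)
        x = x + dt * (A @ x + B @ u)
    return x

def shrink_funnel(rho0=100.0, n_samples=100, goal_tol=1e-2):
    """Falsification loop: sample states inside the candidate funnel
    {x : x^T S x <= rho}, simulate, and shrink rho past any sample
    that fails to reach the goal in time.  rho only ever shrinks."""
    rho = rho0
    for _ in range(n_samples):
        d = rng.standard_normal(2)
        d /= np.sqrt(d @ S @ d)                # point with d^T S d = 1
        x0 = d * np.sqrt(rho * rng.uniform())  # state inside the funnel
        if np.linalg.norm(simulate(x0)) > goal_tol:
            rho = 0.99 * (x0 @ S @ x0)         # counterexample: shrink
    return rho

print("final rho:", shrink_funnel())
```

The key property from note 10 is visible in the loop: `rho` is never increased, so the funnel estimate is monotonically tightened by counterexamples rather than grown by verification.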
