Category Archives: Continuous state spaces

Cross-Entropy open loop planning applied to noisy humanoid walking

In this experiment, both the domain and the generative model are noisy.  Noise is introduced by applying uniformly distributed noise to one randomly selected joint at each time step.  The noise is not “well-behaved,” as it is neither Gaussian nor zero-mean.  I believe the results look better than those in the previous post, where the domain was noisy but the generative model was not.
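
A minimal sketch of the approach, assuming a black-box `simulate(actions, rng)` that runs one noisy rollout and returns a scalar cost (the function names and hyperparameters here are mine, not the experiment's): averaging several noisy rollouts per candidate sequence keeps elite selection from being dominated by lucky noise draws.

```python
import numpy as np

def rollout_cost(actions, simulate, n_eval=4, rng=None):
    """Average cost of an open-loop action sequence under a noisy model."""
    rng = rng or np.random.default_rng()
    return np.mean([simulate(actions, rng) for _ in range(n_eval)])

def cross_entropy_plan(simulate, horizon, act_dim, iters=20,
                       n_samples=64, n_elite=8, seed=0):
    """Open-loop CE planning: fit a Gaussian over flattened action sequences."""
    rng = np.random.default_rng(seed)
    dim = horizon * act_dim
    mean, std = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian
        samples = mean + std * rng.standard_normal((n_samples, dim))
        costs = np.array([rollout_cost(s, simulate, rng=rng) for s in samples])
        # Refit the Gaussian to the lowest-cost (elite) samples
        elites = samples[np.argsort(costs)[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean.reshape(horizon, act_dim)
```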


Sufficient Conditions for the Existence of Resolution Complete Planning Algorithms. Yershov, LaValle. Proceedings Workshop on Algorithmic Foundations of Robotics (WAFR), 2010

  1. Kobilarov cited this paper saying the following: “Conditions for resolution completeness of planning with such primitives [parameterizations of open-loop control with actions and corresponding time durations] have been established.”
  2. Establishes metric spaces over the control and trajectory spaces
  3. Based on these metrics, if the domain is Lipschitz, then the mapping between open-loop controls and trajectories is continuous
  4. Because of this, it is possible to search the space of actions, as opposed to building a reachability graph.
  5. The algorithm has the guarantee common to most motion planning algorithms: if a solution exists, it will eventually be found
  6. Not reading this carefully, because I think I already got what I need out of this paper.  Most of it is proofs, which I don’t really care about – just what is proved.
  7. Requires continuity, which goes with Lipschitz smoothness
  8. Algorithm is very simple
  9. The algorithm deals with noise effectively, as it includes a term accounting for error arising from numerical integration
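
As a toy illustration of searching the space of open-loop controls directly rather than building a reachability graph (this is only the flavor of the idea, not the paper's algorithm): enumerate piecewise-constant action sequences at increasing depth, so any goal-reaching sequence at some resolution is eventually found.

```python
from itertools import product

def resolution_complete_search(step, start, goal_test, actions, max_depth=6):
    """Enumerate open-loop action sequences at increasing depth (resolution).

    `step` integrates one action for a fixed duration; all names here are
    illustrative, not the paper's notation.
    """
    for depth in range(1, max_depth + 1):
        for seq in product(actions, repeat=depth):
            state = start
            for a in seq:
                state = step(state, a)
            if goal_test(state):
                return list(seq)
    return None  # no solution found at this resolution
```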

Anytime Computation of Time-Optimal Off-Road Vehicle Maneuvers using the RRT*. Jeon, Karaman, Frazzoli. IEEE Conference on Decision and Control (CDC).

  1. An empirical application of RRT*; this paper finds trail-braking as something that naturally emerges from RRT*
  2. I was excited because I thought this application would use stochasticity, but looks like it is a deterministic simulation.
    1. They give the dynamics equations
  3. There have been methods that applied numerical optimization from optimal control theory to this domain, but it requires “careful transcription to numerical optimization formulations.  Most algorithms of this class must be started with a feasible initial solution, which is usually hard to generate in the first place.”
    1. Also some suffer from local optimality (may be same as requiring initial solution)
    2. Cites applications of optimal control to the domain, as well as a paper that compares other methods
  4. In this application, they plan in the “task space”, a lower-dimensional representation of the original problem that doesn’t throw out information
  5. Plan in state space, and then require a method that will near-optimally, approximately steer between two states
    1. The steering method they use is based on shooting and requires another function that subdivides the input and increments or decrements it as needed
    2. That is a pretty difficult problem in itself
    3. Reminds me of binary action search
  6. Uses branch and bound/admissible heuristics
  7. Approach requires a distance metric, as all RRT methods do
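
The shooting-style steering in point 5 can be caricatured in one dimension: bisect a scalar control parameter until the shot trajectory ends near the target. This sketch assumes `shoot(u)` is monotone in `u` over the bracket, which is my simplification, not the paper's method.

```python
def steer_by_bisection(shoot, target, lo, hi, tol=1e-6, max_iter=100):
    """Shooting-based 'steer': bisect a scalar control parameter until the
    resulting final state is within tol of the target."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        err = shoot(mid) - target
        if abs(err) < tol:
            return mid
        if err < 0:
            lo = mid  # undershoot: increase the control
        else:
            hi = mid  # overshoot: decrease the control
    return 0.5 * (lo + hi)
```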

Adaptive Path Planning in a Dynamic Environment using a Receding Horizon Probabilistic Roadmap: Experimental Demonstration

  1. If dynamics and constraints are nonlinear, there is no closed-form analytical solution
  2. Gradient and potential field methods can be useful in nonlinear domains, with restricted observability, but are strongly influenced by initial search location, as nonlinear constraints can introduce many maxima/minima
    1. Gradient methods may require special differentiation because of non-smooth constraints
  3. The main idea of the paper is to use PRMs to build a global connectivity graph, which may need to have edges removed as the agent only senses obstacles nearby, so replanning is still needed as locations that were thought to be free turn out to be blocked
    1. Planning is done purely in state space, then motion primitives stitch the states together.
  4. They plan with a quadratic cost function
  5. They use a helicopter domain and provide simple dynamics equations for it
  6. The motion primitives used for planning ensure the helicopter does not enter a bad state
  7. They also tested the approach on an actual helicopter
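
The replanning loop in point 3 can be sketched roughly as follows (all names are illustrative): keep a roadmap as an adjacency dict, delete edges that newly sensed obstacles invalidate, and re-query the same graph.

```python
import heapq

def shortest_path(graph, source, goal):
    """Dijkstra over an adjacency dict {node: {neighbor: cost}}."""
    dist, prev = {source: 0.0}, {}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float('inf')):
            continue  # stale queue entry
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float('inf')):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    if goal not in dist:
        return None
    path, node = [goal], goal
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1]

def replan(graph, source, goal, blocked_edges):
    """Drop edges invalidated by newly sensed obstacles, then re-query."""
    for u, v in blocked_edges:
        graph[u].pop(v, None)
        graph[v].pop(u, None)
    return shortest_path(graph, source, goal)
```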

Cross Entropy optimization applied to humanoid locomotion in a noisy domain with a deterministic model.

In this experiment, the true domain has a large amount of noise; between 10 and 50 newtons of force is randomly applied to one randomly selected joint at each time step.  During the rollouts there is no such noise, so the planner must deal with unpredictable perturbations of the domain.

Cross-Entropy Motion Planning. Kobilarov. International Journal of Robotics Research. 2012.

  1. Not the same research as presented in
  2. This approach is for the noiseless setting
  3. The previous paper deals with using cross-entropy directly, while this paper discusses a combination of cross-entropy and RRT*
  4. The additional RRT* component is intended to help deal with the more difficult regions of planning: “In particular, more complicated obstacles such as those associated with narrow passages significantly shrink the feasible regions in trajectory space and thus, in the absence of a good prior, can render the method intractable due to the large number of rejected samples.”
  5. Considers open-loop planning, although mentions there are two ways to parameterize trajectories:
    1. As a sequence of actions: “Conditions for resolution completeness of planning with such primitives [encoding of sequence of actions with time duration for each action] have been established (Yershov and LaValle 2011).”
    2. As a sequence of states
    3. The paper outlines methods for doing both, as either may be better depending on the domain
  6. “In general, the problem cannot be solved in closed form since both the dynamics and constraints can be non-linear.  Gradient-based optimization is not suitable unless a good starting guess is chosen since the constraints impose many local minima.  In addition, constraints corresponding to arbitrary obstacles can be non-smooth and require special differentiation (Clarke et al. 1998) to guarantee convergence.”
  7. In sampling based motion planning, a graph is produced that is a finite approximation to the infinite set of feasible trajectories, so the original problem can be solved approximately through the graph
  8. Cares about optimality in the limit.  PRMs approach optimality, but this occurs at an exponentially slow rate (defined how exactly?) due to the increasing number of edges and vertices, “which in higher dimensions becomes computationally intractable.”
    1. So this paper cares about RRTs and not PRMs
  9. RRT* has a problem of uniformly distributing samples, and most of these samples don’t help at all.  By using cross-entropy optimization on top, it attempts to allocate samples along the path where the optimal trajectory may lie
    1. CE optimizes the method of sampling vertices
    2. RRT* is further optimized to always attempt to connect newly added nodes to the goal
  10. “Even though general theoretical convergence of the CE method has been shown [citations] actual rates of convergence, sample complexity, or precise performance guarantees remain open problems.”
  11. The difference between RRT* and cross-ent is that one samples in state space, the other in parameterized trajectories (such as action sequences)
  12. CE was originally designed for rare-event estimation. “The rare event of interest in this work is finding a parameter Z with a real-valued cost J(Z) which happens to be very close to the cost of an optimal parameter Z*… the rare-event estimation is equivalent to the global optimization of J(Z)”
  13. Uses Gaussian mixture models as the basis for CE optimization, because fitting them is easy to do with EM
    1. Although it can only capture as many local regions as components in the mixture, each Gaussian component can be regarded as an approximation of a local second-order model of the objective function centered at each mean (by considering the covariance as the inverse Hessian of the cost – I would like a citation/further discussion of this)
  14. Discusses connection to CE and CMA as well as another algorithm called EDA.  The connection to CMA was already discussed before in the CMA-ES paper.  Says the two are virtually identical in the Gaussian case.
  15. “In case when very few elite samples are available a classical GMM EM algorithm will fail to estimate correct means and covariances.”  So they add additional noise, as is common in Cross-Ent
  16. Gives examples of a double integrator (actually in 2D, with obstacles), driving (Dubins?), and helicopter navigation
  17. In the helicopter domain, 85% of rollouts with regular CE are thrown out, so RRT* is added to make samples more useful
  18. In TCE-RRT* (an SCE variant is also proposed), correlations of states across entire trajectories are examined
  19. In the helicopter domain, shows the expected optimal trajectory as a result of 3 mixture components (I’m not sure what dimension, though).  The resulting path in x,y space is very complex.
  20. Discusses other distributions that may make a good tradeoff between efficiency and accuracy.
  21. In a higher-dimensional double-integrator domain with obstacles, the two algorithms proposed here, SCE-RRT* and TCE-RRT*, outperform RRT*.
  22. Uses a domain I haven’t heard of: a weird version of a plane that has Dubins steering in x,y and double-integrator dynamics in z.
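
A 1-D sketch of the GMM-based CE loop from point 13 (my own toy implementation, not Kobilarov's code): sample from the mixture, keep the elite fraction, and refit by EM; the small variance floor in the M-step stands in for the injected noise mentioned in point 15.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50, rng=None):
    """Tiny EM for a 1-D Gaussian mixture, fit to the elite samples."""
    rng = rng or np.random.default_rng(0)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, np.var(x) + 1e-6)
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability
        log_p = np.log(w) - 0.5 * ((x[:, None] - mu) ** 2 / var + np.log(var))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step, with floors so no component collapses entirely
        n = r.sum(axis=0) + 1e-9
        w = n / n.sum()
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n + 1e-4
    return w, mu, var

def gmm_ce_minimize(J, k=2, n=200, elite=40, iters=15, spread=5.0, seed=1):
    """CE minimization of J with a GMM sampling distribution (1-D sketch)."""
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)
    mu = rng.uniform(-spread, spread, size=k)
    var = np.full(k, spread ** 2)
    for _ in range(iters):
        # Sample from the current mixture, then refit to the elites
        comp = rng.choice(k, size=n, p=w)
        x = mu[comp] + np.sqrt(var[comp]) * rng.standard_normal(n)
        elites = x[np.argsort(J(x))[:elite]]
        w, mu, var = em_gmm_1d(elites, k=k, rng=rng)
    return w, mu, var
```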

Dynamic Walking 2010. Russ Tedrake. Feedback Motion Planning via Sums-of-Squares Verification. Talk


  1. Walking has a lot that it can take from motion planning.
  2. Discusses the limit cycle of a walker that only controls torque at the hip
    1. How do we make it robust?  It is fragile
    2. Its limit cycle exists in 2D
  3. How do we walk on rough terrain
  4. Why is this hard?
    1. Dynamic constraints
    2. Underactuated (can’t follow arbitrary trajectories)
    3. Limits on actuators limit torque and possible speed
  5. What works well?
    1. Trajectory optimization (handles constraints naturally)
    2. Local orbital stabilization (local, more difficult to deal with constraints)
    3. Global feasible control
    4. Control dealing with regions of attraction
  6. Talks about RRTs, example in inverted pendulum (dynamic constraints), and some optimizations to RRTs
  7. Then moves to little dog
    1. Plan in the space of “half bound” primitives, so not at the low-level
  8. These randomized methods can find solutions in these large dynamics domains
  9. Then attempt to locally stabilize the plans via linear feedback
  10. Need to know dynamics and the form of the equations for LQR Trees to work
  11. Have results for limit cycles with impact
  12. The magic isn’t connecting the random point to the goal, it’s in computing the region around that trajectory so that information can be re-used
    1. So, it doesn’t really conflict with other methods of planning, but provides a way to hold on to planning that was done previously
  13. Algorithm (roughly)
    1. Figure out the linearly controllable region around the goal
    2. Sample a random point in state space
    3. Attempt to find a trajectory that connects that point to some region that has already been solved for
    4. They then “stabilize” the trajectory with time varying linear feedback
    5. Compute the region around the trajectory that is also stable
  14. It probabilistically covers the space (any region that can reach the goal will be covered eventually)
  15. The implementation is very complicated and difficult to get right
  16. Examples of it running swingup on a real robot
  17. Also have it working on an actual compass gait robot
  18. Have it working for getting a plane to perch (solved with another method before, but this is more stable)
  19. He says he used to be against model-based approaches with walking and more was into learning, but the machinery available when you know the mechanics is so much more powerful that he prefers those methods now
    1. This can be used in concert with system identification to figure out the dynamics
  20. Works fine in domains with nonlinear dynamics
  21. Lets you build sparse trajectory libraries
  22. Says planning is the next interesting thing to talk about in that field, moving from limit cycles to optimization to planning (limit cycles don’t work in uneven terrain)
    1. “Hopefully you’re using RRTs in 5 years”
  23. Has done problems with up to 6 action dimensions/12 state dimensions
  24. Says the sum of squares component doesn’t scale well with high dimensionality
  25. Says cost is a polynomial of a polynomial, but with high constant factors
  26. Sees room for RL in deciding exactly how to move around once you fall into a funnel
  27. One of the questions at the end mentions work by Sontag
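
The "stabilize with time-varying linear feedback" step (13.4) is standard finite-horizon LQR. Here is a sketch of the backward Riccati recursion over a sequence of linearizations along a trajectory (the sums-of-squares region-of-attraction computation is far beyond a snippet like this).

```python
import numpy as np

def tvlqr(A_seq, B_seq, Q, R, Qf):
    """Finite-horizon discrete-time LQR gains via backward Riccati recursion,
    one (A_t, B_t) linearization pair per step along the trajectory."""
    P = Qf
    gains = []
    for A, B in zip(reversed(A_seq), reversed(B_seq)):
        # K_t = (R + B'PB)^-1 B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update for the cost-to-go matrix
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
        gains.append(K)
    return gains[::-1]  # return in forward-time order
```

Applying `u_t = -K_t x_t` with these gains stabilizes the linearized system about the trajectory.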

Stable Dynamic Walking over Rough Terrain.  Theory and Experiment.  Manchester, Mettin, Iida, Tedrake.  Robotics Research 2011.

  1. Seems to cover mostly the same material as the above talk
  2. Concerned with designing stabilization of non-periodic trajectories of underactuated systems
    1. One example is a biped walking on rough terrain
  3. The goal is to do so by computing a “transverse linearization about the desired motion: a linear impulsive system which locally represents dynamics about a target trajectory” which is then stabilized using receding horizon control
  4. The application domain here is a compass walker that only has hip actuation
  5. Claims existing approaches give you either stability and versatility or efficiency and naturalism, but not all four; this paper tries to give all
  6. For the domain they focus on, they make a provably stable controller
  7. Two classes of walkers:
    1. Discuss ZMP, which ensures the center of gravity is always within the foot/feet on ground.  This can be computed with standard tools, but produces conservative, inefficient, unnatural gaits
    2. Passive-dynamic walkers do not have full actuation and allow gravity and other aspects of the dynamics to play a large role in locomotion
  8. Dynamic walking on uneven terrain is a relatively undeveloped topic, although there are a few references, a number by the authors
  9. This paper also discusses Poincaré maps.  These came up before in an earlier paper, not sure which authors.
    1. The problem is that these maps are only defined for periodic limit cycles, which do not exist on uneven terrain
  10. The method of transverse linearization was also previously used for periodic systems, but here is extended to nonperiodic systems.
    1. The advantage is the representation is continuous, as opposed to Poincare maps which occur at a few fixed locations in the limit cycle
  11. Stabilizing only the transverse dynamics is useful in underactuated systems
  12. Model is a nonlinear system subject to instantaneous impacts
  13. Time intervals lie in periods where there is no impact, time advances a step when an impact occurs
  14. Assumptions made are pretty weak
  15. “The stabilization of the transverse linearization implies local exponential orbital stabilization of the original nonlinear system to the target motion.  A transverse linearization is a time-varying linear system representation of the dynamics close to the target motion.”  Stabilizing the transverse linearization implies stabilizing the original domain
  16. Computing the transverse linearization can be done analytically, there is a little extra that has to be added due to the contacts
  17. The MPC they are doing is LQR
  18. The rollouts are done to a fixed number of footsteps in the future
  19. Looks like they got this to run on an actual compass walker going down 2 (very short) stairs, although I’m not 100% sure

Sparse Roadmap Spanners. Dobson, Krontiris, Bekris. Workshop on the Algorithmic Foundations of Robotics (WAFR) 2012

  1. PRMs approach optimality at the limit, but of course this is not actually practical in terms of time or memory requirements
  2. If optimality is relaxed, near-optimal roadmaps can be made that are much more sparse
  3. The approach here asymptotically:
    1. Is complete
    2. Near optimal
    3. Probability of adding nodes converges to zero as time passes
  4. The implication is that finite size data structures can plan near optimally
  5. Empirical results back these claims up
  6. RRT isn’t focused on here, even though it can plan in dynamic domains, because the goal is preprocessing with PRMs
  7. Fixed-k PRM, where each point is connected to a fixed number k of nearest points, isn’t optimal, but if k is scaled logarithmically with the number of nodes, it is.  Delta-PRM, which connects all points in a delta-ball, is asymptotically optimal, I suppose because the number of points in the ball increases as the number of samples increases
    1. These methods just keep adding nodes and vertices at each iteration, and there is no way to know when to stop, so the graph simply grows and grows
  8. The sequential roadmap spanner algorithm produces smaller maps by utilizing graph spanners: subgraphs whose node-to-node traversal cost is allowed to be within some multiplicative factor of the cost in the full graph – therefore the graph remains small but its suboptimality is bounded
    1. In practice the graphs tend to be much smaller but don’t sacrifice much in terms of optimality
  9. This paper introduces the SPArse Roadmap Spanner algorithm.  It has the properties:
    1. Of being probabilistically complete
    2. Can connect any 2 points with a path of length t·c* + 4·delta, where t and delta are parameters to the algorithm and c* is the cost of the optimal path between the query points in Cfree
    3. Converges to a finite size roadmap
  10. It is said that, to the best of their knowledge, this is the first approach that is near optimal and produces compact results – I wonder how Brafman’s paper from ICAPS compares?
  11. The approach has a natural stopping criterion (not sure what it is yet, though)
  12. The alg builds 2 graphs in parallel, one sparse and one dense
    1. there are limits on edge lengths – the sparse graph has longer edges
  13. Uses visibility based navigation through the sparse graphs with guard and bridge nodes
  14. This method has a tiny fraction of the number of nodes delta-prm makes, and the path quality is very close
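
The spanner idea in points 8 and 9 can be illustrated with the classic greedy t-spanner construction (the generic idea roadmap spanners build on, not the SPARS algorithm itself): keep an edge only if the sparse graph can't already connect its endpoints within t times that edge's weight.

```python
import heapq

def spanner_distance(adj, s, t):
    """Dijkstra distance in the current sparse graph; inf if unreachable."""
    dist = {s: 0.0}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            return d
        if d > dist.get(u, float('inf')):
            continue
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist.get(t, float('inf'))

def greedy_spanner(nodes, edges, t=2.0):
    """Greedy t-spanner: process edges by increasing weight, keeping one
    only when the spanner's current path between its endpoints is more
    than t times longer than the edge itself."""
    adj = {v: {} for v in nodes}
    kept = []
    for w, u, v in sorted((w, u, v) for u, v, w in edges):
        if spanner_distance(adj, u, v) > t * w:
            adj[u][v] = w
            adj[v][u] = w
            kept.append((u, v, w))
    return kept
```

By construction, every node-to-node distance in the kept subgraph is within a factor t of the full-graph distance.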

Cross-entropy optimization applied to simulated humanoid stair descent.

In reinforcement learning, you get what you ask for.  The reward function here is equal to the velocity of the hip along the horizontal axis (as was the case in the walking video; details are the same except rollouts are only 30 steps long, so the vector being optimized is 210-dimensional).  As you can see here, the best way down the stairs in this example is to just jump down!

I’m not sure if this is good or not – I think it may make sense to try a taller flight of stairs and see what happens.  It’s also possible to set things up so that reward is accrued each time a step is touched in sequence, but that feels contrived.  I could penalize changes in velocity along the vertical axis to make the motion smoother.
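
The reward described above, with the optional vertical-smoothness penalty, would look something like this (the names and the penalty form are my guesses at a sensible shaping term, not the experiment's actual code):

```python
def reward(hip_vel_x, hip_vel_y, prev_hip_vel_y, smooth_penalty=0.0):
    """Forward hip velocity, optionally penalizing changes in vertical
    velocity to discourage jumping straight down."""
    return hip_vel_x - smooth_penalty * abs(hip_vel_y - prev_hip_vel_y)
```

With `smooth_penalty=0.0` this reduces to the reward used in the video; increasing it trades forward speed for smoother descent.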


Stochastic Complementarity for Local Control of Discontinuous Dynamics. Tassa, Todorov. RSS 2010

  1. Aside from floor contacts, joint limits also introduce discontinuities
  2. As in Tom Erez’s work, they introduce stochasticity to smooth the problem so methods that require smoothness can be used
  3. “The moral is that optimal control of discontinuous dynamics cannot give acceptable results with a deterministic model”
    1. Results from HOO/HOLOP show that you can get extremely good results with discontinuous models, as long as the value function is smooth near the maximum
    2. He may be discussing continuous time though, which is a different setting
  4. “The modeling and simulation of multibody systems with contacts and friction is a broad and active field of research.  One reasonable approach, not investigated here, is to model discontinuous phenomena with a continuous-time hybrid dynamical system which undergoes switching.  These methods require accurate resolution of collision and separation times, so fixed-time step integration with a fixed computational cost is impossible.”
  5. This paper also performs local dynamic programming
  6. Mentions Tedrakes LQR trees as a way of measuring the size of basins of attraction for these local methods
  7. Criticizes another paper that develops a solution to deterministic race car driving, saying it is bad that the approach comes too close to failure modes such as the limit of adhesion because it isn’t robust.  I disagree with that – if the problem is deterministic, you should take advantage of that.
    1. Limits of adhesion are used in that paper because it analyzes domains with discontinuities, and limits of adhesion are another example of that
    2. Unfortunately, the web page that describes the problem is gone so I can’t test against it
  8. In this paper they reuse rollouts by cutting off the beginning of the previous result and then just concatenating the last control on the end again
  9. Also use trajectory libraries here around the limit cycle
  10. The domain tested models a finger flicking a spinner – they come up with a limit cycle for the problem
  11. 40s of planning per step (after library and shifting of previous steps) on a 4-core i7
  12. The controller was robust to online human manipulation of the domain
  13. When the noise in the model was made too small, the method sometimes failed to produce a productive limit cycle, or the trajectory optimizer did not converge
  14. They say one thing they would like to do is to compute solutions over the full distributions of the system as opposed to sampled points, but that introduces other problems
    1. The distributions become multimodal on impact, so gaussian approximations don’t work any more
  15. Also mentions the problems related to the method such as inadequate exploration, the initial controller is passive (no force), so the only reason it made contact at all was because gravity brought the finger in contact with the spinner
  16. Ellipses allow the dynamics to be differentiable (with added noise), but some other shapes would prevent differentiability altogether
  17. The equations used to control the policy mix quantities of different units in a manner such that choosing different units for either distances or impulses would lead to different policies, which is not desirable
    1. “in many ways it is easier to write down a numerical method for rigid-body dynamics than it is to say exactly what the method is trying to compute”
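
The rollout-reuse trick in point 8 is essentially a one-liner (sketched here with an array of per-step controls): drop the first control of the previous plan and repeat the last one to warm-start the next optimization.

```python
import numpy as np

def shift_warm_start(u_prev):
    """Reuse the previous plan: cut off its first control and concatenate
    the last control on the end again, as described in the paper."""
    return np.vstack([u_prev[1:], u_prev[-1:]])
```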