Cross-entropy optimization applied to humanoid walking

  • Physics simulation by PyODE.
  • 10,000 rollouts per planning step, each 100 steps long (far fewer would certainly have been enough).
  • Reward is the velocity of the “hip” along the x-axis, toward the left. A large negative reward is given when any section other than a foot touches the ground, which also ends the episode.
    • In the interest of full disclosure, the algorithm did cause the dude to fall down later in the simulation, but I’m very confident that could have been prevented with more rollouts.
  • There are 7 joints being controlled (each rendered as a linear section), and the action is a torque applied at each joint.
  • I think the state can be most compactly represented in 22 dimensions (the great thing about the method is that it doesn’t matter!): an x, y position and velocity at the shoulder, plus an angle and angular velocity for each other part of the body, encoded relative to another joint in the skeleton.
  • Optimization is done over a very large Gaussian (700 dimensions). This method works well, but actually computing the covariance matrix is very expensive. The physics engine is also pretty heavyweight, and I had to do expensive operations to get it to work in this setting; the experiment took about a day and a half.
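
For readers who want to see the shape of the planner, here is a minimal sketch of cross-entropy optimization over a torque sequence with a full-covariance Gaussian. It is not the code used for the experiment. The names `cem_plan` and `toy_return`, the elite count, the number of iterations, and the toy dynamics are all assumptions made for illustration; `toy_return` is a hypothetical stand-in for a PyODE rollout and only mimics the reward structure (forward velocity, plus a large penalty on a terminal "fall"). The sketch also assumes the Gaussian covers one torque per joint per step, which would give 7 × 100 = 700 dimensions for the setup above.

```python
import numpy as np


def toy_return(torques, dt=0.1, fall_speed=3.0, fall_penalty=-100.0):
    """Hypothetical stand-in for a physics rollout (NOT PyODE).

    torques has shape (horizon, n_joints). The summed, clipped torque
    accelerates a single "hip" velocity, and the reward is that velocity.
    """
    velocity, total = 0.0, 0.0
    for u in torques:
        velocity += dt * float(np.clip(u, -1.0, 1.0).sum())
        if abs(velocity) > fall_speed:
            # Stand-in for "a non-foot section touched the ground":
            # large negative reward, and the rollout terminates.
            return total + fall_penalty
        total += velocity
    return total


def cem_plan(rollout_return, horizon, n_joints,
             n_rollouts=200, n_elite=20, n_iters=10, init_std=1.0, seed=0):
    """Cross-entropy search over a flattened torque sequence."""
    rng = np.random.default_rng(seed)
    dim = horizon * n_joints  # e.g. 100 steps * 7 joints = 700
    mean = np.zeros(dim)
    cov = (init_std ** 2) * np.eye(dim)
    for _ in range(n_iters):
        # Sample candidate torque sequences from the current Gaussian.
        samples = rng.multivariate_normal(mean, cov, size=n_rollouts)
        returns = np.array(
            [rollout_return(s.reshape(horizon, n_joints)) for s in samples]
        )
        # Keep the best-scoring rollouts and refit the Gaussian to them.
        elite = samples[np.argsort(returns)[-n_elite:]]
        mean = elite.mean(axis=0)
        # Full dim x dim covariance; the small ridge keeps it well conditioned.
        cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(dim)
    return mean.reshape(horizon, n_joints)


# Tiny problem so the sketch runs in well under a second.
plan = cem_plan(toy_return, horizon=5, n_joints=2)
print(toy_return(plan), toy_return(np.zeros((5, 2))))
```

At 700 dimensions the covariance matrix has 700 × 700 = 490,000 entries, and both refitting it and sampling from it at every iteration get costly, which is consistent with the expense noted above. A diagonal covariance would be much cheaper, at the price of ignoring correlations between torques.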

I think it would be interesting to try applying the method to walking on an uneven floor.
