Cross-entropy optimization applied to simulated humanoid stair descent.

In reinforcement learning, you get what you ask for.  The reward function here is the velocity of the hip along the horizontal axis (as in the walking video; the details are the same, except rollouts are only 30 steps long, so the vector being optimized is 210-dimensional).  As you can see here, the best way down the stairs in this example is to just jump down!

I’m not sure whether this is good or bad – it may make sense to try a taller flight of stairs and see what happens.  It’s also possible to set things up so that reward is accrued each time a step is touched in sequence, but that feels contrived.  I could penalize changes in velocity along the vertical axis to make the descent smoother?
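That last idea could be sketched as a shaped reward. Everything here is hypothetical (the function name, the penalty weight, and the choice of an L1 penalty on vertical-velocity differences are all assumptions, not something tried in the post):

```python
import numpy as np

def shaped_reward(hip_vx, hip_vy, smoothness_weight=0.1):
    """Hypothetical shaped reward: forward hip velocity summed over the
    rollout, minus a penalty on changes in vertical hip velocity between
    consecutive timesteps (discouraging the jump-down solution)."""
    forward = np.sum(hip_vx)
    jerk_penalty = np.sum(np.abs(np.diff(hip_vy)))  # large for abrupt drops
    return forward - smoothness_weight * jerk_penalty
```

A rollout that descends step by step would keep `hip_vy` nearly constant and pay almost no penalty, while a single jump concentrates a large vertical-velocity change into a few timesteps; the weight would need tuning so the penalty doesn't simply discourage moving at all.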

