Ex: An Effective Algorithm for Continuous Actions Reinforcement Learning Problems. Martin H., de Lope


  • The algorithm that won the 2009 RL competition in the helicopter domain (I believe)
  • Uses a weighted k-nn approach along with TD methods to estimate value across continuous actions (a rough sketch of the idea appears after this list)
    • Uses “probability traces” instead of eligibility traces, which could more accurately be called “weighted eligibility traces”
    • In general they use the word “probability” when “weights” seems more accurate
  • Doesn’t have any proofs, but is empirically effective; seems like a reasonable approach
  • The database of Q-values grows as more samples are taken, so there are more points to interpolate between (with weights) as experience accumulates
  • The experimental section would be stronger if it averaged results over a number of runs rather than reporting just one run of each experiment
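
To make the interpolation concrete, here is a minimal sketch of the general idea in Python: weighted k-nn over a growing (state, action) database, with TD updates spread across the neighbors via weighted (“probability”) eligibility traces. The class name, the Gaussian kernel, and all parameter values are my own illustration, not taken from the paper; the authors’ exact kernel and trace update may differ.

    import numpy as np

    class KNNQLearner:
        """Hypothetical sketch: weighted k-nn Q-estimation with a growing
        sample database and weighted ("probability") eligibility traces."""

        def __init__(self, k=5, alpha=0.1, gamma=0.99, lam=0.9, bandwidth=1.0):
            self.k, self.alpha, self.gamma, self.lam = k, alpha, gamma, lam
            self.bandwidth = bandwidth  # assumed Gaussian kernel width
            self.points = []  # database of (state, action) vectors; grows with experience
            self.q = []       # Q-value stored at each database point
            self.traces = []  # weighted eligibility trace per point

        def _neighbors(self, x):
            # k nearest stored points and their normalized kernel weights
            d = np.linalg.norm(np.asarray(self.points) - x, axis=1)
            idx = np.argsort(d)[:self.k]
            w = np.exp(-((d[idx] / self.bandwidth) ** 2))
            return idx, w / (w.sum() + 1e-12)

        def predict(self, state, action):
            # Q(s, a) is a weight-interpolated average over the k neighbors
            if len(self.points) < self.k:
                return 0.0
            idx, w = self._neighbors(np.concatenate([state, action]))
            return float(np.dot(w, np.asarray(self.q)[idx]))

        def update(self, state, action, reward, next_state, next_action):
            x = np.concatenate([state, action])
            delta = (reward
                     + self.gamma * self.predict(next_state, next_action)
                     - self.predict(state, action))  # SARSA-style TD error

            if len(self.points) >= self.k:
                idx, w = self._neighbors(x)
                tr = np.asarray(self.traces) * (self.gamma * self.lam)
                # Each neighbor becomes eligible in proportion to its k-nn
                # weight for the visited point -- the "weighted trace" idea
                tr[idx] += w
                self.q = (np.asarray(self.q) + self.alpha * delta * tr).tolist()
                self.traces = tr.tolist()

            # Growing database: store the new sample, seeded with the TD target
            self.points.append(x)
            self.q.append(reward + self.gamma * self.predict(next_state, next_action))
            self.traces.append(0.0)

In use, one would call update(s, a, r, s2, a2) on each transition; continuous actions can then be chosen by sampling a few candidate actions and taking the one with the highest predict(s, a). Again, this is only my reading of the general scheme, not the paper’s exact algorithm.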

One thought on “Ex: An Effective Algorithm for Continuous Actions Reinforcement Learning Problems. Martin H., de Lope”

  1. hut3 says:

    They also have a knn-td method that seems similar to this for (I’m guessing) discrete actions, reading right now.
