- The algorithm that won the 2009 RL competition in the helicopter domain (I believe)
- Uses a weighted k-NN approach along with TD methods to estimate value across continuous actions
- Uses “probability traces” instead of eligibility traces; these could more accurately be called “weighted eligibility traces”
- In general they use the word “probability” when “weights” seems more accurate
- Doesn’t have any proofs, but is empirically effective. Seems like a reasonable approach
- Database of Q-values grows as more samples are taken, so there are more points to interpolate between as experience accumulates (see the sketch below)
- Experimental section would be stronger if it averaged results over multiple runs rather than reporting a single run of each experiment
- They also have a kNN-TD method that seems similar to this but for (I’m guessing) discrete actions; reading it right now.
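
To make the mechanics concrete, here is a minimal sketch of the kind of weighted k-NN TD(λ) learner the notes describe: Q-values are interpolated from the k nearest stored points, traces are bumped by the neighbor weights (the “probability traces”), and the database grows when a query lands far from every stored point. The inverse-distance weighting, the growth threshold, and all names (`KNNTDSketch`, `grow_dist`, etc.) are my own assumptions, not the paper’s exact formulation:

```python
# Sketch of a weighted k-NN TD(lambda) learner. Weighting scheme, growth rule,
# and names are assumptions for illustration, not the paper's exact method.
import numpy as np


class KNNTDSketch:
    def __init__(self, k=4, alpha=0.2, gamma=0.95, lam=0.7, grow_dist=0.1):
        self.k, self.alpha, self.gamma, self.lam = k, alpha, gamma, lam
        self.grow_dist = grow_dist  # add a new point if no stored point is this close
        self.points = []            # stored (state, action) vectors
        self.q = []                 # Q-value per stored point
        self.e = []                 # weighted ("probability") trace per stored point

    def _neighbors(self, x):
        """Indices and normalized weights of the k nearest stored points."""
        pts = np.asarray(self.points)
        d = np.linalg.norm(pts - x, axis=1)
        idx = np.argsort(d)[: self.k]
        w = 1.0 / (d[idx] + 1e-6)   # inverse-distance weighting (assumption)
        return idx, w / w.sum()

    def value(self, state, action):
        """Weighted k-NN interpolation of Q over the stored database."""
        if not self.points:
            return 0.0, None, None
        x = np.concatenate([state, action])
        idx, w = self._neighbors(x)
        return float(np.dot(w, np.asarray(self.q)[idx])), idx, w

    def update(self, s, a, r, s_next, a_next, done):
        """One TD(lambda) step with weighted traces over the neighbor set."""
        x = np.concatenate([s, a])
        # Grow the database when the query is far from every stored point.
        if not self.points or np.min(
            np.linalg.norm(np.asarray(self.points) - x, axis=1)
        ) > self.grow_dist:
            self.points.append(x)
            self.q.append(0.0)
            self.e.append(0.0)

        q_sa, idx, w = self.value(s, a)
        q_next = 0.0 if done else self.value(s_next, a_next)[0]
        delta = r + self.gamma * q_next - q_sa

        e = np.asarray(self.e)
        e *= self.gamma * self.lam  # decay all traces
        e[idx] += w                 # bump neighbors by their interpolation weights
        self.q = list(np.asarray(self.q) + self.alpha * delta * e)
        self.e = list(e)


# Purely illustrative usage with random 2-D states and 1-D actions:
agent = KNNTDSketch()
s, a = np.zeros(2), np.zeros(1)
for _ in range(10):
    s_next, a_next, r = np.random.rand(2), np.random.rand(1), float(np.random.rand())
    agent.update(s, a, r, s_next, a_next, done=False)
    s, a = s_next, a_next
```

Since the neighbor weights are normalized to sum to one, the per-step trace increments also sum to one, which is presumably why the authors describe them as “probabilities” even though “weights” seems the more accurate term.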