While working on the action sequence search vs. policy parameter search, I noticed a few things:
- Action sequence planning with replanning seemed the most robust
- Action sequence planning without replanning seemed to be about as bad as random
- In the policy parameter search, using the parameter setting that produced the single best run (I call this greedy) seemed to work better than asking the HOO algorithm for its best parameterization (I call this HOO-greedy, obtained by greedily following the means down the tree)
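To make the two selection rules concrete, here is a minimal sketch. It assumes a binary HOO tree where each node stores the mean reward of the runs sampled in its region and a representative parameter value at its center; the `Node` class and function names are hypothetical, not the actual implementation:

```python
class Node:
    """Hypothetical HOO tree node for one region of parameter space."""
    def __init__(self, center, mean, children=None):
        self.center = center          # representative parameter value for this region
        self.mean = mean              # mean reward of runs sampled in this region
        self.children = children or []

def greedy(runs):
    """Greedy: pick the parameter setting from the single best observed run.
    `runs` is a list of (parameter, reward) pairs."""
    return max(runs, key=lambda r: r[1])[0]

def hoo_greedy(root):
    """HOO-greedy: follow the child with the higher mean reward down the tree
    until reaching a leaf, then return that leaf's parameter value."""
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.mean)
    return node.center
```

The intuition for why these can differ: greedy trusts one lucky rollout, while HOO-greedy averages over a region, so noise should hurt greedy more in principle.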
This left me with a number of questions, one of the bigger ones being what the impact of noise is on the greedy vs. HOO-greedy policies. I ran all four methods (action sequence and policy parameter search, with and without replanning) in the inverted pendulum domain with varying amounts of noise. The full action range in the domain is from -50 to 50 units, and across the experiments the actions were corrupted by uniformly distributed noise with magnitudes ranging from 0 up to +/- 4 units. The results are below:
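The noise model described above can be sketched as follows; the function name is hypothetical, but the action range and noise magnitudes come straight from the experiment description:

```python
import random

ACTION_MIN, ACTION_MAX = -50.0, 50.0   # full action range in the domain

def corrupt(action, noise_level):
    """Add uniform noise in [-noise_level, +noise_level] to an action,
    then clip back into the legal range. In these experiments noise_level
    ranged from 0 up to 4 units."""
    noisy = action + random.uniform(-noise_level, noise_level)
    return max(ACTION_MIN, min(ACTION_MAX, noisy))
```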
So the new basic takeaways in the inverted pendulum domain are:
- Action sequence search with replanning works close to optimal with either the greedy or HOO-greedy results, regardless of noise
- Parameter search with replanning works close to random performance either way. This is good because it separates the methods: in the double-integrator domain, the results for parameter search and action sequence search with replanning were on top of each other.
- Parameter search without replanning only seems to work when the greedy parameter settings are chosen, and only in domains with no noise at all. Even very small amounts of noise seem to cause trouble in this domain
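Since replanning is what separates the robust methods from the fragile ones above, here is a minimal sketch of the distinction. The `plan(state, horizon)` planner and `step(state, action)` environment are hypothetical stand-ins, not the actual code:

```python
def run_without_replanning(state, plan, step, horizon):
    """Plan once from the start state, then execute the whole action
    sequence open-loop. Noise-induced drift is never corrected."""
    actions = plan(state, horizon)
    for a in actions:
        state = step(state, a)
    return state

def run_with_replanning(state, plan, step, horizon):
    """Replan from the actual (possibly noise-perturbed) state at every
    step and execute only the first action of each new plan. This is
    why replanning absorbs action noise."""
    for _ in range(horizon):
        a = plan(state, horizon)[0]
        state = step(state, a)
    return state
```

As a toy illustration, with a 1-D state, `step = lambda s, a: s + a`, and a planner that always pushes toward zero, the open-loop run overshoots the target while the replanning run stops once it gets there.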
Aside from that, I’m not sure what else to say. Unfortunately, I can’t really think of anything concrete to pull from the experiments here and in the double integrator, so I think I’ll keep running experiments and see what else comes up.
Update: a similar graph rolled in for the double integrator domain. It’s a bit of a mess, but action sequence search with replanning using the HOO-suggested action still comes out on top. There are some gnarly error bars because I only ran a few trials of each, as they take a bit of time:
Another update, here are results of a similar experiment in the Ball-Beam domain:
Another update, here are results of a similar experiment in the Bicycle domain: