It was requested that I do a chart of the running times of the algorithms I’ve been working on/using. Since most are planning algorithms (UCT) and another is a batch-mode learner (R-max), I figured a reasonable way to test them was to see how long each took to run through a 200-step trajectory of an MDP. This was done on my 3.0 GHz Core 2 Quad with other stuff going on in the background. The results are below:
- ‘gen model’ means the algorithm has access to the generative model, and ‘building model’ means just that: the algorithm runs a model-based version that does not rely on a given model, only on SARS (state, action, reward, next-state) samples
- ‘Distr BoB’ is the version discussed in the previous post. This was run on a single quad-core machine, so it had one server process and three client processes doing the rollouts.
- R-max uses 10 chops (discretization intervals) per dimension
- All algorithms planned 50 steps into the future, and the model-based versions were allowed 2048 samples, which amounts to ~40 rollouts.
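For concreteness, the sample budget works out like this (constant names here are just illustrative):

```python
PLANNING_HORIZON = 50  # steps planned into the future
SAMPLE_BUDGET = 2048   # samples allowed per planning step (model-based versions)

# Each rollout consumes one sample per step of the horizon,
# so the budget buys roughly this many complete rollouts:
rollouts = SAMPLE_BUDGET // PLANNING_HORIZON
print(rollouts)  # 40
```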
Some things to point out:
UCT, HOOT and R-max all run on a single core.
R-max was tested along its first trajectory, when it is “learning” many new states and therefore replanning often. Once most or all of the states it will visit are learned, it is much faster; it must be under a second. This cost, of course, goes up as the resolution is increased or as dimensions are added.
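A minimal sketch of the bookkeeping behind that slow first trajectory, assuming a uniform grid with 10 chops per dimension and a simple visit-count “known” test (the threshold and all names here are illustrative, not from the actual code):

```python
import numpy as np

CHOPS = 10           # discretization intervals per dimension
KNOWN_THRESHOLD = 5  # visits before a state-action pair counts as "known" (illustrative)

def discretize(state, low, high, chops=CHOPS):
    """Map a continuous state onto a grid cell index tuple."""
    state, low, high = map(np.asarray, (state, low, high))
    cell = np.floor((state - low) / (high - low) * chops).astype(int)
    return tuple(np.clip(cell, 0, chops - 1))

visit_counts = {}

def observe(state, action, low, high):
    """Record a (state, action) visit; return True the moment it becomes
    "known", which is when R-max must replan."""
    key = (discretize(state, low, high), action)
    visit_counts[key] = visit_counts.get(key, 0) + 1
    return visit_counts[key] == KNOWN_THRESHOLD
```

Early on, nearly every step flips some cell to “known” and triggers a replan; later, almost none do, which is why the per-step cost collapses. More chops or more dimensions means more cells to learn, so the expensive phase lasts longer.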
Distributed BoB/MRE with model building was also tested along its first trajectory, when it will be fastest because the model is still very simple, built from only a small number of samples. Figure the time increases by about 50% once it has an accurate model built.
What else? HOOT is a dog, as we have known. I think the most interesting thing to note is that UCT, which is supposed to be a blazingly fast planner, is only as fast as the non-optimized BoB. The distributed BoB is three times as fast as UCT, and that was using just one machine. In my testing the time scales pretty much linearly with the number of machines, so distributed BoB can get quite fast. Why is there such a big difference between HOOT and BoB? HOOT has to hit a HOO at each step in the rollout, whereas in BoB it is queried just once at the beginning for instructions on how to behave for the rest of the rollout.
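The structural difference is easy to see in code. This is a sketch with counting stubs (none of these names are from the real HOOT/BoB implementations): HOOT descends its HOO tree at every step of a rollout, while BoB asks its bandit once per rollout and then just follows the answer.

```python
class CountingAdvisor:
    """Stub standing in for a HOO tree or BoB's bandit; counts queries."""
    def __init__(self):
        self.queries = 0

    def sample_action(self, state):  # HOOT-style: queried at every step
        self.queries += 1
        return 0

    def sample_policy(self):         # BoB-style: queried once per rollout
        self.queries += 1
        return lambda state: 0

def step(state, action):
    """Trivial stand-in for the generative model."""
    return state, 0.0

def hoot_rollout(state, hoo, depth=50):
    for _ in range(depth):
        action = hoo.sample_action(state)  # tree descent at EVERY step
        state, _ = step(state, action)

def bob_rollout(state, bandit, depth=50):
    policy = bandit.sample_policy()        # ONE query, then follow the policy
    for _ in range(depth):
        state, _ = step(state, policy(state))

hoo, bandit = CountingAdvisor(), CountingAdvisor()
hoot_rollout(0, hoo)
bob_rollout(0, bandit)
print(hoo.queries, bandit.queries)  # 50 1
```

With 50-step rollouts, HOOT pays for 50 tree queries where BoB pays for one, and each HOO query is itself a nontrivial tree descent, so the gap compounds.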
As a bonus, since I have it, here is a plot of the performance of BoB with a generative model in the double integrator domain as the number of rollouts changes (optimal is about -1.3). Note that this is the number of rollouts, whereas above we talked about the number of samples; the number of samples used was 50 times the value on the x-axis, since the rollouts are of length 50.