- There are already some papers on this topic; I'm familiar with one from the games group at Maastricht
- About parallelizing MCTS
- The extension here seems to be on more domains
- Say that the three main findings are: parallelization helps when there are time constraints; single-core ensembles perform better than vanilla UCT given a fixed amount of memory; and in no case does ensemble learning seem to produce poorer results
- Root parallelization, where all agents run entirely separately and combine results only at the root, is the simplest method and requires almost no synchronization
- In the method called “Ensemble UCT”, many small trees are searched on a single process, and the action taken is based on a weighted vote across those trees’ root statistics
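- The combination step for both methods can be sketched as merging per-action root statistics across independently built trees. This is a minimal sketch, not the paper's code; the `combine_roots` function and its `(visits, total_value)` representation are my own assumptions about how the aggregation might look:

```python
from collections import defaultdict

def combine_roots(root_stats, weights=None):
    """Combine per-action root statistics from independently built trees.

    root_stats: list of dicts, one per tree,
        mapping action -> (visit_count, total_value).
    weights: optional per-tree vote weights; uniform weights reduce to
        plain root parallelization, non-uniform weights give a weighted
        ensemble vote.
    Returns the action with the highest aggregate mean value.
    """
    if weights is None:
        weights = [1.0] * len(root_stats)
    visits = defaultdict(float)
    value = defaultdict(float)
    for w, stats in zip(weights, root_stats):
        for action, (n, v) in stats.items():
            visits[action] += w * n
            value[action] += w * v
    return max(visits, key=lambda a: value[a] / visits[a])
```

  Note that only root-level statistics cross tree boundaries, which is why the scheme needs almost no synchronization when the trees are built on separate processes.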
- Say this is essentially equivalent to applying bagging to RL. Seems correct.
- When the predictor has high variance, bagging can significantly help results
- UCT can also have high variance; I’ve read this is true of many rollout-based algorithms
- Root parallelization consistently seemed to be about the best method
- Mention bias/variance tradeoff
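- The relevant identity here is MSE = bias² + variance, which is what makes variance reduction via ensembling pay off. A quick numerical check on a toy biased estimator (my own example, not from the paper):

```python
import random
import statistics

# Assumed toy setup: true value 1.0, estimator biased by +0.5
# with Gaussian noise of standard deviation 1.0.
rng = random.Random(42)
true_value = 1.0
estimates = [true_value + 0.5 + rng.gauss(0.0, 1.0) for _ in range(100_000)]

mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
bias = statistics.fmean(estimates) - true_value
var = statistics.pvariance(estimates)
# mse equals bias**2 + var exactly on the sample (up to float rounding):
# averaging more independent estimates shrinks var but leaves bias alone.
```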
- Mention that UCT takes too long to converge to accurate values in highly stochastic domains, so they used a modified version of it for some of the domains
- In some cases the number of trajectories run was pretty small (as low as 128 in one domain)
- I don’t like their naming conventions
- Shows some cases where root parallelization doesn’t help when the total number of trajectories is held constant