Ensemble Monte-Carlo Planning: An Empirical Study. Alan Fern, Paul Lewis.

There are already some papers on this topic; I am familiar with one that I believe came from the games group at Maastricht.

About parallelizing MCTS

The extension here seems to be evaluation on more domains.

Say that the three main findings are: parallelization helps when there are time constraints; given a fixed amount of memory, single-core ensembles perform better than vanilla UCT; and in no case does the ensemble approach produce poorer results.

Root parallelization, where all agents run entirely separately and combine results only at the root, is the simplest method and requires almost no synchronization.
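The root-combination idea can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `search_worker` is a hypothetical stand-in for one independent UCT search, and all names and the placeholder rollout returns are my own.

```python
import random

def search_worker(n_trajectories, seed, actions):
    """Hypothetical stand-in for one independent search process:
    returns per-action visit counts and total returns at the root."""
    rng = random.Random(seed)
    stats = {a: [0, 0.0] for a in actions}  # action -> [visits, total_return]
    for _ in range(n_trajectories):
        a = rng.choice(actions)      # a real search would select via UCB
        stats[a][0] += 1
        stats[a][1] += rng.random()  # placeholder rollout return
    return stats

def root_parallel(n_workers, n_trajectories, actions):
    """Run workers entirely separately; combine only root statistics."""
    combined = {a: [0, 0.0] for a in actions}
    for w in range(n_workers):
        for a, (v, r) in search_worker(n_trajectories, w, actions).items():
            combined[a][0] += v
            combined[a][1] += r
    # choose the action with the highest aggregated mean return
    return max(actions, key=lambda a: combined[a][1] / max(combined[a][0], 1))
```

The appeal is that the only shared state is the final per-action tally, which is why essentially no synchronization is needed.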

In the method called “Ensemble UCT”, many small trees are searched on a single process, and the action taken is based on a weighted vote across those trees.
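The weighted vote itself is simple to sketch; the function name and the choice of weights (e.g. trajectories per tree) are my assumptions, not the paper's:

```python
from collections import defaultdict

def weighted_vote(recommendations, weights):
    """recommendations: each small tree's best action at the root;
    weights: that tree's vote weight (hypothetically, its trajectory count)."""
    tally = defaultdict(float)
    for action, w in zip(recommendations, weights):
        tally[action] += w
    return max(tally, key=tally.get)

# e.g. three small trees vote; the heavier weight wins
weighted_vote(["left", "right", "left"], [1.0, 5.0, 2.0])  # -> "right"
```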

Say this is essentially equivalent to applying bagging to RL. Seems correct.

When the predictor has high variance, bagging can significantly help results
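The variance-reduction effect is easy to check numerically: averaging k independent high-variance estimates cuts the variance by roughly 1/k. A toy demonstration (the estimator here is a made-up Gaussian, not anything from the paper):

```python
import random
import statistics

def noisy_estimate(rng):
    """Hypothetical high-variance estimator of a true value of 0.5."""
    return 0.5 + rng.gauss(0, 1.0)

def bagged_estimate(rng, k):
    """Average k independent estimates, as bagging does."""
    return sum(noisy_estimate(rng) for _ in range(k)) / k

rng = random.Random(0)
single = [noisy_estimate(rng) for _ in range(2000)]
bagged = [bagged_estimate(rng, 16) for _ in range(2000)]
# variance of the bagged estimator is roughly 1/16 of the single one
```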

UCT can also have high variance; I’ve read this is true of many rollout-based algorithms.

Root parallelization consistently seemed to be about the best method.

Mention bias/variance tradeoff

Mention that UCT takes too long to converge to accurate values in highly stochastic domains, so they used a modified version of it for some of the domains.

In some cases, the number of trajectories run was pretty small (only 128 in one domain).

I don’t like their naming conventions

Shows some cases where root parallelization doesn’t help when the total number of trajectories is held constant.
