Ensemble Monte-Carlo Planning: An Empirical Study. Alan Fern, Paul Lewis.

  1. There are already some papers on this topic; I am familiar with one from the games group at Maastricht
  2. About parallelizing MCTS
  3. The extension here seems to be on more domains
  4. Say the 3 main findings are that parallelization helps when there are time constraints, and that single-core ensembles perform better than vanilla UCT given a fixed amount of memory.  It seems that in no case does ensemble learning produce poorer results
  5. Root parallelization where all agents run entirely separately and only combine results for the root is the simplest method and requires almost no synchronization
  6. In the method called “Ensemble UCT”, many small trees are searched out on one process, and then the action taken is based on a weighted vote of those trees
  7. Say this is essentially equivalent to applying bagging to RL.  Seems correct.
    1. When the predictor has high variance, bagging can significantly help results
    2. UCT can also have high variance, I’ve read this is true of many rollout based algorithms
  8. Root parallelization always seemed to be about the best
  9. Mention bias/variance tradeoff
  10. Mention UCT takes too long to converge to values in domains that are highly stochastic, so they used a modified version of it for some of the domains
  11. In some cases, the number of trajectories run was pretty small (one domain 128)
  12. I don’t like their naming conventions
  13. Shows some cases where root parallelization doesn’t help when the total sum of trajectories is held constant
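
To make the "combine results only at the root" idea from the notes above concrete, here is a minimal sketch of one way to merge root statistics from several independently searched trees. The function name `ensemble_vote`, the input layout, and the visit-weighted average are my own assumptions for illustration; the notes only say "weighted vote", and the paper's exact weighting may differ.

```python
from collections import defaultdict


def ensemble_vote(root_stats):
    """Pick an action from per-tree root statistics (hypothetical helper).

    root_stats: list of dicts, one per independently searched tree,
    mapping action -> (visit_count, mean_return) at that tree's root.
    Returns the chosen action and the combined value estimates.
    """
    total_visits = defaultdict(int)
    weighted_sum = defaultdict(float)
    for stats in root_stats:
        for action, (visits, mean_return) in stats.items():
            total_visits[action] += visits
            weighted_sum[action] += visits * mean_return

    # Assumption: weight each tree's estimate by its root visit count.
    averaged = {
        action: weighted_sum[action] / n
        for action, n in total_visits.items()
        if n > 0  # ignore actions that no tree ever tried
    }
    best = max(averaged, key=averaged.get)
    return best, averaged
```

Because the trees never communicate until this final step, the same merge works whether they were built on separate cores (root parallelization) or one after another in a single process (the ensemble setting). Averaging over trees is also where the bagging-style variance reduction would come from.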
