The basic takeaway of this paper is that even limited rollouts can significantly improve the performance of poor policies.

- For stochastic domains, Monte-Carlo estimation is necessary (so the implication is that the policy itself is deterministic)
- Discusses root parallelization and pruning heuristics
- On average, rollouts in backgammon need to be around 30 steps long to play to completion
- There are generally about 20 legal moves to consider at each step. The values of the candidate initial actions typically differ by only about 0.01, while scores range from 1 to 3 (or the negatives) for a win, gammon, and backgammon, respectively
- Based on this, pure Monte-Carlo sampling would require hundreds of thousands of rollouts
- With pruning, roughly a million decisions have to be made to come to a result; for comparison, typical tournament-level human players take roughly 10 seconds
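
As a rough sanity check on the "hundreds of thousands" figure, here is a back-of-the-envelope sketch. It is my own arithmetic, not the paper's: the function name `rollouts_needed`, the per-rollout standard deviation `sigma = 1.0`, and the two-standard-error criterion `z = 2.0` are all assumptions chosen only to show the order of magnitude.

```python
import math

def rollouts_needed(delta, sigma, z=2.0):
    """Rollouts per action so that z standard errors of the mean fit within delta.

    The standard error of a Monte-Carlo mean is sigma / sqrt(n), so we need
    z * sigma / sqrt(n) <= delta, i.e. n >= (z * sigma / delta) ** 2.
    """
    return math.ceil((z * sigma / delta) ** 2)

# Assumed numbers: value gaps of 0.01 (from the notes above), and a
# per-rollout outcome spread of about 1 point (my assumption).
per_action = rollouts_needed(delta=0.01, sigma=1.0)
total = per_action * 20  # about 20 legal moves per step
print(per_action, total)
```

Under these assumptions each candidate move needs on the order of tens of thousands of rollouts, and 20 candidates push the total into the hundreds of thousands, which is why pruning clearly inferior moves early matters so much.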

- Adding rollouts takes linear policies from about -0.5 to essentially 0 (the opponent is the most basic configuration of TD-Gammon 2.1, with no lookahead)
- The next experiments do limited-length rollouts (7 or 11 steps) and then use an ANN as the “equity” (evaluation) function
- Points to a paper by Shannon from 1950 that discusses rollouts
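
The truncated-rollout idea can be sketched on a toy problem. This is a minimal illustration under my own assumptions, not the paper's implementation: the noisy chain domain, the deliberately poor base policy, and the linear `evaluate` function (standing in for the neural network equity estimate) are all hypothetical.

```python
import random

GOAL = 10  # chain of states 0..GOAL; 0 is a loss, GOAL is a win

def is_terminal(state):
    return state == 0 or state == GOAL

def step(state, action, rng):
    """Noisy move: the chosen direction (+1 or -1) succeeds 80% of the time."""
    return state + (action if rng.random() < 0.8 else -action)

def base_policy(state):
    """Deliberately poor base policy: always walks toward the losing end."""
    return -1

def evaluate(state):
    """Stand-in equity function mapping states linearly onto [-1, 1]."""
    return 2.0 * state / GOAL - 1.0

def rollout_value(state, action, horizon, rng):
    """Take `action`, follow the base policy for the rest of the horizon
    (or until the game ends), then score the final state with `evaluate`."""
    state = step(state, action, rng)
    for _ in range(horizon - 1):
        if is_terminal(state):
            break
        state = step(state, base_policy(state), rng)
    return evaluate(state)

def rollout_policy(state, actions, n_rollouts, horizon, rng):
    """Pick the action with the highest mean truncated-rollout value."""
    def mean_value(action):
        samples = [rollout_value(state, action, horizon, rng)
                   for _ in range(n_rollouts)]
        return sum(samples) / n_rollouts
    return max(actions, key=mean_value)

rng = random.Random(0)
best = rollout_policy(5, actions=[-1, 1], n_rollouts=500, horizon=7, rng=rng)
print(best)
```

Even though every step after the first follows the poor base policy, averaging the evaluations at the truncation point is enough to prefer the move toward the goal, which is the policy-improvement effect the notes describe. Setting `horizon` large enough to always reach a terminal state recovers full rollouts.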
