- Interested in the cases where UCT does well and Alpha-Beta doesn’t, or vice versa
- There is an optimal setting for exploration/exploitation in the search tree, where:
- Exploration means searching the tree wide
- Exploitation mean searching the tree deep
- UCT will outperform minimax if running on a small fixed number of available sample. Minimax will beat UCT if it has accessibility to a good evaluation metric and UCT doesn’t
- They say that given a good evaluation metric, the standard UCT method for propagating information up the search tree can be improved via a minimax style rule
- They talk about stopping the UCT rollout when it hits a new node, which I’ve not heard of before, but its in line with what we’ve been talking about with FS^3
- Minimax is good in games with many “trap states”, like chess (or mancala, used in this paper). Apparently, its still better than UCT in chess.
- From other reading when doing minimax search in chess, they can keep the number of expansions down to about 1.2 per level. Of course, 1 expansion is the minimum possible, so its quite close.
- They run experiments on Mancala – since its deterministic, they use randomized start states
- In their experiments, UCT is allowed to expand at most the number of nodes alpha-beta would expand, given the same situation
- Its funny their using alpha-beta since we know much better versions exist (AB-SSS*) and this is a planning conference
- They talk about running UCT with the evaluation function available instead of random play outs, but my understanding was thats what the top go algorithms use anyway
- The constant attached to the bias parameter has a pretty significant impact in the experiments here
- Means exploration exploitation has to be closely balanced
- Say that the conventional wisdom is that forward pruning is worse than doing wide searches, but the results in UCT (which prunes aggressively) contradicts this
- If running random rollouts from the leaves to evaluate, its better to only do a small number of them and spend more time on full UCT rollouts of the search tree
- This is because if the UCT tree itself is expanded, the accuracy of the random evaluations gets more accurate because they are closer to the endgame
- Talk about minimax backups vs averaging backups. When random rollouts are used to evaluate (go) because of lack of a good evaluation function, averaging is better. When there is a good evaluation function, minimax backups are better
- Also depends on the number of nodes expanded
- Remember – all these results are just on Mancala