Trade-Offs in Sampling-Based Adversarial Planning. Raghuram Ramanujan, Bart Selman.

  1. Interested in the cases where UCT does well and Alpha-Beta doesn’t, or vice versa
  2. There is an optimal setting for exploration/exploitation in the search tree, where:
    1. Exploration means searching the tree wide
    2. Exploitation mean searching the tree deep
  3. UCT will outperform minimax if running on a small fixed number of available sample.  Minimax will beat UCT if it has accessibility to a good evaluation metric and UCT doesn’t
  4. They say that given a good evaluation metric, the standard UCT method for propagating information up the search tree can be improved via a minimax style rule
  5. They talk about stopping the UCT rollout when it hits a new node, which I’ve not heard of  before, but its in line with what we’ve been talking about with FS^3
  6. Minimax is good in games with many “trap states”, like chess (or mancala, used in this paper).  Apparently, its still better than UCT in chess.
    1. From other reading when doing minimax search in chess, they can keep the number of expansions down to about 1.2 per level.  Of course, 1 expansion is the minimum possible, so its quite close.
  7. They run experiments on Mancala – since its deterministic, they use randomized start states
  8. In their experiments, UCT is allowed to expand at most the number of nodes alpha-beta would expand, given the same situation
  9. Its funny their using alpha-beta since we know much better versions exist (AB-SSS*) and this is a planning conference
  10. They talk about running UCT with the evaluation function available instead of random play outs, but my understanding was thats what the top go algorithms use anyway
  11. The constant attached to the  bias parameter has a pretty significant impact in the experiments here
    1. Means exploration exploitation has to be closely balanced
  12. Say that the conventional wisdom is that forward pruning is worse than doing wide searches, but the results in UCT (which prunes aggressively) contradicts this
  13. If running random rollouts from the leaves to evaluate, its better to only do a small number of them and spend more time on full UCT rollouts of the search tree
    1. This is because if the UCT tree itself is expanded, the accuracy of the random evaluations gets more accurate because they are closer to the endgame
  14. Talk about minimax backups vs averaging backups.  When random rollouts are used to evaluate (go) because of lack of a good evaluation function, averaging is better.  When there is a good evaluation function, minimax backups are better
    1. Also depends on the number of nodes expanded
  15. Remember – all these results are just on Mancala

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: