Bayesian Inference in Monte-Carlo Tree Search. Tesauro, Rajan, Segal. UAI 2010.

The method is inspired by distribution-free (frequentist) approaches like UCT, but uses Bayesian inference to estimate node values

The Bayesian approach lets algorithm designers leverage evaluation functions differently, for example as priors over node values

When its modeling assumptions hold, the approach is Bayes-optimal

Stochastic trial results at leaf nodes are combined with the prior to form posterior value estimates, which are then propagated up the tree recursively
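A minimal sketch of that leaf-level update, assuming Bernoulli (win/loss) trial outcomes and a conjugate Beta prior standing in for an evaluation-function estimate; the specific prior parameters are illustrative, not from the paper:

```python
# Hedged sketch: combine stochastic trial results at a leaf with a prior.
# A Beta(alpha, beta) prior over win probability is conjugate to Bernoulli
# trials, so the posterior update is just counting.

def leaf_posterior(alpha, beta, wins, losses):
    """Beta prior + Bernoulli trial results -> Beta posterior parameters."""
    return alpha + wins, beta + losses

def posterior_mean(alpha, beta):
    """Posterior estimate of the leaf's value (expected win probability)."""
    return alpha / (alpha + beta)

# Prior encodes, say, an evaluation function's ~0.6 win-rate estimate.
a, b = leaf_posterior(6.0, 4.0, wins=3, losses=1)
print(posterior_mean(a, b))  # posterior mean moves toward the observed 3/4
```

The posteriors produced this way at the leaves are what get combined by the min/max operators and propagated upwards.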

For MDPs, node distributions are combined via a distributional max operator (with a corresponding min operator for game trees)

Likewise, there are distribution-based upper-confidence bounds
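To make the contrast concrete, here is a hedged sketch comparing UCT's count-based bound with a distribution-based one that takes an upper quantile of each child's posterior; the Gaussian posterior shape and the z-score are illustrative assumptions, not the paper's exact formulas:

```python
import math

# UCT: bonus depends only on visit counts, not on value uncertainty.
def uct_bound(mean, n_child, n_parent, c=1.414):
    return mean + c * math.sqrt(math.log(n_parent) / n_child)

# Distribution-based bound: an upper quantile of the node's posterior
# (assumed Gaussian here for illustration).
def bayes_ucb(post_mean, post_std, z=2.0):
    return post_mean + z * post_std

# Child selection: the wide posterior (0.55, std 0.10) outranks the
# slightly higher but near-certain one (0.60, std 0.02).
children = [(0.55, 0.10), (0.60, 0.02)]  # (posterior mean, posterior std)
best = max(range(len(children)), key=lambda i: bayes_ucb(*children[i]))
print(best)
```

The point of the example: the Bayesian bound prioritizes genuinely uncertain children rather than merely under-visited ones.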

Bayesian estimates allow for faster convergence with limited data compared to UCT

A paper is cited arguing that average-based value estimates are (sample) inefficient and often underestimate the true value

Computation costs are within an order of magnitude of UCT

Because the approach is more sample-efficient, it is advantageous when sampling is expensive (such as in Go)

Bayesian MCTS is robust to the choice of sampling policy; UCT's convergence, by contrast, depends on focusing the majority of trials on the optimal path

Bayesian MCTS should therefore better leverage parallelization and off-policy sampling

Max/min values are based on extremum distributions. What's that?

An extremum distribution is the distribution of the maximum (or minimum) over a set of random variables

Min/max values are then computed from this set, either by numerical integration or by a Gaussian approximation
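A hedged sketch of both routes for two independent Gaussian child-value estimates: the mean of the max via numerical integration of its exact density, and via Clark's closed-form moment match (a standard Gaussian-approximation technique; the paper's exact formulas may differ):

```python
import math

def phi(x):   # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def mean_max_numeric(m1, s1, m2, s2, lo=-10.0, hi=10.0, n=20000):
    """E[max(X1, X2)] by integrating the exact density of the max:
    f(x) = f1(x)*F2(x) + f2(x)*F1(x) for independent X1, X2."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        f = (phi((x - m1) / s1) / s1 * Phi((x - m2) / s2)
             + phi((x - m2) / s2) / s2 * Phi((x - m1) / s1))
        total += x * f * dx
    return total

def mean_max_clark(m1, s1, m2, s2):
    """Closed-form first moment of the max of two independent Gaussians
    (Clark's formula), the basis of the cheap Gaussian approximation."""
    theta = math.sqrt(s1 * s1 + s2 * s2)
    a = (m1 - m2) / theta
    return m1 * Phi(a) + m2 * Phi(-a) + theta * phi(a)

print(mean_max_numeric(0, 1, 0.5, 1), mean_max_clark(0, 1, 0.5, 1))
```

For two independent Gaussians the two routes agree on the first moment; the approximation error only enters when the moment-matched Gaussian is fed recursively into further max operations up the tree.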

The modifications to UCT proposed are very simple

Gaussian approximations are attractive because they are cheap and closed-form. A simple calculation can also account for correlation among sibling leaf nodes

Looks like the worst-case scenarios that are possible under the Gaussian approximation don't come up in practice

The empirical domains are very simple, but reasonable
