Achieving Master Level Play in 9×9 Computer Go. Gelly, Silver

  • Covers Heuristic UCT which is UCT w/ prior information in terms of Q(), as well as UCT RAVE which allows for added generalization by reusing data in the search tree
  • These two algorithms are combined to build MoGo, which achieved master level play in 9×9 Go (professional level), and was the best 9×9 or 19×19 agent at the time of writing
  • Heuristic Minimax Search was effective in chess, checkers, and othello, an estimated Q() can be treated as a kind of heuristic function.  This approach was less effective in Go, as it seems that good heuristics are more difficult to encode in Go
    • Best Go agents use tree search
  • Heuristic used is based on linear fit of 1 million binary features trained by TD
  • Q() of the tree are initialized with the value given by the linear FA, this Q() has a weighting which controls how much that is used relative to returns of rollouts
  • Rave component reuses information gained high in the tree in descendant nodes for Q() values, this would be appropriate in inf-horizon rollouts, here it adds a bias but helps modulate the high variance (but unbiased) returns of rollouts.
    • This is similar to the history heuristic of α/β search

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: