- Covers Heuristic UCT which is UCT w/ prior information in terms of Q(), as well as UCT RAVE which allows for added generalization by reusing data in the search tree
- These two algorithms are combined to build MoGo, which achieved master level play in 9×9 Go (professional level), and was the best 9×9 or 19×19 agent at the time of writing
- Heuristic Minimax Search was effective in chess, checkers, and othello, an estimated Q() can be treated as a kind of heuristic function. This approach was less effective in Go, as it seems that good heuristics are more difficult to encode in Go
- Best Go agents use tree search
- Heuristic used is based on linear fit of 1 million binary features trained by TD
- Q() of the tree are initialized with the value given by the linear FA, this Q() has a weighting which controls how much that is used relative to returns of rollouts
- Rave component reuses information gained high in the tree in descendant nodes for Q() values, this would be appropriate in inf-horizon rollouts, here it adds a bias but helps modulate the high variance (but unbiased) returns of rollouts.
- This is similar to the history heuristic of α/β search