Seems to be a workshop paper (Workshop on Machine Learning and Data Mining in Games, ECML/PKDD 2011); not sure if it exists elsewhere.

- Algorithm is called TLS (Tree Learning Search), described as a “stochastic global optimization algorithm” integrated with the MCTS framework to circumvent the continuous action (and state) problem and automate discretization
- TLS learns a model of the function to be optimized (value/score) from samples generated by MCTS
- MCTS uses the model learned by TLS to sample function values in regions that seem important
- Conceptually the samples generated by MCTS form training examples to train a regression tree. This tree is then used to generate new samples that are maximally informative under certain assumptions
- In Data Stream Mining, the goal is to learn a model or extract patterns from a (fast) continuous stream of examples
- The Very Fast Decision Tree learner (VFDT) does incremental online tree building
- Uses the Hoeffding bound to bound the probability that the candidate split with the highest observed information gain is not actually the best of all possible splits
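The VFDT-style split rule can be sketched roughly as follows (a minimal sketch, not VFDT's exact implementation; `value_range` is the range R of the gain metric and `delta` the allowed error probability):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding epsilon: with probability at least 1 - delta, the true mean
    of a variable with range `value_range` lies within epsilon of the
    sample mean after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the observed gap between the best and second-best
    candidate split exceeds epsilon, so the best candidate is, with
    high probability, truly the best."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# e.g. information gain with binary labels has range 1 bit
print(should_split(0.30, 0.10, 1.0, 1e-6, 2000))
```

Note the rule only compares the top two candidates: if the runner-up is provably worse, every other candidate is too, which is what makes the incremental decision cheap.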

- For regression trees, variance is generally used instead of information gain
- The approach discussed in the paper makes a split whenever there is a test that is *probably* significant, which is quite a bit weaker (an F-test?)
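My reading of the variance-based criterion, as a hedged sketch (not the paper's exact procedure): score a candidate split by the size-weighted variance reduction, and test significance of the variance difference between the children with an F statistic. The critical value would come from an F table or `scipy.stats.f.ppf`; the sample data below is made up.

```python
from statistics import pvariance, variance

def variance_reduction(parent, left, right):
    """Regression-tree analogue of information gain: the size-weighted
    drop in variance from splitting `parent` into `left` and `right`."""
    n = len(parent)
    return pvariance(parent) - (len(left) / n * pvariance(left) +
                                len(right) / n * pvariance(right))

def f_statistic(left, right):
    """Ratio of the children's sample variances (larger over smaller);
    compared against an F-distribution critical value to decide whether
    the variances differ significantly."""
    v1, v2 = variance(left), variance(right)
    return max(v1, v2) / min(v1, v2)

# Hypothetical value-function samples from two halves of an action range
left = [0.10, 0.20, 0.15, 0.18, 0.12]
right = [0.90, 0.10, 0.50, 0.95, 0.05]
print(variance_reduction(left + right, left, right))
print(f_statistic(left, right))
```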
**Tree Learning Search / TLS**

- Each action node in the game tree searched by MCTS is replaced by an incrementally built regression tree that creates a data-driven discretization of the action space
- Instead of branching on discrete actions in the MCTS, the tree learner adaptively chooses how to create splits in the action space as the tree is built
- The splitting criterion is based on the significance of the difference in variance of the value function
- This choice to use a regression tree as the main driver of decision making leads to some changes in the game tree. In a normal game tree, edges represent actions and nodes represent states. In the TLS tree, the state is only represented at the root; edges constrain the range of actions, and nodes can be said to represent a range of states that could be reached by taking that action
- Discusses the problem of *concept drift* or *sampling shift* that occurs during training in some MCTS algorithms. For example, in UCT the value estimates at the root are nonstationary because the policy of the tree is changing underneath, but the values at the root don’t accurately reflect those changes.
- In TLS, leaves of the action trees represent internal nodes in the continually-constructed game tree. What should be done with the subtree starting at that node?
- I don’t totally get this on the first read-through.
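My mental model of the core TLS idea, as a toy sketch under heavy assumptions (midpoint splits, a sample-count trigger instead of the paper's significance test, greedy descent instead of a proper exploration policy; none of the names below are from the paper):

```python
import random

class ActionNode:
    """One node in a TLS-style action tree: covers a continuous action
    interval [lo, hi] and accumulates (action, value) samples. When enough
    samples arrive it splits at the midpoint, refining the discretization
    where sampling concentrates."""

    def __init__(self, lo, hi, split_threshold=20):
        self.lo, self.hi = lo, hi
        self.samples = []         # (action, value) pairs seen in this range
        self.children = None      # (left, right) after a split
        self.split_threshold = split_threshold

    def mean_value(self):
        if not self.samples:
            return float("inf")   # unvisited ranges are maximally attractive
        return sum(v for _, v in self.samples) / len(self.samples)

    def sample_action(self):
        """Descend toward the child with the higher mean value (greedy here;
        a real implementation would balance exploration, e.g. UCB-style),
        then draw a uniform action from the chosen leaf's interval."""
        if self.children:
            best = max(self.children, key=lambda c: c.mean_value())
            return best.sample_action()
        return random.uniform(self.lo, self.hi)

    def update(self, action, value):
        """Record a sample, routing it down if this node has split; split a
        leaf once it has seen enough data (stand-in for the F-test)."""
        if self.children:
            left, right = self.children
            (left if action < right.lo else right).update(action, value)
            return
        self.samples.append((action, value))
        if len(self.samples) >= self.split_threshold:
            mid = (self.lo + self.hi) / 2
            left = ActionNode(self.lo, mid, self.split_threshold)
            right = ActionNode(mid, self.hi, self.split_threshold)
            for a, v in self.samples:
                (left if a < mid else right).samples.append((a, v))
            self.children = (left, right)

# Usage: maximize f(a) = -(a - 0.7)^2 over a in [0, 1]
random.seed(0)
root = ActionNode(0.0, 1.0)
for _ in range(500):
    a = root.sample_action()
    root.update(a, -(a - 0.7) ** 2)
```

This also shows the structural point from the notes: edges constrain action ranges rather than naming discrete actions, and the tree grows finer exactly where the value samples say it matters.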
**Their result performs worse than fixed a-priori discretization**

- My guess is they may be able to outperform a-priori discretization by moving to higher-dimensional domains; they only tested up to 2 dimensions.
- They cite HOOT, which is cool