(2-pager from ICAPS MCTS workshop)
- Goal is to plan in settings where there is a restricted number of possible next states in any transition
- Alg is called Optimistic Planning for Sparsely Stochastic Systems
- Alg has access to list of all possible next states
- Bounds are kept for each possible transition, and states with the maximum upper bound are expanded first
- Analysis is w.r.t. simple regret
- Bounds propagate up the tree
- Seems to do rollouts following upper bounds, feels similar to FS^3
- In their experiments, has better regret, and plans deeper than OLOP (in a discrete version of swingup). OLOP needs more samples to plan effectively, and so has performance similar to uniform planning. Computational times are similar
- By following that expansion rule, alg also maximizes information gained relevant to regret