## Two basic problems in finite stochastic optimization. Sébastien Bubeck. Talk

Section 1: Stochastic Optimization

1. Setting: finite X, non-Bayesian, non-minimax, nonparametric noise, non-asymptotic analysis.
1. How do we define optimality in this context?
2. There are two parameters: n evaluations and K options. Each option i has an unknown reward distribution v_i; the goal is to find the option with the maximum mean. At each step t = 1, …, n the learner picks an option I_t to evaluate
3. With delta_i the gap between the mean of option i and the optimal mean, the hardness measure is H = sum_i 1/delta_i^2
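To make the hardness measure concrete, here is a minimal sketch with hypothetical means (conventions for the best arm's own gap vary; here the sum runs over the suboptimal arms only):

```python
# Hardness measure H for best-arm identification.
# delta_i = gap between the best mean and arm i's mean; harder problems
# (smaller gaps) give larger H, i.e. more evaluations are needed.
means = [0.5, 0.45, 0.4, 0.3]                  # hypothetical reward means
best = max(means)
gaps = [best - m for m in means if m != best]  # gaps of the suboptimal arms
H = sum(1.0 / d ** 2 for d in gaps)            # here: 400 + 100 + 25 = 525
```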
4. Any strategy needs at least Omega(H / log K) evaluations, so an algorithm matching this bound is optimal (Audibert, Bubeck, Munos 2010)
5. By this criterion, uniform sampling is actually not so bad.
1. It needs O((K log K) / min_i delta_i^2) evaluations – fine on easy problems, but very wasteful on hard ones
6. Multi-armed bandit introduced by Thompson in ’33 (I’m familiar with the second citation he gives – Robbins ’52)
7. A bandit survey is to appear; check his webpage
8. After n steps, UCB has only a polynomially small probability of failure, while uniform sampling's is exponentially small, so uniform sampling seems much better for this task. Why does this not show up in practice?
9. He introduces a modified UCB (UCB-E) with a different bias term: sqrt((cn/H) / T_i(t))
1. This algorithm has an exponentially small probability of error
2. This algorithm is optimal
3. It finds the best option in time O(H log H), but requires knowledge of H
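A minimal sketch of UCB-E with the bias term from these notes, assuming rewards in [0,1]; the `pull` callback and the Bernoulli test arms are stand-ins, and `c` is the constant in a = cn/H:

```python
import math
import random

def ucb_e(pull, K, n, H, c=1.0):
    # UCB-E sketch: UCB with exploration term sqrt(a / T_i(t)) where
    # a = c*n/H, so the hardness H must be known in advance.
    a = c * n / H
    counts = [0] * K
    sums = [0.0] * K
    for t in range(n):
        if t < K:
            i = t                                  # pull each arm once first
        else:
            i = max(range(K),
                    key=lambda j: sums[j] / counts[j] + math.sqrt(a / counts[j]))
        sums[i] += pull(i)
        counts[i] += 1
    # recommend the empirically best arm
    return max(range(K), key=lambda j: sums[j] / counts[j])
```

For example, with Bernoulli arms `pull = lambda i: 1.0 if random.random() < means[i] else 0.0`, the larger exploration term (relative to plain UCB's sqrt(log t / T_i)) keeps sampling the suboptimal arms, which is what drives the exponentially small error probability.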
10. Successive rejects algorithm:
1. In each phase, sample the surviving arms uniformly, then reject the arm with the lowest average value
2. This has a pretty good bound on the probability of error (didn't have time to copy it)
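The phase structure above can be sketched as follows (phase lengths follow the Audibert-Bubeck-Munos paper; the Bernoulli `pull` is a stand-in, and the sketch assumes n is comfortably larger than K):

```python
import math
import random

def successive_rejects(pull, K, n):
    # K-1 phases; each phase samples the surviving arms uniformly, then
    # rejects the arm with the lowest empirical mean. No knowledge of H needed.
    logbar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    counts = [0] * K
    sums = [0.0] * K
    active = set(range(K))
    n_prev = 0
    for k in range(1, K):
        n_k = math.ceil((n - K) / (logbar * (K + 1 - k)))
        for i in active:
            for _ in range(n_k - n_prev):          # top up every survivor
                sums[i] += pull(i)
                counts[i] += 1
        n_prev = n_k
        active.remove(min(active, key=lambda i: sums[i] / counts[i]))
    return active.pop()                            # the single surviving arm
```

Note the contrast with UCB-E: the phase lengths are fixed in advance from K and n alone, which is why no hardness parameter H is required.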
11. Empirical results show Successive Rejects doing much better than Hoeffding races
12. What if we want to find m-best options instead of single best?
1. For 1-best you can work by throwing away only the worst
2. For m-best you still throw away the worst arms, but you also want to accept the best arms once you are confident about them
13. For m-best there is an analogous algorithm; it performs successive accepts and rejects.
14. Current work is on Combinatorial Stochastic Optimization
1. Find the best concept in a given concept class.

Section 2: Optimal Discovery with Probabilistic Expert Advice

1. In a graph, you can ask whether a node is important or not
2. You can define a subset of the graph, sample a node from it, and ask whether it is important
3. Goal is to quickly find the important nodes
1. Part of the problem: you only need to find each interesting node once; hearing about it again later is wasteful
4. Part of the theory comes from the Good-Turing estimator, developed while Good and Turing worked on Enigma
5. The algorithm is called Good-UCB; it looks a lot like regular UCB (this is demonstrated empirically as well, and it is tight the whole way, not just at the end)
1. Theorem says this is just as good as using an Oracle; uniform sampling does not achieve this
6. One of the examples is discovering prime numbers; a small exploration constant works best there
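A heavily hedged sketch of a Good-UCB-style discovery rule as I understood it: each expert's index combines a Good-Turing estimate of its missing mass (the fraction of its draws that are singletons) with a UCB-style confidence bonus. The exact index and the constant `c` here are assumptions, not necessarily the paper's precise form:

```python
import math
import random

def good_ucb(sample, K, n, c=0.5):
    # Discovery with probabilistic experts: sample(i) draws an item from
    # expert i; the goal is to see as many distinct items as possible.
    counts = [0] * K
    seen = [dict() for _ in range(K)]   # per-expert map: item -> draw count
    discovered = set()
    for t in range(1, n + 1):
        if t <= K:
            i = t - 1                   # one initial draw per expert
        else:
            def index(j):
                ones = sum(1 for v in seen[j].values() if v == 1)
                # Good-Turing missing-mass estimate + confidence bonus
                return ones / counts[j] + c * math.sqrt(math.log(t) / counts[j])
            i = max(range(K), key=index)
        x = sample(i)
        seen[i][x] = seen[i].get(x, 0) + 1
        counts[i] += 1
        discovered.add(x)
    return discovered
```

The design intuition: an expert over many still-unseen items keeps producing singletons, so its missing-mass estimate stays high and it keeps being queried, whereas uniform sampling wastes draws on experts whose support is already exhausted.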