**Section 1: Stochastic Optimization**

- Setting: finite X, non-Bayesian, non-minimax, nonparametric noise, non-asymptotic analysis.
- How do we define optimality in this context?

- The learner has 2 parameters: a budget of n evaluations and k options. Each option i has an unknown reward distribution v_i; we want the option with the max mean. At each step t = 1, …, n the learner outputs an option I_t.
- With delta_i the gap between the mean of option i and the optimal mean, define the hardness H = sum_i 1/delta_i^2
- You always need at least Omega(H/log k) evaluations, so any algorithm that matches this is optimal (Audibert, Bubeck, Munos 2010)
- According to this criterion, uniform sampling is actually not so bad: it needs O(K log K / min_i delta_i^2) evaluations, which is fine on easy problems but far worse than H on hard ones (many small gaps)
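A minimal sketch of the hardness quantity above (the arm means here are made-up illustrative values, not from the talk):

```python
# Hardness H = sum over suboptimal arms of 1/delta_i^2,
# where delta_i is the gap to the best mean.
means = [0.5, 0.45, 0.4, 0.3]                     # hypothetical arm means
best = max(means)
gaps = [best - m for m in means if best - m > 0]  # delta_i for suboptimal arms
H = sum(1.0 / d**2 for d in gaps)                 # here: 1/0.05^2 + 1/0.1^2 + 1/0.2^2 = 525
```

Note how a single small gap (0.05) dominates H, which is why uniform sampling's dependence on min_i delta_i^2 alone can be much worse.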

- The multi-armed bandit was introduced by Thompson in '33 (I'm familiar with the second citation he gives, Robbins '52)
**Bandit survey to appear; check webpage**
- After n steps, UCB has a polynomial probability of failure, while uniform sampling seems much better. Why does this not show up in practice?
- He introduces a modified UCB (UCB-E) with a different exploration term: sqrt((cn/H)/T_i(t))
- This algorithm instead has an exponentially small probability of error
- This algorithm is optimal
- It finds the best option in time O(H log H), but requires knowledge of H
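A hedged sketch of the UCB-E idea as I understood it (the function name, parameters, and the final recommendation rule are my own choices, not confirmed details of the talk):

```python
import math

def ucb_e(pull, k, n, H, c=1.0):
    """Sketch of UCB-E: at each step pull the arm maximizing
    empirical mean + sqrt(a / T_i) with a = c*n/H; after the
    budget n is spent, recommend the best empirical arm."""
    counts, sums = [0] * k, [0.0] * k
    a = c * n / H
    for t in range(n):
        if t < k:
            i = t  # initialize: pull each arm once
        else:
            i = max(range(k),
                    key=lambda j: sums[j] / counts[j] + math.sqrt(a / counts[j]))
        sums[i] += pull(i)
        counts[i] += 1
    return max(range(k), key=lambda j: sums[j] / counts[j])
```

For example, with two deterministic arms `pull = lambda i: [1.0, 0.0][i]` and `H = 1.0`, the sketch returns arm 0. Note that `a = cn/H` requires knowing H up front, which is exactly the drawback mentioned above.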

**Successive rejects algorithm**
- In each phase, sample the remaining arms uniformly, then reject the arm with the lowest average value
- This has a pretty good regret bound (didn’t have time to copy)
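A sketch of the phase structure, assuming the phase lengths follow the usual log-bar schedule from the Audibert-Bubeck-Munos paper (I did not copy the exact schedule from the talk, so treat the constants as an assumption):

```python
import math

def successive_rejects(pull, k, n):
    """Sketch of Successive Rejects: k-1 phases of uniform sampling
    over the surviving arms, dropping the lowest empirical mean each
    phase. Phase lengths use log_bar(k) = 1/2 + sum_{i=2}^k 1/i."""
    log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
    alive = list(range(k))
    counts, sums = [0] * k, [0.0] * k
    n_prev = 0
    for phase in range(1, k):
        n_phase = math.ceil((n - k) / (log_bar * (k + 1 - phase)))
        for i in alive:                       # top up every surviving arm
            for _ in range(n_phase - n_prev):
                sums[i] += pull(i)
                counts[i] += 1
        n_prev = n_phase
        alive.remove(min(alive, key=lambda j: sums[j] / counts[j]))
    return alive[0]  # the single survivor
```

Unlike UCB-E, no knowledge of H is needed: the schedule depends only on k and n.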

- Empirical results show Successive Rejects performing much better than Hoeffding races
- What if we want to find m-best options instead of single best?
- For 1-best you can work by throwing away only the worst arm
- For m-best you also want to throw away the worst arms, but additionally accept the best arms once we are confident about them
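A hedged sketch of one accept/reject decision in this spirit (the gap definitions follow my recollection of the successive-accepts-and-rejects idea; the exact rule in the talk may differ):

```python
def sar_step(means_hat, m):
    """Decide which arm to deactivate given empirical means: the arm
    with the largest empirical gap is removed, and it is *accepted*
    if it currently sits among the top m, *rejected* otherwise."""
    order = sorted(range(len(means_hat)), key=lambda i: -means_hat[i])
    top = set(order[:m])
    mu_m = means_hat[order[m - 1]]    # m-th best empirical mean
    mu_m1 = means_hat[order[m]]       # (m+1)-th best empirical mean
    gap = {i: (means_hat[i] - mu_m1 if i in top else mu_m - means_hat[i])
           for i in range(len(means_hat))}
    i = max(gap, key=gap.get)
    return i, ("accept" if i in top else "reject")
```

For example, `sar_step([0.9, 0.5, 0.4, 0.1], m=2)` accepts arm 0, since its gap to the (m+1)-th mean is the largest.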

- For m-best there is a successive-rejects-style algorithm, but it is **successive accepts and rejects**
- Current work is on combinatorial stochastic optimization:
- Find the best concept in a given concept class.

**Section 2: Optimal Discovery with Probabilistic Expert Advice**

- In a graph, you can ask whether a node is important or not
- You can define a subset of the graph, sample a node from it, and ask whether it is important
- The goal is to quickly find the important nodes
- Part of this problem's structure is that you only need to find each interesting node once; hearing about it again later is wasteful

- Part of the theory comes from Good and Turing's work on Enigma
- The algorithm is called Good-UCB and looks a lot like regular UCB (this is demonstrated empirically as well, and the bound is tight the whole way, not just at the end)
- The theorem says it is just as good as using an oracle; uniform sampling does not achieve this
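A sketch of the Good-Turing ingredient behind the index, assuming (my assumption, not confirmed by the talk) that the Good-UCB index for an expert is its estimated missing mass plus a UCB-style confidence bonus:

```python
import math

def good_ucb_index(samples, t, c=1.0):
    """Good-Turing estimate of the missing mass for one expert --
    the fraction of its draws that were items seen exactly once --
    plus a UCB-style bonus. Exact constants are an assumption."""
    n = len(samples)
    counts = {}
    for x in samples:
        counts[x] = counts.get(x, 0) + 1
    singletons = sum(1 for v in counts.values() if v == 1)
    return singletons / n + c * math.sqrt(math.log(t) / n)
```

The intuition matches the notes: an expert that keeps producing never-seen-before items has many singletons, so its index stays high and it keeps getting queried.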

- One of the examples for this is prime numbers; a small exploration constant works best here