- Consider noisy optimization with finite samples <not yet clear if this is budget is imposed by the actor or the environment>
- “Bayesian approach places emphasis on detailed modelling, including the modelling of correlations among the arms. As a result, it can perform well in situations where the number of arms is much larger than the number of allowed function evaluation, whereas the frequentist counterpart is inapplicable.”
- “This paper draws connections between Bayesian optimization approaches and best arm identification in the bandit setting. It focuses on problems where the number of permitted function evaluations is bounded.”
- Applications include parameter selection for machine learning tasks
- “The paper also shows that one can easily obtain the same theoretical guarantees for the Bayesian approach that were previously derived in the frequentist setting [Gabillon et al., 2012].”
- A number of different criteria can be used in Bayesian land to select where to sample: “probability of improvement (PI), expected improvement (EI), Bayesian upper confidence bounds (UCB), and mixtures of these”
- Mentions work of Bubeck/Munos/Etal
- Tons of relevant references
- Also discussion in terms of simple regret
- But looks like they are also talking PACy
- Setting they consider includes GPs
- “As with standard Bayesian optimization with GPs, the statistics of … enable us to construct many different acquisition functions that trade-off exploration and exploitation. Thompson sampling in this setting also becomes straightforward, as we simply have to pick the maximum of the random sample from …, at one of the arms, as the next point to query.”
- Seems like they are really considering the finite arm case where arms have some covariance
- Used Bayes math to get upper and lower bounds among all arms, and then this is used to generate a bound on the simple regret
- “Intuitively this strategy will select either the arm minimizing our bound on the simple regret (i.e. J(t)) or the best “runner up” arm. Between these two, the arm with the highest uncertainty will be selected, i.e. the one expected to give us the most information.”
- The exploration parameter beta is chosen based on how often each arm is chosen and then finding something epsilon optimal
- Regret bound is in terms of near-optimality

- “Here we should note that while we are using Bayesian methodology to drive the exploration of the bandit, we are analyzing this using frequentist regret bounds. This is a common practice when analyzing the regret of Bayesian bandit methods”
- Can do a derivation with Hoeffding or Bernstein bounds as well (leads to analysis of case of independent arms, bounded rewards)
- UGap vs BayesGap – bounds are pretty much the same
- Have a nice empirical section where they use data from 357 traffic sensors and try to find the location with the highest speed
- “By looking at the results, we quickly learn that techniques that model correlation perform better than the techniques designed for best arm identification, even when they are being evaluated in a best arm identification task.”

- Then they use it for optimizing parameters in scikit-learn
- “EI, PI, and GPUCB get stuck in local minima”