Gaussian Process Bandits: An Experimental Design Approach. Srinivas, Krause, Kakade, Seeger

A short (4 page) paper

Analysis of an upper confidence bound (UCB) algorithm for Gaussian process optimization

Smoothness is encoded by the covariance (kernel) function

The GP-UCB algorithm requires a discrete set of test points, which seems strange

There are some notation abuses that are difficult to follow, for example their definition of a no-regret algorithm and the claim that UCB is no-regret

The regret bound for discrete UCB with K arms over T rounds is O(sqrt(KT)). For the infinite-arm case, K is replaced by a bound on the maximum possible information gain from sampling (gamma_T), giving a bound of roughly O(sqrt(T * gamma_T))

They say this connects GP optimization with optimal experimental design; should read more about this

Here the reward observations are noisy; the noise is assumed to be Gaussian
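For reference, the standard noisy-observation GP posterior the confidence bounds are built from (standard GP regression formulas, not copied from the paper), with kernel matrix K_t, kernel vector k_t(x), and noise variance sigma^2:

```latex
\mu_t(x) = \mathbf{k}_t(x)^\top \left(\mathbf{K}_t + \sigma^2 I\right)^{-1} \mathbf{y}_t,
\qquad
\sigma_t^2(x) = k(x, x) - \mathbf{k}_t(x)^\top \left(\mathbf{K}_t + \sigma^2 I\right)^{-1} \mathbf{k}_t(x)
```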

Although it is a no-regret algorithm, the bounds only hold with probability 1 - delta, so there is a failure probability delta

Information gain is submodular – the marginal information gained from a new sample diminishes as the total number of points sampled grows (for GPs this holds in general, since the log-det form of the information gain is submodular)
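A quick numeric check of the diminishing-returns property, using the GP information gain F(A) = 0.5 * logdet(I + sigma^-2 K_A). The RBF kernel, lengthscale, noise level, and the particular sets are all illustrative choices of mine, not from the paper.

```python
import numpy as np

def rbf(X, ls=0.3):
    # Assumed RBF kernel on scalar inputs
    d = X[:, None] - X[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def info_gain(X, sigma=0.1):
    # F(A) = 0.5 * logdet(I + sigma^-2 K_A)
    K = rbf(np.asarray(X, dtype=float))
    _, logdet = np.linalg.slogdet(np.eye(len(X)) + K / sigma**2)
    return 0.5 * logdet

A = [0.1, 0.5]          # subset
B = [0.1, 0.5, 0.9]     # superset of A
x = 0.3                 # candidate new sample point
gain_A = info_gain(A + [x]) - info_gain(A)
gain_B = info_gain(B + [x]) - info_gain(B)
print(gain_A >= gain_B)  # → True: the marginal gain shrinks as the set grows
```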

There is a different regret bound here that also has a sqrt(T) flavor