Gaussian Process Bandits: An Experimental Design Approach. Srinivas, Krause, Kakade, Seeger

  • A short (4 page) paper
  • Analysis of upper confidence bound on Gaussian process optimization
  • Smoothness is encoded by the covariance function
  • The Gaussian Bandit UCB requires a discrete set of test points, which seems strange
  • Some abuses of notation are hard to follow, for example their definition of a no-regret algorithm and the statement that UCB is a no-regret algorithm
  • The regret bound for discrete UCB is O(sqrt(KT)), with K the number of arms and T the number of rounds.  For the infinite-arm case, K is replaced by a bound on the maximum possible information gain due to sampling
    • They say this connects GP optimization w/ optimal experimental design, should read more about this
  • The reward observations are noisy; the noise is assumed to be Gaussian
  • Although it is a no-regret algorithm, it looks like there is a failure probability delta
  • Information gain is submodular: a new sample adds more information when few points have been sampled so far (although there have to be cases where this isn’t true)
  • There is a different regret bound here that also has sqrtish flavor
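
To make the notes above concrete, here is a minimal sketch of UCB on a GP over a discrete set of test points, plus the information gain of a sample set under Gaussian noise. It is not the authors' code. The RBF kernel, its lengthscale, the noise variance, the beta schedule (a form commonly quoted for finite sets), and all function names are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential covariance between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_test, noise_var):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def information_gain(x_obs, noise_var):
    """I(y_A; f) = 0.5 * log det(I + K_A / noise_var) for sampled points A."""
    K = rbf_kernel(x_obs, x_obs)
    _, logdet = np.linalg.slogdet(np.eye(len(x_obs)) + K / noise_var)
    return 0.5 * logdet

def gp_ucb(f, x_test, n_rounds, noise_var=0.01, delta=0.1, seed=0):
    """Pick, each round, the test point maximizing mean + sqrt(beta) * std."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for t in range(1, n_rounds + 1):
        # Assumed confidence schedule for a finite test set of size |D|.
        beta = 2.0 * np.log(len(x_test) * t**2 * np.pi**2 / (6.0 * delta))
        if xs:
            mean, std = gp_posterior(np.array(xs), np.array(ys), x_test, noise_var)
        else:
            mean, std = np.zeros(len(x_test)), np.ones(len(x_test))
        idx = int(np.argmax(mean + np.sqrt(beta) * std))
        xs.append(x_test[idx])
        # Noisy reward observation with Gaussian noise.
        ys.append(f(x_test[idx]) + rng.normal(0.0, np.sqrt(noise_var)))
    return np.array(xs), np.array(ys)

grid = np.linspace(0.0, 1.0, 50)            # the discrete set of test points
xs, ys = gp_ucb(lambda x: -(x - 0.3) ** 2, grid, n_rounds=30)
gains = [information_gain(xs[:n], 0.01) for n in range(1, len(xs) + 1)]
```

Here `gains` is the information gain after each round. Comparing the marginal gain of the same extra point on a small set and on a superset of it is one way to see the diminishing-returns property from the submodularity note.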
