Regret Bounds for Gaussian Process Bandit Problems. Grunewalder, Audibert, Opper, Shawe-Taylor

  • Results are for the case where there is no noise
  • Gives a regret bound for GPs
  • Lots of math and I’m not really reading this carefully
  • The bounds also assume
    • The function being maximized is a GP
    • The prior of the GP is known (mean and covariance)
    • Other global optimization algorithms don’t have these constraints
  • More math I’m skipping
  • Regret bound presented is related to sqrt(T), but is a bit more complex:
    • sqrt(T)(sqrt(a_T+2log2)-sqrt(a_T))
    • This bound is a log-factor off optimal, which I think HOO is as well given its assumptions
  • Discuss extending results to domains where rewards are stochastic

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: