Regret Bounds for Gaussian Process Bandit Problems. Grunewalder, Audibert, Opper, Shawe-Taylor

Results are for the case where there is no noise
Gives a regret bound for GPs
Lots of math and I’m not really reading this carefully
The bounds also assume
The function being maximized is a GP
The prior of the GP is known (mean and covariance)
Other global optimization algorithms don’t have these constraints
More math I’m skipping
Regret bound presented is related to sqrt(T), but is a bit more complex:
sqrt(T)(sqrt(a_T+2log2)-sqrt(a_T))
This bound is a log-factor off optimal, which I think HOO is as well given its assumptions
Discuss extending results to domains where rewards are stochastic

Like this: Like Loading...

Related