Lipshitz Bandits without the Lipshitz Constant. Bubeck, Stoltz, Yu. Algorithmic Learning Theory 2011.

Generally, algorithms assume the Lipshitz constant is known. This paper does not assume this.

Shows it is possible to develop an algorithm that has regret that is minimax optimal in the globally Lipshitz setting without knowing the Lipshitz constants

Functions in noisy domains

Mention DIRECT, which doesn’t assume a Lipshitz constant, which is said to work well in practice, but only asymptotic convergence is guaranteed

Says the discretization method used is similar to Kleinberg’s. Then has 2 phases:

Starts with uniform exploration to make some estimate of the Lipshitz constant (which is good enough to get the bounds)

Find the optimal interval using a “standard” exploration-exploitation strategy

Mentions this 2-step method may be useful for domains aside from bandits

Basically assumptions are that f(x) and its derivative are Lipshitz in the hypercube

Privacy & Cookies: This site uses cookies. By continuing to use this website, you agree to their use.
To find out more, including how to control cookies, see here:
Cookie Policy