Lipshitz Bandits without the Lipshitz Constant. Bubeck. Stoltz, [Yuan?] Yu. Algorithmic Learning Theory 2011

WordPress lost my draft of this, so I am just quickly summarizing my previous summary

  1. Basic idea is that most optimization algorithms that assume Lipshitz smoothness also assume the Lipshitz constant is known
  2. This paper proposes an algorithm that does not require that information a-priori, and achieves minimax optimality even though that data is not known a-priori
    1. Regret is order: L^{d/(d+2)} T^{(d+1)/(d+2)} for a Lipshitz smoothness L, and time T.
    2. But this is for when T > max( L^d, (0.15 L^{2/(d+2)}/max(d,2))^d )
  3. The paper cites a number of other optimization papers as related work, including Kleinberg’s and DIRECT.  State that the latter is empirically good, but guarantees are only at the limit.
  4. Assume L is selected from a set of possible values.  Algorithms that just assume the largest L will find the correct solution, but it says that these methods have poor regret (especially with regards to T)
  5. The algorithm proposed here functions in two stages:
    1. Initially it uses uniform exploration that gives some estimate of the Lipshitz constant.  Although this of course may be wrong, the estimate found is good enough to start making intelligent decisions.
    2. Finds the optimal region using a standard exploration-exploitation strategy
  6. They say this exploration method may be useful in other areas, but leave it at that.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: