## Gaussian Process Optimization in the Bandit Setting: No Regeret and Experimental Design. Srinivas, Krause, Kakade, Seeger. IC

1. Analyzes an algorithm called GP-UCB
2. Regret is bounded in terms of information gain
3. <I think a general problem with the approach is the computational (and even moreso) memory requirements involved in using GPs.  I gave them up years ago because they kept running out of memory.>
4. Deals with noisy setting
5. Information gain is bounded by use of submodularity argument
6. Says HOO has issues with high-dimensional domains, which is contrary to what I remember from the paper.  Anyway, argues that measures of smoothness other than Lipshitz (or Holder) can give better results (based on what kernel you use in the GP)
7. Mentions that there is a difference bettween experimental design and optimization but I am not familiar with the distinctions
8. GP-UCB is a distinct method of using GPs for optimization; there is a small literature on different algorithms with a number of citations in the paper
9. Discusses RKHS, Reproducing Kernel Hilbert Spaces (something I’ve heard of  before but have no idea what it is)
1. There is something called the RKHS norms which is a measure of smoothness
10. Greedily selecting points that maximize information gain is approximately (a constant fraction of) optimal <they cite guestrins paper for this>
11. The GP-UCB algorithm, however, doesn’t just try to maximize information globally.  Since we are performing optimization we only care about areas where the maximum may lie.  There is some scaling constant that is used to balance gaining information vs finding the optimum.  It must be selected correctly or the algorithm will converge prematurely
12. There is a major problem with this approach which is what I thought I remembered hearing in the talk.  Given a bunch of potential points, there is math that tells you which one to sample next.  But in continuous optimization there is an infinitely large set of points and this approach has no way to deal with it, turning the global optimization problem into another global optimization problem (which they claim is easier, though.  I don’t really buy that)
13. Putting together the computational, memory, and most fundamental issue that it doesn’t resolve the global optimization problem, I think this needs to be seriously improved upon before its considered seriously.
14. For example in the experimental results section, they discretize a 1-d optimization problem into 1000 points.  This works fine for 1-D but clearly this won’t scale in any reasonable manner, one of the things they say the algorithm is good at.  For larger dimensions it is necessary to resort to some other method than uniform discretization