[this is based on a quick 2nd read through, so less notes than usual]

- Aggregates over a set of kernel functions in order to do ranking and selection more effectively
- The weighing scheme used is shown to be optimal under independent kernel estimators (need to find out what that means exactly)
- Uses knowledge gradient to select sampling points (roughly uses GPs and then estimates gradient of the bounds, and then attempts to sample where the gradient is highest, but a separate optimization algorithm is needed to find those locations)
- Proposed policy is shown to be asymptotically optimal
- This paper is concerned with searching over a finite number of options (computationally, restricted to some thousands)
- Similarity is measured by a kernel. Although there is no explicit use of lipshitz smoothness or concavity, kernel metrics must be Holder
- In the off line setting, the problem addressed is called ranking and selection, but in on-line its called multi armed bandit
- References for earliest papers on stochastic search
- Noise is assumed to be normally distributed
- Policy is 1-step optimal, and converges to optimal at the limit
- In their empirical results they use 0-mean 1D gaussian processes. They test this method, random sampling, and another GP-based method (which attempts to find the length-scale online (SKO)? theirs just weighs a bunch of provided kernels, finding the best one)
- For differing kernels, used linear fits but with different bandwidths
- Their stuff does very well, although with larger length-scales SKO works better. For GPs that actually have very short length-scales, KGCB is better.

- Also, SKO is less general, and only works well on certain classes of functions (certain covariance functions cause trouble, for example). Seems to be a little more fragile.
- I respect them for pointing out situations where SKO outperforms KGCB
- In an energy storage/retrieval problem, their method outperforms SKO, although with larger amounts of noise the bounds begin to overlap. In this problem KGCB converged more quickly, while in some of the earlier results SKO converged more rapidly, so depends on the domain.
- “Although our policy performs very well in the numerical experiments, there is a caveat. Kernel estimation is known to suffer from the curse of dimensionality as the MSE proportional to h^d where h is the bandwidth and d is the number of dimensions. If observations lie in high dimensional spaces, non-parametric estimation is known to have poor performance. Because of these reasons, the efficiency of our estimation method also degenerates in higher dimensions. Additive models might be used to handle this curse but this requires making more assumptions on the structure of the function.”