Basic setting is the stochastic (mutlidimensional) continuous bandit problem.

The other competing algorithms for the domain at the time this paper was written don’t seem very sophisticated and are variants of gradient ascent.

Uses a Bayesian method of locally weighted polynomial model, which gives a distribution over predictions for a query point.

Noise is assumed, but the parameters are initially unknown (then learned).

Claim the Bayesian approach allows for more reasonable behavior when the density of sampled points is low.

The optimal answer is intractable, but four proposed approximation algorithms are listed, a comparison of those 4 and a few others are shown empirically in 2 different domains.

I guess one important difference between stochastic optimization and bandit problems is how performance is meaasured. Bandit problems care about performance during learning/search. Optimization does not.

If we continue to use upper confidence intervals, it would be good to understand how the local-weighted methods provide them.

I guess one important difference between stochastic optimization and bandit problems is how performance is meaasured. Bandit problems care about performance during learning/search. Optimization does not.

If we continue to use upper confidence intervals, it would be good to understand how the local-weighted methods provide them.