Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization. Deautels, Krause, Burdick. ICML 2012. | Ari Weinstein's Research

Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization. Deautels, Krause, Burdick. ICML 2012.

Another paper that I didn’t read very closely, but here are the basic ideas

Basic idea is dealing with bandits where you have to make a number of decisions and then find out the results of all of them after the selection (batch instead of single arm selection)

An algorithm called GP-BUCB is developed, the regret of which only increases by a constant factor independent of the batch size B

Like vanilla GP-UCB, this algorithm needs the arms to continuous, but then discretized for a max operation

The trick for the algorithm leverages a property of GPs, where the reduction in variance depends only on where samples were taken from, and not what the actual values of the samples were.

Based on this, it is possible to hypothesize what the new bounds on the algorithm will be once the sample comes back, and based on that, a decision is made.

Computations of the variance of the function dominate the running time

They have some math to make sure bounds remain correct during batch selection, so that it does not become “overconfident”

Applications are some Synthetic problems Automated Vaccine Design, Spinal Cord Therapy (neat)