Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization. Desautels, Krause, Burdick. ICML 2012.

  1. Another paper I didn’t read very closely, but here are the basic ideas:
  2. The setting is bandits where you must commit to a number of decisions and only then find out the results of all of them (batch selection instead of single-arm selection)
  3. An algorithm called GP-BUCB is developed, whose regret increases only by a constant factor independent of the batch size B
  4. Like vanilla GP-UCB, this algorithm allows the arm space to be continuous, but it is then discretized for the max operation
  5. The trick for the algorithm leverages a property of GPs: the reduction in posterior variance depends only on where samples were taken, not on the actual values of those samples.
    1. Based on this, it is possible to hallucinate what the new confidence bounds will be once the samples come back, and decisions within the batch are made against those hypothesized bounds.
  6. Computing the posterior variance of the function dominates the running time
  7. They have some math to make sure the confidence bounds remain valid during batch selection, so that the algorithm does not become “overconfident”
  8. Applications include some synthetic problems, automated vaccine design, and spinal cord therapy (neat)
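The variance trick in point 5 can be illustrated with a small NumPy sketch. This is not the paper’s implementation, just a minimal illustration under assumed details (RBF kernel, a 1-D discretized arm grid, an assumed exploration parameter `beta`): the GP posterior variance is computed from sample locations alone, so a batch can be chosen by feeding hallucinated observations back in and re-running only the variance update, while the mean stays frozen at the last real observations.

```python
import numpy as np

def rbf(A, B, ls=0.2):
    # Squared-exponential kernel between row vectors in A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def posterior(X, y, Xs, noise=1e-4):
    # Standard GP posterior mean and variance at test points Xs.
    # Note: the variance term never touches y, only the locations X.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(Xs, Xs)) - np.einsum('ij,jk,ki->i', Ks.T, Kinv, Ks)
    return mu, np.maximum(var, 0.0)  # clip tiny negative values

def batch_select(X, y, grid, B, beta=2.0):
    # Pick a batch of B arms from a discretized grid (the paper's GP-BUCB
    # idea, sketched loosely). The mean is frozen at the real data; the
    # variance is updated after each hallucinated pick, which is valid
    # because it depends only on sample locations.
    mu, _ = posterior(X, y, grid)
    Xh, yh = X.copy(), y.copy()           # hallucinated design
    batch = []
    for _ in range(B):
        _, var = posterior(Xh, yh, grid)  # locations-only update
        ucb = mu + np.sqrt(beta * var)
        i = int(np.argmax(ucb))
        batch.append(grid[i])
        # Feed back a hallucinated observation; its value is irrelevant
        # to the next variance, so mu[i] is just a placeholder.
        Xh = np.vstack([Xh, grid[i:i+1]])
        yh = np.append(yh, mu[i])
    return np.array(batch)

# Toy run on a 1-D grid of 101 candidate arms.
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 101)[:, None]
X0 = rng.uniform(0, 1, (3, 1))
y0 = np.sin(6 * X0[:, 0])
batch = batch_select(X0, y0, grid, B=4)
print(batch.shape)  # (4, 1)
```

Note that the real GP-BUCB analysis also bounds how much the hallucinated updates can shrink the confidence intervals (point 7 above); this sketch skips that correction and only shows the locations-only variance property.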
