- Considers the case where there is a fixed budget
- Shown to be NP-Hard
- Consider some heuristics
- “We observe empirically that the simple biased-robin algorithm significantly outperforms the other algorithms in the case of identical costs and priors.”
- Formalize the problem in terms of coins. You are given a set of coins with different biases, and are given a budget of number of flips to sample. Goal is to pick the coin with the highest bias for heads. Actually consider the case where there are priors over the distributions for each coin, so considers Bayesian case
- “We address the computational complexity of the problem, showing that it is in PSPACE, but also NP-hard under different coin costs.”
- Metric is based on regret
- “A strategy may be viewed as a finite, rooted, directed tree where each leaf node is a special “stop” node, and each internal node corresponds to flipping a particular coin, whose two children are also strategy trees, one for each outcome of the flip”
- So naturally the total number of ways this can work out is exponential

- “We have observed that optimal strategies for identical priors typically enjoy a similar pattern (with some exceptions): their top branch (i.e., as long as the outcomes are all heads) consists of flipping the same coin, and the bottom branch (i.e., as long as the outcomes are all tails) consists of flipping the coins in a Round-Robin fashion”
- Update estimates on coins according to beta distribution
- “The proof reduces the Knapsack Problem to a special coins problem where the coins have different costs, and discrete priors with non-zero probability at head probabilities 0 and 1 only. It shows that maximizing the profit in the Knapsack instance is equivalent to maximizing the probability of finding a perfect coin, which is shown equivalent to minimizing the regret. The reduction reveals the packing aspect of the budgeted problem. It remains open whether the problem is NP-hard when the coins have unit costs and/or uni-modal distributions”
- “It follows that in selecting the coin to flip, two significant properties of a coin are the magnitude of its current mean, and the spread of its density (think “variance”), that is how changeable its density is if it is queried: if a coin’s mean is too low, it can be ignored by the above result, and if its density is too peaked (imagine no uncertainty), then flipping it may yield little or no information …However, the following simple, two coin example shows that the optimal action can be to flip the coin with the lower mean and lower spread!”
- Even if Beta parameters of two coins are fixed, the beta parameter of a third coin make require you to choose the first or second coin depending on their values
- Furthermore, “The next example shows that the optimal strategy can be
*contingent*— i.e., the optimal flip at a given stage depends on the outcomes of the previous flips.” - Although the optimal algorithm is contingent, an algorithm that is not contingent may only give up a little bit on optimality
- Discusses a number of heuristics including biased robin and interval estimation
- Gittins indices are simple and optimal, but only in the infinite horizon discounted case
- Discusses a hack to get it to work in the budgeted case (manipulating the discount based on the remaining budget)

- Goes on to empirical evaluation of heuristics