In my previous post, the geometric approach was empirically compared to HOO in the deterministic setting. The next step was to adapt the geometric approach to the stochastic setting. The change is quite simple. In the deterministic setting it is only necessary to sample each arm once, whereas in the stochastic setting the same arm must be sampled repeatedly. Once it has been sampled a certain number of times (call this number m), it is possible to estimate its mean along with a confidence interval. Here, the upper bound of a 95% confidence interval was used. This value is simply used in place of the actual value that would have been observed in the deterministic setting.
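The per-arm estimate described above can be sketched as follows. This is a minimal illustration, not the code used for these experiments: the helper name `ucb_estimate` is hypothetical, and the confidence interval uses a normal approximation (z = 1.96) with the sample standard deviation. That is an assumption on my part. A Student-t quantile would give a slightly wider interval for small m.

```python
import math
import statistics
from typing import Callable

# Two-sided 95% quantile of the standard normal.
# Assumption: normal approximation rather than a Student-t quantile.
Z_95 = 1.96


def ucb_estimate(sample_arm: Callable[[], float], m: int) -> float:
    """Pull one arm m times and return the upper end of a 95% confidence
    interval on its mean (hypothetical helper for illustration)."""
    if m < 2:
        raise ValueError("need at least two samples to estimate the variance")
    rewards = [sample_arm() for _ in range(m)]
    mean = statistics.fmean(rewards)
    # Standard error of the mean, using the sample standard deviation.
    stderr = statistics.stdev(rewards) / math.sqrt(m)
    return mean + Z_95 * stderr
```

The geometric approach would then use this returned value wherever the deterministic version used a single observed function value.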
HOO is already designed for the stochastic setting, and works out of the box.
I would show graphs of how the sampling actually looks, as I did in previous posts, but it looks quite similar to the deterministic setting.
Anyway, the important part is the comparison of the regret curves. The domain was very similar to the one used in the last (deterministic) experiment, except that this time the observations were corrupted by white noise with a standard deviation of 0.5. That is quite high, considering the function only ranges from about -10 to 8.
Although the crossover appears on a much larger time scale than in the deterministic setting, it still occurs, and the overall shape of the graph is the same.
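For reference, the noisy observation model described above (white noise with a standard deviation of 0.5 added to each observation) can be sketched like this. Two things here are assumptions: the noise is taken to be i.i.d. Gaussian with mean 0, and `f` is a stand-in for the test function, which is not reproduced in this post. The helper name `make_noisy_arm` is hypothetical.

```python
import random
from typing import Callable

NOISE_SD = 0.5  # standard deviation of the additive white noise


def make_noisy_arm(f: Callable[[float], float], x: float,
                   rng: random.Random) -> Callable[[], float]:
    """Return a zero-argument sampler that observes f(x) plus noise.

    Assumption: the noise is i.i.d. Gaussian with mean 0 and sd NOISE_SD;
    f is a placeholder for the actual test function.
    """
    def sample() -> float:
        return f(x) + rng.gauss(0.0, NOISE_SD)
    return sample
```

A sampler built this way can be passed directly to a per-arm estimator that repeatedly pulls the same arm.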