## Bandit geometric upper bound estimator implemented, compared to HOO

Although we need to find a better name, I implemented the bandit geometric upper bound estimator. This estimator uses sampled points to find all the highest possible local maxima of a function by taking advantage of a known Lipshitz constant for a given continuous (1-D) domain.

When run in the same small domain from the HOO post, the results look like this when the algorithm is allowed 20 samples: Blue: true function, Green: upper bound on function, Pink: sampled points

The two algorithms were then run in a much larger domain (with a length of 200 instead of 50 with the same Lipshitz constant), and 1000 samples were allowed.

The result of the Hoo algorithm (for rho=0.5, v1=100, I tried to find values for parameters that yielded good results): Hoo in the larger domain. Here the green isn't an upper bound but an estimate of the mean

As you can see, HOO initally focuses on a good local maxima, but not the global maxima.  Various other settings of the parameters I tried had characteristically the same results.  Although I would imagine there is some setting of them that would yield better results, I don’t know how to find them.

At any rate, here are the results for the geometric approach in the same domain: I should point out that this domain is deterministic.  The next line of experiments will move to the stochastic case.

Chris was the first one to plot the regret of HOO in the domain used in the previous post, and here I compared the regret of HOO to the geometric approach in the larger domain shown in the previous two figures: Chris and Ali initially thought the sharp bend in the geometric regret was suspicious.  Chris checked out the code and hasn’t found anything wrong yet.  Ali pointed out that for regret bounds in a stochastic domain there must continually be some exploration, so a shape like HOO with slow growth that comes with a slow rate of continuous exploration is unavoidable, but we all agreed in the PAC setting algorithms gather data for a certain period of time and then make a decision, which would give the sharp angle that is found in the geometric approach, and that it should be easy to prove the geometric approach is PAC, even in the stochastic setting.

## One thought on “Bandit geometric upper bound estimator implemented, compared to HOO”

1. Michael Littman says:

the fact that the red line is flat means that it found the optimum. If it didn’t there would be a linear slope. Classical regret minimizing algorithms have a slope that is more square-rooty or log-y.

does anyone know if PAC and regret are interchangable? I’m starting to wonder if it might make sense to start a collaboration with some theory folks (like Saks) and do a formal treatment of the problem.