Bandit geometric upper bound estimator implemented, compared to HOO


Although we still need to find a better name, I implemented the bandit geometric upper bound estimator. This estimator uses sampled points to find all the highest possible local maxima of a function by taking advantage of a known Lipschitz constant for a given continuous (1-D) domain.
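As a rough sketch of the idea (not the actual implementation; the function names and the greedy sampling rule here are my own framing), the Lipschitz constant L gives a pointwise upper envelope U(x) = min over samples i of [f(x_i) + L * |x - x_i|], and the next sample can be taken wherever that envelope is largest:

```python
import numpy as np

def lipschitz_upper_bound(xs, ys, L, grid):
    # U(x) = min over samples i of (y_i + L * |x - x_i|): no function
    # with Lipschitz constant L that passes through the samples can
    # exceed this cone-shaped envelope anywhere on the grid.
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    return np.min(ys[:, None] + L * np.abs(grid[None, :] - xs[:, None]), axis=0)

def next_sample_point(xs, ys, L, grid):
    # Greedy rule (my framing, a sketch): sample where the envelope is
    # highest, since that is the only region where the global maximum
    # could still be hiding.
    return grid[np.argmax(lipschitz_upper_bound(xs, ys, L, grid))]
```

Each new sample pulls the envelope down around it, so the remaining "possible maxima" shrink quickly in a deterministic domain.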

When run in the same small domain from the HOO post, the results look like this when the algorithm is allowed 20 samples:

Blue: true function, Green: upper bound on function, Pink: sampled points

The two algorithms were then run in a much larger domain (with a length of 200 instead of 50, and the same Lipschitz constant), and 1000 samples were allowed.

The result of the HOO algorithm (for rho=0.5, v1=100; I tried to find parameter values that yielded good results):
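For reference, this is my understanding of how those two parameters enter HOO's node scores (a sketch of the formulation in Bubeck et al., not our exact code): each tree node at depth h gets an optimistic value combining its empirical mean, a confidence width, and a diameter term v1 * rho^h:

```python
import math

def hoo_u_value(mean_reward, n_pulls, n_total, v1, rho, depth):
    # Optimistic score for a tree node at the given depth:
    #   empirical mean + exploration bonus + a bound on how much the
    #   function can vary over the node's cell (v1 * rho^depth).
    return (mean_reward
            + math.sqrt(2.0 * math.log(n_total) / n_pulls)
            + v1 * rho ** depth)
```

With v1=100 the diameter term dominates early on, which may explain why the parameters were hard to tune for this domain.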

HOO in the larger domain. Here the green isn't an upper bound but an estimate of the mean

As you can see, HOO initially focuses on a good local maximum, but not the global maximum. The various other parameter settings I tried had characteristically similar results. Although I would imagine there is some setting that would yield better results, I don’t know how to find it.

At any rate, here are the results for the geometric approach in the same domain:

The geometric approach in the larger domain (1000 samples)

I should point out that this domain is deterministic. The next line of experiments will move to the stochastic case.

Chris was the first one to plot the regret of HOO in the domain used in the previous post, and here I compared the regret of HOO to the geometric approach in the larger domain shown in the previous two figures:

Regret of HOO vs. the geometric approach in the larger domain

Chris and Ali initially thought the sharp bend in the geometric regret was suspicious; Chris checked out the code and hasn’t found anything wrong yet. Ali pointed out that regret bounds in a stochastic domain require some continual exploration, so a curve like HOO’s, with slow growth coming from a slow rate of continuous exploration, is unavoidable. We all agreed, however, that in the PAC setting algorithms gather data for a certain period of time and then make a decision, which would give the sharp angle found in the geometric approach, and that it should be easy to prove the geometric approach is PAC, even in the stochastic setting.


One thought on “Bandit geometric upper bound estimator implemented, compared to HOO”

  1. Michael Littman says:

    the fact that the red line is flat means that it found the optimum. If it didn’t, there would be a linear slope. Classical regret minimizing algorithms have a slope that is more square-rooty or log-y.

    does anyone know if PAC and regret are interchangeable? I’m starting to wonder if it might make sense to start a collaboration with some theory folks (like Saks) and do a formal treatment of the problem.
