After reading a number of papers on the continuous bandit problem, I spent the past few days implementing a continuous 1-D bandit domain, which models the location of a point undergoing Brownian motion. At function-generation time it is given what is effectively a Lipschitz constant (k). At each time step, the function moves forward one unit in the x-dimension, and then up or down in the y-dimension by an amount anywhere in (-k, k) from the previous y-location.
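The generation step above can be sketched as follows. This is a minimal illustration, not the actual code behind the plots; the function name and the linear interpolation between breakpoints are my assumptions about the setup described.

```python
import random

def make_domain(n_segments=50, k=1.0, seed=None):
    """Generate a piecewise-linear function on [0, n_segments].
    Each breakpoint moves up or down by at most k from the previous
    one, so k acts as a Lipschitz constant on the slopes."""
    rng = random.Random(seed)
    ys = [0.0]
    for _ in range(n_segments):
        ys.append(ys[-1] + rng.uniform(-k, k))

    def f(x):
        # Linearly interpolate between the two surrounding breakpoints.
        i = min(int(x), n_segments - 1)
        t = x - i
        return (1 - t) * ys[i] + t * ys[i + 1]

    return f, ys

# Example: a domain with 50 segments, as in the experiments below.
f, ys = make_domain(n_segments=50, k=1.0, seed=0)
```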

I then implemented the HOO algorithm from *Online Optimization in X-Armed Bandits*. Coding up HOO for just 1-D was very simple, on the order of 150 lines of code.

The HOO algorithm takes two parameters. I don’t know how to compute analytically what they should be for this domain (or whether that is even possible), but the algorithm seems fairly robust to them: results are characteristically similar across a wide range of values.
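For reference, here is a condensed sketch of what the 1-D implementation looks like. The two parameters are the smoothness parameters from the paper (called nu1 and rho here); the midpoint sampling and midpoint splits match the deterministic variant described below, but the details are my reconstruction, not the original code.

```python
import math
import random

class Node:
    """A tree node covering the interval [lo, hi) at a given depth."""
    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.n = 0              # samples seen in this subtree
        self.mean = 0.0         # empirical mean reward
        self.B = float("inf")   # B-value; unexplored nodes are infinite
        self.left = self.right = None

def hoo(f, horizon, nu1=1.0, rho=0.5, noise=0.0, seed=None):
    """Minimal 1-D HOO on [0, 1]: descend along maximal B-values,
    sample the selected leaf's midpoint, split it, then refresh the
    statistics and B-values bottom-up along the traversed path."""
    rng = random.Random(seed)
    root = Node(0.0, 1.0, 0)
    for t in range(1, horizon + 1):
        # Descend: always follow the child with the larger B-value.
        path, node = [root], root
        while node.left is not None:
            node = node.left if node.left.B >= node.right.B else node.right
            path.append(node)
        # Sample the leaf's midpoint, then split at the same location.
        mid = 0.5 * (node.lo + node.hi)
        r = f(mid) + (rng.gauss(0.0, noise) if noise else 0.0)
        node.left = Node(node.lo, mid, node.depth + 1)
        node.right = Node(mid, node.hi, node.depth + 1)
        # Update counts, means, and B-values bottom-up along the path.
        for nd in reversed(path):
            nd.n += 1
            nd.mean += (r - nd.mean) / nd.n
            u = nd.mean + math.sqrt(2.0 * math.log(t) / nd.n) \
                + nu1 * rho ** nd.depth
            nd.B = min(u, max(nd.left.B, nd.right.B))
        yield mid, r
```

Randomizing the sample location and split boundary, as discussed below, would just mean replacing the two uses of `mid` with draws from the node's interval.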

Run on a domain with 50 piecewise-linear sections and a budget of 20 samples, this was the result:

Blue: true function, Green: estimated function, Pink: sampled locations

Here, samples were taken from the center of each node, and splits occurred at that same location (the paper specifies both can be chosen arbitrarily), which means that at a split the sample point always falls into the leftmost portion of the right child. Here is the same setting with 100 samples; as you can see, it quickly focuses on the tall region:

If the sample location and split boundary are randomized, the results can look quite different:

There are two interesting items that fall out of this. First is the ability to use the approach from random decision trees to build an ensemble of randomized HOO trees. Second, the algorithm is quite effective in a fully deterministic mode, but I would like to see how my proposal compares to it. I’ll try to have that next week, but it may not be ready until after that with the holidays.


neat. did it find the optimum in the last graph?

Seems like it did not; it actually ended up focusing on a sub-optimal region, most likely due to an inappropriate setting of the two parameter values. My guess is it sampled close enough to the global maximum to think it knew the region without actually knowing it well. With a different parameter setting, it might have been driven to sample more broadly before considering the region known. Then again, it did not seem to do this in the deterministic setting, so it could also be the result of a series of random values that caused it to perform poorly. That could be tested later on.

One of the problems with the algorithm is that it assumes knowledge of those two parameters but doesn’t say how to determine them. I suppose this is fundamentally the same problem as assuming the Lipschitz constant is known, except that in the domain I set up we can specify what that constant happens to be.

yes, that makes sense. any thoughts on how to move forward?

I did some simple experiments after this post by comparing the regret from HOO with the regret from the geometric approach in a similar (but bigger) domain.

Aside from this, another simple algorithm to implement would be just a fixed resolution discretization of the action space, and maybe something a little more interesting, like simulated annealing.
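The fixed-resolution baseline mentioned above could be as simple as running standard UCB1 over the midpoints of a uniform grid. This is a hypothetical sketch of that baseline, not something from the experiments; the function name and grid size are my own choices.

```python
import math
import random

def ucb1_grid(f, horizon, n_arms=20, noise=0.0, seed=None):
    """Fixed-resolution baseline: discretize [0, 1] into n_arms
    midpoints and run UCB1 over them as a standard bandit."""
    rng = random.Random(seed)
    arms = [(i + 0.5) / n_arms for i in range(n_arms)]
    counts = [0] * n_arms
    means = [0.0] * n_arms
    history = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            i = t - 1  # play each arm once first
        else:
            # Pick the arm maximizing the UCB1 index.
            i = max(range(n_arms),
                    key=lambda j: means[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        r = f(arms[i]) + (rng.gauss(0.0, noise) if noise else 0.0)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
        history.append((arms[i], r))
    return history
```

Its weakness relative to HOO is clear: regret is bounded below by the discretization error of the grid, no matter how many samples it gets.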

In *Multi-Armed Bandits in Metric Spaces* (Kleinberg et al.) it is pointed out that the “needle in a haystack” problem is the most “difficult” setting to optimize, so I would like to compare the regret of all the algorithms in that case as well.

Aside from that, there is theoretical work to do. I’m pretty sure we can show the geometric approach is PAC, so that is another important bit.
