- Algorithm, called Binary Action Search (BAS) approximates a continuous action space to an arbitrary resolution via binary decisions which allow for (near, up to a specified resolution) optimal action selection
- Basic idea is to use the assumption that actions taken at nearby points in time should be somewhat similar, so instead of deciding which action to take, the goal is to decide by how much the last action taken should be altered.
- The action can be altered by decreasing or increasing the number corresponding to the action. This can be done repeatedly before an actual action is executed. For example, at first ask if action should be increased or decreased. Based on that answer half of the remaining action space is discarded, from there the same question can be asked again from the midpoint of the actions that are remaining in consideration, even before a single action is taken from that state. That allows more than just one small single adjustment.
- This seems like a bit of a problem though – if the Q function at that state over all actions isn’t monotonic, how is it really supposed to know whether the action should be incremented or decremented, unless it has already explored all actions down to that low resolution? And if you are doing that, I don’t see how the binary searching is really helping much? They point out that at least this helps the search time down to logarithmic in the resolution, as opposed to linear
- Makes no smoothness assumptions (but likewise, there isn’t much guarantee as to the performance)