Here is a video of the learned policy versus TrackFire.
In general, the policy seems to be:
- In the middle of the field, use SpinBot. This actually works well because TrackFire shoots a lot and most of its shots miss. Since shooting drains energy, the other tank just “punches itself out,” although I should point out the learner isn’t even modelling that.
- Near the walls, some regions use WallBot and some use TrackFire. There is a corner (top right) where it switches from WallBot to SpinBot. Of course, TrackFire vs TrackFire is 50/50, and WallBot has an advantage vs TrackFire.
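To make the shape of that policy concrete, here is a rough sketch of it as a position lookup. This is not the learner's actual code: the field size, the wall margin, the cell size, and the toy table entries are all assumptions I made up for illustration, and the real regions are noisier than this.

```python
from enum import Enum


class Behaviour(Enum):
    SPIN_BOT = "SpinBot"
    WALL_BOT = "WallBot"
    TRACK_FIRE = "TrackFire"


# All of these numbers are assumptions for illustration only.
FIELD_W, FIELD_H = 800, 600  # assumed battlefield size
WALL_MARGIN = 100            # hypothetical: closer than this counts as "near the wall"
CELL = 100                   # hypothetical grid cell size for the lookup table

# Toy stand-in for the learned near-wall regions, keyed by (column, row)
# with the origin at the bottom left. The real learned regions are not
# reproduced here.
TOY_WALL_TABLE = {
    (7, 5): Behaviour.SPIN_BOT,    # top-right corner switches to SpinBot
    (0, 0): Behaviour.TRACK_FIRE,  # made-up example of a TrackFire region
}


def near_wall(x, y):
    """True if the position is within WALL_MARGIN of any wall."""
    return (x < WALL_MARGIN or x > FIELD_W - WALL_MARGIN
            or y < WALL_MARGIN or y > FIELD_H - WALL_MARGIN)


def choose_behaviour(x, y, wall_table):
    """Pick a sub-behaviour from the agent's position alone."""
    if not near_wall(x, y):
        return Behaviour.SPIN_BOT
    # Clamp so positions exactly on the far walls stay inside the grid.
    col = min(int(x // CELL), FIELD_W // CELL - 1)
    row = min(int(y // CELL), FIELD_H // CELL - 1)
    # Near-wall cells without an entry fall back to WallBot.
    return wall_table.get((col, row), Behaviour.WALL_BOT)
```

Calling `choose_behaviour(400, 300, TOY_WALL_TABLE)` picks SpinBot for the middle of the field, while a position along the bottom wall such as `(400, 20)` falls through to WallBot.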
But anyway, here’s the actual strategy that was found. It’s a bit noisier than what was learned against WallBot, but there are clearly defined regions:
And here’s a clip of it in action. Sorry for the poor quality: the original video looks perfect, but YouTube manages to mangle it to the point that it’s barely watchable. Anyway, the agent is yellow and TrackFire is red.
There’s one funny part at about a minute in where the agent is looping in SpinBot directly around TrackFire, and neither shoots the other until a timeout.