From Computational Intelligence in Games, 2001
- Paper studies model building; the abstract even claims that incomplete models can still help find good policies
- Approach is a combination of CMACs and prioritized sweeping (rough prioritized-sweeping sketch at the end of these notes)
- They didn’t use the official RoboCup simulator because its complexity made evaluating approaches difficult, so they spun up their own
- Their version is drastically simplified, which almost certainly makes model learning easy
- This is full soccer, so the reward is sparse: 1 only when a goal is scored
- They claim that in continuous spaces, model learning is most successful when building local function approximators (but they don’t explain that adequately here)
- Looks like they build a different model for each tiling? (per-tiling model sketch at the end)
- They have to do some awkward hacks so agents can share data in a way that doesn’t ruin the policies
- They cite MBIE (model-based interval estimation) as a contrast to their own approach, but the connection isn’t exactly right
- They have to do random restarts of policies before arriving at one that can be learned from
- They compare Q(lambda) to PIPE (Probabilistic Incremental Program Evolution); Q(lambda) sketch at the end
- It looks like they give PIPE 5x as much training data?
- The “CMAC model” algorithm is best, PIPE is worst, and the regular model-free CMAC Q(lambda) lands in the middle
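
Sketches for my own reference, below.

First, prioritized sweeping: a minimal tabular sketch, assuming discrete hashable states and a small discrete action set. This is my reconstruction of the generic algorithm, not the paper’s code (theirs sits on top of CMAC tilings), and GAMMA, THETA, N_BACKUPS, and ACTIONS are all placeholder choices.

```python
import heapq
import itertools
from collections import defaultdict

GAMMA, THETA, N_BACKUPS = 0.95, 1e-4, 10   # placeholder settings
ACTIONS = range(4)                          # hypothetical discrete action set

Q = defaultdict(float)                          # Q[(s, a)]
counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s2] = visit count
R = defaultdict(float)                          # running mean reward of (s, a)
preds = defaultdict(set)                        # preds[s2] = {(s, a) that led to s2}
pq, tick = [], itertools.count()                # max-heap via negated priority

def backup_target(s, a):
    """One-step Bellman backup through the learned empirical model."""
    c = counts[(s, a)]
    n = sum(c.values())
    return R[(s, a)] + GAMMA * sum(
        k / n * max(Q[(s2, b)] for b in ACTIONS) for s2, k in c.items())

def push(s, a):
    """Queue (s, a) if backing it up would change its value by more than THETA."""
    p = abs(backup_target(s, a) - Q[(s, a)])
    if p > THETA:
        heapq.heappush(pq, (-p, next(tick), s, a))  # tick breaks priority ties

def observe(s, a, r, s2):
    """Record one real transition, then sweep the highest-priority backups."""
    c = counts[(s, a)]
    c[s2] += 1
    R[(s, a)] += (r - R[(s, a)]) / sum(c.values())  # incremental mean reward
    preds[s2].add((s, a))
    push(s, a)
    for _ in range(N_BACKUPS):
        if not pq:
            break
        _, _, s0, a0 = heapq.heappop(pq)
        Q[(s0, a0)] = backup_target(s0, a0)
        for sp, ap in preds[s0]:  # s0's value moved: requeue its predecessors
            push(sp, ap)
```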
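On the per-tiling models (the “different model for each tiling” bullet): one way it might look, assuming a 2-D state in the unit square and uniformly offset tilings. The layout constants and offset scheme are my guesses, not the paper’s.

```python
import numpy as np
from collections import defaultdict

N_TILINGS, TILES_PER_DIM = 8, 10       # hypothetical CMAC layout
LOW = np.array([0.0, 0.0])             # hypothetical state bounds
HIGH = np.array([1.0, 1.0])

def tile_index(state, t):
    """Discretize a continuous state in tiling t (each tiling is offset)."""
    offset = t / (N_TILINGS * TILES_PER_DIM)           # stagger tilings evenly
    scaled = (state - LOW) / (HIGH - LOW) * TILES_PER_DIM + offset
    return tuple(np.clip(scaled.astype(int), 0, TILES_PER_DIM - 1))

# One tabular model per tiling: models[t][(tile, a)][tile2] = visit count.
models = [defaultdict(lambda: defaultdict(int)) for _ in range(N_TILINGS)]

def update_models(s, a, s2):
    for t in range(N_TILINGS):
        models[t][(tile_index(s, t), a)][tile_index(s2, t)] += 1
```

Averaging predictions across the per-tiling models would mirror how CMAC averages value estimates across tilings, which is presumably the “local function approximator” idea they gesture at.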
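And the model-free baseline: a minimal tabular Watkins-style Q(lambda) sketch. The paper’s learner puts Q(lambda) on top of CMAC function approximation; hyperparameters and the action set here are placeholders.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, LAM, EPS = 0.1, 0.95, 0.9, 0.1   # placeholder hyperparameters
ACTIONS = range(4)                              # hypothetical action set

Q = defaultdict(float)   # Q[(s, a)]
e = defaultdict(float)   # eligibility traces

def act(s):
    """Epsilon-greedy action selection."""
    if random.random() < EPS:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def step(s, a, r, s2):
    """One Q(lambda) update with replacing traces."""
    a_greedy = max(ACTIONS, key=lambda b: Q[(s2, b)])
    delta = r + GAMMA * Q[(s2, a_greedy)] - Q[(s, a)]
    e[(s, a)] = 1.0                    # replacing trace for the visited pair
    for key in list(e):
        Q[key] += ALPHA * delta * e[key]
        e[key] *= GAMMA * LAM
    # Watkins' variant also zeroes all traces after an exploratory action;
    # the caller would do: if the next action taken != greedy, e.clear()
```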