While I think model building in Pitfall is probably the most interesting aspect of the problem, it occurred to me this morning that it may not even be necessary in order to solve the actual problem.
The reason is that, since we are running Pitfall in an emulator, we have the ability to save and load game states at any point (assuming I haven’t nuked that behavior by accident with all the edits I’ve made). So although we do not have a true generative model, we already have everything a forward-searching algorithm requires.
For example, it should be simple to do the exhaustive type of rollout where we try every action from every state and enumerate every possible state at some horizon, by building a tree whose depth equals the horizon length and whose fan-out equals the number of actions (just for argument’s sake; I know this in particular isn’t practical). It is simply a tree with the start state at the root, an action executed at each branch, the resulting true game state (which we actually have access to) at the following node, and so on.
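A minimal sketch of that exhaustive rollout, assuming a hypothetical emulator interface with `save_state()`/`load_state()`/`act()` (the real emulator's calls will differ; `ToyEmulator` here is just a stand-in so the sketch runs):

```python
class ToyEmulator:
    """Stand-in for the real emulator: the 'game state' is just an integer."""
    def __init__(self):
        self.state = 0

    def save_state(self):
        return self.state

    def load_state(self, state):
        self.state = state

    def act(self, action):
        # In the real emulator this would step the game by one action/frame.
        self.state = self.state * 10 + action

def rollout_tree(emu, actions, horizon):
    """Return every reachable state at the given horizon by trying every
    action from every state (the impractical-but-simple exhaustive search)."""
    if horizon == 0:
        return [emu.save_state()]
    root = emu.save_state()
    leaves = []
    for a in actions:
        emu.load_state(root)   # rewind to the branch point
        emu.act(a)             # follow one branch
        leaves.extend(rollout_tree(emu, actions, horizon - 1))
    emu.load_state(root)
    return leaves

leaves = rollout_tree(ToyEmulator(), actions=[1, 2], horizon=3)
# Fan-out 2 at depth 3 → 2**3 = 8 leaf states.
```

The point is that `save_state`/`load_state` stand in for the generative model: every node in the tree is a true game state, not a prediction.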
From my perspective at this point it seems there are four interesting parts to the whole problem:
- Object detection: Recognize a wall, ladder, etc as such, in order to reduce the complexity of the problem for the learners. This should be done totally automatically with no input about the number of objects in the game, what objects in the game look like, etc.
- Model learning: Building an actual model of the game dynamics. This could be done without object detection, but my expectation is that without object detection this would be extremely difficult. These predictions are used by the planner (possibly through a merger).
- Merging: If we have many learners, how do we take input from all of them and turn that into a coherent picture for the planner?
- Planning: How to get the reward. My proposal is that right now we exploit the save/load functionality the emulator provides to play the role of the model, but in the future actual learned models could be substituted.
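One way to make that substitution painless is to have the planner talk to a small model interface rather than the emulator directly. A sketch, assuming a hypothetical `predict(state, action) -> (next_state, reward)` method (names are illustrative; `ToyModel` stands in for either the emulator-backed model or a learned one):

```python
def plan(model, state, actions, horizon):
    """Exhaustive depth-limited search against the model interface:
    return the first action of the highest-total-reward action sequence."""
    def q(s, a, h):
        # Reward for taking a in s, plus the best achievable afterwards.
        s2, r = model.predict(s, a)
        return r + value(s2, h - 1)

    def value(s, h):
        if h == 0:
            return 0.0
        return max(q(s, a, h) for a in actions)

    return max(actions, key=lambda a: q(state, a, horizon))

class ToyModel:
    """Trivial stand-in model: reward equals the action taken."""
    def predict(self, state, action):
        return state + action, float(action)

best = plan(ToyModel(), state=0, actions=[0, 1, 2], horizon=2)
# → 2 (the largest action is always best in this toy model)
```

Today `predict` would be implemented by loading the state into the emulator and stepping it; later a learned model implements the same method and the planner is untouched.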
Since each of those four components is probably pretty challenging on its own, I suppose it would be useful to think about the order in which to do them. Although doing rollout and storing true game states somewhat obviates the need to have a nice representation, I think it would be nice to have it anyway (this would also allow us to snap in real learners with less difficulty later on). Therefore, I think the order of work should be as follows:
- Object Detection: Should make the work of everything else easier (or at least not any harder)
- Planning: If we can’t get a planner to work with true predictions, the learners certainly aren’t going to be useful at all, so we need to lick this problem up front.
- Learning: The icing on the cake, allows us to throw away a dependency on the save/load function of emulators, and leaves us with a very nice story if it all works out.
- Merging: Only necessary in the case that we end up with multiple learners, and the learners themselves work. Certainly not necessary if we are leveraging the emulator at first, since it’s always right.
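For when merging does become necessary, one candidate scheme is to have each learner report a confidence alongside each predicted feature and let the merger keep the most confident vote per feature. This is purely illustrative; the real learner interface is still open and all names here are hypothetical:

```python
def merge(predictions):
    """predictions: list of dicts mapping feature -> (value, confidence).
    Returns one coherent picture: a dict mapping feature -> value,
    taking the most confident learner's vote for each feature."""
    best = {}
    for pred in predictions:
        for feature, (value, conf) in pred.items():
            if feature not in best or conf > best[feature][1]:
                best[feature] = (value, conf)
    return {f: v for f, (v, _) in best.items()}

picture = merge([
    {"player_x": (10, 0.9), "on_ladder": (True, 0.2)},
    {"player_x": (12, 0.4), "on_ladder": (False, 0.8)},
])
# → {"player_x": 10, "on_ladder": False}
```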
This may be a lot at once; does it make sense?