Non-Linear Monte-Carlo Search in Civilization II. S.R.K. Branavan, David Silver, Regina Barzilay. IJCAI 2011

  1. Based on the title this must be the most awesomest paper ever.  It's the culmination of my entire existence on earth.
  2. Apply non-linear regression within MCTS to estimate Q-function from random rollouts.
  3. Because it generalizes, it doesn’t need many rollouts to make an estimate
  4. The work also pulls information out of the game manual (which seems to be the focus of a separate paper)
  5. The non-linear FA is much more effective than a linear one
  6. Result beats built-in AI 78% of the time.
    1. That is awesome, but this is an AI that dates back to 1996 and was designed to run on home PCs of the era (33 MHz), so we might expect it isn’t amazing.
    2. They actually used FreeCiv, so this isn’t totally right (it was still designed for 100 MHz machines), and the AI was constrained not to cheat as it does in the real Civ (which is very fair).  I imagine the AI in FreeCiv is much better than the one in Civ2, though.
    3. They do, however, treat unfinished games as a loss, so draws are awarded to the built-in AI
  7. The branching factor in the game is extremely huge, so I’m amazed anything works at all.
  8. This is similar to the Go work that used FAs to learn an evaluation function, but there linear FAs were used
  9. The VFA (value function approximator) is built locally in time (from a particular state)
    1. Global VFA was effective in Backgammon, but not Go or Civ, which are larger
  10. Elements of manual relevant to current game state are modeled as hidden variables in the non-linear VFA
  11. Use 4-layer NN for VFA, trained using rollouts.  Layers are as follows:
    1. Game state, action, and game manual
    2. Linguistic model of manual
    3. Fixed features that perform predefined transformations on first 2 layers
    4. Value function
  12. Rollouts train all of these, including the linguistic model
  13. They argue nonlinear VFAs generalize better from a small number of rollouts – that's nonintuitive to me, since the hypothesis space of linear VFAs is smaller and should be learned more quickly (even if the accuracy is worse at the end)
  14. ANNs are helpful because they can process the input in stages
  15. There is already some literature on RL in Civ, but this is the first work that plays the entire game, as opposed to some subset of it, or only selecting from options
  16. Use game rules and opponent actions as part of the black-box transition function
  17. Manual text that is extracted and fed into the ANN depends on the situation
  18. It's funny they don't at least carry over the ANN from the last step as an initial point to start learning. I guess it biases samples and random sampling is better.
  19. Stanford parser used on the manual
  20. Rollouts are 20 steps long and use the game score as the evaluation function.  500 rollouts are performed
  21. Almost half a million features fed into the ANN
  22. Exploration is epsilon-greedy
  23. Played through a 100-step game in 1.5 hours, not so bad
  24. Compare to 3 other methods:
    1. MCTS: each item in the game computes its search tree independently; this is the only algorithm that explicitly builds a search tree
    2. Linear MC – similar to the linear VFA method used for Go
    3. Non-linear MC is basically the same method as in the paper, except no manual
    4. Random-Text same as proposed method, but manual has scrambled word order (but same word content)
  25. The other comparison methods are much less effective against the AI (next best is non-linear MC with about 30%).
    1. Regular MCTS didn’t win a single game
  26. It's worth mentioning that they use the Civ2 manual to play FreeCiv.  While they are very similar, they are not exactly the same game.
  27. There is another paper, “Learning to Win by Reading Manuals in a Monte-Carlo Framework”, that focuses more on how the manual is used.
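
To make the core loop from the notes concrete (a value function approximator built fresh at each decision, trained from epsilon-greedy rollouts that are scored at the end, then used to pick the action), here is a minimal sketch. It is not the authors' implementation: the toy one-dimensional game, the `TinyNet` class, the feature encoding, the hidden-layer size, and the learning rate are all assumptions made up for illustration, and there is no manual text or linguistic layer at all. Only the rollout count (500), rollout length (20), epsilon-greedy exploration, and use of the score as the training target come from the notes above.

```python
import math
import random

ACTIONS = (-1, 0, 1)   # hypothetical toy game: move left, stay, move right
GOAL = 3               # hypothetical target position on a line from -5 to 5


class TinyNet:
    """One-hidden-layer tanh network trained by SGD on squared error (illustrative)."""

    def __init__(self, n_in, n_hidden, rng, lr=0.05):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.lr = lr

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w2, self.hidden(x)))

    def update(self, x, target):
        h = self.hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) - target
        for j, hj in enumerate(h):
            # Gradient through the hidden unit, computed before w2[j] changes.
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= self.lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * grad_h * xi


def features(state, action):
    # Assumed encoding: scaled position, action, and a bias term.
    return [state / 5.0, float(action), 1.0]


def step(state, action):
    # Black-box transition function of the toy game.
    return max(-5, min(5, state + action))


def score(state):
    # Stand-in for the game score used to evaluate the end of a rollout.
    return -abs(state - GOAL) / 5.0


def choose_action(state, rng, n_rollouts=500, depth=20, epsilon=0.1):
    # A fresh approximator per decision: the VFA is local in time.
    net = TinyNet(3, 8, rng)
    for _ in range(n_rollouts):
        s, visited = state, []
        for _ in range(depth):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)          # explore
            else:
                a = max(ACTIONS, key=lambda act: net.predict(features(s, act)))
            visited.append(features(s, a))
            s = step(s, a)
        outcome = score(s)
        # Train Q(s, a) toward the rollout's final score for every visited pair.
        for x in visited:
            net.update(x, outcome)
    return max(ACTIONS, key=lambda act: net.predict(features(state, act)))


if __name__ == "__main__":
    print(choose_action(0, random.Random(0)))
```

Note that `choose_action` throws the network away after each decision, which mirrors the point in item 18 about not carrying the ANN over between steps. Swapping `TinyNet` for a linear model would give something closer in spirit to the Linear MC baseline, again only as a toy analogue.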
