Ideas From: Monte-Carlo Simulation Balancing. Silver, Tesauro

  • Not really my style of paper, so I didn't read it carefully. Also, the results don't seem as good as those of some other approaches, such as UCT-RAVE
  • Goal is to optimize balance of simulation policy rather than its strength
    • A policy with small error is called strong; one with small expected error (its errors cancel out on average) is called balanced
    • I haven't spent time thinking about this, but I don't see why the distinction is critical
  • Uses policy-gradient methods to adjust the parameters of the simulation policy
  • Seems to be designed for two-player games; tested on Go
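
To make the strong/balanced distinction and the policy-gradient bullet a bit more concrete, here is a toy sketch in Python. It is my own illustration under loudly stated assumptions, not the paper's algorithm or its Go features: a rollout lasts a fixed number of moves, the policy chooses between two actions through a single softmax parameter `theta`, and each action adds a hypothetical fixed error to the outcome. Strength is measured here as mean squared per-move error and balance as mean per-move error. The update follows the general idea of simulation balancing by policy gradient: estimate V(s) from one batch of rollouts, estimate dV/dθ with a REINFORCE-style score from a second, independent batch, and step θ to shrink (V* − V)². All names (`ERRORS`, `V_STAR`, `simulation_balancing`, etc.) are hypothetical.

```python
import math
import random

# Toy model (assumption, not the paper's setup): each rollout lasts T moves;
# action 0 or 1 shifts the outcome away from the true value V* by a fixed error.
T = 10
ERRORS = (0.1, -0.3)   # hypothetical per-move error of action 0 and action 1
V_STAR = 0.0           # hypothetical true value of the start position


def prob_action0(theta):
    """Softmax over two actions with logit difference theta (i.e. a sigmoid)."""
    return 1.0 / (1.0 + math.exp(-theta))


def rollout(theta, rng):
    """Simulate one game; return outcome z and sum_t d/dtheta log pi(a_t)."""
    p0 = prob_action0(theta)
    z, score = V_STAR, 0.0
    for _ in range(T):
        if rng.random() < p0:
            z += ERRORS[0]
            score += 1.0 - p0   # d/dtheta log p0
        else:
            z += ERRORS[1]
            score += -p0        # d/dtheta log(1 - p0)
    return z, score


def simulation_balancing(iterations=100, m=100, n=100, alpha=0.5, seed=0):
    """Gradient descent on (V* - E[z])^2 using two independent rollout batches."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        # Batch 1: Monte-Carlo estimate of V(s) = E[z].
        v_hat = sum(rollout(theta, rng)[0] for _ in range(m)) / m
        # Batch 2: score-function estimate of dV/dtheta = E[z * sum_t psi_t].
        grad = 0.0
        for _ in range(n):
            z, score = rollout(theta, rng)
            grad += z * score
        grad /= n
        # Step that reduces (V* - V)^2; the constant 2 is folded into alpha.
        theta += alpha * (V_STAR - v_hat) * grad
    return theta


def per_move_stats(p0):
    """Exact (mean error, mean squared error) of a single move: balance, strength."""
    mean = p0 * ERRORS[0] + (1 - p0) * ERRORS[1]
    mse = p0 * ERRORS[0] ** 2 + (1 - p0) * ERRORS[1] ** 2
    return mean, mse


theta = simulation_balancing()
p0 = prob_action0(theta)
print(f"learned P(action 0) = {p0:.3f}")
print("balanced policy (mean, mse):", per_move_stats(p0))
print("always-action-0 policy (mean, mse):", per_move_stats(1.0))
```

In this toy, the learned probability of action 0 should settle near 0.75, since 0.75 · 0.1 + 0.25 · (−0.3) = 0. The "strongest" policy (always action 0) has the smaller per-move squared error (0.01 vs. about 0.03) but biases every rollout by +1.0, while the mixed policy is weaker per move yet its errors cancel in the Monte-Carlo average.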
