Ideas From: Monte-Carlo Simulation Balancing. Silver, Tesauro


  • Not really my style of paper, so I didn’t read it carefully. Also, the results don’t seem as good as some other approaches, such as UCT-RAVE
  • Goal is to optimize balance of simulation policy rather than its strength
    • A policy with a small error is called strong; one with a small expected error is called balanced
    • I haven’t spent time thinking about this, but I don’t see how the distinction is critical
  • Uses policy gradient
  • Seems to be designed for 2-player games; tested on Go
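
The strong/balanced distinction in the notes above can be made concrete with a toy calculation. This is only a sketch under assumptions: the error values, the `simulate_errors` helper, and the idea of summing per-move errors over a simulation are all invented for illustration and are not taken from the paper. It compares a policy whose per-move errors are small but always in the same direction with one whose per-move errors are larger but cancel in expectation.

```python
import random


def simulate_errors(per_move_errors, n_moves, n_sims, seed=0):
    """Return (mean absolute per-move error, absolute mean total error).

    per_move_errors: hypothetical list of error values, one of which is
    drawn uniformly at random for every simulated move.
    """
    rng = random.Random(seed)
    abs_move_sum = 0.0
    total_sum = 0.0
    for _ in range(n_sims):
        sim_total = 0.0
        for _ in range(n_moves):
            e = rng.choice(per_move_errors)
            abs_move_sum += abs(e)
            sim_total += e
        total_sum += sim_total
    mean_abs_move = abs_move_sum / (n_sims * n_moves)
    abs_mean_total = abs(total_sum / n_sims)
    return mean_abs_move, abs_mean_total


# Toy numbers (assumptions, not results from the paper).
small_but_biased = [0.1]          # small error, always the same sign
large_but_balanced = [-0.3, 0.3]  # larger error, zero mean

s_move, s_total = simulate_errors(small_but_biased, n_moves=50, n_sims=2000)
b_move, b_total = simulate_errors(large_but_balanced, n_moves=50, n_sims=2000)

print("biased:   per-move", s_move, "expected total", s_total)
print("balanced: per-move", b_move, "expected total", b_total)
```

In this toy setup the first policy looks better move by move, while the second has the smaller expected error over a whole simulation. Whether that gap matters in practice is a separate question that the sketch does not settle.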