Model-based Reinforcement Learning in a Complex Domain. Kalyanakrishnan, Stone, Liu. RoboCup Symposium 2007.


  1. Seems to be from RoboCup-2007: Robot Soccer World Cup XI, a symposium.
  2. The approach seems to blend a generative model (they have a hand-written model of the transition function) with model learning (the termination and reward functions are learned from data):
    1. “the next state s’ is computed by applying a simple rule to the current state s. The rule simply assumes the players do not change their positions between s and s’.”  This works because the players’ behavior was modified slightly from the original publication so that they move less.
    2. “In our model, the termination and reward predictors are trained through supervised learning using the observed experiences.”
  3. Action values seem to be approximated using CMACs:
    1. “Q is now updated using transitions that are simulated using M (lines 18-36). This is accomplished by generating trajectories of depth *depth* using M, beginning with some random start state…”
    2. The method is described as being similar to experience replay.
  4. They try adding deeper rollouts, but inaccuracies in the model degrade performance.
  5. They try the algorithm with totally myopic actions (hold unless hold is expected to be terminal, otherwise kick), and the performance is worse than the full learning approach but still not bad, which indicates that short rollouts probably work reasonably well.
  6. They also discuss the relation to Dyna-Q.
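The hybrid model in point 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the transition rule is hand-coded (players are assumed not to move between s and s'), while the termination and reward predictors are stood in for by a toy nearest-neighbor lookup in place of the paper's supervised learners. All names are assumptions.

```python
class HybridModel:
    """Illustrative hybrid of a hand-written transition rule and
    learned termination/reward predictors (names are hypothetical)."""

    def __init__(self):
        self.experience = []  # observed (state, action, reward, terminal) tuples

    def observe(self, state, action, reward, terminal):
        """Record a real transition; these train the learned components."""
        self.experience.append((state, action, reward, terminal))

    def predict_next_state(self, state, action):
        """Hand-written transition rule: positions are assumed unchanged."""
        return state

    def predict(self, state, action):
        """Termination/reward prediction; a 1-nearest-neighbor lookup stands
        in here for the paper's supervised learners. Requires at least one
        observed experience."""
        best = min(
            self.experience,
            key=lambda e: sum((a - b) ** 2 for a, b in zip(e[0], state))
            + (0.0 if e[1] == action else 1e9),  # strongly prefer same action
        )
        _, _, reward, terminal = best
        return reward, terminal
```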
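The Dyna-style update described in point 3 (and related to point 6) can be sketched as a planning loop: Q is updated on trajectories of bounded depth simulated from a model M, each starting from a random observed start state. A tabular Q dictionary stands in here for the paper's CMAC function approximator, and all names and defaults are assumptions.

```python
import random

def planning_updates(Q, model, start_states, actions,
                     n_trajectories=10, depth=3, alpha=0.1, gamma=1.0):
    """Update Q on trajectories simulated from `model` (Dyna-style sketch).
    `model` is assumed to expose predict(s, a) -> (reward, terminal) and
    predict_next_state(s, a) -> s'."""
    for _ in range(n_trajectories):
        s = random.choice(start_states)
        for _ in range(depth):
            # Greedy action under the current value estimates.
            a = max(actions, key=lambda act: Q.get((s, act), 0.0))
            r, terminal = model.predict(s, a)
            s2 = model.predict_next_state(s, a)
            # Q-learning target on the simulated transition.
            target = r if terminal else r + gamma * max(
                Q.get((s2, a2), 0.0) for a2 in actions)
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            if terminal:
                break
            s = s2
```

This is the sense in which the method resembles experience replay: the real environment is touched only through the model trained on observed experience, while value updates happen on simulated transitions.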
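The myopic baseline in point 5 reduces to a two-line decision rule. In this sketch, `predicts_terminal` is a hypothetical query to the learned termination predictor, not a function from the paper.

```python
def myopic_action(predicts_terminal, state):
    """Totally myopic policy from the summary: hold unless the model
    predicts that holding would terminate the episode (e.g. the ball
    is about to be lost), in which case kick."""
    if predicts_terminal(state, "hold"):
        return "kick"
    return "hold"
```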
