TD Models: Modeling the World at a Mixture of Time Scales. Sutton. ICML 1995


  1. Allows model-building at multiple time scales within a single structure
  2. Uses TD for building these models
  3. Says the approach may be relevant to hierarchical planning, which attacks a problem at multiple (usually temporal) scales
  4. Discusses n-step as well as beta-models (the latter are learned TD-style)
  5. Ignores actions, is concerned only with sequence of states and the rewards, so seems to be on-policy
  6. Multiple n-step models can be combined to estimate the value function
    1. The issue is we would need many of these, because each works for only exactly one value of n
    2. Also may be expensive to learn for large n
  7. Instead of using the exact model for step n, it may be better to average over a range of values, say n-5 to n+5
  8. “The predictions of the different time scales are linearly mixed and yet still they can be used in backup operations without altering convergence to V”
  9. In simple beta-models, the weight of predictions falls off exponentially with delay
  10. In full beta-models a different scheme is used that can have an arbitrary weighting profile and depends on the particular sequence of states (not sure how this has bought us anything yet)
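To make points 6 to 9 concrete for myself, here is a minimal sketch on a made-up 3-state Markov reward process (no actions, as in point 5). The transition matrix `P`, rewards `r`, discount `gamma`, and all function names are my own assumptions for illustration, not notation or code from the paper. It assumes the usual discounted setting, builds exact n-step models, a uniform mixture over a window of n, and a simple beta-model taken here as a truncated and renormalised mixture with weights proportional to beta^(n-1), then checks that backing up with any of them leaves V unchanged.

```python
def mat_vec(A, x):
    """Matrix-vector product for small list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Matrix-matrix product for small list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def n_step_model(P, r, gamma, n):
    """Exact n-step model (r_n, P_n) of a Markov reward process.

    r_n[s]    = expected discounted reward over the next n steps from s
    P_n[s][t] = gamma**n * Pr(state after n steps is t | start in s)
    """
    size = len(r)
    gP = [[gamma * p for p in row] for row in P]
    r_n = [0.0] * size
    P_k = [[float(i == j) for j in range(size)] for i in range(size)]  # (gamma P)^0
    for _ in range(n):
        r_n = [a + b for a, b in zip(r_n, mat_vec(P_k, r))]
        P_k = mat_mul(gP, P_k)
    return r_n, P_k


def mix_models(models, weights):
    """Linear mixture of models; the weights are assumed to sum to 1."""
    size = len(models[0][0])
    r_mix = [sum(w * m[0][i] for m, w in zip(models, weights)) for i in range(size)]
    P_mix = [[sum(w * m[1][i][j] for m, w in zip(models, weights))
              for j in range(size)] for i in range(size)]
    return r_mix, P_mix


def simple_beta_model(P, r, gamma, beta, horizon=60):
    """Mixture of n-step models with weights proportional to beta**(n-1).

    Assumption: truncated at `horizon` and renormalised so the weights sum to 1.
    """
    weights = [(1 - beta) * beta ** (n - 1) for n in range(1, horizon + 1)]
    total = sum(weights)
    weights = [w / total for w in weights]
    models = [n_step_model(P, r, gamma, n) for n in range(1, horizon + 1)]
    return mix_models(models, weights)


def backup(model, v):
    """One backup of the value estimate v through a model: r_m + P_m v."""
    r_m, P_m = model
    return [a + b for a, b in zip(r_m, mat_vec(P_m, v))]


def solve_by_backups(model, size, sweeps=500):
    """Repeat the backup from v = 0; each model here is a contraction."""
    v = [0.0] * size
    for _ in range(sweeps):
        v = backup(model, v)
    return v


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(a, b))


# Hypothetical 3-state chain, chosen only for illustration.
P = [[0.5, 0.5, 0.0], [0.1, 0.6, 0.3], [0.2, 0.0, 0.8]]
r = [1.0, 0.0, -1.0]
gamma = 0.9

one_step = n_step_model(P, r, gamma, 1)
V = solve_by_backups(one_step, len(r))  # ordinary policy evaluation

window = [n_step_model(P, r, gamma, n) for n in range(3, 14)]  # n-5..n+5 with n = 8
models = {
    "3-step": n_step_model(P, r, gamma, 3),
    "uniform mix over n = 3..13": mix_models(window, [1 / len(window)] * len(window)),
    "simple beta-model, beta = 0.5": simple_beta_model(P, r, gamma, 0.5),
}

for name, model in models.items():
    print(name, "leaves V fixed:", close(backup(model, V), V))
```

The reason the mixtures work is that each n-step model individually satisfies V = r_n + P_n V, so any weighting that sums to 1 does as well. That is how I read the quote in point 8, and it is why the window average and the exponential weighting can be used in backups just like a single n-step model.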
