The Successor Representation and Temporal Context. Gershman, Moore, Todd, Norman, Sederberg. Neural Computation 2012.

  1. Is the successor representation used for RL and episodic learning in the brain?
  2. “Our main contribution is to show that a variant of the temporal context model (TCM; Howard & Kahana, 2002), an influential model of episodic memory, can be understood as directly estimating the successor representation using the temporal difference learning algorithm…”
  3. “…this equivalence suggests a previously unexplored point of contact between different learning systems.”
  4. Learning properties of temporally extended sequences is more difficult than simply predicting what comes next (for example, predicting value vs. reward in RL).
  5. There is a relationship to language understanding as well (“…people anticipate long-distance dependencies between words…”)
    1. This also shows up in episodic memory
  6. The paper is concerned with Markov chains (as opposed to MDPs)
  7. “… we draw a formal connection between TD learning of the SR and an influential model of episodic memory, the temporal context model… resulting in a generalized form of TCM.”  TCM is a TD method of producing a successor representation
  8. The SR already incorporates discounting, so value estimates (and other estimates of discounted expectation) can be read off it with a single linear operation
  9. They call the successor representation M
  10. M could be learned by learning the transition function and doing some linear algebra (including an inversion), but it can also be learned through TD, which is the angle taken in this paper
  11. “TCM [The Temporal Context Model] was originally designed to describe the associative processes that underlie episodic memory and applied to human behavior in free recall experiments…”
  12. In free recall, the response does not require items to be reported in the order presented (unlike serial recall tasks), yet people still reproduce the temporal structure of the list. The latency between successively recalled items is also related to how close together in time they were studied
  13. Items also tend to be recalled in the forward order of presentation rather than in reverse
  14. “During memory retrieval, the current state of the context vector is used as a retrieval cue. Items are sampled from memory according to how well the context cue (at test) matches the context associated with the item at study.”
  15. And items studied close-by in time have similar temporal contexts
  16. Part of the cue is symmetric, but part only has impact forward in time, so nearby items are recalled more easily, but items that come after are even easier to retrieve
  17. This model also explains recency effects (that the last items in a series are recalled more easily) because if tested immediately after learning, the temporal context during testing is very similar to that during the end of learning
    1. On the other hand, if a distractor task is placed after learning, the impact of recency is greatly reduced.  This is because the distractor task changes the temporal context between the end of learning and testing
  18. Previous examinations of TCM (temporal context model) were done in the special case where items were not repeated. Because the version here rests on the successor representation (SR) learned by TD(lambda), it also makes particular predictions in the case where items are repeated
    1. When there are no repeats, TD and Hebbian learning (the more traditional model) make identical predictions, but when there are repeats the predictions diverge
  19. Another distinction between TD and Hebbian learning is that TD is error-driven, while Hebbian is purely associative. This means TD updates only when errors in prediction occur; Hebbian learning, on the other hand, continues to strengthen associations
    1. “It is worth noting that from a neurobiological point of view, Hebbian learning of this sort can lead to instabilities that have disastrous consequences for a memory system [citations]…”
    2. Conversely, evidence has been found for TD learning in RL
  20. Hebbian learning also forms only immediate associations, not the temporally extended predictions made by TD(lambda) and the successor representation
  21. Ah, no empirical work – just the theoretical comparison of TD and Hebbian
  22. TCM is a previously established model that is connected to TD-SR here
  23. “… resonates with neuroscientific work suggesting a close connection between remembering the past and envisioning the future… Damage to this network, particularly the hippocampus, impairs both autobiographical memory and episodic future thinking…”
  24. “Retrieving an association from memory thereby corresponds to activating a prediction.”
  25. It's plausible that structures of the hippocampus could be doing computations that occur in TD-SR
  26. May also be based in anterior temporal cortex and lateral PFC
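
To make notes 8 to 10 concrete, here is a minimal sketch of the two routes to M for a Markov chain with transition matrix T: the linear-algebra route, using the standard definition M = sum over k of gamma^k T^k = (I - gamma T)^-1, and the TD route, which nudges the row of M for the current state toward "one-hot of the current state plus gamma times the next state's row". Everything specific here is my own assumption for illustration and not taken from the paper: the 3-state ring chain, the values of `GAMMA` and `ALPHA`, the reward vector, and the function names. It is also plain TD(0) on a tabular M, not the TD(lambda)/context-vector formulation the paper connects to TCM.

```python
# Toy 3-state ring chain 0 -> 1 -> 2 -> 0 (hypothetical example, not from the paper).
GAMMA = 0.5  # discount factor (assumed value)
ALPHA = 0.1  # TD learning rate (assumed value)
T = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
N = len(T)


def matmul(a, b):
    """Multiply two N x N matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def sr_from_transitions(T, gamma, terms=200):
    """Approximate M = (I - gamma*T)^-1 by the truncated series sum_k (gamma*T)^k."""
    gT = [[gamma * T[i][j] for j in range(N)] for i in range(N)]
    power = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
    M = [[0.0] * N for _ in range(N)]
    for _ in range(terms):
        M = [[M[i][j] + power[i][j] for j in range(N)] for i in range(N)]
        power = matmul(power, gT)
    return M


def sr_from_td(T, gamma, alpha, steps=6000):
    """Learn M by TD(0) while walking the (deterministic) chain from state 0."""
    M = [[0.0] * N for _ in range(N)]
    s = 0
    for _ in range(steps):
        s_next = T[s].index(1.0)  # deterministic chain: the single successor
        for j in range(N):
            # Prediction error for "discounted future occupancy of state j".
            target = (1.0 if j == s else 0.0) + gamma * M[s_next][j]
            M[s][j] += alpha * (target - M[s][j])
        s = s_next
    return M


def values(M, R):
    """Value estimates as a linear readout of the SR: V = M R."""
    return [sum(M[i][j] * R[j] for j in range(N)) for i in range(N)]


M_exact = sr_from_transitions(T, GAMMA)
M_td = sr_from_td(T, GAMMA, ALPHA)
V = values(M_exact, [0.0, 0.0, 1.0])  # reward only in state 2 (assumed)

for row_exact, row_td in zip(M_exact, M_td):
    print([round(x, 4) for x in row_exact], [round(x, 4) for x in row_td])
print([round(v, 4) for v in V])
```

On this toy chain the two estimates of M agree, and swapping in a different reward vector changes V without relearning M, which is the sense in which value comes from the SR by a linear operation.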
