Generalization of Value in Reinforcement Learning by Humans. Wimmer, Daw, Shohamy. European Journal of Neuroscience 2013.

  1. “… basic RL is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision making.”
  2. Hippocampus is generally regarded as important in memory and stimulus-stimulus relationships (generalization, classification, use of distance metrics)
  3. Discusses an fMRI study that used an RL task with a “… relational structure, modeled after tasks used to isolate hippocampal contributions to memory.”
  4. Observe BOLD activity in striatum and hippocampus
  5. A model that allowed for generalization fit participants’ choices better than the basic RL model
  6. “… functional connectivity between the ventral striatum and hippocampus was modulated, across participants, by the ability of the augmented model to capture participants’ choice.”
  7. “… hippocampal memories represent the relation between multiple arbitrary associated stimuli. Due to their nature hippocampal memories are flexible and can be generalized across stimuli and contexts…”
  8. If people learn that stimulus A is followed by outcome X, and B is also followed by X, they will transfer other knowledge about A to B (and vice versa)
  9. Here options (actions) are “coupled” in terms of a correlated reward as opposed to a resulting stimulus
  10. Task was a four-armed bandit where each arm was represented by a different face
  11. Drift in actual rewards
  12. Faces were secretly paired so that they had equivalent reward payouts
  13. Initial modeling was done based on a value estimate that was modified based on whether the previous selection provided a reward or not
    1. Also had a component to factor in structure in the bandit task (for arms that are correlated). With the additional parameter set to 0, it models behavior as if there is no relationship, so it’s a generalization of previous similar models
    2. Assumed to be fixed
  14. It looks like this model already knows which actions are paired? Unclear.  I would have expected that not to be the case as the subjects aren’t told the pairing
  15. At least it looks like the other parameters were fit with that parameter (alpha2, defining the amount that behavior is related between connected options) clamped to 0
  16. 300 trials, on average subjects changed choices 115 times
  17. Looks like subjects weren’t explicitly aware of the pairing: “pairing performance did not differ from chance”. I think what they are referring to here is asking subjects specifically which faces were paired at the end of the task
  18. Effect of previous reward was significant in predicting choice, but not previous choice itself
  19. Both base and generalization models explain performance better than chance
  20. The generalization model’s improvement over the base model was greater than would be expected by chance from adding the extra parameter
  21. Bayesian analysis puts 85% of subjects as generalizers and 15% as not.
  22. “…generalization learning rates were approximately 13% of the primary learning rate.” Low, but if you then separate out that rate for the first and second half of the experiment, it’s significantly stronger for the second half
  23. Looking to see if BOLD effects in VS were naive to generalization
  24. If options are coupled, reward estimates change for both members of a pair after each outcome, so the generalization model makes different predictions for reward prediction error (RPE) than the base model
  25. “Indeed, activation in a region of the right ventral striatum significantly correlated with the difference regressor designed to capture the effects of generalization on prediction error… Thus in contrast to predictions based on simple reinforcement learning models, the net BOLD signal in the right ventral striatum, a region often characterized by a reward prediction error response, is best explained by a reinforcement learning model that incorporates generalization knowledge.”
  26. Generalization model better corresponded with activation of region in right ventral striatum (VS)
  27. So although VS is generally characterized as holding the value function in a traditional manner, with each state valued independently, it looks like generalization is happening there
  28. Hippocampal activity correlated with value here, although in most studies (where states are usually fully independent) no such correlation has been found
  29. “We then asked whether these responses also reflected generalization knowledge… but this activation did not survive cluster-correction based on a medial temporal lobe mask. The lack of significant evidence for generalization effects in these signals is unexpected in light of our hypothesis that the hippocampal system might support the generalization.”
    1. Not following, how is this different from the previous findings that did implicate the hippocampus?  Seems important but not understanding what is being expressed either because of my ignorance in the area or because of wording
    2. Mentions “analysis of the value difference regressor”? What’s that?
  30. “Striatal-hippocampal connectivity was significantly predicted by the generalization model…”
  31. There is one unresolved detail: the results could possibly come from negative correlations in reward between the two pairs. This is left to future work
  32. Only one individual was aware of the complete nature of the task
  33. The most direct evidence of the hypothesis was “… that connectivity between the striatum and hippocampus predicted the degree to which participants’ choice behavior was better described by the generalization model.”
  34. They also found a link between signal in the hippocampus and chosen option value (as derived from the model). Generally, this activity is more closely linked to the ventromedial PFC, although in most of those studies presumably pains are taken to keep states and actions completely discrete and separate
  35. “We were unable to demonstrate the effects of value generalization quantitatively in hippocampal correlates of value, even though effects of generalization were visible in the striatum.  Based on our hypothesis… this result is puzzling and it may indicate the hypothesis was incorrect.”
    1. Ah, so it seems like the value (as computed by the model) of the action taken was correlated to connectivity, but the value itself (as opposed to that related to the policy expressed) didn’t show up in the hippocampus?
    2. There are reasons presented why this may not show up even if the hypothesis is basically correct though
  36. Although hippocampus is generally related to conscious processing, the results here show it may be active unconsciously as well, as all participants aside from one were not aware of the nature of the task
    1. The one person who was aware of the nature of the task had the best fit overall, which is encouraging
  37. There was a similar study, except options were negatively correlated, and subjects were aware of the nature of the task.  There, value correlates in ventromedial PFC were shown to reflect generalization
  38. Talks about how this study bears on the model-free vs. model-based distinction, but it’s just a bandit task, so they are really the same thing in that case. Don’t think there is really anything to be said about the distinction between the two
  39. “Anatomically, the ventral striatum may gain access to relational representations via direct projections there from the hippocampus and medial temporal lobe [citations].  Conversely, value information in the hippocampus may arrive via significant projections from midbrain dopaminergic neurons of the ventral tegmental area [citations].”
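
To make the modeling notes above concrete (the value update, the extra alpha2 parameter, and the effect of coupling on prediction errors), here is a minimal Python sketch. It is my reading of the notes, not the paper's exact equations: the rule used to update the paired option, the pairing table `PARTNER`, the function names, and every parameter value are assumptions chosen for illustration.

```python
import math

# Hypothetical pairing table: options 0 and 1 share a payout, as do 2 and 3.
PARTNER = [1, 0, 3, 2]

def update_values(q, choice, reward, alpha, alpha2, partner=PARTNER):
    """Return (new_q, delta) after one trial.

    alpha  : learning rate for the chosen option.
    alpha2 : generalization rate for its paired option (assumed form).
             With alpha2 = 0 this reduces to a basic delta-rule learner.
    """
    new_q = list(q)
    delta = reward - q[choice]              # reward prediction error
    new_q[choice] += alpha * delta
    other = partner[choice]
    new_q[other] += alpha2 * (reward - q[other])
    return new_q, delta

def softmax_probs(q, beta):
    """Choice probabilities from values, with inverse temperature beta."""
    m = max(q)                              # subtract max for stability
    exps = [math.exp(beta * (v - m)) for v in q]
    total = sum(exps)
    return [e / total for e in exps]

# One rewarded choice of option 0, under each model.
q0 = [0.5, 0.5, 0.5, 0.5]
base_q, _ = update_values(q0, 0, 1.0, alpha=0.4, alpha2=0.0)
gen_q, _ = update_values(q0, 0, 1.0, alpha=0.4, alpha2=0.05)

# If the partner (option 1) is chosen and rewarded next, the two models
# predict different prediction errors; their difference is the kind of
# quantity a "difference regressor" could capture.
base_next_delta = 1.0 - base_q[1]
gen_next_delta = 1.0 - gen_q[1]
pe_difference = gen_next_delta - base_next_delta

print(base_q, gen_q, pe_difference)
print(softmax_probs(gen_q, beta=3.0))
```

Under the base model only the chosen face's value moves, while under the generalization model its partner's value moves a little as well, so the next prediction error on the partner is slightly smaller. The alpha2 value here is set to roughly 13% of alpha only to echo the ratio quoted in the notes.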
