States Versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning. Gläscher, Daw, Dayan, O’Doherty. Neuron 2010.

  1. Mentions a state prediction error (SPE), so model building is partially based on a TD (temporal-difference) approach
  2. Using fMRI on a Markov decision task, finds evidence of SPEs in the intraparietal sulcus and lateral PFC, in addition to the “well-characterized” reward prediction error in the ventral striatum
  3. Thorndike (1933) proposed that behavior is triggered by stimuli whose associations are strengthened by reinforcement. Tolman (1948) proposed that behavior arises from planning over a “cognitive map” representation
  4. Says that model-free (MF) learning uses reward prediction errors (RPEs) and model-based (MB) learning uses SPEs, but MB also needs RPEs (although MF/MB experiments are often set up so that one outcome is immediately devalued, so the reward doesn’t really need to be learned)
    1. A minor nit: it would be more accurate to say MF learners compute value prediction errors rather than RPEs, since they don’t learn rewards explicitly
    2. Should look at these papers and see whether the experiments are actually examining RPEs or “VPEs”; if it is reward, then there should simultaneously be something related to value as well. In general the two are correlated, but you could set up an experiment that produces a worse-than-average reward for a given <s,a> but still a good value, because of a transition to a high-value state
  5. Set up a tree-search task with 2 levels of decisions: 4 states after the first decision (stochastic), and then 3 total states after the 2nd choice
  6. In a design similar to latent learning, subjects first interacted with the tree in the absence of rewards
    1. This means SPEs could be recorded in isolation: no reward information was presented, so there were no RPEs to confound them
    2. To ensure thorough exploration, subjects were told which button to press
  7. Rewards were then introduced; subjects were told explicitly what the reward values were before interaction began
  8. From the start, the reward achieved was better than chance, which could not happen with a purely MF learner (it has no values to act on until it has experienced rewards)
  9. Used a model called FORWARD for the MB learner and SARSA for the MF learner
  10. Model comparison done with Akaike’s information criterion (AIC)?
  11. The HYBRID learner, which combined FORWARD and SARSA, had the best fit
  12. Because behavior was fit best by a combination of MF and MB, they looked for both SPEs and RPEs in the fMRI data
  13. The analysis checks how strongly activity in each brain area covaries with the model-predicted SPEs and RPEs
    1. p-values are quite small (<0.001 or <0.0001, depending on the region)
  14. Distinct areas associated with RPEs and with SPEs are found
    1. The method used for choosing search volumes “…ensures that the centers for the search volumes are selected in a way that is independent of the data in session 1.”
  15. A relationship is also found in the “pIPS” (posterior intraparietal sulcus) between activity during the reward-free training session 1 and correct behavior at the start of session 2, when rewards are introduced
  16. “…we found that the supremacy of the model-based learner in the HYBRID [MF+MB model] declined rapidly over the course of continuing learning.”
  17. It is possible that what they measure as SPE is related to salience, although the fact that the areas they found aren’t related to RPE makes this interpretation less likely
    1. It is also probably not specifically related to executive control, because in the initial (reward-free) session the subjects were not making decisions
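Items 1 and 9 describe the SPE that the model-based FORWARD learner uses to learn state transitions. A minimal sketch, assuming a common delta-rule form in which the probability of the observed successor is pushed toward 1 and every other successor is scaled down so the row stays a probability distribution; the function name `spe_update`, the learning rate `eta`, and the toy state space are hypothetical, not taken from the paper.

```python
import numpy as np

def spe_update(T, s, a, s_next, eta=0.2):
    """Delta-rule update of transition estimates after observing s --a--> s_next.

    T[s, a, :] is a probability distribution over successor states (hypothetical layout).
    Modifies T in place and returns the state prediction error (SPE).
    """
    spe = 1.0 - T[s, a, s_next]   # large when the observed successor was unexpected
    T[s, a, :] *= 1.0 - eta       # scale down every successor's probability...
    T[s, a, s_next] += eta        # ...then move the observed one toward 1 (row still sums to 1)
    return spe

# Hypothetical 2-state, 2-action world with uniform initial beliefs.
T = np.full((2, 2, 2), 0.5)
first = spe_update(T, s=0, a=0, s_next=1)   # surprising transition
second = spe_update(T, s=0, a=0, s_next=1)  # same transition again: smaller SPE
print(first, second, T[0, 0])
```

No reward enters this update, which is why an SPE of this kind can in principle be measured during the reward-free first session.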
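The sub-notes under item 4 argue that MF learners compute value prediction errors rather than pure reward prediction errors. A minimal sketch of the SARSA TD error, with hypothetical numbers chosen to reproduce the case described there: a worse-than-average reward for an <s,a> that still yields a positive prediction error because the transition lands in a high-value state. The undiscounted `gamma=1.0` and all the values are assumptions for illustration.

```python
def reward_prediction_error(reward, expected_reward):
    """Error on the immediate reward alone: r - E[r | s, a]."""
    return reward - expected_reward

def sarsa_td_error(reward, q_sa, q_next, gamma=1.0):
    """SARSA value prediction error: r + gamma * Q(s', a') - Q(s, a)."""
    return reward + gamma * q_next - q_sa

# Hypothetical numbers for a single <s, a>:
expected_reward = 3.0   # average immediate reward for <s, a>
q_sa = 5.0              # current value estimate for <s, a> (reward plus future value)
reward = 1.0            # this trial's reward: worse than average
q_next = 10.0           # but the transition landed in a high-value state

rpe = reward_prediction_error(reward, expected_reward)
vpe = sarsa_td_error(reward, q_sa, q_next)
print(f"reward error = {rpe:+.1f}, value error = {vpe:+.1f}")
```

This prints `reward error = -2.0, value error = +6.0`: the two signals have opposite signs on the same trial, which is the kind of dissociation an experiment could use to tell RPEs from “VPEs”.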
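Items 7 and 8 explain why a model-based learner can beat chance as soon as the reward values are announced: it can back the stated rewards up through its learned transition model without any reward experience. Items 11 and 16 describe a HYBRID that mixes FORWARD and SARSA values, with the model-based contribution declining over learning. A minimal sketch, assuming a convex combination of the two value estimates and an exponentially decaying model-based weight; the tree layout, reward values, and the parameters `w0` and `decay` are hypothetical, not the paper's task or fitted values.

```python
import numpy as np

def model_based_q(T, R, depth=2):
    """Back up announced rewards through a learned transition model.

    T[s, a, s'] : learned transition probabilities (all-zero rows for terminal states)
    R[s]        : reward received on entering state s
    depth       : number of decision levels in the tree
    """
    V = np.zeros(len(R))
    Q = np.zeros(T.shape[:2])
    for _ in range(depth):
        Q = T @ (R + V)          # expected reward-plus-value of the successor, shape (S, A)
        V = Q.max(axis=1)        # greedy state values
    return Q

def hybrid_q(q_mb, q_mf, trial, w0=0.9, decay=0.05):
    """Weighted mix of model-based and model-free values.

    Assumes the model-based weight decays exponentially over trials;
    w0 and decay are hypothetical free parameters.
    """
    w = w0 * np.exp(-decay * trial)
    return w * q_mb + (1.0 - w) * q_mf

# Hypothetical tree: root 0 -> state 1 or 2 (70/30), then each of those -> one of states 3-6.
S, A = 7, 2
T = np.zeros((S, A, S))
T[0, 0, [1, 2]] = [0.7, 0.3]
T[0, 1, [1, 2]] = [0.3, 0.7]
T[1, 0, 3] = T[1, 1, 4] = 1.0
T[2, 0, 5] = T[2, 1, 6] = 1.0
R = np.array([0, 0, 0, 0, 25, 10, 0], dtype=float)  # hypothetical announced rewards

q_mb = model_based_q(T, R)
q_mf = np.zeros((S, A))       # MF values before any reward has been experienced
q_start = hybrid_q(q_mb, q_mf, trial=0)
print(q_mb[0], q_start[0].argmax())
```

With all-zero MF values on the first rewarded trial, the hybrid's preference comes entirely from the model-based term, so it already favors the better root action; as `trial` grows, the weight shifts toward whatever SARSA has learned.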
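Item 10 is unsure which criterion was used for model comparison. For reference only, and not as a claim about the paper's procedure, a minimal sketch of how Akaike's information criterion trades fit against parameter count, AIC = 2k − 2 ln L (lower is better). The model names, log-likelihoods, and parameter counts are made-up placeholders.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Placeholder fits: (maximized log-likelihood, number of free parameters).
fits = {
    "model_a": (-120.0, 2),
    "model_b": (-119.0, 5),  # fits slightly better, but not enough to pay for 3 extra parameters
    "model_c": (-110.0, 6),  # fits much better, so the extra parameters are worth it
}
scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The penalty term is why a combined model such as HYBRID, which has more free parameters than either component, only wins a comparison like this if its likelihood gain is large enough.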
