The Role of the Ventromedial Prefrontal Cortex in Abstract State-Based Inference During Decision Making in Humans. Hampton, Bossaerts, O’Doherty. Journal of Neuroscience 2006.

  1. The main finding is that activity in ventromedial prefrontal cortex (vmPFC) is more consistent with hierarchical (state-based) inference than with flat RL
  2. “These results suggest that brain regions, such as vmPFC, use an abstract model of task structure to guide behavioral choice, computations that may underlie the human capacity for complex social interactions and abstract strategizing.”
  3. In the task studied, the reward contingencies flip partway through: “…contingencies will reverse.”
    1. In particular, there are two actions to choose between, both with stochastic payoffs.  Once the better action is selected 4 times straight, the payoff distributions switch (so the bad arm becomes the good one)
    2. This is a POMDP, where people need to infer the hidden state dynamics in order to play well
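The task dynamics above can be sketched as a tiny simulator. The payoff probabilities, reward magnitudes, and the deterministic flip-after-4 rule here are illustrative assumptions based on the notes, not the paper's actual parameters:

```python
import random

class ReversalTask:
    """Two-option probabilistic reversal task (illustrative sketch).

    One action is "good" (pays off more often); after the good action
    is chosen 4 times in a row, the contingencies reverse.
    """
    def __init__(self, p_good=0.7, p_bad=0.3, streak_to_flip=4):
        self.p_good, self.p_bad = p_good, p_bad
        self.streak_to_flip = streak_to_flip
        self.good_action = 0   # hidden state: which action currently pays better
        self.streak = 0        # consecutive correct choices so far

    def step(self, action):
        """Take an action (0 or 1), return a stochastic reward of +1 or -1."""
        p = self.p_good if action == self.good_action else self.p_bad
        reward = 1 if random.random() < p else -1
        # Track the streak of correct choices; reverse after enough in a row.
        if action == self.good_action:
            self.streak += 1
            if self.streak >= self.streak_to_flip:
                self.good_action = 1 - self.good_action
                self.streak = 0
        else:
            self.streak = 0
        return reward
```

Because the hidden `good_action` flips behind the subject's back, playing well requires inferring that hidden state from rewards, which is what makes this a POMDP.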
  4. PFC is tasked with encoding high-order structure, and is associated with higher cognitive functions such as working memory, planning, and decision making, and also seems to be involved in encoding abstract rules
  5. Goal is to see if PFC activity “… would correlate better with an abstract state-based decision algorithm than with simple RL.” <I don’t yet understand what the distinction between these two are but I’m sure we’ll get there>
  6. Of the RL models tested, Q-learning with softmax action selection had the best fit to subject data <I’m not sure whether that means their methodology was bad/lacking RL algos, or if people are just terrible>
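For reference, the flat RL baseline is just the standard delta-rule update plus softmax choice. A minimal sketch (the learning rate `alpha` and inverse temperature `beta` values are illustrative, not the paper's fitted parameters):

```python
import math
import random

def softmax_choice(q, beta):
    """Pick an action with probability proportional to exp(beta * Q[a])."""
    weights = [math.exp(beta * v) for v in q]
    r = random.random() * sum(weights)
    for a, w in enumerate(weights):
        r -= w
        if r <= 0:
            return a
    return len(q) - 1  # numerical fallback

def q_update(q, action, reward, alpha):
    """Delta-rule update: Q[a] <- Q[a] + alpha * (reward - Q[a])."""
    q[action] += alpha * (reward - q[action])
    return q
```

Note that this learner tracks a cached value per action and knows nothing about the anticorrelated task structure, which is exactly the limitation the state-based model is meant to expose.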
  7. The “abstract state-based model” is a Bayesian hidden state Markov model
  8. The way the model is set up is that: “The decision to switch is implemented on the basis of the posterior probability that the last choice was incorrect.”
    1. “The state-based model predicts the subjects’ actual choice behavior (whether to switch or not) with an accuracy of  92 +/- 2%”
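The core of the state-based model is a Bayes update on "my current choice is correct," propagated through the possibility of a reversal. A sketch under assumed likelihoods and an assumed reversal hazard rate (the actual parameters in the paper may differ):

```python
def posterior_after_reward(prior_correct, rewarded,
                           p_r_given_correct=0.7, p_r_given_incorrect=0.3):
    """Bayes update of P(current choice is correct) given the outcome.

    The reward likelihoods here are illustrative assumptions.
    """
    if rewarded:
        like_c, like_i = p_r_given_correct, p_r_given_incorrect
    else:
        like_c, like_i = 1 - p_r_given_correct, 1 - p_r_given_incorrect
    num = like_c * prior_correct
    return num / (num + like_i * (1 - prior_correct))

def prior_next_trial(post_correct, p_reversal=0.25, switched=False):
    """Propagate the posterior through a possible reversal to get the
    next trial's prior.  If the subject switches actions, the labels
    correct/incorrect swap.  p_reversal is an assumed hazard rate.
    """
    p_correct = post_correct * (1 - p_reversal) + (1 - post_correct) * p_reversal
    return 1 - p_correct if switched else p_correct
```

The switch decision then falls out directly: switch when the posterior that the last choice was incorrect (i.e., `1 - posterior_after_reward(...)`) is high enough.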
  9. Subjects made the right choice 61 +/- 2% of the time <this isn’t so much better than chance>
    1. “This is also close to the optimal performance of the state-based model that was 64% (using the actual task parameters).” <Is it that hard?>
  10. <I don’t really like their terminology (they call the “prior” the posterior before the reward signal and the “posterior” the posterior after the reward signal)>
  11. <Anyway,> “The prior correct signal” was found to correlate with medial PFC (mPFC), adjacent OFC, and amygdala
  12. The difference between what they call the prior and posterior (the reward prediction error) led to a signal in ventral striatum, as is found elsewhere
  13. “The prior correct signal from the state-based model is almost identical to the expected reward signal from the RL model.  Nevertheless, our paradigm permits sharp discrimination between the two models.  The predictions of the two models differ immediately after a switch in the subjects’ action choice.”
    1. Basically, the MDP planner (non-POMDP) just uses the average rewards of the two actions, whereas the POMDP planner tries to differentiate action quality based on the hidden state
  14. At p < 0.01, “…abstract state-based decision making may be especially localized to the vmPFC.”
  15. “The critical distinction between the state-based inference model and standard RL is what happens to the expected value of the newly chosen stimulus after subjects switch.  According to standard RL, the expected value of the new choice should be low, because that was the value it had when the subject had previously stopped selecting it (usually after receiving monetary losses on that stimulus).  In contrast, the state-based algorithm predicts that the expected value for the newly chosen action should be high, because unlike standard RL, it incorporates the knowledge that when one action is low in value, the other is high.”
    1. “… the expected value signal in the vmPFC jumps up even before a reward is delivered on the newly chosen action… updating seems to occur using previous knowledge of the task structure.”
  16. “The final decision whether to switch or stay was associated with activity in the anterior cingulate cortex and anterior insula, consistent with previous reports of a role for these regions in behavioral control (…).  These regions are in close proximity to areas that were significantly correlated with the prior probability that the current choice was incorrect as provided by the decision model.  A plausible interpretation of these findings is that the anterior insular and anterior cingulate cortex may actually be involved in using information about the inferred choice probabilities to compute the decision itself.”
