Neural Correlates of Forward Planning in a Spatial Decision Task in Humans. Simon, Daw. Journal of Neuroscience

  1. Compares model-based (MB) and model-free (MF) models of learning in a spatial navigation task
  2. Choices and value-related BOLD signals in striatum (the region most commonly associated with temporal-difference (TD) learning) were better explained by the model-based theory
  3. Map of maze is known a priori (to encourage MB behavior)
  4. Also, configuration of maze (where walking was allowed) changed over time and required continuous learning and replanning
  5. These reconfigurations generated discrepancies between proposed MB and MF mechanisms
  6. Rooms were connected by one way doors
  7. Subjects could only exit a room ahead, left, or right, not return the way they came (I guess this is by definition, as all doors are directional)
  8. With p = 0.1 (but no more than once every 4 steps), the subject was jumped at random to another room
  9. 4 rooms were selected as reward rooms, giving a reward of 2 or 3
  10. At each time step, doors changed direction with p = 1/24, although it was enforced that all rooms always had at least one exit
  11. Only the directionality of doors in the current room was visible
  12. Subjects were presented with all rules of domain and had 10 minutes of practice time
  13. Use softmax action selection over Q values for MF algorithms
  14. Fit gamma and the softmax temperature to across-subject data
  15. The task is a POMDP (door directions outside the current room are hidden)
  16. Used VI for MB, which is interesting because in the neuroscience literature MB means forward search; I suppose they reperform VI at each time step?
  17. QL for MF
  18. Said VI and QL were the best-fitting algorithms of their respective classes (MB and MF)
  19. Have a particular way of updating their belief state over doors
  20. Reward is deterministic and known
  21. I’m a little confused because: “Using the learned maze configuration, we compute state-action values based on a tree search planning process terminating at reward states. For computational efficiency (…) we implemented this planning process using value iteration which simply unrolls a BFS tree over the states from leaves.”
    1. Not a good description, but perhaps written that way because the audience commonly thinks of MB as tree search
    2. Their algorithm for VI is correct, but they describe it as a tree a number of times, so it may be the case that they aren’t allowing identical states at the same depth to be merged
    3. On the other hand, they describe it as operating across all states simultaneously, so it’s not a tree.  Proceeding
  22. Seems like their policy is based on 16 steps of the algorithm, which may not be enough to get to good values – they mention that higher number of iterations doesn’t help fit, so OK.
  23. Reward states are treated as terminal, which does not lead to optimal behavior
  24. Model is frozen during VI, so planning is tractable
  25. MB fit better than MF
  26. MB worked better than other models like dead reckoning, although dead reckoning was still better than MF
  27. Reaction time measurements were a bit of a problem based on how the experiment was set up, but seemed higher for parts where search was more complex
  28. Testing which Q values (MB or MF) better relate to BOLD is complex, because they are strongly correlated with each other
    1. After all proper statistics, fit is better to MB
  29. Striatum is particularly implicated
  30. Since model-based methods need a model, they looked for neural evidence of one.
    1. For reward, they looked for regions that correlated significantly better with R than with Q; these showed up in left superior frontal cortex and right parahippocampal gyrus
    2. Medial temporal lobe is also implicated in future projections
  31. They also looked for evidence of maze transition structure, but did so indirectly in terms of search complexity (related to analysis on reaction time)
    1. Areas implicated there are “… a broad temporal and frontal network, areas broadly associated with memory and control.”
  32. “… our results together with those others suggest that the two putative systems [MB, MF] may be partly overlapping or convergent, with striatum potentially acting as a common value target and locus of action selection.”
  33. Activity of ventral striatum is interesting because it is traditionally associated with value estimate in MF, although it is sometimes thought to be more strongly connected to reward prediction error than value
  34. Looking for neural correlates of T and R apparently wasn’t done before
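
The planning scheme in items 16 and 21 to 24 (value iteration over a frozen maze model, 16 sweeps, reward rooms treated as terminal) can be sketched roughly as below. This is only an illustration of the idea, not the authors' code: the function name `value_iteration`, the `exits`/`reward` data layout, the toy 4-room maze, and gamma = 0.9 are all assumptions made for the example, and the belief update over hidden doors and the random jumps are left out.

```python
def value_iteration(exits, reward, gamma=0.9, n_sweeps=16):
    """Synchronous value iteration on a frozen maze model (illustrative sketch).

    exits[s]  : dict mapping an action name to the room it leads to, using
                the door directions currently believed to hold (assumed layout).
    reward[s] : known, deterministic reward for entering room s.
    Reward rooms are treated as terminal: no value is bootstrapped past them.
    """
    n = len(exits)
    V = [0.0] * n
    Q = [{} for _ in range(n)]
    for _ in range(n_sweeps):
        new_V = [0.0] * n
        for s in range(n):
            Q[s] = {}
            for a, s2 in exits[s].items():
                # Stop the backup at reward rooms, mirroring a tree search
                # that terminates at reward states.
                future = 0.0 if reward[s2] > 0 else gamma * V[s2]
                Q[s][a] = reward[s2] + future
            new_V[s] = max(Q[s].values()) if Q[s] else 0.0
        V = new_V  # all rooms are updated simultaneously
    return Q, V


# Hypothetical 4-room maze: 0 -> 1 -> 2 -> 3, with room 3 paying reward 3.
exits = [{"ahead": 1}, {"ahead": 2, "left": 0}, {"ahead": 3}, {"ahead": 0}]
reward = [0, 0, 0, 3]
Q, V = value_iteration(exits, reward)
```

Because every room is backed up in each sweep, identical states at the same depth are merged automatically, which is the sense in which this is not really a tree. Treating room 3 as terminal means the value of whatever lies beyond it never flows back, which is the source of the suboptimality noted in item 23.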

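For the MF side (items 13, 14, and 17), a minimal sketch of softmax action selection over Q values together with a one-step Q-learning update might look like this. The names `softmax_policy` and `q_learning_update`, the inverse-temperature parameter `beta`, and the learning rate `alpha` are assumptions for illustration; the notes above only mention gamma and a softmax temperature as fitted parameters.

```python
import math


def softmax_policy(q_values, beta=1.0):
    """Choice probabilities proportional to exp(beta * Q).

    q_values : dict mapping action name -> Q value for the current room.
    beta     : inverse temperature (assumed parameterisation of the
               softmax temperature).
    """
    m = max(q_values.values())  # subtract the max for numerical stability
    exp_q = {a: math.exp(beta * (q - m)) for a, q in q_values.items()}
    z = sum(exp_q.values())
    return {a: e / z for a, e in exp_q.items()}


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One-step Q-learning update, applied in place; returns the new Q[s][a].

    alpha is a hypothetical learning rate, not a value from the paper.
    """
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]


# Illustrative use: choice probabilities for a room with two open exits.
probs = softmax_policy({"ahead": 2.7, "left": 2.187}, beta=1.0)
```

The same softmax can be applied to Q values from either QL or VI, so the two models can be compared by how much likelihood each assigns to the subjects' actual choices.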