Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Daw, Niv, Dayan. Nature Neuroscience 2005

  1. Two systems of choice in the brain: prefrontal cortex and dorsolateral striatum
    1. How do you arbitrate between these systems?
  2. Computationally, the two systems trade off computational simplicity against statistically efficient use of experience
  3. Use a Bayesian arbitration mechanism: each system is used when it is the most accurate
  4. Dorsolateral striatum is for habitual control
  5. PFC for “reflective or cognitive action planning”
  6. “We propose that the difference in the accuracy profiles of different reinforcement learning methods both justifies the plurality of control and underpins arbitration.  To make the best decisions, the brain should rely on a controller of each class in circumstances in which its predictions tend to be most accurate.”
  7. Extinction is when a behavior is permitted but no longer coupled with an extrinsic reward, such as food
  8. “Lesions or depletions of dopaminergic input to dorsolateral areas of the striatum evidently block this transfer of control to a caching system.” (so when an outcome is devalued, behavior still changes quickly)
  9. There is some evidence that in simpler tasks (in terms of number of decisions or number of actions between start and reward), goal-directed behavior dominates, as it is still tractable in such domains
  10. Lesions in a broad number of areas impair tree-search
  11. Their proposal is that the system used is the one most likely to yield actual reward, that is, the one that is more certain about its value estimate.  The probability of selecting an action is proportional to its confidence.
    1. Uncertainty may also be used in other ways, such as driving exploration
  12. Add noise for each level in the search tree
  13. Bayesian estimates of value (guess this is detailed in appendix), value defined as posterior mean
  14. Seems like the computations factor in nonstationarity and downweight older samples; this means that uncertainties have asymptotes that are nonzero
  15. On stochastic tasks, MB has more certainty and better expected value.  On deterministic tasks, MB and MF are much closer
    1. Says this demonstrates the fact that behavior becomes habitual over time unless the task involves uncertainty
  16. “Of course, normativity only extends so far for us.  The true reason for multiple controllers in our theory is the computational intractability of the complete Bayesian solution (roughly speaking, the tree-search system unencumbered by computational incapacity) and the resulting need for approximations.”
  17. Classically, MF is associated with dopamine and dorsolateral striatum and MB with prefrontal cortex.  From lesion studies, it is known these two can operate independently, but in healthy individuals they are connected in multiple locations.
    1. At any rate, the simplifying assumption is made in the modelling results that they are completely independent
  18. Cached values can be used in trees to avoid significant costs that exhaustive search brings
  19. “There is limited evidence about the substrate for the uncertainty-based arbitration that has been our key focus.”
  20. These results make certain testable predictions about when the crossover between MB and MF occurs (related to, for example, proximity of reward, amount of training, time available, and noise)
  21. Says they used a “Bayesian tree-search (‘value iteration’)”, but value iteration is not a tree search algorithm, and I am not sure how it is modified to be Bayesian.  Also says they used a Bayesian Q-Learning; not sure what that formulation is either.  Will look in appendix
  22. Arbitration between MB and MF was based on the posterior variances of the two; the mean of the distribution with the smaller variance was used.
  23. Action was selected according to softmax
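
The arbitration and selection steps in the notes above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `posterior_variance`, `arbitrate`, `softmax`, and `choose_action`, the parameters `drift_var`, `obs_var`, and `beta`, and the per-action comparison are all assumptions made for the example. In particular, `posterior_variance` uses a scalar Gaussian (Kalman-style) update as a stand-in for whatever Bayesian formulation the appendix gives; it is only meant to show why downweighting old samples leaves a nonzero asymptotic uncertainty.

```python
import math
import random


def posterior_variance(prior_var, drift_var, obs_var):
    """One Gaussian update of value uncertainty (assumed Kalman-style stand-in).

    drift_var models nonstationarity: it widens the prior before each new
    sample, which is what downweights older samples.
    """
    widened = prior_var + drift_var
    return widened * obs_var / (widened + obs_var)


def arbitrate(mb, mf):
    """Each argument is a (posterior mean, posterior variance) pair.

    Returns the mean of the estimate with the smaller variance, plus a label.
    """
    (mb_mean, mb_var), (mf_mean, mf_var) = mb, mf
    if mb_var <= mf_var:
        return mb_mean, "MB"
    return mf_mean, "MF"


def softmax(values, beta=1.0):
    """Softmax choice probabilities; beta is an assumed inverse temperature."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(mb_estimates, mf_estimates, beta=1.0, rng=random):
    """Arbitrate per action (an assumption), then sample an action by softmax."""
    picked = [arbitrate(mb, mf) for mb, mf in zip(mb_estimates, mf_estimates)]
    probs = softmax([mean for mean, _ in picked], beta)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs, [label for _, label in picked]
```

For example, `choose_action([(1.0, 0.2), (0.0, 0.9)], [(3.0, 0.5), (2.0, 0.1)])` would take the MB value for the first action and the MF value for the second, since those are the lower-variance estimates. Iterating `posterior_variance` with `drift_var > 0` settles at a positive value, whereas with `drift_var = 0` the variance keeps shrinking toward zero.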

