Mechanisms of Hierarchical Reinforcement Learning in Cortico-Striatal Circuits 1: Computational Analysis. Badre, Frank. Cerebral Cortex 2011.

  1. “Growing evidence suggests that the prefrontal cortex (PFC) is organized hierarchically, with more anterior regions having increasingly abstract representations. How does this organization support hierarchical cognitive control and the rapid discovery of abstract action rules? We present computational models at different levels of description.”
  2. In the neural model, the basal ganglia gate frontal actions: some striatal units gate inputs to PFC, and others gate PFC outputs to response selection.
  3. Learning at all levels of the hierarchy is driven by dopaminergic reward prediction errors.
  4. “This functionality allows the system to exhibit conditional if–then hypothesis testing and to learn rapidly in environments with hierarchical structure.”
  5. They also present a Bayesian RL mixture of experts (MoE) model, which finds the most likely hypothesis state for each individual based on their history of choices and rewards.
    1. It is “… intended to correspond to key computational features of the neural model.”
  6. “This model yields accurate probabilistic estimates about which hypotheses are attended by manipulating attentional states in the generative neural model and recovering them with the MoE model.”
  7. In a follow-up paper (which seems to have been published contemporaneously), they look for neural correlates of this model.
  8. There is evidence that frontal cortex manages hierarchical control, monitoring progress in super- and subtasks.
  9. An earlier fMRI study on hierarchical learning “…suggest that from the outset of learning the search for relationships between context and action may occur at multiple levels of abstraction simultaneously and that this process differentially relies on systematically more rostral portions of frontal cortex for the discovery of more abstract relationships.”
  10. The authors build on their previous model of RL and working memory and test whether it can be extended to hierarchical tasks.
    1. “In this model, the striatum modulates the selection of frontal cortical actions, including motor actions and working memory updating. It does so by gating the inputs to be maintained in frontal cortex (input gating) and gating which of these maintained representations has an influence on action selection (output gating).”
  11. With these extensions, the model is more effective on hierarchical tasks.
  12. They derive a Bayesian RL mixture of experts model to capture the gating that is believed to occur in the brain.
  13. They also try using the stricter neural model as an expert in the Bayesian mixture of experts <?>
  14. In terms of motor behavior, the premotor cortex selects candidate actions, and the basal ganglia “selectively amplifies representations of one of these candidates.”
  15. “Computational trade-offs indicate that it is adaptive to have separate systems implement gating and maintenance (…)”
  16. This relationship between prefrontal cortex and basal ganglia has been supported by studies of brain damage, as well as of Parkinson’s disease.
  17. “In cognitive tasks, it may be necessary to update and maintain multiple task-relevant items in working memory. In the corticostriatal network models, stimuli are selectively ‘input-gated’ into segregated PFC memory ‘stripes’ (structures of interconnected neurons that are isolated from other adjacent stripes; …). In this manner, motor response selection can then be contextualized by multiple potential working memory representations. However, in some scenarios only a limited subset of currently maintained PFC representations may be relevant for influencing the current motor decision (e.g., in tasks with multiple goals and subgoals, only some maintained representations are relevant for processing during intermediate stages). In such cases, the system can benefit from an additional ‘output-gating’ mechanism…”
  18. Other pieces of information aside from working memory may also be relevant to decision making <naturally>
  19. The basic question is how to select motor responses and how to determine which pieces of information are relevant to the decision.
  20. Models are implemented at 2 levels of abstraction:
    1. “The first builds on existing neural models of corticostriatal circuits in reinforcement learning and working memory and extends this framework to accommodate hierarchical structure.”
    2. The second is more abstract, but is still based on the neural model. The idea is that this model can test whether an individual’s behavior is consistent with a hierarchical internal model.
  21. In the experiment, each stimulus varied along 3 dimensions, and 3 actions could be selected. In one (flat) version, the only option was to learn an individual mapping from each stimulus to an action. In the other (hierarchical) version, the problem was structured so that the optimal policy could be expressed as rules composed hierarchically over the stimulus features.
  22. The Bayesian model is composed of 3 experts – one for each action <?>
  23. Experts are gated according to their accuracy in the current state, as opposed to their overall accuracy across the entire history.
  24. The Bayesian model is deliberately made slightly suboptimal mathematically, in order to be more consistent with the neural model.
  25. Both the weighting of experts and the experts’ own action selection use a softmax.
  26. The method of combining experts converges to <soft?> optimal.
  27. In the extension to the hierarchical model, each expert is composed of lower-level experts.
    1. Each lower expert can attend to a particular dimension of the input
  28. Individuals’ responses are checked for evidence of hierarchical understanding by fitting the hierarchical model to each subject’s responses by maximum likelihood and inspecting the fitted weights.
  29. To cut down on the number of hierarchical experts needed, they parameterize across all hierarchical experts and then estimate the parameters that decide which hierarchical expert to use (instead of training a polynomial number of them).
  30. As expected, the hierarchical model performs better than flat only in the case where the task actually has hierarchical structure
  31. Turning off dopamine in the model led to behavior that does not adapt to reward, so in the model dopamine is critical for improving performance.
  32. Parts of the model related to hierarchy get downweighted with experience when the task is actually flat, because their assumptions don’t fit the data; the authors have fMRI data showing this in the brain as well.
  33. “Thus, these simulations show that the network learns hierarchical structure. This structure is abstract as it is not tied to any given feature but rather the general tendency for the higher order dimension (color) to increase propensity for gating the lower level dimension (shape/orientation). Thus, once these weights are learned, it is not clear that the ‘color’ units in prePMD should be labeled as such because they now represent which of the other dimensions is relevant…”
  34. “In summary, the neural model supports the notion that multiple BG–PFC circuits interact using standard reinforcement learning principles, modulated by dopaminergic prediction errors, to solve hierarchical tasks. Application of this model to a range of other hierarchical (RL and non-RL) tasks is outside the scope of the current study but is currently being investigated.”
  35. Analysis of individual responses shows large variability in how hierarchical subjects’ behavior seems to be.
  36. There is also a problem of overlearning: subjects may learn hierarchical structure in one block and then apply hierarchical rules to a flat problem in the next block, which leads to suboptimal behavior.
  37. The analysis of the fMRI data is done in the companion paper
  38. “Our model builds on these prior notions by suggesting that—at least when rules have to be learned—the influence of anterior PFC on posterior PFC may be indirect (rather than directly corticocortical) such that action selection at one corticostriatal level is constrained by inputs from more anterior levels. In other words, hierarchical control may emerge, in part, from multiple nested frontostriatal loops.”
  39. <Skipping related work section>
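Items 3 and 31 above describe learning driven by dopamine and the effect of turning dopamine off in the model. As a minimal sketch (a generic delta rule, not the paper’s neural network), the reward prediction error δ = r − Q(a) drives each value update, and a hypothetical `dopamine_on` flag blocks the update while the error is still computed. The function name `learn_values` and its parameters are illustrative assumptions.

```python
def learn_values(choices, rewards, n_actions=3, alpha=0.2, dopamine_on=True):
    """Delta-rule action-value learning driven by a reward prediction error.

    choices: chosen action index on each trial
    rewards: reward (e.g. 1 or 0) received on each trial
    dopamine_on: if False, the RPE is computed but never applied
        (a hypothetical stand-in for removing the dopamine signal)
    """
    q = [0.0] * n_actions
    rpes = []
    for a, r in zip(choices, rewards):
        delta = r - q[a]  # reward prediction error for the chosen action
        rpes.append(delta)
        if dopamine_on:
            q[a] += alpha * delta  # values change only through the RPE
    return q, rpes


q, rpes = learn_values([0, 0, 1], [1, 1, 0], alpha=0.5)
print(q, rpes)  # [0.75, 0.0, 0.0] [1.0, 0.5, 0.0]

q_off, _ = learn_values([0, 0, 1], [1, 1, 0], alpha=0.5, dopamine_on=False)
print(q_off)  # [0.0, 0.0, 0.0]
```

With the update blocked, the values never move away from their initial state, so choices cannot adapt to reward, which is the qualitative point of item 31.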
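The input/output gating distinction quoted in items 10 and 17 can be illustrated with a toy data structure. This is a hypothetical sketch, assuming each stripe holds one symbolic item and that the gate decisions are simply given; in the model, the striatum learns when to open each gate. `Stripe`, `input_gate` and `output_gate` are illustrative names, not the paper’s.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stripe:
    """A hypothetical PFC 'stripe': an isolated slot holding at most one item."""
    content: Optional[str] = None


def input_gate(stripes: List[Stripe], idx: int, item: str, gate_open: bool) -> None:
    """Input gating: store `item` in stripe `idx` only if the gate opens.

    When the gate stays closed, the stripe keeps maintaining its old content,
    so a distractor does not overwrite working memory.
    """
    if gate_open:
        stripes[idx].content = item


def output_gate(stripes: List[Stripe], selected: List[int]) -> List[str]:
    """Output gating: only the selected, non-empty stripes reach response selection."""
    return [stripes[i].content for i in selected if stripes[i].content is not None]


stripes = [Stripe(), Stripe()]
input_gate(stripes, 0, "color=red", gate_open=True)     # context for a higher-level rule
input_gate(stripes, 1, "shape=square", gate_open=True)
input_gate(stripes, 1, "distractor", gate_open=False)   # blocked by the closed input gate
print(output_gate(stripes, [1]))     # ['shape=square']
print(output_gate(stripes, [0, 1]))  # ['color=red', 'shape=square']
```

Separating the two gates lets both items stay maintained while only the currently relevant one influences the response, as in the subgoal example of item 17.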
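Item 21 contrasts a flat condition, where each stimulus must be mapped to an action individually, with a hierarchical one, where composed rules define the optimal policy; item 33 indicates that color is the higher-order dimension that determines whether shape or orientation is relevant. A hypothetical generator for the two conditions, assuming 2 colors, 3 shapes and 3 orientations (the notes only state 3 dimensions and 3 actions), might look like this:

```python
import itertools
import random
from typing import Dict, List, Tuple

# Hypothetical feature sizes; only "3 dimensions, 3 actions" comes from the notes.
N_COLORS, N_SHAPES, N_ORIENTATIONS, N_ACTIONS = 2, 3, 3, 3
Stimulus = Tuple[int, int, int]  # (color, shape, orientation)


def all_stimuli() -> List[Stimulus]:
    return list(itertools.product(range(N_COLORS), range(N_SHAPES), range(N_ORIENTATIONS)))


def hierarchical_mapping(rng: random.Random) -> Dict[Stimulus, int]:
    """Color decides which lower-level dimension is relevant:
    color 0 -> shape determines the action, color 1 -> orientation does."""
    shape_rule = rng.sample(range(N_ACTIONS), N_SHAPES)        # one action per shape
    orient_rule = rng.sample(range(N_ACTIONS), N_ORIENTATIONS)  # one action per orientation
    return {
        (c, s, o): shape_rule[s] if c == 0 else orient_rule[o]
        for c, s, o in all_stimuli()
    }


def flat_mapping(rng: random.Random) -> Dict[Stimulus, int]:
    """Every stimulus gets an arbitrary action (nothing here guarantees
    that no simple rule fits by chance)."""
    return {stim: rng.randrange(N_ACTIONS) for stim in all_stimuli()}


rng = random.Random(0)
hier = hierarchical_mapping(rng)
print(len(hier))  # 18
# Under the hierarchical rule, orientation is irrelevant when color == 0.
print(all(hier[(0, s, 0)] == hier[(0, s, o)]
          for s in range(N_SHAPES) for o in range(N_ORIENTATIONS)))  # True
```

The hierarchical table has 18 entries but is fully described by one color rule plus two 3-entry lookups, which is the kind of compression a hierarchical learner can exploit and a flat learner cannot.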
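Items 23 and 25 describe gating experts by their accuracy in the current state and using a softmax for the weights. A minimal sketch, assuming each expert’s per-state accuracy is tracked with Beta(1, 1)-smoothed hit counts and the gate is a softmax over those accuracies; the paper’s exact Bayesian update and its deliberate suboptimality (item 24) are not reproduced. `StateGatedMoE`, `beta_gate` and the `expert_correct` feedback list are hypothetical.

```python
import math
from typing import Dict, List


def softmax(xs: List[float], beta: float) -> List[float]:
    """Softmax with inverse temperature beta (max-subtracted for numerical stability)."""
    m = max(xs)
    exps = [math.exp(beta * (x - m)) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


class StateGatedMoE:
    """Experts are weighted by their accuracy in the current state,
    not by their overall accuracy across the whole history."""

    def __init__(self, n_experts: int, beta_gate: float = 5.0):
        self.n_experts = n_experts
        self.beta_gate = beta_gate
        self.hits: Dict[str, List[int]] = {}
        self.trials: Dict[str, List[int]] = {}

    def accuracy(self, state: str) -> List[float]:
        hits = self.hits.get(state, [0] * self.n_experts)
        trials = self.trials.get(state, [0] * self.n_experts)
        # (hits + 1) / (trials + 2): posterior mean accuracy under a Beta(1, 1) prior
        return [(h + 1) / (t + 2) for h, t in zip(hits, trials)]

    def weights(self, state: str) -> List[float]:
        return softmax(self.accuracy(state), self.beta_gate)

    def combine(self, state: str, expert_policies: List[List[float]]) -> List[float]:
        """Mix the experts' action probabilities using the state-specific weights."""
        w = self.weights(state)
        n_actions = len(expert_policies[0])
        return [sum(w[k] * expert_policies[k][a] for k in range(self.n_experts))
                for a in range(n_actions)]

    def update(self, state: str, expert_correct: List[bool]) -> None:
        """Record, for this state only, which experts predicted the rewarded action."""
        hits = self.hits.setdefault(state, [0] * self.n_experts)
        trials = self.trials.setdefault(state, [0] * self.n_experts)
        for k, ok in enumerate(expert_correct):
            trials[k] += 1
            hits[k] += int(ok)


moe = StateGatedMoE(n_experts=2)
for _ in range(5):
    moe.update("red-square", [True, False])  # expert 0 keeps being right in this state
print(moe.weights("red-square"))   # expert 0 now dominates in this state
print(moe.weights("blue-circle"))  # [0.5, 0.5]: no evidence yet for this state
```

Because the counts are keyed by state, an expert that is reliable for one stimulus does not automatically gain weight for another, which is the contrast item 23 draws with overall accuracy.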
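Item 28 describes checking each subject for hierarchical behavior with a maximum-likelihood fit and inspecting the weights. As a deliberately reduced sketch, assume each trial’s choice probability is a mixture of a hierarchical model and a flat model with a single hypothetical weight `w` (the paper’s fit has more parameters). Given the probability each model assigned to the subject’s actual choice on every trial, a grid search finds the `w` that maximizes the log-likelihood:

```python
import math
from typing import Sequence


def mixture_log_likelihood(w: float, p_hier: Sequence[float], p_flat: Sequence[float]) -> float:
    """Log-likelihood of a subject's choices when P(choice) = w*p_hier + (1-w)*p_flat.

    p_hier[t], p_flat[t]: probability each model gave to the action the subject
    actually chose on trial t (assumed strictly positive).
    """
    return sum(math.log(w * ph + (1.0 - w) * pf) for ph, pf in zip(p_hier, p_flat))


def fit_hierarchy_weight(p_hier: Sequence[float], p_flat: Sequence[float],
                         grid_size: int = 101) -> float:
    """Maximum-likelihood estimate of w over an evenly spaced grid on [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda w: mixture_log_likelihood(w, p_hier, p_flat))


# A subject whose choices the hierarchical model predicts far better:
print(fit_hierarchy_weight([0.9] * 10, [0.1] * 10))  # 1.0
# A subject better described by the flat model:
print(fit_hierarchy_weight([0.1] * 10, [0.9] * 10))  # 0.0
```

A grid search keeps the sketch dependency-free; with more parameters, a numerical optimizer over the same log-likelihood would be the natural replacement.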

Notes from the companion paper on fMRI results:

  1. <I’m planning on skimming this as the primary paper was fairly long>
  2. Basic findings are:
    1. fMRI activation occurred in dorsal premotor cortex (PMd) and more rostral areas of premotor cortex (prePMd). Activation in prePMd declined during learning “…when no abstract rule [hierarchical?] was available”
    2. Subjects were capable of learning and applying abstract rules when they existed
    3. Across subjects, prePMd activation early (but not late) in learning correlated with successful discovery of the abstract rule when one existed in the task.
    4. “striatum (caudate and putamen) showed functional connectivity with prePMd and PMd that was consistent across learning at different levels of the hierarchy.”
  3. These findings suggest that the search for rules may be conditioned on context (the contents of working memory) and may occur at multiple levels of abstraction at the same time.
  4. “To summarize, then, based on an individual’s trial-to-trial sequence of responses and rewards, the MoE permits computation of 3 types of variables which can be correlated with the BOLD response:”
    1. The weight for each expert
    2. “whether model estimates of RPE, both generally and specifically associated with expected outcomes given a hierarchical rule, are systematically associated with regions of striatum and frontal cortex…”
    3. Whether negative prediction error is associated with reduced activation in prePMd, as predicted by the neural model
  5. The model yields 3 predictions:
    1. Activation in prePMd is related to a more hierarchical understanding of the task.
    2. “RPE [reward prediction error] associated with testing a hierarchical rule will be specifically associated with a local circuit between prePMd and striatum.”
    3. “RPE associated with hierarchical rule will correlate with individual differences in the decline in activation in prePMd during learning of the flat rule set.”
  6. prePMd is believed to be necessary but not sufficient for attending to hierarchy, and the estimated weights do not come directly from prePMd activity but from something related to it.
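Item 4 above lists the MoE-derived variables, including trial-by-trial RPEs that are either general or specific to the hierarchical rule. A toy sketch, assuming the model supplies on each trial the gate weights and each expert’s expected reward; the names and numbers are hypothetical. In a real fMRI analysis such regressors would typically be convolved with a hemodynamic response function inside a GLM rather than compared directly with the BOLD signal, so this only shows how the two RPE series differ.

```python
from typing import List, Sequence, Tuple


def expected_value(weights: Sequence[float], expert_values: Sequence[float]) -> float:
    """Mixture expectation: each expert's expected reward weighted by its gate weight."""
    return sum(w * v for w, v in zip(weights, expert_values))


def rpe_regressors(rewards: Sequence[float], v_mix: Sequence[float],
                   v_hier: Sequence[float]) -> Tuple[List[float], List[float]]:
    """General RPE (reward minus mixture expectation) and hierarchy-specific RPE
    (reward minus the hierarchical expert's expectation), one value per trial."""
    general = [r - v for r, v in zip(rewards, v_mix)]
    hierarchical = [r - v for r, v in zip(rewards, v_hier)]
    return general, hierarchical


# Toy trials with two experts (hierarchical, flat) and per-trial gate weights.
rewards = [1, 0, 1, 1]
gate = [[0.5, 0.5], [0.6, 0.4], [0.7, 0.3], [0.8, 0.2]]
v_hier = [0.5, 0.6, 0.7, 0.8]
v_flat = [0.3, 0.3, 0.3, 0.3]
v_mix = [expected_value(w, [vh, vf]) for w, vh, vf in zip(gate, v_hier, v_flat)]
general, hierarchical = rpe_regressors(rewards, v_mix, v_hier)
print([round(x, 2) for x in hierarchical])  # [0.5, -0.6, 0.3, 0.2]
```

Keeping the two series separate is what allows testing, as in prediction 2 of item 5, whether the hierarchy-specific RPE has its own neural correlate beyond the general one.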
