Cognitive Control Over Learning: Creating, Clustering, and Generalizing Task-Set Structure. Collins, Frank. Psychological Review 2013.

  1. “Learning and executive functions such as task-switching share common neural substrates, notably prefrontal cortex and basal ganglia”
    1. This paper examines how the two interact: how cognitive control helps learning, and how learning yields the representations that cognitive control requires, such as “… abstract rules or task-sets.”
  2. They develop a Bayesian model of how learning can produce hierarchical rules, and of how to decide whether or not to apply those rules in new settings
  3. Also, they “… develop a neurobiologically explicit network model to assess mechanisms of such structured learning in hierarchical frontal cortex and basal ganglia circuits.”
  4. The network approximately explains “… high-level context-task-set (C-TS)  computations, with specific neural mechanisms modulating distinct C-TS parameters.”
  5. This also leads to predictions about when errors are made, and about response times, in learning and task-switching
  6. People spontaneously build task-set structure even when not cued – predicting positive and negative transfer to later tasks
  7. “These findings implicate a strong tendency to interactively engage cognitive control and learning, resulting in structured abstract representations that afford generalization opportunities and, thus, potentially long-term rather than short-term optimality.”
  8. Task-set representations exist and have been shown in fMRI
  9. These representations are “… abstract latent constructs that constrain simpler choices”
  10. Research in this area generally doesn’t deal with how task-sets are learned, and instead deals either with cases where options are common conventions (such as arrows corresponding to directions) or with situations that are already highly trained. In some studies people have to figure out when to apply different rules, but not learn the rules themselves
    1. “Conversely, the reinforcement learning (RL) literature has largely focused on how a single rule is learned and potentially adapted, in the form of a mapping between a set of stimuli and responses.”
  11. In real life however, the problem is more complex than the above scenarios.  We may need to figure out if the current setting requires the application of a previously learned set of rules, or if in this setting we simply need to learn a new set of rules from scratch
  12. Some other previous work “… showed that subjects build repertoires of task-sets and learn to discriminate between whether they should generalize one of the stored rules or learn a new one… subjects were able to discover such structure and make efficient use of it to speed learning.”
    1. But this didn’t cover how the rules themselves are learned
  13. In terms of domains with hidden state, “… recent behavioral modelling studies have shown that subjects can learn hidden variables such as latent states relevant for action selection, as captured by Bayesian inference algorithms…”
  14. People sometimes even incorrectly infer hidden structure that does not exist in a problem.  “Thus, humans may exhibit a bias to use more complex strategies even when they are not useful, potentially because these strategies are beneficial in many real-life situations.”
  15. The hypothesis is that, when learning simple low-level policies that are free of structure, people can identify cues that signify a change in task-set. The hypothesis is based on 3 previous findings:
    1. Cueing makes it possible to discover and use rules
    2. Cognitive control may be required to improve learning
    3. People have a bias to infer more structure than exists
  16. In the problems they consider, cues provide clues as to which rule set should be used. These tasks require that:
    1. People can represent a rule abstractly, unrelated to the typical context in which it is used
    2. Different contexts can be clustered together when a similar task-set is required
    3. A new task-set can be built when needed
  17. This type of problem can be represented as a Bayesian nonparametric generative process – the Chinese Restaurant Process (CRP)
    1. Approximation methods for this can plausibly account for human behavior
  18. In previous studies, cues are generally stimuli whose similarity tracks the similarity of the task-sets that should be used.
    1. Here the clustering is done in terms of the dynamics of the environment<?>
  19. “…our contribution here is to establish a link between the clustering algorithms of category learning models on the one hand and the task-set literature and models of cognitive control and RL on the other.  The merger of these modelling frameworks allows us to address the computational model inspired by the Dirichlet process mixture…” along with added heuristics
  20. “Many neural models of learning and cognitive control rely on the known organization of multiple parallel frontal corticobasal ganglia loops…”
    1. Do action selection, leading to high-reward actions being selected over low-reward actions (where reward is encoded by dopamine)
    2. These mechanisms are also used to control use of other cognitive actions, “… such as working memory updating and maintenance via loops connecting more anterior frontal regions and basal ganglia”
  21. Moving onto the context-task-set (C-TS) model formally
  22. It’s based on the standard RL model, but “… the link between state and action depends on higher task-set rules that are hidden…”
  23. State is also “…determined hierarchically…” <do they mean transitions?>
  24. State is assumed to consist of a context, which I suppose could be thought of as indexing different policies, and then states, which determine the action under the current policy
  25. Problem is additionally partially observable
  26. Assume that clustering of task-sets follows a Dirichlet process / Chinese Restaurant Process (CRP)
  27. <They mention the difficulty of the exact Bayesian inference a few times, but dealing with the partial observability is just as significant>
  28. <I think the context and the state are fully observed, but the mapping between the context and the task-set (policy) needs to be inferred>
  29. Looks like the math they are doing is based on a MAP estimate
  30. For comparison, they also have a simple flat learner
  31. There is also a model that determines whether there is some hierarchical C-TS structure or not and, if not, falls back to flat learning <although eventually the CRP should learn to put everything in separate clusters if the data justifies it>
  32. In one domain there are 16 contexts, but they map to two different classes; there, structured learning is faster than flat <I would have expected to see more difference between their algorithm and the flat one, though>
  33. Then move onto a learning transfer task
  34. The model can easily reproduce the phenomenon of people attributing more structure to the problem than actually exists by setting the probability of “setting a new table” in the CRP to be low, so instead of being flat a lot of structure is believed to exist
  35. Next is “… a biologically detailed neural circuit model that can support, at the functional level, an analogous learning of higher and lower level structure using purely RL.”
  36. Builds on earlier models of action selection in corticostriatal circuits, but the addition here allows for the hierarchy they are concerned with
  37. “In these networks, the frontal cortex ‘proposes’ multiple competing candidate actions (e.g., motor responses), and the basal ganglia selectively gate the execution of the most appropriate response via parallel reentrant loops linking frontal cortex to basal ganglia, thalamus, and back to cortex (…).”
    1. Which action to gate, given all the recommendations, is learned via dopamine
  38. In their model, two sets of proposals need to be gated.  As before, one set deals with actions; the new set deals with task-sets/contexts. The two are arranged hierarchically, with the new set at the top
  39. Experiments with these models <Not really taking notes on them>
  40. When penalizing for complexity, the models that take into account context are still better than flat
  41. <skipping about 10 pages of experimental results – I think I get the idea.  Onto discussion>
  42. “… the C-TS model provided a good quantitative fit to human subject choices and that dynamics of choice were consistent with the mechanisms proposed.”
  43. Some human subjects seemed not to learn according to a hierarchy in the task, or had behavior that seemed consistent with hierarchy but RTs that were not <I would say evidence based on policy executed is much more convincing than that based on RT in this setting>
  44. The Bayesian model has some Bayesian aspects, and some aspects more similar to the pure neural model
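
To make the CRP clustering in the notes above more concrete, here is a minimal sketch of the textbook CRP prior over which task-set a brand-new context is assigned to. It is not the paper's exact formulation: the function name `crp_prior`, the use of hard counts of contexts per task-set, and the parameter name `alpha` are all assumptions made for illustration.

```python
def crp_prior(cluster_counts, alpha):
    """Textbook Chinese Restaurant Process prior (illustrative sketch only).

    cluster_counts: number of contexts already assigned to each existing
        task-set (hypothetical hard counts; an assumption of this sketch).
    alpha: concentration parameter, i.e. the weight given to "setting a
        new table" (creating a new task-set).

    Returns a list of probabilities: one per existing task-set, followed
    by the probability of creating a new task-set.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(cluster_counts) + alpha
    # An existing task-set is chosen in proportion to its popularity.
    probs = [count / total for count in cluster_counts]
    # A new task-set is chosen in proportion to alpha.
    probs.append(alpha / total)
    return probs


if __name__ == "__main__":
    # Two existing task-sets, used by 3 contexts and 1 context respectively.
    print(crp_prior([3, 1], alpha=1.0))
    # A smaller alpha makes a new task-set less likely a priori.
    print(crp_prior([3, 1], alpha=0.1))
```

With a small `alpha`, the last entry shrinks, so a new context is a priori more likely to be grouped with an existing task-set than to get a task-set of its own. In a full model this prior would be combined with the likelihood of observed rewards before choosing a task-set; that step is not shown here.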
