Quantifying the Internal Structure of Categories Using a Neural Typicality Measure. Davis, Poldrack. Cerebral Cortex 2014

  1. Deals with the internal structure/representation of category information
  2. <Seems like the assumption is that there is something of an exemplar representation>
  3. “Internal structure refers to how the natural variability between category members is coded so that we are able to determine which members are more typical or better examples of their category. Psychological categorization models offer tools for predicting internal structure and suggest that perceptions of typicality arise from similarities between the representations of category members in a psychological space.”
  4. Based on these models, they develop a “neural typicality measure” that checks whether a category member elicits a pattern of activation similar to those of other members of its category, as well as what is central to a neural space.
  5. Use an artificial categorization task, find a connection between stimulus and response
    1. “find that neural typicality in occipital and temporal regions is significantly correlated with subjects’ perceptions of typicality.”
  6. “The prefrontal cortex (PFC) is thought to represent behaviorally relevant aspects of categories such as
    rules associated with category membership (…). Motor and premotor regions may represent habitual responses associated with specific categories (…). The medial temporal lobe (MTL) and subregions of the striatum are thought to bind together aspects of category representations from these other systems.”
  7. Different areas and different neurons and patterns of activation in an area can “reliably discriminate
    between many real world object categories”
  8. Consider examples of category data as having some sort of “internal structure” or feature representation specific to that class.
    1. These features can capture things like how typical a concrete example is, and are related to how quickly and accurately classification occurs
  9. “Depending on the specific model, a category representation may be a set of points associated with a given category (exemplar models; …), a summary statistic (prototype models; …), or a set of statistics (clustering models; …) computed over points associated with a category.”
  10. Items closer to other examples in the class, or to the prototype, are considered the most typical or likely
  11. But they don’t propose that an accurate model does exactly the same thing the brain does, as there are examples where nonintuitive things happen.
    1. Ex/ culture can influence how things are categorized, as can a current task or other context
  12. “Here, our goal is to develop a method for measuring the internal structure of neural category representations and test how it relates to physical and psychological measures of internal structure.”
  13. The neural typicality measure is related to nonparametric kernel density estimators, but “A key difference between our measure and related psychological and statistical models is that instead of using psychological or
    physical exemplar representations, our measure of neural typicality is computed over neural activation patterns…”
  14. Use a well-studied research paradigm of categorizing simple bird illustrations into 4 categories based on neck angle and leg length.  Previous results show people reconstruct classes based on the average item for each category
  15. “Our primary hypothesis is that psychological and neural measures of internal structure will be linked, without regard to where in the brain this might occur.”
    1. Also expect that some categorization will happen in visual cortex, and higher level temporal and medial-temporal regions, which “… are theorized to bind together features from early visual regions into flexible conjunctive category representations (…).”
    2. There are other parts relevant to categorization, but not particularly this form of visual categorization, and other parts may be sensitive to things like entropy
  16. “To foreshadow the results, we find that neural typicality significantly correlates with subjects’ perceptions of typicality in early visual regions as well as regions of the temporal and medial temporal cortex. These results suggest that neural and psychological representational spaces are linked and validate the neural typicality measure as a useful tool for uncovering the aspects of category representations coded by specific brain regions.”
  17. “For analysis of behavioral responses, response time, and typicality ratings, a distance-to-the-bound variable was constructed that gave each stimulus’ overall distance from the boundaries that separate the categories in the stimulus space. Distance-to-the-bound is a useful measure of idealization: items that are distant from the bound are more idealized than items close to the bound (…).”
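The distance-to-the-bound computation is simple enough to sketch. A minimal version, assuming a normalized 2-D stimulus space (neck angle × leg length) with one separating boundary per dimension at the midline (these coordinates are my own placeholders, not the paper's actual stimulus values):

```python
def distance_to_bound(neck_angle, leg_length, bound_neck=0.5, bound_leg=0.5):
    """Distance of a stimulus from the boundaries separating the categories.

    Assumes a normalized stimulus space with one boundary per dimension at
    the midline (hypothetical coordinates); a stimulus' overall distance is
    its distance to the nearest separating boundary.
    """
    return min(abs(neck_angle - bound_neck), abs(leg_length - bound_leg))
```

Large values mean the item sits deep inside its quadrant (idealized); values near 0 mean it sits close to a category boundary.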
  18. “For the psychological typicality measure, a value for each of the Test Phase stimuli was generated by interpolating, on an individual subjects basis, a predicted typicality rating from the subjects’ observed typicality ratings…”
  19. Also did a physical typicality measure, which is pretty simple to understand (just neck angle, leg length measurements)
  20. Then a neural typicality measure <too many details to list here>
    1. “Our neural typicality measure is based on similarities between multivariate patterns of activation elicited for
      stimuli in the task. Stimuli that elicit activation patterns that are like other members of their category are more neurally typical than those that elicit dissimilar patterns of activation.”
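Since the measure is kernel-density-like (per point 13), it can be sketched: each item's neural typicality is its summed kernel similarity to the activation patterns of the other members of its category. The Gaussian kernel, bandwidth, and Euclidean distance here are illustrative stand-ins, not the paper's exact choices:

```python
import math

def neural_typicality(patterns, labels, h=1.0):
    """Typicality of each item: summed Gaussian-kernel similarity to the
    activation patterns of the other members of its own category.

    patterns: list of activation vectors (one per stimulus)
    labels:   category label per stimulus
    h:        kernel bandwidth (placeholder value)
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scores = []
    for i, (p, c) in enumerate(zip(patterns, labels)):
        same = [q for j, (q, d) in enumerate(zip(patterns, labels))
                if j != i and d == c]
        # items whose patterns resemble their category-mates score high
        scores.append(sum(math.exp(-dist(p, q) ** 2 / h) for q in same))
    return scores
```

An item whose activation pattern sits near its category-mates in neural space gets a high score; an isolated item gets a score near zero.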
  21. Subjects’ behavioral responses were predicted by an SVM
  22. Typicality ratings were highly correlated with distance-to-the-bound
    1. Reveals that the most typical items, not the average item, are what is used for category representation.  A few other results show this is the case through other methodologies
  23. Neural typicality is linked to psychological typicality
  24. Found activity in visual cortex and MTL that have been found to be linked to categorization
  25. “These results suggest that, in the present task, the internal structure of neural category representations in temporal and occipital regions are linked to subjects’ psychological category representations such that objects that are idealized or physical caricatures of their category elicit patterns of activation that are most (mathematically) similar to other members of their category.”
  26. “… in the present task, physical similarity is not a significant contributor to the internal structure of neural category representations, at least not at a level that is amenable to detection using fMRI.”
  27. Also did MDS for classification on the neural data, <results don’t seem amazing, but only ok>
  28. SVM for classification “The SVMs are given no information about the underlying stimulus space, and unlike
    the MDS analysis, do not make any assumptions about how the dimensions that separate the categories will be organized. Thus, the SVMs can be sensitive to regions that code rule-based or behavioral differences between categories, regions that encode information about their perceptual differences, or regions that code some combination of behavioral and perceptual information.”
  29. “Although there is strong overlap in the visual and MTL regions that discriminate between categories and represent
    internal structure, the motor/premotor, insula, and frontal regions were only identified in the between-category analysis. These results are consistent with the hypothesis that PFC and motor/premotor regions are more sensitive to behavioral aspects of categories (…). However, because behavioral responses are strongly associated with the perceptual characteristics of each category, the SVM results are also consistent with the hypothesis that these regions contain some perceptual information about the categories.”
  30. “The present research adds to the growing consensus that categorization depends on interactions between a number of
    different brain regions… An important point that this observation highlights is that there may not be any brain region that can be thought of representing all aspects of categories, and thus it might be most accurate to think of brain regions in terms of the aspects of category representations that they code.”
  31. “…in the present context, the deactivation of regions of the striatum with increasing typicality likely indicates an uncertainty signal, as opposed to category representation…”
  32. “Because our neural typicality measure is not based on mean activation-level differences between stimuli, it may be
    more directly interpretable and less susceptible to adjacency effects in studies of longer term internal category structure.”

    1. <Hm, should read their methodology more carefully on another read-through>
  33. They don’t have results that indicate suppression of adjacent stimulus
  34. Says their methodology should be tested in real-world, and more artificial settings
  35. Evidence of “dimensional selective attention” where not all features are attended to for classification
    1. “Attentional mechanisms in the PFC that instantiate rule-based strategies (…) may contribute to selective attention effects by influencing neural representations in a top-down manner.”
    2. Although: “In the present context, dimensional selective attention is insufficient for explaining the idealization effect because dimensional selective attention affects an entire dimension uniformly… additional mechanisms are required.”
  36. “Attention has been found to create a spotlight around salient regions of visual space such that the processing of stimuli
    close to this location in space is enhanced (not just differences along a specific dimension of visual space; …). It is conceptually straightforward to predict that the same or similar spotlight mechanisms may affect the topography of stored neural stimulus representations, such that regions of a category space that contain highly idealized category members are enhanced and contribute more to categorization and typicality judgments than exemplars in ambiguous regions of category space.”
  37. Another model is one that specifically tries to “… to reduce prediction error and confusion between categories (…). In these models, category members are simultaneously pulled toward representations/members of their own categories and repelled by members of opposing categories.”
    1. But this doesn’t seem to be a possible explanation here because “… the neural effects as actual neuronal changes in regions of early visual cortex happen on a much longer scale than our task.”
  38. This study only tried to find correlation between “psychological” and “neurological” responses, but more in-depth exploration of their relationship is a good idea and left for future work
  39. “Our task involves learning to distinguish multiple categories, akin to A/B tasks, and so our finding that early visual cortex is involved with representing category structure may be at odds with theories emphasizing the role of task demands (as opposed to featural qualities) in determining which perceptual regions will be recruited to represent categories.”
    1. Although these distinctions may be an artifact of the type of analysis used

Precis of Unified Theories of Cognition. Newell. Behavior and Brain Sciences 1992

<SOAR, and his last publication>

  1. His goal is to develop a model that covers all of cognition
  2. Soar is used as a theory for many things, from low-level immediate responses to learning, language, and problem solving
  3. Discusses his book on SOAR; it’s a personal perspective
  4. “You can’t play 20 questions with nature and win”
  5. Chapter 2: Foundations of Cognitive Science
  6. Knowledge systems – behavior is based on knowledge (we use knowledge to make decisions)
  7. Representation (of knowledge): Discusses the applicability of factored symbolic representations.  A trick is finding the particular representation
  8. Computation: universal computers
  9. Symbols: Symbol systems contain:
    1. memory with independently modifiable structures containing symbols
    2. symbols, which provide distal access to other structures
    3. operations, which take in symbols and output symbols
    4. interpretation: taking symbol structures and executing operations
  10. Architectures: “Unified theories of cognition will be formulated as architectures.”
    1. Much of the arch of the mind is the same <or at least similar> between individuals
    2. “in biological systems it is the level of neural structure that is organized to provide symbols.”
    3. “The architecture provides the boundary that separates structure from content…”
  11. Intelligence: “A system is intelligent to the degree that it approximates a knowledge-level system. The distinction between knowledge and intelligence is key… intelligence is the ability to use the knowledge the system has in the service of the system’s goals. This notion answers many requirements of a concept of intelligence, but it does not lead directly to a quantitative measure of intelligence, because knowledge per se is not quantifiable.”
  12. Search and problem spaces
    1. How does processing lead to intelligent behavior? How does it use knowledge to accomplish something?  This occurs by search
    2. “Search is not just another cognitive process, occurring alongside other processes (the view prior to the cognitive revolution), but the fundamental process for attaining tasks that require intelligence.”
    3. Search is used when prediction errors occur – what do I do now?
    4. Used when there isn’t any indication as to what to do already sitting in knowledge and it has to be figured out “generate and test”
    5. Using a problem space is a way of framing the problem so search is tractable (restricting what is considered must be a part of this)
      1. Knowledge can be used to decide what the problem space is.  With enough knowledge, the problem space is so small it is answered immediately
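The generate-and-test idea can be made concrete. A toy sketch (my own framing, not Newell's code): a problem space is an initial state plus a set of operators, and search generates successor states and tests each against the goal.

```python
from collections import deque

def generate_and_test(initial, operators, is_goal, max_depth=5):
    """Breadth-first generate-and-test over a problem space: repeatedly
    apply the known operators to generate candidate states, and test
    each generated state against the goal."""
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path  # sequence of operator names reaching the goal
        if len(path) < max_depth:
            for name, op in operators:
                nxt = op(state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None  # goal not reachable within max_depth
```

Knowledge enters by shrinking the operator set or the depth that must be searched; with enough knowledge the space collapses and the answer is immediate.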
  13. Chapter 3: Human Cognitive Architecture “This chapter attempts to discover some generally applicable constraints on the human architecture.”  Admittedly the points here are speculative
    1. Humans specifically are a symbol system because we can do such an enormous variety of things and ways of thought.
    2. “Any system that is sufficiently flexible in its response functions must be a symbol system (i.e., capable of universal computation).”
    3. A hierarchy of symbol systems. “Higher system levels are spatially larger and run more slowly than do lower ones, because the higher levels are composed of multiple systems at the next lower level and their operation at a higher level comes from the operation of multiple interactive systems at the next lower level.”  The change in size and speed per level is an order of magnitude
      1. Argument for the biological basis of this: organelles -> neurons -> neural circuits (1/10 sec).  That’s called the biological band; above it is the cognitive band, with different amounts of thinking taking different orders of magnitude of time (basic thought starting at 1 sec)
    4. “Only about 100 operation times are available to attain cognitive behavior out of neural-circuit technology. This constraint is extremely binding. It provides almost no time at all for the cognitive system to operate. The constraint may also be expressed as follows: Elementary but genuine cognition must be produced in just two system levels. Neural circuits (at ~10 msec) can be assembled into some sorts of macrocircuits (one factor of 10) and these macrocircuits must then be assembled to produce cognitive behavior (the second factor of 10).”
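The arithmetic behind the “two system levels” constraint, with each level roughly a factor of 10 slower than the one below (numbers from the text):

```python
# Each system level is composed of multiple units of the level below and
# runs roughly 10x slower than it.
neural_circuit_ms = 10                    # ~10 ms neural-circuit operations
macrocircuit_ms = neural_circuit_ms * 10  # first factor of 10
cognitive_act_ms = macrocircuit_ms * 10   # second factor of 10 -> ~1 s

# Only ~100 neural-circuit operation times fit inside one elementary
# cognitive act -- the "extremely binding" constraint.
operations_available = cognitive_act_ms // neural_circuit_ms
```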
    5. The speed of the lowest level of the hierarchy is set by the minimal possible speeds at which its operations can take place
    6. The new cognitive band: Timing is set up at

Image Theory: Principles, Goals, and Plans in Decision Making. Acta Psychologica 1987

  1. Idea that decision making is represented as images, each with different purposes:
    1. goals
    2. what would be the result of obtaining those goals
    3. plans to achieve goals
    4. anticipated results of plans <I suppose they mean near-term, because the end points are just #1>
  2. “Decisions consist of (1) adopting or rejecting potential candidates to be new principles, goals, or plans, and (2) determining whether progress toward goals is being made, i.e., whether the aspired-to future and the anticipated results of the plan implementation correspond.”
  3. Decisions are made either based on:
    1. Compatibility between candidates and existing goals, as well as compatibility with desired future states
    2. Potential reward of goal/plan
  4. <Some things here feel conflated, assume it will be cleared up>
  5. Is based on schemas.  “Images (…) are schemata that are specific to decision behavior and represent the decision maker’s guiding principles relevant to some sphere of decision making.  They also represent the decision maker’s goals in that sphere, what he or she is doing to reach those goals, and his or her view of how well those efforts are succeeding.”
  6. Self-image is how we see ourselves.  Self-image is made of principles, which drive selection of goals
  7. Trajectory image is where we see our plan taking us (both end points, and along the way).  Made up of goals (can be concrete, specific, vague, abstract).
  8. Action image is composed of plans
  9. Goal adoption is accepting the endpoint of a plan
  10. Plans are abstract, made concrete through tactics
    1. Not all tactics must be completely ironed out – items can be left to be dealt with as the plan unfolds
    2. Tactics may have a dependency order, or may be independent (so can be undertaken in any order or concurrently)
  11. “The projected image consists of the anticipated events and states that one foresees occurring (1) if one adopts a particular candidate plan in order to attain a specific goal or (2) if one continues with the plans that already have been adopted and that currently are being implemented.”  Deals with expected outcomes
  12. Decisions are of two forms:
    1. Adoption: accept/reject parts of images
    2. Progress: are things going as they should according to plan? If things aren’t going well, a plan may need to be rejected and another one adopted.  If a new plan can’t be found, the goal may have to be rejected
  13. Decisions are made based on compatibility (does it fit well enough – above some threshold – doesn’t have to be perfect) and profitability (what is the reward)
    1. Violations of compatibility may not always be fully conscious, could lead to some emotional sense that something isn’t right
    2. By default we accept things as compatible and only discard with enough evidence.  Favors status quo
  14. Doubt in terms of likelihood of decision success discounts its reward (go by expected reward)
  15. Decisions made based on rejecting those that are incompatible and then selecting from those left the most profitable
  16. Adoption and the evolution of images – how images change over time (developmentally) <skipping>
  17. Future directions
  18. <Not a fruitful read>

Motor Effort Alters Changes of Mind in Sensorimotor Decision Making. Burk, Ingram, Franklin, Shadlen, Wolpert. PLOS One 2014.

  1. Studies when people change decisions after already committing to an action, which can even happen in situations where the stimulus is removed once movement starts (meaning the change takes place after the stimulus is already gone)
  2. Looks at the threshold where decisions change, and here proposes that it is linked to the physical effort associated with the movement (and how far the first target is from the second)
  3. Based on drift-diffusion model
  4. The time between stimulus removal and movement change is usually on the order of 400 ms (because the stimulus is removed after motion starts, its removal can’t impact the original motion, but the evidence already in the processing pipeline is still used once movement begins)
  5. “Fits of the model showed that the change of mind bound did not require as much information as for the initial decision and also that not all the information in the processing pipeline was used, that is there was a limited time for which new information was processed.”
  6. Random dot motion test, with a yoke that had to be moved to one of two positions to indicate motion.  Once motion stopped, stimulus was extinguished
  7. The model holds that there are two decision boundaries: one to start the initial motion, and another to cause a change in motion to the second target (the model has accumulation continuing after the stimulus until a timeout or a change in motion, whichever occurs first)
  8. Changes of mind were most common when the motion evidence was weak and motion was initiated in the wrong direction (in most cases, the changes led to the correct choice being made)
  9. The 400ms measured is consistent with previous studies on humans and monkeys (neural recordings in the monkey have about “200 ms latency to the start of evidence accumulation [...] and latency from the signature of decision termination to the initiation of the behavioral response. (~70 ms for saccades [...] and ~170 ms for reaches[...])”
  10. 3 of 4 subjects reduced their rate of direction change as the angular separation (and therefore end distance) of the targets increased
  11. One model holds that there are different populations representing the left and right choices (as opposed to just one) and that there is “… a race between two diffusion mechanisms [...].  This implies that processing in the post-initiation period may not begin at the termination bound for the initial choice, but at a more intermediate value achieved by the losing mechanism.”
    1. A model for DDM that’s not just 2AFC
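A minimal simulation of the two-bound idea (all parameters and the noise model are illustrative guesses, not the paper's fits): evidence accumulates to an initiation bound; after initiation, in-pipeline evidence continues to be processed for a limited window, and a change of mind occurs if the accumulator reaches a smaller bound for the opposite choice.

```python
import random

def simulate_trial(drift, bound=1.0, com_bound=0.5, pipeline_steps=40,
                   dt=0.01, noise=1.0, seed=None):
    """One drift-diffusion trial with a change-of-mind bound.

    Returns (initial_choice, final_choice), each +1 or -1. The change-of-
    mind bound is smaller than the initiation bound (less information is
    required to reverse), and post-initiation processing lasts only a
    limited number of steps (the pipeline window).
    """
    rng = random.Random(seed)
    x = 0.0
    while abs(x) < bound:                    # accumulate to initiation bound
        x += drift * dt + noise * rng.gauss(0, 1) * dt ** 0.5
    initial = 1 if x > 0 else -1
    for _ in range(pipeline_steps):          # limited pipeline window
        x += drift * dt + noise * rng.gauss(0, 1) * dt ** 0.5
        if x * initial < 0 and abs(x) >= com_bound:
            return initial, -initial         # change of mind
    return initial, initial                  # initial choice stands
```

With weak drift (weak motion evidence), wrong-direction initiations and subsequent reversals become more likely, matching point 8.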

Social Cognitive Theory of Self-Regulation. Bandura. Organizational Behavior and Human Decision Processes 1991.

<Useful link http://homepages.rpi.edu/~verwyc/bandura.htm>

  1. Self-regulation has 3 major sources:
    1. Monitoring behavior, and the results
    2. Judgement of behavior relative to personal standards and environment
    3. “affective self-reaction” self-reward or punishment based on outcomes of behavior
  2. Self-regulation is also related to personal agency
  3. <Just based on the language this guy uses this paper is dubious.  Redolent of snowing that philosophers use.>
  4. <Some points here are obvious, but I will try to note them briefly anyway>
  5. In order to do planning we must be able to do symbolic manipulation
  6. Planning is not only a factor of external environment – self reflection allows us to control our behavior, and we have to understand that our actions are meaningful.  This comes from self-monitoring
  7. If cause and effect are close in time, understanding implications through self-monitoring is simpler
  8. RL idea – actions with good outcomes are reinforced and actions with bad outcomes are suppressed
  9. Distinguishes between self-monitoring and self-observation, but it’s not clear to me what the distinction is.
  10. “Moreover, people differ in their self-monitoring orientations in the extent to which they guide their actions in terms of personal standards or social standards of behavior (Snyder, 1987).  Those who have a firm sense of identity and are strongly oriented toward fulfilling their personal standards display a high level of self-directedness.  Those who are not much committed to personal standards adopt a pragmatic orientation, tailoring their behavior to fit whatever the situation seems to call for.  They become adept at reading social cues, remembering those that have predictive value and varying their self-presentation accordingly.”
  11. We develop personal standards partially based on social feedback, also we often judge our performance on objectively scored tasks relative to others
  12. “In everyday life, people imbue remarkably varied activities, many seemingly trivial in character, with high evaluative significance as when they invest their self-esteem in how far they can toss a shot-put ball.”
  13. People are less satisfied with accomplishments when the results are partially the result of the actions of others.  Likewise they may be upset if something happens that is bad and their fault, but if not their fault the feeling of regret is reduced
  14. People motivate themselves by providing self-incentives – exercising and then having some ice cream.  Results have shown that self-incentivising is important for regulating behavior, especially in unstructured environments
  15. <Ok now we are getting to more useful stuff>
  16. Functioning of Self-regulatory Systems
  17. Self-efficacy system. Belief in efficacy has a huge impact in how/what decisions are made: “People’s beliefs in their efficacy influence the choices they make, their aspirations, how much effort they mobilize in a given endeavor, how long they persevere in the face of difficulties and setbacks, whether their thought patterns are self-hindering or self-aiding, the amount of stress they experience in coping with taxing environmental demands, and their vulnerability to depression.”
  18. Self-efficacy also influences how we attribute success/failure.  People with a high belief of self-efficacy will attribute good outcomes to themselves and bad ones to external causes, and vice versa
  19. We tend to enjoy tasks at which we deem ourselves efficacious, and derive pleasure from mastering them (becoming even more efficacious)
  20. The negative feedback model
    1. Discuss “the basic regulator in control theory” <like a linear quadratic regulator>
    2. “psychobiologic homeostatic theories”
    3. “cybernetic TOTE model”
    4. Equilibration (sole source of motivation in Piaget’s theory)
  21. The general idea in negative feedback models is that they simply try to reduce the disparity between the current state and the goal state
  22. “A regulatory process in which matching a standard begets inertness does not characterize human self-motivation.  Such a feedback control system would produce circular action that leads nowhere.  Nor could people be stirred to action until they receive feedback of a short-coming.”
  23. Some form of feedback is necessary for regulation of motivation, but people self-motivate by adopting goals before any feedback occurs.  Furthermore, goal-setting allows one to set a basis by which regulation can later take place. So this negative feedback doesn’t explain high-level planning well, but perhaps it’s ok for low-level: “… anticipative or proactive control operates as the primary system in the mobilization of motivation and reactive feedback specifies the further adjustments in effort <why only effort> needed to accomplish desired goals.”
  24. “Human self-motivation relies on both discrepancy production and discrepancy reduction.” (must have both feedback and goals)
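The contrast in points 22–24 can be rendered as a toy loop (my own formalization, with made-up gains and thresholds): pure negative feedback would go inert once the goal is matched, so the sketch also produces a new discrepancy (raises the goal) whenever the current one has been closed.

```python
def self_regulate(state, steps, goal=1.0, gain=0.5, raise_factor=1.5,
                  tolerance=0.05):
    """Discrepancy reduction plus discrepancy production.

    Each step, negative feedback closes part of the gap to the goal
    (reactive control); once the goal is effectively matched, a new,
    higher goal is adopted (proactive control) instead of going inert.
    Returns the (state, goal) trajectory.
    """
    history = []
    for _ in range(steps):
        error = goal - state      # feedback: the remaining discrepancy
        state += gain * error     # discrepancy reduction
        if abs(goal - state) < tolerance:
            goal *= raise_factor  # discrepancy production: raise the bar
        history.append((state, goal))
    return history
```

Dropping the goal-raising branch gives the pure feedback-control model Bandura criticizes: the loop settles at the goal and stops.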
  26. Hierarchical Structure of Goal Systems
  27. “… proximal goals are not simply subordinate servitors of valued loftier ones as commonly depicted in machinelike hierarchical control systems.  Through engagement of the self-system, subgoals invest activities with personal significance.”  Indeed, sometimes, the motivation to performance of subgoals can override progress toward the actual goal.
  28. Aspirational standards: the standard you set for yourself determines the level you accept when satisficing
  29. <Ok, stopping here, think its enough>

Methods of Heuristics. Groner, Groner, Bischof (eds). Book 1983

<Pretty neat book actually; contains the proceedings of a multidisciplinary symposium on Methods of Heuristics.  Has a chapter by Minsky, for example.  No time to read the whole book.  George Polya, of Polya’s urn fame, devoted a great deal of his work to heuristics and is discussed here as well.  He was invited, but poor health prevented him from attending.  Piaget was also slated to talk, but died the day he was originally asked to speak – his longtime collaborator did come, though.  De Groot is also here, but I think I have it covered from previous reading.>

Chapter 6: Heuristics and Cognition in Complex Systems.  Dorner

  1. Well defined problems have the following features:
    1. Goal state is known
    2. Rules of the domain are known
  2. Often, following constraints are added
    1. State changes only through the planner
    2. Problem is not too big
    3. Is fully observable
  3. In complex systems
    1. The goal is vague, perhaps multi-factor – goals that have multiple weighted aspects (this makes the problem into more of a reward problem than graph search)
    2. Results of all operators are unknown, or perhaps, not even all the operators are known
    3. Partial observability
  4. Moves on to how to start attacking the problem (such as where to start making unknown items known)
  5. When anchoring the problem by nailing down unknown items, a stopping rule may be needed, especially in cases with real-valued information.  The idea is to stop once a reasonable resolution level has been reached.  This should be the minimal amount needed to reach the goal
  6. In real life, people often attempt to achieve goals that are mutually exclusive, but are not aware of it
  7. In some cases, setting subgoals or simpler heuristic goals in place of the true goal can lead to poor behavior: “When a S [a participant] in the tailor shop game reasoned about a way to get more profit by selling his products, the S finally decided to strive for a higher sales rate.  First he tried to get a higher rate of sales by advertisement.  When this had not sufficient effect, the S decided to lower the prices.  This measure was effective; the S sold his whole production, without making any profit, as he sold products for less than his costs.”
  8. In some cases also, the interim goal gets all the importance, and the individual forgets about the original goal entirely (this happens a lot in science, where answering a preceding question becomes of significant interest and can be a large distraction)
  9. People might also not choose goals at all.  Lindblom (1964) discussed a few symptoms of this sort of behavior:
    1. Thematic vagabonding: they continually change their course of action and therefore never make significant progress
    2. Encapsulation: over-commitment to some approach
    3. Both are escape tendencies where working on the actual problem is avoided: “They do not solve the problems they should solve (but can’t), but rather those they can (but shouldn’t).  Often the replacement of a final goal by an interim goal may be a kind of encapsulation.”
  10. Another potential problem is that individuals only collect or pay attention to data that fits in with their preconceived (and potentially incorrect) conception of the problem.  This is called the use of a dogmatically entrenched system.  “That means that the individual never again gets negative feedback; his system of assumptions becomes dogmatic.”
  11. These “cognitive degenerations” can be due to a feeling of incompetence.  Seeking out information requires both recognizing that one doesn’t have enough information and feeling capable of obtaining and using it.  Dogmatism is the wrong way of securing a feeling of competence.
  12. <…>

Toward a General Theory of Expertise. Ericsson, Smith (Eds). Book 1991.

<Notes will be very sparse>

Chapter 1: Prospects and Limits of the Empirical Study of Expertise: An Introduction.  Ericsson, Smith

  1. For chess, De Groot set up well-defined tasks for analyzing chess expertise: rather than watching players go through full games (which would be too diffuse in terms of the entire state space), he presented chess positions and asked players only to select the next move
    1. Even this task isn’t perfectly well defined, though, because in general a chess position can’t be solved exactly due to the complexity of the game
  2. De Groot used “thinking aloud” experiments by players of different skill levels
  3. De Groot found that when using the thinking aloud approach with next move queries, experts and masters took around 10 minutes: “In the beginning, the players familiarized themselves with the chess position, evaluated the position for strengths and weaknesses, and identified a range of promising moves.  Later they explored in greater depth the consequences of a few of those moves.  On average, both masters and experts considered more than thirty move possibilities involving Black and White and considered three or four distinctly different first moves.”
  4. He found that masters and experts didn’t differ in their rollout depth
  5. The two groups did differ in when the best move appeared: masters generally mentioned the best move during familiarization, whereas experts found the best move later on.  This implies that move selection in chess generally comes down not to improved computation but rather improved board-value representation. “De Groot (1978, p. 316) argued that mastery in ‘the field of shoemaking, painting, building, [or] confectionary’ is due to a similar accumulation of experiential linkings.”
  6. During tests on board memorization (exposure from 2-10 seconds) improved recall was linked to improved playing ability.  Chase and Simon followed up on these experiments
    1. For random board configurations (not arrived at during natural play), recall between masters and novices was equivalent… “showing that the superior memory performance of the master depends on the presence of meaningful relations between the chess pieces, the kinds of relations seen in actual chess games.”
    2. Recall of piece location did not occur smoothly over time – there would be bursts which corresponded with logical chunking; masters were found to have different <larger> chunk sizes
    3. “Chase and Simon (1973) found that the number of chunks recalled by chess players at all skill levels was well within the limit of around 7 +/- 2 <so it seems not to be the case that masters are simply better at all recall tasks>.  They attributed the difference in memory performance between strong and weak players to the fact that the more expert chess players were able to recognize more complex chunks, that is, chunks with a larger number of chess pieces per chunk.”
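The chunking account above lends itself to a toy illustration (my own sketch, not from the book): if recall is capped at roughly 7 chunks regardless of skill, then the number of pieces recalled is driven entirely by chunk size.

```python
# Toy illustration (not from the chapter) of the Chase & Simon chunking
# account: short-term recall is limited to about 7 +/- 2 *chunks*, so a
# player who encodes larger chunks recalls more individual pieces.

CHUNK_CAPACITY = 7  # short-term memory limit, in chunks

def pieces_recalled(board_pieces: int, chunk_size: int) -> int:
    """Pieces recalled if the board is parsed into chunks of
    `chunk_size` pieces and only CHUNK_CAPACITY chunks are retained."""
    full_chunks = board_pieces // chunk_size
    retained = min(full_chunks, CHUNK_CAPACITY)
    return retained * chunk_size

# A novice parsing piece-by-piece vs. a master seeing 4-piece patterns:
novice = pieces_recalled(board_pieces=24, chunk_size=1)   # -> 7
master = pieces_recalled(board_pieces=24, chunk_size=4)   # -> 24
```

With piece-by-piece encoding the 7-chunk cap yields 7 pieces; with 4-piece patterns the same cap covers the whole 24-piece board.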
  7. Estimated 3,000 hours to be an expert, 30,000 to be a master
  8. Better expert memory in areas of expertise has been shown in many other domains.  Although experts may actually forget parts of the information, it is usually information that is irrelevant (for example, forgetting symptoms that aren’t related to the diagnosis of a patient)
  9. (p.20) “The types of differences found in a wide range of domains of expertise are remarkably consistent with those originally noted by de Groot (1978) in the domain of chess.  Expert performers tend to retrieve a solution method (e.g., next moves for a chess position) as part of the immediate comprehension of the task, whereas less experienced subjects have to construct a representation of the task deliberately and generate a step-by-step solution, as shown by research on physics problems (…) and algebra-word problems (…).  Medical experts generate their diagnoses by studying the symptoms (forward reasoning), whereas less experienced medical students tend to check correctness of a diagnoses by inspecting relevant symptoms (backward reasoning) (Patel & Groen, chapter 4, this volume).”
  10. <Next paragraph> “On the same theme, expert performers have a body of knowledge that not only is more extensive than for nonexperts but is also more accessible (…).  Whenever knowledge is relevant, experts appear to access it efficiently (…).  The experts are therefore able to notice inconsistencies rapidly, and thus inconsistent hypotheses are rejected rapidly in favor of the correct diagnosis (…).  On presentation, information in the problem is integrated with the relevant domain knowledge (Patel & Groen, chapter 4, this volume).”
  11. p.22 discusses domain specific memorization schemes
    1. In categorization of physics problems, experts categorized them based on solution methodology that could be applied, whereas novices categorized them based on superficial aspects of the problem, such as the types of objects being discussed.
    2. Studies of board recall in chess show that masters also utilize forms of long term memory (not just short-term) in the task.  Additionally, chunks are formed such that in many cases there is overlap so that there are also encodings of how chunks relate to each other.
    3. Recall also depends on task; as mentioned doctors may forget symptoms irrelevant to diagnosis, and similar results with studies on programming
  12. Studying performance of experts in the lab can be difficult because tasks in the lab must match the same tasks that the experts are experienced in
  13. Experts have faster response time, better ability to plan ahead, and better memory (all in the particular domain of expertise)
  14. Chase and Simon theory: (p.26)
    1. Difference in ability is related to immediate access to relevant knowledge (retrieving chess board positions/relevant chunking) (1973 – perception in chess)
    2. Theoretical account of how experts extract best moves from long-term memory
    3. Chunks serve as cues to activate best move recall
    4. “The chess masters’ richer vocabulary of chunks thus played a critical role in the storage and retrieval of superior chess moves.”
  15. Accounts focusing on practice and learning: (p.27)
    1. Improvement in a task often follows a power law <serious diminishing returns> (Newell & Rosenbloom, 1981).   They also consider chunking here
    2. Fitts proposed 3 stages in skill acquisition:
      1. Cognitive: cognitive effort to understand the task and what parts to pay attention to
      2. Associative: “… making the cognitive process efficient to allow rapid retrieval and perception of required information.”
      3. Autonomous: “… performance is automatic, conscious cognition is minimal.”
    3. “First, it is important to distinguish between practice and mere exposure or experience.  It is well known that learning requires feedback in order to be effective.  Hence, in environments with poor or even delayed feedback, learning may be slow or nonexistent.”
    4. In some domains, performance never really improves, even after enormous amounts of practice – this is often the case when the domain is chaotic.  Time spent doing something isn’t always a good measure of proficiency
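A minimal sketch of the power law of practice mentioned above (my own illustration; the parameter values are made up): per-trial time follows T(n) = a * n**(-b), which gives a constant proportional speedup per doubling of practice and hence sharply diminishing absolute returns.

```python
import math

# Power law of practice (Newell & Rosenbloom, 1981): time per trial
# T(n) = a * n**(-b).  Parameters here are invented for illustration.

def trial_time(n: int, a: float = 10.0, b: float = 0.4) -> float:
    return a * n ** (-b)

# Diminishing returns: the absolute gain from trial 1 to 2 dwarfs the
# gain from trial 100 to 200, even though both are one "doubling".
gain_early = trial_time(1) - trial_time(2)
gain_late = trial_time(100) - trial_time(200)
assert gain_early > 5 * gain_late

# On log-log axes the curve is a straight line with slope -b:
slope = (math.log(trial_time(200)) - math.log(trial_time(100))) / \
        (math.log(200) - math.log(100))
```

The log-log linearity is what makes the power-law form easy to check against practice data.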
  16. Accounts focusing on memory functioning: (p.28)
    1. “The Chase-Simon hypothesis that superior memory of the expert reflects the storage of more complex independent chunks in short-term memory has been seriously questioned, and most of the empirical evidence also suggests storage of interrelated information in long-term memory, as mentioned earlier.”
    2. Experts happen to develop excellent memory for the task of interest, although setting out with the goal just to develop the same memory ability (with no improvement in the actual task itself) one can develop memory on the level of a master quite quickly
    3. There is a school of thought that holds that in the above situation, those that trained specifically for recall are using only short-term memory, whereas experts go through the loop of accessing long term memory, but do that very quickly so it seems the same as short-term memory.
  17. Accounts focusing on the ability to plan and reason: (p.31)
    1. Chess masters can play “mental chess,” keeping track of the progress of a game simply by being told the move sequence.  “This research raises the possibility that acquisition of expert-level chess skill involves the development of skilled memory for chess positions.”
    2. “Charness (1981) found that the depth to which a possible move sequence for a chess position was explored was closely related to the level of chess skill, at least for chess players at or below the level of chess experts.” <but I think I remember reading that there wasn’t much difference between experts and masters – oh, immediately they say that is what de Groot found.>
    3. “One should also keep in mind that the task of searching for a move for a middle-game chess position is not designed to measure the capacity to make deep searches and hence may well reflect pragmatic criteria for sufficient depth of exploration to evaluate a prospective move.”
    4. “In the absence of a strict time constraint, there appears to be no clear limit to the depth to which a chess master can explore a position.” <due to the ability to play mental chess perfectly>
    5. The ability of chess masters to play mental chess “… was consistent with the characteristics of skilled-memory theory (Chase & Ericsson, 1982; Ericsson & Staszewski, 1989).”
    6. In medical diagnoses, doctors must integrate evidence, not all of which may be available at the same time
  18. “The most effective approach to organizing the results across different domains of expertise is to propose a small number of learning mechanisms that can account for the development of similar performance characteristics in different domains within the limits of human informational capabilities.  There is now overwhelming empirical support for the theory of acquisition of skill with mechanisms akin to those originally proposed by Chase and Simon (1973).”  (Chase and Simon themselves claimed this was just a preliminary attempt at a theory.)

Chapter 2: Experts in Chess: The Balance Between Knowledge and Search.  Charness

  1. “Because of its unique properties – particularly its rating scale [elo] and its method of recording games – chess offers cognitive psychologists an ideal task environment in which to study skilled performance.  It has been called a Drosophila, or fruit fly, for cognitive psychology (Charness, 1989; Simon & Chase, 1973).”
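Since the rating scale is singled out as part of what makes chess a good task environment, here is the standard Elo formulation for reference (the usual textbook form, not something taken from this chapter): expected score is logistic in the rating difference, and ratings move toward observed results.

```python
# Standard Elo rating formulas (general reference, not from the chapter).

def expected_score(r_player: float, r_opponent: float) -> float:
    """Probability-like expected score on a logistic curve; a 400-point
    rating gap corresponds to ~91% expectation for the stronger player."""
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400.0))

def update(r_player: float, r_opponent: float, score: float,
           k: float = 16.0) -> float:
    """score: 1 for a win, 0.5 for a draw, 0 for a loss."""
    return r_player + k * (score - expected_score(r_player, r_opponent))

assert expected_score(1500, 1500) == 0.5   # equal ratings -> 50%
```

The K-factor controls how fast ratings track results; 16 is a common choice for established players.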
  2. Here, what is considered is “… the opportunity for trading off knowledge and search to reach a single goal: skilled play.”
  3. Also considers how computer chess works
  4. Research on chess found that between experts and masters, search size was about the same, but recall/chunking efficiency (not # of chunks) was better in masters.  The conclusion therefore was “… that chess skill depended on a large knowledge base indexed through thousands of familiar chess patterns.  They theorized that recognition drives move generation in search, enabling the skilled player to examine promising paths, but leaving the less skilled to wander down less productive paths.” <Better heuristic accuracy>
  5. “Nonetheless, further research has revealed some apparent flaws in a strictly recognition-based theory.  Other studies have brought into question the notion that recall of briefly seen chess positions would depend on the type of short-term memory system simulated by Simon and Gilmartin (1973).”  Masters were still better at move selection for unnatural board configurations (even though their recall and that of experts was the same).  This, along with a few other results showed “… a simple recognition-association theory was inadequate to account for all the data.”
  6. “Both I (Charness, 1976) and Frey and Adesman (1976) demonstrated that when chess players recalled briefly seen positions, information was not retrieved from short term memory.  My study showed virtually no interference when players had to perform interpolated processing between exposure to the chess position and recall… Clearly a more sophisticated view of skilled memory, such as that proposed by Chase and Ericsson (1982), Ericsson (1985), and Ericsson and Staszewski (1989), is needed to account for recall effects.  These theorists have stressed the importance of domain-specific, easily activated, long-term-memory retrieval structures in recall performance.”
  7. In a longitudinal study, Charness retested a player after a 9 year delay, where the player started at average tournament level strength and ended up an international master.  “DH [the player] showed virtually no change in search (depth, extent), but did show major changes in recall, evaluation, and chunking… The major changes seemed to be pattern-related… the significant factor in skilled chess play at the top levels is what is searched, not how exhaustively or deeply the search is conducted.”
  8. Masters are less impacted by time pressure than lower-quality players
  9. There is also literature on abacus calculation (Hatano, Miyake, & Binks, 1977) <I know that those skilled with the abacus can also do “mental calculation” and can keep track of bead positions and changes fully in their head, just as chess masters can>
  10. A questionnaire (partially dealing with openings) is a better predictor of chess ability than the recall task
  11. <Lots of discussion of size of chess, number of openings, middle, and endgame knowledge, other aspects of metagame, learning from books as opposed to direct play>
  12. “Incidental” serial memory: good players can often recall large portions of a game right after the match, and masters can sometimes recall entire games from months or years earlier.
    1. Game trajectories can be encoded partially in terms of openings, closings, and other logical chunks
  13. “It is probably fair to characterize much of human learning as pattern learning.  An unanswered question is that of whether certain patterns are easier to learn (and model) than others.  Both psychometric investigations and neuro-psychological research provide evidence that all processing is not the same: Some people are better at spatial tasks; others at verbal tasks.”

Chapter 4: The General and Specific Nature of Medical Expertise: A Critical Look. Patel, Groen

  1. “Two fundamental empirical findings in research on expert-novice comparisons have been the phenomena of enhanced recall and forward reasoning.  The first refers to the fact that experts have superior memory skills in recognizing patterns in their domain of expertise.  This is extensively reviewed by Ericsson and Smith (chapter 1, this volume).  The second pertains to the finding that in solving routine problems in their domains, expert problem-solvers tend to work ‘forward’ from the given information to the unknown.  With the exception of Anzai’s study (chapter 3, this volume <on reasoning of physics problems, I didn’t have time to read>), this is not so extensively treated in this volume, but it has been discussed at length in a recent article by Hunt (1989)…”
  2. For details on the Hunt paper, check this out, <turns out forward and backward have different meanings than what I am used to, and the type of planning I am considering at the moment is actually the backward style, as defined here>
  3. “It might be noted that the distinction is frequently made, perhaps more generally, in terms of goal-based (backward) versus knowledge-based (forward) heuristic search (e.g. Hunt, 1989).”
  4. “The distinction between forward and backward reasoning is closely related to another distinction between strong problem-solving methods, which are highly constrained by the problem-solving environment, and weak methods, which are only minimally constrained.  As Hunt pointed out, the distinctions are logically independent.  Forward reasoning, however, is highly error-prone in the absence of adequate domain knowledge because there are no built-in checks in the legitimacy of the inferences.  Therefore, success in using forward reasoning is constrained by the environment because a great deal of relevant knowledge is necessary.  Hence, it is a strong method for all practical purposes.  In contrast, backward reasoning is slower and may make heavy demands on working memory (because one has to keep track of things as goals and hypotheses).  It is, therefore, most likely to be used when domain knowledge is inadequate, in which case there is a need for a method of reasoning that is minimally hampered by this lack of knowledge.  Hence, backward reasoning usually is a symptom of a weak method.”
  5. Here the focus isn’t on differences between experts and novices, but rather “… an emphasis on the factors determining accurate performance and the robustness of the recall and forward-reasoning phenomena under variations of these factors… these phenomena are not as closely related as was implied by what Ericsson and Smith (chapter 1, this volume) refer to as the original theory.  Specifically, there appears to be a ceiling effect associated with the recall of clinical cases.  Beyond that level, however, there continues to be a strong relation between diagnostic accuracy and the use of forward reasoning.”
  6. Development from novice to expert is a 3 stage process:
    1. “… development of adequate knowledge-structure representations.”
    2. learning what is relevant and irrelevant in a problem
    3. “… learning how to use these relevant representations in an efficient fashion”
  7. The study presented data in a very structured (non-naturalistic) manner
  8. In identifying forward reasoning, did some graph-representation <although exactly how isn’t totally clear>
  9. “Forward reasoning corresponds to an oriented path from a fact to a hypothesis.  Thus, forward-directed rules are identified whenever a physician attempts to generate a hypothesis from the findings in a case.  Backward-directed rules correspond to an oriented path from a hypothesis to a fact.”
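This forward/backward distinction can be sketched as a toy (my own illustration; the rules are invented, not the chapter’s production rules): forward chaining runs from findings toward whatever hypotheses the rules support, while backward chaining starts from a candidate hypothesis and checks whether its premises are grounded in the findings.

```python
# Toy forward vs. backward rule application with invented medical-ish
# rules (purely illustrative; not the chapter's actual production rules).

RULES = [  # (premises, conclusion)
    ({"fever", "stiff_neck"}, "meningitis_suspected"),
    ({"meningitis_suspected", "positive_culture"}, "bacterial_meningitis"),
]

def forward_chain(findings: set) -> set:
    """Fire rules from known findings until no new facts appear."""
    facts = set(findings)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal: str, findings: set) -> bool:
    """Start from a hypothesis and try to ground its premises."""
    if goal in findings:
        return True
    return any(all(backward_chain(p, findings) for p in premises)
               for premises, conclusion in RULES if conclusion == goal)

facts = forward_chain({"fever", "stiff_neck", "positive_culture"})
assert "bacterial_meningitis" in facts
```

Note how backward chaining must keep a stack of pending subgoals, matching the chapter’s point about its working-memory demands.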
  10. They then asked other experts for causal rules explaining each case, and transformed them into production rules.
  11. Experts and “subexperts” (the next level below, but above “intermediate” – in this case it meant asking doctors questions about a medical issue outside their specialization) had the same recall, although diagnostic accuracy decreased
  12. An earlier study, which seems to form the basis of this chapter (Patel & Groen 1986), found that all cases where pure forward reasoning was used corresponded to correct diagnoses, and that cases where pure forward reasoning was not used corresponded to incorrect diagnoses
  13. Those working outside of their domain of expertise used a combination of forward and backward reasoning
  14. <skipping a bit>
  15. In the problems studied here, recall (as was studied by De Groot, among others) was not an accurate metric of performance due to ceiling effects (experts and subexperts both had perfect recall, although their actual performance in diagnosis differed).  There is actually a nonmonotonic relationship between recall and accuracy in these studies (there were 5 levels of expertise)
  16. Previous studies assumed recall, diagnostic accuracy, and forward reasoning were all correlated.  “Thus, a theory that simply assumes that the development of expertise is related to the development of better representations cannot be true.”
  17. The findings argue against a couple of theories:
    1. Argues that medical diagnosis is not simply pattern recognition
    2. Argues against the idea that rules cannot be structured into some kind of hierarchy
    3. “Both of these theories posit a close relationship between chunk size in working memory and performance in problem-solving tasks.  Hence, they predict a monotonically increasing relationship between recall and diagnostic accuracy, which as we have seen, does not hold.”
  18. Results argue in favor of SOAR model (Laird, Rosenbloom, Newell 1985) (seems to be a pretty GOFAI model).  Has its own chunking system, and allows for forward and backward reasoning
  19. Argue for 3 kinds of expertise:
    1. Generic: development of adequate representations (for example experts and subexperts had the same recall)
    2. Specific: <not really clear on the point they are making here>
    3. Domain-independent: weak methods – used when there is not sufficient base information, and information must be searched for.  “In contrast, strong methods are more akin to decision making than to search and are highly dependent on an adequate knowledge base.”
  20. In the studies on physics problems, there is good evidence that problem solving is a mixture of forward and backward reasoning.  Forward reasoning is used on routine parts of problems, and backward reasoning on “nonroutine situations”
  21. That is backward reasoning can be used to “stitch together” a logical argument in situations that are difficult somehow (either because of lack of expertise, or because the problem is just hard)
  22. Argue for this form of generic expertise (being able to identify the relevant parts of a problem, discard the rest, and use backward reasoning where there is a lack of expertise in that exact domain).  This is how doctors making diagnoses outside of their field of expertise function
  23. “Intermediates conduct irrelevant searches, whereas experts do not.  Novices do not conduct irrelevant searches simply because they do not have a knowledge base to search.”

Chapter 10: Techniques for Representing Expert Knowledge

<Lots of the stuff here falls under categories of classical AI, linear algebra dimension reduction, hierarchical clustering, just making a concrete note about one item of interest>


  1. “Indeed, some of the continuing research themes have to do with how the organization of concepts for an expert differs from that for a novice…”
  2. The nature of the question requires questions about particular small testable aspects of the task of interest (such as recalling chess positions, as opposed to playing through games of chess)
  3. Major issue is how to elicit and then describe expertise
  4. Both direct (interviews, thinking out loud, observation of task performance, closed curves <see below>) and indirect methods (such as giving pairwise similarities and running through MDS, or hierarchical clustering)
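A small sketch of the indirect route (my own illustration; the concepts and similarity values are invented): pairwise similarity judgments can be turned into a hierarchy with simple single-linkage agglomerative clustering.

```python
# Pure-Python single-linkage clustering over hypothetical expert
# similarity ratings (the concepts and numbers are made up).

sim = {
    ("knight", "bishop"): 0.9,
    ("knight", "rook"): 0.5,
    ("bishop", "rook"): 0.55,
    ("knight", "pawn"): 0.1,
    ("bishop", "pawn"): 0.15,
    ("rook", "pawn"): 0.2,
}

def similarity(a, b):
    return sim.get((a, b)) or sim.get((b, a))

def single_link(clusters):
    """Repeatedly merge the two most similar clusters; return merge order."""
    merges = []
    while len(clusters) > 1:
        best = max(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: max(similarity(a, b)
                               for a in clusters[ij[0]]
                               for b in clusters[ij[1]]))
        i, j = best
        merged = clusters[i] | clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = single_link([{"knight"}, {"bishop"}, {"rook"}, {"pawn"}])
assert merges[0] == {"knight", "bishop"}   # most similar pair merges first
```

The merge order is the dendrogram: early merges are the tight conceptual groupings an expert would presumably draw closed curves around.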
<Image: Figure 10.4, two Go positions with the master’s closed curves drawn around related stones>
  6. “Reitman (1976) asked a master of the game of Go to draw closed curves around related stones involved in a position in the game.  Figure 10.4 illustrates several aspects of his responses.  Two positions are displayed, with the master’s encircling of related stones.  In addition, each stone bears a number that represents the ordinal position in which that stone was placed on the board in a recall task six months later <!>.  Note that the recall order matches the closed curves to a remarkable degree: Nearly always, all stones of an encircled chunk were recalled before moving on to another chunk.  This regularity of behavior supports claims for the validity of the information contained in the originally closed curves.”

Cognitive Science: Definition, Status, and Questions. Hunt. Annual Review of Psychology 1989.

  1. <Much of this is a description / justification of cognitive modelling, which I won’t be taking notes on>
  2. Newell & Simon’s Production System Programming
    1. Some representation of current state
    2. A policy
    3. Called a programming notation
    4. Problems can have different “grain sizes” which seems to mean level of abstraction (of atoms)
  3. ACT*
    1. Distinction between declarative and procedural information
    2. Learning is part of computation / program execution “… in which procedural and declarative information in long-term memory is used to construct new productions.”
    3. <next P>”If learning is production construction, then transfer of learning from one task to another will be determined by the number of productions the two tasks share, rather than the number of common actions required by the originally learned and transfer tasks.”
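The transfer claim above can be rendered as a one-function toy (my own sketch, not ACT* itself): treat each learned skill as a set of productions and predict transfer from the overlap between tasks.

```python
# Toy rendering of the ACT* transfer claim: predicted transfer between
# two tasks scales with the productions they share.  Task contents here
# are invented for illustration.

task_a = {"parse_equation", "isolate_variable", "check_units"}
task_b = {"parse_equation", "isolate_variable", "plot_result"}

def predicted_transfer(learned: set, new_task: set) -> float:
    """Fraction of the new task's productions already learned."""
    return len(learned & new_task) / len(new_task)

assert predicted_transfer(task_a, task_b) == 2 / 3
```

The point of the claim is that transfer depends on shared *productions* (condition-action rules), not on surface-level similarity of the tasks’ actions.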
  4. Studies of thought at the Representational Level: Reasoning
  5. “… psychometric studies have found that people who do well on pure deductive problems do not always show superior inductive reasoning, and vice versa (…).”
  6. “Newell & Simon’s (…) work on a domain independent deductive reasoning program, the General Problem Solver (GPS), is generally considered the foundation of cognitive science.  To use GPS a person provided the program with a set of known statements, rules for deriving new statements from old statements, and the description of a desired (goal) statement.  The program contained heuristics for problem solving, defined in its own internal language, that allowed it to use the domain-specific rules to develop a chain of inferences linking the known statements to the goal statement.  Three aspects of the GPS approach recur in varying forms in all modern studies of representational thought: searching a problem space, goal-directed problem solving, and reliance on weak (context-free) problem-solving methods.”
  7. “GPS’ heuristic procedure was means-end (or backward-driven) problem solving.  The program compared the goal to the current state, determined the differences between them, and then searched for an operator that would reduce the difference… A slightly different technique, which emphasizes the backward nature of the method somewhat more, is to find those states from which the goal state can be reached, and then establish the GPS problem of reaching one of those states… In forward driven problem solving, on the other hand, rules of inference are chosen by an examination of the current state of knowledge, without regard to the goal state.”
  8. “Forward-driven problem solving is riskier than goal-based problem solving, because operations are executed … without first checking to see if these operations are likely to be an advance toward the goal.  On the other hand, forward-driven reasoning is cheaper, because operator selection is made without contrasting the present state of knowledge to the goal state. Thus, forward-driven problem solving is preferable if the problem solver knows enough about the problem-solving domain to recognize when certain actions should be taken.  This implies that a rational problem solver would use forward-driven reason in those (limited) domains with which he or she was familiar.  This turns out to be true.”
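A minimal sketch of the means-end (backward-driven) style described here, on an invented toy domain: reach a goal integer from a start integer by always applying the operator that most reduces the current difference from the goal.

```python
# GPS-style means-end analysis on a toy integer domain (my own
# illustration, not GPS itself): greedily pick the operator that most
# shrinks the difference between current state and goal.

OPERATORS = [lambda x: x + 1, lambda x: x - 1, lambda x: x * 2]

def means_end(state: int, goal: int, max_steps: int = 50) -> list:
    path = [state]
    for _ in range(max_steps):
        if state == goal:
            return path
        # difference reduction: minimize |state - goal| after one step
        state = min((op(state) for op in OPERATORS),
                    key=lambda s: abs(s - goal))
        path.append(state)
    return path

assert means_end(3, 13) == [3, 6, 12, 13]
```

Every step consults the goal, which is exactly what forward-driven reasoning dispenses with: a forward system would pick operators from the current state alone, via domain-specific rules.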
  9. <Still not really grokking the difference between forward and backward here.  How do you make forward progress if not in comparison to a goal?>
  10. Weak problem-solving methods only place weak constraints on the problem.  GPS is weak because it is general, so the same heuristics can be applied to all sorts of problems if defined properly (such as minimizing the distance between state and goal).  “Strong” problem-solving methods, on the other hand, are strongly tied to a particular domain, such as minimizing king exposure in chess.
  11. Although there is no necessary connection between forward/backward and weak/strong, in general forward planning goes with strong methods, because one must know how to behave without focusing explicitly on the goal, which is mostly possible when there is strong knowledge of the domain
  12. “This has been shown in numerous studies contrasting ‘expert’ and ‘novice’ problem solving in domains ranging from physics (Larkin et al 1980; Larkin 1983) to economics (Voss et al 1983) and the law (O’ Neil 1987).  In all these fields experts appear to utilize forward-driven reasoning.  They recognize a situation and ‘immediately’ apply the appropriate rules for extracting information about that situation.”
  13. “In the terminology of cognitive science, the experts seemed to have memorized ‘schemas’ that function like ‘fill in the blanks’ forms for solving certain classes of problems.  Once the experts recognized the problem type they could apply a schema to guide further problem solving.  It is easy to see that schema-based problem solving is compatible with the production system architecture.  A schema can be thought of as a set of productions that are triggered when the schema’s preconditions are satisfied.  Just as in production execution, thinking is driven by pattern recognition. “
  14. <Oh, so it seems like forward reasoning isn’t really like forward search with heuristics (as the planning community would call it); in this terminology that is actually backward reasoning.>
  15. So it’s a bit of a bummer, because most expert behavior seems to be based on domain-specific rules, and is therefore hard to study and hard to draw meaningful conclusions from.
  16. Applying a schema is deductive (top down), whereas choosing one is inductive (bottom up).  Schema creation is also inductive.
  17. <Inductive reasoning and classification based on prototypes, skimming>
  18. Experts and novices may make different classifications (the example of physics where experts categorize based on deeper aspects of the problem, whereas novices do so based on superficial aspects)
  19. <For some reason I’m missing a bit of the article, but I think I read the relevant part.>

Improving UCT Planning via Approximate Homomorphisms. Jiang, Singh, Lewis. AAMAS 2014

  1. Improves performance of UCT by:
    1. finding local abstractions in the DAG built by UCT
    2. using approximate homomorphisms (as opposed to true homomorphisms which preserve optimal policies, but are rare and computationally difficult to develop)
  2. Derive a lower bound on the performance of the abstraction method when used with UCT
    1. <If the bounds are specific to UCT it might be moot anyway because of the horrible worst-case performance of the algorithm>
  3. Also, homomorphisms are global, but the paper is concerned with UCT, which is generally applied when global methods are too expensive
  4. “One interesting overall insight obtained herein is that the more computationally limited UCT is, the coarser are the best-performing abstractions.”
  5. “A homomorphism is a perfect abstraction in the sense that the induced MDP is equivalent to the original MDP for planning purposes.”
  6. Here, they simply group states together into clusters and use each cluster as an abstracted state (visit counts and average returns apply to the set as a whole)
  7. <OK, think I get the idea, not so relevant for now but may come back to it>
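My rough reading of the mechanism, as a sketch (the class, names, and the rounding-based abstraction are invented, not the paper’s): map concrete states into clusters and pool the UCT visit counts and returns per cluster, so samples from one state inform similar states.

```python
from collections import defaultdict

# Sketch of state aggregation for UCT-style statistics: all states in a
# cluster share visit counts and summed returns.  The abstraction used
# here (rounding) is purely illustrative.

class AbstractStats:
    def __init__(self, abstraction):
        self.abstraction = abstraction      # state -> cluster id
        self.n = defaultdict(int)           # visits per cluster
        self.total = defaultdict(float)     # summed returns per cluster

    def update(self, state, ret):
        c = self.abstraction(state)
        self.n[c] += 1
        self.total[c] += ret

    def value(self, state):
        c = self.abstraction(state)
        return self.total[c] / self.n[c] if self.n[c] else 0.0

# Two nearby states fall in the same cluster and pool their samples:
stats = AbstractStats(abstraction=lambda s: round(s, 1))
stats.update(0.51, ret=1.0)
stats.update(0.49, ret=0.0)
assert stats.value(0.52) == 0.5
```

The coarseness knob is the abstraction function itself, matching the paper’s observation that more limited computation favors coarser clusters.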

The Function Space of an Activity. Veeraraghavan, Chellappa, Roy-Chowdhury. CVPR 2006

  1. <Was cited as a paper looking for the underlying dimension of activity, although I didn’t get that>
  2. Based on time warping
  3. <Basically all the abstract:> “Different instances of the same activity may consist of varying relative speeds at which the various actions are executed, in addition to other intra- and inter-person variabilities. Most existing algorithms for activity recognition are not very robust to intra- and inter-personal changes of the same activity, and are extremely sensitive to warping of the temporal axis due to variations in speed profile. In this paper, we provide a systematic approach to learn the nature of such time warps while simultaneously allowing for the variations in descriptors for actions. For each activity we learn an ‘average’ sequence that we denote as the nominal activity trajectory. We also learn a function space of time warpings for each activity separately. The model can be used to learn individual-specific warping patterns so that it may also be used for activity based person identification.”
  4. Classically, approaches attempted to correct for viewpoint, or in skeletal structure, but not so much time
  5. Independent of particular features <but depends on them, naturally>
  6. If doing something like averaging, warping is necessary because the arm, for example can only be in one place and not multiple locations; warping allows for proper interpolation
  7. “The model is composed of a nominal activity trajectory and a function space capturing the permissible activity-specific warping transformations.”
  8. Time warping based on more formal methods than heuristics
  9. “Activity recognition is performed by minimizing the warping error between the nominal activity trajectory and the test sequence.”
  10. Most time warping algorithms are based on template matching as opposed to “a model where observed trajectories are viewed as a realization of a stochastic process.”
  11. “Template based recognition algorithms are very effective when the test sequence is one among those in the gallery. But they usually have very poor generalization power. Our algorithm has sufficient generalization power since we explicitly make the function space of an activity convex.”
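For reference, the classic dynamic-time-warping distance (the standard DP formulation, not this paper’s function-space model) illustrates why alignment under time warps matters when comparing activity trajectories.

```python
# Standard dynamic time warping between two 1-D sequences: find the
# minimal-cost monotone alignment, so speed changes don't inflate the
# distance between otherwise identical motions.

def dtw(a, b):
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]

slow = [0, 0, 1, 1, 2, 2, 3, 3]   # same motion at half speed
fast = [0, 1, 2, 3]
assert dtw(slow, fast) == 0.0     # warping absorbs the speed change
```

A plain Euclidean comparison would need equal-length sequences and would penalize the speed difference heavily; DTW’s alignment is what the paper replaces with a learned, convex function space of warps.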
