Atomic Components of Thought. Anderson. Book, 1998.


The ACT-R (theory of cognition) book. The chapters are written by different sets of authors.

Chapter 1, Introduction: John R. Anderson, Christian Lebiere

  1. Research in the 1950s and 60s took a divide-and-conquer approach to cognition, focusing on very small, tightly controlled questions and studies, each with its own paradigms and logic
  2. At the 1972 Carnegie Symposium, Allen Newell criticized this approach in a paper called “You can’t play 20 questions with nature and win.” He claimed that each little branch of research in cognition was producing tons of papers without pushing forward the understanding of cognition in a meaningful way: “Thus, far from providing the rungs of a ladder by which psychology gradually climbs to clarity, this form of conceptual structure leads rather to an ever increasing pile of issues, which we weary of or become diverted from, but never really settle.”
    1. He was interested in unified theories
    2. He viewed cognition as an ongoing process which is subject to many constraints, and the goal was to figure out how they all worked together
  3. He also proposed a production system theory of cognition which was to be general
    1. Each production consists of a condition and an action
    2. The system also has data structures
  4. He also explored other architectures, one called Soar (itself a production system)
  5. Although the production paradigm needed a long incubation period, it can now explain phenomena in a way that is competitive with narrow models (presumably the upside here is that it is general)
  6. Active production-system theories are ACT-R, 3CAPS, EPIC, and Soar, as well as something similar called the construction-integration theory. “The combined accomplishments of these theories is nothing less than staggering.”
  7. Newell said that these advances “set the stage for a new level of theory.”
  8. ACT-R is easy to learn, cognitive models are natural to develop in it, and the models tend to be accurate
  9. Brief sketch: ACT-R contains
    1. a theory of the nature of knowledge
    2. a theory of how this knowledge is deployed
    3. a theory of how this knowledge is acquired
  10. It divides memory into declarative and procedural
    1. Declarative knowledge is broken into chunks
  11. It’s a form of state machine, with a pretty fine granularity in the type of instructions: “… only a very limited amount of processing can be accomplished in these steps.  The decision about what the steps are carry substantial psychological significance.  At each step, we can potentially record some action such as an utterance, a keystroke, or an eye movement.”
  12. They point out that since a larger task is composed of many small pieces, failures in those pieces can explain failures of the entire operation in different ways.
  13. Each of those pieces is intended to be a basic step in cognition
  14. It’s a stack-based model, where the end goal is the last item to be popped, and the top of the stack is the current piece of work being undertaken
    1. Can also add new work items to the stack
  15. There are 3 memories
    1. A goal stack – encodes hierarchy of intentions
    2. Procedural memory – production rules
    3. Declarative memory – containing chunks [of knowledge?]
  16. All 3 are organized around the current goal, which serves as the focus of attention
  17. There may be a conflict resolution step when several applicable productions (which may all address the current goal) are available and one needs to be chosen
  18. A production can
    1. Change the outside world
    2. Change the current goal (by modifying the stack)
    3. Retrieve items from declarative memory
  19. “Chunks enter declarative memory either as popped goals, reflecting the solutions to past problems, or as perceptions from the environment.”
  20. “Productions are created from declarative chunks through a process called production compilation.”
  21. Although it’s largely symbolic, there are bits that are intended to result in “neuron-like activation processes…” that impact the speed and success rate of operations
    1. These values can be tuned based on environmental factors (learning, practice)
  22. There is a “no-magic” doctrine of ACT-R, which has 6 components
    1. Experimentally Grounded: tries to make the system amenable to grounding (embodiment) in the same way people are, based on their perceptual interaction with the environment. There are even extensions that add interfaces for vision, audition, speech, etc.
    2. Detailed and Precise Accounting of Data: because ACT-R’s success rests on empirical modelling of phenomena, and not on theory alone, it can’t gloss over details (even latency must be specified)
    3. Learnable Through Experience: although it is possible (and reasonable) to pre-program information into ACT-R, another goal is to have everything be learnable by experience
    4. Capable of Dealing with Complex Cognitive Phenomena: has a history of dealing with this, such as making models for intelligent tutors.  “This ability to model complex cognition is one of the great strengths of production systems over modeling formalisms.”  It has been used for learning the process for solving physics problems and how to design and interpret experiments to test a theory
    5. Principled Parameters: ACT-R has a large number of parameters at the low-level.  In this book an effort is made not to let parameter values jump around for each experiment.
    6. Neurally Plausible: although making a true brain model is too expensive, one of the intentions for ACT-R is that “correspondences could be made between ACT and the brain.” A future model, ACT-RN, is intended to make a neural mapping clearer, as the production system isn’t so amenable to that as-is
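To make the control structure in points 11 to 20 concrete, here is a toy interpreter: a goal stack whose top is the current goal, productions as condition/retrieval/action triples, and declarative memory as a list of chunks. It is a minimal Python sketch under my own assumptions, not ACT-R's implementation: the `Production` class, the `run` loop, the first-match conflict resolution, and the counting task are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Chunk = dict  # toy stand-in: a chunk is a mapping of slot names to values

@dataclass
class Production:
    name: str
    condition: Callable[[Chunk], bool]                              # tests on the current goal
    retrieval: Optional[Callable[[Chunk, list], Optional[Chunk]]]   # pattern match against declarative memory
    action: Callable[[Chunk, Optional[Chunk], list, list], None]    # transforms the goal stack / world

def run(goal_stack, productions, memory, trace, max_cycles=100):
    """Fire one production per cycle until the goal stack is empty."""
    cycles = 0
    while goal_stack and cycles < max_cycles:
        cycles += 1
        goal = goal_stack[-1]  # top of the stack is the current goal
        for p in productions:  # toy conflict resolution: first applicable production wins
            if not p.condition(goal):
                continue
            retrieved = p.retrieval(goal, memory) if p.retrieval else None
            if p.retrieval and retrieved is None:
                continue  # retrieval failed, so try the next production
            p.action(goal, retrieved, goal_stack, trace)
            break
        else:
            goal_stack.pop()  # nothing applied: pop the goal with failure
            trace.append("fail")
    return cycles

def retrieve_next(goal, memory):
    """Find the count-fact whose 'first' slot equals the goal's 'current' slot."""
    for c in memory:
        if c["isa"] == "count-fact" and c["first"] == goal["current"]:
            return c
    return None

def say_next(goal, fact, stack, trace):
    goal["current"] = fact["second"]  # goal elaboration: modify the goal, keep the stack
    trace.append(fact["second"])      # stand-in for an external action (an utterance)

def finish(goal, fact, stack, trace):
    stack.pop()                       # goal achieved: pop it
    trace.append("done")

memory = [
    {"isa": "count-fact", "first": 2, "second": 3},
    {"isa": "count-fact", "first": 3, "second": 4},
]
productions = [
    Production("count-on", lambda g: g["isa"] == "count" and g["current"] != g["end"],
               retrieve_next, say_next),
    Production("stop", lambda g: g["isa"] == "count" and g["current"] == g["end"],
               None, finish),
]

stack = [{"isa": "count", "current": 2, "end": 4}]
trace = []
run(stack, productions, memory, trace)
print(trace)  # [3, 4, 'done']
```

Here `count-on` behaves like a goal-elaboration production (it changes the goal but not the stack) and `stop` pops the goal; if the needed count-fact is missing, no production applies and the goal is popped with failure.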

Chapter 2, Knowledge Representation: John R. Anderson, Christian Lebiere

  1. Assumptions in ACT-R can be described in a 2×2×2 table: <performance, learning> × <symbolic, subsymbolic> × <declarative, procedural>.
  2. This chapter deals mostly with “… the procedural-declarative distinction, which is the most fundamental assumption in the ACT-R theory…”
  3. The declarative/procedural distinction was not taken seriously in either AI or cognitive psychology at the time ACT theory was created in the 70s
  4. Discussion of chunks: they aren’t general. For example, there is no chunk covering addition in general; rather, there is a separate fact/chunk for each pair of addends and their sum (e.g., 3+4=7; the same goes for words)
  5. They have an isa property giving the chunk type; a chunk seems like a struct with an arbitrary set of slots, where each slot has a name (such as addend1) and a value (3)
  6. Chunks are either an encoding of a goal or an encoding of objects in the environment
  7. They claim that an “addition fact” is a goal, and the addends are arguments
  8. Another example is a (goal) comprehension chunk, “Proposition 1”:
    1. isa COMPREHENSION-GOAL
    2. relation Give
    3. agent Mary
    4. object Fido
    5. recipient John
  9. “The claim about the origins of declarative knowledge has strong connections to well-worn philosophical issues about the origins of knowledge (…).  The chunks that come from the environment represent the empiricist claim that that knowledge originates from experience.  The chunks that are popped goals represent the rationalist claim that people construct their own knowledge.  As often is the case, a good scientific theory proclaims both sides of a philosophical controversy to be right.”
  10. Now to productions.  The basic structure of a production is:
    1. goal condition + chunk retrieval -> goal transformations
    2. “This ‘goal condition’ involves some tests on the goal state.  If these tests succeed and the production is chosen, then the retrieval is performed.  The retrieval involves the matching of one or more chunk patterns to declarative memory in order to retrieve information.  On the basis of the goal and the retrieved information, ACT-R then makes some transformations to the goal state.”
  11. Production firing times are usually set anywhere from 50 ms to 1 s
  12. “Lest this seem an overly serial image of cognition, it should be noted that at the subsymbolic level, millions of parallel computations are taking place to support this level of cognition.”
  13. There are four major claims associated with the use of production rules:
    1. Modularity: “… production rules are the units in which procedural knowledge is acquired and deployed.”
  14. Previous versions allowed production rules to be more complex in how much data each could operate on and how (more complex data, such as lists, were also allowed). Newer versions introduce more restrictions so that each production rule is more plausible as a single cognitive atom. These changes also led to more accurate modelling of cognitive phenomena by the system
  15.  “… each cycle has the structure of:  Decide on a production (on basis of a global context) , retrieve (on specifications from the production), and modify the goal structure (using information retrieved).”
  16. Although it is a serial system, multitasking can be simulated in basically the same manner that task switching happens in computers
    1. They also point out again that subtask processing can be parallel
  17. They admit to being slightly unhappy that goals are always retained perfectly and in order (when do we ever do anything perfectly?)
  18. There are basically 6 (3×2) different production types (although technically there can be more)
    1. No Change (no stack change or goal modification): In order for this to have a purpose, the production would involve performing some action on the external world, so some state change occurred, although that state is external to the agent
    2. Goal Elaboration (no stack change, goal modification): Maintains the stack as well as the basic goal, but refines some part of the goal (for example, reading a value and putting it into a slot of the goal struct)
    3. Side Effect (push on stack, no goal modification): Basically push a new goal onto the stack that will accomplish #1 in this list, or to cause some other cognitive change in the agent
    4. Return Result (push on stack, goal modification): Pushes a new goal, but also modifies the previous goal so that the same production rule doesn’t repeat (for example, adding individual digits in a long addition problem and carrying results)
    5. Pop Unchanged (pop stack, no goal modification): Pop when a goal is met
    6. Pop Changed (pop stack, goal modification): Modify goal before popping
  19. Tower of Hanoi example: “A number of other researchers (e.g. Egan & Greeno, 1974; Karat 1982) have found that subjects will use hierarchical goal structures to solve the Tower of Hanoi problem.”
  20. Trace of execution shows expected increase and decrease of size of stack as subgoals are recognized that need to be done, and are later done.  “To implement this strategy, one needs to be able to stack a good number of goals.  Specifically, in the preceding case, one needs to hold a plan like ‘move 1 to B in order to move 2 to C in order to move 3 to B in order to move 4 to C.'”
  21. Actual results and ACT-R results in terms of latency of each move are surprisingly similar
  22. <The example is quite in depth and spans a number of pages>
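Points 4 to 8 describe chunks as typed bundles of named slots, with no general addition chunk but one chunk per fact. A minimal Python sketch, assuming a frozen dataclass is a fair stand-in for ACT-R's own chunk notation; `Chunk`, `chunk`, and `matches` are hypothetical names, and `matches` is exact-match only (partial matching comes up in Chapter 3).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    isa: str            # chunk type, e.g. "ADDITION-FACT"
    slots: tuple = ()   # sorted (slot_name, value) pairs; a tuple keeps the chunk hashable

    def get(self, slot):
        return dict(self.slots).get(slot)

def chunk(isa, **slots):
    """Build a chunk; sorting the slots makes equality independent of argument order."""
    return Chunk(isa, tuple(sorted(slots.items())))

def matches(c, isa, **pattern):
    """True if chunk c has the requested type and every requested slot value."""
    return c.isa == isa and all(c.get(k) == v for k, v in pattern.items())

# One chunk per addition fact, not one general "addition" rule.
memory = [chunk("ADDITION-FACT", addend1=a, addend2=b, sum=a + b)
          for a in range(1, 6) for b in range(1, 6)]

proposition1 = chunk("COMPREHENSION-GOAL", relation="Give", agent="Mary",
                     object="Fido", recipient="John")

hits = [c for c in memory if matches(c, "ADDITION-FACT", addend1=3, addend2=4)]
print(hits[0].get("sum"))  # 7
```

Retrieving "3 + 4" is then a pattern match over separate fact chunks; an addend pair with no stored chunk simply yields no hit.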

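The strategy in points 19 to 21 depends on holding a stack of nested goals. The sketch below is not the book's ACT-R model; it is a plain explicit-stack decomposition of the Tower of Hanoi in Python (the goal tuples and `solve_with_goal_stack` are hypothetical) that records how deep the goal stack gets. For four disks, when the stack first reaches its peak, its pending move goals are exactly the quoted plan: move 1 to B in order to move 2 to C in order to move 3 to B in order to move 4 to C.

```python
def solve_with_goal_stack(n, src="A", dst="C", aux="B"):
    """Tower of Hanoi with an explicit goal stack; returns (moves, max stack depth)."""
    stack = [("tower", n, src, dst, aux)]  # goal: move disks 1..n from src to dst
    moves, max_depth = [], 1
    while stack:
        max_depth = max(max_depth, len(stack))
        goal = stack.pop()
        if goal[0] == "move":               # primitive goal: move a single disk
            _, disk, frm, to = goal
            moves.append((disk, frm, to))
            continue
        _, k, frm, to, via = goal
        if k == 0:
            continue
        # Subgoals are pushed in reverse so the first one ends up on top:
        # clear disks 1..k-1 onto `via`, move disk k, then rebuild on `to`.
        stack.append(("tower", k - 1, via, to, frm))
        stack.append(("move", k, frm, to))
        stack.append(("tower", k - 1, frm, via, to))
    return moves, max_depth

moves, depth = solve_with_goal_stack(4)
print(moves[0], len(moves), depth)  # (1, 'A', 'B') 15 9
```

In this encoding the peak depth is 2n + 1, so it grows with the number of disks, which is why the strategy needs "a good number of goals" stacked.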
Chapter 3, Performance: John R. Anderson, Christian Lebiere, Marsha Lovett

  1. Can measure performance based on quality of response, as well as how long it takes to produce that response
  2. <From my perspective, the argument about latency is not so compelling, so I won’t be taking many notes on it. There are things that are trivial for people that we have no idea how to get a computer to do, even with unconstrained modelling and no timing constraints. Other things a computer can do, but it takes much longer.>
  3. Work occurs at 3 levels in a hierarchy:
    1. Goal and production rules
    2. Conflict resolution
    3. Chunk retrieval
  4. The only test for selecting from production rules is to find those that satisfy the current goal
    1. If there are conflicts, they are sorted by their “expected gain,” which is (probability of that production achieving the goal × goal value) − cost, i.e., PG − C
    2. This metric helps preserve the speed-accuracy tradeoff, where more time will be spent on more reliable methods that are costly when the benefit of solving the task (or penalty of failure) is high
    3. Once that production is popped from the priority queue, it may still fail because a retrieval of a chunk that it requires fails (this is part of the model). <Can it fail for other reasons?> If such a failure occurs, the system moves on to the next production
    4. “If no production is found with positive expected utility, the goal is popped with failure.”
  5. Parallelism occurs when finding matching productions, as well as in ordering them. Executing them in order of score is done serially, though
  6. Similarly, access of chunks matching productions is done in parallel.  “… this is the kind of parallelism that we think can be supported by the nervous system.”
  7. If failure is returned by the production, the goal can be modified so that subgoal isn’t called again (as it may loop failure otherwise)
  8. They recognize that the three variables (cost, probability of success, and reward on success) are parameters that must be selected somehow, which is an acknowledged issue
  9. The probability of success of a production, P, is the product of two probabilities: that the production itself succeeds (q) and that, once it has, the goal will be accomplished (r), so P = qr
  10. Similarly, the cost C is the sum of the estimated cost of the production itself (a) and the cost of future productions until the goal is achieved (b), so C = a + b
  11. The four values in points 9 and 10 (q, r, a, b) are supposed to be learned from experience
  12. “The setting of the initial value of G [the value of achieving the goal] for a task goal is a topic about which ACT-R has little to say.  When an experimenter tells a subject something is worth a dollar or a penny, how does this convert to an inter value of G (which is thought to be measured in time units)?.. In practice, the parameter G may need to be estimated in fitting the data.”
  13. Once the task goal is set, however, the other values of subtasks can be set based on discounting of the overall goal value based on likelihood of success and cost of the subtask
  14. The particular decomposition used makes the system sensitive to a number of different values relative to goal resolution
  15. Soft-max noise is added to the ranking of competing productions so that the system isn’t deterministic
    1. This leads to probability matching behavior in ACT-R, which is what people often do
  16. In experiments where external reward values change, there are characteristic changes in behavior that are also reflected in ACT-R’s behavior when G is changed (G is a somewhat analogous value, although not entirely, per point 12 above)
  17. Chunks are activated based on a base activation amount, and then a weighted sum of “slot values” (features?) of the chunk
    1. The base activation depends on how frequently and how recently the chunk has been used
    2. The other components of activation “reflect the amount of attention given to elements of the goal.”
  18. As it is set up, all the goal slots together share 1 unit of weight for activation of a chunk. Therefore, as the number of filled slots goes up, the weight per slot goes down (although it seems the weighting does not have to be uniform)
    1. “Thus, there are important consequences to what goal slots are filled-whatever slots are filled become sources of activation, and if extra slots are filled, the take source activation away from others.”
  19. The other term in the weighted sum (we just discussed the weights; now the values they multiply) is an estimate of how often a particular chunk was needed when each slot value was an element of the goal
  20. The initial base activation of a chunk starts according to some normal distribution, with a value that decays logarithmically over time if not used
  21. There is also additional noise in the system (there are actually 2 sources of noise, to make the model more closely reflect empirical results; basically they operate at different temporal scales).
  22. When chunks fall below a certain activation threshold they can no longer be retrieved. Because of noise, chunks close to this threshold may be accessible on one query and not on the next
    1. The probability of retrieval is therefore a sigmoidal function of activation
    2. Specifically, it is a logistic function of how far the activation is above the threshold
  23. More results that match nicely: in this case it’s a question of recall in a task where items are learned and then not reviewed over a period of days until testing
  24. Errors in ACT-R can arise from omission (a chunk is needed but has dropped below the activation threshold) as well as from commission, where a chunk is retrieved that only partially matches a production
  25. In many cases, there are no chunks that match exactly, and the system has to go with the best matching chunk available
    1. “ACT-R subtracts from the activation level of an item an amount that reflects its degree of mismatch.  ACT-R restricts the chunks it can partially match to those of the same type, as specified in the production condition.”
  26. Chunk retrieval is production specific, as chunks are selected (partially) based on the degree that they match the current production
  27. Again, chunks <I suppose those found above threshold?> are selected based on soft-max
    1. This system is used to model addition of small numbers, where the assumption is that the individual only knows facts about very small numbers (so the exact chunk being looked for, such as 4+5, may not exist). Predictions of the model are characteristically similar to empirical results from experiments with young children (only 2 parameters had to be estimated)
  28. Retrieval time decreases exponentially as the match score and strength increase (thresholding prevents extremely long retrieval times, since those items are filtered out)
  29. They compare results on something called “The Fan Experiment”, where latency increases with the number of facts learned about a concept. They don’t show any results, just the setup? Skipping.
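The conflict-resolution quantities in points 4 and 9 to 15 fit in a few lines. A minimal Python sketch, assuming the expected-gain form PG − C with P = qr and C = a + b, soft-max (Boltzmann) selection with temperature `t`, and a subgoal value of rG − b as one concrete reading of the discounting in point 13; the class and function names and all numbers are hypothetical. Filtering out productions with non-positive gain before the soft-max is a simplification of the quoted rule in point 4.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class ProductionStats:
    name: str
    q: float  # probability the production itself succeeds
    r: float  # probability the goal is then achieved
    a: float  # cost of the production itself (seconds)
    b: float  # estimated cost of later productions until the goal is achieved

    @property
    def P(self):
        return self.q * self.r

    @property
    def C(self):
        return self.a + self.b

def expected_gain(p, G):
    """Expected gain E = P*G - C, with G measured in time units."""
    return p.P * G - p.C

def softmax_choice_probs(productions, G, t):
    """Soft-max (Boltzmann) choice probabilities over expected gains, temperature t."""
    gains = [expected_gain(p, G) for p in productions]
    top = max(gains)  # subtracting the max avoids overflow in exp
    weights = [math.exp((g - top) / t) for g in gains]
    total = sum(weights)
    return {p.name: w / total for p, w in zip(productions, weights)}

def choose(productions, G, t, rng=random):
    """Sample one production name, or return None if no production has positive gain."""
    viable = [p for p in productions if expected_gain(p, G) > 0]
    if not viable:
        return None  # the goal would be popped with failure
    probs = softmax_choice_probs(viable, G, t)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]

def subgoal_value(p, G):
    """Assumed discounting of a parent goal's value for a subgoal: G' = r*G - b."""
    return p.r * G - p.b

retrieve = ProductionStats("retrieve", q=0.9, r=0.8, a=0.05, b=1.0)  # fast, less reliable
compute = ProductionStats("compute", q=1.0, r=0.99, a=0.05, b=5.0)   # slow, reliable

print(softmax_choice_probs([retrieve, compute], G=20, t=1.0))  # compute is favoured
print(choose([retrieve, compute], G=5, t=1.0))                 # 'retrieve' (compute not worth it)
```

With a high goal value the slow but reliable strategy is preferred; with a low goal value it no longer pays for its cost. That is the speed-accuracy tradeoff in point 4.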

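The activation and retrieval rules in points 17 to 28 can be sketched as follows. Assumptions, all mine rather than the book's exact parameterization: activation is base level plus source activation W/n per filled goal slot times an association strength; partial matching subtracts a fixed `penalty` per mismatching slot; retrieval probability is the logistic function 1/(1 + e^(−(A − τ)/s)); latency is simplified to F·e^(−A), ignoring production strength; and noise is drawn from a logistic distribution. Function names and numbers are hypothetical.

```python
import math
import random

def activation(base_level, goal_slots, strengths, W=1.0):
    """A_i = B_i + sum_j W_j * S_ji, with source activation W split evenly over filled goal slots."""
    if not goal_slots:
        return base_level
    w_j = W / len(goal_slots)  # more filled slots means less activation from each
    return base_level + sum(w_j * strengths.get(j, 0.0) for j in goal_slots)

def match_score(act, mismatches, penalty=1.5):
    """Partial matching: lower the score by an assumed fixed penalty per mismatching slot."""
    return act - penalty * mismatches

def retrieval_probability(a, tau, s):
    """Logistic function of the score's distance above the retrieval threshold tau."""
    return 1.0 / (1.0 + math.exp(-(a - tau) / s))

def retrieval_latency(a, F=1.0):
    """Simplified latency: time falls off exponentially as the score rises."""
    return F * math.exp(-a)

def logistic_noise(s, rng=random):
    """Sample logistic noise with scale s by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return s * math.log(u / (1.0 - u))

strengths = {"three": 2.0, "four": 2.0}  # hypothetical S_ji values for the chunk 3+4=7
a2 = activation(0.5, ["three", "four"], strengths)          # 2.5
a3 = activation(0.5, ["three", "four", "plus"], strengths)  # lower: the extra slot dilutes W
print(a2, round(a3, 3))
print(retrieval_probability(a2, tau=0.0, s=0.4), retrieval_latency(a2))
```

Sampling `logistic_noise` many times and counting how often the noisy score clears τ reproduces `retrieval_probability`, which is how threshold plus noise yields the sigmoidal retrieval curve in point 22.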
Chapter 4, Learning: John R. Anderson, Christian Lebiere

  1. ACT-R can learn new information by creating new chunks from the environment (based on simulated perception), or “… in the action side of production rules.”
  2. ACT-R doesn’t create duplicate chunks, but it can merge in new information; every time it tries to make a duplicate chunk, it simply strengthens the existing one
  3. The only other system that seriously considers this kind of learning is Soar
  4. In ACT-R knowledge can transition from procedural to declarative
  5. <Starting to lose it; I need more energy to keep track of what’s going on than I have available to invest>
  6. ACT-R learns base-level activation of chunks, as well as strength of associations between chunks.
    1. “The activation of a chunk is taken to reflect the log posterior odds that the chunk will be needed (needed means ‘match to a production in the next cycle’), given the current context (where the current context is defined by the elements of the goal).”
  7. “Odds of recall and latency should be [are] power functions of delay [since use], which is the common empirical result known as the Power Law of Forgetting …  the Base-Level Learning Equation also predicts the Power Law of Learning…”
  8. As facts are repeated they can be retrieved more reliably
  9. Results presented on a letter-number addition task, latency predicted versus actual <is sort of shockingly accurate, but again I think other things may be more important to look at than latency (such as % correct)?>
  10. Can swap computation of an answer for direct retrieval when a chunk is used often enough (e.g., counting to add vs. remembering the addition table)
  11. The probability estimates of production rule success and cost are just based on sample means
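Points 2 and 6 to 8 (a re-created chunk strengthens the existing one; base-level activation reflects the history of use) can be sketched with the Base-Level Learning Equation, B = ln(Σ_j t_j^(−d)), where t_j is the time since the j-th use. A minimal Python sketch, assuming the commonly used decay d = 0.5 and time in seconds; the `DeclarativeMemory` class is hypothetical. With a single use, the odds e^B equal t^(−d), a power function of delay, which is the Power Law of Forgetting in point 7.

```python
import math

class DeclarativeMemory:
    """Toy store mapping each (hashable) chunk to the times it was created or re-encountered."""

    def __init__(self, decay=0.5):
        self.decay = decay
        self.uses = {}

    def add(self, chunk, now):
        # Adding a chunk that already exists does not duplicate it;
        # it records another use, which strengthens the existing chunk.
        self.uses.setdefault(chunk, []).append(now)

    def base_level(self, chunk, now):
        """B = ln(sum_j (now - t_j) ** -d) over past uses strictly before `now`."""
        ages = [now - t for t in self.uses.get(chunk, []) if now > t]
        if not ages:
            return float("-inf")  # never used yet: no activation
        return math.log(sum(age ** -self.decay for age in ages))

dm = DeclarativeMemory()
fact = ("ADDITION-FACT", 3, 4, 7)
dm.add(fact, now=0.0)
once = dm.base_level(fact, now=100.0)
dm.add(fact, now=50.0)      # the same chunk again: strengthened, not duplicated
twice = dm.base_level(fact, now=100.0)
print(len(dm.uses), once < twice)  # 1 True
```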

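Point 11 says the success-probability and cost estimates for production rules are just sample means. A minimal Python sketch, assuming each production keeps counts of successes and failures plus a list of observed costs; the prior pseudo-counts (one prior success, and one prior cost observation of 0.05 s) are my own assumption, added only so the estimates are defined before any experience. `ProductionHistory` is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ProductionHistory:
    """Running statistics for one production rule (hypothetical helper)."""
    prior_successes: float = 1.0   # assumed prior pseudo-count
    prior_failures: float = 0.0    # assumed prior pseudo-count
    prior_cost: float = 0.05       # assumed prior cost in seconds, weighted as one observation
    successes: int = 0
    failures: int = 0
    costs: list = field(default_factory=list)

    def record(self, succeeded, cost):
        """Log one use of the production: whether it led to success and how long it took."""
        if succeeded:
            self.successes += 1
        else:
            self.failures += 1
        self.costs.append(cost)

    def p_estimate(self):
        s = self.prior_successes + self.successes
        f = self.prior_failures + self.failures
        return s / (s + f)

    def cost_estimate(self):
        return (self.prior_cost + sum(self.costs)) / (1 + len(self.costs))

h = ProductionHistory()
h.record(True, 0.5)
h.record(False, 1.5)
h.record(True, 1.0)
print(h.p_estimate(), h.cost_estimate())
```

As experience accumulates, the assumed priors are swamped and the estimates converge to the plain sample means described in the note.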
<That seems to cover the fundamental stuff.  There are many other chapters, and of particular interest is the chapter on choice, reading that only very quickly>