
One-shot learning by inverting a compositional causal process. Lake, Salakhutdinov, Tenenbaum. NIPS 2013

  1. Deals with one-shot learning
  2. “…a Hierarchical Bayesian model based on compositionality and causality that can learn a wide range of natural (although simple) visual concepts, generalizing in human-like ways from just one image.”
  3. In 1-shot learning did about as well as people, and better than deep learning methods
  4. People can learn a new concept from a tiny amount of data – can learn a class from one image (which is a very high dimensional piece of data)
  5. Even MNIST, which is an old and small dataset, still has 6k samples/class, but people often only need 1 example
  6. “Additionally, while classification has received most of the attention in machine learning, people can generalize in a variety of other ways after learning a new concept. Equipped with the concept “Segway” or a new handwritten character (Figure 1c), people can produce new examples, parse an object into its critical parts, and fill in a missing part of an image. While this flexibility highlights the richness of people’s concepts, suggesting they are much more than discriminative features or rules, there are reasons to suspect that such sophisticated concepts would be difficult if not impossible to learn from very sparse data. ”
    1. Looks like people both have a rich hypothesis space (because they can do all the above with a small amount of data), but also don’t overfit, which is the theoretical downside to having a large hypothesis class.  How do they do it?
  7. Here, focus is on handwritten characters
    1. Idea is to use something more natural and complex than simple synthetic stimuli, but something less complex than natural images
  8. Use a new Omniglot dataset, which has about 1,600 characters with 20 examples each
    1. Also has time data so the strokes are recorded as well
  9. “… this paper also introduces Hierarchical Bayesian Program Learning (HBPL), a model that exploits the principles of compositionality and causality to learn a wide range of simple visual concepts from just a single example.”
  10. Also use the method to generate new examples of a class, and then do a Turing test with it by asking other humans which was human generated and which was machine generated
  11. The HBPL “…is compositional because characters are represented as stochastic motor programs where primitive structure is shared and re-used across characters at multiple levels, including strokes and sub-strokes.”
  12. The model attempts to find a “structural description” that explains the image by breaking the character down into parts
  13. A Character is made of:
    1. A set of strokes
      1. Each stroke is made of simple sub-strokes modeled by a “uniform cubic b-spline” and is built of primitive motor elements that are defined by a 1st order Markov Process
    2. Set of spatial relationships between strokes, can be:
      1. Independent: A stroke that has a location independent of other strokes
      2. Start/end: A stroke that starts at beginning/end of another stroke
      3. Along: A stroke that starts somewhere along a previous stroke
  14. “Each trajectory … is a deterministic function of a starting location … token-level control points … and token-level scale …. The control points and scale are noisy versions of their type-level counterparts…”
  15. Used 30 most common alphabets for training, and another 20 for evaluation.  The training set was used to learn hyperparameters, a set of 1000 primitive motor elements, and stroke placement.  They attempted to do cross-validation within the training set
  16. The full set of possible ways a stroke could be created is enormous, so they have a bottom-up way of finding a set of the K most likely parses.  They approximate the posterior with this finite, size-K sample, weighting each parse by its relative likelihood
    1. They then use Metropolis-Hastings to draw a number of samples around each parse, each with a little variance, to get a better estimate of the likelihoods
  17. “Given an approximate posterior for a particular image, the model can evaluate the posterior predictive score of a new image by re-fitting the token-level variables…”
  18. Results
  19. For the 1-shot tasks, a letter from an alphabet was presented with 20 other letters from the same alphabet.  Each person did this 10 times, but each time was with a totally new alphabet, so no character was ever seen twice
  20. Get K=5 parses of each character presented (along with MCMC), and then run K gradient searches to reoptimize the token-level variables to fit the query image.
  21. They can also, however, attempt to reoptimize the query image to fit the 20 options presented
  22. Compare against:
    1. Affine model
    2. Deep Boltzmann Machines
    3. Hierarchical Deep Model
    4. Simple Strokes (a simplified HBPL)
    5. NN
  23. Humans and HBPL ~4.5% error rate, affine model next at 18.2%
  24. Then they did one-shot Turing test where people and algorithms had to copy a single query character
    1. <For what it’s worth, I think Affine looks better than both results from people and HBPL>
  25. In the “Turing test” there was feedback after each 10 trials, for a total of 50 trials
    1. <Note that this test doesn’t ask which character looks best, it is which is most confusable with human writing (which is pretty sloppy from the images they show).  I’m curious if the affine model could be made more human just by adding noise to its output>
  26. <Playing devil’s advocate, the images of characters were collected on mTurk, and look like they were probably drawn with a mouse — that is to say I feel they don’t look completely like natural handwriting.  I wonder how much of this program is picking up on those artifacts?  At least in terms of reproduction, the affine method looks best>
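A minimal sketch of the size-K posterior approximation and the one-shot decision described in items 16 to 20 above. It assumes each candidate parse already has a log score, and that the query image has a log-likelihood under each (re-fit) parse; the function names and the toy numbers are hypothetical and not taken from the paper.

```python
import math

def parse_weights(log_scores):
    """Normalize the log scores of K candidate parses into weights that sum to 1."""
    top = max(log_scores)                      # subtract the max for numerical stability
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def predictive_log_score(weights, query_log_liks):
    """log sum_i w_i * P(query | parse_i), computed stably in log space."""
    top = max(query_log_liks)
    mix = sum(w * math.exp(l - top) for w, l in zip(weights, query_log_liks))
    return top + math.log(mix)

def classify(candidates):
    """candidates: label -> (parse log scores, query log-likelihood under each parse).

    Returns the label whose approximate posterior predictive score is highest.
    """
    scores = {
        label: predictive_log_score(parse_weights(log_scores), query_lls)
        for label, (log_scores, query_lls) in candidates.items()
    }
    return max(scores, key=scores.get)
```

With K=5 parses per character, `classify` would be called with one entry per candidate character in the 20-way task; the real model additionally re-optimizes token-level variables before scoring, which is not shown here.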

 

Science 2015

  1. “Concepts are represented as simple probabilistic programs—that is, probabilistic generative models expressed as structured procedures in an abstract description language (…). Our framework brings together three key ideas—compositionality, causality, and learning to learn—that have been separately influential in cognitive science and machine learning over the past several decades (…). As programs, rich concepts can be built “compositionally” from simpler primitives…”
  2. “In short, BPL can construct new programs by reusing the pieces of existing ones, capturing the causal and compositional properties of real-world generative processes operating on multiple scales.”
  3. <Looks like exactly the same paper, just more brief.  The accuracies of both BPL and other methods seem improved here, though.  Convnets get 13.5% error; BPL gets 3.3%; people get 4.5%.  “A deep Siamese convolutional network optimized for this one-shot learning task achieved 8.0% errors”>
  4. “BPL’s advantage points to the benefits of modeling the underlying causal process in learning concepts, a strategy different from the particular deep learning approaches examined here.”
    1. <Or equivalently you can just say BPL does better because it has a small and highly engineered hypothesis class>
  5. They also run BPL with various “lesions”, which gives error rates in the teens.  The lesioned models also did more poorly in the “Turing test” part
  6. Instead of training on 30 background alphabets, they also did with just 5, and there the error rates are about 4%; on the same set convnets did about 20% error
  7. Supplementary Material

  8. <I assumed that they would ask individuals who actually learned how to write the languages to do the recordings.  Instead, they just took pictures of characters and had people write them.  This seems like a problem to me because of inconsistencies in the way people would actually do the strokes of a letter in an alphabet they do not know.>
  9. <Indeed, they were also drawn by mouse in a box on a screen, which is a very unnatural way to do things>
  10. <From what I can tell the characters are recorded in pretty low resolution as well which looks like it can cause artifacts, looks like 105×105>
  11. <This basically has the details that were included in the main part of the NIPS paper>
  12. Some extra tricks like convolving with Gaussian filter, randomly flipping bits
  13. Primitives are scale-selective
  14. “For each image, the center of mass and range of the inked pixels was computed. Second, images were grouped by character, and a transformation (scaling and translation) was computed for each image so that its mean and range matched the group average.”
  15. ” In principle, generic MCMC algorithms such as the one explored in (66) can be used, but we have found this approach to be slow, prone to local minima, and poor at switching between different parses. Instead, inspired by the speed of human perception and approaches for faster inference in probabilistic programs (67), we explored bottom-up methods to compute a fast structural analysis and propose values of the latent variables in BPL. This produces a large set of possible motor programs – each approximately fit to the image of interest. The most promising motor programs are chosen and refined with continuous optimization and MCMC.”
  16. “A candidate parse is generated by taking a random walk on the character skeleton with a “pen,” visiting nodes until each edge has been traversed at least once. Since the parse space grows exponentially in the number of edges, biased random walks are necessary to explore the most interesting parts of the space for large characters. The random walker stochastically prefers actions A that minimize the local angle of the stroke trajectory around the decision point…”
  17. For the ANN they used Caffe, and took a network that works well on MNIST
    1. <But it seems like this system doesn’t have any of the special engineering that went into this that deals specifically with strokes as opposed to whole images>
    2. “The raw data was resized to 28 x 28 pixels and each image was centered based on its center of mass as in MNIST. We tried seven different architectures varying in depth and layer size, and we reported the model that performed best on the one-shot learning task.”
    3. <This may make the task easier, but MNIST deals with a small number of characters, many of which are much less complex than some of the characters used here.   It might be the case that some of the more complex characters can’t be accurately reduced to such a small size, so this may be hobbling performance>
    4. Also the network is not very deep – only 2 conv layers and a max-pooling
    5. “One-shot classification was performed by computing image similarity through the feature representation in the 3000 unit hidden layer and using cosine similarity.”
    6. They used a smaller net for the 1-shot classification with less data, <so that was nice of them>
  18. The full “Siamese network” did work on the 105×105 image, had 4 conv layers and 1 standard hidden layer.  Parameters were optimized with Bayesian method
  19. “The Hierarchical Deep model is more “compositional” than the deep convnet, since learning-to-learn endows it with a library of high-level object parts (29). However, the model lacks a abstract causal knowledge of strokes, and its internal representation is quite different than an explicit motor program. “
  20. For data collection “The raw mouse trajectories contain jitter and discretization artifacts, and thus spline smoothing was applied.”
  21. <Ok, skipping the rest>
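A rough sketch of the biased random-walk parse generation quoted in item 16 above: a “pen” walks the character skeleton until every edge has been traversed at least once, stochastically preferring moves with a small change in heading. The graph representation, the exponential weighting, and the `sharpness` parameter are assumptions for illustration; the actual proposal procedure also handles pen lifts and splitting into strokes, which this leaves out.

```python
import math
import random

def turn_angle(prev, cur, nxt, pos):
    """Absolute change in heading (0..pi) when moving prev -> cur -> nxt."""
    if prev is None:
        return 0.0  # no heading yet at the start of the walk
    h1 = math.atan2(pos[cur][1] - pos[prev][1], pos[cur][0] - pos[prev][0])
    h2 = math.atan2(pos[nxt][1] - pos[cur][1], pos[nxt][0] - pos[cur][0])
    d = abs(h2 - h1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def random_walk_parse(pos, edges, start, sharpness=2.0, rng=None):
    """Walk a connected skeleton graph until each edge is traversed at least once.

    pos: node -> (x, y); edges: list of (node, node) pairs.
    Returns the sequence of visited nodes.
    """
    rng = rng or random.Random(0)
    neighbors = {n: [] for n in pos}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    unvisited = {frozenset(e) for e in edges}
    path, prev, cur = [start], None, start
    while unvisited:
        options = neighbors[cur]
        # Hypothetical bias: smaller turning angles get exponentially more weight.
        weights = [math.exp(-sharpness * turn_angle(prev, cur, n, pos)) for n in options]
        nxt = rng.choices(options, weights=weights)[0]
        unvisited.discard(frozenset((cur, nxt)))
        prev, cur = cur, nxt
        path.append(cur)
    return path
```

Running this several times with different seeds on the same skeleton gives different candidate traversals, which is the point of the bottom-up proposal step: many cheap candidates, with only the most promising ones refined afterwards.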

Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Rao, Ballard. Nature Neuroscience 1999

  1. “We describe a model of visual processing in which feedback connections from a higher- to a lower-order visual cortical area carry predictions of lower-level neural activities, whereas the feedforward connections carry the residual errors between the predictions and the actual lower-level activities. When exposed to natural images, a hierarchical network of model neurons implementing such a model developed simple-cell-like receptive fields. A subset of neurons responsible for carrying the residual errors showed endstopping and other extra-classical receptive-field effects. These results suggest that rather than being exclusively feedforward phenomena, nonclassical surround effects in the visual cortex may also result from cortico-cortical feedback as a consequence of the visual system using an efficient hierarchical strategy for encoding natural images.”
  2. “Why should a neuron that responds to a stimulus stop responding when the same stimulus extends beyond the classical RF?”
    1. Why is it that a neuron would be end-sensitive? (respond more if a line terminates in its receptive field than continuing through it)
  3. There are some existing explanations but they end up being very complex when they have to explain everything we know
  4. “Here we show simulations suggesting that extra-classical RF effects may result directly from predictive coding of natural images. The approach postulates that neural networks learn the statistical regularities of the natural world, signaling deviations from such regularities to higher processing centers. This reduces redundancy by removing the predictable, and hence redundant, components of the input signal. “
  5. “Because neighboring pixel intensities in natural images tend to be correlated, values near the image center can often be predicted from surrounding values. Thus, the raw image-intensity value at each pixel can be replaced by the difference between a center pixel value and its spatial prediction from a linear weighted sum of the surrounding values. This decorrelates (or whitens) the inputs17,19 and reduces output redundancy, providing a functional explanation for center–surround receptive fields in the retina and LGN. “
  6. This also means one color gives information about others nearby “Thus, the color-opponent (red – green) and blue – (red + green) channels in the retina might reflect predictive coding in the chromatic domain similar to that of the spatial and temporal domains”
  7. “Using a hierarchical model of predictive coding, we show that visual cortical neurons with extra-classical RF properties can be interpreted as residual error detectors”
  8. Each level in the hierarchy attempts to predict the activity of the level below it through feedback connections – the resulting error is used to correct the estimate
  9. Lower levels operate on smaller temporal and spatial scales, so receptive fields expand as you go up the hierarchy, until the top covers everything
  10. Idea is that natural images have properties that can be organized hierarchically, and this should build a model of it
  11. So the model is that higher areas report expected neural activity to lower areas, and lower areas report back up the residual
  12. Trained a 3-level system based on this on natural images; it seems like a standard NN weight-activation scheme
  13. Updates are done to maximize posterior probability
  14. After training, the first level becomes sensitive to oriented edges or line segments (like Gabors), level 2 is a mixture of these
  15. A short bar elicits a strong residual while a long bar a small residual, because the latter are more common in natural images
  16. “The removal of feedback from level 2 to level 1 in the model caused previously endstopped neurons to continue to respond to bars of increasing lengths (Fig. 5a), supporting the hypothesis that predictive feedback is important in mediating endstopping in the level-1 model neurons. “
  17. “Our simulation results suggest that certain extra-classical RF effects could be an emergent property of the cortex using an efficient hierarchical and predictive strategy for encoding natural images.”
  18. “when the stimulus properties in a neuron’s receptive field match the stimulus properties in the surrounding region, little response is evoked from the error-detecting neurons because the ‘surround’ can predict the ‘center.’”
  19. “In anesthetized monkeys, inactivation of higher-order visual cortical areas disinhibits responses to surround stimuli in lower-area neurons”
  20. ” For example, some neurons in MT are suppressed when the direction of stimulus motion in the surrounding region matches that in the center of the classical RF6. This suggests a hierarchical predictive coding strategy for motion analogous to the one suggested here for image features.”
  21. “Certain neurons in the anterior inferotemporal (IT) cortex of alert behaving monkeys fire vigorously whenever a presented test stimulus does not match the item held in memory, though showing little or no response in the case of a match”
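A toy sketch of the prediction/residual loop described in items 8 to 13 above, reduced to a single level with linear predictions and no priors. The higher level holds an estimate `r`, feeds back a prediction `U r`, and the feedforward residual `I - U r` drives the update of `r`. The matrix `U`, the learning rate, and the step count are illustrative assumptions; the paper's model is hierarchical and also learns the weights.

```python
import math

def predict(U, r):
    """Top-down prediction of the lower-level activity: U @ r."""
    return [sum(U[i][j] * r[j] for j in range(len(r))) for i in range(len(U))]

def residual(I, U, r):
    """Feedforward error signal: actual input minus the top-down prediction."""
    return [x - p for x, p in zip(I, predict(U, r))]

def error_norm(I, U, r):
    """Euclidean norm of the residual."""
    return math.sqrt(sum(e * e for e in residual(I, U, r)))

def infer(I, U, steps=200, lr=0.1):
    """Gradient descent on squared prediction error: r += lr * U^T (I - U r)."""
    r = [0.0] * len(U[0])
    for _ in range(steps):
        e = residual(I, U, r)
        for j in range(len(r)):
            r[j] += lr * sum(U[i][j] * e[i] for i in range(len(I)))
    return r
```

An input that the weights can explain ends up with a near-zero residual, while an input outside what the weights can represent keeps a large one. This is the sense in which error-carrying units respond weakly to predictable stimuli and strongly to unexpected ones.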

Different Spatial Scales of Shape Similarity Representation in Lateral and Ventral LOC. Drucker, Aguirre. Cerebral Cortex 2009

Uses the same stimulus as earlier study of Fourier basis and shape

  1. “The results, indicating a coarse spatial coding of shape features in lateral LOC and a more focused coding of the entire shape space within ventral LOC, may be related to hierarchical models of object processing.”
  2. Small regions of cortex respond to different classes of images, and furthermore “… small regions of cortex contain populations capable of representing the entire space of images in a category”
  3. “A counterpoint to this apparent specialization has been the demonstration that information regarding object category is also contained in the distributed pattern of voxel responses across and between these specialized regions (…).”
  4. The results show there are also coarse-scale representations.  “This type of representation might correspond to the ‘chorus of fragments’ model of Edelman and Intrator (1997), where individual properties of objects are represented by separate neural populations.”
  5. “Our focus here is upon the representation of variations in stimulus identity within a simplified object category… In practice, the structure of a parameterized space of shapes can be recovered from human behavioral responses (e.g., reaction times or similarity judgments) …”
    1. This similarity may also be reflected in neural patterns, which is what they check out here
  6. “Does a similar system of neural representation exist within human visual cortex? The human lateral occipital complex (LOC) shows similar functional properties to those previously ascribed to IT structures in the macaque. This region responds more strongly when a viewer is presented with images of parseable objects, as opposed to images that have no 2- or 3- dimensional interpretation, and appears largely indifferent to the method of object perception, for example, objects may be defined by luminance, texture, motion, or stereo difference”
    1. For IT and macaque, see this
    2. Ventral LOC is also called posterior fusiform sulcus (pFS)
  7. “Two recent studies have demonstrated a relationship between the perceptual similarity and the distributed pattern of neural activity in LOC (Op de Beeck et al. 2008; Haushofer et al. 2008),”
  8. During fMRI scanning, subjects viewed 16 different shapes defined by radial frequency components (RFCs; a series of sine waves of various frequencies describing perturbations from a circle; Zahn and Roskies 1972; Fig. 1)

    1. Idea of RFCs actually being used in shape recognition was eventually “experimentally rejected” but it makes convenient stimuli, also totally abstracted with no categorical boundaries
  9. Shapes were modified by altering amplitude and phase of a particular frequency component
  10. Here, adaptation to shape (which depends on habituation) is studied at the neural level
  11. “We asked in this study if the degree of recovery from neural habituation at different cortical sites was proportional to the transition in similarity between 2 stimuli.”
  12. “In this study, we investigated if the distributed pattern of response can inform as to the identity of stimulus variation within an object category;”
  13. “Continuous Neural Adaptation in Ventral LOC Is Proportional to Shape Similarity”
    1. No adaptation effects found in lateral LOC
    2. The magnitude of the change in shape was linearly related to the change in response in ventral LOC
  14. “An alternative explanation for the proportional recovery from adaptation in ventral LOC is that the extreme stimuli (those from the corners of the stimulus space) may evoke a larger neural response generally (e.g., Kayaert et al. 2005). As the larger distance stimulus transitions tend to include these extreme stimuli to a greater extent, perhaps the apparent recovery from adaptation is actually a larger response to these extreme stimuli independent of an adaptation effect.”

    1. This is not the case, however, as the results show that “the proportional recovery from adaptation seen in ventral LOC indicates the presence of a population code for stimulus shape and cannot be attributed to a generally greater neural response to extreme stimuli.”
  15. “Distributed Pattern Responses Distinguish between Shapes”
  16. Use SVMs to analyze data at coarse spatial level, which worked well
  17. “The accuracy of the SVM analysis and the identified patch within lateral LOC indicates that the distributed voxel pattern of activity in that area carries information about shape. However, the pattern difference between shapes need not reflect the similarity of the stimuli or indeed have any particular structure. The SVM requires only that patterns be different in order to distinguish them—no assumptions about similarity structure are made or used.”
  18. “Within lateral LOC, the strongly discriminant responses seen in the SVM analysis were found to also reflect stimulus similarity consistently across subjects (t(4) = 10.0, P = 0.001). In contrast, the distributed pattern of response in ventral LOC had a weaker correlation with the perceptual similarity of the stimuli (t(4) = 1.2, P = 0.3) (Fig. 4A). The difference between these subregions of area LOC was significant (t(4) = 11.4, P = 0.0003).”

    1. Mixed evidence for this being attributable strictly to retinotopic similarity
  19. “The RFC-Amplitude and RFC-Phase Axes Are Differentially Represented at Coarse and Fine Neural Scales”
  20. “Although the distributed neural similarity matrix measured from lateral LOC was strongly correlated with the stimulus similarity matrix, there appeared to be aspects of the structure of the neural response not evident in the stimulus matrix”
  21. Earlier studies on these shapes showed results that had phase and amplitude being recognized as orthogonal and equally important but that wasn’t completely replicated here.  Results here say the dimensions are “equally perceptually salient”, but that they are not perceived equivalently
  22. “…both aspects of the stimulus space [amplitude, phase] are represented by the within-voxel population code within ventral LOC… A rather different result was observed for the distributed pattern of response within lateral LOC. There, the distributed pattern across subjects reflected the shapes primarily in terms of RFC-amplitude but not RFC-phase”
  23. “For example, clusters of neurons might represent the tightness of the ‘‘knobs’’ of the shapes (defined by RFC-amplitude) independent of the direction that those knobs point within the overall shape (defined by RFC-phase). RFC-amplitude and RFC-phase may be taken as similar to ‘‘feature’’ and ‘‘envelope’’ parameters of Op  de Beeck et al. (2008), respectively; we thus contribute a similar finding in that features are represented in the distributed pattern in lateral LOC much more reliably than the overall shape envelope.”
  24. “Based upon the differential sensitivity to shape identity for the adaptation and distributed pattern methods, we argue that although both the lateral and ventral components of area LOC contain neural population codes for shape, the spatial scale of these representations differ. Specifically, the absence of a distributed pattern effect within ventral LOC is evidence for a homogeneous representation of the shape space, such that the average response of any one voxel does not differentiate between the shapes, whereas the presence of a distributed code and the absence of an adaptation effect in lateral LOC suggests that there is a heterogenous distribution of shape representation,”
  25. “Within ventral LOC, no meaningful tuning for the shape space can be identified: The amplitude of the response is no different for different shapes. This indicates that ventral LOC voxels are broadly tuned for shape identity. In contrast, lateral LOC voxels show relatively narrow tuning: there is a progressive decline in the response of a voxel for shapes more distant from the shape for which the voxel is best tuned (which was frequently a stimulus from the edges of the stimulus space). Moreover, lateral LOC voxels appear more narrowly tuned for the RFC-amplitude, as compared with the RFC-phase dimension of the shape space, consistent with our previous observation”
  26. “The narrow tuning observed in lateral LOC may also explain the absence of a linear adaptation response in this region to transitions in shape space. If a given voxel is narrowly tuned to a particular region of the shape space, then it may only show recovery from adaptation for stimulus transitions within its tuned area.”
  27. <Discussion>
  28. ” By using a continuous carryover design, our study was capable of examining neural similarity both on a coarse, across-voxel scale by distributed pattern analysis, as well as on a fine, within-voxel scale using continuous neural adaptation. We can thus compare the information provided at distributed and focal levels.”
  29. “Unlike ventral LOC, the lateral portion of LOC did not show adaptation responses that were linearly related to shape similarity. We found that the narrow tuning of lateral LOC voxels could explain this finding, indicating that each particular voxel has a population of neurons that are tuned to one specific region of the shape space. Consequently, most of the transitions between stimuli would not induce neural adaptation within the voxel as they would be transitions between stimuli not within the voxel’s receptive field.”
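A small sketch of the kind of comparison behind items 18 to 22 above: correlate pairwise distances between stimuli in the shape space with pairwise distances between distributed voxel patterns. The (amplitude, phase) coordinates, the use of Euclidean distance for both matrices, and the Pearson correlation are assumptions for illustration; the paper's exact similarity measures and statistics may differ.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity_correlation(stim_coords, voxel_patterns):
    """Correlate stimulus-space distances with voxel-pattern distances over all pairs."""
    pairs = list(combinations(range(len(stim_coords)), 2))
    stim_d = [euclidean(stim_coords[i], stim_coords[j]) for i, j in pairs]
    neural_d = [euclidean(voxel_patterns[i], voxel_patterns[j]) for i, j in pairs]
    return pearson(stim_d, neural_d)

# Hypothetical layout: 16 stimuli on a 4x4 grid of (amplitude, phase) steps.
grid = [(a, p) for a in range(4) for p in range(4)]
both_axes = [(2.0 * a, 2.0 * p) for a, p in grid]   # pattern tracks both dimensions
amplitude_only = [(float(a),) for a, p in grid]      # pattern ignores phase
```

A pattern that tracks both axes correlates perfectly with the stimulus distances, while one that only carries amplitude correlates positively but less well, which is roughly the lateral LOC picture described in item 22.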

Perceptual Similarity of Shapes Generated from Fourier Descriptors. Cortese, Dyre. Journal of Experimental Psychology. 1996.

  1. “A metric representation of shape is preserved by a Fourier analysis of the cumulative angular bend of a shape’s contour. Three experiments examined the relationship between variation in Fourier descriptors and judgments of perceptual shape similarity. Multidimensional scaling of similarity judgments resulted in highly ordered solutions for matrices of shapes generated by a Fourier synthesis of a few frequencies. Multiple regression analyses indicated that particular Fourier components best accounted for the recovered dimensions. In addition, variations in the amplitude and the phase of a given frequency, as well as the amplitudes of 2 different frequencies, produced independent effects on perceptual similarity. These results suggest that a Fourier representation is consistent with the perceptual similarity of shapes, at least for the relatively low-dimensional Fourier shapes considered.”
  2. Although many things are useful for object recognition (color, texture, etc) earlier work shows outline (contour) shape being the most important
  3. Mention approach for shape representation as having an alphabet of shape-piece prototypes that are then assembled – can be represented hierarchically or spatially in some other manner.
    1. Pinker was a proponent of this
  4. But there hasn’t been any real traction on this from a practical sense, as “The difficulty lies in representing the infinite variety of shapes with a small set of primitives. Typically, the parts are distinguished only by qualitative differences in shape.”
    1. Idea of geons, codons, but they don’t deal with metric variations, which seem important
    2. Marr also had idea of some form of decomposition
  5. An alternative is to use a system that doesn’t involve parsing an object into parts
    1. Fourier descriptors is one system from computer vision
  6. “In this method, given an arbitrary starting point on a closed contour, the function relating cumulative arc length to local contour orientation is expanded in a Fourier series”
    1. Has some nice properties, including that global shape characteristics can be determined just by the first few low-frequency terms; also, it’s basically invariant to starting point
  7. Fourier descriptors were used in early computer vision and have been considered in biological vision as well
  8. One study found “…that approximately half of the visually responsive neurons in the inferior temporal cortex were selectively tuned to the frequency of FD stimuli”
    1. “… all frequencies were about equally represented, except for a reduced incidence of the frequency 64 cycles per perimeter.”  Fits weren’t quite linear but were still good
  9. “In the present experiments, we tested this prediction [that FDs are related to categorization] by obtaining ratings of perceived shape similarity and subjecting them to multidimensional scaling”
  10. “…if, on the one hand, perceived shape similarity is related to variation in the amplitude and phase parameters of the contour, then vectors representing these Fourier components should account for the dimensions of the recovered similarity space. If, on the other hand, qualitative stimulus attributes are used to represent shape (e.g., smoothness, number of parts, or orientation), then vectors representing these qualities should account for the majority of variability in similarity judgments. For this reason, we also obtained ratings on a number of unidimensional scales representing qualitative aspects of the stimuli.”
  11. “In Experiment 1, we varied the amplitude and the phase of a single FD frequency. A Fourier representation of shape would predict that the perceptual similarity space should reflect variation of these two parameters. Also, because of the independence of amplitude and phase in a Fourier representation, we made an additional prediction: The amplitude and the phase of a given FD frequency should show independent effects on perceived similarity.”
  12. <Figure: screenshot from the paper, image not preserved>
  13. Participants were shown 45 pairs, and were told to rate them for similarity on a numeric scale, and then after that they rated each shape on 7 independent numeric scales (width, straightness, smoothness, # of parts, complexity, symmetry, orientation) – these criteria were intended to be alternatives for doing classification
  14. MDS using Euclidean distance on the similarity ratings – there was a sharp elbow with 2 dimensions
  15. <Figure: screenshot from the paper, image not preserved>
  16. This reproduces almost exactly the earlier figure (just rotated and flipped), “…which suggests that perceived dissimilarity is monotonically related to distance in a 2-D Euclidean space with, in this case, amplitude and phase as the two dimensions. Indeed, the relationship between distances in this space and perceived dissimilarities may be linear: A linear multidimensional-scaling analysis produced a 2-D solution with virtually the same pattern as that for the monotonic analysis”
  17. “…that the phase and the amplitude of Frequency 6 accounted for more variability in the judgments of similarity than did any of the unidimensional scales, with the exception of smoothness.”
  18. Experiment 2

  19. “Fourier theory also predicts another pattern of effects on similarity judgments: the independence of amplitude values at different frequencies. The purpose of Experiment 2 was to test this prediction…”
  20. Based on MDS “it appears that the perceived dissimilarities of these shapes are monotonically related to distance in a Fourier space, with amplitude of frequency 2 and amplitude of frequency 4 as the two dimensions “
  21. “Fitted vectors for the amplitudes of the two frequency components were found to be orthogonal
    (angular difference = 88.8°, suggesting that there were independent perceptual effects of variations in amplitude on two different frequencies. This observation, along with the observed independence of amplitude and phase in Experiment 1, is consistent with a representation of shape based on FDs.”
  22. Variation in the amplitude of frequency 2 was highly correlated with judgments of “width”, and variation in the amplitude of frequency 4 with judgments of “smoothness”
  23. Experiment 3

  24. “Experiment 3 tested the effects of variation in the phases of two different frequencies on judgments of similarity. As in Experiments 1 and 2, this was an investigation of the perceptual effects of variation in two parameters of the Fourier expansion. However, unlike the previous experiments, the parameters manipulated in this experiment did not exhibit independent effects on the shape of the contour, because the relative phases, and not the absolute phases, determined the shape”
  25. Here stimuli were constructed from frequencies of 4, 6, and 8 cycles/perimeter, with amplitudes held constant.
    1. Phases of frequencies 6 and 8 were varied independently (an extra frequency of 4 is needed for the comparison to work)
  26. Here the stress plot from MDS didn’t have a clear elbow, but plotting with 2 dimensions arranged the items in a ‘U’ shape (a one-dimensional curve), implying a one-dimensional solution, i.e., the two phases are not independent
  27. “This relationship, taken together with the results of Experiments 1 and 2, which found a significant relationship between number of parts and amplitude of frequencies 4 and 6, suggests that object parsing may be related to the amplitude and the relative phase of frequencies in this range (4 to 8 cycles per perimeter). “
  28. “Of particular importance was the evidence found for independent perceptual effects for variations of amplitude and phase on a single frequency and for variations of amplitudes on two different frequencies. Both of these results are predicted by a Fourier theory.”
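
The point from Experiment 3 that only relative phases determine shape can be checked with a toy contour. This is a minimal sketch, assuming a simple radial parameterization r(θ) = 1 + Σ A_k cos(kθ + φ_k), which may well differ from the Fourier-descriptor formulation used in the paper; the function names and all amplitude and phase values are made up for illustration.

```python
import math

def radial_contour(components, n_points=360):
    """Sample r(theta) = 1 + sum_k A_k * cos(k * theta + phi_k) around the contour.

    components maps frequency k (cycles per perimeter) to (amplitude, phase).
    """
    radii = []
    for i in range(n_points):
        theta = 2.0 * math.pi * i / n_points
        r = 1.0
        for k, (amplitude, phase) in components.items():
            r += amplitude * math.cos(k * theta + phase)
        radii.append(r)
    return radii

def rotation_mismatch(r_a, r_b):
    """Smallest max-abs difference between r_a and any circular shift of r_b."""
    n = len(r_a)
    best = float("inf")
    for shift in range(n):
        worst = max(abs(r_a[i] - r_b[(i + shift) % n]) for i in range(n))
        best = min(best, worst)
    return best

# Frequencies 4, 6, 8 with equal amplitudes (values are illustrative only).
base = {4: (0.1, 0.0), 6: (0.1, 0.5), 8: (0.1, 1.0)}

# Shifting every phase by k * delta only rotates the contour by delta.
delta = 2.0 * math.pi * 30 / 360
rotated = {k: (a, p + k * delta) for k, (a, p) in base.items()}

# Changing the phase of frequency 6 alone changes the relative phases.
reshaped = {4: (0.1, 0.0), 6: (0.1, 1.5), 8: (0.1, 1.0)}

r_base = radial_contour(base)
mismatch_rotated = rotation_mismatch(radial_contour(rotated), r_base)
mismatch_reshaped = rotation_mismatch(radial_contour(reshaped), r_base)
print("consistent phase shift:", mismatch_rotated)
print("single phase changed:  ", mismatch_reshaped)
```

The first mismatch is zero up to rounding (the contour is the same shape, just rotated), while the second stays clearly above zero for every rotation, so the two phase parameters cannot act as independent shape dimensions.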

Categorical Clustering of the Neural Representation of Color. Brouwer, Heeger. JNeuro 2013.

  1. fMRI study where subjects viewed 12 colors and did either a color-naming or a distractor task
  2. “A forward model was used to extract lower dimensional neural color spaces from the high-dimensional fMRI responses.”
  3. Visual areas V4v and VO1 showed clustering for the color-naming task but not for the distractor task
  4. “Response amplitudes and signal-to-noise ratios were higher in most visual cortical areas for color naming compared to diverted attention. But only in V4v and VO1 did the cortical representation of color change to a categorical color space”
  5. We can perceive thousands of colors but have only a handful of descriptive categories for colors, so we can see two different colors but still potentially call them the same thing
  6. Inferotemporal cortex (IT) is believed to deal with categorization of color
  7. “…performing a categorization task alters the responses of individual color-selective neurons in macaque IT (…).”
  8. Similar colors cause overlapping patterns of neural activity, “… neural representations of color can be characterized by low-dimensional ‘neural color spaces’…”
  9. “Activity in visual cortex depends on task demands (…).”
    1. Use fMRI to study this
  10. “Forward model” is used to reduce fMRI signals to a lower dimensional space of bases
  11. “Normal color vision was verified by use of the Ishihara plates (Ishihara, 1917) and a computerized version of the Farnsworth–Munsell 100 hue scoring test (Farnsworth, 1957).”
    1. <Need to learn about this>
  12. “The 12 stimulus colors were defined in DKL (Derrington, Krauskopf and Lennie) color space… We chose the DKL space because it represents a logical starting point to investigate the neural representation of color in visual cortex. Although there is evidence for additional higher-order color mechanisms in visual cortex (Krauskopf et al., 1986), the color tuning of neurons in V1 can be approximated by linear weighted sums of the two chromatic axes of DKL color space (Lennie et al., 1990).”
  13. Color categorization task was done outside the scanner, involved putting 64 colors into one of 5 categories
  14. In the scanner, there were two types of episodes.  In one, subjects had to press one of 5 buttons to categorize the color (red, green, blue, yellow, or purple).  The distractor task was a 2-back test (is this color the same as the color 2 steps ago?)
  15. <details on fMRI processing>
  16. Used the forward model from this paper.
  17. “We characterized the color selectivity of each neuron as a weighted sum of six hypothetical channels, each with an idealized color tuning curve (or basis function) such that the transformation from stimulus color to channel outputs was one to one and invertible. Each basis function was a half-wave-rectified and squared sinusoid in DKL color space.”
  18. Assume voxel response is proportional to the number of responding neurons in that voxel
  19. Channel responses form a matrix C (n x c, where n is the number of colors and c = 6 is the number of channels).  They then ran PCA on this to “extract neural color spaces from the high-dimensional space of voxel responses(…)”
  20. “According to the model, each color produces a unique pattern of responses in the channels, represented by a point in the six-dimensional channel space.  By fitting voxel responses to the forward model, we projected the voxel responses into this six dimensional subspace.”
    1. PCA <A competing method they previously used to do this analysis> did not work as well – had similar results but more variability because it tries to fit noise where the forward model throws it out
  21. To visualize the forward model, they ran PCA to project the 6D space to 2D (these 2 dimensions accounted for almost all the variance)
  22. “Reanalysis of the current data using PCA to reduce dimensionality directly from the number of voxels to two also yielded two-dimensional neural color spaces that were similar to those published previously. Specifically, the neural color spaces from areas V4v and VO1 were close to circular, whereas the neural color spaces of the remaining areas (including V1) were not circular, replicating our previously published results and supporting the previously published conclusions (Brouwer and Heeger, 2009).”
  23. Used many different clustering methods to see if colors labeled in the same color category had a more similar response than those in other categories
  24. On to results
  25. Subjects were pretty consistent where they put color class boundaries.  Blue and green were the most stable
  26. Subjects weren’t given category labels (so they were effectively doing clustering), but the categories were still intuitively identifiable and pretty stable
  27. Color clustering was strongest in VO1 and V4v during the color-naming task.  Responses from neighboring area V3 were more smoothly circular and therefore clustered less well.
  28. [Figure: screenshot from the paper; image not preserved]
  29. “The categorical clustering indices were significantly larger for color naming than diverted attention in all but one (V2) visual area (p < 0.001, nonparametric randomization test), but the difference between color naming and diverted attention was significantly greater in VO1 relative to the other visual areas (p < 0.01, nonparametric randomization test). One possibility is that all visual areas exhibited clustering of within-category colors, but that the categorical clustering indices were low in visual areas with fewer color-selective neurons, i.e., due to a lack of statistical power”
  30. “… no visual area exhibited categorical clustering significantly greater than baseline for the diverted attention task.”
  31. Manual clustering done by subjects matched the clustering of the neural data, except that neurally turquoise/cyan grouped with the blues, whereas people grouped it with the greens
  32. “Hierarchical clustering in areas V4v and VO1 resembled the perceptual hierarchy of color categories”
    1. In VO1 when doing color naming.
    2. The dendrogram resulting from the distractor task looks pretty much like garbage
  33. <Shame on the editor.  Use SNR without defining the abbreviation – I assume it’s signal to noise ratio?>
  34. “Decoding accuracies from the current data set were similar; forward-model decoding and maximum-likelihood decoding were nearly indistinguishable.”
  35. <Between this and the similarity of the result of PCA, what does their forward model buy you?  Is it good because it matches results and is *less* general?>
  36. “… we propose that some visual areas (e.g., V4v and VO1) implement an additional color-specific change in gain, such that the gain of each neuron changes as a function of its selectivity relative to the centers of the color categories (Fig. 8C). Specifically, neurons tuned to a color near the center of a color category are subjected to larger gain increases than neurons tuned to intermediate colors”
    1. <It is only shown that doing this in simulation helps clustering, which is in the neural data, but they don’t show that the neural data specifically supports this over other approaches>
  37. “Task-dependent modulations of activity are readily observed throughout visual cortex, associated with spatial attention, feature-based attention, perceptual decision making, and task structure (Kastner and Ungerleider, 2000; Treue, 2001; Corbetta and Shulman, 2002; Reynolds and Chelazzi, 2004; Jack et al., 2006; Maunsell and Treue, 2006; Reynolds and Heeger, 2009). These task-dependent modulations have been characterized as shifting baseline responses, amplifying gain and increasing SNR of stimulus-evoked responses, and/or narrowing tuning widths. The focus in the current study, however, was to characterize task-dependent changes in distributed neural representations, i.e., the joint encoding of a stimulus by activity in populations of neurons.”
  38. <Need to read all references in section “Categorical specificity of areas V4v and VO1”>
  39. Lots of results show that V4 and nearby areas respond to chromatic stimuli.  Their previous paper (from 2009) showed that V4v and VO1 match the perceptual experience of color better than other regions, but there aren’t many results dealing with “… the neural representation of color categories, the representation of the unique hues, or the effect of task demands on these representations”
  40. Previous EEG studies show that the differences in EEG when looking at one color and then another “…  appear to be lateralized, providing support for the influence of language on color  categorization, the principle of linguistic relativity, or Whorfianism (Hill and Mannheim, 1992; Liu et al., 2009; Mo et al., 2011). Indeed, language-specific terminology influences preattentive color perception. The existence in Greek of two additional color terms, distinguishing light and dark blue, leads to faster perceptual discrimination of these colors and an increased visual mismatch negativity of the visually evoked potential in native speakers of Greek, compared to native speakers of English (Thierry et al., 2009).”
    1. Here however, no evidence of lateralized categorical clustering from fMRI
  41. There is neural research on color in macaques, but their brain structure and photoreceptor sensitivities differ from ours, which needs to be kept in mind when interpreting results from animal experiments on color
  42. “We proposed a model that explains the clustering of the neural color spaces from V4v and VO1, as well as the changes in response amplitudes (gain) and SNR observed in all visual areas. In this model, the categorical clustering observed in V4v and VO1 is attributed to a color-specific gain change, such that the gain of each neuron changes as a function of its selectivity relative to the centers of the color categories.”
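
To make the idea of a “categorical clustering index” concrete, here is a minimal sketch. The index below (mean between-category distance divided by mean within-category distance) is a hypothetical stand-in chosen for illustration; it is not claimed to be the definition used in the paper, and the toy coordinates are invented.

```python
import math
from itertools import combinations

def clustering_index(points, labels):
    """Mean between-category distance divided by mean within-category distance."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))

labels = ["blue", "blue", "green", "green"]

# Toy "neural color space" coordinates (made up): evenly spaced vs. clustered.
evenly_spaced = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
clustered = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]

print(clustering_index(evenly_spaced, labels))  # 2.0
print(clustering_index(clustered, labels))      # 5.0
```

A larger index means colors sharing a category label sit closer together in the neural space than colors from different categories, which is the pattern reported for V4v and VO1 during color naming.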

Toward a Universal Law of Generalization for Psychological Science. Shepard. Science 1987.

  1. “A psychological space is established for any set of stimuli by determining metric distances between the stimuli such that the probability that a response learned to any stimulus will generalize to any other is an invariant monotonic function of the distance between them.  To a good approximation, this probability of generalization (i) decays exponentially with this distance, and (ii) does so in accordance with one of two metrics, depending on the relation between the dimensions along which the stimuli vary.  These empirical principles are mathematically derivable from universal principles of natural kinds and probabilistic geometry that may, through evolutionary internalization, tend to govern the behaviors of all sentient organisms.”
  2. Psychology is about generalization, because nothing happens exactly the same way twice
    1. This idea though, is often left as a secondary topic in psychology.
    2. This generalization occurs according to some sort of metric
  3. Aristotle’s principle of association by resemblance goes back 2000 years, but this was only studied more formally at the beginning of the 1900s with Pavlov (the original whistle or bell caused a response, but he also tested other bells and whistles of differing levels of similarity)
  4. Since Pavlov, a common basis of experimentation was around “‘gradients of stimulus generalization’ relating the strength, probability, or speed of a learned response to some measure of difference between each test stimulus and the original training stimulus.”
    1. Measuring this accurately began in ’56, when Guttman and Kalish examined Skinner’s work
    2. The author then expanded on this by testing people in a passive, noisy n-to-n association task; gradients were found when the distributions for items in terms of their mapping were similar
  5. These gradients were originally defined in terms of hand-designed features (such as the wavelength of light emitted by each button in a set of buttons), but in some cases generalization was nonmonotonic (such as for tones separated by an octave) or varied across individuals, species, and stimuli in differing ways
  6. Lashley, along with others like Robert R. Bush and Frederick Mosteller, felt that there was not going to be any invariant law of generalization
  7. The idea was that, instead of measuring things by objective properties (such as the wavelength of the light), one should measure them according to how that physical parameter space maps to the individual’s psychological space.
  8. More specifically, consider if there is “… an invariant monotonic function whose inverse will uniquely transform those data into numbers interpretable as distances in some appropriate metric space?… Thus, in a K-dimensional space, the distances between points within each subset of K+2 points must satisfy definite conditions…”
  9. The function must be unique based on the properties of the constraints set up: “Provided that the number, n, of points in a space is not too small relative to the number of dimensions of the space, the rank order of the n(n-1)/2 distances among those n points permits a close approximation to the distances themselves, up to multiplication by an arbitrary scale factor.”
  10. This unknown function can be determined by “nonmetric” multidimensional scaling.  “The plot of the generalization measures gij against the distances dij between points in the resulting configuration is interpreted as the gradient of generalization.  It is a psychological rather than psychophysical function because it can be determined in the absence of any physical measurements on the stimuli.”
  11. Basically the P matrix consists of how confusable pairs of stimuli are, and MDS is commonly done on a normalized version of that matrix
    1. Applying this to data from all sorts of experiments, even on different animals, yields basically the same exponential decay function.  This is not something that must fall out of MDS, but is in the data itself that MDS picks up on
  12. MDS will not impose monotonicity, so when MDS yielded something nonmonotonic, going up to higher dimensional representations has done the trick.
    1. Interesting discussion about what exactly it yields in terms of colors (for example, colors should be 2D so a circle can be formed connecting red and violet instead of putting them at opposite ends of a line), tones
  13. When you can define a reasonable metric (such as lightness, saturation in color) those are usually the closest thing to the MDS results.  Sometimes different metrics are needed though, such as Euclidean or Manhattan
  14. “Are these regularities of the decay of generalization in psychological space and of the implied metric of that space reflections of no more than arbitrary design features… Or do they have a deeper, more pervasive source?  I now outline a theory of generalization based on the idea that these regularities may be evolutionary accommodations to universal properties of the world.”
  15. Different organisms have different things they have to attend to in order to survive, and how they need to be able to distinguish between a particular stimulus varies.  This is from both evolutionary and individual perspectives.
  16. Assume psychological space is in some dimension K.   Color might be 3D in terms of lightness, hue, saturation
  17. The exponential law is derived from a set of assumptions about how an organism considers this feature space.
    1. All locations are equally probable
    2. Probability that the region (of the test stimulus) has a size s is based on density function p(s).  The way p(s) looks exactly doesn’t actually make much of a difference in practice, for reasonable distributions
    3. Region is convex and “centrally symmetric” <what’s that>.  As is the case for the probability distribution, for the most part things are quite robust to the particular shape of the region
  18. The theory of generalization described “applies only to the highly idealized experiment in which generalization is tested immediately after a single learning trial with a novel stimulus.”  Empirical evidence from other test settings of either very long training times on very similar stimuli, or delayed test stimuli will lead to deviations from what is discussed here, which may happen in a few ways:
    1. Instead of exponential, an inflected Gaussian function
    2. “deviation away from rhombic and toward elliptical curves of equal generalization” <?>
  19. Brief discussion of how to extend the theory to deal with these cases (such as how to deal with sharply bounded “consequential regions”)
  20. “We generalize from one situation to another not because we cannot tell the difference between the two situations but because we judge that they are likely to belong to a set of situations having the same consequence.”
  21. “probability of generalization approximates an exponential decay function of distance in psychological space”
  22. “to the degree that the spreads of consequential stimuli along orthogonal dimensions of that space tend to be correlated or uncorrelated, psychological distances in that space approximate the Euclidean or non-Euclidean metrics associated, respectively, with the L2- and L1-norms for that space.”
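
The derivation of the exponential law from consequential regions can be checked numerically in the simplest case. This is a sketch under stated assumptions, not the paper's general argument: one psychological dimension, regions that are intervals of size s positioned uniformly around the training stimulus, and an Erlang (shape 2) density for p(s) with mean mu, which is my choice for illustration. Under these assumptions the probability that a region containing the training stimulus also contains a test stimulus at distance d is g(d) = ∫ p(s) (s - d)/s ds over s > d, and the integral works out to exactly exp(-2d/mu).

```python
import math

def erlang_density(s, mu):
    """Erlang (shape 2) density with mean mu: p(s) = (2/mu)^2 * s * exp(-2s/mu)."""
    rate = 2.0 / mu
    return rate * rate * s * math.exp(-rate * s)

def generalization(d, mu=1.0, n_steps=20000, upper=40.0):
    """Midpoint-rule estimate of g(d) = integral over s > d of p(s) * (s - d) / s."""
    h = upper * mu / n_steps  # integrate from d to d + upper * mu
    total = 0.0
    for i in range(n_steps):
        s = d + (i + 0.5) * h
        total += erlang_density(s, mu) * (s - d) / s
    return total * h

for d in (0.0, 0.5, 1.0, 2.0):
    numeric = generalization(d)
    closed_form = math.exp(-2.0 * d)
    print(f"d = {d:.1f}  numeric = {numeric:.5f}  exp(-2d/mu) = {closed_form:.5f}")
```

The numeric and closed-form columns agree to several decimal places, matching the claim in the notes that the exponential gradient falls out of the assumptions about consequential regions.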

Decoding and Reconstructing Color from Responses in Human Visual Cortex. Brouwer, Heeger. JNeuro 2009

  1. Tried to decode color from fMRI with “conventional pattern classification, a forward model of idealized color tuning, and” PCA
    1. The conventional classifier was able to match training data to colors, but the forward model was able to extrapolate to new colors
  2. Color was decoded accurately from:
    1. V1, V2, V3, V4, and VO1
    2. But not LO1, LO2, V3A/B or MT+
  3. In V4 and VO1, the first 2 principal components “revealed progression through perceptual color space” (closeness defined a special way)
    1. This similarity didn’t manifest itself anywhere else, even though classification may have been accurate, and classification was actually most accurate in V1, where this similarity effect didn’t manifest itself.
    2. “This dissociation implies a transformation from the color representation in V1 to reflect color space in V4 and VO1.”
  4. There is color sensitivity throughout visual cortex
  5. Classification of visual information through fMRI has been done previously on object categories, hand gestures, and visual features
  6. <mostly skipping notes on materials and methods>
  7. Stimulus was a slowly drifting series of concentric rings<, actually a little unclear about this, the description of the colors, and motion are not clear to me>
  8. Classification was done through an 8-way (8 colors originally presented) classifier, not some means of regression
  9. “The first two principal components of the simulated cone-opponency responses revealed results similar to those observed in V1.”
  10. “The forward model assumed that each voxel contained a large number of color-selective neurons, each tuned to a different hue.” <There are more details>
  11. The cone-opponency model, however, was worse at recreating a space that pushed all the colors apart; their forward model was successful at that
  12. Forward model not only allowed for decoding, but also reconstructing stimulus colors from test data
  13. <Skipping to discussion, running out of time>
  14. Mean voxel responses themselves did not reliably distinguish color
  15. Here saturation didn’t vary, only hue varied
  16. “Obviously, the lack of progression in the early visual areas (in particular V1) should not be taken as an indication that these areas are colorblind…  An alternative model of color selectivity, based on cone-opponent tuning rather than hue tuning, reproduced many features of the non-circular and self-intersecting color space derived from the V1 PCA scores”
  17. “…spatially distributed representations of color in V4 supported ‘interpolation’ to decode a stimulus color based on the responses to perceptually similar colors.”
  18. “Nonetheless, our results support the hypothesis that V4 and VO1 play a special role in color vision and the perception of unique hues…”
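
A rough sketch of why a forward model can reconstruct colors that were never used as stimuli, whereas an 8-way classifier can only pick among its training colors. It borrows the six-channel, half-wave-rectified and squared sinusoid basis quoted in the notes on the 2013 paper above. Everything else is an assumption made for illustration: evenly spaced preferred hues, hue measured in degrees, noiseless channel responses (the step of estimating voxel weights is skipped entirely), and the function names.

```python
import math

N_CHANNELS = 6  # channel count taken from the quoted description of the basis

def channel_responses(hue_deg):
    """Half-wave-rectified, squared sinusoid tuning for each hypothetical channel."""
    responses = []
    for j in range(N_CHANNELS):
        preferred = 360.0 * j / N_CHANNELS  # evenly spaced preferred hues (assumption)
        c = math.cos(math.radians(hue_deg - preferred))
        responses.append(max(0.0, c) ** 2)
    return responses

def decode_hue(responses):
    """Grid search in 1-degree steps for the hue whose predicted pattern is closest."""
    def squared_error(h):
        predicted = channel_responses(h)
        return sum((p - r) ** 2 for p, r in zip(predicted, responses))
    return min(range(360), key=squared_error)

# A hue that is not one of 8 evenly spaced stimulus colors (0, 45, 90, ...).
novel_hue = 200
decoded = decode_hue(channel_responses(novel_hue))
print("presented:", novel_hue, "decoded:", decoded)
```

Because every hue maps to a distinct pattern across the channels, the pattern can be inverted for any hue on the grid, which is the sense in which the forward model supports reconstruction and not just classification.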

Compressed Sensing, Sparsity, and Dimensionality in Neuronal Information Processing and Data Analysis. Ganguli, Sompolinsky. Annual Review of Neuroscience 2012.

  1. Considers ways to work with high dimensional data (and dimension reduction) with a particular focus on visual processing
  2. “… natural images are often sparse in the sense that if you view them in the wavelet domain (roughly as a superposition of the edges), only a very small number of K wavelet coefficients will have significant power, where K can be on the order of 20,000 for a 1-million-pixel image.”
    1. “… similarly, neuronal activity patterns that actually occur are often a highly restricted subset of all possible patterns (…) in the sense that they often lie along a low K-dimensional firing-rate space…”
  3. Here they consider random projections to do dimensionality reduction 
  4. “…compressed sensing… shows that the shadow can contain enough information to reconstruct the original image … as long as the original image is sparse enough.”
  5. L1 minimization
  6. Random projections can be achieved with neurons
  7. “… CS [compressed sensing] and RPs [random projections] can provide a theoretical framework for understanding one of the most salient aspects of neuronal information processing: radical changes in the dimensionality, and sometimes sparsity, of neuronal representations, often within a single stage of synaptic transformation.”
  8. CS also deals with modeling high dimensional data, which is also something the brain has to deal with
  9. “CS provides mathematical guarantees that one can achieve perfect recovery with a number of measurements M that is only slightly larger than K [the underlying basis dimension basically], as long as the M measurement vectors are sufficiently incoherent with respect to the sparsity domain (…). [next paragraph]  An important observation is that any set of measurement vectors, which are themselves random, will be incoherent with respect to any fixed sparsity domain.”
  10. Only M > O(K log(N/K)) measurements are needed to guarantee perfect reconstruction with high probability
  11. “… no measurement matrices and no reconstruction algorithm can yield sparse signal recovery with substantially fewer measurements [than is required with random projection]”
  12. L1 regularization solves a slightly different problem than what one would usually care for, but the original problem is intractable whereas L1 regularization is ok, and it gets very close to optimal results
  13. “… any projection that preserves the geometry of all K-sparse vectors allows one to reconstruct these vectors from the low-dimensional projection efficiently and robustly using L1 minimization”
  14. “The celebrated Johnson-Lindenstrauss (JL) lemma (…)  provides a striking answer [to how small we can make the dimension before the distances in the projection become significantly distorted from those in the original high-dimensional representation]… It states that RPs with M > O(log P) will yield, with high probability, only a small distortion in distance between all pairs of points in the cloud.  Thus the number of projected dimensions M needs only be logarithmic in the number of points P independent of the embedding dimension of the source data, N.”
  15. For “…data distributed along a nonlinear K-dimensional manifold embedded in N-dimensional space… M > O(K log NC) RPs preserve the geometry of the manifold with small distortion.”  Where C represents curvature of the manifold
  16. Results mean that few RPs can be used to represent data accurately
  17. When noise exists, use LASSO
  18. “The main outcome is roughly that for an appropriate choice of λ, which depends on the signal-to-noise ratio (SNR), the same conditions that guaranteed exact recovery of K-sparse signals by L1 minimization in the absence of noise also ensure good performance of the LASSO for approximately sparse signals in the presence of noise.”
  19. Can be used for dictionary learning
  20. <discussion of a number of particular topics, such as MRI, gene expression analysis (and others), skipping>
  21. In the brain “information stored in a large number of neurons is often compressed into a small number of axons, or neurons in a downstream system.  For example, 1 million optic nerve fibers carry information about the activity of 100 times as many photoreceptors.”
  22. The JL lemma shows that the number of projections needed to maintain accuracy is only logarithmic in the number of points you care about.  In the brain, “… 20,000 images can be represented by the corresponding population activity in the IT cortex. The similarity structure between all pairs of images can be preserved to 10% precision in a downstream area using only ~1000 neurons.  Furthermore, this result can be achieved by a very simple dimensionality-reduction scheme, namely by a random synaptic connectivity matrix.”
  23. In the case where the points are continuous and not discrete, significant compression is possible when the points lie on a manifold.  In this case only a logarithmic number of neurons (in the number of neurons in the source area) are required.
  24. “The ubiquity of this low-dimensional structure in neuronal systems may be intimately related to the requirement of communication and computation through widespread anatomical bottlenecks.”
  25. “Another bottleneck is posed by the task of working memory, where streams of sensory inputs must presumably be stored within the dynamic reverberations of neuronal circuits.  This is a bottleneck from time into space.”  Temporally extended input streams must be represented in a finite number of neurons.  Recurrent networks allow for this sort of representation.
  26. Other work shows “…a connection between CS and short-term memory by showing that recurrent neuronal networks can essentially perform online, dynamical compressed sensing of an incoming sparse sequence, yielding sequence memory traces that are longer than the number of neurons, again in units of the intrinsic time constant.”
  27. In many cases, the goal is not to compress the signal, but instead to project it to a higher dimensional space. “For example, information in 1 million optic nerve fibers is expanded into more than 100 million primary visual cortical neurons.  Also in the cerebellum, a small number of mossy fibers target a large number of granule cells, creating a 100-fold expansion.”
  28. Can design ANNs that do either random dimension reduction or expansion; both require just two layers
  29. In general, you need more data than neurons in order to train a system.  If the data lies on a lower dimensional manifold you can get away with less
  30. For classification “A remarkable theoretical result is that if synapses are constrained to be either excitatory or inhibitory, then near capacity, the optimal solution is sparse, with most of the synapses silent (…) even if the input patterns themselves show no obvious sparse structure.  This result has been proposed as a functional explanation for the abundance of silent synapses in the cerebellum and other brain areas.”
    1. If weights are unconstrained, optimal solutions are sparse, but not in the basis of neurons.  Instead they can be expressed as a linear combination of support vectors
  31. Because RPs basically maintain Euclidean distances they also maintain margins in the original space
  32.  <May not have finished this>
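
The JL-style claim that a random projection roughly preserves pairwise distances is easy to try out. A minimal sketch, assuming i.i.d. Gaussian projection weights scaled by 1/sqrt(M) (one common construction, not necessarily the one the review has in mind); the sizes N, M, P and the seed are arbitrary choices for illustration.

```python
import math
import random

def random_projection_matrix(m, n, rng):
    """M x N matrix of i.i.d. Gaussian entries, scaled so squared lengths are kept on average."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def project(matrix, x):
    """Apply the projection: one dot product per output dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]

def max_distance_distortion(points, projected):
    """Largest relative change in pairwise distance caused by the projection."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = math.dist(projected[i], projected[j]) / math.dist(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst

rng = random.Random(0)
N, M, P = 1000, 200, 8  # source dimension, projected dimension, number of points
points = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(P)]
R = random_projection_matrix(M, N, rng)
projected = [project(R, x) for x in points]
distortion = max_distance_distortion(points, projected)
print(f"max relative distortion over {P * (P - 1) // 2} pairs: {distortion:.3f}")
```

The distortion depends on M and on the number of points, not on N, so increasing N while keeping M fixed should leave the printed value in the same range.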

Quantifying the Internal Structure of Categories Using a Neural Typicality Measure. Davis, Poldrack. Cerebral Cortex 2014

  1. Deals with the internal structure/representation of category information
  2. <Seems like assumption is there is something of an exemplar representation>
  3. “Internal structure refers to how the natural variability between-category members is coded so that we are able to determine which members are more typical or better examples of their category. Psychological categorization models offer tools for predicting internal structure and suggest that perceptions of typicality arise from similarities between the representations of category members in a psychological space.”
  4. Based on these models, develop a “neural typicality measure” that checks if a category member has a pattern of activation similar to other members of its group, as well as what is central to a neural space.
  5. Use an artificial categorization task, find a connection between stimulus and response
    1. “find that neural typicality in occipital and temporal regions is significantly correlated with subjects’ perceptions of typicality.”
  6. “The prefrontal cortex (PFC) is thought to represent behaviorally relevant aspects of categories such as rules associated with category membership (…). Motor and premotor regions may represent habitual responses associated with specific categories (…). The medial temporal lobe (MTL) and subregions of the striatum are thought to bind together aspects of category representations from these other systems.”
  7. Different areas and different neurons and patterns of activation in an area can “reliably discriminate between many real world object categories”
  8. Consider examples of category data as having some sort of “internal structure” or feature representation specific to that class.
    1. These features can indicate things like how typical a concrete example is, and are related to how quickly and accurately classification occurs
  9. “Depending on the specific model, a category representation may be a set of points associated with a given category (exemplar models; …), a summary statistic (prototype models; …), or a set of statistics (clustering models; …) computed over points associated with a category.”
  10. Items closer to other examples in the class, or to the prototype are considered to be most typical or likely
  11. But they don’t propose that an accurate model is exactly the same thing a computer does, as there are examples of where nonintuitive things happen.
    1. Ex/ culture can influence how things are categorized, as can a current task or other context
  12. “Here, our goal is to develop a method for measuring the internal structure of neural category representations and test how it relates to physical and psychological measures of internal structure.”
  13. The neural typicality measure is related to nonparametric kernel density estimators, but “A key difference between our measure and related psychological and statistical models is that instead of using psychological or physical exemplar representations, our measure of neural typicality is computed over neural activation patterns…”
  14. Use a well-studied research paradigm of categorizing simple bird illustrations into 4 categories based on neck angle and leg length.  Previous results show people reconstruct classes based on the average item for each category
  15. “Our primary hypothesis is that psychological and neural measures of internal structure will be linked, without regard to where in the brain this might occur.”
    1. Also expect that some categorization will happen in visual cortex, and higher level temporal and medial-temporal regions, which “…. are theorized to bind together features from early visual regions into flexible conjunctive category representations (…).”
    2. There are other parts relevant to categorization, but not particularly this form of visual categorization, and other parts may be sensitive to things like entropy
  16. “To foreshadow the results, we find that neural typicality significantly correlates with subjects’ perceptions of typicality in early visual regions as well as regions of the temporal and medial temporal cortex. These results suggest that neural and psychological representational spaces are linked and validate the neural typicality measure as a useful tool for uncovering the aspects of category representations coded by specific brain regions.”
  17. “For analysis of behavioral responses, response time, and typicality ratings, a distance-to-the-bound variable was constructed that gave each stimulus’ overall distance from the boundaries that separate the categories in the stimulus space. Distance-to-the-bound is a useful measure of idealization: items that are distant from the bound are more idealized than items close to the bound (…).”
  18. “For the psychological typicality measure, a value for each of the Test Phase stimuli was generated by interpolating, on an individual subjects basis, a predicted typicality rating from the subjects’ observed typicality ratings…”
  19. Also did a physical typicality measure, which is pretty simple to understand (just neck angle, leg length measurements)
  20. Then a neural typicality measure <too many details to list here>
    1. “Our neural typicality measure is based on similarities between multivariate patterns of activation elicited for stimuli in the task. Stimuli that elicit activation patterns that are like other members of their category are more neurally typical than those that elicit dissimilar patterns of activation.”
  21. Subjects’ behavioral responses were predicted by SVM
  22. Typicality ratings were highly correlated with distance-to-the-bound
    1. Reveals that the items rated most typical, and not the average item, are the ones used for category representation.  A few other results, obtained through different methodology, show the same thing
  23. Neural typicality is linked to psychological typicality
  24. Found activity in visual cortex and MTL that have been found to be linked to categorization
  25. “These results suggest that, in the present task, the internal structure of neural category representations in temporal and occipital regions are linked to subjects’ psychological category representations such that objects that are idealized or physical caricatures of their category elicit patterns of activation that are most (mathematically) similar to other members of their category.”
  26. “… in the present task, physical similarity is not a significant contributor to the internal structure of neural category representations, at least not at a level that is amenable to detection using fMRI.”
  27. Also did MDS for classification on the neural data, <results don’t seem amazing, but only ok>
  28. SVM for classification: “The SVMs are given no information about the underlying stimulus space, and unlike the MDS analysis, do not make any assumptions about how the dimensions that separate the categories will be organized. Thus, the SVMs can be sensitive to regions that code rule-based or behavioral differences between categories, regions that encode information about their perceptual differences, or regions that code some combination of behavioral and perceptual information.”
  29. “Although there is strong overlap in the visual and MTL regions that discriminate between categories and represent internal structure, the motor/premotor, insula, and frontal regions were only identified in the between-category analysis. These results are consistent with the hypothesis that PFC and motor/premotor regions are more sensitive to behavioral aspects of categories (…). However, because behavioral responses are strongly associated with the perceptual characteristics of each category, the SVM results are also consistent with the hypothesis that these regions contain some perceptual information about the categories.”
  30. “The present research adds to the growing consensus that categorization depends on interactions between a number of different brain regions… An important point that this observation highlights is that there may not be any brain region that can be thought of representing all aspects of categories, and thus it might be most accurate to think of brain regions in terms of the aspects of category representations that they code.”
  31. “…in the present context, the deactivation of regions of the striatum with increasing typicality likely indicates an uncertainty signal, as opposed to category representation…”
  32. “Because our neural typicality measure is not based on mean activation-level differences between stimuli, it may be more directly interpretable and less susceptible to adjacency effects in studies of longer term internal category structure.”
    1. <Hm, should read their methodology more carefully on another read-through>
  33. They don’t have results that indicate suppression of adjacent stimuli
  34. Says their methodology should be tested in real-world settings, as well as in more artificial ones
  35. Evidence of “dimensional selective attention” where not all features are attended to for classification
    1. “Attentional mechanisms in the PFC that instantiate rule-based strategies (…) may contribute to selective attention effects by influencing neural representations in a top-down manner.”
    2. Although: “In the present context, dimensional selective attention is insufficient for explaining the idealization effect because dimensional selective attention affects an entire dimension uniformly… additional mechanisms are required.”
  36. “Attention has been found to create a spotlight around salient regions of visual space such that the processing of stimuli close to this location in space is enhanced (not just differences along a specific dimension of visual space; …). It is conceptually straightforward to predict that the same or similar spotlight mechanisms may affect the topography of stored neural stimulus representations, such that regions of a category space that contain highly idealized category members are enhanced and contribute more to categorization and typicality judgments than exemplars in ambiguous regions of category space.”
  37. Another model is one that specifically tries “… to reduce prediction error and confusion between categories (…). In these models, category members are simultaneously pulled toward representations/members of their own categories and repelled by members of opposing categories.”
    1. But this doesn’t seem to be a possible explanation here because “… the neural effects as actual neuronal changes in regions of early visual cortex happen on a much longer scale than our task.”
  38. This study only tried to find correlation between “psychological” and “neurological” responses, but more in-depth exploration of their relationship is a good idea and left for future work
  39. “Our task involves learning to distinguish multiple categories, akin to A/B tasks, and so our finding that early visual cortex is involved with representing category structure may be at odds with theories emphasizing the role of task demands (as opposed to featural qualities) in determining which perceptual regions will be recruited to represent categories.”
    1. Although these distinctions may be an artifact of the type of analysis used
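
To make the neural typicality idea from items 13 and 20 more concrete, here is a minimal sketch of a kernel-similarity typicality score. It is not the authors' exact measure: the Gaussian kernel, the `bandwidth` parameter, the averaging, and the toy activation patterns are all assumptions chosen for illustration. The only thing taken from the notes is the idea that a stimulus is more neurally typical when its activation pattern is similar to those of other members of its category.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two activation patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neural_typicality(patterns, labels, bandwidth=1.0):
    """Score each pattern by its mean similarity to same-category patterns.

    Similarity is a Gaussian kernel on pattern distance (an assumption;
    the notes only say the measure is related to kernel density estimators).
    """
    scores = []
    for i, (p, lab) in enumerate(zip(patterns, labels)):
        sims = [
            math.exp(-sq_dist(p, q) / (2 * bandwidth ** 2))
            for j, (q, other) in enumerate(zip(patterns, labels))
            if j != i and other == lab
        ]
        # A category with a single member has no within-category neighbors.
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    return scores


# Made-up two-voxel "activation patterns" for one category:
# three tightly clustered items and one outlier.
patterns = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0]]
labels = ["A", "A", "A", "A"]
scores = neural_typicality(patterns, labels)
print(scores)
```

In this toy example the outlier pattern gets a much lower score than the clustered ones, which is the qualitative behavior the notes describe. Correlating such scores with subjects' typicality ratings would be the next step, but that is beyond this sketch.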

Motor Effort Alters Changes of Mind in Sensorimotor Decision Making. Burk, Ingram, Franklin, Shadlen, Wolpert. PLOS One 2014.

  1. Studies when people change decisions after already committing to an action. This can even happen in situations where the stimulus is removed once movement starts (meaning the change takes place after the stimulus is already gone)
  2. Looks at the threshold where decisions change, and here proposes that it is linked to the physical effort associated with the movement (and how far the first target is from the second)
  3. Based on drift-diffusion model
  4. The time between stimulus removal and the change in movement is usually on the order of 400ms (because the stimulus is removed after motion starts, its removal can’t impact the original motion, but is processed once motion starts)
  5. “Fits of the model showed that the change of mind bound did not require as much information as for the initial decision and also that not all the information in the processing pipeline was used, that is there was a limited time for which new information was processed.”
  6. Random dot motion test, with a yoke that had to be moved to one of two positions to indicate motion.  Once motion stopped, stimulus was extinguished
  7. Model holds that there are two decision boundaries: one to start the initial motion, and another to cause a change in motion to the second target (the model has accumulation continuing after the stimulus until a timeout, or a change in motion, whichever occurs first)
  8. Changes of mind were most common when the motion data was weak, and motion initiated in the wrong direction (in most cases, the changes lead to the correct choice being made)
  9. The 400ms measured is consistent with previous studies on humans and monkeys (neural recordings in the monkey have about “200 ms latency to the start of evidence accumulation […] and latency from the signature of decision termination to the initiation of the behavioral response. (~70 ms for saccades […] and ~170 ms for reaches[…])”)
  10. 3 of 4 subjects reduced their rate of direction change as the angular separation (and therefore end distance) of targets increased
  11. One model holds that there are different populations representing the left and right choice (as opposed to just one) and that there is “… a race between two diffusion mechanisms […].  This implies that processing in the post-initiation period may not begin at the termination bound for the initial choice, but at a more intermediate value achieved by the losing mechanism.”
    1. A model for DDM that’s not just 2AFC
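
The two-bound account in item 7 can be sketched as a small simulation. This is a toy illustration, not the fitted model from the paper: the function name `simulate_trial`, all parameter values, the placement of the change-of-mind bound on the opposite side of zero, and the forced guess at timeout are assumptions. It only tries to capture the structure described above: evidence accumulates to an initial bound, accumulation continues for a limited time after movement starts, and a second, smaller bound triggers a change of mind.

```python
import random


def simulate_trial(rng, drift, initial_bound=1.0, com_bound=0.3,
                   post_window=300, dt=0.001, noise=1.0, max_steps=5000):
    """Return (initial_choice, final_choice), each +1 or -1.

    Evidence x starts at 0 and diffuses until it reaches +/- initial_bound,
    which commits the initial choice. Accumulation then continues for at
    most post_window steps; if x reaches com_bound on the side opposite
    the initial choice, the choice is reversed.
    """
    x = 0.0
    step_sd = noise * dt ** 0.5
    initial = 0
    for _ in range(max_steps):
        x += drift * dt + rng.gauss(0.0, step_sd)
        if abs(x) >= initial_bound:
            initial = 1 if x > 0 else -1
            break
    if initial == 0:
        # No bound reached before the timeout: guess from the sign of x.
        initial = 1 if x >= 0 else -1

    final = initial
    for _ in range(post_window):
        x += drift * dt + rng.gauss(0.0, step_sd)
        if initial * x <= -com_bound:
            final = -initial
            break
    return initial, final


rng = random.Random(0)
trials = [simulate_trial(rng, drift=3.0) for _ in range(200)]
changes = sum(1 for first, last in trials if first != last)
print(f"changes of mind: {changes} / {len(trials)}")
```

Under these assumptions, one way to mimic the effort effect would be to raise `com_bound` as target separation grows, so that more contrary evidence is needed before a costlier reversal. Whether that matches the paper's fitted parameters is not something these notes establish.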