Category Archives: Clustering

Categorical Clustering of the Neural Representation of Color. Brouwer, Heeger. JNeuro 2013.

  1. fMRI study where subjects viewed 12 colors while doing either a color-naming or a distractor task
  2. “A forward model was used to extract lower dimensional neural color spaces from the high-dimensional fMRI responses.”
  3. Visual areas V4v and VO1 showed clustering for the color-naming task but not for the distractor task
  4. “Response amplitudes and signal-to-noise ratios were higher in most visual cortical areas for color naming compared to diverted attention. But only in V4v and VO1 did the cortical representation
    of color change to a categorical color space”
  5. We can perceive thousands of colors but have only a handful of descriptive categories for colors, so we can see two different colors but still potentially call them the same thing
  6. Inferotemporal cortex (IT) is believed to deal with categorization of color
  7. “…performing a categorization task alters the responses of individual color-selective neurons in macaque IT (…).”
  8. Similar colors cause overlapping patterns of neural activity, “… neural representations of color can be characterized by low-dimensional ‘neural color spaces’…”
  9. “Activity in visual cortex depends on task demands (…).”
    1. Use fMRI to study this
  10. “Forward model” is used to reduce fMRI signals to a lower dimensional space of bases
  11. “Normal color vision was verified by use of the Ishihara plates (Ishihara, 1917) and a computerized version of the Farnsworth–Munsell 100 hue scoring test (Farnsworth, 1957).”
    1. <Need to learn about this>
  12. “The 12 stimulus colors were defined in DKL (Derrington, Krauskopf and Lennie) color space… We chose the DKL space because it represents a logical starting point to investigate the neural representation of color in visual cortex. Although there is evidence for additional higher-order color mechanisms in visual cortex (Krauskopf et al., 1986), the color tuning of neurons in V1 can be approximated by linear weighted sums of the two chromatic axes of DKL color space
    (Lennie et al., 1990).”
  13. Color categorization task was done outside the scanner, involved putting 64 colors into one of 5 categories
  14. In the scanner, there were two types of episodes.  In one, subjects had to press one of 5 buttons to categorize the color (R, G, B, Y, or purple).  The distractor task was a 2-back test (is this color the same as the color 2 steps ago?)
  15. <details on fMRI processing>
  16. Used the forward model from this paper.
  17. “We characterized the color selectivity of each neuron as a weighted sum of six hypothetical channels, each with an idealized color tuning curve (or basis function) such that the transformation from stimulus color to channel outputs was one to one and invertible. Each basis function was a half-wave-rectified and squared sinusoid in DKL color space.”
  18. Assume voxel response is proportional to the number of responding neurons in that voxel
  19. Channel responses C (an n x c matrix, where n is the number of colors and c the number of channels (6)).  Then did PCA on this to “extract neural color spaces from the high-dimensional space of voxel responses(…)”
  20. “According to the model, each color produces a unique pattern of responses in the channels, represented by a point in the six-dimensional channel space.  By fitting voxel responses to the forward model, we projected the voxel responses into this six dimensional subspace.”
    1. PCA <A competing method they previously used to do this analysis> did not work as well – had similar results but more variability because it tries to fit noise where the forward model throws it out
  21. To visualize the forward model, they ran PCA to project the 6D space to 2D (these 2 dimensions accounted for almost all the variance)
  22. “Reanalysis of the current data using PCA to reduce dimensionality directly from the number of voxels to two also yielded two-dimensional neural color spaces that were similar to those published previously. Specifically, the neural color spaces from areas V4v and VO1 were close to circular, whereas the neural color spaces of the remaining areas (including V1) were not circular, replicating our previously published results and supporting the previously published conclusions (Brouwer and Heeger, 2009).”
  23. Used many different clustering methods to see if colors labeled in the same color category had a more similar response than those in other categories
  24. On to results
  25. Subjects were pretty consistent where they put color class boundaries.  Blue and green were the most stable
  26. Subjects weren’t told the category labels (so they were basically doing clustering), but the categories were still intuitively identifiable and pretty stable
  27. Color clustering was strongest in VO1 and V4v, during the color-naming task.  Responses from neighboring area V3 were more smoothly circular and therefore not as good at clustering.
  28. <Screenshot of a figure from the paper; image not included>
  29. “The categorical clustering indices were significantly larger for color naming than diverted attention in all but one (V2) visual area (p < 0.001, nonparametric randomization test), but the difference between color naming and diverted attention was significantly greater in VO1 relative to the other visual areas (p < 0.01, nonparametric randomization test). One possibility is that all visual areas exhibited clustering of within-category colors, but that the categorical clustering indices were low in visual areas with fewer color-selective neurons, i.e., due to a lack of statistical power”
  30. “… no visual area exhibited categorical clustering significantly greater than baseline for the diverted attention task.”
  31. Manual clustering done by subjects matched that derived from the neural data, aside from the fact that neurally turquoise/cyan clustered with the blues, whereas people grouped it with the greens
  32. “Hierarchical clustering in areas V4v and VO1 resembled the perceptual hierarchy of color categories”
    1. In VO1 when doing color naming.
    2. The dendrogram resulting from the distractor task looks pretty much like garbage
  33. <Shame on the editor.  They use SNR without defining the abbreviation – I assume it’s signal-to-noise ratio?>
  34. “Decoding accuracies from the current data set were similar; forward-model decoding and maximum-likelihood decoding were nearly indistinguishable.”
  35. <Between this and the similarity of the result of PCA, what does their forward model buy you?  Is it good because it matches results and is *less* general?>
  36. “… we propose that some visual areas (e.g., V4v and VO1) implement an additional color-specific change in gain, such that the gain of each neuron changes as a function of its selectivity relative to the centers of the color categories (Fig. 8C). Specifically, neurons tuned to a color near the center of a color category are subjected to larger gain increases than neurons tuned to intermediate colors”
    1. <It is only shown that doing this in simulation helps clustering, which is in the neural data, but they don’t show that the neural data specifically supports this over other approaches>
  37. “Task-dependent modulations of activity are readily observed throughout visual cortex, associated with spatial attention, feature-based attention, perceptual decision making, and task structure (Kastner and Ungerleider, 2000; Treue, 2001; Corbetta and Shulman, 2002; Reynolds and Chelazzi, 2004; Jack et al., 2006; Maunsell and Treue, 2006; Reynolds and Heeger, 2009). These task-dependent modulations have been characterized as shifting baseline responses, amplifying gain and increasing SNR of stimulus-evoked responses, and/or narrowing tuning widths. The focus in the current study, however, was to characterize task-dependent changes in distributed neural representations, i.e., the joint encoding of a stimulus by activity in populations of neurons.”
  38. <Need to read all references in section “Categorical specificity of areas V4v and VO1”>
  39. Lots of results show that V4 and nearby areas respond to chromatic stimuli.  Their previous paper (from 2009) showed that V4v and VO1 match the perceptual experience of color better than other regions, but there aren’t many results dealing with “… the neural representation of color categories, the representation of the unique hues, or the effect of task demands on these representations”
  40. Previous EEG studies show that the differences in EEG when looking at one color and then another “…  appear to be lateralized, providing support for the influence of language on color  categorization, the principle of linguistic relativity, or Whorfianism (Hill and Mannheim, 1992; Liu et al., 2009; Mo et al., 2011). Indeed, language-specific terminology influences preattentive color perception. The existence in Greek of two additional color terms, distinguishing light and dark blue, leads to faster perceptual discrimination of these colors and an increased visual mismatch negativity of the visually evoked potential in native speakers of Greek, compared to native speakers of English (Thierry et al., 2009).”
    1. Here however, no evidence of lateralized categorical clustering from fMRI
  41. There is neural research on color in macaques, but there are differences in brain structure and photoreceptor sensitivities between them and us, so we need to keep that in mind when examining results from animal experiments on color
  42. “We proposed a model that explains the clustering of the neural color spaces from V4v and VO1, as well as the changes in response amplitudes (gain) and SNR observed in all visual areas. In this model, the categorical clustering observed in V4v and VO1 is attributed to a color-specific gain change, such that the gain of each neuron changes as a function of its selectivity relative to the centers of the color categories.”
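
The channel basis functions described in items 17 to 19 can be sketched in a few lines. This is only an illustration of the quoted description (six channels, each a half-wave-rectified and squared sinusoid), not the authors' code. The even spacing of channel centres and stimulus hues around the hue circle is an assumption, as are all names (`basis`, `centers`, `hues`).

```python
import math

NUM_CHANNELS = 6   # six hypothetical channels, as in the quoted model description
NUM_COLORS = 12    # twelve stimulus colors


def basis(hue_deg, center_deg):
    """Half-wave-rectified, squared sinusoid centred on a channel's preferred hue."""
    value = math.cos(math.radians(hue_deg - center_deg))
    return max(0.0, value) ** 2


# Assumption: channel centres and stimulus hues are evenly spaced on the hue circle.
centers = [k * 360 / NUM_CHANNELS for k in range(NUM_CHANNELS)]
hues = [k * 360 / NUM_COLORS for k in range(NUM_COLORS)]

# C is the n x c matrix of channel responses (n colors, c channels).
C = [[basis(h, c) for c in centers] for h in hues]

for hue, row in zip(hues, C):
    print(f"{hue:5.0f} deg:", " ".join(f"{v:.2f}" for v in row))
```

Each row is a distinct pattern across the six channels, which is the sense in which each color is "a point in the six-dimensional channel space".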

Mapping the stereotyped behaviour of freely-moving fruit flies. Berman, Choi, Bialek, Shaevitz. Journal of the Royal Society Interface 2014

  1. “A frequent assumption in behavioural science is that most of an animal’s activities can be described in terms of a small set of stereotyped motifs.”
  2. Create a system to analyze behavior and find that about half the time, behavior is based on 100 different stereotyped behavioral states
  3. Stereotypy – “that an organism’s behaviours can be decomposed into discrete, reproducible elements”
  4. Although animals can move in a really enormous space of movements, they are thought to confine their motion to a small set of stereotyped actions (which may be specific to a time range, to an individual throughout its life, or to a species)
  5. “A discrete behavioural repertoire can potentially arise via a number of mechanisms, including mechanical limits of gait control, habit formation, and selective pressure to generate robust or optimal actions.”
  6. For the most part, stereotypy hasn’t been studied experimentally, mostly because of a “lack of a comprehensive and compelling mathematical framework for behavioural analysis”
    1. They introduce a system for doing so
  7. Most previous methods for characterizing behavior fall into one of two categories:
    1. Very coarse metrics (like mean velocity, count # times a barrier is crossed).  This makes analysis and data collection easy, but only captures a tiny amount of relevant information
    2. The other approach is to log behavior in terms of a number of different possible categories.  This can be done by hand or by machine.  The problem with this is that it introduces bias (you find only the classes you defined in the first place, and it assumes a small number of discrete high-level behaviors)
  8. A better system would be one that works from the bottom up, starting directly with the data as opposed to operator-defined classes
  9. “The basis of our approach is to view behaviour as a trajectory through a high-dimensional space of postural dynamics. In this space, discrete behaviours correspond to epochs in which the trajectory exhibits pauses, corresponding to a temporally-extended bout of a particular set of motions. Epochs that pause near particular, repeatable positions represent stereotyped behaviours. Moreover, moments in time in which the trajectory is not stationary, but instead moves rapidly, correspond to non-stereotyped actions.”
  10. Specifically, based on the data found from fruit flies
    1. “These stereotyped behaviours manifest themselves as distinguishable peaks in the behavioural space and correspond to recognizably distinct behaviours such as walking, running, head grooming, wing grooming, etc.”
  11. The fly is isolated into a 200×200 window in each frame (so each image has 40k pixels), and then rotated/translated/resized to get a standard representation
  12. Nearly all variance (93%) in images can be represented by PCA down to 50D
  13. Use a spectrogram representation of postural dynamics based on Morlet continuous wavelet transform <?>
    1. “Although similar to a Fourier spectrogram, wavelets possess a multi-resolution time-frequency trade-off, allowing for a more complete description of postural dynamics occurring at several time scales”
  14. Embedding is made up of 25 frequency channels for the 50 eigenmodes, so each point in time is represented by 1,250D
    1. Because behavior is highly correlated, they speculate a much lower dimensional manifold lies inside this space that describes behavior
  15. The goal is to map the 1250D data to something much smaller where trajectories in that space pause when stereotyped behavior occurs. “This means that our embedding should minimise any local distortions.”<I don’t know why>
  16. So the approach they chose “reduces dimensionality by altering the distances between more distant points on the manifold.”
    1. PCA, MDS, and Isomap do exactly the opposite of this, prioritizing large-scale accuracy over local
  17. “t-Distributed Stochastic Neighbor Embedding (t-SNE)”, however, does satisfy their requirement
    1. “For t-SNE, the conserved invariants are related to the Markov transition probabilities if a random walk is performed on the data set.”  Transition probabilities between two time points are based on a Gaussian kernel over distance
    2. Minimizes “local distortions” <?>
  18. With t-SNE, transition probabilities in the embedded space are defined similarly to those in the original high-dimensional space, but are proportional to a Cauchy/Student-t kernel of the points’ Euclidean distances
  19. Problem with t-SNE is quadratic memory use – they use importance sampling to subsample to 35k data points and then run on that
  20. A distance function is still needed.  They use KL-divergence
  21. They are able to embed data nicely in 2D – going to 3D leads to a 2% reduction of embedding cost function
  22. Get a probability density over behavior by convolving each point in embedded map with Gaussian
    1. Space has peaks, and trajectories pause at peak locations when conducting stereotyped behavior
  23. This 2D space is then decomposed by a “watershed transform”, where points are grouped together if hill-climbing from them leads to the same (local) maximum
  24. Peaks correspond to commonly defined behaviors, but here, everything is bottom-up from the data.  Also, nearby regions encode similar but distinct behavior
  25. As expected, many behaviors (like running) are confined to an orbit in the low-dimensional space
  26. Were able to pull apart distinguishing characteristics between male and female behavior
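
Items 22 and 23 (Gaussian-smoothed density over the embedded points, then a watershed-style split) can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation: the grid size, kernel width, and the six made-up 2D points are all assumptions, and the "watershed" is implemented directly as the hill-climbing rule the notes describe.

```python
import math


def density_grid(points, size, sigma):
    """Sum of Gaussians centred on each embedded (x, y) point, on a size x size grid."""
    return [[sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                 for px, py in points)
             for x in range(size)]
            for y in range(size)]


def climb(grid, row, col):
    """Step to the highest of the 8 neighbours until no neighbour is higher."""
    size = len(grid)
    while True:
        best = (row, col)
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                r, c = row + d_row, col + d_col
                if 0 <= r < size and 0 <= c < size and grid[r][c] > grid[best[0]][best[1]]:
                    best = (r, c)
        if best == (row, col):
            return best  # local maximum reached
        row, col = best


# Hypothetical embedded points forming two clumps (illustrative values only).
points = [(3, 3), (4, 3), (3, 4), (12, 12), (13, 12), (12, 13)]
grid = density_grid(points, size=16, sigma=1.5)

# Label every grid cell by the (row, col) of the peak it climbs to.
labels = [[climb(grid, r, c) for c in range(16)] for r in range(16)]
peaks = {labels[r][c] for r in range(16) for c in range(16)}
print(sorted(peaks))
```

Cells that share a label form one region of the map, so each region corresponds to one density peak, i.e. one candidate stereotyped behaviour.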

Sharing Features among Dynamical Systems with Beta Processes. Fox, Sudderth, Jordan, Willsky. NIPS 2009

  1. Bayesian nonparametric approach for modeling time series
    1. Beta process prior “approach is based on the discovery of a set of latent dynamical behaviors that are shared among multiple time series.  The size of the set and the sharing pattern are both inferred from data.”
  2. Develop efficient MCMC method based on the Indian buffet process
  3. “We specifically focus on time series where behaviors can be individually modeled via temporally independent or linear dynamical systems, and where transitions between behaviors are approximately Markovian.”
    1. Examples are HMM, switching vector autoregressive process, linear dynamical systems
  4. “Our approach envisions a large library of behaviors, and each time series or object exhibits a subset of these behaviors.  We then seek a framework for discovering the set of dynamic behaviors that each object exhibits.”
  5. The behaviors an object can exhibit are described in a feature list; N objects with K features can be described by an N x K matrix
    1. A beta process is used to infer the number of features (Indian buffet process)
  6. “Given a feature set sampled from the IBP, our model reduces to a collection of Bayesian HMMs (or SLDS) with partially shared parameters.”
  7. Also mention:
    1. HDP-HMM: “does not select a subset of behaviors for a given time series, but assumes that all time series share the same set of behaviors and switch among them in exactly the same manner.”
    2. Infinite factorial HMM: “models a single time-series with emissions dependent on a potentially infinite dimensional feature that evolves with independent Markov dynamics.”
  8. MCMC method is “efficient and exact”
  9. <This is a little heavy for my faculties at the moment so skimming>
  10. MCMC interleaves Metropolis-Hastings with Gibbs “We leverage the fact that fixed feature assignments instantiate a set of finite AR-HMMs, for which dynamic programming can be used to efficiently compute marginal likelihoods. Our novel approach to resampling the potentially infinite set of object-specific features employs incremental “birth” and “death” proposals…”
  11. <Screenshot of a figure from the paper; image not included>
  12. Mocap experiments
  13. Discusses other methods that have been good for “describing simple human motion…”, “However, there has been little effort in jointly segmenting and identifying common dynamic behaviors amongst a set of multiple motion capture (MoCap) recordings of people performing various tasks.”  BP-AR-HMM does this
  14. Looked at 6 CMU exercise routines.  Original data was 62D; they manually selected 12 dimensions from that set and subsampled in time as well
  15. Does a pretty nice job clustering motions
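
To make the feature-matrix idea from item 5 concrete, here is a sketch of the standard Indian buffet process generative scheme, which draws a binary objects-by-features matrix with an unbounded number of columns. It is not the paper's MCMC sampler, only the textbook prior; the function names and the value of `alpha` are illustrative assumptions.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_objects, alpha, rng):
    """Return a binary matrix: rows are objects (time series), columns are features."""
    owners_per_feature = []  # for each feature, the list of objects that have it
    for i in range(1, num_objects + 1):
        # Existing features are taken with probability (number of owners so far) / i.
        for owners in owners_per_feature:
            if rng.random() < len(owners) / i:
                owners.append(i - 1)
        # Then object i introduces Poisson(alpha / i) brand-new features.
        for _ in range(sample_poisson(alpha / i, rng)):
            owners_per_feature.append([i - 1])
    return [[1 if n in owners else 0 for owners in owners_per_feature]
            for n in range(num_objects)]


rng = random.Random(0)
matrix = sample_ibp(num_objects=5, alpha=2.0, rng=rng)
for row in matrix:
    print(row)
```

In the setting of the notes, a 1 in row n, column k would mean that time series n exhibits behaviour k, and the number of columns is not fixed in advance.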

Quantifying the Internal Structure of Categories Using a Neural Typicality Measure. Davis, Poldrack. Cerebral Cortex 2014

  1. Deals with the internal structure/representation of category information
  2. <Seems like assumption is there is something of an exemplar representation>
  3. “Internal structure refers to how the natural variability between-category members is coded so that we are able to determine which members are more typical or better examples of their category. Psychological categorization models offer tools for predicting internal structure and suggest that perceptions of typicality arise from similarities between the representations of category members in a psychological space.”
  4. Based on these models, develop a “neural typicality measure” that checks if a category member has a pattern of activation similar to other members of its group, as well as what is central to a neural space.
  5. Use an artificial categorization task, find a connection between stimulus and response
    1. “find that neural typicality in occipital and temporal regions is significantly correlated with subjects’ perceptions of typicality.”
  6. “The prefrontal cortex (PFC) is thought to represent behaviorally relevant aspects of categories such as
    rules associated with category membership (…). Motor and premotor regions may represent habitual responses associated with specific categories (…). The medial temporal lobe (MTL) and subregions of the striatum are thought to bind together aspects of category representations from these other systems.”
  7. Different areas and different neurons and patterns of activation in an area can “reliably discriminate
    between many real world object categories”
  8. Consider examples of category data as having some sort of “internal structure” or feature representation specific to that class.
    1. These features can say things like how typical a concrete example is, and is related to how quickly and accurately classification occurs
  9. “Depending on the specific model, a category representation may be a set of points associated with a given category (exemplar models; …), a summary statistic ( prototype models; …), or a set of statistics (clustering models; …) computed over points associated with a category.”
  10. Items closer to other examples in the class, or to the prototype are considered to be most typical or likely
  11. But they don’t propose that an accurate model is exactly the same thing a computer does, as there are examples of where nonintuitive things happen.
    1. Ex/ culture can influence how things are categorized, as can a current task or other context
  12. “Here, our goal is to develop a method for measuring the internal structure of neural category representations and test how it relates to physical and psychological measures of internal structure.”
  13. The neural typicality measure is related to nonparametric kernel density estimators, but “A key difference between our measure and related psychological and statistical models is that instead of using psychological or
    physical exemplar representations, our measure of neural typicality is computed over neural activation patterns…”
  14. Use a well studied research paradigm of categorizing simple bird illustrations into 4 categories based on neck angle and leg length.  Previous results show people reconstruct classes based on average item for each category
  15. “Our primary hypothesis is that psychological and neural measures of internal structure will be linked, without regard to where in the brain this might occur.”
    1. Also expect that some categorization will happen in visual cortex, and higher level temporal and medial-temporal regions, which “…. are theorized to bind together features from early visual regions into flexible conjunctive category representations (…).”
    2. There are other parts relevant to categorization, but not particularly this form of visual categorization, and other parts may be sensitive to things like entropy
  16. “To foreshadow the results, we find that neural typicality significantly correlates with subjects’ perceptions of typicality in early visual regions as well as regions of the temporal and medial temporal cortex. These results suggest that neural and psychological representational spaces are linked and validate the neural typicality measure as a useful tool for uncovering the aspects of category representations coded by specific brain regions.”
  17. “For analysis of behavioral responses, response time, and typicality ratings, a distance-to-the-bound variable was constructed that gave each stimulus’ overall distance from the boundaries that separate the categories in the stimulus space. Distance-to-the-bound is a useful measure of idealization: items that are distant from the bound are more idealized than items close to the bound (…).”
  18. “For the psychological typicality measure, a value for each of the Test Phase stimuli was generated by interpolating, on an individual subjects basis, a predicted typicality rating from the subjects’ observed typicality ratings…”
  19. Also did a physical typicality measure, which is pretty simple to understand (just neck angle, leg length measurements)
  20. Then a neural typicality measure <too many details to list here>
    1. “Our neural typicality measure is based on similarities between multivariate patterns of activation elicited for
      stimuli in the task. Stimuli that elicit activation patterns that are like other members of their category are more neurally typical than those that elicit dissimilar patterns of activation.”
  21. Subjects’ behavioral responses were predicted by SVM
  22. Typicality ratings were highly correlated with distance-to-the-bound
    1. Reveals that the most typical items, and not the average item, are the ones used for category representation.  A few other results show this through other methodologies
  23. Neural typicality is linked to psychological typicality
  24. Found activity in visual cortex and MTL that have been found to be linked to categorization
  25. “These results suggest that, in the present task, the internal structure of neural category representations in temporal and occipital regions are linked to subjects’ psychological category representations such that objects that are idealized or physical caricatures of their category elicit patterns of activation that are most (mathematically) similar to other members of their category.”
  26. “… in the present task, physical similarity is not a significant contributor to the internal structure of neural category representations, at least not at a level that is amenable to detection using fMRI.”
  27. Also did MDS for classification on the neural data, <results don’t seem amazing, but only ok>
  28. SVM for classification: “The SVMs are given no information about the underlying stimulus space, and unlike the MDS analysis, do not make any assumptions about how the dimensions that separate the categories will be organized. Thus, the SVMs can be sensitive to regions that code rule-based or behavioral differences between categories, regions that encode information about their perceptual differences, or regions that code some combination of behavioral and perceptual information.”
  29. “Although there is strong overlap in the visual and MTL regions that discriminate between categories and represent
    internal structure, the motor/premotor, insula, and frontal regions were only identified in the between-category analysis. These results are consistent with the hypothesis that PFC and motor/premotor regions are more sensitive to behavioral aspects of categories (…). However, because behavioral responses are strongly associated with the perceptual characteristics of each category, the SVM results are also consistent with the hypothesis that these regions contain some perceptual information about the categories.”
  30. “The present research adds to the growing consensus that categorization depends on interactions between a number of
    different brain regions… An important point that this observation highlights is that there may not be any brain region that can be thought of representing all aspects of categories, and thus it might be most accurate to think of brain regions in terms of the aspects of category representations that they code.”
  31. “…in the present context, the deactivation of regions of the striatum with increasing typicality likely indicates an uncertainty signal, as opposed to category representation…”
  32. “Because our neural typicality measure is not based on mean activation-level differences between stimuli, it may be
    more directly interpretable and less susceptible to adjacency effects in studies of longer term internal category structure.”

    1. <Hm, should read their methodology more carefully on another read-through>
  33. They don’t have results that indicate suppression of adjacent stimulus
  34. Says their methodology should be tested in real-world, and more artificial settings
  35. Evidence of “dimensional selective attention”, where not all features are attended to for classification
    1. “Attentional mechanisms in the PFC that instantiate rule-based strategies (…) may contribute to selective attention effects by influencing neural representations in a top-down manner.”
    2. Although: “In the present context, dimensional selective attention is insufficient for explaining the idealization effect because dimensional selective attention affects an entire dimension uniformly… additional mechanisms are required.”
  36. “Attention has been found to create a spotlight around salient regions of visual space such that the processing of stimuli
    close to this location in space is enhanced (not just differences along a specific dimension of visual space; …). It is conceptually straightforward to predict that the same or similar spotlight mechanisms may affect the topography of stored neural stimulus representations, such that regions of a category space that contain highly idealized category members are enhanced and contribute more to categorization and typicality judgments than exemplars in ambiguous regions of category space.”
  37. Another model is one that specifically tries “… to reduce prediction error and confusion between categories (…). In these models, category members are simultaneously pulled toward representations/members of their own categories and repelled by members of opposing categories.”
    1. But this doesn’t seem to be a possible explanation here because “… the neural effects as actual neuronal changes in regions of early visual cortex happen on a much longer scale than our task.”
  38. This study only tried to find correlation between “psychological” and “neurological” responses, but more in-depth exploration of their relationship is a good idea and left for future work
  39. “Our task involves learning to distinguish multiple categories, akin to A/B tasks, and so our finding that early visual cortex is involved with representing category structure may be at odds with theories emphasizing the role of task demands (as opposed to featural qualities) in determining which perceptual regions will be recruited to represent categories.”
    1. Although these distinctions may be an artifact of the type of analysis used
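
The idea behind the neural typicality measure (items 4, 13 and 20) can be sketched as a kernel-density-style score: an item is more typical when its activation pattern is similar to the patterns of other members of its category. The notes do not record the paper's exact similarity function, so the exponential kernel on Euclidean distance, the `scale` parameter, and the toy two-dimensional "activation patterns" below are all assumptions for illustration.

```python
import math


def neural_typicality(patterns, labels, scale=1.0):
    """Score each item by its summed similarity to the other items in its category."""
    scores = []
    for i, (pattern, label) in enumerate(zip(patterns, labels)):
        total = 0.0
        for j, (other, other_label) in enumerate(zip(patterns, labels)):
            if i != j and other_label == label:
                # Assumed similarity: exponential decay with Euclidean distance.
                total += math.exp(-math.dist(pattern, other) / scale)
        scores.append(total)
    return scores


# Toy activation patterns (made up): category A has three similar items and one outlier.
patterns = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0), (5.0, 5.0), (5.2, 5.0)]
labels = ["A", "A", "A", "A", "B", "B"]

scores = neural_typicality(patterns, labels)
for label, pattern, score in zip(labels, patterns, scores):
    print(label, pattern, round(score, 3))
```

The outlying member of category A receives a much lower score than the three clustered members, which is the behaviour the measure is meant to capture.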

A Local Algorithm for Finding Well-Connected Clusters. Lattanzi, Mirrokni, Zhu. ICML 2013 Talk.


  1. A good cluster should have few edges going out relative to the size of the cluster
  2. This is called the conductance (clustering by cutting according to that criterion)
  3. Google research, so google scale, doesn’t want even linear data cost
  4. Algorithm is initialized on some vertex, moves around locally, and then finally produces some set S, hopefully with a low conductance
  5. Has theoretical bounds; can’t do better than that due to Cheeger’s inequality
  6. The algorithms have runtime linear in the size of the cluster grown from the starting point
  7. Algorithm is quite simple
  8. Proposes a different metric than standard conductance: connectivity
  9. Want small conductance and high connectivity (imposed by a relative combination); want better connection inside than outside
  10. This metric improves the theoretical bounds, beating other random-walk-based spectral algorithms
  11. Use of this extra data allows for tighter bounds
  12. <Proofs>
  13. They mention follow-up work that is local but big-O optimal, based on flow analysis
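
Conductance, from items 1 and 2, is easy to compute for a given vertex set. The sketch below uses the usual definition (edges leaving the set divided by the smaller of the two volumes, where volume is the sum of degrees). It only evaluates a candidate set and is not the local algorithm from the talk; the toy graph and the function names are illustrative assumptions.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def conductance(adjacency, subset):
    """Edges leaving `subset` divided by the smaller of the two sides' volumes."""
    subset = set(subset)
    cut = sum(1 for u in subset for v in adjacency[u] if v not in subset)
    volume_in = sum(len(adjacency[u]) for u in subset)
    volume_out = sum(len(adjacency[u]) for u in adjacency if u not in subset)
    return cut / min(volume_in, volume_out)


# Toy graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
graph = build_adjacency(edges)

good_cluster = conductance(graph, {0, 1, 2})  # one triangle: 1 cut edge, volume 7
bad_cluster = conductance(graph, {2, 3})      # straddles the bridge: 4 cut edges, volume 6
print(good_cluster, bad_cluster)
```

The triangle has the lower conductance, matching the intuition that a good cluster has few outgoing edges relative to its size.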