- Considers ways to work with high dimensional data (and dimension reduction) with a particular focus on visual processing
- “… natural images are often sparse in the sense that if you view them in the wavelet domain (roughly as a superposition of the edges), only a very small number of K wavelet coefficients will have significant power, where K can be on the order of 20,000 for a 1-million-pixel image.”
- “… similarly, neuronal activity patterns that actually occur are often a highly restricted subset of all possible patterns (…) in the sense that they often lie along a low K-dimensional firing-rate space…”
- Here they consider random projections to do dimensionality reduction
- “…compressed sensing… shows that the shadow can contain enough information to reconstruct the original image … as long as the original image is sparse enough.
- L1 minimization
- Random projections can be achieved with neurons
- “… CS [compressed sensing] and RPs [random projections] can provide a theoretical framework for understanding one of the most salient aspects of neuronal information processing: radical changes in the dimensionality, and sometimes sparsity, of neuronal representations, often within a single stage of synaptic transformation.”
- CS also deals with modeling high dimensional data, which is also something the brain has to deal with
- “CS provides mathematical guarantees that one can achieve perfect recovery with a number of measurements M that is only slightly larger than K [the underlying basis dimension basically], as long as the M measurement vectors are sufficiently incoherent with respect to the sparsity domain (…). [next paragraph] An important observation is that any set of measurement vectors, which are themselves random, will be incoherent with respect to any fixes sparsity domain.”
- Only M > O(K log(N/K)) measurements are needed to guarantee perfect reconstruction with high probability
- “… no measurement matrices and no reconstruction algorithm can yield sparse signal recovery with substantially fewer measurements [than is required with random projection]”
- L1 regularization solves a slightly different problem than what one would usually care for, but the original problem is intractable whereas L1 regularization is ok, and it gets very close to optimal results
- “… any projection that preserves the geometry of all K-sparse vectors allows one to reconstruct these vectors from the low-dimensional projection efficiently and robustly using L1 minimization”
- “The celebrated Johnson-Lindenstrauss (JL) lemma (…) provides a striking answer [to how small we can the dimension until the distances in the projection become significantly distorted from that in the original high-dimensional representation]… It states that RPs with M > O(log P) will yield, with high probability, only a small distortion in distance between all pairs of points in the cloud. Thus the number of projected dimensions M needs only be logarithmic in the number of points P independent of the embedding dimension of the source data, N.”
- For “…data distributed along a nonlinear K-dimensional manifold embedded in N-dimensional space… M > O(K log NC) RPs preserve the geometry of the manifold with small distortion.” Where C represents curvature of the manifold
- Results mean that few RPs can be used to represent data accurately
- When noise exists, use LASSO
- “The main outcome is roughly that for an appropriate choice of λ, which depends on the signal-to-noise ratio (SNR), the same conditions that guaranteed exact recovery of K-sparse signals by L1 minimization in the absence of noise also ensure good performance of the LASSO for approximately sparse signals in the presence of noise.”
- Can be used for dictionary learning
- <discussion of a number of particular topics, such as MRI, gene expression analysis (and others), skipping>
- In the brain “information stored in a large number of neurons is often compressed into a small number of axons, or neurons in a downstream system. For example, 1 million optic nerve fibers carry information about the activity of 100 times as many photoreceptors.”
- The JL Lemma shows that you only need a logarithmic number of projections in the number of classes you care about to maintain accuracy. In the brain, “… 20,000 images can be represented by the corresponding population activity in the IT cortex. The the similarity structure between all pairs of images can be preserved to 10% precision in a downstream area using only ~1000 neurons. Furthermore, this result can be achieved by a very simple dimensionality-reduction scheme, namely by a random synaptic connectivity matrix.”
- In the case where the points are continuous and not discrete, significant compression is possible when the points lie on a manifold. In this case only a logarithmic number of neurons (in the number of neurons in the source area) are required.
- “The ubiquity of this low-dimensional structure in neuronal systems may be intimately related to the requirement of communication and computation through widespread anatomical bottlenecks.”
- “Another bottleneck is posed by the task of working memory, where streams of sensory inputs must presumably be stored within the dynamic reverberations of neuronal circuits. This is a bottleneck from time into space.” Temporally extended input streams must be represented in a finite number of neurons. Recurrent networks allow for this sort of representation.
- Other work shows “…a connection between CS and short-term memory by showing that recurrent neuronal networks can essentially perform online, dynamical compressed sensing of an incoming sparse sequence, yielding sequence memory traces that are longer than the number of neurons, again in units of the intrinsic time constant.”
- In many cases, the goal is not to compress the signal, but instead to project it to a higher dimensional space. “For example, information in 1 million optic nerve fibers is expanded into more than 100 million primary visual cortical neurons. Also in the cerebellum, a small number of mossy fibers target a large number of granule cells, creating a 100-fold expansion.”
- Can design ANNs that do either random dimension reduction or expansion, both require just 2-layers
- In general, you need more data than neurons in order to train a system. If the data lies on a lower dimensional manifold you can get away with less
- For classification “A remarkable theoretical result is that if synapses are constrained to be either excitatory or inhibitory, then near capacity, the optimal solution is sparse, with most of the synapses silent (…) even if the input patterns themselves show no obvious sparse structure This result has been proposed as a functional explanation for the abundance of silent synapses in the cerebellum and other brain areas.”
- If weights are unconstrained, optimal solutions are sparse, but not in the basis of neurons. Instead it can be expressed as a linear combination of support vectors
- Because RPs basically maintain Euclidian distances they also maintain margins in the original space
- <May not have finished this>