I picked up this paper because I saw a reference claiming it shows that SFA is equivalent to PCA of a low-pass filtered signal.

- Explores how SFA can be implemented “…within the limits of biologically realistic spike-based learning rules.”
- Shows a few ways SFA could be implemented with different neural models
- Fastness is measured by the average variance of the time derivative of the output features
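A minimal sketch of that slowness measure (my own toy signals, not from the paper): the average squared time derivative, which is smaller for slowly varying outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 1000)
slow_signal = np.sin(t)        # slowly varying
fast_signal = np.sin(40 * t)   # rapidly varying

def slowness(y, dt):
    """Average variance of the time derivative (smaller = slower)."""
    ydot = np.diff(y) / dt
    return np.mean(ydot ** 2)

dt = t[1] - t[0]
assert slowness(slow_signal, dt) < slowness(fast_signal, dt)
```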
- The part of the algorithm that is least plausible from the perspective of a biological system is the eigen decomposition. The main aim of this paper is to show how this could be implemented “… in a spiking model neuron”
- “In the following, we will first consider a continuous model neuron and demonstrate that a modified Hebbian learning rule enables the neuron to learn the slowest (in the sense of SFA) linear combination of its inputs. Apart from providing the basis for the analysis of the spiking model, this section reveals a mathematical link between SFA and the trace learning rule, another implementation of the slowness principle. We then examine if these findings also hold for a spiking model neuron, and find that for a linear Poisson neuron, spike-timing-dependent plasticity (STDP) can be interpreted as an implementation of the slowness principle.”
- “Even though there are neurons with transient responses to changes in the input, we believe it would be more plausible if we could derive an SFA-learning rule that does not depend on the time derivative, because it might be difficult to extract, especially for spiking neurons.”
- The time derivative can be replaced by a low-pass filter (a fair amount of math to show that)
- <But earlier in the paper they wrote> “It is important to note that the function *g*_{1}(*x*) is required to be an instantaneous function of the input signal. Otherwise, slow output signals could be generated by low-pass filtering the input signal. As the goal of the slowness principle is to detect slowly varying features of the *input* signals, a mere low-pass filter would certainly generate slow output signals, but it would not serve the purpose.” <So then what is the difference between this and the low-pass filter they just mentioned?>
- <After all the math> “Thus, SFA can be achieved either by minimizing the variance of the time derivative of the output signal or by maximizing the variance of the appropriately filtered output signal.” <Oh, I see. You can’t just filter the output; you have to set up the system so it maximizes the variance of the filtered output.>

- Basically, from a whitened input you can either use the time derivative and then choose the direction of minimal variance, *or* use a low-pass filter and then choose the direction of maximal variance
- “… standard Hebbian learning under the constraint of a unit weight vector applied to a linear unit maximizes the variance of the output signal. We have seen in the previous section that SFA can be reformulated as a maximization problem for the variance of the low-pass filtered output signal. To achieve this, we simply apply Hebbian learning to the filtered input and output signals, instead of the original signals.” <The goal of the analysis of the Hebbian rule is to find something more biologically plausible>
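A numerical sketch of that equivalence (my own construction, using a simple exponential filter rather than the paper's specific one): on whitened data, the direction minimizing the variance of the time derivative and the direction maximizing the variance of the low-pass filtered signal should roughly coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
t = np.linspace(0, 40 * np.pi, n)
s_slow = np.sin(t)                    # slow source
s_fast = rng.standard_normal(n)       # fast source (white noise)
X = np.column_stack([s_slow, s_fast]) @ np.array([[1.0, 0.5], [0.3, 1.0]])

# Whiten: zero mean, identity covariance
X -= X.mean(axis=0)
evals, evecs = np.linalg.eigh(X.T @ X / n)
Z = X @ evecs @ np.diag(evals ** -0.5) @ evecs.T

# (a) minimize the variance of the time derivative
dZ = np.diff(Z, axis=0)
w_deriv = np.linalg.eigh(dZ.T @ dZ)[1][:, 0]    # smallest eigenvalue

# (b) maximize the variance of a low-pass filtered signal
alpha = 0.05
F = np.empty_like(Z)
F[0] = Z[0]
for i in range(1, n):                            # exponential moving average
    F[i] = (1 - alpha) * F[i - 1] + alpha * Z[i]
w_filt = np.linalg.eigh(F.T @ F)[1][:, -1]      # largest eigenvalue

# Both criteria pick (approximately) the same direction
assert abs(w_deriv @ w_filt) > 0.9
```

The alignment is approximate here because a generic exponential filter is not the exact filter the paper derives, but the slow direction dominates at low frequencies either way.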
- “Thus, the filtered Hebbian plasticity rule … optimizes slowness … under the constraint of unit variance… ” The requirement that the data already has unit variance “… underlines the necessity for a clear distinction between processing <cleaning up the data?> and learning.” <They talk more about processing vs learning but it’s not clear to me what they mean, which is unfortunate because they say the distinction is even more important when moving to the Poisson model neuron>
- SFA is a quadratic approximation of the trace rule (the difference comes from the different power spectra of the low-pass filters the two use) <I don’t know what anything on this line means.>. In the case where the power spectrum is parabolic (most input power is concentrated at lower frequencies) <Is that what that would mean?>, the results of both will be similar.
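My attempt at making sense of that line (notation is mine, not the paper's): the trace rule's exponential low-pass filter has a Lorentzian power spectrum, 1 / (1 + (τω)²), whose second-order Taylor expansion, 1 − (τω)², is an upside-down parabola like the effective SFA filter. At low frequencies the two agree, which would explain why the results are similar when most input power is at low frequencies.

```python
import numpy as np

tau = 1.0
w = np.linspace(0, 0.3, 100)                 # low-frequency range
lorentzian = 1 / (1 + (tau * w) ** 2)        # trace-rule filter spectrum
parabola = 1 - (tau * w) ** 2                # quadratic approximation

# The two spectra are nearly identical at low frequencies
assert np.max(np.abs(lorentzian - parabola)) < 1e-2
```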
- In SFA the outputs of each step are real-valued, but a neuron carries only spiking information. Here a neuron is modeled by “inhomogeneous Poisson processes” – they consider only the information contained in the spike rate, and ignore information held in the exact spike timing (they describe this model mathematically)
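A minimal sketch of such a rate-based Poisson spike model (my simplification, using a Bernoulli-per-bin approximation): spikes are drawn from a time-varying rate, and averaging over an ensemble of trials recovers the underlying rate.

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 1e-3                                   # 1 ms bins
t = np.arange(0, 2.0, dt)
rate = 20 + 15 * np.sin(2 * np.pi * t)      # Hz, always positive

# One trial: P(spike in bin) ~= rate * dt (inhomogeneous Poisson approximation)
spikes = rng.random(t.size) < rate * dt

# Ensemble average over many trials recovers the underlying rate
n_trials = 2000
trials = rng.random((n_trials, t.size)) < rate * dt
est_rate = trials.mean(axis=0) / dt
assert abs(est_rate.mean() - rate.mean()) < 1.0
```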
- “…in an ensemble-averaged sense it is possible to generate the same weight distribution as in the continuous model by means of an STDP rule with a specific learning window.”
- <I guess the model is optimal for learning slowness (mostly skipped that section) because it then follows up saying that they have yet to discuss why it is optimal>
- <Skipping neural modelling almost entirely>
- “In the first part of the paper, we show that for linear continuous model neurons, the slowest direction in the input signal can be learned by means of Hebbian learning on low-pass filtered versions of the input and output signal. The power spectrum of the low-pass filter required for implementing SFA can be derived from the learning objective and has the shape of an upside-down parabola.”
- <Immediately following> “The idea of using low-pass filtered signals for invariance learning is a feature that our model has in common with several others […]. By means of the continuous model neuron, we have discussed the relation of our model to these ‘trace rules’ and have shown that they bear strong similarities.”
- In the implementation of SFA on a Poisson neuron, “The learning window that realizes SFA can be calculated analytically.”
- “Interestingly, physiologically plausible parameters lead to a learning window whose shape and width is in agreement with experimental findings. Based on this result, we propose a new functional interpretation of the STDP learning window as an implementation of the slowness principle that compensates for neuronal low-pass filters such as EPSP.”

- “Of course, the model presented here is not a complete implementation of SFA, the extraction of the most slowly varying direction from a set of whitened input signals. To implement the full algorithm, additional steps are necessary: a nonlinear expansion of the input space, the whitening of the expanded input signals, and a means of normalizing the weights… On the network level, however, whitening could be achieved by adaptive recurrent inhibition between the neurons […]. This mechanism may also be suitable for extracting several slow uncorrelated signals as required in the original formulation of SFA […] instead of just one.”
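A sketch of the preprocessing steps they list for the full algorithm (my own minimal version, not theirs): a quadratic expansion of a 2-D input followed by whitening.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5000, 2))

# Nonlinear (quadratic) expansion: [x1, x2, x1^2, x1*x2, x2^2]
E = np.column_stack([X[:, 0], X[:, 1],
                     X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2])

# Whitening: zero mean, identity covariance
E -= E.mean(axis=0)
evals, evecs = np.linalg.eigh(E.T @ E / E.shape[0])
Z = E @ evecs @ np.diag(evals ** -0.5)

assert np.allclose(Z.T @ Z / Z.shape[0], np.eye(5), atol=1e-6)
```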
- For weight normalization “A possible biological mechanism is synaptic scaling […], which is believed to multiplicatively rescale all synaptic weights according to postsynaptic activity… Thus it appears that most of the mechanisms necessary for an implementation of the full SFA algorithm are available, but that it is not yet clear how to combine them in a biologically plausible way.”
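A toy illustration of the multiplicative rescaling role they suggest synaptic scaling could play (the setup is mine): a plain Hebbian step grows the weight norm, and a single multiplicative factor restores it.

```python
import numpy as np

rng = np.random.default_rng(4)
w = rng.standard_normal(10)
w /= np.linalg.norm(w)                   # start with unit weight vector

x = rng.standard_normal(10)              # presynaptic activity
y = w @ x                                # postsynaptic activity
w += 0.01 * y * x                        # Hebbian step: ||w|| drifts upward
w /= np.linalg.norm(w)                   # multiplicative rescaling of all weights

assert np.isclose(np.linalg.norm(w), 1.0)
```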
- <Immediately following> “Another critical point in the analytical derivation for the spiking model is the replacement of the temporal by the ensemble average, as this allows recovery of the rates that underlie the Poisson process.” <The data should be ergodic>
- Not yet clear if these results can be reproduced with more realistic model neurons.
- “In summary, the analytical considerations presented here show that (i) slowness can be equivalently achieved by minimizing the variance of the time derivative signal or by maximizing the variance of the low-pass filtered signal, the latter of which can be achieved by standard Hebbian learning on the low-pass filtered input and the output signals; (ii) the difference between SFA and the trace learning rule lies in the exact shape of the effective low-pass filter–for most practical purposes the results are probably equivalent; (iii) for a spiking Poisson model neuron with an STDP learning rule, it is not the learning window that governs the weight dynamics but the convolution of the learning window with EPSP; (iv) the STDP learning window that implements the slowness objective is in good agreement with learning windows found experimentally. With these results, we have reduced the gap between slowness as an abstract learning principle and biologically plausible STDP learning rules, and we offer a completely new interpretation of the standard STDP learning window.”