## What is the Relation Between Slow Feature Analysis and Independent Component Analysis? Blaschke, Berkes, Wiskott. Neural Computation 2006.

1. In the case of “one time delay” <not sure what that means yet>, linear SFA and “second-order independent component analysis” <also not sure what second-order means> are shown to be equivalent
2. Also considers other time delays and extensions
3. “ICA finds a representation of the data such that signal components are mutually statistically independent, which can be used to separate the two speakers [two people talking in the same recording; how do you separate them?] in the example above.”
4. “SFA is constrained to nonwhite signals with a temporal structure (e.g., speech signals) and it is based on second-order statistics. We therefore compare it to ICA algorithms that only use second-order information and need a temporally structured signal as well.”
5. SFA usually uses nonlinear basis functions, whereas ICA must be linear, since nonlinear ICA is underdetermined
1. Because of this limitation of ICA, the paper considers SFA restricted to linear basis functions
6. The features in the source signal are statistically independent (didn’t know this requirement existed)
7. Assumes that data is “whitened” – transformed so that it has zero mean and identity covariance (each component has unit variance and the components are mutually decorrelated)
8. “It can be shown that after the whitening step an orthogonal transformation Q on y is sufficient to yield independent components…”
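A minimal numpy sketch of this whitening step (the toy data and variable names are my own, not the paper's): after whitening, the covariance of y is the identity, and any further orthogonal Q keeps it that way, which is why the remaining search space is just rotations.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1000))     # rows = components, cols = time
x[1] += 0.8 * x[0]                     # make the components correlated

x = x - x.mean(axis=1, keepdims=True)  # zero mean
cov = x @ x.T / x.shape[1]
eigvals, eigvecs = np.linalg.eigh(cov)
W = np.diag(eigvals ** -0.5) @ eigvecs.T  # whitening matrix W
y = W @ x                                 # whitened: cov(y) = identity

# Any orthogonal Q leaves y white: cov(Qy) = Q cov(y) Q^T = Q Q^T = I
```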

9. <Ah> In ICA, the general idea is that the signals in the data are independent. There are a few ways independence can be measured (such as equality of the joint distribution and the product of the marginals); here K-L divergence is used (which they call higher-order ICA). If, however, one signal is white noise and the other is simply the first signal shifted in time, then within a single time step the data will look completely independent (no information shared between them). On the other hand, if the analysis is done across time steps, it is clear that one signal contains all the information needed to describe the other (just with a temporal delay). “This dependence across time can be taken into account using a different measure where two signals are considered statistically independent if all time-delayed correlations are zero (second-order ICA)”
1. <Ok, a little confusing – “second-order” refers to using only second-order statistics (correlations, including time-delayed ones), while the K-L divergence criterion is sensitive to higher-order statistics of the distributions, hence “higher-order” ICA>
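The white-noise example from point 9 is easy to reproduce numerically (signal length and seed below are arbitrary assumptions): the two signals are essentially uncorrelated within a time step, but perfectly correlated at a delay of one.

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.standard_normal(100_001)
s1 = noise[1:]          # white noise
s2 = noise[:-1]         # the same noise, delayed by one step

# Within a time step: essentially uncorrelated (white noise carries
# no information about its own past or future).
instantaneous = np.corrcoef(s1, s2)[0, 1]

# Across one time step: s1 shifted back by one step IS s2, so the
# time-delayed correlation is ≈ 1.
delayed = np.corrcoef(s1[:-1], s2[1:])[0, 1]
print(f"instantaneous: {instantaneous:.3f}, delayed: {delayed:.3f}")
```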
10. Sources must have a time structure (this makes sense because otherwise there would be nothing to pick up)
11. First focus is on the case where there is a single time delay <not clear if it is assumed to be known a-priori or not>
12. Given a whitened input signal, y(t) = Wx(t) for whitening matrix W and input vector x, SFA finds a rotation matrix Q such that the components u_i of u(t) = Qy(t) vary as slowly as possible in time and are ordered by decreasing slowness (slowest at the beginning, fastest at the end)
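Point 12 can be sketched directly: on decorrelated data, the rotation Q comes from an eigendecomposition of the covariance of the finite-difference “time derivative”, with eigenvectors ordered by ascending eigenvalue (slowest first). The sine sources and rotation angle in this toy example are my own assumptions, not the paper's:

```python
import numpy as np

def linear_sfa_rotation(y):
    """Rotation Q for linear SFA on a decorrelated signal y
    (rows = components, columns = time steps)."""
    dy = np.diff(y, axis=1)                   # discrete time derivative
    c_dot = dy @ dy.T / dy.shape[1]           # derivative covariance
    eigvals, eigvecs = np.linalg.eigh(c_dot)  # ascending eigenvalues
    return eigvecs.T                          # row 0 = slowest direction

# Toy sources: one slow and one fast sine, mixed by a rotation.
t = np.linspace(0, 2 * np.pi, 2000)
s = np.vstack([np.sin(t), np.sin(15 * t)])
a = 0.7
R = np.array([[np.cos(a), -np.sin(a)],
              [np.sin(a),  np.cos(a)]])
y = R @ s                                     # mixed, still decorrelated
u = linear_sfa_rotation(y) @ y                # u[0] ≈ ±sin(t), the slow source
```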
13. In the setup used, rewriting the equations that derive SFA in a different form produces the equations for ICA
14. “… we arrive at the important result that linear SFA is formally equivalent to second-order ICA with time delay one. [their emphasis]”
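The equivalence can be checked numerically: for whitened y, the covariance of the one-step difference satisfies ⟨ẏẏᵀ⟩ ≈ 2I − (C₁ + C₁ᵀ), where C₁ is the delay-one covariance, so minimizing slowness and diagonalizing the symmetrized delay-one correlations pick out the same rotation. A sketch under my own toy-data assumptions (the identity holds up to finite-sample edge effects):

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy temporally structured signal: a random walk, then whitened.
x = np.cumsum(rng.standard_normal((2, 5000)), axis=1)
x -= x.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(x @ x.T / x.shape[1])
y = np.diag(eigvals ** -0.5) @ eigvecs.T @ x     # whitened

T = y.shape[1]
C1 = y[:, 1:] @ y[:, :-1].T / (T - 1)            # delay-one covariance
dy = np.diff(y, axis=1)
C_dot = dy @ dy.T / (T - 1)                      # derivative covariance

# <dy dy^T> ≈ 2 I - (C1 + C1^T): same eigenvectors, mirrored ordering.
residual = C_dot - (2 * np.eye(2) - (C1 + C1.T))
print(np.abs(residual).max())                    # small (edge effects only)
```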
15. Then an extension is included so ICA can operate over multiple time delays (as opposed to a single preselected one), which they call joint-diagonalization ICA
1. “It is intuitively clear that by enlarging the window length the unmixing performance should improve until the width of the autocorrelation function is reached. Exceeding this limit would introduce matrices consisting entirely of zero-mean noise, which would degrade the unmixing performance.”
16. They then add the same extension to SFA
1. “Thus, if the auto-correlations of a component have different signs for different time-delays, the objective function appears to be inconsistent, at least for that component. This conflict cannot be solved as easily… We believe that by weighting the first auto-correlation stronger than the others, e.g. with an exponential decay of the weights, the inconsistencies can be largely avoided.”

17. An alternative to examining a number of time delays individually is to take a (weighted) average over a range of them, which is simpler
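A hedged sketch of that averaging idea (the delays, exponential weights, and test signals here are my assumptions, not the paper's choices): average the symmetrized delayed covariances with decaying weights, then take a single eigendecomposition.

```python
import numpy as np

def averaged_delay_rotation(y, max_delay=10, decay=0.5):
    """Rotation from an exponentially weighted average of symmetrized
    time-delayed covariances of a decorrelated signal y."""
    n, T = y.shape
    C = np.zeros((n, n))
    for tau in range(1, max_delay + 1):
        Ct = y[:, tau:] @ y[:, :-tau].T / (T - tau)
        C += decay ** (tau - 1) * (Ct + Ct.T) / 2
    eigvals, eigvecs = np.linalg.eigh(C)      # ascending order
    return eigvecs[:, ::-1].T                 # most autocorrelated (slowest) first

# Same kind of toy mixture: slow sine + fast sine under a rotation.
t = np.linspace(0, 4 * np.pi, 2000)
s = np.vstack([np.sin(t), np.sin(12 * t)])
a = 0.7
R = np.array([[np.cos(a), -np.sin(a)],
              [np.sin(a),  np.cos(a)]])
u = averaged_delay_rotation(R @ s) @ (R @ s)  # u[0] ≈ ±sin(t)
```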
18. <Conclusion>
19. “This [equivalence of SFA and ICA] is surprising, because SFA and ICA are based on two very different principles: slowness vs. statistical independence. These principles might seem to contradict each other, because two analog signals of finite length would typically become more statistically dependent if they are more slowly varying.”
20. In the non-linear case, equivalence between the two goes away, as the objectives change
21. Also, there are known differences between higher-order ICA (which has been shown to recreate Gabor-like filters, a model of receptive fields in V1) and SFA/second-order ICA, which produce spatial low-pass filters
1. “This suggests that the solutions found by second-order ICA and higher-order ICA can be very different in practice even though both methods try to maximize statistical independence.”
22. “… when taking several time delays into account, the conceptual differences between ICA and SFA become relevant.”
23. “… in the nonlinear case the conceptual differences between ICA and SFA also matter.”