## Understanding Slow Feature Analysis: A Mathematical Framework. Sprekeler, Wiskott.

1. “Here, we present a mathematical analysis of slow feature analysis for the case where the input-output functions are not restricted in complexity. We show that the optimal functions obey a partial differential eigenvalue problem of a type that is common in theoretical physics.”
1. So the paper assumes a theoretically interesting but practically unachievable setting: an unrestricted function space, unlike what is considered in practice, where the function space is finite-dimensional
2. Looking for invariant representations
3. SFA looks for temporal correlations in data
4. There are some other approaches aside from SFA that address this general question
5. “…it has turned out that SFA is closely related to independent component analysis techniques that rely on second order statistics (…). SFA or variations thereof can therefore also be used for problems of blind source separation (…).”
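Since SFA (in the linear case) reduces to second-order statistics, it can be sketched in a few lines: whiten the inputs, then take the direction along which the time derivative has minimal variance. This is a minimal illustrative sketch with made-up toy signals, not the paper's general (unrestricted) setting:

```python
import numpy as np

# Minimal sketch of *linear* SFA via second-order statistics (toy example):
# whiten the input, then find the direction of minimal temporal variation,
# i.e. the smallest eigenvector of the covariance of the time derivative.

t = np.linspace(0, 2 * np.pi, 2000)
slow = np.sin(t)                      # slowly varying source
fast = np.sin(25 * t)                 # quickly varying source
X = np.column_stack([slow + 0.5 * fast, fast - 0.3 * slow])  # linear mixture

X = X - X.mean(axis=0)                # zero mean
cov = X.T @ X / len(X)
d, E = np.linalg.eigh(cov)
Z = X @ (E / np.sqrt(d))              # whitened: cov(Z) = identity

dZ = np.diff(Z, axis=0)               # discrete time derivative
dcov = dZ.T @ dZ / len(dZ)
vals, vecs = np.linalg.eigh(dcov)     # eigenvalues in ascending order
y = Z @ vecs[:, 0]                    # slowest feature

# The extracted slowest feature should match the slow source (up to sign)
corr = abs(np.corrcoef(y, slow)[0, 1])
print(corr > 0.9)                     # True
```

This also hints at the connection to second-order ICA/BSS mentioned in the quote: the mixed sources are recovered from decorrelation plus temporal structure alone.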
6. In the infinite-dimension function space assumption, “The key results are that the output signals extracted by SFA are independent of the representation of the input signals and that the optimal functions for SFA are the solutions of a partial differential eigenvalue problem.”
7. “The optimization problem with a restricted function space is generally not invariant with respect to coordinate changes of the input signals.”
8. The paper also assumes the full conditional probabilities P(s’|s) are known, which gets into integrals, so it’s another <less strong> reason why this paper is of mathematical as opposed to practical interest
9. The original optimization problem is cast as another one, and a differential equation is presented for that second problem
10. <Skipping the derivations>
11. “The unit variance constraint requires that the square sum of the coefficients w_i is unity [1]”
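A quick numerical check of why that constraint takes this form: if the expansion components are whitened (zero mean, unit variance, decorrelated), then Var(Σ_i w_i z_i) = Σ_i w_i², so unit output variance is equivalent to the coefficient vector having unit square sum. A minimal sketch with synthetic whitened data:

```python
import numpy as np

# If the components z_i are whitened, Var(sum_i w_i z_i) = sum_i w_i^2,
# so the unit variance constraint becomes ||w||^2 = 1.
rng = np.random.default_rng(1)
Z = rng.standard_normal((100_000, 5))  # approximately whitened components
w = rng.standard_normal(5)
w /= np.linalg.norm(w)                 # enforce square sum of coefficients = 1

y = Z @ w
print(abs(y.var() - 1.0) < 0.05)       # True: output variance is ~1
```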
12. <Blind source separation is also discussed a bit, but I’m not paying much attention to that either. The point is that SFA, in the case considered, can be used for that as well>
1. “The main result of the above theorem is that in the case of statistically independent sources, the output signals are products of harmonics of the sources.”
2. First harmonic is monotonic
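To make the harmonics concrete: for a uniformly distributed source on [0, 1], a standard choice of such harmonics (assumed here for illustration; the paper's general form depends on the source distribution) is f_n(s) = √2·cos(nπs). These satisfy the SFA constraints, and the first harmonic is monotonic in s:

```python
import numpy as np

# Harmonics of a source uniformly distributed on [0, 1] (an assumed,
# illustrative special case): f_n(s) = sqrt(2) * cos(n * pi * s).
s = np.linspace(0, 1, 1_000_001)
F = np.column_stack([np.sqrt(2) * np.cos(n * np.pi * s) for n in (1, 2, 3)])

print(np.allclose(F.mean(axis=0), 0, atol=1e-3))       # zero mean
print(np.allclose((F**2).mean(axis=0), 1, atol=1e-3))  # unit variance
print(bool(np.all(np.diff(F[:, 0]) < 0)))              # f_1 is monotonic
```

Products of such harmonics, one factor per independent source, are what the theorem says the SFA outputs look like.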
13. Then move onto “… the sources are reversible Gaussian stochastic processes, (i.e., that the joint probability density of s(t) and s(t + dt) is Gaussian and symmetric with respect to s(t) and s(t + dt)). In this case, the instantaneous values of the sources and their temporal derivatives are statistically independent…
1. <This is nice because it makes the work more practically applicable>
2. Then move on to other special cases: homogeneously distributed sources, weakly inhomogeneous sources
14. Then moves on to analogies in physics <also not really paying attention to that>
15. On to discussion
16. “The predictive power of the presented framework is particularly strong in applications where high-dimensional training data are taken from low-dimensional manifolds”
1. Covers problem of “… learning place and head direction codes (…) from quasi-natural images…”