Understanding Slow Feature Analysis: A Mathematical Framework. Sprekeler, Wiskott.

  1. “Here, we present a mathematical analysis of slow feature analysis for the case where the input-output functions are not restricted in complexity. We show that the optimal functions obey a partial differential eigenvalue problem of a type that is common in theoretical physics.”
    1. So the paper assumes a theoretically interesting but practically unachievable setting, unlike what is considered in practice, where the function space has finite dimension
  2. Looking for invariant representations
  3. SFA looks for temporal correlations in data
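That slowness objective can be sketched with a toy linear SFA. Everything below is my own illustrative construction (data, mixing matrix, variable names), not code or an experiment from the paper:

```python
import numpy as np

# Toy linear SFA: find the unit-variance linear feature of the input
# whose temporal derivative has the smallest variance.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 2000)
sources = np.column_stack([np.sin(t),                      # slow source
                           rng.standard_normal(len(t))])   # fast distractor
A = np.array([[1.0, 0.5], [0.5, 1.0]])                     # arbitrary mixing
X = sources @ A.T                                          # observed signals

# Whiten, so every direction in the new coordinates has unit variance.
Xc = X - X.mean(axis=0)
eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / len(Xc))
Z = Xc @ eigvec / np.sqrt(eigval)

# The slowest feature is the whitened direction that minimizes the
# variance of the discrete-time derivative.
dZ = np.diff(Z, axis=0)
dval, dvec = np.linalg.eigh(dZ.T @ dZ / len(dZ))
y = Z @ dvec[:, 0]            # recovered slow feature (defined up to sign)
```

Even with this crude setup, `y` essentially recovers the sine source: the noise channel dominates the derivative statistics, so the minimum-derivative-variance direction is the slow one.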
  4. There are some other approaches besides SFA that address this general question
  5. “…it has turned out that SFA is closely related to independent component analysis techniques that rely on second order statistics (…). SFA or variations thereof can therefore also be used for problems of blind source separation (…).”
  6. In the infinite-dimensional function space setting, “The key results are that the output signals extracted by SFA are independent of the representation of the input signals and that the optimal functions for SFA are the solutions of a partial differential eigenvalue problem.”
  7. “The optimization problem with a restricted function space is generally not invariant with respect to coordinate changes of the input signals.”
  8. It is also assumed that the full conditional probabilities P(s’|s) are known, which brings in integrals over them; this is another <less strong> reason why this paper is of mathematical, as opposed to practical, interest
  9. The original optimization problem is cast as another one, and a differential equation is presented for that second problem
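For reference, this is the standard SFA optimization problem being recast (the form below follows the usual statement in Wiskott & Sejnowski, 2002; the eigenvalue equation is only schematic, see the paper for the exact operator):

```latex
% SFA: minimize the temporal variation of each output signal
\min_{g_j}\ \Delta(y_j) = \langle \dot{y}_j^{\,2} \rangle_t ,
\qquad y_j(t) = g_j(\mathbf{x}(t)),
% subject to zero mean, unit variance, and decorrelation with slower outputs
\text{s.t.}\quad \langle y_j \rangle_t = 0,\quad
\langle y_j^2 \rangle_t = 1,\quad
\langle y_i y_j \rangle_t = 0 \ \ (i < j).
```

In the unrestricted function space this becomes an eigenvalue problem of the form \(\mathcal{D}\,g_j = \lambda_j\,g_j\), with \(\mathcal{D}\) a Sturm–Liouville-type differential operator built from the input statistics; as far as I can tell, the \(\Delta\)-values are the eigenvalues, so the slowest features are the eigenfunctions with the smallest \(\lambda_j\).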
  10. <Skipping the derivations>
  11. “The unit variance constraint requires that the square sum of the coefficients w_i is unity [1]”
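A one-line check of that statement, assuming (my reading) that the function space is spanned by basis functions f_i that are orthonormal under the time average, e.g., after sphering, so that ⟨f_i f_j⟩ = δ_ij:

```latex
g(\mathbf{x}) = \sum_i w_i f_i(\mathbf{x})
\quad\Longrightarrow\quad
\langle g^2 \rangle = \sum_{i,j} w_i w_j \langle f_i f_j \rangle
                    = \sum_i w_i^2 = 1 .
```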
  12. <Blind source separation is also discussed a bit, but I’m not paying much attention to that either. The point is that SFA, in the case considered, can be used for that as well>
    1. “The main result of the above theorem is that in the case of statistically independent sources, the output signals are products of harmonics of the sources.”
    2. First harmonic is monotonic
  13. The paper then moves on to the case where “… the sources are reversible Gaussian stochastic processes, (i.e., that the joint probability density of s(t) and s(t + dt) is Gaussian and symmetric with respect to s(t) and s(t + dt)). In this case, the instantaneous values of the sources and their temporal derivatives are statistically independent…”
    1. <This is nice because it makes the work more practically applicable>
    2. Then move on to other special cases: homogeneously distributed sources, weakly inhomogeneous sources
  14. Then moves on to analogies in physics <which I’m also not really paying attention to>
  15. On to discussion
  16. “The predictive power of the presented framework is particularly strong in applications where high-dimensional training data are taken from low-dimensional manifolds”
    1. Covers problem of “… learning place and head direction codes (…) from quasi-natural images…”
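That manifold claim can be illustrated numerically. This is a toy construction of my own, not an experiment from the paper: for a signal confined to a circle (a 1-D manifold), SFA on a quadratic expansion returns harmonics of the circle coordinate, with the first-harmonic pair slowest.

```python
import numpy as np

# A slow random walk on a circle, observed through its embedding plus a
# quadratic expansion of the embedding coordinates.
rng = np.random.default_rng(1)
theta = np.cumsum(0.05 * rng.standard_normal(20000))   # slow angular walk
x1, x2 = np.cos(theta), np.sin(theta)
F = np.column_stack([x1, x2, x1 * x1, x1 * x2, x2 * x2])

# Whiten the expanded signals, discarding the redundant direction
# (x1^2 + x2^2 = 1 makes the covariance rank-deficient).
Fc = F - F.mean(axis=0)
eigval, eigvec = np.linalg.eigh(Fc.T @ Fc / len(Fc))
keep = eigval > 1e-10 * eigval.max()
Z = Fc @ eigvec[:, keep] / np.sqrt(eigval[keep])

# Slow features: whitened directions with minimal derivative variance,
# ordered slowest first.
dZ = np.diff(Z, axis=0)
dval, dvec = np.linalg.eigh(dZ.T @ dZ / len(dZ))
Y = Z @ dvec
```

The slowest output `Y[:, 0]` lies (approximately) in the span of cos θ and sin θ, the first harmonics of the manifold coordinate, while the faster pair corresponds to the second harmonics, matching the harmonic structure the framework predicts.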
