Slow Feature Analysis: A Theoretical Analysis of Optimal Free Responses. Wiskott. Neural Computation 2003.

1. The principle of slowness is based on the observation that different things around us change at different rates. If we can extract the things that change slowly, then we may have a useful representation of the environment.
2. This paper is a supplement to Wiskott and Sejnowski 2002.
3. The constraints have been discussed previously, so not copying here.
4. The most general approach to the problem is variational calculus – combined with the assumptions, you get a Lagrangian
5. Any solution to the constraints solves the Euler-Lagrange differential equation
1. Here, though, the Euler-Lagrange equation is used in the opposite direction, to solve the constrained optimization problem. As it turns out, the Euler-Lagrange equation is necessary but not sufficient: several solutions may exist, and in that case the optimal one must be found based on additional conditions/constraints
6. There is also an algebraic approach. The distinction is that with variational calculus the function space is infinite-dimensional, whereas with the algebraic approach the analysis has finite dimension. This isn’t much of a problem though, because the dimension can still be arbitrarily large, and computers can only operate in finite dimensions anyway
7. Assumes smoothness, differentiability, and square-integrability (L2)
8. The goal is to be able to build the response y_j(t) as a linear combination of basis functions
1. y_j(t) = sum_m w_{jm} b_m(t)
2. Weight vectors must be orthonormal
9. Constraints 2 and 3 (of 3; #1 is zero mean, which is easy to satisfy anyway) are satisfied by taking the eigenvectors with the smallest eigenvalues of the matrix of inner products of the time derivatives
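A toy sketch of points 8–9 (my own code, not the paper's; the basis construction and normalization convention are assumptions): build zero-mean, orthonormal basis signals, form the matrix of inner products of their time derivatives, and take the eigenvectors with the smallest eigenvalues as the weight vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 4
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
dt = t[1] - t[0]

# Toy basis: random walks, centered, then orthonormalized via QR.
# Columns of b are linear combinations of zero-mean columns of raw,
# so they stay zero mean and satisfy b.T @ b = I.
raw = np.cumsum(rng.standard_normal((n, k)), axis=0)
raw -= raw.mean(axis=0)
b, _ = np.linalg.qr(raw)

# Matrix of inner products of the time derivatives of the basis.
bdot = np.gradient(b, dt, axis=0)
M = bdot.T @ bdot * dt

# eigh returns eigenvalues in ascending order, so the first
# eigenvectors are the weight vectors of the slowest responses.
evals, evecs = np.linalg.eigh(M)
Y = b @ evecs  # responses y_j = sum_m w_{jm} b_m, ordered by slowness
```

Because the weight vectors are orthonormal and the basis is orthonormal, the responses Y come out decorrelated automatically.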
10. Make a note (saying it is non-obvious) that the time derivatives are mutually orthogonal
11. Theorem says that it’s possible to state the criteria slightly differently, but still end up with a solution to the original problem. This is the case when the y’s:
1. Are zero mean
2. Have unit variance
3. Are decorrelated
4. Have orthogonal time derivatives
5. Are ordered by slowness
6. <I was originally confused how they could do this in terms of criteria on y, but since this is an unsupervised learning problem, y isn’t given; it has to be found. The point is either that the variational calculus or algebraic approach also accomplishes this, or that this would accomplish the work those do.>
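To make the theorem's conditions concrete, here is a small numerical check (my own code; the harmonic form is suggested by the paper's Fourier result below) that y_j(t) = sqrt(2)·cos(jπt/T) on [0, T] satisfies all five: zero mean, unit variance, decorrelation, orthogonal time derivatives, and ordering by slowness.

```python
import numpy as np

T = 1.0
n = 4000
t = (np.arange(n) + 0.5) * T / n  # midpoint samples on [0, T]
dt = T / n

# Candidate outputs y_j(t) = sqrt(2) * cos(j*pi*t/T), j = 1..3
Y = np.stack([np.sqrt(2) * np.cos(j * np.pi * t / T)
              for j in range(1, 4)], axis=1)
Ydot = np.gradient(Y, dt, axis=0)

means = Y.mean(axis=0)               # condition 1: ~0
cov = Y.T @ Y * dt / T               # conditions 2, 3: ~identity
deriv_ip = Ydot.T @ Ydot * dt / T    # condition 4: ~diagonal
slowness = np.diag(deriv_ip)         # condition 5: increasing, ~ (j*pi/T)^2
```

The diagonal of deriv_ip recovers the slowness values (jπ/T)², increasing in j, so the outputs are indeed ordered by slowness.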
12. “Notice that it is equivalent to say that two functions y_j and y_{j'} are uncorrelated or that they are orthogonal, because the functions have zero mean. This is not true for the time derivatives, which must be orthogonal but not necessarily uncorrelated, because they may not have zero mean.”
13. The optimal set of output functions y is unique iff the eigenvalues of the matrix of inner products of the time derivatives are all distinct (no repeated eigenvalues).
14. <The notation is a little confusing. I think they use a dot for the derivative, but there are primes all over the place; I don’t think the prime means anything in particular?>
15. Fourier analysis (over some fixed range) produces the optimal responses. The basis functions and their time derivatives are mutually orthogonal
16. <But they’re doing it over a fixed time range instead of the range as normally considered?>
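A minimal numerical check of point 15 (my own code, with T and the number of modes chosen arbitrarily): on a fixed interval [0, T], the Fourier modes cos(jπt/T) are mutually orthogonal, and so are their time derivatives, which are proportional to sin(jπt/T).

```python
import numpy as np

T = 2.0
n = 2000
t = (np.arange(n) + 0.5) * T / n  # midpoint samples on [0, T]
dt = T / n

# Fourier basis over the fixed range and its exact time derivatives
B = np.stack([np.cos(j * np.pi * t / T) for j in range(1, 5)], axis=1)
Bdot = np.stack([-(j * np.pi / T) * np.sin(j * np.pi * t / T)
                 for j in range(1, 5)], axis=1)

G = B.T @ B * dt        # Gram matrix of the basis: ~ (T/2) * I
Gdot = Bdot.T @ Bdot * dt  # Gram matrix of derivatives: ~ diagonal
```

Both Gram matrices come out diagonal, and the diagonal of Gdot grows with frequency, which is why cutting off at the lowest modes yields the slowest responses.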