- <An on-line method. See point #13>
- Basic point: vanilla SFA scales poorly as the input dimension increases
- Here they use a <different?> objective function “…that maximizes output variance over a long period whilst minimizing variance over a shorter period.”
- This work shows how to do this via the kernel trick
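
A minimal sketch of the quoted criterion, scored on a single output signal. It assumes that the long- and short-term variances are measured around exponential moving averages parameterised by half-lives, as in the online algorithm quoted further down. The names `ema` and `slowness_ratio` and the half-life values (1 and 100 steps) are hypothetical. The sketch only scores a given signal; the method itself searches for the mapping whose output maximises this ratio.

```python
import numpy as np


def half_life_to_decay(half_life):
    """Decay factor lam chosen so that lam ** half_life == 0.5."""
    return 2.0 ** (-1.0 / half_life)


def ema(y, half_life):
    """Exponential moving average m_t = lam * m_{t-1} + (1 - lam) * y_t, started at y_0."""
    lam = half_life_to_decay(half_life)
    m = np.empty_like(y)
    m[0] = y[0]
    for t in range(1, len(y)):
        m[t] = lam * m[t - 1] + (1.0 - lam) * y[t]
    return m


def slowness_ratio(y, short_half_life=1.0, long_half_life=100.0):
    """Long-term variance divided by short-term variance of a 1-D signal y."""
    y = np.asarray(y, dtype=float)
    v_long = np.sum((y - ema(y, long_half_life)) ** 2)    # spread around long-term mean
    u_short = np.sum((y - ema(y, short_half_life)) ** 2)  # spread around short-term mean
    return v_long / u_short


# Usage: a slow sinusoid should score far higher than white noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
slow = np.sin(2 * np.pi * t / 200)
noise = rng.standard_normal(t.size)
print(slowness_ratio(slow), slowness_ratio(noise))
```

A slowly varying signal strays far from its long-term mean but stays close to its short-term mean, so its ratio is large. White noise is about equally far from both means.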

- “This leads to an efficient method that maps to an architecture that could be biologically implemented either by Sigma-Pi neurons, or fixed RBF networks (as described for SFA[…]).”
- “We demonstrate that using this method to extract features that vary slowly in natural images leads to the development of both the complex-cell properties of translation invariance and disparity coding simultaneously.”
- Yes, the approach here uses a different objective function: it tries to minimize short-term variance relative to long-term variance
- Instead of explicitly expanding the input data with a series of nonlinear functions, it applies the kernel trick over the data corpus to achieve a similar result while keeping computational costs in check (at least relative to an explicit expansion of the raw input space; for example, a polynomial expansion of image data would be very large)
- “We assume the solution **w** can be written as an expansion in terms of mapped training data: $\mathbf{w} = \sum_{i=1}^{l} \alpha_i \Phi(\mathbf{x}_i)$”
- From this you can rewrite the objective function in terms of the kernel, as opposed to standard basis functions
- Need sparsification because otherwise the *l*×*l* matrix operations (*l* being the number of training samples) quickly become computationally problematic as well
- They subsample the original data set, and call this a *basis set*, or BS
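
A batch sketch of how the kernelised objective could look. It rests on three assumptions. First, a Gaussian RBF kernel, since the notes do not name one. Second, the long/short variance ratio described above as the objective. Third, the expansion is restricted to a basis set {b_j}, so each output is y_t = Σ_j α_j k(b_j, x_t).

With this expansion, both variances become quadratic forms in α. Maximising their ratio is then a generalized eigenvalue problem on m×m matrices (m = basis-set size) rather than on *l*×*l* kernel matrices. The function names, γ, half-lives, ridge term and random basis subsample are illustrative choices.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def ema_rows(k, lam):
    """Row-wise exponential moving average of a (T, m) array, started at row 0."""
    m = np.empty_like(k)
    m[0] = k[0]
    for t in range(1, len(k)):
        m[t] = lam * m[t - 1] + (1.0 - lam) * k[t]
    return m


def kernel_slow_features(x, basis, n_features=2, gamma=1.0,
                         short_half_life=1.0, long_half_life=100.0, ridge=1e-6):
    """Return (alpha, ratios); output j is y_t = rbf_kernel(x_t, basis) @ alpha[:, j]."""
    k = rbf_kernel(x, basis, gamma)                        # (T, m) kernel vectors
    dl = k - ema_rows(k, 2.0 ** (-1.0 / long_half_life))   # deviations from long-term mean
    ds = k - ema_rows(k, 2.0 ** (-1.0 / short_half_life))  # deviations from short-term mean
    a = dl.T @ dl                                          # long-term scatter (m, m)
    b = ds.T @ ds + ridge * np.eye(k.shape[1])             # short-term scatter + ridge
    # Generalized eigenproblem a @ alpha = mu * b @ alpha, solved via b = L @ L.T.
    l_inv = np.linalg.inv(np.linalg.cholesky(b))
    mu, v = np.linalg.eigh(l_inv @ a @ l_inv.T)
    order = np.argsort(mu)[::-1][:n_features]              # largest ratio = slowest
    return l_inv.T @ v[:, order], mu[order]


# Usage: 2-D input whose first coordinate varies slowly, second is white noise.
rng = np.random.default_rng(0)
t = np.arange(1000)
x = np.column_stack([np.sin(2 * np.pi * t / 250), rng.standard_normal(t.size)])
basis = x[rng.choice(t.size, size=30, replace=False)]  # random subsample as the basis set
alpha, ratios = kernel_slow_features(x, basis, gamma=0.5)
y = rbf_kernel(x, basis, gamma=0.5) @ alpha           # (T, 2) slow outputs
```

Because the moving averages are linear, the ratio computed directly from an output column equals its eigenvalue. This identity is a convenient sanity check.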

- Some quantities are computed only between elements of the basis set (such as the covariance estimates and the eigendecomposition), while others are computed between elements of the basis set and the entire corpus <seems like these are the long- and short-term averages?>
- To select their basis set, they use a greedy method that minimizes a least-squares error between data points
- “The complete online algorithm requires minimal memory, making it ideal for very large data sets. The implementation estimates long- and short-terms kernel means online using exponential time averages parameterised using half-lives of $\Lambda_s$, $\Lambda_l$ (as in […]). Likewise, the covariance matrices … are updated online at each time step … there is therefore no need to explicitly compute or store kernel matrices.”
- Empirical results come from stereo 128×128 greyscale images. They translated images by a pixel at random
- <Seems like they first trained 20 8×8 simple cells with a different algorithm and then used them as an underlying input stage? These simple cells were learned by an algorithm that “maximises a nonlinear measure of temporal correlation (TRS) between the present and previous output…” The resulting simple-cell weight vectors resemble Gabor functions>
- <Yeah> “The complex cells received input from these 20 types of simple cells when processing both the left and right eye images. Complex cells had a spatial receptive field of 4×4; each cell therefore received 320 simple cell inputs (2×4×4×20)…”
- They claim translation invariance <but it's hard to know how much is due to SFA and how much falls out of the simple cells below>
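
One plausible reading of the greedy least-squares basis selection, sketched under two assumptions. The first is a Gaussian RBF kernel. The second is a criterion that repeatedly adds the point whose feature-space image Φ(x_i) is worst approximated (largest squared residual) by the span of the current basis set. This amounts to pivoted incomplete Cholesky, swapped in here as an illustration; the paper's exact criterion may differ. `greedy_basis`, `max_size` and `tol` are hypothetical names.

```python
import numpy as np


def greedy_basis(x, max_size, gamma=1.0, tol=1e-3):
    """Greedy basis-set selection by pivoted incomplete Cholesky (RBF kernel assumed).

    residual[i] is the squared feature-space distance from phi(x_i) to the span of
    the chosen phi(x_j), i.e. the least-squares error of approximating x_i from the
    basis set. Each step adds the currently worst-approximated point.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    residual = np.ones(n)            # k(x_i, x_i) = 1 for a Gaussian RBF kernel
    g = np.zeros((n, max_size))      # columns of the partial Cholesky factor
    chosen = []
    for j in range(max_size):
        i = int(np.argmax(residual))
        if residual[i] <= tol:       # every point is already well approximated
            break
        chosen.append(i)
        k_col = np.exp(-gamma * np.sum((x - x[i]) ** 2, axis=1))  # k(x_n, x_i) for all n
        g[:, j] = (k_col - g[:, :j] @ g[i, :j]) / np.sqrt(residual[i])
        residual = np.maximum(residual - g[:, j] ** 2, 0.0)
    return chosen, residual


rng = np.random.default_rng(0)
data = rng.standard_normal((300, 2))
basis_idx, res = greedy_basis(data, max_size=40, gamma=0.5)
print(len(basis_idx), res.sum())
```

The sum of `res` is the total squared error of projecting every mapped point onto the basis span. Only the n×(basis size) factor `g` is kept, never the full kernel matrix.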
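
A minimal sketch of the online scheme in the quote, assuming a Gaussian RBF kernel against a fixed basis set. Each time step does three things:

- computes one kernel vector;
- updates the short- and long-term exponential means, with per-step decay 2^(-1/Λ) for half-life Λ;
- accumulates outer products of the deviations.

The class name, parameter values and plain-sum accumulation of the covariance terms are assumptions; the paper may weight them differently.

```python
import numpy as np


class OnlineKernelSlowness:
    """Hypothetical online accumulator of long- and short-term kernel scatter matrices.

    Memory is O(m^2) for a basis set of size m; no kernel matrix over the whole
    data set is ever formed.
    """

    def __init__(self, basis, gamma=1.0, short_half_life=1.0, long_half_life=100.0):
        self.basis = np.asarray(basis, dtype=float)
        self.gamma = gamma
        self.lam_s = 2.0 ** (-1.0 / short_half_life)  # per-step decay for half-life Λ_s
        self.lam_l = 2.0 ** (-1.0 / long_half_life)   # per-step decay for half-life Λ_l
        m = len(self.basis)
        self.mean_s = None                  # short-term kernel mean
        self.mean_l = None                  # long-term kernel mean
        self.scatter_s = np.zeros((m, m))   # sum of outer products of short-term deviations
        self.scatter_l = np.zeros((m, m))   # sum of outer products of long-term deviations

    def update(self, x_t):
        """Fold in one input vector x_t (one time step)."""
        k = np.exp(-self.gamma * np.sum((self.basis - x_t) ** 2, axis=1))
        if self.mean_s is None:             # start both averages at the first kernel vector
            self.mean_s, self.mean_l = k.copy(), k.copy()
        self.mean_s = self.lam_s * self.mean_s + (1.0 - self.lam_s) * k
        self.mean_l = self.lam_l * self.mean_l + (1.0 - self.lam_l) * k
        ds, dl = k - self.mean_s, k - self.mean_l
        self.scatter_s += np.outer(ds, ds)
        self.scatter_l += np.outer(dl, dl)


# Usage: stream i.i.d. inputs one at a time against a 10-point basis set.
rng = np.random.default_rng(1)
stream = rng.standard_normal((500, 3))
acc = OnlineKernelSlowness(basis=stream[:10], gamma=0.5)
for x_t in stream:
    acc.update(x_t)
```

After streaming, the slow directions would be the leading generalized eigenvectors of `scatter_l` relative to `scatter_s`, with a small ridge added to `scatter_s`. Only m-dimensional vectors and m×m matrices are ever stored.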

## Kernel-based Extraction of Slow Features: Complex Cells Learn Disparity and Translation Invariance from Natural Images. Bray, Martinez. NIPS 2002.

**Tagged:** kernel trick, SFA