- Instead of using hand-coded features like SIFT, which were designed for static images, to process video, unsupervised learning is used here to generate feature detectors.
- Uses an extension of Independent Subspace Analysis (ISA) <I don’t know what that is; ah, they say it’s an extension of independent component analysis (ICA)>
- Used in conjunction with deep learning methods like stacking and convolution to generate hierarchical representations
- Method beats previously published results on a number of datasets
- Previous results show that ISA can generate receptive fields similar to V1 and MT
- As opposed to ICA, ISA “… learns features that are robust to local translation while being selective to frequency, rotation, and velocity.”
- On the other hand, ISA/ICA scale poorly to high-dimensional data, so ISA is modified here to work well in high-D by leveraging ideas from convnets: convolution and stacking
- In comparison to previous state of the art, all steps of processing are the same aside from the first level of video processing, replacing hand-designed features with learned ones.
- <Since I am exactly interested in the latter parts of processing, I’m going to leave this paper now.>
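
For my own reference, a minimal sketch of the ISA pooling idea behind the quote above, as I understand ISA in general: each unit takes the square root of a sum of squared linear filter outputs within a subspace. Everything here is an assumption for illustration: the filters `W` are a hand-built sine/cosine pair rather than learned, and the names `isa_responses`, `W`, `V`, and `grating` are hypothetical, not from the paper.

```python
import numpy as np

def isa_responses(x, W, V):
    """Pooled responses: sqrt of within-subspace sums of squared filter outputs."""
    linear = W @ x                     # first layer: linear filters (learned in real ISA)
    return np.sqrt(V @ linear ** 2)    # second layer: fixed pooling over each subspace

n = 64
t = np.arange(n)
freq = 4  # cycles per window (arbitrary choice for the toy example)

# Assumption: a hand-built quadrature pair stands in for two learned filters
# that belong to the same subspace.
W = np.stack([np.cos(2 * np.pi * freq * t / n),
              np.sin(2 * np.pi * freq * t / n)])
V = np.array([[1.0, 1.0]])  # one subspace pooling both filters

def grating(f, phase):
    """1-D sinusoidal input at frequency f with a phase shift (a local translation)."""
    return np.cos(2 * np.pi * f * t / n + phase)

phases = np.linspace(0.0, np.pi, 5)

# Pooled response to the same frequency at different shifts: stays constant.
shifted = [isa_responses(grating(freq, p), W, V)[0] for p in phases]

# Pooled response to a different frequency: close to zero.
other = isa_responses(grating(freq + 3, 0.0), W, V)[0]

# A single linear filter (ICA-like unit) on the shifted inputs: varies with shift.
single = [abs((W @ grating(freq, p))[0]) for p in phases]

print("pooled, shifted input:   ", np.round(shifted, 3))
print("pooled, other frequency: ", round(float(other), 3))
print("single filter, shifted:  ", np.round(single, 3))
```

In this toy setup the pooled unit gives the same response however the input is shifted, while a single linear filter's response swings with the shift, and the pooled unit barely responds to a different frequency. That is the sense in which pooling over a subspace can be robust to local translation yet selective to frequency.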