- Consider continuous, high-dimensional, noisy time-series data
- Also assume there may not be enough information in the data for very accurate prediction (high Bayes error)
- Assume nonstationarity: x(t) predicts y(t), but at a different t the same value of x may produce a different prediction
- To resolve this, some form of context is required. With the correct context the process is stationary
- The amount of time needed for context may be unknown, and may be very large
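A toy way to see the context point, assuming a made-up setup that is not from the paper: a hidden regime variable, standing in for "context", flips the sign of the relationship between x and y every 500 steps. Regressing y on x alone finds almost nothing, while conditioning on the regime recovers a clean, stationary relationship within each regime.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
# Hypothetical context: a regime that flips between +1 and -1 every 500 steps.
regime = np.where((np.arange(T) // 500) % 2 == 0, 1.0, -1.0)
x = rng.normal(size=T)
y = regime * x + 0.1 * rng.normal(size=T)   # same x, opposite effect per regime


def slope(a, b):
    """Least-squares slope of b regressed on a (no intercept)."""
    return float(a @ b / (a @ a))


slope_no_context = slope(x, y)                               # close to 0
slope_regime_pos = slope(x[regime > 0], y[regime > 0])       # close to +1
slope_regime_neg = slope(x[regime < 0], y[regime < 0])       # close to -1
print(f"{slope_no_context:.3f} {slope_regime_pos:.3f} {slope_regime_neg:.3f}")
```

Here the context is given; the hard part in practice is that the model has to infer it from a history of unknown length.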

- The process may also be nonstationary in the sense that summary statistics change over time. In some cases the change in frequency content is so relevant that it’s better to work in the frequency domain than in the time domain
- <Been meaning to learn about that>
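A minimal sketch of moving to the frequency domain with a short-time Fourier transform (plain NumPy, Hann window). The chirp signal, sample rate, window length and hop size are arbitrary assumptions for illustration; the point is that a frequency that drifts over time becomes directly visible frame by frame.

```python
import numpy as np


def stft_magnitude(signal, window_len, hop):
    """Magnitude short-time Fourier transform: one row per Hann-windowed frame."""
    window = np.hanning(window_len)
    starts = range(0, len(signal) - window_len + 1, hop)
    frames = np.stack([signal[s:s + window_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))


fs = 1000.0                          # assumed sample rate in Hz
t = np.arange(0, 4.0, 1.0 / fs)      # 4 seconds of samples
# Linear chirp: instantaneous frequency rises from about 5 Hz to 85 Hz.
f0, f1 = 5.0, 85.0
k = (f1 - f0) / t[-1]
x = np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))

window_len, hop = 256, 128
spec = stft_magnitude(x, window_len, hop)          # (n_frames, window_len // 2 + 1)
freqs = np.fft.rfftfreq(window_len, d=1.0 / fs)    # bin centre frequencies in Hz
dominant = freqs[spec.argmax(axis=1)]              # strongest frequency per frame
print(dominant[:3], dominant[-3:])
```

A shorter window gives better time resolution but coarser frequency bins (here fs / window_len ≈ 3.9 Hz per bin), which is the usual trade-off when choosing the frame size.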

- In vision, often things like translation and scale invariance are desired. In time series analysis, we desire invariance to translations in time
- So yeah, it’s a gnarly problem. Picking the right representation is key
- Here they consider learning such a representation
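One standard way to get invariance to translations in time is convolution followed by pooling over time (the convolution/pooling item later in these notes). A minimal sketch, assuming random filters as stand-ins for learned ones: with valid cross-correlation and global max pooling, shifting a pattern in time leaves the pooled features unchanged, as long as the pattern stays fully inside the sequence.

```python
import numpy as np


def conv_maxpool_features(x, filters):
    """Valid 1-D cross-correlation of x with each filter, then global max-pool over time."""
    # mode="valid" slides each filter along x without any padding.
    responses = np.stack([np.correlate(x, f, mode="valid") for f in filters])
    return responses.max(axis=1)


rng = np.random.default_rng(0)
filters = rng.normal(size=(4, 8))    # 4 hypothetical "learned" filters of width 8
pattern = rng.normal(size=8)

x = np.zeros(100)
x[20:28] = pattern                   # pattern occurs at t = 20
x_shifted = np.roll(x, 37)           # same pattern, now at t = 57 (only zeros wrap around)

f1 = conv_maxpool_features(x, filters)
f2 = conv_maxpool_features(x_shifted, filters)
print(np.allclose(f1, f2))
```

Global pooling throws away *when* the pattern happened; pooling over smaller windows keeps coarse timing while still tolerating small shifts.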

- Discusses hidden (and hidden and gated) Boltzmann machines, although I won’t take notes because it probably isn’t what they will really use anyway
- Auto-encoders
- A basic linear auto-encoder (with squared-error loss) learns the same subspace as PCA
- Terms in the cost function include one for sparsity and one to keep the weights close to 0 <listed as 2 different things, but how are they distinct?>
- Regularization terms “… prevents a the trivial learning of a 1-to-1 mapping of the input to the hidden units.”
- RBMs don’t need regularization because the stochastic binary hidden units act as a regularizer, although regularization can still be added on top
- Recurrent neural network
- Trained by backprop-through-time
- “RNNs can be seen as very deep networks with shared parameters at each layer when unfolded in time.”
- Deep learning
- Convolution, pooling
- Another way to handle temporal data, besides plain recurrent networks, is to penalize changes in the hidden layer from one time step to the next (temporal coherence)
- They also mention slow feature analysis
- “Temporal coherence is related to invariant feature representations since both methods want to achieve small changes in the feature representation for small changes in the input data.”
- Hidden Markov Models
- But require discrete states
- Limited representational capacity
- Not set up well to track history
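On the question of how the sparsity and weight-decay terms differ: in the common sparse auto-encoder formulation (not necessarily the paper’s exact equation), weight decay penalizes the weights themselves, while the sparsity term penalizes the hidden activations, pushing each unit’s average activation toward a small target ρ via a KL divergence. Shrinking the weights does not by itself make the code sparse (with sigmoid units and zero biases, tiny weights put every activation near 0.5). The sketch below uses hypothetical coefficient names `lam`, `beta`, `rho` and `gamma`, and also adds an optional temporal-coherence term: an L1 penalty on changes in the hidden layer between consecutive time steps, which is one common choice.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def autoencoder_cost(X, W, b, W_dec, c, lam=1e-3, beta=3.0, rho=0.05, gamma=0.0):
    """Cost of a one-hidden-layer auto-encoder on a time-ordered batch.

    X: (T, d) inputs, one row per time step. Returns (total, dict of terms).
    Coefficient names and default values are illustrative assumptions.
    """
    H = sigmoid(X @ W + b)                      # (T, k) hidden activations
    X_hat = H @ W_dec + c                       # (T, d) linear reconstruction
    recon = 0.5 * np.mean(np.sum((X - X_hat) ** 2, axis=1))

    # Weight decay: penalizes the parameters, whatever the data looks like.
    weight_decay = 0.5 * lam * (np.sum(W ** 2) + np.sum(W_dec ** 2))

    # Sparsity: penalizes the activations. KL divergence between the target
    # rate rho and each hidden unit's mean activation over the batch.
    rho_hat = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    sparsity = beta * np.sum(kl)

    # Temporal coherence (optional): L1 penalty on hidden-layer changes
    # between consecutive time steps; gamma = 0 switches it off.
    coherence = gamma * np.mean(np.sum(np.abs(np.diff(H, axis=0)), axis=1))

    terms = {"recon": recon, "weight_decay": weight_decay,
             "sparsity": sparsity, "coherence": coherence}
    return sum(terms.values()), terms


# Usage on random data (nothing is trained here; this only evaluates the cost).
rng = np.random.default_rng(0)
T, d, k = 50, 10, 5
X = rng.normal(size=(T, d))
W = 0.1 * rng.normal(size=(d, k))
W_dec = 0.1 * rng.normal(size=(k, d))
b, c = np.zeros(k), np.zeros(d)
total, terms = autoencoder_cost(X, W, b, W_dec, c, gamma=0.1)
```

Doubling `W_dec` changes the weight-decay term but leaves the sparsity term untouched, since the hidden activations depend only on `W` and `b`; that is one concrete way to see that the two penalties act on different quantities.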

- “The use of Long-short term memory (…) or hessian-free optimizer (…) can produce recurrent networks that has a memory of over 100 time steps.”
- Some models are generative, others are discriminative. Auto-encoder isn’t generative, but “a probabilistic interpretation can be made using auto-encoder scoring (…)”
- <According to the table in the paper, recurrent neural networks are most appropriate for what we are considering>
- Discusses video data
**Lots of relevant work to investigate here**
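On the memory point (LSTM or Hessian-free optimization reaching 100+ time steps), a small numerical sketch of why plain RNNs trained with backprop-through-time struggle with long memories. Unfolded in time, the gradient of h_T with respect to h_0 is a product of per-step Jacobians diag(1 − h_t²) W, all sharing the same recurrent matrix W. Assuming (arbitrarily) a recurrent matrix with spectral norm 0.9 and tanh units, each factor has norm at most 0.9, so the product shrinks at least as fast as 0.9^T (about 3e-5 at T = 100).

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 32, 100
W = rng.normal(size=(n, n))
W *= 0.9 / np.linalg.norm(W, 2)      # assumed: rescale recurrent weights to spectral norm 0.9
U = rng.normal(size=(n, 1))          # input weights for a 1-D input signal

h = np.zeros(n)
J = np.eye(n)                         # running product d h_t / d h_0
norms = []
for t in range(T):
    x_t = rng.normal(size=1)
    h = np.tanh(W @ h + U @ x_t)      # same W at every step (shared parameters)
    step_jacobian = np.diag(1.0 - h ** 2) @ W    # d h_t / d h_{t-1}
    J = step_jacobian @ J
    norms.append(np.linalg.norm(J, 2))

print(norms[0], norms[9], norms[-1])
```

If the spectral norm is well above 1 the gradient can instead explode; LSTM’s gated cell state is designed to keep an additive path through time so the signal neither vanishes nor blows up as easily.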

- “The use of deep learning, feature learning, and convolution with pooling has propelled the advances in video processing.” Deep learning is natural because it is state of the art on still images, but extensions are needed to deal with the temporal aspect
- “The early attempts at extending deep learning algorithms to video data was done by modelling the transition between two frames. The use of temporal pooling extends the time-dependencies a model can learn beyond a single frame transition. However, the time-dependency that has been modeled is still just a few frames. A possible future direction for video processing is to look at models that can learn longer time-dependencies.”
- Other examples <with a fair amount of space given that I’m skipping> are stock prices and music recognition
**Motion Capture Data**
- Previous applications were Temporal Restricted Boltzmann Machines (TRBM) and conditional RBMs (Hinton in both papers), then the recurrent TRBM
- Mirowski and LeCun used dynamic factor graphs to fill in missing mocap data
- “A motivation for using deep learning algorithms for motion capture data is that it has been suggested that human motion is composed of elementary building blocks (motion templates) and any complex motion is constructed from a library of these previously learned motion primitives (Flash and Hochner, 2005). Deep networks can, in an unsupervised manner, learn these motion templates from raw data and use them to form complex human motions.”
- Section on machine olfaction and physiological data (EEG, MEG, ECG)
- “In order to capture long-term dependencies, the input size has to be increased, which can be impractical for multivariate signals or if the data has very long-term dependencies. The solution is to use a model that incorporates temporal coherence, performs temporal pooling, or models sequences of hidden unit activations.”
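A sketch of the trade-off in that last quote, assuming a hypothetical 64-channel recording and a random projection standing in for learned features: covering 100 steps by widening the raw input window makes each input 6400-dimensional, whereas a 10-step window (640 inputs) followed by temporal max pooling over 10 consecutive windows summarizes the same 100 steps with a much smaller model input.

```python
import numpy as np


def frames(X, width, stride):
    """Stack `width` consecutive rows of X (T, d) into vectors of length width * d."""
    T, _ = X.shape
    return np.stack([X[s:s + width].ravel() for s in range(0, T - width + 1, stride)])


def temporal_max_pool(H, pool):
    """Max-pool features H (N, k) over non-overlapping groups of `pool` consecutive rows."""
    N, k = H.shape
    N_trim = (N // pool) * pool           # drop a ragged tail, if any
    return H[:N_trim].reshape(N_trim // pool, pool, k).max(axis=1)


rng = np.random.default_rng(0)
n_channels = 64                            # hypothetical multichannel recording
X = rng.normal(size=(1000, n_channels))

# Option 1: widen the raw input window to cover 100 steps.
wide = frames(X, width=100, stride=100)    # 10 examples, 100 * 64 = 6400 inputs each

# Option 2: 10-step windows, per-window features, then pool over 10 windows.
narrow = frames(X, width=10, stride=10)    # 100 windows, 640 inputs each
W_feat = 0.01 * rng.normal(size=(narrow.shape[1], 32))  # stand-in for learned weights
H = np.tanh(narrow @ W_feat)               # (100, 32) features, one row per window
pooled = temporal_max_pool(H, pool=10)     # (10, 32): each row spans 100 steps
```

Stacking another window-plus-pooling layer on top of `pooled` would extend the span again (to 1000 steps here) without ever feeding the model a 64,000-dimensional input.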
