A Review of Unsupervised Feature Learning and Deep Learning for Time-Series Modeling. Langkvist, Karlsson, Loutfi. Pattern Recognition Letters 2014.


  1. Consider continuous, high-dimensional, noisy time series data
  2. Also assume the data may not contain enough information to do very accurate prediction (high Bayes error)
  3. Assume nonstationarity: x(t) predicts y(t), but at a different t, the same value of x may produce a different prediction
    1. To resolve this, some form of context is required.  With the correct context the process is stationary
    2. Amount of time needed for context may be unknown, and may be very large
  4. May also be nonstationary in that summary statistics change over time.  In some cases, change in frequency is so relevant that it's better to work in the frequency domain than the time domain
    1. <Been meaning to learn about that>
  5. In vision, often things like translation and scale invariance are desired.  In time series analysis, we desire invariance to translations in time
  6. So yeah it's a gnarly problem.  Picking the right representation is key
    1. Here they consider finding a representation
  7. Discusses restricted (and conditional and gated) Boltzmann machines, although I won’t take notes because it probably isn’t what they really will use anyway
  8. Auto-encoders
  9. Basic linear auto-encoder is same as PCA
  10. Terms in the cost function include one for sparsity and one to keep weights close to 0 <listed as 2 different things, but how are they distinct?>
  11. Recurrent neural network
  12. Regularization “… prevents the trivial learning of a 1-to-1 mapping of the input to the hidden units.”
  13. RBMs don’t need regularization because the stochastic binary hidden units act as a regularizer, although it is possible to add regularization on top anyway
  14. Recurrent neural network
  15. Trained by backprop-through-time
  16. “RNNs can be seen as very deep networks with shared parameters at each layer when unfolded in time.”
  17. Deep learning
  18. Convolution, pooling
  19. Another method for dealing with temporal data, aside from simple recurrent networks, is penalizing changes in the hidden layer from one time step to the next
  20. Also mentions slow feature analysis
  21. “Temporal coherence is related to invariant feature representations since both methods want to achieve small changes in the feature representation for small changes in the input data.”
  22. Hidden Markov Models
    1. But require discrete states
    2. Limited representational capacity
    3. Not set up well to track history
  23. “The use of Long-short term memory (…) or hessian-free optimizer (…) can produce recurrent networks that has a memory of over 100 time steps.”
  24. Some models are generative, others are discriminative.  Auto-encoder isn’t generative, but “a probabilistic interpretation can be made using auto-encoder scoring (…)”
  25. <According to the table in the paper, recurrent neural networks are most appropriate for what we are considering>
  26. Discusses video data
    1. Lots of relevant work to investigate here
  27. “The use of deep learning, feature learning, and convolution with pooling has propelled the advances in video processing.”  Deep learning is natural because it is state of the art on still images, but extensions are needed to deal with the temporal aspect
  28. “The early attempts at extending deep learning algorithms to video data was done by modelling the transition between two frames.  The use of temporal pooling extends the time-dependencies a model can learn beyond a single frame transition.  However, the time-dependency that has been modeled is still just a few frames.  A possible future direction for video processing is to look at models that can learn longer time-dependencies.”
  29. Other examples <with a fair amount of space given that I’m skipping> are stock prices and music recognition
  30. Motion Capture Data
  31. Previous applications were Temporal Restricted Boltzmann Machines (TRBM), and conditional RBM (Hinton in both papers), then recurrent TRBM
  32. Mirowski and LeCun used dynamic factor graphs to fill in missing mocap data
  33. “A motivation for using deep learning algorithms for motion capture data is that it has been suggested that human motion is composed of elementary building blocks (motion templates) and any complex motion is constructed from a library of these previously learned motion primitives (Flash and Hochner, 2005).  Deep networks can, in an unsupervised manner, learn these motion templates from raw data and use them to form complex human motions.”
  34. Sections on machine olfaction and physiological data (EEG, MEG, ECG)
  35. “In order to capture long-term dependencies, the input size has to be increased, which can be impractical for multivariate signals or if the data has very long-term dependencies.  The solution is to use a model that incorporates temporal coherence, performs temporal pooling, or models sequences of hidden unit activations.”
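To make the last point concrete for myself, here is a minimal NumPy sketch, not code from the paper. It shows why widening the input window gets impractical (the input dimension grows as window width times number of channels) and what two of the suggested alternatives look like in their simplest form: temporal pooling over hidden activations, and a temporal coherence penalty on changes in the hidden layer between consecutive time steps. The function names, the random data, and the per-frame weight matrix W are all made up for illustration.

```python
import numpy as np

def sliding_windows(x, width):
    """Stack `width` consecutive frames of x (T, D) into rows of length width * D."""
    T, D = x.shape
    return np.stack([x[t:t + width].ravel() for t in range(T - width + 1)])

def temporal_max_pool(h, size):
    """Max over non-overlapping blocks of `size` time steps of h (T, K)."""
    T, K = h.shape
    n = T // size  # leftover frames at the end are dropped
    return h[:n * size].reshape(n, size, K).max(axis=1)

def temporal_coherence_penalty(h):
    """Mean squared change in hidden activations between consecutive time steps."""
    return float(np.mean(np.sum(np.diff(h, axis=0) ** 2, axis=1)))

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 8))      # 100 time steps, 8 channels (made-up data)

# Context via a wider input: 10 frames of 8 channels become 80 inputs per example.
windows = sliding_windows(x, width=10)
print(windows.shape)                   # (91, 80)

# Hypothetical per-frame features, then pooling over time instead of widening the input.
W = rng.standard_normal((8, 4)) * 0.1  # illustrative weights, not learned
h = np.tanh(x @ W)
pooled = temporal_max_pool(h, size=5)
print(pooled.shape)                    # (20, 4)

# A term like this would be added to the training cost to favor slowly changing features.
print(temporal_coherence_penalty(h))
```

The penalty is zero when the hidden representation is constant over time and grows as it changes faster, which is the sense in which it encourages small changes in the features for small changes in the input.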