Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. Chung, Gulcehre, Cho, Bengio. arXiv 2014

  1. Compares different types of recurrent units in RNNs
  2. Compares LSTMs and the newer Gated Recurrent Unit (GRU)
  3. Tested on polyphonic music and speech signal modelling
  4. These units are found to be better than “more traditional recurrent units such as tanh units”
  5. GRU is found to be comparable to LSTM, although GRU is a bit better
  6. Of all the impressive recent work with RNNs (including everything that works on variable-size inputs), none of it uses vanilla RNNs
  7. Vanilla RNNs are hard to use because of both exploding and vanishing gradients
    1. Discussed many of the points related to this here
  8. GRUs are somewhat similar to LSTM although the model is a bit simpler
    1. Both can capture long-term dependencies
  9. GRU doesn’t have a separate memory cell like LSTM does
    1. Doesn’t have a mechanism to protect memory like LSTM (no output gate, so the full state is exposed at every step)
  10. shot
  11. Calculating the activation with GRU is simpler as well
  12. Both LSTM and GRU update their state additively (keep the existing content and add new content on top) as opposed to completely recomputing values at each step
  13. “This additive nature has two advantages. First, it is easy for each unit to remember the existence of a specific feature in the input stream for a long series of steps. Any important feature, decided by either the forget gate of the LSTM unit or the update gate of the GRU, will not be overwritten but be maintained as it is. Second, and perhaps more importantly, this addition effectively creates shortcut paths that bypass multiple temporal steps. These shortcuts allow the error to be back-propagated easily without too quickly vanishing (if the gating unit is nearly saturated at 1) as a result of passing through multiple, bounded nonlinearities,”
  14. “Another difference is in the location of the input gate, or the corresponding reset gate. The LSTM unit computes the new memory content without any separate control of the amount of information flowing from the previous time step. Rather, the LSTM unit controls the amount of the new memory content being added to the memory cell independently from the forget gate. On the other hand, the GRU controls the information flow from the previous activation when computing the new, candidate activation, but does not independently control the amount of the candidate activation being added (the control is tied via the update gate).”
  15. When comparing gated to vanilla RNNs: “Convergence is often faster, and the final solutions tend to be better.”
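
To make the additive update and the gate-placement difference from the notes above concrete, here is a minimal NumPy sketch of one time step of each unit. This is not the authors' code: it assumes the standard formulations without biases or peephole connections, and the parameter names (Wz, Uz, and so on) and the init_params helper are made up for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, p):
    """One GRU step (no biases; parameter names are illustrative)."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev)  # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev)  # reset gate
    # The reset gate filters the previous state *before* the candidate is computed.
    h_cand = np.tanh(p["W"] @ x + p["U"] @ (r * h_prev))
    # One gate (z) ties together how much old state is kept and how much
    # candidate is added: an interpolation, not a full recomputation.
    return (1.0 - z) * h_prev + z * h_cand

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step (no biases, no peepholes; an assumed standard variant)."""
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev)  # input gate
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev)  # forget gate
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev)  # output gate
    # The candidate sees the previous output with no separate gating.
    c_cand = np.tanh(p["Wc"] @ x + p["Uc"] @ h_prev)
    # Forget and input gates are independent; the memory cell is updated additively.
    c = f * c_prev + i * c_cand
    # The output gate controls how much of the memory cell is exposed.
    h = o * np.tanh(c)
    return h, c

def init_params(suffixes, n_in, n_hid, rng):
    """Hypothetical helper: small random W* (input) and U* (recurrent) matrices."""
    p = {}
    for s in suffixes:
        p["W" + s] = rng.normal(scale=0.1, size=(n_hid, n_in))
        p["U" + s] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    return p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_hid = 3, 4
    gru_p = init_params(["z", "r", ""], n_in, n_hid, rng)
    lstm_p = init_params(["i", "f", "o", "c"], n_in, n_hid, rng)

    h_gru = np.zeros(n_hid)
    h_lstm, c_lstm = np.zeros(n_hid), np.zeros(n_hid)
    for _ in range(5):  # run both units over a short random sequence
        x = rng.normal(size=n_in)
        h_gru = gru_step(x, h_gru, gru_p)
        h_lstm, c_lstm = lstm_step(x, h_lstm, c_lstm, lstm_p)
    print(h_gru.shape, h_lstm.shape, c_lstm.shape)
```

In the GRU, when z is close to 0 the previous state passes through almost unchanged; in the LSTM, the same happens to the memory cell when f is close to 1 and i is close to 0. Those near-identity paths are the "shortcuts" that the quote in note 13 refers to.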
