An Empirical Exploration of Recurrent Network Architectures. Jozefowicz, Zaremba, Sutskever


  1. Vanilla RNNs are usually difficult to train; LSTMs are a form of RNN that is easier to train
  2. LSTMs, though, have an architecture that “appears to be ad-hoc so it is not clear if it is optimal, and the significance of its individual components is unclear.”
  3. The authors tested thousands of models with different LSTM-based architectures, and also compared them against the (newer) Gated Recurrent Unit (GRU)
  4. “We found that adding a bias of 1 to the LSTM’s forget gate closes the gap between the LSTM and the GRU.”
  5. RNNs suffer from exploding/vanishing gradients (the latter was addressed successfully in LSTMs)
    1. There are many other ways to address the vanishing gradient, such as regularization, second-order optimization, “giving up on learning the recurrent weights altogether”, and careful weight initialization
  6. Exploding gradients were easier to address, using “a hard constraint over the norm of the gradient”
    1. Later referred to as “gradient clipping” (a minimal sketch follows this item)
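A minimal sketch of what such a norm constraint can look like, in plain NumPy; the threshold of 5 is an arbitrary choice for illustration, not a value from the paper:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their combined L2 norm
    does not exceed max_norm (the 'hard constraint' mentioned above)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# A huge gradient gets rescaled to norm 5, a small one is left untouched.
big, small = np.full(10, 100.0), np.full(10, 0.01)
print(np.linalg.norm(clip_by_global_norm([big])[0]))    # -> 5.0
print(np.linalg.norm(clip_by_global_norm([small])[0]))  # -> ~0.0316
```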
  7. “We discovered that the input gate is important, that the output gate is unimportant, and that the forget gate is extremely significant on all problems except language modelling. This is consistent with Mikolov et al. (2014), who showed that a standard RNN with a hard-coded integrator unit (similar to an LSTM without a forget gate) can match the LSTM on language modeling.”
  8. exploding/vanishing gradients “are caused by the RNN’s iterative nature, whose gradient is essentially equal to the recurrent weight matrix raised to a high power. These iterated matrix powers cause the gradient to grow or to shrink at a rate that is exponential in the number of timesteps.”
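A quick NumPy illustration of the quoted point: repeatedly multiplying by the recurrent weight matrix makes the gradient’s norm shrink or grow exponentially, depending on whether the matrix’s spectral radius is below or above 1. The 0.9/1.1 scaling and the 50-step horizon are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 32
# Random matrix with entries ~N(0, 1/hidden): spectral radius is roughly 1.
base = rng.standard_normal((hidden, hidden)) / np.sqrt(hidden)

for scale in (0.9, 1.1):            # spectral radius below / above 1
    W = scale * base
    grad = np.eye(hidden)
    norms = []
    for t in range(50):             # backprop through 50 timesteps
        grad = grad @ W.T           # each step multiplies by the recurrent matrix
        norms.append(np.linalg.norm(grad))
    print(f"scale={scale}: norm after 1 step {norms[0]:.3f}, after 50 steps {norms[-1]:.2e}")

# The 0.9 case shrinks toward 0 (vanishing), the 1.1 case blows up (exploding).
```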
  9. The vanishing-gradient issue makes it easy for RNNs to learn short-term interactions but not long-term ones
  10. Through its reparameterization, the LSTM cannot have a gradient that vanishes through its memory cell
  11. Basically, instead of recomputing the state from scratch at each timestep, it only computes a delta which is added to the previous memory-cell state (see the sketch after this item)
    1. The network has additional gating machinery to do so
    2. There are many LSTM variants
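A minimal NumPy sketch of one LSTM step, just to make the additive update explicit; the gate ordering and variable names are a common convention, not the paper’s notation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, X), U: (4H, H), b: (4H,).
    Gate order here is [input, forget, candidate, output]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    g = np.tanh(z[2*H:3*H])      # candidate ("delta") content
    o = sigmoid(z[3*H:4*H])      # output gate
    # Additive update: the new cell state is the old state (gated by f)
    # plus an increment i*g -- it is never recomputed from scratch.
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Tiny usage example with random parameters.
H, X = 8, 4
rng = np.random.default_rng(1)
W, U, b = rng.standard_normal((4*H, X)), rng.standard_normal((4*H, H)), np.zeros(4*H)
h, c = lstm_step(rng.standard_normal(X), np.zeros(H), np.zeros(H), W, U, b)
```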
  12. Random initialization of the forget gate’s bias leaves the gate at some fractional value, which reintroduces a vanishing gradient.
    1. This is commonly overlooked, but initializing the bias to a “large value” such as 1 or 2 prevents the gradient from vanishing over time (sketch below)
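A sketch of the forget-gate-bias trick, assuming PyTorch’s nn.LSTM, whose bias vectors are laid out in [input, forget, cell, output] chunks; note that PyTorch sums bias_ih and bias_hh, so only one chunk carries the offset here:

```python
import torch
import torch.nn as nn

def set_forget_bias(lstm: nn.LSTM, value: float = 1.0):
    """Set the effective forget-gate bias of every layer to `value`,
    assuming PyTorch's [input, forget, cell, output] gate ordering."""
    H = lstm.hidden_size
    with torch.no_grad():
        for name, param in lstm.named_parameters():
            if name.startswith("bias_ih"):
                param[H:2 * H].fill_(value)   # forget-gate chunk
            elif name.startswith("bias_hh"):
                param[H:2 * H].fill_(0.0)     # avoid double-counting the offset

lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2)
set_forget_bias(lstm, 1.0)   # the paper's recommendation: a bias of 1
```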
  13. The authors used a genetic-algorithm-style search to optimize the architecture and hyperparameters (a toy illustration follows this item)
  14. They evaluated 10,000 architectures; 1,000 made it past the first task (which allowed them to compete genetically). A total of 230,000 hyperparameter configurations were tested
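For flavour only, a toy mutation-based search over a few hyperparameters. This is not the paper’s actual procedure (which also mutated the architecture itself), and `evaluate` is a hypothetical stand-in for training and scoring a model:

```python
import random

# Hypothetical search space; values chosen for illustration.
SPACE = {
    "lr":          [0.01, 0.03, 0.1, 0.3, 1.0],
    "init_scale":  [0.02, 0.05, 0.1, 0.2],
    "forget_bias": [0.0, 1.0, 2.0],
}

def mutate(cfg):
    """Copy the config and resample one randomly chosen hyperparameter."""
    key = random.choice(list(SPACE))
    new = dict(cfg)
    new[key] = random.choice(SPACE[key])
    return new

def evaluate(cfg):
    """Placeholder objective (lower is better); a real search would train a model."""
    return (cfg["lr"] - 0.1) ** 2 + (cfg["forget_bias"] - 1.0) ** 2

best = {k: random.choice(v) for k, v in SPACE.items()}
best_score = evaluate(best)
for _ in range(200):
    cand = mutate(best)
    score = evaluate(cand)
    if score < best_score:             # keep mutations that help
        best, best_score = cand, score
print(best, best_score)
```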
  15. Three problems were tested:
    1. Arithmetic: the network reads a string containing numbers joined by an add or subtract symbol and must emit the result; distractor symbols in the string have to be ignored (a toy data generator is sketched after this list)
    2. Completion of a randomly generated XML dataset
    3. Penn Tree-Bank (word-level language modelling)
    4. There was also an extra task to test generalization <validation?>
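A toy generator roughly in the spirit of the arithmetic task as described above; the digit counts, distractor alphabet, and string format are guesses for illustration, not the paper’s exact specification:

```python
import random
import string

def make_arithmetic_example(max_digits=4, n_distractors=5):
    """Build an input string with two numbers, a '+' or '-', and random
    distractor letters sprinkled in; the target is the numeric result."""
    a = random.randint(0, 10 ** max_digits - 1)
    b = random.randint(0, 10 ** max_digits - 1)
    op = random.choice("+-")
    target = a + b if op == "+" else a - b
    chars = list(f"{a}{op}{b}")
    for _ in range(n_distractors):                 # distractors the model must ignore
        pos = random.randint(0, len(chars))
        chars.insert(pos, random.choice(string.ascii_lowercase))
    return "".join(chars), str(target)

print(make_arithmetic_example())   # e.g. ('4x71+qk2h0u3', '674'); output varies
```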
  16. The RNNs were “unrolled” for 35 timesteps and trained with a minibatch size of 20
  17. There was a schedule for decaying the learning rate once learning stopped improving at the initial value (sketch below)
    1. <nightmare>
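A generic “decay when validation stops improving” schedule, sketched as a guess at what such a schedule looks like; the starting rate, decay factor, and patience are placeholders, not the paper’s settings:

```python
def lr_schedule(val_losses, lr0=1.0, decay=0.5, patience=1):
    """Return the learning rate to use after seeing `val_losses` so far:
    start at lr0 and multiply by `decay` each time the validation loss
    fails to improve for `patience` consecutive epochs."""
    lr, best, bad = lr0, float("inf"), 0
    for loss in val_losses:
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                lr *= decay
                bad = 0
    return lr

print(lr_schedule([5.2, 4.8, 4.9, 4.7]))  # one decay so far -> 0.5
```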
  18. “Though there were architectures that outperformed the LSTM on some problems, we were unable to find an architecture that consistently beat the LSTM and the GRU in all experimental conditions.”
  19. “Importantly, adding a bias of size 1 significantly improved the performance of the LSTM on tasks where it fell behind the GRU and MUT1. Thus we recommend adding a bias of 1 to the forget gate of every LSTM in every application”