Deep Learning. LeCun, Bengio, Hinton. Nature 2015

  1. “Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.”
  2. Previous machine learning methods traditionally relied on significant hand-engineering to process data into something the real learning algorithm could use
  3. “Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned. For classification tasks, higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations.”
  4. Deep learning has enabled breakthroughs in many different areas, including speech recognition, visual object recognition, object detection, drug discovery and genomics
  5. “We think that deep learning will have many more successes in the near future because it requires very little engineering by hand, so it can easily take advantage of increases in the amount of available computation and data.”
    1. <Really? They have a very different definition of “very little” than I do>
  6. “In practice, most practitioners use a procedure called stochastic gradient descent (SGD).”
  7. Visual classifiers have to learn to be invariant to many things, like background, shading, contrast, orientation and zoom, but they also have to be very sensitive to other things (for example, to distinguish a German shepherd from a wolf)
  8. “As long as the modules are relatively smooth functions of their inputs and of their internal weights, one can compute gradients using the backpropagation procedure.”
    1. Which is just an application of the chain rule for derivatives
  9. ReLUs typically learn much faster than smoother non-linearities in networks with many layers, and can remove the need for unsupervised pretraining
  10. Theoretical and empirical results suggest that NNs rarely get stuck in poor local minima (especially large networks)
  11. Interest in deep NNs was revived around 2006, when unsupervised pretraining was done by having each layer model the activity of the layer below
  12. The first major application of deep nets was speech recognition in 2009; by 2012 they were doing speech recognition on Android
  13. For small datasets, unsupervised pretraining is helpful
  14. ConvNets for vision
  15. “There are four key ideas behind ConvNets that take advantage of the properties of natural signals: local connections, shared weights, pooling and the use of many layers.”
  16. “Recent ConvNet architectures have 10 to 20 layers of ReLUs, hundreds of millions of weights, and billions of connections between units. Whereas training such large networks could have taken weeks only two years ago, progress in hardware, software and algorithm parallelization have reduced training times to a few hours.”
  17. “The issue of representation lies at the heart of the debate between the logic-inspired and the neural-network-inspired paradigms for cognition. In the logic-inspired paradigm, an instance of a symbol is something for which the only property is that it is either identical or non-identical to other symbol instances. It has no internal structure that is relevant to its use; and to reason with symbols, they must be bound to the variables in judiciously chosen rules of inference. By contrast, neural networks just use big activity vectors, big weight matrices and scalar non-linearities to perform the type of fast ‘intuitive’ inference that underpins effortless commonsense reasoning.”
  18. Machine translation and RNNs
  19. Regular RNNs struggle to learn long-term dependencies; LSTMs fix the major problems by adding an explicit memory
  20. “Over the past year, several authors have made different proposals to augment RNNs with a memory module. Proposals include the Neural Turing Machine in which the network is augmented by a ‘tape-like’ memory that the RNN can choose to read from or write to [88], and memory networks, in which a regular network is augmented by a kind of associative memory [89]. Memory networks have yielded excellent performance on standard question-answering benchmarks. The memory is used to remember the story about which the network is later asked to answer questions. Beyond simple memorization, neural Turing machines and memory networks are being used for tasks that would normally require reasoning and symbol manipulation. Neural Turing machines can be taught ‘algorithms’. Among other things, they can learn to output a sorted list of symbols when their input consists of an unsorted sequence in which each symbol is accompanied by a real value that indicates its priority in the list [88]. Memory networks can be trained to keep track of the state of the world in a setting similar to a text adventure game and after reading a story, they can answer questions that require complex inference [90]. In one test example, the network is shown a 15-sentence version of The Lord of the Rings and correctly answers questions such as ‘where is Frodo now?’ [89].”
  21. Although the focus now is mainly on supervised learning, the authors expect that unsupervised learning will become far more important in the long term
  22. “Systems combining deep learning and reinforcement learning are in their infancy, but they already outperform passive vision systems [99] at classification tasks and produce impressive results in learning to play many different video games [100].”
  23. “Natural language understanding is another area in which deep learning is poised to make a large impact over the next few years. We expect systems that use RNNs to understand sentences or whole documents will become much better when they learn strategies for selectively attending to one part at a time.”
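
To make points 6, 8 and 9 concrete, here is a minimal sketch (mine, not from the paper) of backpropagation as the chain rule applied to two composed modules, with a ReLU in between, trained by SGD. The toy network, the function names (`forward`, `loss_and_grads`, `numeric_grads`, `sgd`, `mean_loss`), the data and the learning rate are all assumptions chosen for illustration. It uses plain Python scalars so every chain-rule step is visible.

```python
import random


def forward(params, x):
    """Two composed modules: h = relu(w1*x + b1), then y = w2*h + b2."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1
    h = z if z > 0.0 else 0.0  # ReLU non-linearity
    y = w2 * h + b2
    return z, h, y


def loss_and_grads(params, x, target):
    """Squared-error loss and its gradient, computed with the chain rule."""
    w1, b1, w2, b2 = params
    z, h, y = forward(params, x)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                       # dL/dy
    dw2 = dy * h                          # dL/dw2 = dL/dy * dy/dw2
    db2 = dy                              # dL/db2 = dL/dy * 1
    dh = dy * w2                          # pass the gradient back through module 2
    dz = dh * (1.0 if z > 0.0 else 0.0)   # ReLU derivative is 0 or 1
    dw1 = dz * x                          # dL/dw1 = dL/dz * dz/dw1
    db1 = dz
    return loss, [dw1, db1, dw2, db2]


def numeric_grads(params, x, target, eps=1e-6):
    """Central finite differences, used only to sanity-check the backprop result."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        loss_up = 0.5 * (forward(up, x)[2] - target) ** 2
        loss_down = 0.5 * (forward(down, x)[2] - target) ** 2
        grads.append((loss_up - loss_down) / (2 * eps))
    return grads


def mean_loss(params, data):
    """Average squared-error loss over a list of (x, target) pairs."""
    return sum(loss_and_grads(params, x, t)[0] for x, t in data) / len(data)


def sgd(params, data, lr=0.02, steps=2000, seed=0):
    """SGD: each step uses the gradient from one randomly chosen example."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(steps):
        x, target = rng.choice(data)
        _, grads = loss_and_grads(params, x, target)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


if __name__ == "__main__":
    # Toy data (an assumption for this sketch): target = 2 * x.
    data = [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0), (2.0, 4.0)]
    start = [0.5, 0.1, 0.5, 0.0]
    trained = sgd(start, data)
    print("mean loss before:", mean_loss(start, data))
    print("mean loss after: ", mean_loss(trained, data))
```

Comparing `loss_and_grads` against `numeric_grads` at a point away from the ReLU kink is a quick way to convince yourself that backprop really is just the chain rule. Real networks do the same thing with vectors and matrices in place of these scalars.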
