Sharing — Deep Learning, 2015 Nature Review Article

By Yann LeCun, Yoshua Bengio, and Geoffrey Hinton

Sik-Ho Tsang
5 min read · Jun 15, 2023

In this story, I would like to share a 2015 Nature Review Article, Deep Learning, by Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. (The article appeared before they received the 2018 ACM A.M. Turing Award.) It has been cited over 65,000 times. (Sik-Ho Tsang @ Medium)


  • In 2015, the deep learning technologies introduced in the article were advanced. Today, they have become the basics.
  • Nevertheless, it is a very good article to cite when deep learning is mentioned.


  1. Backpropagation
  2. Convolutional Neural Network (CNN)
  3. Distributed Representations (NLP)
  4. Recurrent Neural Net (RNN)
  5. The Future of Deep Learning
  • (I have only shared it very briefly. Please feel free to read the article directly if interested.)

1. Backpropagation


1.1. Conventional Machine Learning

Since the 1960s we have known that linear classifiers can only carve their input space into very simple regions, namely half-spaces separated by a hyperplane. But real inputs vary in the position, orientation or illumination of an object, or in the pitch or accent of speech, and good features are needed to deal with such variations. Thus, shallow classifiers require a good feature extractor.
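
To make the half-space limitation concrete, here is a minimal sketch (my own illustration, not taken from the article). It searches a coarse grid of weights for a 2-D linear classifier. AND can be separated by a single hyperplane, while XOR cannot. The function names and the grid are assumptions chosen only for this example.

```python
from itertools import product

def linear_predict(w, b, x):
    """Return 1 if x lies on the positive side of the hyperplane w.x + b = 0, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def separable_on_grid(points, labels, grid):
    """Search a coarse grid of (w1, w2, b) for a hyperplane that fits every label."""
    for w1, w2, b in product(grid, repeat=3):
        if all(linear_predict((w1, w2), b, x) == y for x, y in zip(points, labels)):
            return (w1, w2, b)
    return None

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
grid = [i / 2 for i in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5 (illustrative choice)

# AND is linearly separable, so some grid point fits it.
and_solution = separable_on_grid(points, [0, 0, 0, 1], grid)

# XOR is not linearly separable for any weights, so the search finds nothing.
xor_solution = separable_on_grid(points, [0, 1, 1, 0], grid)

print("AND:", and_solution)
print("XOR:", xor_solution)
```

A hand-designed feature such as x1 * x2 would make XOR separable again, which is the sense in which shallow classifiers depend on a good feature extractor.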

1.2. Dark Age of Deep Learning

  • To make classifiers more powerful, one can use generic non-linear features, as with kernel methods, or hand-design good feature extractors. Both can be avoided if good features can be learned automatically using a general-purpose learning procedure. This is the key advantage of deep learning.
  • The backpropagation equation can be applied repeatedly to propagate gradients through all modules, starting from the output at the top and going all the way down to the input at the bottom.
  • But despite its simplicity, the solution was not widely understood until the mid-1980s.
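
The repeated application of the chain rule can be sketched as follows. This is a minimal scalar example under my own assumptions (two stacked modules, a tanh non-linearity, a squared-error loss, and arbitrary numbers), not the article's notation. The analytic gradients are compared with finite differences as a sanity check.

```python
import math

def forward(x, w1, w2):
    """Two stacked modules: h = tanh(w1 * x), then y = w2 * h."""
    z = w1 * x
    h = math.tanh(z)
    y = w2 * h
    return z, h, y

def loss_and_grads(x, w1, w2, target):
    """Backpropagate from the output module down to the first module."""
    z, h, y = forward(x, w1, w2)
    loss = 0.5 * (y - target) ** 2
    dy = y - target            # dL/dy at the output
    dw2 = dy * h               # dL/dw2
    dh = dy * w2               # gradient passed down to the hidden module
    dz = dh * (1.0 - h * h)    # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    dw1 = dz * x               # dL/dw1
    return loss, dw1, dw2

def numerical_grad(f, v, eps=1e-6):
    """Central finite difference, used only to check the analytic gradients."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)

x, w1, w2, target = 0.5, 0.8, -1.2, 0.3  # arbitrary illustrative values
loss, dw1, dw2 = loss_and_grads(x, w1, w2, target)
num_dw1 = numerical_grad(lambda v: loss_and_grads(x, v, w2, target)[0], w1)
num_dw2 = numerical_grad(lambda v: loss_and_grads(x, w1, v, target)[0], w2)

print("dw1:", dw1, "numerical:", num_dw1)
print("dw2:", dw2, "numerical:", num_dw2)
```

Each line of the backward pass only needs the gradient arriving from the module above it, which is why the same rule extends to any number of stacked modules.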

In the late 1990s, neural nets and backpropagation were largely forsaken by the machine-learning community and ignored by the computer-vision and speech-recognition communities. It was widely thought that learning useful, multistage, feature extractors with little prior knowledge was infeasible. In particular, it was commonly thought that simple gradient descent would get trapped in poor local minima.

1.3. Emergence of Deep Learning

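  • Around 2006, interest in deep feedforward networks was revived by a group of researchers brought together by the Canadian Institute for Advanced Research (CIFAR).
  • By 2009, fast GPUs that were convenient to program allowed researchers to train networks 10 or 20 times faster.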

By 2012, versions of the deep net from 2009 were being developed by many of the major speech groups.

2. Convolutional Neural Network (CNN)

2.1. Image Classification

Figure: Convolutional Neural Network (CNN)
  • There have been numerous applications of convolutional networks going back to the early 1990s, starting with time-delay neural networks for speech recognition and document reading.
  • By the late 1990s, the document reading system was reading over 10% of all the cheques in the United States.

A number of ConvNet-based optical character recognition and handwriting recognition systems were later deployed by Microsoft. ConvNets were also experimented with in the early 1990s for object detection in natural images, including faces and hands and for face recognition.

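In the 2012 ImageNet competition, deep ConvNets were applied to a data set of about a million images in 1,000 classes, and they almost halved the error rates of the best competing approaches.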

2.2. Image Captioning

Figure: From Image to Text

A recent (as of 2015) stunning demonstration combines ConvNets and recurrent net modules to generate image captions, as above.

  • The performance of ConvNets has led many large companies, such as Google, Facebook, Microsoft, IBM, Yahoo!, Twitter and Adobe, to deploy ConvNet-based image understanding products and services.
  • A number of companies such as NVIDIA, Mobileye, Intel, Qualcomm and Samsung are developing ConvNet chips to enable real-time vision applications.

3. Distributed Representations (NLP)

Figure: Word Vectors
  • Each word creates a different pattern of activations, or word vectors, as above.
  • The word vectors are composed of learned features that were not determined ahead of time by experts, but automatically discovered by the neural network. Vector representations of words learned from text are now very widely used in natural language applications.

Neural language models can associate each word with a vector of real-valued features, and semantically related words end up close to each other in that vector space.
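
What "close to each other in that vector space" means can be sketched with cosine similarity. The vectors below are hand-made toy values that I assume purely for illustration. They are not learned from text, and real word vectors have many more dimensions.

```python
import math

# Hypothetical 3-D "word vectors", invented for this example only.
toy_vectors = {
    "tuesday":   [0.9, 0.1, 0.0],
    "wednesday": [0.8, 0.2, 0.1],
    "sweden":    [0.1, 0.9, 0.2],
    "norway":    [0.0, 0.8, 0.3],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, vectors):
    """Return the other word whose vector is most similar to the given word."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("tuesday", toy_vectors))
print(nearest("sweden", toy_vectors))
```

With learned vectors the same nearest-neighbour lookup is what surfaces semantically related words, without anyone specifying the features ahead of time.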

4. Recurrent Neural Net (RNN)

Figure: Recurrent Neural Net (RNN)
  • RNNs, once unfolded in time, can be seen as very deep feedforward networks. However, it is difficult to learn to store information for very long.
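
The two points above can be sketched with a scalar RNN. This is a toy under my own assumptions (one hidden unit, tanh, arbitrary weights), not the article's model. Unfolding in time is just the loop reusing the same weights at every step, and the gradient of the last state with respect to the first input is a product of one factor per step, so it shrinks quickly for long sequences.

```python
import math

def rnn_states(inputs, w_in, w_rec, h0=0.0):
    """Scalar RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1}), same weights at every step."""
    states = []
    h = h0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def grad_last_wrt_first_input(inputs, w_in, w_rec):
    """d h_T / d x_1, computed by the chain rule through every unrolled step."""
    states = rnn_states(inputs, w_in, w_rec)
    # Local derivative of the first step with respect to x_1.
    grad = (1.0 - states[0] ** 2) * w_in
    # Every later step multiplies by d h_t / d h_{t-1} = (1 - h_t^2) * w_rec.
    for h in states[1:]:
        grad *= (1.0 - h ** 2) * w_rec
    return grad

w_in, w_rec = 1.0, 0.5  # arbitrary illustrative weights
short_seq = [1.0] + [0.0] * 4    # 5 time steps
long_seq = [1.0] + [0.0] * 29    # 30 time steps

short_grad = grad_last_wrt_first_input(short_seq, w_in, w_rec)
long_grad = grad_last_wrt_first_input(long_seq, w_in, w_rec)

print("5 steps: ", short_grad)
print("30 steps:", long_grad)
```

With 30 steps the influence of the first input on the last state is many orders of magnitude smaller than with 5 steps, which is one way to see why plain RNNs struggle to store information for very long.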

One solution is to use long short-term memory (LSTM) networks, which can remember inputs for a long time. (In 2015, the Transformer had not yet been invented.)

5. The Future of Deep Learning

  • Unsupervised learning is expected to become far more important in the longer term. Human and animal learning is largely unsupervised: we discover the structure of the world by observing it, not by being told the name of every object.
  • Human vision is an active process that sequentially samples the optic array in an intelligent, task-specific way using a small, high-resolution fovea with a large, low-resolution surround. We expect much of the future progress in vision to come from systems that are trained end-to-end and combine ConvNets with RNNs that use reinforcement learning to decide where to look.
  • In natural language understanding, systems that use RNNs to understand sentences or whole documents are expected to become much better when they learn strategies for selectively attending to one part at a time.

(The technical details are skipped here. If interested, please read the paper.)


