[Paper] Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Image Classification)

A Very Famous Regularization Approach to Prevent Co-Adaptation and Reduce Overfitting

  • The key idea is to randomly drop units (along with their connections) from the neural network during training.
  • This prevents units from co-adapting too much.


  1. Dropout
  2. Experimental Results

1. Dropout

1.1. General Idea

  • Left: In the standard network on the left, if some neurons are quite strong, the network may depend on those neurons too much, leaving the others weak and unreliable.
  • Right: An example of a thinned net produced by applying dropout to the network on the left. Crossed units have been dropped.
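A thinned net is just the original network with a random subset of units switched off, so it can be described by one keep/drop bit per unit. Because each droppable unit is either kept or dropped, a network with n such units contains 2^n possible thinned networks. The minimal sketch below samples one thinned net as a list of 0/1 masks. The toy layer sizes are hypothetical, and the retain probabilities reuse the MNIST settings quoted later in this post (0.8 for inputs, 0.5 for hidden units).

```python
import random

def sample_thinned_net(layer_sizes, retain_probs, rng):
    """Sample one thinned network as a 0/1 keep-mask per unit.

    layer_sizes[i] is the number of units in layer i, and retain_probs[i]
    is the probability p that each of those units is kept
    (1 = kept, 0 = dropped together with all of its connections).
    """
    return [
        [1 if rng.random() < p else 0 for _ in range(n)]
        for n, p in zip(layer_sizes, retain_probs)
    ]

def num_thinned_nets(layer_sizes):
    """Each droppable unit is either kept or dropped, so n units give 2**n sub-networks."""
    return 2 ** sum(layer_sizes)

rng = random.Random(0)
sizes = [4, 3]        # hypothetical toy net: 4 input units, 3 hidden units
probs = [0.8, 0.5]    # retain probabilities: 0.8 for inputs, 0.5 for hidden units
masks = sample_thinned_net(sizes, probs, rng)
print(masks)                      # a fresh mask is drawn for every training case
print(num_thinned_nets(sizes))    # 2**7 = 128 possible thinned nets
```

During training, a new mask is drawn for each training case, so the weights are effectively shared across an exponential number of thinned networks.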

1.2. Training & Testing

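  • Left: At training time, each unit is present with probability p, i.e., it is dropped with probability 1 - p.
  • Right: At test time, the unit is always present, and its outgoing weights are scaled-down versions of the trained weights.
  • If a unit is retained with probability p during training, its outgoing weights are multiplied by p at test time, as shown in the figure above. This makes the expected input that the next layer receives during training match the actual input it receives at test time.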
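To see why multiplying by p is the right correction, here is a minimal sketch using a single linear output unit (the inputs, weights, and trial count are made up for illustration). It averages many dropout training passes and compares the result with one test-time pass using weights scaled by p. For a linear unit the two agree exactly in expectation; for deep nonlinear networks the paper treats this weight scaling as an approximation to averaging all thinned networks.

```python
import random

def layer_output_train(x, w, p, rng):
    """Training pass: each input unit is kept with probability p (else dropped),
    and a single linear output unit sums the surviving w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x) if rng.random() < p)

def layer_output_test(x, w, p):
    """Test pass: every unit is present, but its outgoing weights are multiplied by p."""
    return sum((p * wi) * xi for wi, xi in zip(w, x))

rng = random.Random(42)
x = [1.0, 2.0, -1.0, 0.5]    # hypothetical input activations
w = [0.3, -0.2, 0.5, 1.0]    # hypothetical trained weights
p = 0.5                      # retain probability used during training

trials = 100_000
avg_train = sum(layer_output_train(x, w, p, rng) for _ in range(trials)) / trials
print(f"Monte Carlo mean of training outputs: {avg_train:.3f}")
print(f"Test output with weights scaled by p:  {layer_output_test(x, w, p):.3f}")
```

Many modern frameworks use the equivalent "inverted dropout" instead: they divide the kept activations by the retain probability during training, so no rescaling is needed at test time. Note that PyTorch's `torch.nn.Dropout(p)` interprets `p` as the probability of *dropping* a unit, the opposite of the paper's convention.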

2. Experimental Results

2.1. MNIST

Error Rate on MNIST
  • All dropout nets use p = 0.5 for hidden units and p = 0.8 for input units.
  • The best-performing neural network without dropout obtains a 1.6% error rate. With dropout, the error drops to 1.35%.
  • With further improvements, such as ReLU units, more neurons, a max-norm constraint, and Maxout units, the error rate drops to 0.94%.
Test error for different architectures with and without dropout.
  • The same architectures trained with and without dropout have drastically different test errors, as seen in the two separate clusters of trajectories. Dropout gives a huge improvement across all architectures.

2.2. Street View House Numbers (SVHN)

Error Rates on Street View House Numbers
  • The best architecture, a LeNet-style convolutional net, has three convolutional layers followed by two fully connected hidden layers. All hidden units were ReLUs.
  • The best-performing convolutional nets that do not use dropout achieve an error rate of 3.95%.
  • Adding dropout only to the fully connected layers reduces the error to 3.02%.
  • Adding dropout to the convolutional layers as well further reduces the error to 2.55%.
  • Even more gains can be obtained by using Maxout units.

2.3. CIFAR

Error rates on CIFAR-10 and CIFAR-100
  • The best Conv Net without dropout obtains an error rate of 14.98% on CIFAR-10.
  • Using dropout in the fully connected layers reduces that to 14.32% and adding dropout in every layer further reduces the error to 12.61%.
  • Error is further reduced to 11.68% by replacing ReLU units with Maxout units.
  • On CIFAR-100, dropout reduces the error from 43.48% to 37.20%, a huge improvement.
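The absolute drops above look very different across datasets, so it can help to express them as a relative reduction, (before - after) / before. The short sketch below does this arithmetic for the without-dropout vs. with-dropout pairs quoted in this post (the Maxout results are left out); it is simple arithmetic on the reported numbers, not an additional result from the paper.

```python
# Error rates (%) quoted in this post: (best without dropout, with dropout)
results = {
    "MNIST": (1.60, 1.35),
    "SVHN": (3.95, 2.55),
    "CIFAR-10": (14.98, 12.61),
    "CIFAR-100": (43.48, 37.20),
}

def relative_reduction(before, after):
    """Fraction of the original error removed: (before - after) / before."""
    return (before - after) / before

for name, (before, after) in results.items():
    print(f"{name:10s} {before:6.2f}% -> {after:6.2f}%  "
          f"({100 * relative_reduction(before, after):.1f}% relative reduction)")
```

Seen this way, SVHN gains the most: dropout removes roughly a third of its error, while the other three datasets each lose roughly 14 to 16% of their error.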

2.4. ImageNet

Top-1 and Top-5 error rates on ImageNet validation and test sets
  • AlexNet, a convolutional net trained with dropout, won the ILSVRC-2012 competition.
  • While the best methods based on standard vision features achieve a top-5 error rate of about 26%, convolutional nets with dropout achieve a top-5 test error of about 16%.
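Top-5 error counts a prediction as wrong only when the true class is not among the model's five highest-scoring classes, while top-1 error uses only the single best guess. The minimal sketch below computes both on hypothetical toy scores over six classes; the function name and data are illustrative, not taken from any benchmark toolkit.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest scores.

    scores: one list of per-class scores per example
    labels: the true class index of each example
    """
    misses = 0
    for s, y in zip(scores, labels):
        # Class indices sorted from highest to lowest score, keep the first k
        top_k = sorted(range(len(s)), key=lambda c: s[c], reverse=True)[:k]
        if y not in top_k:
            misses += 1
    return misses / len(labels)

# Hypothetical scores over 6 classes for 3 examples
scores = [
    [0.1, 0.9, 0.3, 0.2, 0.05, 0.4],   # true class 1 is ranked 1st
    [0.5, 0.4, 0.3, 0.2, 0.1, 0.05],   # true class 5 is ranked 6th (last)
    [0.2, 0.1, 0.3, 0.6, 0.5, 0.4],    # true class 0 is ranked 5th
]
labels = [1, 5, 0]
print(top_k_error(scores, labels, k=1))   # examples 2 and 3 miss: 2/3
print(top_k_error(scores, labels, k=5))   # only example 2 misses: 1/3
```

The third example shows why top-5 error is more forgiving: its true class is not the top guess, but it still lands within the five best.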



PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.
