  1. Leaky ReLU
  2. Results

1. Leaky ReLU

1.1. Tanh

  • With a hyperbolic tangent (tanh) nonlinearity, the activation of the i-th hidden unit is h(i) = σ(w(i)ᵀ x),
  • where σ() is the tanh function, w(i) is the weight vector for the i-th hidden unit, and x is the input.
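As a rough illustration of the tanh unit described above, the following minimal sketch computes the activation of one hidden unit and the derivative of tanh. The function names `tanh_unit` and `tanh_grad` and the example values are assumptions made for illustration only; they do not come from the paper.

```python
import math

def tanh_unit(w, x):
    """Activation of one hidden unit: tanh of the dot product of w and x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return math.tanh(z)

def tanh_grad(z):
    """Derivative of tanh at pre-activation z: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2

# Hypothetical weights and input, chosen so the pre-activation is exactly 0.
w = [0.5, -0.25]
x = [1.0, 2.0]

print(tanh_unit(w, x))   # tanh(0) = 0.0
print(tanh_grad(0.0))    # gradient is largest (1.0) at z = 0
print(tanh_grad(5.0))    # gradient is close to 0 once the unit saturates
```

The last line shows the saturation behaviour that motivates rectifier units: for large |z| the tanh derivative shrinks toward zero.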

1.2. ReLU

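  • With a Rectified Linear Unit (ReLU), the i-th hidden unit computes h(i) = max(w(i)ᵀ x, 0): it outputs w(i)ᵀ x when that value is positive and 0 otherwise.
  • When the unit is active (its input is above 0), its partial derivative is 1, so the gradient does not vanish along paths of active units.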
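The ReLU and its derivative can be sketched as below. This is a minimal scalar version under assumed names (`relu`, `relu_grad`); taking the derivative to be 0 at exactly z = 0 is a common convention adopted here, not something the text specifies.

```python
def relu(z):
    """Rectified Linear Unit: pass positive inputs through, clamp the rest to 0."""
    return z if z > 0 else 0.0

def relu_grad(z):
    """Derivative of ReLU: 1 for active units, 0 otherwise (0 at z == 0 by convention)."""
    return 1.0 if z > 0 else 0.0

print(relu(2.5))    # 2.5
print(relu(-3.0))   # 0.0

# Chain the derivative through 10 active units: the product stays at 1.
grad = 1.0
for _ in range(10):
    grad *= relu_grad(2.0)
print(grad)         # 1.0
```

The loop illustrates why active rectifier units do not shrink the gradient with depth, while an inactive unit passes no gradient at all.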

1.3. Leaky ReLU

  • Leaky ReLU allows a small, non-zero gradient when the unit is saturated and not active: h(i) = w(i)ᵀ x if w(i)ᵀ x > 0, and 0.01 w(i)ᵀ x otherwise.
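A minimal sketch of Leaky ReLU and its derivative follows. The names `leaky_relu`, `leaky_relu_grad`, and `alpha` are illustrative assumptions; the default slope of 0.01 is the value used in the cited paper, and other small values are possible.

```python
def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: identity for positive inputs, small slope alpha otherwise."""
    return z if z > 0 else alpha * z

def leaky_relu_grad(z, alpha=0.01):
    """Derivative of Leaky ReLU: 1 when active, alpha (not 0) when inactive."""
    return 1.0 if z > 0 else alpha

print(leaky_relu(3.0))        # 3.0
print(leaky_relu(-2.0))       # a small negative output instead of 0
print(leaky_relu_grad(-2.0))  # 0.01
```

Unlike the plain ReLU, an inactive unit here still receives a small gradient, so it can recover during training.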

2. Results

Results for DNN systems in terms of frame-wise error metrics on the development set as well as word error rates (%) on the Hub5 2000 evaluation sets.
  • LVCSR experiments are performed on the 300-hour Switchboard conversational telephone speech corpus (LDC97S62).
  • DNNs with 2, 3, and 4 hidden layers are trained for all nonlinearity types.
  • The output layer is a standard softmax classifier, and cross entropy with no regularization serves as the loss function.
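The output layer and loss described above can be sketched for a single frame as follows. This is only an illustration: the function names and the example logits are hypothetical, and the sketch does not reproduce the paper's network or data.

```python
import math

def softmax(logits):
    """Softmax over a list of class scores, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Cross-entropy loss for one example: negative log-probability of the true class."""
    return -math.log(probs[target_index])

# Hypothetical class scores for one frame; class 0 is taken as the true label.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = cross_entropy(probs, 0)

print(probs)  # probabilities that sum to 1
print(loss)   # plain cross entropy, with no regularization term added
```

Consistent with the setup above, the loss has no weight penalty or other regularizer; in training it would be averaged over all frames.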

Leaky ReLU was later adopted in many other domains.


[2013 ICML] [Leaky ReLU]
Rectifier Nonlinearities Improve Neural Network Acoustic Models

