Brief Review — Rectifier Nonlinearities Improve Neural Network Acoustic Models
- Leaky ReLU, with small negative values as output when input is smaller than 0.
- This is a paper from Andrew Ng's research group.
- Leaky ReLU
1. Leaky ReLU
1.1. Tanh
- A hidden unit with the hyperbolic tangent (tanh) nonlinearity computes h(i) = σ(w(i)ᵀx), where σ(·) is the tanh function, w(i) is the weight vector for the i-th hidden unit, and x is the input.
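However, tanh can suffer from the vanishing gradient problem: its derivative approaches 0 as the unit saturates, so gradients shrink as they are propagated back through the layers.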
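To make the saturation argument concrete, here is a minimal sketch in plain Python. It only uses the standard identity d/dz tanh(z) = 1 - tanh(z)²; the helper name `tanh_grad` and the sample inputs are my own choices for illustration, not something from the paper.

```python
import math

def tanh_grad(z):
    """Derivative of tanh at pre-activation z: 1 - tanh(z)^2."""
    t = math.tanh(z)
    return 1.0 - t * t

# The gradient is largest at z = 0 and shrinks quickly as |z| grows.
for z in (0.0, 1.0, 2.0, 5.0):
    print(f"z={z:4.1f}  tanh={math.tanh(z):.4f}  grad={tanh_grad(z):.6f}")

# Backpropagating through several saturated units multiplies these
# small factors together, which is the vanishing gradient effect.
chained = tanh_grad(2.0) ** 4
print(f"gradient factor through 4 units at z=2: {chained:.2e}")
```

The chained factor is only a rough illustration: a real network also multiplies by the weights at each layer.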
1.2. ReLU
- The Rectified Linear Unit (ReLU) is defined as h(i) = max(w(i)ᵀx, 0).
- When the unit is active (output above 0), its partial derivative is 1, so gradients do not vanish along active units.
However, the gradient is 0 whenever the unit is not active, so we might expect learning to be slow in that case.
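The point about inactive units can be sketched with a single ReLU unit. The helper names (`relu`, `relu_grad`, `weight_grad`) and the toy weights and inputs below are assumptions for illustration only; the upstream gradient is simply fixed to 1.

```python
def relu(z):
    """ReLU activation: max(z, 0)."""
    return max(z, 0.0)

def relu_grad(z):
    """Derivative of ReLU: 1 when the unit is active, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def weight_grad(w, x, upstream=1.0):
    """Gradient of the unit's output w.r.t. each weight, via the chain rule."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # pre-activation w^T x
    return [upstream * relu_grad(z) * xi for xi in x]

x = [2.0, 1.0]
active = weight_grad([1.0, 0.5], x)     # w^T x = 2.5 > 0, gradient flows
inactive = weight_grad([-1.0, 0.5], x)  # w^T x = -1.5 < 0, gradient is zero
print(active, inactive)
```

With a zero gradient, a gradient-based update leaves the weights of the inactive unit unchanged for this input.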
1.3. Leaky ReLU
- Leaky ReLU allows for a small, non-zero gradient when the unit is saturated and not active: h(i) = w(i)ᵀx if w(i)ᵀx > 0, and 0.01·w(i)ᵀx otherwise.
Thus, we might expect learning to be faster than with ReLU.
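A minimal sketch of the same single-unit setting with Leaky ReLU is below. The negative slope `alpha=0.01` is the value I recall from the paper, so treat it as an assumption; the helper names and toy numbers are hypothetical.

```python
def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: identity for z > 0, small slope alpha otherwise."""
    return z if z > 0 else alpha * z

def leaky_relu_grad(z, alpha=0.01):
    """Derivative of Leaky ReLU: 1 when active, alpha when not active."""
    return 1.0 if z > 0 else alpha

def weight_grad(w, x, alpha=0.01, upstream=1.0):
    """Gradient of the unit's output w.r.t. each weight, via the chain rule."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # pre-activation w^T x
    return [upstream * leaky_relu_grad(z, alpha) * xi for xi in x]

w, x = [-1.0, 0.5], [2.0, 1.0]            # w^T x = -1.5, so the unit is not active
leaky = weight_grad(w, x)                  # small but non-zero gradient
relu_like = weight_grad(w, x, alpha=0.0)   # alpha = 0 recovers plain ReLU: zero gradient
print(leaky, relu_like)
```

Setting `alpha=0.0` makes the comparison with ReLU explicit: the only difference is whether an inactive unit still receives a (small) weight update.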
- Large vocabulary continuous speech recognition (LVCSR) experiments are performed on the 300-hour Switchboard conversational telephone speech corpus (LDC97S62).
- DNNs with 2, 3, and 4 hidden layers are trained for all nonlinearity types.
- The output layer is a standard softmax classifier, and cross entropy with no regularization serves as the loss function.
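For readers less familiar with the output layer described above, here is a minimal sketch of a softmax classifier with a cross-entropy loss and no regularization term. It is not the paper's training code; the function names and the toy logits are assumptions for illustration.

```python
import math

def softmax(logits):
    """Softmax over a list of logits, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy loss for one example: -log p(target), no regularization."""
    probs = softmax(logits)
    return -math.log(probs[target])

logits = [2.0, 0.5, -1.0]  # hypothetical output-layer pre-activations for 3 classes
probs = softmax(logits)
loss_correct = cross_entropy(logits, 0)  # target is the highest-scoring class
loss_wrong = cross_entropy(logits, 2)    # target is the lowest-scoring class
print(probs, loss_correct, loss_wrong)
```

The loss is lower when the target class has the largest logit, which is what training pushes towards.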
Leaky ReLU was later used in many other domains.
[2013 ICML] [Leaky ReLU]
Rectifier Nonlinearities Improve Neural Network Acoustic Models