Brief Review — Rectifier Nonlinearities Improve Neural Network Acoustic Models

Leaky ReLU, Converge Slightly Faster Than ReLU

Sik-Ho Tsang
3 min read · Nov 15, 2022

Rectifier Nonlinearities Improve Neural Network Acoustic Models,
Leaky ReLU, by Stanford University,
2013 ICML, Over 6000 Citations (Sik-Ho Tsang @ Medium)
Acoustic Model, Activation Function, ReLU

  • Leaky ReLU outputs a small negative value, rather than zero, when the input is smaller than 0.
  • This is a paper from Andrew Ng's research group at Stanford.

Outline

  1. Leaky ReLU
  2. Results

1. Leaky ReLU

1.1. Tanh

Tanh (Figure from https://blog.csdn.net/qq_29831163/article/details/89887655)
  • With the hyperbolic tangent (tanh) activation shown above, each hidden unit computes h(i) = σ(w(i)ᵀ x),
  • where σ(·) is the tanh function, w(i) is the weight vector for the i-th hidden unit, and x is the input.

However, tanh can suffer from the vanishing gradient problem, since its gradient approaches 0 as the unit saturates.
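A minimal NumPy sketch (illustrative only, with made-up weights and inputs) of a tanh hidden unit, showing how the gradient 1 − tanh(z)² shrinks toward 0 as the pre-activation grows in magnitude:

```python
import numpy as np

# One hidden unit with tanh activation: h = tanh(w.T @ x)
rng = np.random.default_rng(0)
w = rng.normal(size=4)   # weight vector for the hidden unit (illustrative)
x = rng.normal(size=4)   # input vector

z = w @ x                # pre-activation w(i)ᵀ x
h = np.tanh(z)           # hidden unit output

# d(tanh)/dz = 1 - tanh(z)^2 approaches 0 as |z| grows,
# which is the vanishing gradient problem for saturated units.
for pre in [0.0, 2.0, 5.0, 10.0]:
    print(f"z={pre:5.1f}  tanh={np.tanh(pre):.4f}  grad={1 - np.tanh(pre)**2:.6f}")
```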

1.2. ReLU

ReLU (Figure from https://blog.csdn.net/qq_29831163/article/details/89887655)
  • The Rectified Linear Unit (ReLU) is as shown above: h(i) = max(w(i)ᵀ x, 0).
  • When the pre-activation is above 0, the partial derivative is 1, so gradients do not vanish along active units.

However, we might expect learning to be slow whenever the unit is not active, since the gradient there is exactly 0.
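A short NumPy sketch of ReLU and its gradient (illustrative only), showing that inactive units receive no learning signal:

```python
import numpy as np

def relu(z):
    """ReLU: max(z, 0), applied element-wise."""
    return np.maximum(z, 0.0)

def relu_grad(z):
    """Derivative of ReLU w.r.t. its pre-activation: 1 if z > 0, else 0."""
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 1. 1.]  -> zero gradient for inactive units
```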

1.3. Leaky ReLU

Leaky ReLU (Figure Modified from https://blog.csdn.net/qq_29831163/article/details/89887655)
  • Leaky ReLU allows for a small, non-zero gradient when the unit is saturated and not active: h(i) = w(i)ᵀ x if w(i)ᵀ x > 0, and 0.01·w(i)ᵀ x otherwise.

Thus, we might expect learning to be faster.
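A matching NumPy sketch of Leaky ReLU with the paper's 0.01 slope, showing that inactive units still get a small gradient:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: z if z > 0, else alpha * z (the paper uses alpha = 0.01)."""
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    """Gradient: 1 for active units, alpha (small but non-zero) otherwise."""
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(leaky_relu(z))       # [-0.02  -0.005  0.5  2. ]
print(leaky_relu_grad(z))  # [0.01 0.01 1.   1.  ]  -> inactive units still receive a gradient
```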

2. Results

Results for DNN systems in terms of frame-wise error metrics on the development set as well as word error rates (%) on the Hub5 2000 evaluation sets.
  • LVCSR experiments are performed on the 300 hour Switchboard conversational telephone speech corpus (LDC97S62).
  • DNNs with 2, 3, and 4 hidden layers are trained for all nonlinearity types.
  • The output layer is a standard softmax classifier, and cross entropy with no regularization serves as the loss function (a minimal sketch of this setup follows below).
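As a rough illustration of this setup (not the paper's original implementation), here is a minimal PyTorch sketch of such a feed-forward acoustic-model DNN with a softmax/cross-entropy output; the layer sizes, input dimension, and number of output states are placeholders, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

def make_dnn(input_dim, hidden_dim, num_states, num_hidden_layers=3, leaky=True):
    """Feed-forward DNN: several hidden layers with (Leaky) ReLU,
    followed by a linear layer whose logits feed a softmax classifier.
    All dimensions here are placeholders, not the paper's settings."""
    act = nn.LeakyReLU(0.01) if leaky else nn.ReLU()
    layers, dim = [], input_dim
    for _ in range(num_hidden_layers):
        layers += [nn.Linear(dim, hidden_dim), act]
        dim = hidden_dim
    layers.append(nn.Linear(dim, num_states))  # logits; softmax is applied inside the loss
    return nn.Sequential(*layers)

model = make_dnn(input_dim=440, hidden_dim=1024, num_states=3000)
criterion = nn.CrossEntropyLoss()  # cross entropy, no extra regularization

# One training step on random data, for illustration only.
features = torch.randn(32, 440)           # batch of acoustic feature vectors
targets = torch.randint(0, 3000, (32,))   # forced-alignment state labels
loss = criterion(model(features), targets)
loss.backward()
```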

DNNs with ReLU and Leaky ReLU produce 2% absolute reductions in word error rates over Tanh ones.

Both the ReLU and Leaky ReLU networks perform similarly. During training, it is observed that Leaky ReLU DNNs converge slightly faster.

Leaky ReLU is later used in many other domains.

Reference

[2013 ICML] [Leaky ReLU]
Rectifier Nonlinearities Improve Neural Network Acoustic Models

