Brief Review — Rectified Linear Units Improve Restricted Boltzmann Machines

Rectified Linear Unit (ReLU) Introduced

Sik-Ho Tsang
3 min read · Aug 9, 2022

Rectified Linear Units Improve Restricted Boltzmann Machines
ReLU, by University of Toronto
2010 ICML, Over 17000 Citations (Sik-Ho Tsang @ Medium)
Activation Function, Restricted Boltzmann Machine, Image Classification, Face Recognition

  • Rectified Linear Unit (ReLU) is introduced, which outperforms Sigmoid.
  • This is a paper from Hinton’s research group.

Outline

  1. Rectified Linear Unit (ReLU)
  2. Image Classification Results
  3. Face Recognition Results

1. Rectified Linear Unit (ReLU)

  • More precisely, Noisy ReLU (NReLU) is proposed: sampling an integer value correctly from the equivalent set of binary units would require applying the logistic sigmoid function many times to get the required probabilities, so it is approximated as max(0, x + N(0, V)),
  • where N(0, V) is Gaussian noise with zero mean and variance V, and V is taken to be the logistic sigmoid of the unit's input x. (A small NumPy sketch of this sampling rule follows.)
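A minimal NumPy sketch of this sampling rule, assuming V = sigmoid(x) as in the bullets above; the function names are my own, not from the paper:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid; used here as the noise variance V."""
    return 1.0 / (1.0 + np.exp(-x))

def nrelu(x, rng=None):
    """Noisy ReLU: max(0, x + N(0, V)) with V = sigmoid(x).

    The Gaussian noise is only added while sampling/training;
    at test time the unit reduces to the plain ReLU max(0, x).
    """
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.normal(loc=0.0, scale=np.sqrt(sigmoid(x)))
    return np.maximum(0.0, x + noise)

def relu(x):
    """Deterministic ReLU used at test time."""
    return np.maximum(0.0, x)
```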

2. Image Classification Results

Network architecture used for the Jittered-Cluttered NORB classification task
  • Two hidden layers of NReLUs are greedily pre-trained as RBMs. (For RBM, please read Autoencoder.)
  • The class label is represented as a K-dimensional binary vector with 1-of-K activation, where K is the number of classes.
  • The classifier computes the probability of the K classes from the second-layer hidden activities h2 using the softmax function, as sketched after this list.
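The test-time forward pass of such a classifier can be sketched as below; this is my own minimal NumPy rendering of the stated architecture (the weight names W1, W2, Wout are hypothetical), not code from the paper:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classify(x, W1, b1, W2, b2, Wout, bout):
    """Two NReLU hidden layers (noise-free at test time) plus a softmax output.

    x:      (batch, 32*32*2) flattened stereo Jittered-Cluttered NORB images
    W1, W2: hidden-layer weights, greedily pre-trained as RBMs then fine-tuned
    Returns (batch, K) class probabilities.
    """
    h1 = np.maximum(0.0, x @ W1 + b1)
    h2 = np.maximum(0.0, h1 @ W2 + b2)
    return softmax(h2 @ Wout + bout)
```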
Test error rates for classifiers with 4000 hidden units trained on 32×32×2 Jittered-Cluttered NORB images
  • Pre-training helps improve the performance of both unit types.
  • But NReLUs without pre-training are better than binary units with pre-training.
Test error rates for classifiers with two hidden layers (4000 units in the first, 2000 in the second), trained on 32×32×2 Jittered-Cluttered NORB images
  • Pre-training both layers gives further improvement for NReLUs but not for binary units.

3. Face Recognition Results

Siamese network used for the Labeled Faces in the Wild task
  • The feature extractor FW contains one hidden layer of NReLUs pre-trained as an RBM. (For RBM, please read Autoencoder.)
  • The cosine distance between the two extracted feature vectors is used to decide whether the faces are the same, as sketched after this list.
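A minimal sketch of the verification step, assuming a callable feature extractor standing in for FW and a hypothetical decision threshold (the thresholding is my simplification; only the cosine comparison is taken from the description above):

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def same_face(img1, img2, feature_extractor, threshold=0.5):
    """Siamese-style verification: both images pass through the shared
    feature extractor FW (here, the NReLU hidden layer pre-trained as an
    RBM), and the pair is accepted if the cosine similarity of the two
    feature vectors exceeds a threshold.

    `threshold` is a placeholder; in practice it would be tuned on
    validation pairs.
    """
    f1 = feature_extractor(img1)
    f2 = feature_extractor(img2)
    return cosine_similarity(f1, f2) >= threshold
```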
Accuracy on the LFW task for various models trained on 32×32 colour images
  • Models using NReLUs are more accurate.

This paper and AlexNet are often cited when ReLU is used. Classic!

Reference

[2010 ICML] [ReLU]
Rectified Linear Units Improve Restricted Boltzmann Machines

Image Classification

1989 2010 [ReLU] … 2022 [ConvNeXt] [PVTv2]

Face Recognition

2005 [Chopra CVPR’05] 2010 [ReLU] 2014 [DeepFace] [DeepID2] [CASIANet] 2015 [FaceNet] 2016 [N-pair-mc Loss]

My Other Previous Paper Readings
