Review — Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction

Split-Brain Auto for Self-Supervised Learning, Outperforms Jigsaw Puzzles, Context Prediction, ALI/BiGAN, L³-Net, Context Encoders, etc.

Published in Geek Culture, Sep 5, 2021

**Proposed Split-Brain Auto (Bottom) vs Traditional Autoencoder, e.g. Stacked Denoising Autoencoder (Top)**

This story reviews Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction (Split-Brain Auto), by the Berkeley AI Research (BAIR) Laboratory, University of California, Berkeley. In this paper:

A network is split into two sub-networks, each trained to perform a difficult task: predicting one subset of the data channels from another.
By forcing the network to solve cross-channel prediction tasks, feature learning is achieved without using any labels.

This is a 2017 CVPR paper with over 400 citations. (Sik-Ho Tsang @ Medium)

Outline

Split-Brain Autoencoders (Split-Brain Auto)
Experimental Results

1. Split-Brain Autoencoders (Split-Brain Auto)

**Split-Brain Autoencoders applied to various domains**

1.1. Cross-Channel Encoders

First, input data X is divided into X1 and X2.
Then, X1 goes through network F1 to predict X2:
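
The equation itself is not reproduced in this text. Using the notation above, and writing the prediction of X2 with a hat (an assumed notation), the mapping can be sketched as:

```latex
\hat{X}_2 = F_1(X_1)
```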

By performing this pretext task of predicting X2 from X1, the hope is to learn a representation F1(X1) that contains high-level abstractions or semantics.
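
To make the channel split concrete, here is a minimal sketch in plain Python. The helper name `split_channels` and the toy 2×2 image are assumptions for illustration only, not something taken from the paper.

```python
# Sketch: split an H x W x C image (nested lists) into two channel subsets.
# `split_channels` is a hypothetical helper, not part of the paper's code.

def split_channels(image, first_idx):
    """Return (x1, x2): x1 keeps the channels in first_idx, x2 keeps the rest."""
    n_channels = len(image[0][0])
    rest_idx = [c for c in range(n_channels) if c not in first_idx]
    x1 = [[[px[c] for c in first_idx] for px in row] for row in image]
    x2 = [[[px[c] for c in rest_idx] for px in row] for row in image]
    return x1, x2

# A toy 2 x 2 image with 3 channels per pixel.
image = [
    [[50.0, 10.0, -5.0], [60.0, 0.0, 3.0]],
    [[70.0, -8.0, 12.0], [80.0, 4.0, -1.0]],
]

# X1 = first channel, X2 = remaining two channels.
x1, x2 = split_channels(image, first_idx=[0])
```

In this setup, F1 would receive `x1` and be trained to predict `x2`, while F2 would do the reverse.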

Similarly, X2 goes through network F2 to predict X1.
Left: For the Lab color space, X1 can be L, the luminance information, and X2 can be ab, the color information.
Right: For an RGB-D image, X1 can be the RGB values, and X2 can be D, the depth information.
An l2 loss can be used to train the regression:
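
The loss equation is not reproduced in this text. A standard per-pixel l2 regression loss, written with the symbols used above (h and w are assumed indices over spatial positions), would look like:

```latex
\ell_2\big(F_1(X_1), X_2\big) = \frac{1}{2} \sum_{h,w} \left\lVert X_{2,h,w} - F_1(X_1)_{h,w} \right\rVert_2^2
```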

It is found that a classification (cross-entropy) loss is more effective than l2 regression for the graphics task of automatic colorization:
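
The difference between the two losses can be sketched in plain Python. This is only an illustration under assumptions: the target is quantized into uniform bins with a hard one-hot label, and the helper names (`l2_loss`, `quantize`, `cross_entropy`) are hypothetical. The paper's actual quantization scheme may differ.

```python
import math

def l2_loss(pred, target):
    """Regression: 0.5 * sum of squared differences over all positions."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))

def quantize(value, lo, hi, n_bins):
    """Map a continuous value in [lo, hi] to a bin index in [0, n_bins)."""
    frac = (value - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(frac * n_bins)))

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits_per_pos, target_bins):
    """Classification: sum over positions of -log p(target bin)."""
    loss = 0.0
    for logits, q in zip(logits_per_pos, target_bins):
        loss -= math.log(softmax(logits)[q])
    return loss

# Two positions with continuous targets in [0, 1], quantized into 4 bins.
targets = [0.1, 0.9]
bins = [quantize(t, 0.0, 1.0, n_bins=4) for t in targets]

# Regression predicts values directly; classification predicts bin logits.
reg_loss = l2_loss([0.0, 1.0], targets)
cl_loss = cross_entropy([[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]], bins)
```

With classification, the network outputs a distribution over quantized values at each position instead of a single number.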

1.2. Split-Brain Autoencoders as Aggregated Cross-Channel Encoders

A split-brain autoencoder trains multiple cross-channel encoders, F1 and F2, on opposite prediction problems, with loss functions L1 and L2, respectively:
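
The equation is not reproduced in this text. Following the description above, the two training objectives can be sketched as follows, with the star marking the trained network (an assumed notation):

```latex
F_1^{*} = \arg\min_{F_1} L_1\big(F_1(X_1), X_2\big), \qquad
F_2^{*} = \arg\min_{F_2} L_2\big(F_2(X_2), X_1\big)
```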

Example split-brain autoencoders in the image and RGB-D domains are shown in the above figure (a) and (b), respectively.

By concatenating the representations layer-wise, F^l = {F^l_1, F^l_2}, a representation F is obtained that is pretrained on the full input tensor X.
If F is a CNN of a desired fixed size, e.g., AlexNet, we can design the subnetworks F1, F2 by splitting each layer of the network F in half, along the channel dimension.
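
The halving and layer-wise concatenation described above can be sketched as below. The channel widths are those of the standard AlexNet convolutional layers. The helper names are hypothetical, and the sketch ignores details such as grouped convolutions.

```python
# Standard AlexNet convolutional layer widths (output channels).
ALEXNET_CONV_WIDTHS = {
    "conv1": 96, "conv2": 256, "conv3": 384, "conv4": 384, "conv5": 256,
}

def split_in_half(widths):
    """Give each sub-network (F1, F2) half of every layer's channels."""
    halves = {}
    for name, c in widths.items():
        if c % 2 != 0:
            raise ValueError(f"{name}: cannot split {c} channels evenly")
        halves[name] = (c // 2, c // 2)
    return halves

def concat_features(f1, f2):
    """Layer-wise concatenation along the channel dimension."""
    return f1 + f2

halves = split_in_half(ALEXNET_CONV_WIDTHS)

# Toy per-channel responses of F1 and F2 at conv1 for one spatial position.
f1_conv1 = [0.1] * halves["conv1"][0]
f2_conv1 = [0.2] * halves["conv1"][1]
full_conv1 = concat_features(f1_conv1, f2_conv1)
```

The concatenated layer has the same width as the original layer of F, so the combined network keeps the desired fixed size.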

The network is modified to be fully convolutional and trained for a pixel-prediction task.

1.3. Alternative Aggregation Technique

One alternative, as a baseline: The same representation F can be trained to perform both mappings simultaneously:
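
The equation is not reproduced in this text. With a single shared network F handling both directions, the objective can be sketched as the sum of the two cross-channel losses:

```latex
F^{*} = \arg\min_{F} \; L_1\big(F(X_1), X_2\big) + L_2\big(F(X_2), X_1\big)
```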

Alternatively, the full input tensor X can also be taken into account.
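
The equation is not reproduced in this text. One reading consistent with the description is that a reconstruction term on the full input X is added to the two cross-channel terms. The symbol L_3 for that extra loss is an assumption of this sketch:

```latex
F^{*} = \arg\min_{F} \; L_1\big(F(X_1), X_2\big) + L_2\big(F(X_2), X_1\big) + L_3\big(F(X), X\big)
```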

However, it is found that the proposed Split-Brain Auto (Section 1.2) outperforms the above two alternatives (Section 1.3).

2. Experimental Results

2.1. ImageNet

**Task Generalization on ImageNet Classification**

The proposed split-brain autoencoder learns unsupervised representations from large-scale ImageNet image data.
The Lab color space is used to train the split-brain autoencoder.
For evaluation, all pretrained weights are frozen, and the feature maps of each layer are spatially resized to about 9,000 dimensions before a linear classifier is trained on them.
All methods use AlexNet variants.
All methods are pretrained on the 1.3M ImageNet images without labels, except for the supervised ImageNet-labels baseline.
In brief, several autoencoder variants are compared.
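
To get a feel for the "about 9,000 dimensions" mentioned above, the sketch below picks, for each AlexNet convolutional layer, a square spatial size whose flattened feature length is close to 9,000. The nearest-square rule and the helper names are assumptions for illustration. The exact resizing used in the paper may differ.

```python
import math

# Standard AlexNet convolutional layer widths (output channels).
ALEXNET_CONV_WIDTHS = {
    "conv1": 96, "conv2": 256, "conv3": 384, "conv4": 384, "conv5": 256,
}

def resized_side(n_channels, target_dim=9000):
    """Side length s such that n_channels * s * s is close to target_dim."""
    return max(1, round(math.sqrt(target_dim / n_channels)))

def feature_dims(widths, target_dim=9000):
    """Flattened feature length per layer after spatial resizing."""
    dims = {}
    for name, c in widths.items():
        s = resized_side(c, target_dim)
        dims[name] = c * s * s
    return dims

dims = feature_dims(ALEXNET_CONV_WIDTHS)
for name, d in dims.items():
    print(name, d)
```

Under these assumptions, every layer ends up with a feature vector of roughly the same length, so the linear classifiers trained on different layers are comparable.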

Split-Brain Auto (cl, cl), where cl means that a classification loss is used, outperforms all other variants and other self-supervised learning approaches such as Jigsaw Puzzles [30], Context Prediction [7], ALI [8]/BiGAN, Context Encoders [34] and Colorization [47].

2.2. Places

**Dataset & Task Generalization on Places Classification**

Here, the evaluation dataset and task (Places classification) differ from the pretraining data (ImageNet).

Similar results are obtained for Places classification: Split-Brain Auto outperforms approaches such as Jigsaw Puzzles [30], Context Prediction [7], L³-Net [45], Context Encoders [34] and Colorization [47].

2.3. PASCAL VOC

**Task and Dataset Generalization on PASCAL VOC**

To further test generalization, classification, detection and segmentation performance are evaluated on PASCAL VOC.

The proposed method, Split-Brain Auto (cl, cl), achieves state-of-the-art performance on almost all established self-supervision benchmarks.

There are more results in the paper. If interested, please feel free to read it. I hope to write a story about Jigsaw Puzzles in the near future.

Reference

[2017 CVPR] [Split-Brain Auto]
Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction

Self-Supervised Learning

2008–2010 [Stacked Denoising Autoencoders] 2014 [Exemplar-CNN] 2015 [Context Prediction] 2016 [Context Encoders] 2017 [L³-Net] [Split-Brain Auto]