# Review: Semi-Supervised Learning with Ladder Networks

## Ladder Network, Γ-Model: Minimize Cost of Latent Features


**Semi-Supervised Learning with Ladder Networks** (Ladder Network, Γ-Model), by The Curious AI Company, Nokia Labs, and Aalto University. 2015 NIPS, Over 1200 Citations (Sik-Ho Tsang @ Medium)

Semi-Supervised Learning, Image Classification

- The proposed model is trained to **simultaneously minimize the sum of supervised and unsupervised cost functions**, built on top of the Ladder Network.

# Outline

1. **Minimizing Deep Features**
2. **Ladder Network & Γ-Model**
3. **Experimental Results**

# 1. Minimizing Deep Features

## 1.1. Denoising Autoencoder

- In **Denoising Autoencoder**, noise is added to the clean input *x* to obtain *~x*. Then *~x* is fed into the autoencoder, and **the autoencoder tries to reconstruct** *^x*, which should be as close to *x* as possible.
- By doing so, the deep latent feature at the middle carries rich feature information, which can be used for fine-tuning on other datasets.
- To train the Denoising Autoencoder, the cost is to **minimize the difference between the reconstructed output** *^x* **and the clean input** *x*:

- However, this cost only compares the output with the input; there is no cost term that directly minimizes the difference of the latent feature at the middle.
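The reconstruction cost described above is commonly written as a mean squared error over the training samples. The form below is a sketch under that assumption; the sample index *n*, the sample count *N*, and the 1/*N* normalization are notation introduced here for illustration.

```latex
C = \frac{1}{N} \sum_{n=1}^{N} \left\lVert \hat{x}(n) - x(n) \right\rVert^{2}
```

Here *^x*(*n*) is the reconstruction obtained from the corrupted input *~x*(*n*), and *x*(*n*) is the corresponding clean input.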

## 1.2. Minimizing Deep Feature Difference

- One way is to **directly minimize the deep feature difference of** *z*. The cost function is identical to that used in a Denoising Autoencoder except that latent variables *z* replace the observations *x*:
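Written out, replacing the observations with the latent variables gives a cost of the following shape. This is a sketch assuming a squared-error form averaged over *N* samples; *^z* denotes the denoised latent reconstructed from a corrupted *~z*, and the index *n* is notation introduced here.

```latex
C = \frac{1}{N} \sum_{n=1}^{N} \left\lVert \hat{z}(n) - z(n) \right\rVert^{2}
```

The only change from the Denoising Autoencoder cost is that the comparison happens in latent space rather than in input space.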

# 2. Ladder Network & Γ-Model

- (Here, only the conceptual idea is presented.)

## 2.1. Ladder Network

- *x* is the **image** input and *y* is the output, which can be the **label**.
- The **clean path** is the standard **supervised learning** path.
- The **corrupted path** is the path where **noise is added at every layer** to corrupt the feature signals. It tries to predict the label *y* as well.
- The **denoising path** is to **reconstruct** *x*, with the help of features at the corrupted path.
- Every layer contributes to the cost function:

- Since the cost function needs both the clean *z*(*l*) and the corrupted *~z*(*l*), during training **the encoder is run twice:** a clean pass for *z*(*l*) and a corrupted pass for *~z*(*l*).
- *g* is a **denoising function** whose inputs come from the previous layer of the denoising path and from the corresponding layer at the corrupted path.
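The three paths above can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions, not the authors' implementation: batch normalization is omitted, the denoising function `g` is a hypothetical fixed blend of its two inputs (in the paper it is learned), and the layer sizes, noise level, and per-layer weights `lambdas` are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, enc_weights, noise_std=0.0):
    """Return [z(0), ..., z(L)] with z(0) = x.

    noise_std == 0 is the clean pass; noise_std > 0 adds Gaussian noise
    at every layer, which is the corrupted pass.
    """
    h = x + noise_std * rng.standard_normal(x.shape)
    zs = [h]
    for W in enc_weights:
        z = h @ W
        z = z + noise_std * rng.standard_normal(z.shape)
        zs.append(z)
        h = np.maximum(z, 0.0)  # ReLU activation feeding the next layer
    return zs

def g(z_tilde, u):
    """Hypothetical denoising function: blends the lateral input z_tilde
    (corrupted path) with the top-down input u (layer above)."""
    return 0.5 * z_tilde + 0.5 * u

def decoder(zs_corrupt, dec_weights):
    """Denoising path: walk from the top layer down to the input."""
    L = len(zs_corrupt) - 1
    z_hats = [None] * (L + 1)
    # Assumption: at the top there is no layer above, so the corrupted
    # top-layer value is reused as the top-down signal.
    z_hats[L] = g(zs_corrupt[L], zs_corrupt[L])
    for l in range(L - 1, -1, -1):
        u = z_hats[l + 1] @ dec_weights[l]
        z_hats[l] = g(zs_corrupt[l], u)
    return z_hats

def denoising_cost(zs_clean, z_hats, lambdas):
    """Weighted sum over layers of the mean squared denoising error."""
    return sum(lam * np.mean((z - z_hat) ** 2)
               for lam, z, z_hat in zip(lambdas, zs_clean, z_hats))

def cross_entropy(logits, labels):
    """Supervised cost: softmax cross-entropy on integer labels."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

# Toy setup (all values hypothetical).
sizes = [8, 6, 4]
enc_weights = [0.1 * rng.standard_normal((m, n))
               for m, n in zip(sizes[:-1], sizes[1:])]
dec_weights = [0.1 * rng.standard_normal((n, m))
               for m, n in zip(sizes[:-1], sizes[1:])]
lambdas = [1.0, 0.1, 0.1]

x = rng.standard_normal((5, sizes[0]))
labels = rng.integers(0, sizes[-1], size=5)

zs_clean = encoder(x, enc_weights)                   # clean pass
zs_corrupt = encoder(x, enc_weights, noise_std=0.3)  # corrupted pass
z_hats = decoder(zs_corrupt, dec_weights)            # denoising path

# Supervised cost on the corrupted prediction plus the unsupervised cost.
total_cost = (cross_entropy(zs_corrupt[-1], labels)
              + denoising_cost(zs_clean, z_hats, lambdas))
```

In a semi-supervised run, the cross-entropy term would only be evaluated on the labeled samples, while the denoising term can use every sample.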

## 2.2. Γ-Model

- Γ-Model is the **simple special case of the Ladder Network**.
- This corresponds to a **denoising cost only on the top layer** and means that most of the decoder can be omitted.
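A minimal sketch of this special case follows, assuming a small ReLU MLP without batch normalization. The top-layer denoiser `g_top` is a hypothetical scalar gain chosen only to keep the example short; the weights, sizes, and noise level are likewise made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_latent(x, weights, noise_std=0.0):
    """Run the encoder and return only the top-layer latent z(L)."""
    h = x + noise_std * rng.standard_normal(x.shape)
    for W in weights:
        z = h @ W
        z = z + noise_std * rng.standard_normal(z.shape)
        h = np.maximum(z, 0.0)
    return z

def g_top(z_tilde, a=1.0):
    """Hypothetical top-layer denoiser: a single scalar gain."""
    return a * z_tilde

def gamma_denoising_cost(x, weights, noise_std=0.3, a=1.0):
    """Denoising cost on the top layer only; no lower decoder layers."""
    z_clean = top_latent(x, weights)             # clean pass
    z_tilde = top_latent(x, weights, noise_std)  # corrupted pass
    return float(np.mean((z_clean - g_top(z_tilde, a)) ** 2))

sizes = [8, 6, 4]
weights = [0.1 * rng.standard_normal((m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal((5, sizes[0]))

cost_noisy = gamma_denoising_cost(x, weights)
cost_no_noise = gamma_denoising_cost(x, weights, noise_std=0.0)
```

Because only the top-layer term remains, the sketch needs no layer-by-layer decoder, which is the sense in which most of the decoder can be omitted.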

# 3. Experimental Results

## 3.1. Fully Connected MLP on MNIST

- The **baseline MLP** model is **784–1000–500–250–250–250–10**.

- The proposed method **outperforms all the previously reported results**, e.g. Pseudo-Label (PL).

- Encouraged by the good results, the authors also tested with *N*=50 labels and got a **test error of 1.62 %**.

- The simple **Γ-Model also performed surprisingly well**, particularly for *N*=1000 labels.

- With *N*=100 labels, all models sometimes failed to converge properly.
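To make the baseline layer layout concrete, the sketch below builds a plain fully connected network with the sizes 784–1000–500–250–250–250–10 and runs one forward pass. It is only an illustration of the shapes: the initialization, the ReLU activations, and the use of bias terms are assumptions here, and the paper's model uses batch normalization, so its exact parameterization differs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes of the baseline MLP: MNIST input (28*28 = 784) to 10 classes.
layer_sizes = [784, 1000, 500, 250, 250, 250, 10]

# Assumed initialization (scaled Gaussian) and zero biases.
weights = [rng.standard_normal((m, n)) * np.sqrt(2.0 / m)
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def mlp_forward(x):
    """Forward pass: ReLU hidden layers, softmax output."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    logits = h @ weights[-1] + biases[-1]
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

probs = mlp_forward(rng.standard_normal((4, 784)))

# Weight-plus-bias count for this plain MLP variant.
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
```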

## 3.2. CNN on MNIST

- 2 models, Conv-FC and Conv-Small using Γ-Model, are used.

- **More convolutions improve the Γ-Model significantly**, although the variance is still high. The **Ladder network** with denoising targets on every level **converges much more reliably**.

## 3.3. CNN on CIFAR-10

- With **Conv-Large using Γ-Model**, the error rate is **reduced by about a further 3%** when using *N*=4000 labels.

## References

[2015 NIPS] [Ladder Network, Γ-Model] Semi-Supervised Learning with Ladder Networks

[Korean Language Presentation]

## Pretraining or Weakly/Semi-Supervised Learning

**2013** [Pseudo-Label (PL)] **2015** [Ladder Network, Γ-Model] **2016** [Sajjadi NIPS’16] **2017** [Mean Teacher] **2018** [WSL] **2019** [Billion-Scale] [Label Propagation] [Rethinking ImageNet Pre-training] **2020** [BiT] [Noisy Student] [SimCLRv2]