# Review — Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks

## Pseudo Labels for **Unlabeled Data**

**Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks**
Pseudo-Label (PL), by Nangman Computing
2013 ICLRW, Over 1500 Citations (Sik-Ho Tsang @ Medium)

Semi-Supervised Learning, Pseudo Label, Image Classification

- **Unlabeled data is labeled by a network trained with supervision**, which is called **pseudo labeling**.
- The network is then trained using both labeled data and pseudo-labeled data.

# Outline

1. **Pseudo-Label (PL)**
2. **Experimental Results**

# 1. Pseudo-Label (PL)

- Pseudo-Labels are target classes for unlabeled data, treated as if they were true labels. For each unlabeled sample, the class with the maximum probability predicted by the network is picked as its Pseudo-Label.
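Written in the notation of the original paper (the symbols below are taken from the paper, since the equation is not reproduced in this review), the selection rule is:

$$
y'_i =
\begin{cases}
1 & \text{if } i = \arg\max_{i'} f_{i'}(x) \\
0 & \text{otherwise}
\end{cases}
$$

Here $f_{i'}(x)$ is the network output for class $i'$ on an unlabeled sample $x$, and $y'_i$ is the resulting one-hot Pseudo-Label.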

**Pseudo-Label** is used in a **fine-tuning phase** with Dropout. The pre-trained network is **trained in a supervised fashion with labeled and unlabeled data simultaneously**.
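The overall loss, as given in the original paper, adds a weighted pseudo-labeled term to the usual labeled term:

$$
L = \frac{1}{n}\sum_{m=1}^{n}\sum_{i=1}^{C} L\!\left(y_i^m, f_i^m\right)
  + \alpha(t)\,\frac{1}{n'}\sum_{m=1}^{n'}\sum_{i=1}^{C} L\!\left({y'}_i^m, {f'}_i^m\right)
$$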

- In the overall loss function, $n$ is the **number of samples in labeled data** in a mini-batch for SGD, and $n'$ is the **number of samples in unlabeled data**; $C$ is the number of classes; $f_i^m$ is the output for labeled data and $y_i^m$ is the corresponding label; ${f'}_i^m$ is the output for unlabeled data and ${y'}_i^m$ is the corresponding pseudo-label; $\alpha(t)$ is a coefficient balancing the two terms at epoch $t$.
- If $\alpha(t)$ is **too high**, it **disturbs training** even for labeled data, whereas if $\alpha(t)$ is **too small**, the **benefit from unlabeled data cannot be used**.
- $\alpha(t)$ is **slowly increased** during training, to help the optimization process **avoid poor local minima**.
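The balancing described above can be sketched in plain Python. The original paper uses a piecewise-linear ramp for $\alpha(t)$: zero before epoch $T_1$, rising linearly to $\alpha_f$ at epoch $T_2$, and constant afterwards, with $\alpha_f = 3$, $T_1 = 100$, $T_2 = 600$. Those constants come from the paper rather than from this review. The function names are hypothetical, and the per-class loss is assumed here to be binary cross-entropy, which matches sigmoid output units.

```python
import math

def alpha(t, t1=100, t2=600, alpha_f=3.0):
    """Ramp for the balancing coefficient (defaults as reported in the paper)."""
    if t < t1:
        return 0.0                      # ignore pseudo-labels early in training
    if t < t2:
        return alpha_f * (t - t1) / (t2 - t1)   # linear increase
    return alpha_f                      # constant after t2

def one_hot_argmax(f_vec):
    """Pseudo-label: 1 for the class with the largest output, 0 elsewhere."""
    k = max(range(len(f_vec)), key=lambda i: f_vec[i])
    return [1.0 if i == k else 0.0 for i in range(len(f_vec))]

def bce(y, f, eps=1e-12):
    """Binary cross-entropy for one output unit (assumed per-class loss)."""
    f = min(max(f, eps), 1.0 - eps)     # clamp to avoid log(0)
    return -(y * math.log(f) + (1.0 - y) * math.log(1.0 - f))

def batch_loss(targets, outputs):
    """Per-class loss summed over classes, averaged over samples."""
    total = 0.0
    for y_vec, f_vec in zip(targets, outputs):
        total += sum(bce(y, f) for y, f in zip(y_vec, f_vec))
    return total / len(targets)

def pseudo_label_loss(y_lab, f_lab, f_unlab, t):
    """Labeled loss plus alpha(t) times the loss against pseudo-labels."""
    y_pseudo = [one_hot_argmax(f) for f in f_unlab]
    return batch_loss(y_lab, f_lab) + alpha(t) * batch_loss(y_pseudo, f_unlab)
```

With these defaults, the unlabeled term contributes nothing for the first 100 epochs, so the network first fits the labeled data before its own predictions are trusted as targets.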

# 2. Experimental Results

## 2.1. t-SNE Visualization

- **MNIST** dataset is used.
- The neural network was trained with **600 labeled data** and **with or without 60000 unlabeled data and Pseudo-Labels**.
- The neural network has 1 hidden layer. ReLU is used for the hidden units and sigmoid units are used for the output. The number of hidden units is 5000.

Though the training error is zero in both cases, the network outputs of test data are **more condensed** near the **1-of-K** code when training with unlabeled data and Pseudo-Labels.

## 2.2. Entropy

- **DropNN**: Trained without unlabeled data. (Drop means Dropout.)
- **+PL**: Trained with unlabeled data.

Though the entropy of labeled data is near zero in both cases, the entropy of unlabeled data **gets lower** with Pseudo-Label training, and the entropy of test data **gets lower** along with it.
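One common way to quantify how confident a set of predictions is uses the average Shannon entropy of the predicted class probabilities. The sketch below assumes each prediction is a list of class probabilities summing to 1. The review does not spell out the exact formula used for the plot, and `mean_entropy` is a hypothetical helper name.

```python
import math

def mean_entropy(probs):
    """Average entropy (in nats) over a list of class-probability vectors."""
    total = 0.0
    for p_vec in probs:
        # Terms with p == 0 contribute nothing, so they are skipped.
        total -= sum(p * math.log(p) for p in p_vec if p > 0)
    return total / len(probs)
```

Predictions close to a one-hot vector give an entropy near zero, while a uniform prediction over two classes gives `log(2)`, so lower values indicate outputs closer to the 1-of-K code.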

## 2.3. Error Rate

- The size of the **labeled training set** is reduced to **100, 600, 1000 and 3000**.
- For the validation set, 1000 labeled examples are picked separately.
- **10 experiments on random splits** were done using the identical network and parameters.
- In the case of **100 labeled data**, the results heavily depended on the data split, so **30 experiments** were done.
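The protocol above can be sketched as a seeded random split repeated over several runs. This is only an illustration: the helper names are hypothetical, and treating all remaining samples as the unlabeled pool is an assumption of this sketch, not something the review states.

```python
import random

def make_split(n_total, n_labeled, n_val=1000, seed=0):
    """Return (labeled, validation, unlabeled) index lists for one random split."""
    rng = random.Random(seed)           # seeded so each run is reproducible
    idx = list(range(n_total))
    rng.shuffle(idx)
    val = idx[:n_val]                   # validation examples picked separately
    labeled = idx[n_val:n_val + n_labeled]
    unlabeled = idx[n_val + n_labeled:] # assumption: the rest is unlabeled
    return labeled, val, unlabeled

def n_runs(n_labeled):
    """30 runs for 100 labeled samples, 10 runs otherwise."""
    return 30 if n_labeled == 100 else 10

# One split per run, each with a different seed.
splits = [make_split(60000, 100, seed=s) for s in range(n_runs(100))]
```

Averaging the error over these runs reduces the dependence on any single split, which matters most in the 100-label setting.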

The proposed method outperforms the conventional methods when labeled data is scarce, in spite of its simplicity. The training scheme is **less complex** than Manifold Tangent Classifier and doesn't use a computationally expensive similarity matrix between samples.

## Reference

[2013 ICLRW] [Pseudo-Label (PL)]

Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks

## Weakly/Semi-Supervised Learning

**2013** [Pseudo-Label (PL)] **2017** [Mean Teacher] **2018** [WSL] **2019** [Billion-Scale]