# Review — Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning

## Curriculum Labeling (CL), By **Restarting Model Parameters** Before Each Self-Training Cycle, Outperforms **Pseudo Labeling (PL)**

---

Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning, Curriculum Labeling, by University of Virginia, 2021 AAAI, Over 40 Citations (Sik-Ho Tsang @ Medium)

Semi-Supervised Learning, Image Classification, Pseudo Label

- **Pseudo Labeling (PL)** applies **pseudo-labels to samples in the unlabeled set** for model training in a self-training cycle.
- In **Curriculum Labeling (CL)**, the **curriculum learning principle** is applied, and **concept drift is avoided by restarting model parameters** before each self-training cycle.

# Outline

1. **Pseudo Labeling (PL) Brief Review**
2. **Proposed Curriculum Labeling (CL)**
3. **Experimental Results**
4. **Ablation Study**

# 1. **Pseudo Labeling (PL)** Brief Review

- **Pseudo-Labels** are **target classes for unlabeled data**, used as if they were true labels. For each unlabeled sample, the class with the **maximum predicted probability from the network** is **picked**.
- **Pseudo-Labels** are used in a **fine-tuning phase** with Dropout. The pre-trained network is **trained in a supervised fashion with labeled and unlabeled data simultaneously.**
- (Please feel free to read the Pseudo Label story if interested.)
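The pseudo-label assignment above amounts to an argmax over the model's predicted class probabilities. A minimal NumPy sketch (the probability values below are made up for illustration):

```python
import numpy as np

# Hypothetical predicted probabilities for 4 unlabeled samples over 3 classes.
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.4, 0.5, 0.1],
])

# Pseudo-label = class with the maximum predicted probability per sample.
pseudo_labels = probs.argmax(axis=1)
# Confidence = that maximum probability (used later for sample selection).
confidences = probs.max(axis=1)

print(pseudo_labels)  # → [1 0 2 1]
```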

# 2. **Proposed Curriculum Labeling (CL)**

## 2.1. Framework

- The model is **trained** on the **labeled samples**.
- Then this model is used to **predict** and assign pseudo-labels for the **unlabeled** samples.
- The distribution of the prediction scores is used to **select a subset of pseudo-labeled samples.**
- A new model is **re-trained** with the **labeled and pseudo-labeled samples**.
- This process is **repeated** by re-labeling unlabeled samples using this new model. The process **stops when all samples in the dataset have been used during training.**
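The five steps above can be sketched as a short loop. This is a minimal illustration, not the authors' code: `train_fn`, `predict_fn`, and the toy nearest-centroid "model" are all hypothetical stand-ins for the real network training.

```python
import numpy as np

def curriculum_labeling(X_lab, y_lab, X_unlab, train_fn, predict_fn, step=20):
    """Sketch of the CL cycle: train, pseudo-label, select by percentile,
    then re-train a fresh model (restarted parameters) each round."""
    model = train_fn(X_lab, y_lab)                 # 1) train on labeled data
    for r in range(step, 101, step):               # r = 20, 40, ..., 100 (%)
        probs = predict_fn(model, X_unlab)         # 2) predict unlabeled samples
        conf = probs.max(axis=1)                   # max predicted probability
        thresh = np.percentile(conf, 100 - r)      # 3) percentile threshold
        keep = conf >= thresh                      # keep top r% most confident
        X_aug = np.concatenate([X_lab, X_unlab[keep]])
        y_aug = np.concatenate([y_lab, probs[keep].argmax(axis=1)])
        model = train_fn(X_aug, y_aug)             # 4) re-train from scratch
    return model                                   # 5) stop once r reaches 100%

# Toy usage: a 1-D nearest-centroid "model" stands in for the network.
def train_fn(X, y):
    return {c: X[y == c].mean() for c in np.unique(y)}

def predict_fn(model, X):
    dists = np.stack([np.abs(X - mu) for mu in model.values()], axis=1)
    p = np.exp(-dists)
    return p / p.sum(axis=1, keepdims=True)

X_lab = np.array([0.0, 0.1, 1.0, 1.1]); y_lab = np.array([0, 0, 1, 1])
X_unlab = np.array([0.05, 0.9, 1.05, 0.2])
model = curriculum_labeling(X_lab, y_lab, X_unlab, train_fn, predict_fn)
```

Note that `train_fn` is called with a fresh model each cycle, mirroring the parameter restart that distinguishes CL from finetuning-based pseudo-labeling.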

## 2.2. Details

- To be specific, **percentile scores** are used to decide which samples to add. The above algorithm shows the full pipeline of the model, where **Percentile(values of *X*, *Tr*)** returns the value of the *r*-th percentile. Values of *r* from 0% to 100% in **increments of 20%** are used.
- The repeating process is **terminated** when the pseudo-labeled set comprises the entire training data (*r* = 100%).
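For instance, the selection at the first cycle can be illustrated with NumPy (the confidence scores below are made up): to admit the top 20% most confident pseudo-labels, the threshold is set at the 80th percentile of the confidence distribution.

```python
import numpy as np

# Hypothetical max-probability scores for five unlabeled samples.
conf = np.array([0.99, 0.95, 0.90, 0.80, 0.60])

# First cycle: keep only scores at or above the 80th percentile,
# i.e. roughly the top 20% most confident pseudo-labels.
thresh = np.percentile(conf, 80)
selected = conf >= thresh
print(round(float(thresh), 3), int(selected.sum()))  # → 0.958 1
```

In later cycles the percentile is lowered in steps of 20, so progressively less confident pseudo-labels are admitted until the whole unlabeled set is included.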

## 2.3. Loss

- The data consists of *N* labeled examples (*Xi*, *Yi*) and *M* unlabeled examples *Xj*. Let *H* be **a set of hypotheses** where *hθ* ∈ *H*, and each hypothesis denotes **a function mapping** *X* to *Y*.
- Let *Lθ*(*Xi*) be the **loss for a given example** *Xi*. To choose the best predictor with the lowest possible error, the formulation can be explained with a regularized Empirical Risk Minimization (ERM) framework.
- Below, *L*(*θ*) is defined as the **pseudo-labeling regularized empirical loss**:

- where **CEE** indicates **cross entropy**.
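The equation itself is not reproduced here, but given the definitions above it takes the standard regularized ERM form. The following is a reconstruction, not the paper's verbatim equation; the weighting coefficient *α* and the pseudo-label notation *Ŷj* are assumptions:

```latex
L(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N} \mathrm{CEE}\!\left(h_\theta(X_i),\, Y_i\right)
\;+\; \alpha\,\frac{1}{M}\sum_{j=1}^{M} \mathrm{CEE}\!\left(h_\theta(X_j),\, \hat{Y}_j\right)
```

The first term is the supervised risk over the *N* labeled examples, and the second is the pseudo-labeling regularizer over the *M* unlabeled examples.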

# 3. **Experimental Results**

## 3.1. SOTA Comparison

- CNN-13 in All-CNN and WideResNet-28 in WRN (depth 28, width 2) are used for CIFAR-10 and SVHN.
- Data augmentation applies transformations in an entirely random fashion, which is referred to as Random Augmentation (RA).

CL surprisingly surpasses previous pseudo-labeling-based methods and consistency regularization methods on CIFAR-10.

**On SVHN, CL obtains competitive test error**when compared with all previous methods that rely on moderate augmentation, moderate-to-high data augmentation, and heavy data augmentation.

- A common practice to test SSL algorithms is to **vary the size of the labeled data using 50, 100 and 200 samples per class.**

CL does not drastically degrade when dealing with smaller labeled sets.

- ResNet-50 is used on ImageNet, with **10%/90% of the training split** used as **labeled/unlabeled** data.

On ImageNet, CL achieves competitive results, with scores very close to the current top-performing method, UDA, on both top-1 and top-5 accuracies.

## 3.2. Realistic Evaluation with Out-of-Distribution Unlabeled Samples

- In a more realistic SSL setting of Oliver NeurIPS’18, **the unlabeled data may not share the same class set as the labeled data.**
- The experiment is reproduced by **synthetically varying the class overlap on CIFAR-10**, choosing only the animal classes to perform the classification (bird, cat, deer, dog, frog, horse).

CL is robust to out-of-distribution classes, while the performance of previous methods drops significantly. It is conjectured that the proposed self-pacing curriculum is key in this scenario, where the adaptive thresholding scheme could help filter out the out-of-distribution unlabeled samples during training.

# 4. Ablation Study

## 4.1. Effectiveness of Curriculum Labeling

- Different augmentation and training techniques, i.e. mixup and SWA, are used when applying vanilla pseudo-labeling with no curriculum and without a specific threshold (i.e. 0.0).

Only when heavy data augmentation is used for Pseudo Labeling is the approach able to match the proposed curriculum design without any data augmentation.

- Fixed thresholds for including pseudo-labelled unlabeled data, as used in Pseudo Labeling (PL), are also tried.

The proposed curriculum design is able to yield a significant gain over the traditional pseudo-labeling approach that uses a fixed threshold, even when heavy data augmentation is applied.

- Only the most confident samples are re-labeled in CL, with the confidence thresholds set to 0.9 and 0.9995.

As seen, using handpicked thresholds is sub-optimal.

## 4.2. Effectiveness of Reinitializing vs Finetuning

- **Reinitializing** the model yields **at least 1% improvement** and does not add a significant overhead to the proposed self-paced approach.

As shown above, reinitializing the model, as opposed to finetuning, indeed improves the accuracy significantly, demonstrating an alternative and perhaps simpler solution to alleviate the issue of confirmation bias.

## Reference

[2021 AAAI] [Curriculum Labeling (CL)]

Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning

## Pretraining or Weakly/Semi-Supervised Learning

**2004 … 2019** [VAT] [Billion-Scale] [Label Propagation] [Rethinking ImageNet Pre-training] [MixMatch] [SWA & Fast SWA] [S⁴L] **2020** [BiT] [Noisy Student] [SimCLRv2] [UDA] [ReMixMatch] [FixMatch] **2021** [Curriculum Labeling (CL)]