Review — A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification

Fine-Grained Classification for Semi-Supervised Learning Realistic Evaluation

Sik-Ho Tsang
Jun 24, 2022
Accuracy of semi-supervised learning (SSL) algorithms on the Semi-Aves and Semi-Fungi datasets using (i) different pre-trained models, and (ii) in-class (Uin) and out-of-class (Uin+Uout) unlabeled data.

A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification, Su CVPR’21, by University of Massachusetts Amherst
2021 CVPR, over 10 Citations (Sik-Ho Tsang @ Medium)
Semi-Supervised Learning, Image Classification, Fine-Grained Image Classification, Pseudo Label

  • Class distribution can be highly unbalanced or even unknown, and the unlabeled data may contain novel classes. How effective is SSL in these situations?
  • Is out-of-domain data beneficial when expert (pre-trained) models are available?
  • In this paper, rather than proposing a novel approach, a realistic benchmark is applied to several semi-supervised learning (SSL) methods, using fine-grained classification datasets that exhibit considerable class imbalance and contain images from novel classes.


  1. Realistic Datasets Using Fine-Grained Classification Datasets
  2. Semi-Supervised Learning Approaches for Evaluations
  3. Experimental Results & Analysis

1. Realistic Datasets Using Fine-Grained Classification Datasets

The proposed benchmark for semi-supervised learning
  • Two fine-grained classification datasets are used. They are obtained by sampling classes under the Aves (birds) and Fungi taxonomy. The out-of-class images are other Aves (or Fungi) images not belonging to the classes within the labeled set.
  • The datasets are built from the FGVC7 workshop and FGVC Fungi Challenge data, and are called Semi-Aves and Semi-Fungi here.
  • Each represents a 200-way classification task and the training set contains:
  1. labeled images from these classes (Lin),
  2. unlabeled images from these classes (Uin), and
  3. unlabeled images from related, out-of-class categories (Uout).
  • Moreover, the classes exhibit a long-tailed distribution with an imbalance ratio of 8 to 10.
A comparison of Semi-Aves and Semi-Fungi datasets with existing SSL benchmarks

Compared with other datasets such as CIFAR and SVHN, Semi-Aves and Semi-Fungi are more challenging because of the large number of classes, the presence of novel-class images in the unlabeled set, and the long-tailed class distribution, as indicated by the class imbalance ratio.
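As a rough illustration of the class imbalance ratio mentioned above, here is a minimal sketch. It assumes the ratio is defined as the size of the largest class divided by the size of the smallest class; the function name and the toy label list are hypothetical and not taken from the paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (largest class size) / (smallest class size) for a list of labels.

    Assumption: this is the definition of "imbalance ratio" being used.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy label list: class 0 has 8 images, class 1 has 4, class 2 has 1.
toy_labels = [0] * 8 + [1] * 4 + [2] * 1
ratio = imbalance_ratio(toy_labels)
print(ratio)  # 8.0
```

A perfectly balanced dataset gives a ratio of 1.0, so the ratios of 8 to 10 quoted above mean the most frequent class has roughly 8 to 10 times as many images as the rarest one.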

2. Semi-Supervised Learning Approaches for Evaluations

  1. Pseudo-Label trains a model on labeled data and uses its predictions to assign labels to unlabeled data, which are then used for training.
  2. Curriculum Pseudo-Label is similar to Pseudo-Label, but uses an iterative process in which the model is re-trained from scratch at every iteration.
  3. Self-Training using distillation: a teacher model is trained on labeled data, and a student model is then trained by the teacher using both labeled and unlabeled data.
  4. FixMatch combines pseudo-labeling and consistency regularization.
  5. MoCo learns image representations without using labels; it is a self-supervised rather than a semi-supervised learning approach.
  6. MoCo+Self-Training is also considered, where the teacher is first pre-trained using MoCo.
  • (Please feel free to read their stories if interested.)
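To make the pseudo-labeling idea in the list above more concrete, here is a minimal sketch of confidence-thresholded pseudo-label selection, in the spirit of what FixMatch does on weakly augmented images. The helper names, the toy logits, and the threshold value of 0.95 are assumptions for illustration, not the benchmark's actual code or settings.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits, threshold=0.95):
    """Return (class_index, confidence) if the top probability reaches
    `threshold`; otherwise return None, meaning the sample is skipped."""
    probs = softmax(logits)
    conf = max(probs)
    if conf < threshold:
        return None
    return probs.index(conf), conf


# Hypothetical model outputs for three unlabeled images over 3 classes.
unlabeled_logits = [
    [6.0, 0.5, 0.1],  # confident prediction: receives a pseudo-label
    [1.0, 0.9, 0.8],  # uncertain prediction: skipped
    [0.2, 0.1, 7.0],  # confident prediction: receives a pseudo-label
]
selected = [pseudo_label(logits) for logits in unlabeled_logits]
```

Only confident predictions become training targets. If the initial model is inaccurate, confident but wrong predictions are also kept, which is one way errors can be amplified during pseudo-labeling.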

3. Experimental Results & Analysis

  • ResNet-50 with 224×224 input images is used for all experiments. Hyperparameters are tuned individually for each approach.
  • For transfer learning, models pre-trained on ImageNet and iNaturalist 2018 (iNat) are used.
Results on Semi-Aves benchmark
Results on Semi-Fungi benchmark
  • The above two tables show the accuracy on the two datasets. To better visualize the results, the relative gain of each SSL method, i.e. the difference in raw accuracy from the supervised baseline, is shown below:
Relative gains of SSL methods on Semi-Aves and Semi-Fungi
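The relative gain described above is simply a per-method difference in accuracy. A minimal sketch, with a hypothetical function name and placeholder numbers that are NOT results from the paper:

```python
def relative_gains(accuracies, baseline_key="Supervised baseline"):
    """Return each method's accuracy minus the supervised baseline accuracy."""
    base = accuracies[baseline_key]
    return {
        name: round(acc - base, 2)
        for name, acc in accuracies.items()
        if name != baseline_key
    }


# Placeholder accuracies in percent, for illustration only (not from the paper).
toy_accuracies = {
    "Supervised baseline": 50.0,
    "Method A": 53.5,
    "Method B": 48.0,
}
gains = relative_gains(toy_accuracies)
print(gains)  # {'Method A': 3.5, 'Method B': -2.0}
```

A negative gain means the SSL method underperforms the purely supervised model trained on the same labeled data.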

3.1. Training From Scratch Using only Uin

  • Compared to the supervised baseline, Curriculum Pseudo-Label does not give improvements and Pseudo-Label even underperforms the baseline. This is possibly due to the low initial accuracy of the model, whose errors get amplified during pseudo-labeling.
  • FixMatch and Self-Training both result in improvements.
  • Self-supervised learning (MoCo) gives a good initialization, and its improvements are similar to or even larger than those of FixMatch.
  • Finally, Self-Training using MoCo pre-trained model as the teacher model results in a further 2–3% improvement.

3.2. Using Expert (Pre-trained) Models with only Uin

  • Transfer learning from an ImageNet or iNat pre-trained model, with Uin only, is considered.
  • Most of the SSL methods, as well as MoCo pre-training, provide improvements over the baselines.
  • The only exception is Pseudo-Label on Semi-Fungi. Among SSL methods, FixMatch and MoCo+Self-Training perform the best.

3.3. Effect of out-of-class unlabeled data (Uin+Uout)

  • When Uin+Uout is used with expert models, performance often drops in the presence of Uout.
  • Curriculum Pseudo-Label and Self-Training are more robust and yield less than 1% decrease in most cases.
  • FixMatch is less robust: its performance drops by around 6%.
  • The performance of MoCo also drops by around 1–3% and is sometimes worse than the supervised baseline.
  • Adding Self-Training however provides a 1–3% boost in performance.

Overall, Self-Training from either a supervised or a self-supervised model is the most robust approach.

3.4. Analysis

Predictions of unlabeled data using a supervised model: maximum probability of the class predictions (left), entropy of the predictions (middle), and the distillation loss between the teacher and student model before the training starts (right)
  • Overall, the model is generally more uncertain about the out-of-class data, which often has a higher entropy or a smaller maximum probability.
  • The distillation loss on Uin is also often higher than that of Uout, suggesting the model focuses more on those from Uin during training.
  • However, a good amount of data from Uout still has a high maximum probability, which negatively affects pseudo-label methods.
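The three quantities in this analysis can be sketched as follows. This is a minimal illustration, assuming the distillation loss is the cross-entropy between the teacher's and the student's softened class distributions; the function names, the temperature parameter, and the toy logits are hypothetical and not taken from the paper's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def max_probability(probs):
    """Confidence of the prediction: the largest class probability."""
    return max(probs)


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's.

    Assumption: this is the form of distillation loss meant in the text.
    Extreme logits could underflow to a zero probability; not handled here.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


# Hypothetical predictions: one confident (in-class-like), one uncertain.
confident = softmax([5.0, 0.0, 0.0])
uncertain = softmax([1.0, 1.0, 1.0])
```

A uniform prediction over K classes has the maximum possible entropy, log K, and the lowest possible maximum probability, 1/K, which is the sense in which out-of-class images look "more uncertain" to the model.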

The authors hope that, by proposing the benchmark and reporting these results, this paper can lead to new innovations in semi-supervised learning.


