Review — Learning Classification with Unlabeled Data

It Might Be One of the Earliest Self-Supervised Learning Papers

Sik-Ho Tsang
3 min read · Feb 1, 2022
Self-Supervised Learning

Learning Classification with Unlabeled Data
de Sa NIPS’93, University of Rochester
1993 NIPS, Over 200 Citations (Sik-Ho Tsang @ Medium)
Self-Supervised Learning, Unsupervised Learning, Multimodal

  • This paper presents a new measure for training piecewise-linear classifiers that receive unlabeled patterns from two or more sensory modalities.
  • Minimizing this new measure approximates directly minimizing the number of misclassifications.


  1. Piecewise-Linear Classifier
  2. Self-Supervised Piecewise-Linear Classifier
  3. Experimental Results

1. Piecewise-Linear Classifier

A piecewise-linear classifier in a 2-Dimensional input space
  • A piecewise-linear classifier is shown above. It consists of a collection of (labeled) codebook vectors in the space of the input patterns.
  • The circles represent data samples from two classes: filled (A) and unfilled (B). The X’s represent codebook vectors, labeled according to their class (A or B).
  • Future patterns are classified according to the label of the closest codebook vector.
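The classification rule above is simply a nearest-neighbor lookup over the codebook. A minimal sketch (with hypothetical 2-D codebook vectors and labels, not the paper's actual data):

```python
import numpy as np

# Hypothetical codebook: each row is a codebook vector in the
# 2-D input space, with a class label ("A" or "B") attached.
codebook = np.array([[0.2, 0.8], [0.3, 0.2], [0.8, 0.7], [0.9, 0.1]])
labels = np.array(["A", "A", "B", "B"])

def classify(x, codebook, labels):
    """Label a pattern by the label of its closest codebook vector."""
    dists = np.linalg.norm(codebook - x, axis=1)
    return labels[np.argmin(dists)]

print(classify(np.array([0.25, 0.75]), codebook, labels))  # -> A
```

The decision boundary this induces is piecewise-linear: it is made of segments of the perpendicular bisectors between codebook vectors of different classes.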

2. Self-Supervised Piecewise-Linear Classifier

2.1. Cow and “Moo”

Self-Supervised Learning
  • For example, hearing a “moo” and seeing a cow tend to occur together.
  • So, although the sight of a cow does not come with an internal homuncular “cow” label, it does co-occur with an instance of a “moo”.

The key is to process the “moo” sound to obtain a self-supervised label for the network processing the visual image of the cow and vice-versa.

2.2. Self-Supervised Piecewise-Linear Classifier

Network for Learning the Labels of the Codebook Vectors
  • One way to make use of this cross-modality structure is to derive labels for the codebook vectors. The labels can be learned with a competitive learning algorithm using the network shown above.
  • When modality 1 experiences a sensation from its pattern A distribution, modality 2 experiences a sensation from its own pattern A distribution.
  1. Codebook vectors are initially chosen randomly from the data patterns.
  2. Initialize labels of codebook vectors.
  3. Because the codebook vectors may cross class borders or may not be accurately labeled in the initialization stage, the labels are updated iteratively throughout the algorithm: the weight from the neuron representing the closest codebook vector to the output class hypothesized by the other modality is increased.

3. Experimental Results

  • The following experiments were all performed using the Peterson and Barney vowel formant data.
  • The dataset consists of the first and second formants for ten vowels in an /hVd/ context from 75 speakers (32 males, 28 females, 15 children), each of whom repeated each vowel twice.
Accuracy (mean percent correct and sample standard deviation over 60 trials and 2 modalities). The heading i-j refers to performance measured after the j-th step during the i-th iteration.
  • Accuracy was measured individually (on the training set) for both modalities and averaged.
  • These results were then averaged over 60 runs.
  • Accuracies of 76%–79% are obtained under two different settings.
  • (It’s the first day of Chinese New Year in 2022; I’ve just read the paper very quickly and present it only roughly. For more details, please read the paper. It is amazing that there was self-supervised learning as early as 1993.)


