Gaussian, Motion & Defocus Blur Classification

  • A learning-based method using a pre-trained deep neural network (DNN) and a general regression neural network (GRNN) is proposed to first classify the blur type and then estimate its parameters.
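The second stage can be made concrete: a GRNN is essentially a Nadaraya-Watson kernel regressor, so parameter estimation reduces to a Gaussian-weighted average of training targets. The function name, toy features, and bandwidth `sigma` below are illustrative sketches, not from the paper:

```python
import numpy as np

def grnn_predict(X_train, y_train, x, sigma=0.5):
    """General Regression Neural Network (Specht, 1991): a
    Nadaraya-Watson kernel regressor. Each training sample acts as a
    pattern unit; the output is the Gaussian-weighted average of the
    training targets."""
    d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances to x
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian kernel weights
    return np.dot(w, y_train) / np.sum(w)     # weighted average

# Toy example: estimate a blur parameter from a 1-D feature.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])            # e.g. blur radii
print(grnn_predict(X, y, np.array([1.5])))    # interpolates to about 1.5
```

Because the GRNN stores the training set rather than fitting weights by backprop, it trains in one pass, which suits the small per-blur-type parameter datasets.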


  1. DNN & GRNN Framework
  2. Deep Neural Network (DNN)
  3. General Regression Neural Network (GRNN)
  4. Experimental Results

1. DNN & GRNN Framework

1.1. Framework

DNN & GRNN Framework
  • DNN is…

Mean Teacher, Teacher Student Approach, for Semi-Supervised Learning

Teacher Student Approach (Image from Pixabay)
  • Mean Teacher is proposed to average the model weights, instead of averaging label predictions as in Temporal Ensembling [13].
  • Mean Teacher improves test accuracy and enables training with fewer labels than Temporal Ensembling [13].
  • Without changing the network architecture, Mean Teacher achieves a lower error rate.
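The weight-averaging step above is a plain exponential moving average (EMA) of the student's weights into the teacher. The function name, decay value, and toy loop below are an illustrative sketch:

```python
import numpy as np

def ema_update(teacher_params, student_params, alpha=0.999):
    """Mean Teacher update: the teacher's weights are an exponential
    moving average (EMA) of the student's weights after each training
    step, rather than an average of label predictions."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_params, student_params)]

# Toy example: one parameter tensor, repeated student steps.
teacher = [np.zeros(3)]
student = [np.ones(3)]
for _ in range(1000):
    teacher = ema_update(teacher, student, alpha=0.99)
print(teacher[0])  # the teacher drifts toward the student's weights
```

Because the EMA changes only how the teacher's weights are produced, the network architecture itself is untouched, which is why the error-rate gain comes "for free".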

DeepCluster, K-Mean Clustering to Generate Pseudo-Labels, a Pretext Task for Self-Supervised Learning

Illustration of the Proposed DeepCluster
  • DeepCluster, a clustering method, is proposed that jointly learns the parameters of a neural network and the cluster assignments of the resulting features.
  • DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the subsequent assignments as supervision to update the weights of the network.
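The alternation above can be sketched with a minimal k-means: cluster the current features, then treat the assignments as labels for the next training round. The `kmeans` helper and the toy two-blob data are illustrative; the paper uses features from a ConvNet, not raw points:

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Plain k-means; returns cluster assignments used as pseudo-labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                      # assign to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

# DeepCluster alternation (sketch): 1) extract features with the current
# network, 2) cluster them with k-means, 3) train the classifier head on
# the cluster assignments as if they were ground-truth labels, repeat.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
pseudo_labels = kmeans(X, k=2)
print(pseudo_labels.shape)  # one pseudo-label per sample
```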


  1. Notations for Supervised Learning
  2. DeepCluster as Pretext Task in Self-Supervised Learning
  3. DeepCluster Analysis
  4. DeepCluster Performance

1. Notations for Supervised Learning

  • Before…

SepConv++: A Bunch of Small Improvements for Adaptive Separable Convolutions, Achieve SOTA Performance

Kernel-Based Interpolation with Spatially-Varying Kernels
  • A network using adaptive separable convolutions is improved by a set of subtle, low-level improvements.
  • These improvements are: delayed padding (+0.37 dB), input normalization (+0.30 dB), network improvements (+0.42 dB), kernel normalization (+0.52 dB), contextual training (+0.18 dB), and self-ensembling (+0.18 dB).
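The largest single gain, kernel normalization, can be sketched as constraining each predicted interpolation kernel to sum to one, so the weighted sum of input pixels preserves brightness. The helper name and toy kernels below are illustrative, not the paper's implementation:

```python
import numpy as np

def normalize_kernel(k, eps=1e-8):
    """Kernel normalization (+0.52 dB): constrain each predicted
    interpolation kernel to sum to one, so the weighted sum of input
    pixels preserves overall brightness."""
    return k / (k.sum() + eps)

# Toy separable kernel: one vertical and one horizontal 1-D kernel,
# as predicted per output pixel by an adaptive separable convolution.
kv = normalize_kernel(np.array([0.2, 1.0, 0.2]))
kh = normalize_kernel(np.array([0.5, 2.0, 0.5]))
patch = np.full((3, 3), 10.0)      # constant 3x3 input patch
out = kv @ patch @ kh              # separable convolution at one pixel
print(out)                         # a constant input keeps its brightness
```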


  1. Proposed Video Frame Interpolation Framework
  2. Delayed Padding (+0.37 dB)
  3. Input Normalization (+0.30 dB)
  4. Network Improvements (+0.42 dB)
  5. Kernel Normalization (+0.52 dB)
  6. Contextual Training (+0.18 dB)

FCOS: Training Without the Use of Anchor Boxes

FCOS works by predicting a 4D vector (l, t, r, b) encoding the location of a bounding box at each foreground pixel
  • FCOS completely avoids the complicated computation related to anchor boxes, such as computing overlap during training.
  • It also avoids all hyper-parameters related to anchor boxes.
  • FCOS encourages rethinking the need for anchor boxes.
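The per-pixel regression target described in the caption is straightforward to compute. A minimal sketch (the function name and toy box are illustrative):

```python
import numpy as np

def fcos_targets(px, py, box):
    """FCOS regression target at a foreground pixel (px, py): the 4-D
    vector (l, t, r, b) of distances from the pixel to the left, top,
    right, and bottom sides of the ground-truth box (x0, y0, x1, y1).
    No anchor boxes are involved."""
    x0, y0, x1, y1 = box
    return np.array([px - x0, py - y0, x1 - px, y1 - py], dtype=float)

# A pixel at (50, 60) inside a box spanning (20, 30) to (120, 100):
print(fcos_targets(50, 60, (20, 30, 120, 100)))  # [30. 30. 70. 40.]
```

All four distances are positive exactly when the pixel lies inside the box, which is how foreground pixels are identified.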


  1. FCOS: Network Architecture
  2. Multi-level Prediction with FPN for FCOS
  3. Ablation Study
  4. SOTA Comparison

1. FCOS: Network Architecture

RotNet: Self-Supervised Learning by Predicting Image Rotations

The core intuition is that if someone is not aware of the concepts of the objects depicted in images, they cannot recognize the rotation applied to those images
  • Using RotNet, image features are learnt by training ConvNets to recognize the 2D rotation applied to their input images.
  • In this way, the unsupervised pre-trained AlexNet model achieves a state-of-the-art mAP of 54.4%, only 2.4 points lower than the supervised AlexNet.
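Generating the pretext-task data is almost free: each image yields four rotated copies whose rotation index serves as the label. The helper name below is illustrative:

```python
import numpy as np

def make_rotation_batch(img):
    """RotNet pretext task: produce the four 2D rotations of an image
    (0, 90, 180, 270 degrees), with the rotation index as the label
    the ConvNet must predict."""
    rotations = [np.rot90(img, k) for k in range(4)]
    labels = np.arange(4)          # 0 -> 0deg, 1 -> 90deg, ...
    return rotations, labels

img = np.arange(9).reshape(3, 3)   # toy 3x3 "image"
rots, labels = make_rotation_batch(img)
print(rots[2])                     # the 180-degree rotation
```

Restricting to multiples of 90 degrees avoids interpolation artifacts that would otherwise give away the rotation.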


  1. RotNet: Image Rotation Prediction Framework
  2. Ablation Study & SOTA Comparison on CIFAR-10
  3. Task Generalization on ImageNet, Places, & PASCAL…

Solving Jigsaw Puzzles as Pretext Task for Self-Supervised Learning

Learning image representations by solving Jigsaw puzzles. (a): The image from which the tiles (marked with green lines) are extracted. (b): A puzzle obtained by shuffling the tiles. (c): determining the relative position (the relative location between the central tile and the top-left and top-middle tiles is ambiguous.)
  • Solving Jigsaw puzzles is treated as a pretext task, which requires no manual labeling. By training the CFN to solve Jigsaw puzzles, both a feature mapping of object parts and their correct spatial arrangement are learnt.
  • Specifically, the Context Free Network (CFN), a siamese-ennead CNN, is designed to take image tiles as input, and outputs the correct spatial arrangement.
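Preparing one training sample amounts to cutting a 3x3 grid of tiles and shuffling them with a known permutation, whose index is the label the CFN predicts. The helper name and the example permutation below are illustrative (the paper selects a fixed subset of maximally distinct permutations):

```python
import numpy as np

def jigsaw_sample(img, perm):
    """Jigsaw pretext task: cut a 3x3 grid of tiles from the image and
    shuffle them with a fixed permutation; the network must predict
    which permutation was applied."""
    h, w = img.shape[0] // 3, img.shape[1] // 3
    tiles = [img[i*h:(i+1)*h, j*w:(j+1)*w]
             for i in range(3) for j in range(3)]
    return [tiles[p] for p in perm]

img = np.arange(36).reshape(6, 6)          # toy 6x6 "image", 2x2 tiles
perm = [8, 0, 3, 5, 1, 7, 2, 6, 4]         # one illustrative permutation
shuffled = jigsaw_sample(img, perm)
print(shuffled[0])                         # the bottom-right tile comes first
```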

Colorization as Pretext Task in Self-Supervised Learning, Outperforms Context Prediction & Context Encoders

Example Input Grayscale Photos and Output Colorizations
  • A fully automatic approach is designed, which produces vibrant and realistic colorizations for a grayscale image.
  • This proposed colorization task is treated as a powerful pretext task for self-supervised feature learning, acting as a cross-channel encoder.
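The cross-channel encoder setup splits each image into an input channel and target channels. A rough sketch, with plain luminance standing in for the L channel of the CIE Lab space the paper actually uses (the helper name and weights are illustrative):

```python
import numpy as np

def cross_channel_pair(rgb):
    """Cross-channel encoder setup (sketch): the input is a lightness
    channel and the target is the colour information the network must
    predict. Luminance here is a crude proxy for CIE Lab's L channel."""
    lum = rgb @ np.array([0.299, 0.587, 0.114])   # HxW grayscale input
    target = rgb                                  # colour to reconstruct
    return lum, target

rgb = np.random.default_rng(0).random((4, 4, 3))
x, y = cross_channel_pair(rgb)
print(x.shape, y.shape)   # (4, 4) (4, 4, 3)
```

No labels are needed: any colour photo supplies both the input and the target for free, which is what makes colorization a self-supervised pretext task.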


  1. Colorful Image Colorization
  2. Colorization Results
  3. Self-Supervised Learning Results

1. Colorful Image Colorization

Besides Using Latent Vector z, Latent Code c is also Input to GAN, for Learning Disentangled Representations

  • InfoGAN is designed to maximize the mutual information between a small subset of the latent variables and the observation.
  • A lower bound of the mutual information objective is derived that can be optimized efficiently.
  • By doing so, InfoGAN successfully disentangles writing styles from digit shapes on MNIST dataset, and disentangles the visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset.
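For a categorical code c, the lower bound takes the form L_I = E[log Q(c|x)] + H(c), where Q is an auxiliary head that tries to recover c from the generated sample; since H(c) is constant, training maximizes E[log Q(c|x)]. A minimal sketch of that term (the function name is illustrative):

```python
import numpy as np

def infogan_mi_lower_bound(q_probs, c_onehot):
    """Variational lower bound on the mutual information I(c; G(z, c)),
    up to the constant H(c): the expected log-probability that the
    auxiliary head Q assigns to the true latent code c."""
    log_q = np.log(q_probs + 1e-12)
    return np.mean(np.sum(c_onehot * log_q, axis=1))

# Perfect recovery of c gives E[log Q(c|x)] = 0, the maximum:
c = np.eye(3)                         # three samples, codes 0, 1, 2
perfect = infogan_mi_lower_bound(c, c)
uniform = infogan_mi_lower_bound(np.full((3, 3), 1/3), c)
print(perfect, uniform)               # 0 beats log(1/3)
```

Maximizing this term alongside the usual GAN objective is what pushes the generator to keep c recoverable, i.e. disentangled, in its output.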

Split-Brain Auto for Self-Supervised Learning, Outperforms Jigsaw Puzzles, Context Prediction, ALI/BiGAN, L³-Net, Context Encoders, etc.

Proposed Split-Brain Auto (Bottom) vs Traditional Autoencoder, e.g. Stacked Denoising Autoencoder (Top)
  • A network is split into two sub-networks, each of which is trained to perform a difficult task: predicting one subset of the data channels from the other.
  • By forcing the network to solve cross-channel prediction tasks, feature learning is achieved without using any labels.
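Setting up the two cross-channel prediction tasks amounts to splitting the input channels and swapping input and target between the sub-networks. A minimal sketch (the helper name and the even channel split are illustrative):

```python
import numpy as np

def split_brain_tasks(x):
    """Split-Brain setup: split the input channels into two halves; each
    sub-network predicts one half from the other, and the full
    representation is the concatenation of the two sub-networks' features."""
    c = x.shape[-1] // 2
    x1, x2 = x[..., :c], x[..., c:]
    return (x1, x2), (x2, x1)   # (input, target) for each sub-network

x = np.random.default_rng(0).random((4, 4, 4))  # toy 4-channel input
task_a, task_b = split_brain_tasks(x)
print(task_a[0].shape, task_a[1].shape)  # (4, 4, 2) (4, 4, 2)
```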


  1. Split-Brain Autoencoder (Split-Brain Auto)
  2. Experimental Results

1. Split-Brain Autoencoder (Split-Brain Auto)

Sik-Ho Tsang

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn:, My Paper Reading List:
