Review: Virtual Adversarial Training (VAT)

VAT for Semi-Supervised Learning, Outperforms Ladder Network, Γ-Model & Π-Model

4 min read · Apr 21, 2022

Figure From Author’s GitHub

Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning
VAT, by Preferred Networks, Inc., ATR Cognitive Mechanisms Laboratories, Ritsumeikan University, and Kyoto University
2019 TPAMI, Over 1500 Citations (Sik-Ho Tsang @ Medium)
This paper is extended from “Distributional Smoothing with Virtual Adversarial Training” in 2016 ICLR with over 400 Citations.

A new measure of local smoothness of the conditional label distribution given input is proposed.
Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation.

Outline

1. Virtual Adversarial Training (VAT)
2. Experimental Results

1. Virtual Adversarial Training (VAT)

Virtual Adversarial Training (VAT) (Figure from Amit Chaudhary)

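In Temporal Ensembling and Mean Teacher, the mean squared error (MSE) is used to measure the difference between two predictions.
In contrast, Virtual Adversarial Training (VAT) uses the KL divergence between the prediction for the original input and the prediction for the perturbed input:

$$ \mathrm{KL}\big[p(\cdot \mid x, \theta) \,\|\, p(\cdot \mid x + r, \theta)\big] = \sum_{y \in Q} p(y \mid x, \theta) \log \frac{p(y \mid x, \theta)}{p(y \mid x + r, \theta)} $$

where x is the input, r is a small perturbation added to x, y is the output label, Q is the set of labels, and θ denotes the model parameters.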
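
To make the contrast with MSE concrete, here is a minimal sketch in plain Python that computes both measures for a pair of predictions. The two probability vectors are made-up illustrative values, not numbers from the paper, and the function names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL[p || q] for two categorical distributions given as lists.

    eps is a small constant (an assumption of this sketch) that avoids log(0).
    """
    return sum(pk * math.log((pk + eps) / (qk + eps))
               for pk, qk in zip(p, q) if pk > 0)

def mse(p, q):
    """Mean squared error between two prediction vectors."""
    return sum((pk - qk) ** 2 for pk, qk in zip(p, q)) / len(p)

# Illustrative predictions over three labels (made-up values).
p_clean = [0.7, 0.2, 0.1]      # stands in for p(y | x)
p_perturbed = [0.5, 0.3, 0.2]  # stands in for p(y | x + r)

print("KL :", kl_divergence(p_clean, p_perturbed))
print("MSE:", mse(p_clean, p_perturbed))
```

Unlike MSE, the KL divergence is not symmetric: swapping the two arguments generally gives a different value, so it matters which prediction is treated as the reference.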

Adversarial Direction (Solid Arrow) (Figure from Divam Gupta)

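The perturbation r should point in the adversarial direction, i.e. the direction in which the prediction for the perturbed input differs most from the original one, so that the KL divergence between the two output distributions is as large as possible:

$$ r_{\mathrm{vadv}} = \arg\max_{r;\ \|r\|_2 \le \epsilon} \mathrm{KL}\big[p(\cdot \mid x, \theta) \,\|\, p(\cdot \mid x + r, \theta)\big] $$

where ε is the norm constraint on r.
The Local Distributional Smoothness (LDS) loss is then defined as:

$$ \mathrm{LDS}(x, \theta) = \mathrm{KL}\big[p(\cdot \mid x, \theta) \,\|\, p(\cdot \mid x + r_{\mathrm{vadv}}, \theta)\big] $$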

The loss LDS(x,θ) can be considered as a negative measure of the local smoothness of the current model at each input data point x.

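The regularization term proposed in this paper is the average of LDS(x,θ) over all input data points, labeled and unlabeled:

$$ R_{\mathrm{vadv}}(D_l, D_{ul}, \theta) = \frac{1}{N_l + N_{ul}} \sum_{x \in D_l, D_{ul}} \mathrm{LDS}(x, \theta) $$

where N_l is the number of labeled samples, N_ul is the number of unlabeled samples, D_l is the labeled dataset, and D_ul is the unlabeled dataset.
The full objective function is:

$$ \ell(D_l, \theta) + \alpha R_{\mathrm{vadv}}(D_l, D_{ul}, \theta) $$

where ℓ(D_l,θ) is the negative log-likelihood for the labeled dataset and α is the regularization coefficient. VAT is the training method that uses the regularizer R_vadv.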
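
As a small worked example of how the pieces combine, the sketch below averages per-sample LDS values over labeled and unlabeled points and adds the result to the supervised loss. The function name, argument names, and all numbers are hypothetical and chosen only for illustration.

```python
def vat_objective(nll_labeled, lds_labeled, lds_unlabeled, alpha=1.0):
    """Supervised loss plus alpha times the averaged LDS regularizer.

    nll_labeled   -- negative log-likelihood on the labeled set (a float)
    lds_labeled   -- per-sample LDS values for labeled inputs
    lds_unlabeled -- per-sample LDS values for unlabeled inputs
    """
    lds_all = list(lds_labeled) + list(lds_unlabeled)
    # Average over N_l + N_ul points, labeled and unlabeled alike.
    r_vadv = sum(lds_all) / len(lds_all)
    return nll_labeled + alpha * r_vadv

# Made-up values: 2 labeled and 3 unlabeled samples.
loss = vat_objective(0.5, [0.1, 0.3], [0.2, 0.4, 0.5], alpha=1.0)
print(loss)
```

Note that the unlabeled samples contribute only through the regularizer, since computing LDS needs no labels.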

VAT Algorithm

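To perform VAT, first select M samples at random from the dataset.
Then generate a random unit vector for each sample and compute r_vadv from the gradient of the KL divergence with respect to the perturbation, evaluated at a small step along that random direction. The gradient is normalized and scaled to length ε.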
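
The steps above can be sketched in plain Python. This is not the authors' code: it assumes a linear softmax classifier, for which the gradient of the KL divergence with respect to r has the closed form W^T (q - p), so no automatic differentiation is needed. The names `xi` (the small step size) and `n_power` (the number of gradient iterations), as well as the toy weights, are assumptions of this sketch.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    """Class probabilities of a linear softmax classifier (toy model)."""
    return softmax([sum(w * xj for w, xj in zip(row, x)) + bk
                    for row, bk in zip(W, b)])

def kl(p, q):
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def unit(v):
    """Scale v to unit L2 norm (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(vj * vj for vj in v))
    return [vj / n for vj in v] if n > 0 else list(v)

def kl_grad_wrt_r(W, b, x, r, p_clean):
    """Gradient of KL[p(.|x) || p(.|x+r)] w.r.t. r: W^T (q - p) for this model."""
    q = predict(W, b, [xj + rj for xj, rj in zip(x, r)])
    diff = [qk - pk for qk, pk in zip(q, p_clean)]
    return [sum(W[k][j] * diff[k] for k in range(len(W)))
            for j in range(len(x))]

def virtual_adversarial_perturbation(W, b, x, epsilon=1.0, xi=1e-3,
                                     n_power=1, rng=random):
    p_clean = predict(W, b, x)                   # kept fixed as the target
    d = unit([rng.gauss(0.0, 1.0) for _ in x])   # random unit vector
    for _ in range(n_power):
        # The gradient is exactly zero at r = 0, so evaluate it at r = xi * d.
        g = kl_grad_wrt_r(W, b, x, [xi * dj for dj in d], p_clean)
        d = unit(g)
    return [epsilon * dj for dj in d]            # scale to norm epsilon

# Toy model with 2 input features and 3 classes (made-up weights).
W = [[2.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [0.5, -0.2]
r_vadv = virtual_adversarial_perturbation(W, b, x, epsilon=0.1, n_power=10,
                                          rng=random.Random(0))
lds = kl(predict(W, b, x), predict(W, b, [xj + rj for xj, rj in zip(x, r_vadv)]))
print("r_vadv:", r_vadv)
print("LDS   :", lds)
```

Evaluating the gradient at a small non-zero step matters because the KL divergence attains its minimum at r = 0, where the gradient vanishes and carries no directional information.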

Codes From Author’s GitHub

The above code is from the authors' repository.
The variable d in the code corresponds to the perturbation r in the paper.

LDS values are large for points near the class boundary and become smaller after each update.

2. Experimental Results

2.1. MNIST

Test Performance of Semi-Supervised Learning Methods on MNIST with the Permutation Invariant Setting

A neural network with four hidden layers of {1200, 600, 300, 150} units is used.

VAT outperforms most of the other semi-supervised methods, with the exception of Ladder Network and GAN-based methods.

2.2. SVHN & CIFAR-10

Test Performance of Semi-Supervised Learning Methods on SVHN and CIFAR-10 without Image Data Augmentation

Two CNNs, Conv-Small and Conv-Large, are used.

VAT achieved a test error rate of 14.82% on CIFAR-10, outperforming the state-of-the-art semi-supervised learning methods at the time.

Combined with entropy minimization (EntMin), ‘VAT+EntMin’ outperformed the state-of-the-art semi-supervised learning methods on both SVHN and CIFAR-10.
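
EntMin adds the conditional entropy of the model's predictions as an extra cost, which pushes predictions on unlabeled data toward confident outputs. A minimal sketch of that term in plain Python follows; the function name and the example predictions are hypothetical, and how the term is weighted against the other losses is not shown here.

```python
import math

def conditional_entropy(predictions):
    """Average entropy of the predicted label distributions.

    predictions -- list of probability vectors, one per input sample.
    """
    total = 0.0
    for p in predictions:
        # Entropy of one prediction: -sum_y p(y|x) log p(y|x)
        total -= sum(pk * math.log(pk) for pk in p if pk > 0)
    return total / len(predictions)

# Made-up predictions: one confident, one maximally uncertain.
confident = [0.98, 0.01, 0.01]
uncertain = [1 / 3, 1 / 3, 1 / 3]
print(conditional_entropy([confident]))
print(conditional_entropy([uncertain]))
```

The confident prediction yields a much smaller value than the uniform one, so minimizing this term favors decision boundaries that keep predictions far from uniform.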

2.3. Ablation of ε and α

Ablation of ε and α

α is fixed to 1, so ε is the only hyperparameter that needs to be tuned.

2.4. Virtual Adversarial Examples

Virtual Adversarial Examples using Different Values of ε

References

[2016 ICLR] [VAT]
Distributional Smoothing with Virtual Adversarial Training

[2019 TPAMI] [VAT]
Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning

Pretraining or Semi-Supervised Learning

2004 [Entropy Minimization, EntMin] 2013 [Pseudo-Label (PL)] 2015 [Ladder Network, Γ-Model] 2016 [Sajjadi NIPS’16] 2017 [Mean Teacher] [PATE & PATE-G] [Π-Model, Temporal Ensembling] 2018 [WSL] 2019 [VAT] [Billion-Scale] [Label Propagation] [Rethinking ImageNet Pre-training] 2020 [BiT] [Noisy Student] [SimCLRv2]

Written by Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.
