# Review: DCGAN — Deep Convolutional Generative Adversarial Network (GAN)

## Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

In this story, **Deep Convolutional Generative Adversarial Network (DCGAN)**, by Indico Research and Facebook AI Research (FAIR), is reviewed. With DCGAN, a hierarchy of representations is learnt from object parts to scenes in both the generator and discriminator. This is a paper in **2016 ICLR** with about **6000 citations**. (Sik-Ho Tsang @ Medium)

(During the days of coronavirus, I hope to write 30 stories this month to give myself a small challenge. This is the 16th story of April, yet today is already 20th April 2020, so the schedule is lagging a little behind. Still, I hope I can accomplish it, though some stories are short and more related to my research work, i.e. video coding/compression, which is not the mainstream of deep learning development … lol)

# Outline

1. **A Set of Constraints for Stable Training**
2. **Network Architecture**
3. **Experimental Results**

# 1. **A Set of Constraints for Stable Training**

## 1.1. All Convolutional Net

- The all convolutional net replaces deterministic spatial pooling functions (such as max pooling) with strided convolutions.
- The discriminator learns its own spatial downsampling using strided convolutions.
- Similarly, the generator learns its own spatial upsampling using fractionally-strided (transposed) convolutions.
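A quick sketch of the output-size arithmetic behind this constraint (kernel/stride/padding values here are the common DCGAN choices; the paper's exact implementation may pad differently): stride-2 convolutions halve the spatial size in the discriminator, while stride-2 fractionally-strided convolutions double it in the generator.

```python
# Output-size arithmetic for strided vs. fractionally-strided convolutions.

def conv_out(size, kernel, stride, pad):
    """Spatial size after a strided convolution (downsampling)."""
    return (size + 2 * pad - kernel) // stride + 1

def tconv_out(size, kernel, stride, pad):
    """Spatial size after a fractionally-strided (transposed) convolution
    (upsampling)."""
    return (size - 1) * stride - 2 * pad + kernel

# Discriminator: 64x64 -> 32 -> 16 -> 8 -> 4 via four stride-2 convolutions.
s = 64
for _ in range(4):
    s = conv_out(s, kernel=4, stride=2, pad=1)
print(s)  # 4

# Generator: 4x4 -> 8 -> 16 -> 32 -> 64 via four stride-2 transposed convolutions.
s = 4
for _ in range(4):
    s = tconv_out(s, kernel=4, stride=2, pad=1)
print(s)  # 64
```

No pooling layer appears anywhere; the stride itself does the resampling, and its parameters are learned.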

## 1.2. Eliminating Fully Connected Layers

- The first layer of the generator, which takes a uniform noise distribution *Z* as input, could be called fully connected as it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack.
- For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output.
- Thus, there are no fully connected hidden layers in either network.
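The projection-and-reshape step can be sketched as follows (the 100-dim *Z* and the 1024-channel 4×4 starting tensor follow the generator figure; the weight values are of course random placeholders here):

```python
import numpy as np

# Sketch: the generator's first "layer" is just a matrix multiplication
# projecting the noise vector z, whose result is reshaped into a 4-D tensor
# that starts the convolution stack.
rng = np.random.default_rng(0)

z = rng.uniform(-1.0, 1.0, size=100)          # uniform noise input Z
W = rng.standard_normal((100, 1024 * 4 * 4))  # projection weights

x = z @ W                   # plain matrix multiplication, no bias shown
x = x.reshape(1024, 4, 4)   # (channels, height, width): start of conv stack
print(x.shape)  # (1024, 4, 4)
```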

## 1.3. Batch Normalization (BN)

- BN stabilizes learning by normalizing the input to each unit to have zero mean and unit variance.
- However, directly applying BN to all layers resulted in sample oscillation and model instability.
- This was avoided by not applying BN to the generator output layer and the discriminator input layer.
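The normalization itself is simple; a minimal sketch over a batch of activations (omitting BN's learned scale and shift parameters, which the real layer also has):

```python
import numpy as np

# Minimal batch normalization: normalize each unit (feature) to zero mean
# and unit variance across the batch dimension.
def batch_norm(x, eps=1e-5):
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# A batch of 64 examples with 10 units each, deliberately off-center.
x = np.random.default_rng(1).normal(5.0, 3.0, size=(64, 10))
y = batch_norm(x)
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-6))  # True
```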

## 1.4. Activation Functions

- The ReLU activation is used in the generator with the exception of the output layer which uses the Tanh function.
- Within the discriminator, it is found that the leaky rectified activation (LeakyReLU) works well.
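The three activations in scalar form (the 0.2 slope is the LeakyReLU setting reported in the paper):

```python
import math

def relu(x):
    """Generator hidden layers."""
    return max(0.0, x)

def tanh(x):
    """Generator output layer, so samples land in [-1, 1]."""
    return math.tanh(x)

def leaky_relu(x, slope=0.2):
    """Discriminator layers: a small negative slope instead of a hard zero."""
    return x if x > 0 else slope * x

print(relu(-3.0), tanh(0.0))  # 0.0 0.0
print(leaky_relu(-3.0))       # about -0.6
```

Tanh bounds the output to the range of the (rescaled) training images, while the leaky slope keeps gradients flowing through the discriminator for negative inputs.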

# 2. **Network Architecture**

- The generator is as shown above: only (fractionally-strided) convolutions, with no fully connected hidden layers, following the constraints in Section 1.

- The discriminator is roughly an inverse of the generator, using strided convolutions for downsampling.
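A shape walk through the generator as drawn in the figure: after projecting and reshaping *Z*, four stride-2 fractionally-strided convolutions halve the channels and double the spatial size each step, ending in a 64×64 RGB image (passed through Tanh).

```python
# Shape walk through the DCGAN generator figure (channel counts per the
# paper's 64x64 architecture diagram).
channels, size = 1024, 4          # after projecting and reshaping z
layers = []
for out_ch in (512, 256, 128, 3):
    size *= 2                     # stride-2 transposed conv doubles H and W
    channels = out_ch             # channels halve (then drop to 3 for RGB)
    layers.append((channels, size, size))
print(layers)  # [(512, 8, 8), (256, 16, 16), (128, 32, 32), (3, 64, 64)]
```

The discriminator mirrors this: stride-2 convolutions walk the shapes in the opposite direction, from (3, 64, 64) down to a 4×4 map that is flattened into the sigmoid output.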

# 3. **Experimental Results**

- Models are trained on three datasets: **LSUN**, **ImageNet-1k**, and **a newly assembled faces dataset**.

## 3.1. LSUN

- A model is trained on the LSUN bedrooms dataset containing a little over 3 million training examples.
- It is demonstrated that the model is not producing high-quality samples by simply overfitting/memorizing training examples.

## 3.2. DCGAN Trained on ImageNet, Tested on CIFAR10 & SVHN

- DCGAN is trained on ImageNet-1k, and then the discriminator's convolutional features from all layers are used, max-pooling each layer's representation to produce a 4×4 spatial grid. These features are then flattened and concatenated to form a 28672-dimensional vector, and a regularized linear L2-SVM classifier is trained on top of them.
- This achieves 82.8% accuracy on CIFAR-10, outperforming all K-means based approaches.
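The feature-extraction pipeline can be sketched as below. The per-layer channel counts used here are hypothetical placeholders, so the resulting vector length differs from the paper's 28672; only the pool-to-4×4, flatten, and concatenate steps are the point (the L2-SVM trained on top is omitted):

```python
import numpy as np

def pool_to_4x4(feat):
    """Max-pool a (C, H, W) feature map down to a (C, 4, 4) spatial grid
    (assumes H and W are divisible by 4)."""
    c, h, w = feat.shape
    return feat.reshape(c, 4, h // 4, 4, w // 4).max(axis=(2, 4))

rng = np.random.default_rng(0)
# Stand-ins for discriminator activations from four conv layers
# (hypothetical channel counts and spatial sizes):
feats = [rng.standard_normal((c, s, s))
         for c, s in [(128, 32), (256, 16), (512, 8), (1024, 4)]]

# Pool each layer to 4x4, flatten, and concatenate into one feature vector.
vec = np.concatenate([pool_to_4x4(f).ravel() for f in feats])
print(vec.shape)  # (30720,) with these hypothetical channel counts
```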

- On SVHN, a purely supervised CNN with the same architecture obtains only a 28.87% error rate.
- But using the DCGAN discriminator's features with the same CNN architecture, a 22.48% error rate is obtained.

## 3.3. Walking in the Latent Space

- The vector *Z* is an *n*-dimensional vector in an *n*-dimensional space.
- If interpolation is performed between two *z* vectors, a gradual change can be seen, as shown above.
- Since **walking in this latent space results in semantic changes to the image generations (such as objects being added and removed)**, we can reason that the model has learned relevant and interesting representations.
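The walk itself is just linear interpolation between two latent vectors; each interpolate would then be fed to the generator to render one frame (a minimal sketch, with 9 steps as an arbitrary choice):

```python
import numpy as np

def lerp(z0, z1, steps=9):
    """Linearly interpolate between two latent vectors, endpoints included."""
    ts = np.linspace(0.0, 1.0, steps)
    return [(1.0 - t) * z0 + t * z1 for t in ts]

rng = np.random.default_rng(0)
z0 = rng.uniform(-1, 1, size=100)
z1 = rng.uniform(-1, 1, size=100)

path = lerp(z0, z1)  # each entry would be passed through the generator
print(len(path))  # 9
```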

## 3.4. Visualizing the Discriminator Features

- The above figure shows that the **features learnt by the discriminator activate on typical parts of a bedroom, like beds and windows**.

## 3.5. Forgetting to Draw Certain Objects

- By dropping the "window" filters from the generator, **some windows are removed**, while others are **transformed into** objects with similar visual appearance, such as **doors and mirrors**.
- Although visual quality decreased, **overall scene composition stayed similar**, suggesting the generator has done a good job disentangling scene representation from object representation.
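Mechanically, "dropping" filters just means zeroing them out before generating again; a sketch with hypothetical filter indices standing in for the ones flagged as window-responsive:

```python
import numpy as np

rng = np.random.default_rng(0)
# A generator conv layer's weights: (out_channels, in_channels, kH, kW).
weights = rng.standard_normal((512, 256, 4, 4))

window_filters = [3, 17, 42]   # hypothetical "window" filter indices
weights[window_filters] = 0.0  # drop them, then sample from the generator again

print(float(np.abs(weights[window_filters]).sum()))  # 0.0
```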

## 3.6. Vector Arithmetic on Face Samples

- Simple arithmetic operations reveal rich linear structure in representation space.
- e.g.: vector("King") - vector("Man") + vector("Woman") can result in a vector whose nearest neighbor is the vector for "Queen".
- Experiments working on only a single sample per concept were unstable, but **averaging the *Z* vectors of three exemplars** showed consistent and stable generations that semantically obeyed the arithmetic.
- As shown in the figures above, for each column, the *Z* vectors of the samples are averaged. **Arithmetic is then performed on the mean vectors, creating a new vector *Y***. This *Y* is fed into the generator as input. **Uniform noise sampled with scale ±0.25 is added to *Y*** to produce the other 8 samples.
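The procedure above can be sketched directly (the "smiling woman - neutral woman + neutral man" combination is one of the paper's examples; the exemplar vectors here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_z():
    """Average the Z vectors of three hypothetical exemplars for a concept."""
    return rng.uniform(-1, 1, size=(3, 100)).mean(axis=0)

z_smiling_woman = mean_z()
z_neutral_woman = mean_z()
z_neutral_man = mean_z()

# Arithmetic on the mean vectors creates the new vector Y ("smiling man").
y = z_smiling_woman - z_neutral_woman + z_neutral_man

# Uniform noise with scale +/-0.25 added to Y produces the other 8 samples;
# all nine vectors would be fed through the generator.
variants = [y + rng.uniform(-0.25, 0.25, size=100) for _ in range(8)]
print(y.shape, len(variants))  # (100,) 8
```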

- Face pose can also be modeled linearly in *Z* space.

- In contrast, vector arithmetic performed in input (pixel) space yields poor results.

# Reference

[2016 ICLR] [DCGAN]

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks