Review: DS-CNN — Decoder-side Scalable CNN (Codec Filtering)

A Scalable Quality Enhancement Approach That Outperforms ARCNN and VRCNN

Sik-Ho Tsang
4 min read · Aug 5, 2019
An example application scenario of DS-CNN

In this story, Decoder-side Scalable CNN (DS-CNN), by Beihang University and the Collaborative Innovation Center of Geospatial Technology, is briefly reviewed. Depending on the decoder-side computational capability, different complexity scales within a single CNN model (as shown above) are used to improve image/video quality, thereby increasing coding efficiency. The paper was published at 2017 ICME with tens of citations. (Sik-Ho Tsang @ Medium)

Outline

  1. From ARCNN to AR-CNN-1 and AR-CNN-2
  2. DS-CNN-I and DS-CNN-B Network Architecture
  3. Experimental Results

1. From ARCNN to AR-CNN-1 and AR-CNN-2

AR-CNN
  • The original ARCNN has 4 convolutional layers, as shown above.
  • (Please feel free to read my ARCNN review if interested.)
  • AR-CNN-1: The authors improve ARCNN by increasing the number of filters.
  • AR-CNN-2: One more layer is added on top of AR-CNN-1. PSNR is improved in both cases.
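To make the widen-then-deepen idea concrete, here is a minimal PyTorch sketch. The original ARCNN layout below follows its commonly cited 9-7-1-5 configuration with 64/32/16 filters; the widened filter counts and the extra 3×3 layer in the AR-CNN-2-style variant are illustrative assumptions, not numbers taken from this paper.

```python
import torch
import torch.nn as nn

class ARCNN(nn.Module):
    """Sketch of the original 4-layer ARCNN (9-7-1-5, 64/32/16 filters)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(inplace=True),   # feature extraction
            nn.Conv2d(64, 32, 7, padding=3), nn.ReLU(inplace=True),  # feature enhancement
            nn.Conv2d(32, 16, 1), nn.ReLU(inplace=True),             # non-linear mapping
            nn.Conv2d(16, 1, 5, padding=2),                          # reconstruction
        )

    def forward(self, x):
        return self.body(x)

class ARCNN2(nn.Module):
    """AR-CNN-1/2-style variant: wider filters plus one extra layer
    (the exact widened widths here are assumptions)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 128, 9, padding=4), nn.ReLU(inplace=True),  # more filters
            nn.Conv2d(128, 64, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),  # added layer
            nn.Conv2d(32, 16, 1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 5, padding=2),
        )

    def forward(self, x):
        return self.body(x)
```

Both networks keep the spatial resolution unchanged (via padding), so the output is a restored frame of the same size as the decoded input.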

2. DS-CNN-I and DS-CNN-B Network Architecture

DS-CNN-I & DS-CNN-B Network Architecture
  • DS-CNN-I: A network consisting of 5 convolutional layers (green), as shown above. It is used for intra-frame coding.
  • DS-CNN-B: It is used for inter-frame coding. Recall that Conv 1 extracts intra-coding features. The outputs of Conv 1 and Conv 6 are concatenated and then convolved together by Conv 7.
  • Thus, Conv 7 denoises features from both intra and inter coding.
  • Conv 8–10 in DS-CNN-B are designed in a similar way.
Scalable Structure of DS-CNN
  • When the computational resources are insufficient, the switches {S0 to S4} are turned off, and only DS-CNN-I runs at the decoder.
  • When the computational resources are sufficient, {S0 to S4} are turned on, and DS-CNN-B works on top of the outputs from layers Conv 1–4 of DS-CNN-I.
  • Because inter-coding distortion is reduced, the quality of B/P frames can be further enhanced by DS-CNN-B, at the cost of higher computational complexity.
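The scalable structure above can be sketched as a single PyTorch module with a flag playing the role of the switches: the intra branch (DS-CNN-I) always runs, and when the flag is on, its intermediate features are concatenated into the inter branch (DS-CNN-B). Channel widths, kernel sizes, and the per-stage wiring here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DSCNN(nn.Module):
    """Sketch of the scalable DS-CNN: DS-CNN-I always runs; switches
    (modeled by `use_b`) feed its features into DS-CNN-B when on."""
    def __init__(self, ch=32):
        super().__init__()
        # DS-CNN-I branch: Conv 1-4 produce features, Conv 5 reconstructs.
        self.i_convs = nn.ModuleList([
            nn.Conv2d(1, ch, 5, padding=2),   # Conv 1
            nn.Conv2d(ch, ch, 3, padding=1),  # Conv 2
            nn.Conv2d(ch, ch, 3, padding=1),  # Conv 3
            nn.Conv2d(ch, ch, 3, padding=1),  # Conv 4
        ])
        self.i_out = nn.Conv2d(ch, 1, 3, padding=1)  # Conv 5
        # DS-CNN-B branch: Conv 6 extracts inter features; Conv 7-10 each
        # consume its own features concatenated with matching I-branch features.
        self.b_in = nn.Conv2d(1, ch, 5, padding=2)   # Conv 6
        self.b_convs = nn.ModuleList([
            nn.Conv2d(2 * ch, ch, 3, padding=1)      # Conv 7-10
            for _ in range(4)
        ])
        self.b_out = nn.Conv2d(ch, 1, 3, padding=1)  # reconstruction

    def forward(self, x, use_b=False):
        feats = []
        h = x
        for conv in self.i_convs:
            h = torch.relu(conv(h))
            feats.append(h)            # features tapped by the switches
        if not use_b:                  # switches off: DS-CNN-I only
            return self.i_out(h)
        g = torch.relu(self.b_in(x))   # switches on: run DS-CNN-B
        for conv, f in zip(self.b_convs, feats):
            g = torch.relu(conv(torch.cat([g, f], dim=1)))
        return self.b_out(g)
```

A decoder could then call `net(frame)` for I-frames or under tight resources, and `net(frame, use_b=True)` for B/P frames when resources allow, reusing the already-computed intra features.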

3. Experimental Results

  • DS-CNN-I outperforms ARCNN and VRCNN for I-frames.
  • DS-CNN-B outperforms DS-CNN-I for B-frames, as it has more feature maps for convolution.
Performance Evaluation of DS-CNN

Reference

[2017 ICME] [DS-CNN]
Decoder-side HEVC Quality Enhancement with Scalable Convolutional Neural Network

My Previous Reviews

Image Classification [LeNet] [AlexNet] [Maxout] [NIN] [ZFNet] [VGGNet] [Highway] [SPPNet] [PReLU-Net] [STN] [DeepImage] [SqueezeNet] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [Inception-v3] [Inception-v4] [Xception] [MobileNetV1] [ResNet] [Pre-Activation ResNet] [RiR] [RoR] [Stochastic Depth] [WRN] [Shake-Shake] [FractalNet] [Trimps-Soushen] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [DMRNet / DFN-MR] [IGCNet / IGCV1] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2]

Object Detection [OverFeat] [R-CNN] [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] [CRAFT] [R-FCN] [ION] [MultiPathNet] [NoC] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [G-RMI] [TDM] [SSD] [DSSD] [YOLOv1] [YOLOv2 / YOLO9000] [YOLOv3] [FPN] [RetinaNet] [DCN]

Semantic Segmentation [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [ParseNet] [DilatedNet] [DRN] [RefineNet] [GCN] [PSPNet] [DeepLabv3] [LC] [FC-DenseNet] [IDW-CNN] [SDN]

Biomedical Image Segmentation [CUMedVision1] [CUMedVision2 / DCAN] [U-Net] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [M²FCN] [SA] [QSA+QNT] [3D U-Net+ResNet]

Instance Segmentation [SDS] [Hypercolumn] [DeepMask] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [FCIS]

Super Resolution [SRCNN] [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [DRRN] [LapSRN & MS-LapSRN] [SRDenseNet]

Human Pose Estimation [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] [CPM]

Codec Post-Processing [ARCNN] [Lin DCC’16] [IFCNN] [Li ICME’17] [VRCNN] [DCAD] [DS-CNN]
