Using Deformable Convolution from DCNv1 and DCNv2, Outperforms AdaConv & SepConv

Left: SepConv interpolates the ball poorly. Right: the proposed DSepConv considers more relevant pixels far away from the local grid (black rectangle), uses a much smaller kernel size, and performs better.
  • Deformable separable convolution (DSepConv) is used to adaptively estimate kernels, offsets and masks, allowing the network to gather information from far fewer but more relevant pixels.
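The sampling idea can be sketched for a single output pixel. This is a minimal NumPy illustration under assumed shapes, not the paper's implementation: in DSepConv the separable kernels `kv`/`kh`, the `offsets` and the `mask` are all predicted per pixel by a CNN, whereas here they are passed in directly.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly sample img (H x W) at a real-valued location (y, x)."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy, wx = y - y0, x - x0
    y0c, y1c = np.clip(y0, 0, H - 1), np.clip(y0 + 1, 0, H - 1)
    x0c, x1c = np.clip(x0, 0, W - 1), np.clip(x0 + 1, 0, W - 1)
    return ((1 - wy) * (1 - wx) * img[y0c, x0c] + (1 - wy) * wx * img[y0c, x1c]
            + wy * (1 - wx) * img[y1c, x0c] + wy * wx * img[y1c, x1c])

def dsepconv_pixel(img, py, px, kv, kh, offsets, mask):
    """One output pixel of a deformable separable convolution (sketch).

    kv, kh  : length-K vertical / horizontal kernels (separable weights)
    offsets : (K, K, 2) per-tap displacements added to the regular grid,
              letting taps reach relevant pixels outside the local window
    mask    : (K, K) modulation weights (DCNv2-style)
    """
    K = len(kv)
    r = K // 2
    out = 0.0
    for i in range(K):
        for j in range(K):
            dy, dx = offsets[i, j]
            y = py + (i - r) + dy          # deformed sampling row
            x = px + (j - r) + dx          # deformed sampling column
            out += kv[i] * kh[j] * mask[i, j] * bilinear_sample(img, y, x)
    return out
```

With all offsets at zero and a one-hot kernel, this degenerates to a plain separable convolution; a nonzero offset shifts the tap off the integer grid, which is why bilinear sampling is needed.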


  1. Adaptive Deformable Separable Convolution
  2. DSepConv: Network Architecture
  3. Loss Function
  4. Experimental Results

1. Adaptive Deformable Separable Convolution

1.1. Conventional Kernel-based Methods

Using Cycle Consistency Loss for Unpaired Image-to-Image Translation, Outperforms CoGAN, BiGAN, ALI & SimGAN

CycleGAN learns to automatically “translate” an image from one domain into the other
  • CycleGAN is designed to translate an image from a source domain X to a target domain Y in the absence of paired examples, i.e. G: X → Y.
  • Because this mapping is highly under-constrained, it is coupled with an inverse mapping F: Y → X, and a cycle consistency loss is introduced to enforce F(G(X)) ≈ X (and vice versa).
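The cycle consistency loss itself is just an L1 penalty on the round trip. A minimal sketch, with toy invertible affine maps standing in for the two generator networks (the loss formula is the real one from the paper):

```python
import numpy as np

# Toy "generators": invertible affine maps standing in for G: X -> Y and F: Y -> X.
# In CycleGAN these are deep networks trained jointly with adversarial losses.
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0

def cycle_consistency_loss(x, y):
    """L_cyc = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1]"""
    return np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y))
```

When F is exactly the inverse of G the loss vanishes; during training the penalty pushes the two learned mappings toward being inverses of each other, which is what constrains the otherwise under-determined translation.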


The Encoding Time is Reduced by 39.56% on Average

Depth Map Example
  • First, the Holistically Nested Edge Detection (HED) network is used for edge detection.
  • Then, Otsu's method is used to divide the output of the HED network into a foreground region and a background region.
  • Finally, the CU size and the candidate list of intra modes are determined according to the region of the coding tree unit (CTU).
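The Otsu step in the pipeline above is a classic global-thresholding algorithm: it picks the gray level that maximizes the between-class variance of the two resulting regions. A self-contained NumPy sketch (the 256-bin quantization is an assumption for 8-bit edge maps):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold maximizing between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                      # gray-level probabilities
    mu_total = np.sum(np.arange(256) * p)      # global mean
    best_t, best_var = 0, 0.0
    w0, m0 = 0.0, 0.0                          # class-0 weight and first moment
    for t in range(256):
        w0 += p[t]
        m0 += t * p[t]
        w1 = 1.0 - w0
        if w0 == 0.0 or w1 == 0.0:
            continue
        # between-class variance: w0*w1*(mean0 - mean1)^2, rearranged
        var_between = (mu_total * w0 - m0) ** 2 / (w0 * w1)
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Pixels at or below the returned level go to one region (e.g. background), the rest to the other, giving the foreground/background split used to steer the CU-size decision.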


  1. Brief Introduction in…

Deep Learning-based Transform (DLT) Using Autoencoder, 0.75% BD-Rate Reduction is Achieved

Overview of the Proposed Deep Learning-Based Transform (DLT)
  • A convolutional neural network (CNN) model is designed as Deep Learning-Based Transform (DLT) to achieve better decorrelation and energy compaction than the conventional discrete cosine transform (DCT).
  • The intra prediction signal is utilized as side information to reduce the directionality in the residual.
  • A novel loss function is used to characterize the efficiency of the transform during the training.
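The DCT baseline that DLT is trained to improve on can be made concrete: an orthonormal DCT-II packs the energy of a smooth (highly correlated) residual into a few coefficients, and a better transform packs it into even fewer. A NumPy sketch; the 8-point size and the test signal are arbitrary illustrative choices, not from the paper:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II basis, the conventional transform DLT competes with."""
    n, k = np.meshgrid(np.arange(N), np.arange(N))   # n: sample, k: frequency
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * N)) * np.sqrt(2.0 / N)
    C[0, :] /= np.sqrt(2.0)                          # DC row normalization
    return C

def energy_compaction(coeffs, m):
    """Fraction of total energy in the m largest-magnitude coefficients."""
    e = np.sort(coeffs ** 2)[::-1]
    return e[:m].sum() / e.sum()

# A smooth residual row: the DCT concentrates most of its energy in a few taps.
x = np.linspace(0.0, 1.0, 8) ** 2
c = dct_matrix(8) @ x
```

Because the basis is orthonormal, total energy is preserved (Parseval), so `energy_compaction` directly measures how well the transform decorrelates the input; a loss that rewards higher compaction at small `m` is the kind of objective the DLT training uses.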


  1. Directionality in Residual Domain
  2. DLT: Network Architecture
  3. Loss Function
  4. Experimental Results

1. Directionality in Residual Domain

Not Only Mapping from Latent Space to Data Space, But Also Mapping from Data Space to Latent Space, Outperforms DCGAN

  • The generation network maps samples from stochastic latent variables to the data space.
  • The inference network maps training examples in data space to the space of latent variables.
  • The discriminative network is trained to distinguish between joint latent/data-space samples from the generative network and joint samples from the inference network.

Using Context Encoding Module, Outperforms PSPNet, DeepLabv3, FCN, DilatedNet, DeepLabv2, CRF-RNN, DeconvNet, DPN, RefineNet & ResNet-38

Narrowing the list of probable categories based on scene context makes labeling much easier.
  • Context Encoding Module is introduced, which captures the semantic context of scenes and selectively highlights class-dependent feature maps.
  • For example, in the figure above, a suite-room scene will seldom contain a horse, but is likely to contain a chair, bed, curtain, etc. In this case, the module helps to highlight the chair, bed and curtain.
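The "selectively highlights class-dependent feature maps" step amounts to predicting a per-channel scaling factor from the encoded context. A hypothetical NumPy sketch, with the fully connected layer `W` and all shapes assumed for illustration:

```python
import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def context_highlight(features, context, W):
    """Channel-attention step of a Context-Encoding-style module (sketch).

    features : (C, H, W) feature maps
    context  : (D,) encoded semantic context of the scene
    W        : (C, D) fully connected layer predicting channel scalings
    """
    gamma = sigmoid(W @ context)            # (C,) scaling factors in (0, 1)
    return features * gamma[:, None, None]  # emphasize likely classes' channels
```

Channels whose classes fit the scene context get a scaling near 1 and pass through; channels for unlikely classes (the horse in the suite room) are squashed toward 0.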


  1. Context Encoding…

Bidirectional Generative Adversarial Networks (BiGANs): Learning the Inverse Mapping, from Image Space to Latent Space

  • Bidirectional Generative Adversarial Network (BiGAN) is designed as a means of learning the inverse mapping, i.e. projecting data back into the latent space.
  • The resulting learned feature representation is useful for auxiliary supervised discrimination tasks.

With Smooth Network & Border Network, Outperforms DeepLabv3+, PSPNet, ResNet-38, RefineNet, GCN, DUC, DeepLabv2, ParseNet, DPN & FCN

Hard examples in semantic segmentation
  • Discriminative Feature Network (DFN) has 2 sub-networks.
  • One is the Smooth Network, which handles the intra-class inconsistency problem with a Channel Attention Block and global average pooling to select the more discriminative features.
  • The other is the Border Network, which makes the bilateral features of boundaries distinguishable with deep semantic boundary supervision.
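The Channel Attention Block in the Smooth Network can be sketched as: globally pool the concatenated stage features, predict per-channel weights, and re-weight the lower-stage features before fusing. A NumPy sketch with assumed shapes; the 1×1 convolution of the real block is reduced to a dense layer `w`, `b` here:

```python
import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def channel_attention_block(low, high, w, b):
    """Channel-attention fusion in the spirit of DFN's Smooth Network (sketch).

    low, high : (C, H, W) features from a lower and a higher stage
    w, b      : (C, 2C) weights and (C,) bias predicting channel weights
    """
    cat = np.concatenate([low, high], axis=0)   # (2C, H, W) stacked stages
    pooled = cat.mean(axis=(1, 2))              # global average pooling -> (2C,)
    alpha = sigmoid(w @ pooled + b)             # (C,) discriminative-channel weights
    return low * alpha[:, None, None] + high    # re-weighted low + high-stage fusion
```

The global pooling gives each channel weight a view of the whole image, so the block can suppress channels that cause intra-class inconsistency and keep the discriminative ones.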


  1. DFN: Network Architecture
  2. Smooth Network

GAN Combined With Autoencoder

  • The adversarial autoencoder (AAE) is a probabilistic autoencoder that uses a GAN to match the aggregated posterior of the latent code to an arbitrary prior distribution.
  • The decoder of the adversarial autoencoder learns a deep generative model that maps the imposed prior to the data distribution.
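The training objective combines a reconstruction term with an adversarial term on the latent codes. A toy sketch with linear stand-ins for the encoder, decoder and latent-space discriminator (all three functions are illustrative assumptions, not the paper's networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: linear encoder/decoder, and a critic on latent codes.
enc = lambda x: 0.5 * x                                   # q(z|x)
dec = lambda z: 2.0 * z                                   # p(x|z)
D = lambda z: 1.0 / (1.0 + np.exp(-(1.0 - np.abs(z))))    # prior-vs-code critic

def aae_losses(x, z_prior):
    """AAE trains two objectives: reconstruct x, and fool a discriminator
    that compares encoder codes enc(x) against samples from the prior."""
    z = enc(x)
    recon = np.mean((dec(z) - x) ** 2)                    # reconstruction loss
    eps = 1e-8
    d_loss = -np.mean(np.log(D(z_prior) + eps)) \
             - np.mean(np.log(1.0 - D(z) + eps))          # critic's loss
    g_loss = -np.mean(np.log(D(z) + eps))                 # encoder's adversarial loss
    return recon, d_loss, g_loss
```

Minimizing `g_loss` pushes the code distribution toward the imposed prior, which is what lets the decoder double as a generative model: sampling z from the prior and decoding it yields samples from the learned data distribution.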


  1. AAE: Network Architecture
  2. AAE vs VAE
  3. Supervised AAE
  4. Semi-supervised AAE
  5. Unsupervised AAE
  6. Dimension Reduction for Data…

Using FPA & GAU Modules, Outperforms FCN, DeepLabv2, CRF-RNN, DeconvNet, DPN, PSPNet, RefineNet & DUC

Visualization results on VOC dataset
  • Feature Pyramid Attention (FPA) module is introduced to perform spatial pyramid attention structure on high-level output and combine global pooling to learn a better feature representation.
  • Global Attention Upsample (GAU) module is introduced on each decoder layer to provide global context as a guidance of low-level features to select category localization details.
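The GAU idea is compact: global-pooled high-level context re-weights the low-level features before the two are fused. A NumPy sketch with assumed shapes; the 1×1/3×3 convolutions and batch norm of the real module are omitted for brevity, and nearest-neighbor upsampling stands in for the decoder's upsampling:

```python
import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def global_attention_upsample(low, high):
    """Global Attention Upsample in the spirit of PAN (sketch).

    low  : (C, 2H, 2W) low-level features with localization detail
    high : (C, H, W) high-level features with semantic context
    """
    weights = sigmoid(high.mean(axis=(1, 2)))        # global context per channel
    up = high.repeat(2, axis=1).repeat(2, axis=2)    # nearest-neighbor 2x upsample
    return low * weights[:, None, None] + up         # guided low-level + high-level
```

The high-level branch thus acts as a guide: its global statistics decide which low-level channels carry category-relevant localization detail, and only those are emphasized in the fused decoder output.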


  1. PAN: Network Architecture
  2. Feature Pyramid Attention (FPA)…

Sik-Ho Tsang

PhD, Researcher. I share what I've learnt and done. :)
