Using FPA & GAU Modules, Outperforms FCN, DeepLabv2, CRF-RNN, DeconvNet, DPN, PSPNet, RefineNet, and DUC.

Visualization results on VOC dataset
  • Feature Pyramid Attention (FPA) module is introduced to apply a spatial pyramid attention structure to the high-level output and combine it with global pooling to learn a better feature representation.
  • Global Attention Upsample (GAU) module is introduced on each decoder layer to provide global context that guides low-level features in selecting category localization details.
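
The GAU bullet above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the names `global_average_pool` and `gau_reweight` are hypothetical, feature maps are nested lists shaped [channel][row][col], and the convolution, normalization, and upsampling layers a real module would use are left out. It only shows the core step, where global context pooled from high-level features weights the low-level channels.

```python
def global_average_pool(feature_map):
    """Return one mean value per channel of a [channel][row][col] map."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def gau_reweight(low_level, high_level):
    """Scale each low-level channel by the pooled high-level context.

    Assumption: both inputs have the same number of channels, and the
    pooled value is used directly as the channel weight.
    """
    if len(low_level) != len(high_level):
        raise ValueError("channel counts must match")
    weights = global_average_pool(high_level)
    return [
        [[v * w for v in row] for row in channel]
        for channel, w in zip(low_level, weights)
    ]


# Toy example: two channels, 2x2 low-level map, 1x1 high-level map.
low = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
high = [[[0.5]], [[2.0]]]
print(gau_reweight(low, high))
```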


  1. PAN: Network Architecture
  2. Feature Pyramid Attention (FPA)…

Synthetic Images Become More Realistic

Simulated+Unsupervised (S+U) Learning in SimGAN
  • Simulated+Unsupervised (S+U) learning is proposed, where the task is to learn a model to improve the realism of a simulator’s output using unlabeled real data, while preserving the annotation information from the simulator.
  • Synthetic images are used as inputs instead of random vectors.
  • A ‘self-regularization’ term, a local adversarial loss, and update of the discriminator using a history of refined images, are also suggested.
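
The "history of refined images" idea in the last bullet can be sketched as a small buffer. This is a hedged illustration, not SimGAN's code: the class name `RefinedImageHistory`, the half-and-half mixing ratio, and the random-overwrite policy are assumptions, and images are stood in for by arbitrary Python objects.

```python
import random


class RefinedImageHistory:
    """Keeps past refined images so the discriminator also sees old outputs."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.images = []
        self.rng = rng or random.Random()

    def mix(self, current_batch):
        """Return a batch mixing current images with up to half from history."""
        half = len(current_batch) // 2
        n_old = min(half, len(self.images))
        old = self.rng.sample(self.images, n_old)
        mixed = current_batch[: len(current_batch) - n_old] + old
        self._store(current_batch[:half])
        return mixed

    def _store(self, new_images):
        """Append until the buffer is full, then overwrite random slots."""
        for image in new_images:
            if len(self.images) < self.capacity:
                self.images.append(image)
            else:
                self.images[self.rng.randrange(self.capacity)] = image


history = RefinedImageHistory(capacity=4)
print(history.mix(["a1", "a2", "a3", "a4"]))  # buffer is empty, so all current
print(len(history.mix(["b1", "b2", "b3", "b4"])))
```

Training the discriminator on `mix(...)` rather than only on the newest refiner outputs is what keeps it from forgetting artifacts the refiner produced earlier.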


Detecting Objects as Keypoint Triplet, Outperforms CornerNet, RefineDet, CoupleNet, RetinaNet, GRF-DSOD, DSOD, DSSD, SSD, YOLOv2, G-RMI, TDM, FPN, Faster R-CNN, etc.

CenterNet: Keypoint Triplets for Object Detection (The shaded red region is the central region.)
  • CenterNet detects each object as a triplet of keypoints, rather than a pair of keypoints like CornerNet, which improves both precision and recall.
  • Two customized modules, cascade corner pooling and center pooling, are designed: cascade corner pooling enriches the information collected by the top-left and bottom-right corners, and center pooling provides more recognizable information from the central regions.
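
The triplet check can be sketched as follows: a corner-pair box is kept only if some center keypoint falls inside its central region. This is a simplified sketch under stated assumptions: the central region is taken to be the middle 1/n of the box on each axis, `n=3` is an assumed default, the function names are hypothetical, and class matching and scoring are omitted.

```python
def central_region(box, n):
    """Return the central region of box = (tlx, tly, brx, bry).

    Assumption: the region covers the middle 1/n of the box on each axis.
    """
    tlx, tly, brx, bry = box
    ctlx = ((n + 1) * tlx + (n - 1) * brx) / (2 * n)
    ctly = ((n + 1) * tly + (n - 1) * bry) / (2 * n)
    cbrx = ((n - 1) * tlx + (n + 1) * brx) / (2 * n)
    cbry = ((n - 1) * tly + (n + 1) * bry) / (2 * n)
    return ctlx, ctly, cbrx, cbry


def has_center_keypoint(box, center_keypoints, n=3):
    """True if any (x, y) center keypoint lies inside the central region."""
    ctlx, ctly, cbrx, cbry = central_region(box, n)
    return any(
        ctlx <= x <= cbrx and ctly <= y <= cbry for x, y in center_keypoints
    )


box = (0, 0, 30, 30)
print(central_region(box, 3))
print(has_center_keypoint(box, [(15, 15)]))  # keypoint in the middle of the box
print(has_center_keypoint(box, [(2, 2)]))    # keypoint near a corner
```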

With Shared-Latent Space Assumption, Extending CoGAN, Outperforms CoGAN on Domain Adaptation

Shared-latent space Z
  • A shared-latent space assumption is made, which assumes a pair of corresponding images in different domains can be mapped to the same latent representation in a shared-latent space.
  • Image-to-Image Translation is performed through that shared-latent space. (In this unsupervised setting, no paired examples exist showing how an image could be translated to a corresponding image in another domain.)
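
Translation through a shared latent space can be sketched as encode-then-generate. The toy below is purely illustrative: the "images" are lists of numbers, domain 2 is assumed to be domain 1 shifted by 10, and all four functions (`encode_1`, `encode_2`, `generate_1`, `generate_2`) and `translate` are hypothetical stand-ins for learned networks.

```python
def encode_1(image):
    """Toy encoder for domain 1: the latent code is the image itself."""
    return list(image)


def encode_2(image):
    """Toy encoder for domain 2: remove the assumed domain offset of 10."""
    return [v - 10 for v in image]


def generate_1(latent):
    """Toy generator for domain 1."""
    return list(latent)


def generate_2(latent):
    """Toy generator for domain 2: add the assumed domain offset of 10."""
    return [v + 10 for v in latent]


def translate(encode_source, generate_target, image):
    """Map an image to the shared latent space, then decode in the target domain."""
    return generate_target(encode_source(image))


x1 = [1, 2, 3]
x2 = translate(encode_1, generate_2, x1)        # domain 1 -> domain 2
x1_back = translate(encode_2, generate_1, x2)   # domain 2 -> domain 1
print(x2, x1_back)
```

Because both encoders land in the same latent space, translating there and back recovers the input in this toy, which mirrors the cycle-style consistency such a shared space implies.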


  1. Shared-Latent Space Assumption
  2. UNIT: Framework
  3. Training Loss
  4. Image-to-image Translation Results
  5. Domain Adaptation…

CornerNet detects the top-left corner and the bottom-right corner
  • An object bounding box is detected as a pair of keypoints, the top-left corner and the bottom-right corner, eliminating the need for designing a set of anchor boxes commonly used in prior single-stage detectors.
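
A rough sketch of how detected corners might be paired into boxes without anchors is given below. It is a simplification under stated assumptions: each corner carries a one-dimensional embedding, `max_distance=0.5` is an assumed threshold, the detection score is assumed to be the average of the two corner scores, and the function name `group_corners` is hypothetical. Class matching is omitted.

```python
def group_corners(top_left, bottom_right, max_distance=0.5):
    """Pair top-left and bottom-right corners into boxes.

    Each corner is a tuple (x, y, score, embedding). Two corners are
    grouped when their embeddings are close and the geometry is valid.
    """
    boxes = []
    for tlx, tly, tl_score, tl_emb in top_left:
        for brx, bry, br_score, br_emb in bottom_right:
            if brx <= tlx or bry <= tly:
                continue  # bottom-right corner must lie below and to the right
            if abs(tl_emb - br_emb) > max_distance:
                continue  # embeddings too far apart: likely different objects
            boxes.append((tlx, tly, brx, bry, (tl_score + br_score) / 2))
    return boxes


top_left = [(10, 10, 1.0, 0.0), (50, 50, 1.0, 2.0)]
bottom_right = [(40, 40, 0.5, 0.25), (90, 90, 0.5, 2.25)]
for box in group_corners(top_left, bottom_right):
    print(box)
```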


  1. CornerNet: Network Architecture
  2. Corner Detection (Heatmap & Offsets)
  3. Corner Grouping (Embedding)
  4. Corner Pooling
  5. Comparisons with State-Of-The-Art Detectors

  • RefineDet is proposed, which consists of two inter-connected modules, namely, the anchor refinement module (ARM) and the object detection module (ODM).
  • ARM filters out negative anchors to reduce the search space for the classifier, and coarsely adjusts the locations and sizes of anchors.
  • ODM takes the refined anchors from ARM as input to further improve the regression and predict multi-class labels.
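
The two-step flow in these bullets can be sketched as below. This is an illustration under assumptions, not RefineDet's code: boxes are (cx, cy, w, h), offsets use a common center/log-size parameterization, `neg_threshold=0.99` is an assumed cut-off for discarding confident negatives, and the names `decode` and `two_step_refine` are hypothetical.

```python
import math


def decode(anchor, offsets):
    """Apply (dx, dy, dw, dh) offsets to an anchor given as (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def two_step_refine(anchors, arm_outputs, odm_offsets, neg_threshold=0.99):
    """ARM filters and coarsely adjusts anchors; ODM regresses from the result.

    arm_outputs[i] is (negative_confidence, offsets) for anchor i, and
    odm_offsets[i] is the ODM regression output for the same anchor.
    """
    results = []
    for anchor, (neg_conf, arm_off), odm_off in zip(
        anchors, arm_outputs, odm_offsets
    ):
        if neg_conf > neg_threshold:
            continue  # ARM is confident this anchor is background
        refined = decode(anchor, arm_off)         # coarse adjustment (ARM)
        results.append(decode(refined, odm_off))  # finer regression (ODM)
    return results


anchors = [(10.0, 10.0, 4.0, 4.0), (20.0, 20.0, 4.0, 4.0)]
arm = [(0.2, (0.5, 0.0, 0.0, 0.0)), (0.995, (0.0, 0.0, 0.0, 0.0))]
odm = [(0.25, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
print(two_step_refine(anchors, arm, odm))
```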

With Local & Global FCN Branches, Captured Local & Global Information, Outperforms R-FCN, Faster R-CNN, SSD, & ION.

A toy example of object detection by combining local and global information
  • The object proposals obtained by the Region Proposal Network (RPN) are fed into the coupling module, which consists of two branches.
  • One branch adopts the position-sensitive RoI (PSRoI) pooling to capture the local part information of the object.
  • The other employs the RoI pooling to encode the global and context information.
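
The two branches can be illustrated with a toy scoring function for a single class. This is a heavily simplified sketch: average pooling stands in for both pooling operations, the two scores are coupled by a plain sum, the RoI size is assumed to be divisible by `k`, and every function name is hypothetical. The convolution and normalization layers a real detector would apply before coupling are omitted.

```python
def mean_in_window(feature, x0, y0, x1, y1):
    """Average a [row][col] map over columns x0..x1-1 and rows y0..y1-1."""
    values = [feature[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


def local_score(part_maps, roi, k):
    """Position-sensitive pooling: bin (i, j) reads only its own part map.

    part_maps holds k*k maps, one per bin; roi = (x0, y0, x1, y1) with
    exclusive ends. Assumption: RoI width and height are divisible by k.
    """
    x0, y0, x1, y1 = roi
    bin_w, bin_h = (x1 - x0) // k, (y1 - y0) // k
    votes = []
    for i in range(k):
        for j in range(k):
            bx, by = x0 + j * bin_w, y0 + i * bin_h
            part = part_maps[i * k + j]
            votes.append(mean_in_window(part, bx, by, bx + bin_w, by + bin_h))
    return sum(votes) / len(votes)  # the bins vote by averaging


def global_score(global_map, roi):
    """Pool the whole RoI at once to summarize global information."""
    x0, y0, x1, y1 = roi
    return mean_in_window(global_map, x0, y0, x1, y1)


def coupled_score(part_maps, global_map, roi, k):
    """Couple the two branches; a plain sum is assumed here."""
    return local_score(part_maps, roi, k) + global_score(global_map, roi)


def constant_map(value, size=4):
    """Build a size x size map filled with one value (toy data)."""
    return [[value] * size for _ in range(size)]


parts = [constant_map(v) for v in (1.0, 2.0, 3.0, 4.0)]
print(coupled_score(parts, constant_map(0.5), (0, 0, 4, 4), k=2))
```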

LIU4K, Large-Scale Ideal Ultra high-definition 4K, A New 4K Image Dataset is Proposed

Example images sampled from LIU4K. (a) Training set. (b) Testing set.
  • LIU4K: Large-Scale Ideal Ultra high-definition 4K, a new dataset, is proposed.
  • State-of-the-art single image filtering approaches are also reviewed and evaluated using both full-reference and no-reference metrics, for objective and subjective quality measurement respectively.


  1. LIU4K: Large-Scale Ideal Ultra high-definition 4K
  2. Summary of State-Of-The-Art Single Image Filtering…

Using OHEM on Fast R-CNN, Heuristics Removed, Detection Accuracy Improved, Outperforms MR-CNN

  • OHEM automatically selects hard examples, which makes training more effective and efficient.
  • OHEM eliminates several heuristics and hyperparameters in common use.
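
The selection step can be sketched as keeping only the highest-loss RoIs for the backward pass. This is a minimal sketch assuming per-RoI losses are already computed in a forward pass; the function name `select_hard_examples` is hypothetical, and any de-duplication of heavily overlapping RoIs is left out.

```python
def select_hard_examples(roi_losses, batch_size):
    """Return the indices of the batch_size RoIs with the highest loss."""
    order = sorted(
        range(len(roi_losses)), key=lambda i: roi_losses[i], reverse=True
    )
    return sorted(order[:batch_size])


losses = [0.1, 2.3, 0.05, 1.7, 0.9]
hard = select_hard_examples(losses, batch_size=2)
print(hard)  # only these RoIs would contribute gradients
```

Selecting by current loss replaces hand-tuned sampling rules, which is how the approach removes heuristics and their hyperparameters.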


  1. Brief Review of Fast R-CNN (FRCN)
  2. Several Heuristics in FRCN
  3. Online Hard Example Mining (OHEM)
  4. Experimental Results

1. Brief Review of Fast R-CNN (FRCN)

  1. FRCN takes as…


With Weight Sharing, Generates Correlated Outputs in Different Domains for the Same Input, Outperforms CGAN

Face Generation With and Without Smiling
  • A single input vector can generate correlated outputs in different domains through multiple GANs with weight sharing.
  • Possible applications: producing a color image and a depth image that are highly correlated (i.e., describing the same scene), or images of the same face with different attributes (smiling and non-smiling).
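
Weight sharing between generators can be sketched with two tiny linear "generators" that reuse one shared layer and differ only in their final layer. This is a toy illustration under assumptions: the class name `CoupledGenerators`, the single shared layer, and the linear heads are all hypothetical simplifications of real networks.

```python
def linear(weights, vector):
    """Multiply a [out][in] weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class CoupledGenerators:
    """Two generators that share early weights and have separate heads."""

    def __init__(self, shared_weights, head_a, head_b):
        self.shared = shared_weights  # one object, used by both generators
        self.head_a = head_a          # domain-A specific last layer
        self.head_b = head_b          # domain-B specific last layer

    def generate(self, z):
        """Map one input vector to a correlated pair of outputs."""
        hidden = linear(self.shared, z)  # shared representation
        return linear(self.head_a, hidden), linear(self.head_b, hidden)


generators = CoupledGenerators(
    shared_weights=[[1.0, 0.0], [0.0, 1.0]],
    head_a=[[1.0, 1.0]],
    head_b=[[1.0, -1.0]],
)
print(generators.generate([2.0, 3.0]))
```

Both outputs are computed from the same hidden representation, so a change in the input vector moves the two domains' outputs together.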


  1. Coupled Generative…

Sik-Ho Tsang

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn:, My Paper Reading List:
