Summary: My Paper Reading Lists, Tutorials & Sharings

From Image Classification, Object Detection, Natural Language Processing (NLP), Self-Supervised Learning, Semi-Supervised Learning, Vision-Language, Generative Adversarial Network (GAN) to …

In this story, I post the full list of my paper readings, tutorials and sharings for convenience, since the list is too long to include in every story. It will be updated from time to time.

  • As of June 2020, my stories had received over 2M views in total.
  • My followers have exceeded 5K.

Thanks, everyone, for your support and for reading my stories.

Your claps are also important in keeping me writing!!!

Actually, I only write about what I have learnt. Reading a paper can take hours or even days, and sometimes it is quite a luxury to read one at all. I hope I can dig out the important points of each paper, or help you read papers at a faster pace. But if there are papers you are particularly interested in, it is better to read the originals for more detailed explanations. If anything is wrong, please let me know. Thank you. (Sik-Ho Tsang @ Medium)

1. Computer Vision

1.1. Image Classification

1989-1998 [LeNet] 2010–2014 [ReLU] [AlexNet & CaffeNet] [Dropout] [Maxout] [NIN] [ZFNet] [SPPNet] [Distillation] 2015 [VGGNet] [Highway] [PReLU-Net] [STN] [DeepImage] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [All-CNN] [RCNN] 2016 [SqueezeNet] [Inception-v3] [ResNet] [Pre-Activation ResNet] [RiR] [Stochastic Depth] [WRN] [Trimps-Soushen] [GELU] [Layer Norm, LN] [Weight Norm, WN] [ELU] 2017 [Inception-v4] [Xception] [MobileNetV1] [Shake-Shake] [Cutout] [FractalNet] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [IGCNet / IGCV1] [Deep Roots] [CWN] [RevNet] 2018 [RoR] [DMRNet / DFN-MR] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2] [CondenseNet] [IGCV2] [IGCV3] [FishNet] [SqueezeNext] [ENAS] [PNASNet] [ShuffleNet V2] [BAM] [CBAM] [MorphNet] [NetAdapt] [mixup] [DropBlock] [Group Norm (GN)] [Pelee & PeleeNet] [DLA] [Swish] [CoordConv] 2019 [ResNet-38] [AmoebaNet] [ESPNetv2] [MnasNet] [Single-Path NAS] [DARTS] [ProxylessNAS] [MobileNetV3] [FBNet] [ShakeDrop] [CutMix] [MixConv] [EfficientNet] [ABN] [SKNet] [CB Loss] [AutoAugment, AA] [BagNet] [Stylized-ImageNet] [FixRes] [SASA] [SE-WRN] [SGELU] [ImageNet-V2] [Bag of Tricks, ResNet-D] [PBA] [Fast AutoAugment (FAA)] 2020 [Random Erasing (RE)] [SAOL] [AdderNet] [FixEfficientNet] [BiT] [RandAugment] [ImageNet-ReaL] [ciFAIR] [ResNeSt] [Batch Augment, BA] [Mish] [WS, BCN] [AdvProp] [RegNet] [SAN] [Cordonnier ICLR’20] [ICMLM] [Self-Training] [SupCon] 2021 [Learned Resizer] [Vision Transformer, ViT] [ResNet Strikes Back] [DeiT] [EfficientNetV2] [MLP-Mixer] [T2T-ViT] [Swin Transformer] [CaiT] [ResMLP] [ResNet-RS] [NFNet] [PVT, PVTv1] [CvT] [HaloNet] [TNT] [CoAtNet] [Focal Transformer] [TResNet] [CPVT] [Twins] [Exemplar-v1, Exemplar-v2] [RepVGG] [V-MoE] [ImageNet-21K Pretraining] [ResT / ResTv1] [ViL] [ReLabel] [MixToken / LV-ViT] [gMLP] [MViT / MViTv1] 2022 [ConvNeXt] [PVTv2] [ViT-G] [AS-MLP] [ResTv2] [CSWin Transformer] [Pale Transformer] [Sparse MLP]

1.2. Unsupervised/Self-Supervised Learning

1993 [de Sa NIPS’93] 2008–2010 [Stacked Denoising Autoencoders] 2014 [Exemplar-CNN] 2015 [Context Prediction] [Wang ICCV’15] 2016 [Context Encoders] [Colorization] [Jigsaw Puzzles] 2017 [L³-Net] [Split-Brain Auto] [Motion Masks] [Doersch ICCV’17] [TextTopicNet] [Counting] 2018 [RotNet/Image Rotations] [DeepCluster] [CPC/CPCv1] [Instance Discrimination] [Spot Artifacts] 2019 [Ye CVPR’19] [S⁴L] [Goyal ICCV’19] [Rubik’s Cube] [AET] 2020 [CMC] [MoCo] [CPCv2] [PIRL] [SimCLR] [MoCo v2] [iGPT] [BoWNet] [BYOL] [SimCLRv2] [BYOL+GN+WS] 2021 [MoCo v3] [SimSiam] [DINO] [Exemplar-v1, Exemplar-v2] [Barlow Twins] [W-MSE] [SimSiam+AL] [BYOL+LP] 2022 [BEiT] [BEiT V2]

1.3. Pretraining or Weakly/Semi-Supervised Learning

2004 [Entropy Minimization, EntMin] 2013 [Pseudo-Label (PL)] 2015 [Ladder Network, Γ-Model] 2016 [Sajjadi NIPS’16] [Improved DCGAN, Inception Score] 2017 [Mean Teacher] [PATE & PATE-G] [Π-Model, Temporal Ensembling] 2018 [WSL] [Oliver NeurIPS’18] 2019 [VAT] [Billion-Scale] [Label Propagation] [Rethinking ImageNet Pre-training] [MixMatch] [SWA & Fast SWA] [S⁴L] [Kolesnikov CVPR’19] 2020 [BiT] [Noisy Student] [SimCLRv2] [UDA] [ReMixMatch] [FixMatch] [Self-Training] 2021 [Curriculum Labeling (CL)] [Su CVPR’21] [Exemplar-v1, Exemplar-v2] [SimPLE] [BYOL+LP]

1.4. Object Detection

2014 [OverFeat] [R-CNN] 2015 [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] 2016 [OHEM] [CRAFT] [R-FCN] [ION] [MultiPathNet] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [SSD] [YOLOv1] 2017 [NoC] [G-RMI] [TDM] [DSSD] [YOLOv2 / YOLO9000] [FPN] [RetinaNet] [DCN / DCNv1] [Light-Head R-CNN] [DSOD] [CoupleNet] 2018 [YOLOv3] [Cascade R-CNN] [MegDet] [StairNet] [RefineDet] [CornerNet] [Pelee & PeleeNet] [SiLU] 2019 [DCNv2] [Rethinking ImageNet Pre-training] [GRF-DSOD & GRF-SSD] [CenterNet] [Grid R-CNN] [NAS-FPN] [ASFF] [Bag of Freebies] [VoVNet/OSANet] [FCOS] [GIoU] 2020 [EfficientDet] [CSPNet] [YOLOv4] [SpineNet] [DETR] [Mish] [PP-YOLO] 2021 [Scaled-YOLOv4] [PVT, PVTv1] [Deformable DETR] 2022 [PVTv2] [YOLOv7] [Pix2Seq]

1.5. Semantic Segmentation / Scene Parsing

2015 [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [DPN] 2016 [ENet] [ParseNet] [DilatedNet] 2017 [DRN] [RefineNet] [ERFNet] [GCN] [PSPNet] [DeepLabv3] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [Cascade-SegNet & Cascade-DilatedNet] 2018 [ESPNet] [ResNet-DUC-HDC] [DeepLabv3+] [PAN] [DFN] [EncNet] [DLA] [Non-Local Neural Networks] [UPerNet] [PSANet] [Probabilistic U-Net] 2019 [ResNet-38] [C3] [ESPNetv2] [ADE20K] [Semantic FPN, Panoptic FPN] [Auto-DeepLab] [DANet] 2020 [DRRN Zhang JNCA’20] 2021 [PVT, PVTv1] [SETR] 2022 [PVTv2]

1.6. Instance Segmentation

2014–2015 [SDS] [Hypercolumn] [DeepMask] 2016 [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] 2017 [FCIS] [Mask R-CNN] 2018 [MaskLab] [PANet] 2019 [DCNv2] [Rethinking ImageNet Pre-training] [HTC] 2021 [PVT, PVTv1] [Copy-Paste] 2022 [PVTv2]

1.7. Panoptic Segmentation

2019 [PS] [UPSNet] [Semantic FPN, Panoptic FPN] 2020 [DETR]

1.8. Face Recognition

2005 [Chopra CVPR’05] 2010 [ReLU] 2014 [DeepFace] [DeepID2] [CASIANet] 2015 [FaceNet] 2016 [N-pair-mc Loss]

1.9. Human Pose Estimation

2014–2015 [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] 2016 [CPM] [FCGN] [IEF] [DeepCut & DeeperCut] [Newell ECCV’16 & Newell POCV’16] 2017 [G-RMI] [CMUPose & OpenPose] [Mask R-CNN]

1.10. Video Classification / Action Recognition

2014 [Deep Video] [Two-Stream ConvNet] 2015 [DevNet] [C3D] [LRCN] 2016 [TSN] 2017 [Temporal Modeling Approaches] [4 Temporal Modeling Approaches] [P3D] [I3D] [Something Something] 2018 [Non-Local Neural Networks] [S3D, S3D-G] 2019 [VideoBERT] [Moments in Time] 2021 [MViT / MViTv1]

1.11. Weakly Supervised Object Localization (WSOL)

2014 [Backprop] 2016 [CAM] 2017 [Grad-CAM] [Hide-and-Seek] 2018 [Grad-CAM++] [ACoL] [SPG] 2019 [CutMix] [ADL] 2020 [Evaluating WSOL Right] [SAOL]

1.12. Data Visualization

2002 [SNE] 2006 [Autoencoder] [DrLIM] 2007 [UNI-SNE] 2008 [t-SNE]

3. Visual/Vision/Video-Language

3.1. Visual/Vision/Video Language Model (VLM)

2018 [Conceptual Captions] 2019 [VideoBERT] [VisualBERT] [LXMERT] [ViLBERT] 2020 [ConVIRT] [VL-BERT] [OSCAR]

3.2. Text-to-Image Generation

2021 [DALL·E]

3.3. Image Captioning

2015 [m-RNN] [R-CNN+BRNN] [Show and Tell/NIC] [Show, Attend and Tell] [LRCN] 2017 [Visual N-Grams] 2018 [Conceptual Captions]

3.4. Video Captioning

2015 [LRCN] 2017 [Something Something] 2019 [VideoBERT]

4. Medical Image Analysis

4.1. Biomedical Image Classification

2017 [ChestX-ray8] 2019 [CheXpert] 2020 [VGGNet for COVID-19] [Dermatology] 2021 [CheXternal] [CheXtransfer]

4.2. Biomedical Image Segmentation

2015 [U-Net] 2016 [CUMedVision1] [CUMedVision2 / DCAN] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [Moeskops MICCAI’16] 2017 [M²FCN] [Suggestive Annotation (SA)] [3D U-Net+ResNet] [Cascaded 3D U-Net] [DenseVoxNet] [Atlas-Aware Net] 2018 [QSA+QNT] [Attention U-Net] [RU-Net & R2U-Net] [VoxResNet] [UNet++] [H-DenseUNet] [MDU-Net] [Probabilistic U-Net] 2019 [DUNet] [NN-Fit] 2020 [MultiResUNet] [UNet 3+] [Dense-Gated U-Net (DGNet)]

4.3. Biomedical Image Multi-Task Learning

2018 [ResNet+Mask R-CNN] [cU-Net+PE] [Multi-Task Deep U-Net] [cGAN-AutoEnc & cGAN-Unet] 2019 [cGAN+AC+CAW] [Qu ISBI’19] 2020 [BUSI] [Song JBHI’20] [cGAN JESWA’20] 2021 [Ciga JMEDIA’21]

4.4. Biomedical Image Self-Supervised Learning

2018 [Spitzer MICCAI’18] 2019 [Rubik’s Cube] [Context Restoration] 2020 [ConVIRT] [Rubik’s Cube+] 2021 [MICLe] [MoCo-CXR] [DVME] [MedAug] 2022 [BT-Unet] [Taleb JDiagnostics’22]

4.5. Biomedical Image Semi-Supervised Learning

2019 [UA+MT]

5. Image Generation Related

5.1. Generative Adversarial Network (GAN)

Image Synthesis: 2014 [GAN] [CGAN] 2015 [LAPGAN] 2016 [AAE] [DCGAN] [CoGAN] [VAE-GAN] [InfoGAN] [Improved DCGAN, Inception Score] 2017 [SimGAN] [BiGAN] [ALI] [LSGAN] [EBGAN] [PBT] 2019 [SAGAN]
Image-to-image Translation: 2017 [Pix2Pix] [UNIT] [CycleGAN] 2018 [MUNIT]
Machine Translation: 2018 [UNMT]
Super Resolution: 2017 [SRGAN & SRResNet] [EnhanceNet] 2018 [ESRGAN]
Blur Detection: 2019 [DMENet]
Medical Imaging: 2018 [cGAN-AutoEnc & cGAN-Unet] 2019 [cGAN+AC+CAW] 2020 [cGAN JESWA’20]
Camera Tampering Detection: 2019 [Mantini’s VISAPP’19]
Video Coding: 2018 [VC-LAPGAN] 2020 [Zhu TMM’20] 2021 [Zhong ELECGJ’21]

5.2. Image Generation

2018 [Image Transformer] 2021 [Performer]

5.3. Style Transfer

2016 [Image Style Transfer] [Perceptual Loss]

6. Image Reconstruction Related

6.1. Single Image Super Resolution (SISR)

2014–2016 [SRCNN] 2016 [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [Perceptual Loss] 2017 [DnCNN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [SRDenseNet] [SRGAN & SRResNet] [SelNet] [CNF] [BT-SRN] [EDSR & MDSR] [EnhanceNet] 2018 [MWCNN] [MDesNet] [RDN] [SRMD & SRMDNF] [DBPN & D-DBPN] [RCAN] [ESRGAN] [CARN] [IDN] [ZSSR] [MSRN] [Image Transformer] 2019 [SR+STN] [IDBP-CNN-IA] [SRFBN] [OISR] 2020 [PRLSR] [CSFN & CSFN-M]

6.2. Image Restoration

2008 [Jain NIPS’08] 2016 [RED-Net] [GDN] 2017 [DnCNN] [MemNet] [IRCNN] [WDRN / WavResNet] 2018 [MWCNN] 2019 [IDBP-CNN-IA]

6.3. Video Super Resolution (VSR)

2017 [STMC / VESPCN] 2018 [VSR-DUF / DUF] 2019 [EDVR]

6.4. Video Frame Interpolation / Extrapolation

2016 [Mathieu ICLR’16] 2017 [AdaConv] [SepConv] 2020 [DSepConv] 2021 [SepConv++]