Summary: My Paper Reading Lists, Tutorials & Sharings
From Image Classification, Object Detection, Natural Language Processing (NLP), Self-Supervised Learning, Semi-Supervised Learning, Vision-Language, Generative Adversarial Network (GAN) to …

As the list is too long to be posted in each story, my paper readings, tutorials, and sharings are listed here for convenience. The list will be updated from time to time.
- The total number of views, as counted in June 2020, is over 2M.
- The number of followers is over 5K.
Thanks, everyone, for the support and for reading my stories.
Your claps are also important for me to continue my writing!
Actually, I only write about what I’ve learnt. Reading a paper can take hours or days, so sometimes reading a paper is quite a luxury. I hope I can dig out some of the important points in each paper, or help you read papers at a faster pace. But if there are papers that you’re particularly interested in, it’s better to read the original papers for more detailed explanations. If there is something wrong, please tell me. Thank you. (Sik-Ho Tsang @ Medium)
1. Computer Vision
1.1. Image Classification
1989-1998 [LeNet] 2010–2014 [ReLU] [AlexNet & CaffeNet] [Dropout] [Maxout] [NIN] [ZFNet] [SPPNet] [Distillation] 2015 [VGGNet] [Highway] [PReLU-Net] [STN] [DeepImage] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [All-CNN] 2016 [SqueezeNet] [Inception-v3] [ResNet] [Pre-Activation ResNet] [RiR] [Stochastic Depth] [WRN] [Trimps-Soushen] [GELU] [Layer Norm, LN] [Weight Norm, WN] 2017 [Inception-v4] [Xception] [MobileNetV1] [Shake-Shake] [Cutout] [FractalNet] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [IGCNet / IGCV1] [Deep Roots] [CWN] [RevNet] 2018 [RoR] [DMRNet / DFN-MR] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2] [CondenseNet] [IGCV2] [IGCV3] [FishNet] [SqueezeNext] [ENAS] [PNASNet] [ShuffleNet V2] [BAM] [CBAM] [MorphNet] [NetAdapt] [mixup] [DropBlock] [Group Norm (GN)] [Pelee & PeleeNet] [DLA] [Swish] [CoordConv] 2019 [ResNet-38] [AmoebaNet] [ESPNetv2] [MnasNet] [Single-Path NAS] [DARTS] [ProxylessNAS] [MobileNetV3] [FBNet] [ShakeDrop] [CutMix] [MixConv] [EfficientNet] [ABN] [SKNet] [CB Loss] [AutoAugment, AA] [BagNet] [Stylized-ImageNet] [FixRes] [SASA] [SE-WRN] [SGELU] [ImageNet-V2] [Bag of Tricks, ResNet-D] 2020 [Random Erasing (RE)] [SAOL] [AdderNet] [FixEfficientNet] [BiT] [RandAugment] [ImageNet-ReaL] [ciFAIR] [ResNeSt] [Batch Augment, BA] [Mish] [WS, BCN] [AdvProp] [RegNet] 2021 [Learned Resizer] [Vision Transformer, ViT] [ResNet Strikes Back] [DeiT] [EfficientNetV2] [MLP-Mixer] [T2T-ViT] [Swin Transformer] [CaiT] [ResMLP] [ResNet-RS] [NFNet] [PVT, PVTv1] [CvT] [HaloNet] [TNT] [CoAtNet] [Focal Transformer] [TResNet] [CPVT] [Twins] [Exemplar-v1, Exemplar-v2] [RepVGG] 2022 [ConvNeXt] [PVTv2]
1.2. Unsupervised/Self-Supervised Learning
1993 [de Sa NIPS’93] 2008–2010 [Stacked Denoising Autoencoders] 2014 [Exemplar-CNN] 2015 [Context Prediction] [Wang ICCV’15] 2016 [Context Encoders] [Colorization] [Jigsaw Puzzles] 2017 [L³-Net] [Split-Brain Auto] [Motion Masks] [Doersch ICCV’17] 2018 [RotNet/Image Rotations] [DeepCluster] [CPC/CPCv1] [Instance Discrimination] [Spitzer MICCAI’18] 2019 [Ye CVPR’19] [S⁴L] [Goyal ICCV’19] [Rubik’s Cube] 2020 [CMC] [MoCo] [CPCv2] [PIRL] [SimCLR] [MoCo v2] [iGPT] [BoWNet] [BYOL] [SimCLRv2] [BYOL+GN+WS] [ConVIRT] [Rubik’s Cube+] 2021 [MoCo v3] [SimSiam] [DINO] [Exemplar-v1, Exemplar-v2] [MICLe] [Barlow Twins] [MoCo-CXR]
1.3. Pretraining or Weakly/Semi-Supervised Learning
2004 [Entropy Minimization, EntMin] 2013 [Pseudo-Label (PL)] 2015 [Ladder Network, Γ-Model] 2016 [Sajjadi NIPS’16] [Improved DCGAN, Inception Score] 2017 [Mean Teacher] [PATE & PATE-G] [Π-Model, Temporal Ensembling] 2018 [WSL] [Oliver NeurIPS’18] 2019 [VAT] [Billion-Scale] [Label Propagation] [Rethinking ImageNet Pre-training] [MixMatch] [SWA & Fast SWA] [S⁴L] [Kolesnikov CVPR’19] 2020 [BiT] [Noisy Student] [SimCLRv2] [UDA] [ReMixMatch] [FixMatch] 2021 [Curriculum Labeling (CL)] [Su CVPR’21] [Exemplar-v1, Exemplar-v2] [SimPLE]
1.4. Object Detection
2014 [OverFeat] [R-CNN] 2015 [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] 2016 [OHEM] [CRAFT] [R-FCN] [ION] [MultiPathNet] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [SSD] [YOLOv1] 2017 [NoC] [G-RMI] [TDM] [DSSD] [YOLOv2 / YOLO9000] [FPN] [RetinaNet] [DCN / DCNv1] [Light-Head R-CNN] [DSOD] [CoupleNet] 2018 [YOLOv3] [Cascade R-CNN] [MegDet] [StairNet] [RefineDet] [CornerNet] [Pelee & PeleeNet] [SiLU] 2019 [DCNv2] [Rethinking ImageNet Pre-training] [GRF-DSOD & GRF-SSD] [CenterNet] [Grid R-CNN] [NAS-FPN] [ASFF] [Bag of Freebies] [VoVNet/OSANet] [FCOS] [GIoU] 2020 [EfficientDet] [CSPNet] [YOLOv4] [SpineNet] [DETR] [Mish] [PP-YOLO] 2021 [Scaled-YOLOv4] [PVT, PVTv1] [Deformable DETR] 2022 [PVTv2] [YOLOv7]
1.5. Semantic Segmentation / Scene Parsing
2015 [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [DPN] 2016 [ENet] [ParseNet] [DilatedNet] 2017 [DRN] [RefineNet] [ERFNet] [GCN] [PSPNet] [DeepLabv3] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [Cascade-SegNet & Cascade-DilatedNet] 2018 [ESPNet] [ResNet-DUC-HDC] [DeepLabv3+] [PAN] [DFN] [EncNet] [DLA] [UPerNet] 2019 [ResNet-38] [C3] [ESPNetv2] [ADE20K] [Semantic FPN, Panoptic FPN] 2020 [DRRN Zhang JNCA’20] 2021 [PVT, PVTv1] 2022 [PVTv2]
1.6. Instance Segmentation
2014–2015 [SDS] [Hypercolumn] [DeepMask] 2016 [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] 2017 [FCIS] [Mask R-CNN] 2018 [MaskLab] [PANet] 2019 [DCNv2] [Rethinking ImageNet Pre-training] 2021 [PVT, PVTv1] [Copy-Paste] 2022 [PVTv2]
1.7. Panoptic Segmentation
2019 [PS] [UPSNet] [Semantic FPN, Panoptic FPN] 2020 [DETR]
1.8. Biomedical Image Classification
2017 [ChestX-ray8] 2019 [CheXpert] [Rubik’s Cube] 2020 [VGGNet for COVID-19] [Dermatology] [ConVIRT] [Rubik’s Cube+] 2021 [MICLe] [MoCo-CXR]
1.9. Biomedical Image Segmentation
2015 [U-Net] 2016 [CUMedVision1] [CUMedVision2 / DCAN] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] 2017 [M²FCN] [Suggestive Annotation (SA)] [3D U-Net+ResNet] [Cascaded 3D U-Net] [DenseVoxNet] 2018 [QSA+QNT] [Attention U-Net] [RU-Net & R2U-Net] [VoxResNet] [UNet++] [H-DenseUNet] [Spitzer MICCAI’18] 2019 [DUNet] [NN-Fit] [Rubik’s Cube] 2020 [MultiResUNet] [UNet 3+] [Dense-Gated U-Net (DGNet)] [Rubik’s Cube+]
1.10. Face Recognition
2005 [Chopra CVPR’05] 2010 [ReLU] 2014 [DeepFace] [DeepID2] [CASIANet] 2015 [FaceNet] 2016 [N-pair-mc Loss]
1.11. Human Pose Estimation
2014–2015 [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] 2016 [CPM] [FCGN] [IEF] [DeepCut & DeeperCut] [Newell ECCV’16 & Newell POCV’16] 2017 [G-RMI] [CMUPose & OpenPose] [Mask R-CNN]
1.12. Video Classification / Action Recognition
2014 [Deep Video] [Two-Stream ConvNet] 2015 [DevNet] [C3D] [LRCN] 2016 [TSN] 2017 [Temporal Modeling Approaches] [4 Temporal Modeling Approaches] [P3D] [I3D] 2018 [NL: Non-Local Neural Networks] [S3D, S3D-G] 2019 [VideoBERT]
1.13. Weakly Supervised Object Localization (WSOL)
2014 [Backprop] 2016 [CAM] 2017 [Grad-CAM] [Hide-and-Seek] 2018 [Grad-CAM++] [ACoL] [SPG] 2019 [CutMix] [ADL] 2020 [Evaluating WSOL Right] [SAOL]
1.14. Data Visualization
2002 [SNE] 2006 [Autoencoder] [DrLIM] 2007 [UNI-SNE] 2008 [t-SNE]
2. Image Generation Related
2.1. Generative Adversarial Network (GAN)
Image Synthesis: 2014 [GAN] [CGAN] 2015 [LAPGAN] 2016 [AAE] [DCGAN] [CoGAN] [VAE-GAN] [InfoGAN] [Improved DCGAN, Inception Score] 2017 [SimGAN] [BiGAN] [ALI] [LSGAN] [EBGAN] 2019 [SAGAN]
Image-to-image Translation: 2017 [Pix2Pix] [UNIT] [CycleGAN] 2018 [MUNIT]
Super Resolution: 2017 [SRGAN & SRResNet] [EnhanceNet] 2018 [ESRGAN]
Blur Detection: 2019 [DMENet]
Camera Tampering Detection: 2019 [Mantini’s VISAPP’19]
Video Coding: 2018 [VC-LAPGAN] 2020 [Zhu TMM’20] 2021 [Zhong ELECGJ’21]
2.2. Style Transfer
2016 [Image Style Transfer]
3. Image Reconstruction Related
3.1. Single Image Super Resolution (SISR)
2014–2016 [SRCNN] 2016 [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN]
2017 [DnCNN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [SRDenseNet] [SRGAN & SRResNet] [SelNet] [CNF] [BT-SRN] [EDSR & MDSR] [EnhanceNet] 2018 [MWCNN] [MDesNet] [RDN] [SRMD & SRMDNF] [DBPN & D-DBPN] [RCAN] [ESRGAN] [CARN] [IDN] [ZSSR] [MSRN] [Image Transformer] 2019 [SR+STN] [IDBP-CNN-IA] [SRFBN] [OISR] 2020 [PRLSR] [CSFN & CSFN-M]
3.2. Image Restoration
2008 [Jain NIPS’08] 2016 [RED-Net] [GDN] 2017 [DnCNN] [MemNet] [IRCNN] [WDRN / WavResNet] 2018 [MWCNN] 2019 [IDBP-CNN-IA]
3.3. Video Super Resolution (VSR)
2017 [STMC / VESPCN] 2018 [VSR-DUF / DUF] 2019 [EDVR]
3.4. Video Frame Interpolation / Extrapolation
2016 [Mathieu ICLR’16] 2017 [AdaConv] [SepConv] 2020 [DSepConv] 2021 [SepConv++]
4. Natural Language Processing (NLP)
4.1. Language Model / Sequence Model
2007 [Bengio TNN’07] 2013 [Word2Vec] [NCE] [Negative Sampling] [SGD+CR] 2014 [GloVe] [GRU] [Doc2Vec] [DT-RNN, DOT-RNN, sRNN] 2015 [Skip-Thought] [IRNN] 2016 [GCNN/GLU] [context2vec] [Jozefowicz arXiv’16] [LSTM-Char-CNN] [Layer Norm, LN] 2017 [TagLM] [CoVe] [MoE] [fastText] 2018 [GLUE] [T-DMCA] [GPT, GPT-1] [ELMo] 2019 [T64] [Transformer-XL] [BERT] [RoBERTa] [GPT-2] [DistilBERT] [MT-DNN] [Sparse Transformer] [SuperGLUE] [FAIRSEQ] [XLNet] [XLM] 2020 [ALBERT] [GPT-3] [T5] [Pre-LN Transformer] [MobileBERT] [TinyBERT]
4.2. Machine Translation
2014 [Seq2Seq] [RNN Encoder-Decoder] 2015 [Attention Decoder/RNNSearch] 2016 [GNMT] [ByteNet] [Deep-ED & Deep-Att] [Byte Pair Encoding (BPE)] 2017 [ConvS2S] [Transformer] [MoE] [GMNMT] [CoVe] 2018 [Shaw NAACL’18] 2019 [AdaNorm] [GPT-2] [Pre-Norm Transformer] [FAIRSEQ] [XLM] 2020 [Batch Augment, BA] [GPT-3] [T5] [Pre-LN Transformer] [OpenNMT] 2021 [ResMLP] [GPKD]
5. Visual/Vision/Video-Language
5.1. Visual/Vision/Video Language Model (VLM)
2019 [VideoBERT] [VisualBERT] [LXMERT] 2020 [ConVIRT]
5.2. Image Captioning
2015 [m-RNN] [R-CNN+BRNN] [Show and Tell/NIC] [Show, Attend and Tell] [LRCN] 2017 [Visual N-Grams]
5.3. Video Captioning
6. My Tutorials
6.1. Models
[Transformer from D2L.ai] [Transformer from Google TensorFlow]
6.2. Linux
[Ubuntu Installation] [NVIDIA Driver Installation (2018) (Old)] [OpenSSH Installation] [Hard Drive Partitioning/Formatting/Mounting] [Add 1 More GPU] [TeamViewer Installation] [NVIDIA Driver Installation (2019)]
6.3. Windows
[Anaconda + Spyder + TensorFlow 2.0 @ Windows 10] [OpenCV v4.2.0 Installation in Windows 10] [Converting PNG to YUV] [CUDA, cuDNN, Anaconda, Jupyter, PyTorch Installation in Windows 10]
6.4. Docker
[Docker Installation] [Pulling Image] [Running Image] [Exporting/Saving Image] [Nvidia-Docker 2.0 Installation] [Docker Installation in WSL 2 of Windows]
6.5. Caffe
[Image Classification] [Handwritten Digit Classification] [Style Recognition]
6.6. HDF5
7. My Sharings
- My Paper Readings about IQA/VQA, and Camera Tampering/Blur/Soiling Detection
- My Paper Readings and Tutorials About Video Coding
Again, thanks for visiting my Medium stories. :)