Summary: My Paper Reading Lists, Tutorials & Sharings
From Image Classification, Object Detection, Natural Language Processing (NLP), Self-Supervised Learning, Semi-Supervised Learning, Vision-Language, Generative Adversarial Network (GAN) to …
Because the full list is too long to include in every story, my paper readings, tutorials, and sharings are collected here for convenience. This list will be updated from time to time.
I only write about what I have learnt. Reading a paper can take hours or days, so finding the time to read one is sometimes a luxury. I hope my stories can dig out the important points of each paper, or help you read papers at a faster pace. But if there are papers that you’re particularly interested in, it’s better to read the original papers for more detailed explanations. If there is something wrong, please tell me. Thank you. (Sik-Ho Tsang @ Medium)
- Thanks everyone for reading my stories.
- Your claps are also important to me for continuing my writing!!!
0. Legends
2011 [Thinking Fast and Slow] 2017 [AI Reshapes World] 2019 [New Heights with ANN] 2021 [Deep Learning for AI] 2022 [Small is the New Big]
1. Computer Vision
1.1. Image Classification
1989-1998 [LeNet] 2010–2014 [ReLU] [AlexNet & CaffeNet] [Dropout] [Maxout] [NIN] [ZFNet] [SPPNet] [Distillation] 2015 [VGGNet] [Highway] [PReLU-Net] [STN] [DeepImage] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [All-CNN] [RCNN] 2016 [SqueezeNet] [Inception-v3] [ResNet] [Pre-Activation ResNet] [RiR] [Stochastic Depth] [WRN] [Trimps-Soushen] [GELU] [Layer Norm, LN] [Weight Norm, WN] [ELU] 2017 [Inception-v4] [Xception] [MobileNetV1] [Shake-Shake] [Cutout] [FractalNet] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [IGCNet / IGCV1] [Deep Roots] [CWN] [RevNet] 2018 [RoR] [DMRNet / DFN-MR] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2] [CondenseNet] [IGCV2] [IGCV3] [FishNet] [SqueezeNext] [ENAS] [PNASNet] [ShuffleNet V2] [BAM] [CBAM] [MorphNet] [NetAdapt] [mixup] [DropBlock] [Group Norm (GN)] [Pelee & PeleeNet] [DLA] [Swish] [CoordConv] 2019 [ResNet-38] [AmoebaNet] [ESPNetv2] [MnasNet] [Single-Path NAS] [DARTS] [ProxylessNAS] [MobileNetV3] [FBNet] [ShakeDrop] [CutMix] [MixConv] [EfficientNet] [ABN] [SKNet] [CB Loss] [AutoAugment, AA] [BagNet] [Stylized-ImageNet] [FixRes] [SASA] [SE-WRN] [SGELU] [ImageNet-V2] [Bag of Tricks, ResNet-D] [PBA] [Fast AutoAugment (FAA)] [Switch Norm (SN)] 2020 [Random Erasing (RE)] [SAOL] [AdderNet] [FixEfficientNet] [BiT] [RandAugment] [ImageNet-ReaL] [ciFAIR] [ResNeSt] [Batch Augment, BA] [Mish] [WS, BCN] [AdvProp] [RegNet] [SAN] [Cordonnier ICLR’20] [ICMLM] [Self-Training] [SupCon] [Open Images] [Axial-DeepLab] [GhostNet] 2021 [Learned Resizer] [Vision Transformer, ViT] [ResNet Strikes Back] [DeiT] [EfficientNetV2] [MLP-Mixer] [T2T-ViT] [Swin Transformer] [CaiT] [ResMLP] [ResNet-RS] [NFNet] [PVT, PVTv1] [CvT] [HaloNet] [TNT] [CoAtNet] [Focal Transformer] [TResNet] [CPVT] [Twins] [Exemplar-v1, Exemplar-v2] [RepVGG] [V-MoE] [ImageNet-21K Pretraining] [Do You Even Need Attention?] 
[ResT / ResTv1] [ViL] [ReLabel] [MixToken / LV-ViT] [gMLP] [MViT / MViTv1] [CLIP] [GFNet] [Res2Net] [Sharpness-Aware Minimization (SAM)] [Transformer-LS] [R-Drop] [ParNet] 2022 [ConvNeXt] [PVTv2] [ViT-G] [AS-MLP] [ResTv2] [CSWin Transformer] [Pale Transformer] [Sparse MLP] [MViTv2] [S²-MLP] [CycleMLP] [MobileOne] [GC ViT] [VAN] [ACMix] [CVNets] [MobileViT] [RepMLP] [RepLKNet] [ParNet] [MetaFormer, PoolFormer] [Swin Transformer V2] 2023 [Vision Permutator (ViP)]
1.2. Unsupervised/Self-Supervised Learning
1993 [de Sa NIPS’93] 2008–2010 [Stacked Denoising Autoencoders] 2014 [Exemplar-CNN] 2015 [Context Prediction] [Wang ICCV’15] 2016 [Context Encoders] [Colorization] [Jigsaw Puzzles] 2017 [L³-Net] [Split-Brain Auto] [Motion Masks] [Doersch ICCV’17] [TextTopicNet] [Counting] 2018 [RotNet/Image Rotations] [DeepCluster] [CPC/CPCv1] [Instance Discrimination] [Spot Artifacts] 2019 [Ye CVPR’19] [S⁴L] [Goyal ICCV’19] [Rubik’s Cube] [AET] [Deep InfoMax (DIM)] [AMDIM] [Local Aggregation (LA)] 2020 [CMC] [MoCo] [CPCv2] [PIRL] [SimCLR] [MoCo v2] [iGPT] [BoWNet] [BYOL] [SimCLRv2] [BYOL+GN+WS] 2021 [MoCo v3] [SimSiam] [DINO] [Exemplar-v1, Exemplar-v2] [Barlow Twins] [W-MSE] [SimSiam+AL] [BYOL+LP] 2022 [BEiT] [BEiT V2] [Masked Autoencoders (MAE)] [DiT]
1.3. Pretraining or Weakly/Semi-Supervised Learning
2004 [Entropy Minimization, EntMin] 2013 [Pseudo-Label (PL)] 2015 [Ladder Network, Γ-Model] 2016 [Sajjadi NIPS’16] [Improved DCGAN, Inception Score] 2017 [Mean Teacher] [PATE & PATE-G] [Π-Model, Temporal Ensembling] 2018 [WSL] [Oliver NeurIPS’18] 2019 [VAT] [Billion-Scale] [Label Propagation] [Rethinking ImageNet Pre-training] [MixMatch] [SWA & Fast SWA] [S⁴L] [Kolesnikov CVPR’19] 2020 [BiT] [Noisy Student] [SimCLRv2] [UDA] [ReMixMatch] [FixMatch] [Self-Training] 2021 [Curriculum Labeling (CL)] [Su CVPR’21] [Exemplar-v1, Exemplar-v2] [SimPLE] [BYOL+LP]
1.4. Object Detection
2014 [OverFeat] [R-CNN] 2015 [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] 2016 [OHEM] [CRAFT] [R-FCN] [ION] [MultiPathNet] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [SSD] [YOLOv1] 2017 [NoC] [G-RMI] [TDM] [DSSD] [YOLOv2 / YOLO9000] [FPN] [RetinaNet] [DCN / DCNv1] [Light-Head R-CNN] [DSOD] [CoupleNet] 2018 [YOLOv3] [Cascade R-CNN] [MegDet] [StairNet] [RefineDet] [CornerNet] [Pelee & PeleeNet] [SiLU] 2019 [DCNv2] [Rethinking ImageNet Pre-training] [GRF-DSOD & GRF-SSD] [CenterNet] [Grid R-CNN] [NAS-FPN] [ASFF] [Bag of Freebies] [VoVNet/OSANet] [FCOS] [GIoU] 2020 [EfficientDet] [CSPNet] [YOLOv4] [SpineNet] [DETR] [Mish] [PP-YOLO] [Open Images] 2021 [Scaled-YOLOv4] [PVT, PVTv1] [Deformable DETR] 2022 [PVTv2] [Pix2Seq] [MViTv2] 2023 [YOLOv7]
1.5. Semantic Segmentation / Scene Parsing / Instance Segmentation / Panoptic Segmentation
2014–2015 [SDS] [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [DPN] [Hypercolumn] [DeepMask] [DecoupledNet] 2016 [ENet] [ParseNet] [DilatedNet] [Cityscapes] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] 2017 [DRN] [RefineNet] [ERFNet] [GCN] [PSPNet] [DeepLabv3] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [Cascade-SegNet & Cascade-DilatedNet] [FCIS] [Mask R-CNN] 2018 [ESPNet] [ResNet-DUC-HDC] [DeepLabv3+] [PAN] [DFN] [EncNet] [DLA] [Non-Local Neural Networks] [UPerNet] [PSANet] [Probabilistic U-Net] [MaskLab] [PANet] [Mask X R-CNN] [PersonLab] 2019 [ResNet-38] [C3] [ESPNetv2] [ADE20K] [Semantic FPN, Panoptic FPN] [Auto-DeepLab] [DANet] [Improved U-Net] [Gated-SCNN] [Recurrent U-Net (R-UNet)] [EFCN] [DCNv2] [Rethinking ImageNet Pre-training] [HTC] [YOLACT] [MS R-CNN] [PS] [UPSNet] [DeeperLab] [Bellver CVPRW’19] 2020 [DRRN Zhang JNCA’20] [Trans10K, TransLab] [CCNet] [Open Images] [DETR] [Panoptic-DeepLab] [Axial-DeepLab] [Zhang JNCA’20] 2021 [PVT, PVTv1] [SETR] [Trans10K-v2, Trans2Seg] [Copy-Paste] 2022 [PVTv2] [YOLACT++]
1.6. Face Recognition
2005 [Chopra CVPR’05] 2010 [ReLU] 2014 [DeepFace] [DeepID2] [CASIANet] 2015 [FaceNet] 2016 [N-pair-mc Loss]
1.7. Human Pose Estimation
2014–2015 [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] 2016 [CPM] [FCGN] [IEF] [DeepCut & DeeperCut] [Newell ECCV’16 & Newell POCV’16] 2017 [G-RMI] [CMUPose & OpenPose] [Mask R-CNN] 2018 [PersonLab]
1.8. Video Classification / Action Recognition
2014 [Deep Video] [Two-Stream ConvNet] 2015 [DevNet] [C3D] [LRCN] 2016 [TSN] 2017 [Temporal Modeling Approaches] [4 Temporal Modeling Approaches] [P3D] [I3D] [Something Something] 2018 [Non-Local Neural Networks] [S3D, S3D-G] 2019 [VideoBERT] [Moments in Time] 2021 [MViT / MViTv1] [MViTv2]
1.9. Weakly Supervised Object Localization (WSOL)
2014 [Backprop] 2016 [CAM] 2017 [Grad-CAM] [Hide-and-Seek] 2018 [Grad-CAM++] [ACoL] [SPG] 2019 [CutMix] [ADL] 2020 [Evaluating WSOL Right] [SAOL]
1.10. Visualization
2002 [SNE] 2006 [Autoencoder] [DrLIM] 2007 [UNI-SNE] 2008 [t-SNE] 2018 [Loss Landscape]
2. Natural Language Processing (NLP)
2.1. Language Model / Sequence Model
(Some of these are not related to NLP, but I group them here for convenience.)
1991 [MoE] 1997 [Bidirectional RNN (BRNN)] 2005 [Bidirectional LSTM (BLSTM)] 2007 [Bengio TNN’07] 2013 [Word2Vec] [NCE] [Negative Sampling] [SGD+CR] [Leaky ReLU] 2014 [GloVe] [GRU] [Doc2Vec] [DT-RNN, DOT-RNN, sRNN] 2015 [Skip-Thought] [IRNN] [ConvLSTM] 2016 [GCNN/GLU] [context2vec] [Jozefowicz arXiv’16] [LSTM-Char-CNN] [Layer Norm, LN] 2017 [TagLM] [CoVe] [MoE] [fastText] 2018 [GLUE] [T-DMCA] [GPT, GPT-1] [ELMo] 2019 [T64] [Transformer-XL] [BERT] [RoBERTa] [GPT-2] [DistilBERT] [MT-DNN] [Sparse Transformer] [SuperGLUE] [FAIRSEQ] [XLNet] [XLM] [UniLM] 2020 [ALBERT] [GPT-3] [T5] [Pre-LN Transformer] [MobileBERT] [TinyBERT] [BART] [Longformer] [ELECTRA] [Megatron-LM] [SpanBERT] [UniLMv2] [Human Feedback Model] [DeFINE] [BIGBIRD] [ReGLU, GEGLU & SwiGLU] 2021 [Performer] [gMLP] [Roformer] [PPBERT] [DeBERTa] [DeLighT] [Transformer-LS] [R-Drop] [Jurassic-1] 2022 [GPT-NeoX-20B] [GPT-3.5, InstructGPT] [GLM] [MT-NLG 530B] 2023 [GPT-4]
2.2. Machine Translation
2013 [Translation Matrix] 2014 [Seq2Seq] [RNN Encoder-Decoder] 2015 [Attention Decoder/RNNSearch] 2016 [GNMT] [ByteNet] [Deep-ED & Deep-Att] [Byte Pair Encoding (BPE)] [Back Translation] 2017 [ConvS2S] [Transformer] [MoE] [GMNMT] [CoVe] [PBT] 2018 [Shaw NAACL’18] [CSLS] [Back Translation+Sampling] [UNMT] 2019 [AdaNorm] [GPT-2] [Pre-Norm Transformer] [FAIRSEQ] [XLM] 2020 [Batch Augment, BA] [GPT-3] [T5] [Pre-LN Transformer] [OpenNMT] [DeFINE] [MUTE] 2021 [ResMLP] [GPKD] [Roformer] [DeLighT] [R-Drop] 2022 [DeepNet]
2.3. Summarization
2018 [T-DMCA] 2020 [Human Feedback Model] 2022 [GPT-3.5, InstructGPT] 2023 [DetectGPT]
3. Visual/Vision/Video-Language
3.1. Visual/Vision/Video Language Model (VLM)
2017 [Visual Genome (VG)] 2018 [Conceptual Captions] 2019 [VideoBERT] [VisualBERT] [LXMERT] [ViLBERT] 2020 [ConVIRT] [VL-BERT] [OSCAR] 2021 [CLIP] [VinVL] [ALIGN] [VirTex] [ALBEF] [Conceptual 12M (CC12M)] 2022 [FILIP] [Wukong] [LiT] [Flamingo] [FLAVA] 2023 [GPT-4]
3.2. Text-to-Image Generation
2021 [DALL·E]
3.3. Image Captioning
2015 [m-RNN] [R-CNN+BRNN] [Show and Tell/NIC] [Show, Attend and Tell] [LRCN] 2017 [Visual N-Grams] 2018 [Conceptual Captions]
3.4. Video Captioning
2015 [LRCN] 2017 [Something Something] 2019 [VideoBERT]
4. Medical Image Analysis
4.1. Biomedical Image Classification
2017 [ChestX-ray8] 2019 [CheXpert] 2020 [VGGNet for COVID-19] [Dermatology] [Deep-COVID] [Zeimarani ACCESS’20] [Multiview CNN] 2021 [CheXternal] [CheXtransfer] [CheXbreak]
4.2. Biomedical Image Segmentation
2015 [U-Net] 2016 [CUMedVision1] [CUMedVision2 / DCAN] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [Moeskops MICCAI’16] 2017 [M²FCN] [Suggestive Annotation (SA)] [3D U-Net+ResNet] [Cascaded 3D U-Net] [DenseVoxNet] [Atlas-Aware Net] 2018 [QSA+QNT] [Attention U-Net] [RU-Net & R2U-Net] [VoxResNet] [UNet++] [H-DenseUNet] [MDU-Net] [Probabilistic U-Net] [CC-3D-FCN] 2019 [DUNet] [NN-Fit] [DUnet & ResDUnet] [Channel-UNet] [HyperDense-Net] [DALS] 2020 [MultiResUNet] [UNet 3+] [Dense-Gated U-Net (DGNet)] [Non-local U-Net] [SAUNet] [SDM] [DIU-Net] [Chen FCVM’20] [cGAN+AC+CAW] [RA-UNet] [SD-UNet] [DoFE] [Inception U-Net] [RefineU-Net] 2021 [Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin-Unet] [Swin UNETR] [RCU-Net] [IBA-U-Net] [PRDNet] [Up-Net] [SK-Unet] 2022 [UNETR] [Half-UNet] [BUSIS] [RCA-IUNet] 2023 [DCSAU-Net]
4.3. Biomedical Image Multi-Task Learning
2018 [ResNet+Mask R-CNN] [cU-Net+PE] [Multi-Task Deep U-Net] [cGAN-AutoEnc & cGAN-Unet] 2019 [cGAN+AC+CAW] [Qu ISBI’19] 2020 [BUSI] [Song JBHI’20] [cGAN JESWA’20] 2021 [Ciga JMEDIA’21] [CMSVNetIter]
4.4. Biomedical Image Self-Supervised Learning
2018 [Spitzer MICCAI’18] 2019 [Rubik’s Cube] [Context Restoration] 2020 [ConVIRT] [Rubik’s Cube+] 2021 [MICLe] [MoCo-CXR] [DVME] [MedAug] 2022 [BT-Unet] [Taleb JDiagnostics’22] [Self-Supervised Swin UNETR]
4.5. Biomedical Image Semi-Supervised Learning
5. Image Generation Related
5.1. Generative Adversarial Network (GAN)
Image Synthesis: 2014 [GAN] [CGAN] 2015 [LAPGAN] 2016 [AAE] [DCGAN] [CoGAN] [VAE-GAN] [InfoGAN] [Improved DCGAN, Inception Score] 2017 [SimGAN] [BiGAN] [ALI] [LSGAN] [EBGAN] [PBT] 2019 [SAGAN]
Image-to-image Translation: 2017 [Pix2Pix] [UNIT] [CycleGAN] 2018 [MUNIT]
Machine Translation: 2018 [UNMT]
Super Resolution: 2017 [SRGAN & SRResNet] [EnhanceNet] 2018 [ESRGAN]
Blur Detection: 2019 [DMENet]
Medical Imaging: 2018 [cGAN-AutoEnc & cGAN-Unet] 2019 [cGAN+AC+CAW] 2020 [cGAN JESWA’20]
Camera Tampering Detection: 2019 [Mantini’s VISAPP’19]
Video Coding: 2018 [VC-LAPGAN] 2020 [Zhu TMM’20] 2021 [Zhong ELECGJ’21]
5.2. Image Generation
2018 [Image Transformer] 2021 [Performer]
5.3. Style Transfer
2016 [Image Style Transfer] [Perceptual Loss]
6. Image Reconstruction Related
6.1. Single Image Super Resolution (SISR)
2014–2016 [SRCNN] 2016 [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [Perceptual Loss] 2017 [DnCNN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [SRDenseNet] [SRGAN & SRResNet] [SelNet] [CNF] [BT-SRN] [EDSR & MDSR] [EnhanceNet] 2018 [MWCNN] [MDesNet] [RDN] [SRMD & SRMDNF] [DBPN & D-DBPN] [RCAN] [ESRGAN] [CARN] [IDN] [ZSSR] [MSRN] [Image Transformer] 2019 [SR+STN] [IDBP-CNN-IA] [SRFBN] [OISR] 2020 [PRLSR] [CSFN & CSFN-M]
6.2. Image Restoration
2008 [Jain NIPS’08] 2016 [RED-Net] [GDN] 2017 [DnCNN] [MemNet] [IRCNN] [WDRN / WavResNet] 2018 [MWCNN] 2019 [IDBP-CNN-IA]
6.3. Video Super Resolution (VSR)
2017 [STMC / VESPCN] 2018 [VSR-DUF / DUF] 2019 [EDVR]
6.4. Video Frame Interpolation / Extrapolation
2016 [Mathieu ICLR’16] 2017 [AdaConv] [SepConv] 2020 [DSepConv] 2021 [SepConv++]
7. My Tutorials
7.1. Models & Related Knowledge
[LSTM Model from Kaggle] [Transformer from D2L.ai] [Transformer from Google TensorFlow] [Normalized Graph Laplacian]
7.2. Linux
[Ubuntu Installation] [NVIDIA Driver Installation (2018) (Old)] [OpenSSH Installation] [Hard Drive Partitioning/Formatting/Mounting] [Add 1 More GPU] [TeamViewer Installation] [NVIDIA Driver Installation (2019)]
7.3. Windows
[Anaconda + Spyder + TensorFlow 2.0 @ Windows 10] [OpenCV v4.2.0 Installation in Windows 10] [Converting PNG to YUV] [CUDA, cuDNN, Anaconda, Jupyter, PyTorch Installation in Windows 10]
7.4. Docker
[Docker Installation] [Pulling Image] [Running Image] [Exporting/Saving Image] [Nvidia-Docker 2.0 Installation] [Docker Installation in WSL 2 of Windows]
7.5. Caffe
[Image Classification] [Handwritten Digit Classification] [Style Recognition]
7.6. HDF5
8. My Sharings
- Sharing: From 100 To 60000 Views Per Month
- Sharing: Take “AI for Everyone” Course Or Not? — A Course By deeplearning.ai
- Sharing — ChatGPT: Comments From Jensen Huang, Bill Gates, Elon Musk, Sam Altman, Yann LeCun, and Cathie Wood
- Sharing — RTX Video Super Resolution (VSR)
- Sharing: The Growing Influence of Industry in AI Research
- My Paper Readings about IQA/VQA, and Camera Tampering/Blur/Soiling Detection
- My Paper Readings and Tutorials About Video Coding
Again, thanks for visiting my Medium stories. :)