Summary: My Paper Reading Lists, Tutorials & Sharings

From Image Classification, Object Detection, Natural Language Processing (NLP), Self-Supervised Learning, Semi-Supervised Learning, Vision-Language, Generative Adversarial Network (GAN) to …

Sik-Ho Tsang
12 min read · Mar 15, 2020

The list of my paper readings, tutorials and sharings is too long to repeat in each story, so it is posted here for convenience and will be updated from time to time.

Actually, I only write about what I’ve learnt. Reading a paper can take hours or even days, so finding the time to read one is sometimes a luxury. I hope these stories dig out the important points of each paper, or help you read the papers at a faster pace. But if there are papers that you’re particularly interested in, it’s better to read the originals for more detailed explanations. If anything is wrong, please tell me. Thank you. (Sik-Ho Tsang @ Medium)

  • Thanks everyone for reading my stories.
  • Your claps are also important to me for continuing my writing!

1. Computer Vision

1.1. Image Classification

1989-1998 [LeNet] 2010–2014 [ReLU] [AlexNet & CaffeNet] [Dropout] [Maxout] [NIN] [ZFNet] [SPPNet] [Distillation] 2015 [VGGNet] [Highway] [PReLU-Net] [STN] [DeepImage] [GoogLeNet / Inception-v1] [BN-Inception / Inception-v2] [All-CNN] [RCNN] 2016 [SqueezeNet] [Inception-v3] [ResNet] [Pre-Activation ResNet] [RiR] [Stochastic Depth] [WRN] [Trimps-Soushen] [GELU] [Layer Norm, LN] [Weight Norm, WN] [ELU] [Veit NIPS’16] 2017 [Inception-v4] [Xception] [MobileNetV1] [Shake-Shake] [Cutout] [FractalNet] [PolyNet] [ResNeXt] [DenseNet] [PyramidNet] [DRN] [DPN] [Residual Attention Network] [IGCNet / IGCV1] [Deep Roots] [CWN] [RevNet] 2018 [RoR] [DMRNet / DFN-MR] [MSDNet] [ShuffleNet V1] [SENet] [NASNet] [MobileNetV2] [CondenseNet] [IGCV2] [IGCV3] [FishNet] [SqueezeNext] [ENAS] [PNASNet] [ShuffleNet V2] [BAM] [CBAM] [MorphNet] [NetAdapt] [mixup] [DropBlock] [Group Norm (GN)] [Pelee & PeleeNet] [DLA] [Swish] [CoordConv] 2019 [ResNet-38] [AmoebaNet] [ESPNetv2] [MnasNet] [Single-Path NAS] [DARTS] [ProxylessNAS] [MobileNetV3] [FBNet] [ShakeDrop] [CutMix] [MixConv] [EfficientNet] [ABN] [SKNet] [CB Loss] [AutoAugment, AA] [BagNet] [Stylized-ImageNet] [FixRes] [SASA] [SE-WRN] [SGELU] [ImageNet-V2] [Bag of Tricks, ResNet-D] [PBA] [Fast AutoAugment (FAA)] [Switch Norm (SN)] 2020 [Random Erasing (RE)] [SAOL] [AdderNet] [FixEfficientNet] [BiT] [RandAugment] [ImageNet-ReaL] [ciFAIR] [ResNeSt] [Batch Augment, BA] [Mish] [WS, BCN] [AdvProp] [RegNet] [SAN] [Cordonnier ICLR’20] [ICMLM] [Self-Training] [SupCon] [Open Images] [Axial-DeepLab] [GhostNet] [ECA-Net] [MobileNeXt] [Dynamic ReLU] 2021 [Learned Resizer] [Vision Transformer, ViT] [ResNet Strikes Back] [DeiT] [EfficientNetV2] [MLP-Mixer] [T2T-ViT] [Swin Transformer] [CaiT] [ResMLP] [ResNet-RS] [NFNet] [PVT, PVTv1] [CvT] [HaloNet] [TNT] [CoAtNet] [Focal Transformer] [TResNet] [CPVT] [Twins] [Exemplar-v1, Exemplar-v2] [RepVGG] [V-MoE] [ImageNet-21K Pretraining] [Do You Even Need Attention?] 
[ResT / ResTv1] [ViL] [ReLabel] [MixToken / LV-ViT] [gMLP] [MViT / MViTv1] [CLIP] [GFNet] [Res2Net] [Sharpness-Aware Minimization (SAM)] [Transformer-LS] [R-Drop] [ParNet] [LeViT] [BotNet] [CrossViT] [Tuli CogSci’21] [Coordinate Attention (CA)] [DeepViT] [PiT] [HRNetV2, HRNetV2p] [Raghu NeurIPS’21] [SoftPool] 2022 [ConvNeXt] [PVTv2] [ViT-G] [AS-MLP] [ResTv2] [CSWin Transformer] [Pale Transformer] [Sparse MLP] [MViTv2] [S²-MLP] [CycleMLP] [MobileOne] [GC ViT] [VAN] [ACMix] [CVNets] [MobileViT] [RepMLP] [RepLKNet] [MetaFormer, PoolFormer] [Swin Transformer V2] [hMLP] [DeiT III] [GhostNetV2] [C-GhostNet & G-GhostNet] [AlterNet] [DHVT] [CrossFormer] [DynaMixer] [FocalNet] [WideNet] [CMT] [EfficientFormer] [OpenCLIP] 2023 [Vision Permutator (ViP)] [ConvMixer] [CrossFormer++] [FastViT] [EfficientFormerV2]

1.2. Unsupervised/Self-Supervised Learning

1993 [de Sa NIPS’93] 2008–2010 [Stacked Denoising Autoencoders] 2014 [Exemplar-CNN] 2015 [Context Prediction] [Wang ICCV’15] 2016 [Context Encoders] [Colorization] [Jigsaw Puzzles] 2017 [L³-Net] [Split-Brain Auto] [Motion Masks] [Doersch ICCV’17] [TextTopicNet] [Counting] 2018 [RotNet/Image Rotations] [DeepCluster] [CPC/CPCv1] [Instance Discrimination] [Spot Artifacts] 2019 [Ye CVPR’19] [S⁴L] [Goyal ICCV’19] [Rubik’s Cube] [AET] [Deep InfoMax (DIM)] [AMDIM] [Local Aggregation (LA)] [DeeperCluster] 2020 [CMC] [MoCo] [CPCv2] [PIRL] [SimCLR] [MoCo v2] [iGPT] [BoWNet] [BYOL] [SimCLRv2] [BYOL+GN+WS] [CompRess] [MoCo v2+Distillation] [SeLa] [SwAV] 2021 [MoCo v3] [SimSiam] [DINO] [Exemplar-v1, Exemplar-v2] [Barlow Twins] [W-MSE] [SimSiam+AL] [BYOL+LP] [SEED] [SEER] [SplitMask] [SimReg] [MoCLR, DnC] 2022 [BEiT] [BEiT V2] [Masked Autoencoders (MAE)] [DiT] [SimMIM] [LDBM] [data2vec] [SEER 10B, RG-10B] [iBOT]

1.3. Pretraining or Weakly/Semi-Supervised Learning

2004 [Entropy Minimization, EntMin] 2013 [Pseudo-Label (PL)] 2015 [Ladder Network, Γ-Model] 2016 [Sajjadi NIPS’16] [Improved DCGAN, Inception Score] 2017 [Mean Teacher] [PATE & PATE-G] [Π-Model, Temporal Ensembling] 2018 [WSL] [Oliver NeurIPS’18] 2019 [VAT] [Billion-Scale] [Label Propagation] [Rethinking ImageNet Pre-training] [MixMatch] [SWA & Fast SWA] [S⁴L] [Kolesnikov CVPR’19] 2020 [BiT] [Noisy Student] [SimCLRv2] [UDA] [ReMixMatch] [FixMatch] [Self-Training] 2021 [Curriculum Labeling (CL)] [Su CVPR’21] [Exemplar-v1, Exemplar-v2] [SimPLE] [BYOL+LP]

1.4. Object Detection

2014 [OverFeat] [R-CNN] 2015 [Fast R-CNN] [Faster R-CNN] [MR-CNN & S-CNN] [DeepID-Net] 2016 [OHEM] [CRAFT] [R-FCN] [ION] [MultiPathNet] [Hikvision] [GBD-Net / GBD-v1 & GBD-v2] [SSD] [YOLOv1] 2017 [NoC] [G-RMI] [TDM] [DSSD] [YOLOv2 / YOLO9000] [FPN] [RetinaNet] [DCN / DCNv1] [Light-Head R-CNN] [DSOD] [CoupleNet] 2018 [YOLOv3] [Cascade R-CNN] [MegDet] [StairNet] [RefineDet] [CornerNet] [Pelee & PeleeNet] [SiLU] 2019 [DCNv2] [Rethinking ImageNet Pre-training] [GRF-DSOD & GRF-SSD] [CenterNet] [Grid R-CNN] [NAS-FPN] [ASFF] [Bag of Freebies] [VoVNet/OSANet] [FCOS] [GIoU] 2020 [EfficientDet] [CSPNet] [YOLOv4] [SpineNet] [DETR] [Mish] [PP-YOLO] [Open Images] [YOLOv5] [CornerNet-Lite] [ATSS] 2021 [Scaled-YOLOv4] [PVT, PVTv1] [Deformable DETR] [HRNetV2, HRNetV2p] [MDETR] [TPH-YOLOv5] [YOLOX] [TOOD] [ViT-YOLO] [YOLOS] 2022 [Pix2Seq] [MViTv2] [SF-YOLOv5] [GLIP] [TPH-YOLOv5++] [YOLOv6] [ViDT] [ViTDet] 2023 [YOLOv7] [YOLOv8] 2024 [YOLOv9]

1.5. Semantic Segmentation / Scene Parsing / Instance Segmentation / Panoptic Segmentation

2014–2015 [SDS] [FCN] [DeconvNet] [DeepLabv1 & DeepLabv2] [CRF-RNN] [SegNet] [DPN] [Hypercolumn] [DeepMask] [DecoupledNet] [Weakly-Supervised EM] 2016 [ENet] [ParseNet] [DilatedNet] [Cityscapes] [SharpMask] [MultiPathNet] [MNC] [InstanceFCN] [TransferNet] 2017 [DRN] [RefineNet] [ERFNet] [GCN] [PSPNet] [DeepLabv3] [LC] [FC-DenseNet] [IDW-CNN] [DIS] [SDN] [Cascade-SegNet & Cascade-DilatedNet] [FCIS] [Mask R-CNN] [SPN] 2018 [ESPNet] [ResNet-DUC-HDC] [DeepLabv3+] [PAN] [DFN] [EncNet] [DLA] [Non-Local Neural Networks] [UPerNet] [PSANet] [Probabilistic U-Net] [MaskLab] [PANet] [Mask X R-CNN] [PersonLab] [ResUNet] [TernausNet] 2019 [ResNet-38] [C3] [ESPNetv2] [ADE20K] [Semantic FPN, Panoptic FPN] [Auto-DeepLab] [DANet] [Improved U-Net] [Gated-SCNN] [Recurrent U-Net (R-UNet)] [EFCN] [DCNv2] [Rethinking ImageNet Pre-training] [HTC] [YOLACT] [MS R-CNN] [PS] [UPSNet] [DeeperLab] [Bellver CVPRW’19] 2020 [DRRN Zhang JNCA’20] [Trans10K, TransLab] [CCNet] [Open Images] [DETR] [Panoptic-DeepLab] [Axial-DeepLab] [Zhang JNCA’20] [CenterMask] 2021 [PVT, PVTv1] [SETR] [Trans10K-v2, Trans2Seg] [Copy-Paste] [HRNetV2, HRNetV2p] [Lite-HRNet] 2022 [YOLACT++] 2023 [Segment Anything Model (SAM)]

1.6. Face Recognition

2005 [Chopra CVPR’05] 2010 [ReLU] 2014 [DeepFace] [DeepID2] [CASIANet] 2015 [FaceNet] 2016 [N-pair-mc Loss]

1.7. Human Pose Estimation

2014–2015 [DeepPose] [Tompson NIPS’14] [Tompson CVPR’15] 2016 [CPM] [FCGN] [IEF] [DeepCut & DeeperCut] [Newell ECCV’16 & Newell POCV’16] 2017 [G-RMI] [CMUPose & OpenPose] [Mask R-CNN] [RMPE] 2018 [PersonLab] [CPN] 2019 [OpenPose] [HRNet / HRNetV1] 2020 [A-HRNet] [Dynamic ReLU] 2021 [HRNetV2, HRNetV2p] [Lite-HRNet]

1.8. Video Classification / Action Recognition

2014 [Deep Video] [Two-Stream ConvNet] 2015 [DevNet] [C3D] [LRCN] 2016 [TSN] 2017 [Temporal Modeling Approaches] [4 Temporal Modeling Approaches] [P3D] [I3D] [Something Something] 2018 [Non-Local Neural Networks] [S3D, S3D-G] 2019 [VideoBERT] [Moments in Time] 2021 [MViT / MViTv1] [MViTv2] [SoftPool]

1.9. Weakly Supervised Object Localization (WSOL)

2014 [Backprop] 2016 [CAM] 2017 [Grad-CAM] [Hide-and-Seek] 2018 [Grad-CAM++] [ACoL] [SPG] 2019 [CutMix] [ADL] 2020 [Evaluating WSOL Right] [SAOL]

1.10. Visualization

2002 [SNE] 2006 [Autoencoder] [DrLIM] 2007 [UNI-SNE] 2008 [t-SNE] 2018 [Loss Landscape]

1.11. Data-Centric AI

2021 [SimSiam+AL] [BYOL+LP] 2022 [Small is the New Big] [DataPerf]

2. Natural Language Processing (NLP)

2.1. Language Model (LM)

2007 [Bengio TNN’07] 2013 [Word2Vec] [NCE] [Negative Sampling] 2014 [GloVe] [Doc2Vec] [DT-RNN, DOT-RNN, sRNN] 2015 [Skip-Thought] [IRNN] [ConvLSTM] 2016 [GCNN/GLU] [context2vec] [Jozefowicz arXiv’16] [LSTM-Char-CNN] [Layer Norm, LN] 2017 [TagLM] [CoVe] [MoE] [fastText] 2018 [GLUE] [T-DMCA] [GPT, GPT-1] [ELMo] 2019 [T64] [Transformer-XL] [BERT] [RoBERTa] [GPT-2] [DistilBERT] [MT-DNN] [Sparse Transformer] [SuperGLUE] [FAIRSEQ] [XLNet] [XLM] [UniLM] [ERNIE 1.0] [SciBERT] 2020 [ALBERT] [T5] [Pre-LN Transformer] [MobileBERT] [TinyBERT] [BART] [Longformer] [ELECTRA] [Megatron-LM] [SpanBERT] [UniLMv2] [DeFINE] [BIGBIRD] [ReGLU, GEGLU & SwiGLU] [ERNIE 2.0] [XLM-R] [ERNIE-Doc] 2021 [Performer] [gMLP] [Roformer] [PPBERT] [DeBERTa] [DeLighT] [Transformer-LS] [R-Drop] [mT5] [ERNIE 3.0] [nmT5] [C4] [MMLU] 2022 [GLM] [Switch Transformers] [WideNet] [MoEBERT] [X-MoE] [sMLP] [LinkBERT, BioLinkBERT] [AlphaCode] [Block-wise Dynamic Quantization] 2023 [ERNIE-Code] [Grouped-Query Attention (GQA)]

2.2. Large Language Model (LLM)

2020 [GPT-3] 2021 [Jurassic-1] [Gopher] [Codex] [ERNIE 3.0 Titan] 2022 [GPT-NeoX-20B] [GPT-3.5, InstructGPT, ChatGPT] [MT-NLG 530B] [Chinchilla] [PaLM] [AlexaTM] [BLOOM] [AlexaTM 20B] [OPT] [LaMDA] [Galactica] [DeepSpeed-MoE] [GLaM] 2023 [GPT-4] [LLaMA] [Koala] [BloombergGPT] [GLM-130B] [UL2] [PaLM 2] [Llama 2] [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [Flan 2022, Flan-T5] [AlphaCode 2] [Mistral 7B]

2.3. LM Tuning / Prompting

2019 [BERT for Text Classification] 2020 [Human Feedback Model] 2021 [T5+LM, Prompt Tuning] [Prefix-Tuning] 2022 [GPT-3.5, InstructGPT] [LoRA] [Chain-of-Thought Prompting] [T0] [FLAN] [UL2R, U-PaLM] [Flan-PaLM] [Tk-INSTRUCT] 2023 [LIMA] [SELF-INSTRUCT] [Self-Consistency] [Med-PaLM 2] [QLoRA, Guanaco] 2024 [LLaMA-Adapter]

2.4. Neural Machine Translation (NMT)

2013 [Translation Matrix] 2014 [Seq2Seq] [RNN Encoder-Decoder] 2015 [Attention Decoder/RNNSearch] 2016 [GNMT] [ByteNet] [Deep-ED & Deep-Att] [Byte Pair Encoding (BPE)] [Back Translation] 2017 [ConvS2S] [Transformer] [MoE] [GMNMT] [CoVe] [PBT] 2018 [Shaw NAACL’18] [CSLS] [Back Translation+Sampling] [UNMT] [SentencePiece] 2019 [AdaNorm] [GPT-2] [Pre-Norm Transformer] [FAIRSEQ] [XLM] [Multi-Query Attention (MQA)] 2020 [Batch Augment, BA] [GPT-3] [T5] [Pre-LN Transformer] [OpenNMT] [DeFINE] [MUTE] [BERTScore] 2021 [ResMLP] [GPKD] [Roformer] [DeLighT] [R-Drop] 2022 [DeepNet] [PaLM] [BLOOM] [AlexaTM 20B] 2023 [Grouped-Query Attention (GQA)]

2.5. Summarization

2018 [T-DMCA] 2020 [Human Feedback Model] 2022 [GPT-3.5, InstructGPT] 2023 [DetectGPT]

2.6. Dense Text Retrieval

2019 [Sentence-BERT (SBERT)] 2020 [Retrieval-Augmented Generation (RAG)] [Dense Passage Retriever (DPR)] 2021 [Fusion-in-Decoder] [Augmented SBERT (AugSBERT)]

2.7. Question Answering (QA)

2016 [SQuAD 1.0/1.1] 2017 [Dynamic Coattention Network (DCN)] 2018 [SQuAD 2.0]

3. Speech / Audio / Acoustic Signal Processing

3.1. Acoustic Model / Automatic Speech Recognition (ASR) / Speech-to-Text Modeling

1991 [MoE] 1997 [Bidirectional RNN (BRNN)] 2005 [Bidirectional LSTM (BLSTM)] 2013 [SGD+CR] [Leaky ReLU] 2014 [GRU] [Deep KWS] 2015 [LibriSpeech] [ARSG] 2016 [Listen, Attend and Spell (LAS)] 2017 [CNN for KWS] 2018 [Speech Commands] 2019 [SpecAugment] [Cnv Cxt Tsf] 2020 [FAIRSEQ S2T] [PANNs]

3.2. Sound Classification / Audio Tagging / Sound Event Detection (SED)

2015 [ESC-50, ESC-10, ESC-US] 2017 [AudioSet / Audio Set] [M3, M5, M11, M18, M34-res (DaiNet)] [Sample-Level DCNN (LeeNet)] 2021 [Audio Spectrogram Transformer (AST)]

3.3. Self-Supervised Learning

2019 [wav2vec]

4. Foundation Model, Large Multimodal Model (LMM), or Multimodal Signal Processing (Audio / Vision / Language)

4.1. Foundation Model (Text, Visual & Audio)

2023 [Gemini]

4.2. Visual/Vision/Video Language Model (VLM)

2017 [Visual Genome (VG)] 2018 [Conceptual Captions] 2019 [VideoBERT] [VisualBERT] [LXMERT] [ViLBERT] 2020 [ConVIRT] [VL-BERT] [OSCAR] 2021 [CLIP] [VinVL] [ALIGN] [VirTex] [ALBEF] [Conceptual 12M (CC12M)] [MDETR] [Florence] 2022 [FILIP] [Wukong] [LiT] [Flamingo] [FLAVA] [SimVLM] [VLMo] [BEiT-3] [GLIP] [OpenCLIP] [CoOp] [CoCoOp] 2023 [GPT-4] [GPT-4V(ision)] [MultiModal-CoT] [CoCa] [Florence-2] [PaLI]

4.3. Text-to-Image Generation

2016 [GAN-CLS, GAN-INT, GAN-CLS-INT] 2017 [StackGAN, StackGAN-v1] 2018 [StackGAN++, StackGAN-v2] 2021 [DALL·E]

4.4. Image Captioning

2015 [m-RNN] [R-CNN+BRNN] [Show and Tell/NIC] [Show, Attend and Tell] [LRCN] 2017 [Visual N-Grams] 2018 [Conceptual Captions]

4.5. Video Captioning

2015 [LRCN] 2017 [Something Something] 2019 [VideoBERT]

5. Healthcare/Medical Related

2019 [DL in Healthcare]

5.1. Biomedical/Medical Image Classification

2017 [ChestX-ray8] [CheXNet] 2019 [CheXpert] 2020 [VGGNet for COVID-19] [Dermatology] [Deep-COVID] [Zeimarani ACCESS’20] [Multiview CNN] 2021 [CheXternal] [CheXtransfer] [CheXbreak] 2022 [BUS-CNN] [CheXED]

5.2. Biomedical/Medical Image Segmentation

2015 [U-Net] 2016 [CUMedVision1] [CUMedVision2 / DCAN] [CFS-FCN] [U-Net+ResNet] [MultiChannel] [V-Net] [3D U-Net] [Moeskops MICCAI’16] 2017 [M²FCN] [Suggestive Annotation (SA)] [3D U-Net+ResNet] [Cascaded 3D U-Net] [DenseVoxNet] [Atlas-Aware Net] 2018 [QSA+QNT] [Attention U-Net] [RU-Net & R2U-Net] [VoxResNet] [UNet++] [H-DenseUNet] [MDU-Net] [Probabilistic U-Net] [CC-3D-FCN] 2019 [DUNet] [NN-Fit] [DUnet & ResDUnet] [Channel-UNet] [HyperDense-Net] [DALS] [UNet+Up+SKM] 2020 [MultiResUNet] [UNet 3+] [Dense-Gated U-Net (DGNet)] [Non-local U-Net] [SAUNet] [SDM] [DIU-Net] [Chen FCVM’20] [cGAN+AC+CAW] [RA-UNet] [SD-UNet] [DoFE] [Inception U-Net] [RefineU-Net] [DoubleU-Net] [PraNet] 2021 [Expanded U-Net] [3-D RU-Net] [nnU-Net] [TransUNet] [CoTr] [TransBTS] [Swin UNETR] [RCU-Net] [IBA-U-Net] [PRDNet] [Up-Net] [SK-Unet] [ResUNet++] [LeViT-UNet] [FFANet] [Medical Transformer (MedT)] 2022 [UNETR] [Half-UNet] [BUSIS] [RCA-IUNet] [Swin-Unet] [DS-TransUNet] [UNeXt] [AdwU-Net] [TransUNetV2] [Swin-SFTNet] 2023 [DCSAU-Net] [RMMLP] [BTS-ST]

5.3. Biomedical/Medical Image Multi-Task Learning

2018 [ResNet+Mask R-CNN] [cU-Net+PE] [Multi-Task Deep U-Net] [cGAN-AutoEnc & cGAN-Unet] 2019 [cGAN+AC+CAW] [Qu ISBI’19] [FCN+S-Net+C-Net] 2020 [BUSI] [Song JBHI’20] [cGAN JESWA’20] 2021 [Ciga JMEDIA’21] [CMSVNetIter] [MultiMix] 2023 [FFANet+MTL] [U-Net+MTL] [VENet] [COVID-MTL]

5.4. Biomedical/Medical Image Self-Supervised Learning

2018 [Spitzer MICCAI’18] 2019 [Rubik’s Cube] [Context Restoration] 2020 [ConVIRT] [Rubik’s Cube+] 2021 [MICLe] [MoCo-CXR] [DVME] [MedAug] 2022 [BT-Unet] [Taleb JDiagnostics’22] [Self-Supervised Swin UNETR] [Self-Supervised Multi-Modal] 2023 [Multi-Modal ResUNet+ASPP+HAFB]

5.5. Biomedical/Medical Image Semi-Supervised Learning

2019 [UA+MT] 2020 [SASSNet] [FocalMix]

5.6. Medical/Clinical/Healthcare NLP/LLM

2017 [LiveQA] 2018 [Clinical NLP Overview] 2019 [MedicationQA] [G-BERT] [PubMedQA] [Clinical BERT] [SciBERT] [BLUE] 2020 [BioBERT] [BEHRT] [COVID-Q] [COVID-QA] 2021 [MedGPT] [Med-BERT] [MedQA] [BLURB, PubMedBERT] [MMLU] [COVID-19 Chatbot Using BERT] 2022 [MedMCQA] [LinkBERT, BioLinkBERT] 2023 [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [GPT-4 in Radiology] [ChatGPT & GPT‑4 on USMLE] [Regulatory Oversight of LLM] [ExBEHRT] [ChatDoctor] [DoctorGLM] [HuaTuo] 2024 [ChatGPT & GPT-4 on Dental Exam] [ChatGPT-3.5 on Radiation Oncology] [LLM on Clinical Text Summarization]

5.7. Phonocardiogram (PCG)/Heart Sound Classification

2013 [PASCAL Dataset] 2016 [PhysioNet/CinC Challenge 2016 Dataset] [20 Ensemble FFNN] [MFCC+CNN CinC’16] 2017 [Modified LeNet & Modified AlexNet] [MFCC+CNN arXiv’17] [MFSC+CNN J. Physiol. Meas.’17] 2018 [RNN Variants] [Yaseen GitHub Dataset] [Chakir JSVIP’18] [Spectrogram+CNN] [MFCC+ANN] [CWT Scalogram+CNN EMBC’18] [MFCC+CNN+Optimized λ] 2019 [AlexNet/VGG + SVM] [LSTM MDPI J. Sensors’19] [LHSNN] [Wavelet+GRU] [ResHNet] [DAE+1D CNN] [CWT Scalogram+AlexNet] [TF-ECNN] [AMDF+LSTM] [IFE+RF, IFE+kNN] 2020 [1D-CNN] [WaveNet] [Power Features+KNN] [Improved MFCC + CRNN] [Ensemble Learning] [HSS Dataset] [Li BioMed Research Int’20] [F-NN Net-4] [2-Layer 64-Unit GRU] [GAN for Normal Heart Sound Synthesis] [5-Layer 1D-CNN] [Pretrained PANN] [MFSWT + CNN] [tConv-CNN] [497 Features + 1D-CNN] 2021 [CardioXNet] [CNN & RNN Overview] [XGBoost + LSTM] [MFCC + CNN] [WSS, SSG] [Multimodal CNN] 2022 [CirCor Dataset] [CNN-LSTM] [DsaNet] [Modified Xception] [Improved MFCC+Modified ResNet] [Learnable Features + VGGNet/EfficientNet] [DWT+SVM] [MFCC+LSTM] [DWT+1D-CNN] [CNN+Attention] [RF/SVM+GA] [NARX] [Chaogram + Inception-v3] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum+Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP] [LSTM U-Net (LU-Net)] [DL Overview] [MFCC + k-NN / RF / ANN / SVM + Grid Search] [Long-Short Term Features (LSTF)] [WST+1D-CNN and CST+2D-CNN Ensemble] [CTENN] [Bispectrum + ViT] [Multi-Feature + CNN-1D] [MFCC + Patient Features + RF] [Survey for Detecting Heart Diseases] 2024 [MWRS-BFSC + CNN2D]

6. Image Generation Related

6.1. Generative Adversarial Network (GAN)

Image Synthesis: 2014 [GAN] [CGAN] 2015 [LAPGAN] 2016 [AAE] [DCGAN] [CoGAN] [VAE-GAN] [InfoGAN] [Improved DCGAN, Inception Score] 2017 [SimGAN] [BiGAN] [ALI] [LSGAN] [EBGAN] [PBT] [WGAN] [WGAN-GP] [TTUR, Fréchet Inception Distance (FID)] [StackGAN, StackGAN-v1] [AC-GAN] 2018 [SNGAN] [StackGAN++, StackGAN-v2] [Progressive GAN] 2019 [SAGAN] [BigGAN] [BigBiGAN] 2020 [GAN Overview]
Text-to-Image Generation: 2016 [GAN-CLS, GAN-INT, GAN-CLS-INT] 2017 [StackGAN, StackGAN-v1] 2018 [StackGAN++, StackGAN-v2]
Image-to-image Translation: 2017 [Pix2Pix] [UNIT] [CycleGAN] 2018 [MUNIT] [StarGAN] [pix2pixHD] [SaGAN] [Mask Contrastive-GAN]
Style Transfer: 2016 [GAN-CLS, GAN-INT, GAN-CLS-INT] 2019 [StyleGAN]
Machine Translation: 2018 [UNMT]
Super Resolution: 2017 [SRGAN & SRResNet] [EnhanceNet] 2018 [ESRGAN]
Blur Detection: 2019 [DMENet]
Medical Imaging: 2018 [cGAN-AutoEnc & cGAN-Unet] 2019 [cGAN+AC+CAW] 2020 [cGAN JESWA’20]
Heart Sound Classification: 2020 [GAN for Normal Heart Sound Synthesis]
Camera Tampering Detection: 2019 [Mantini’s VISAPP’19]
Video Coding: 2018 [VC-LAPGAN] 2020 [Zhu TMM’20] 2021 [Zhong ELECGJ’21]

6.2. Image Generation

2018 [Image Transformer] 2021 [Performer]

6.3. Style Transfer

2016 [Artistic Style Transfer] [Image Style Transfer] [Perceptual Loss] [GAN-CLS, GAN-INT, GAN-CLS-INT] [Texture Network] [Instance Norm (IN)] 2017 [StyleNet] [AdaIN] 2019 [StyleGAN]

7. Image Reconstruction Related

7.1. Single Image Super Resolution (SISR)

2014–2016 [SRCNN] 2016 [FSRCNN] [VDSR] [ESPCN] [RED-Net] [DRCN] [Perceptual Loss] 2017 [DnCNN] [DRRN] [LapSRN & MS-LapSRN] [MemNet] [IRCNN] [WDRN / WavResNet] [SRDenseNet] [SRGAN & SRResNet] [SelNet] [CNF] [BT-SRN] [EDSR & MDSR] [EnhanceNet] 2018 [MWCNN] [MDesNet] [RDN] [SRMD & SRMDNF] [DBPN & D-DBPN] [RCAN] [ESRGAN] [CARN] [IDN] [ZSSR] [MSRN] [Image Transformer] 2019 [SR+STN] [IDBP-CNN-IA] [SRFBN] [OISR] 2020 [PRLSR] [CSFN & CSFN-M]

7.2. Image Restoration

2008 [Jain NIPS’08] 2016 [RED-Net] [GDN] 2017 [DnCNN] [MemNet] [IRCNN] [WDRN / WavResNet] 2018 [MWCNN] 2019 [IDBP-CNN-IA]

7.3. Video Super Resolution (VSR)

2017 [STMC / VESPCN] 2018 [VSR-DUF / DUF] 2019 [EDVR]

7.4. Video Frame Interpolation / Extrapolation

2016 [Mathieu ICLR’16] 2017 [AdaConv] [SepConv] 2020 [DSepConv] 2021 [SepConv++]

Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.