Review: Image Transformer

Image Generation and Super Resolution Using a Transformer

Left: Super Resolution, Right: Image Generation (Input / Image Transformer Results / Ground Truth)

Outline

1. Image Transformer

1.1. One Layer of Image Transformer

A slice of one layer of the Image Transformer

2. Local Self-Attention

1D Local Attention and 2D Local Attention

2.1. 1D Local Attention
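
To make the idea concrete, here is a minimal sketch of which positions a pixel may attend to under 1D local attention. It assumes the image is flattened in raster-scan order, split into non-overlapping query blocks, and that each query block sees itself plus a fixed number of preceding positions. The function name and the parameters `query_len` and `memory_len` are hypothetical, and the sketch only lists previously generated positions; it is not the paper's implementation.

```python
def local_1d_memory(num_positions, query_len, memory_len):
    """Return, for every flattened position, the earlier positions it may attend to.

    Assumptions (illustrative only): raster-scan flattening, non-overlapping
    query blocks of `query_len`, and a memory block made of the query block
    plus the `memory_len` positions that precede it.
    """
    allowed = []
    for pos in range(num_positions):
        # Start of the query block that contains this position.
        block_start = (pos // query_len) * query_len
        # The memory block reaches back `memory_len` positions before the block.
        mem_start = max(0, block_start - memory_len)
        # Causal restriction: only positions generated before `pos` are visible.
        allowed.append(list(range(mem_start, pos)))
    return allowed


if __name__ == "__main__":
    for pos, mem in enumerate(local_1d_memory(8, query_len=4, memory_len=2)):
        print(pos, mem)
```

With these toy settings, position 5 sits in the second query block (positions 4 to 7), so it can look at positions 2, 3 and 4: two from the preceding memory and one from its own block.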

2.2. 2D Local Attention

3. Experimental Results

3.1. Image Generation

Image completions from the proposed best conditional generation model
Conditional image generations for all CIFAR-10 categories
Bits/dim on CIFAR-10 test and ImageNet validation sets
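
For readers unfamiliar with the metric, bits/dim is the negative log-likelihood of an image expressed in bits and divided by the number of dimensions (height × width × channels). A small sketch of the conversion from nats is below; the function name is hypothetical and the input value is made up for illustration, not a number from the paper.

```python
import math


def nats_to_bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    # Dividing by ln(2) converts nats to bits; dividing by num_dims normalizes
    # by the number of sub-pixels in the image.
    return nll_nats / (num_dims * math.log(2))


if __name__ == "__main__":
    cifar_dims = 32 * 32 * 3  # a CIFAR-10 image has 3072 sub-pixels
    example_nll = 6400.0      # made-up NLL in nats, for illustration only
    print(round(nats_to_bits_per_dim(example_nll, cifar_dims), 3))
```

Lower is better: a model that assigns higher likelihood to the test images needs fewer bits per sub-pixel to encode them.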

3.2. Super Resolution

Four-fold super-resolution model trained on CIFAR-10; the generated images look realistic and plausible

3.3. Super Resolution on CelebA

Negative log-likelihood and human evaluation performance for the Image Transformer on CelebA
Images from the 1D and 2D local attention super-resolution models trained on CelebA, sampled at different temperatures
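
As a reminder of what sampling with a temperature means, here is a minimal sketch of the usual approach: divide the logits by the temperature before the softmax. It assumes a categorical output distribution over intensity values; the function name and the toy logits are hypothetical and not taken from the paper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax probabilities after scaling the logits by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # made-up values for illustration
    for t in (1.0, 0.5):
        print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
```

A temperature below 1 sharpens the distribution, so samples stick closer to the most likely values; a temperature of 1 samples from the model's distribution unchanged.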

Reference

Single Image Super Resolution (SISR)

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG