Review — Show, Attend and Tell: Neural Image Caption Generation

With Attention, Show, Attend and Tell Outperforms Show and Tell

Show, Attend and Tell (Figure from …)


Show, Attend and Tell Network Architecture

1. CNN Encoder
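The encoder's job is to turn an image into a set of annotation vectors $a_i$, one per spatial location. The paper uses a 14×14×512 feature map from a late VGG convolutional layer, which gives L = 196 vectors of dimension D = 512. The sketch below only shows the flattening step on nested lists. The function name and the row-major ordering are assumptions for illustration, and the CNN itself is left out.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # H x W x D
Matrix = List[List[float]]            # L x D, with L = H * W

def flatten_feature_map(fmap: FeatureMap) -> Matrix:
    """Turn an H x W x D conv feature map into L = H * W annotation vectors a_i.

    Hypothetical helper. Row-major order: location (r, c) becomes index i = r * W + c.
    """
    height = len(fmap)
    width = len(fmap[0])
    annotations = []
    for r in range(height):
        for c in range(width):
            annotations.append(list(fmap[r][c]))  # copy the D-dim vector at (r, c)
    return annotations

# Toy 2 x 3 x 4 map standing in for the 14 x 14 x 512 VGG output (made-up values).
toy = [[[float(r * 10 + c)] * 4 for c in range(3)] for r in range(2)]
a = flatten_feature_map(toy)
print(len(a), len(a[0]))  # 6 4
```

Because each annotation vector keeps its spatial index i, an attention weight on $a_i$ can later be mapped back to a region of the image. This is what makes the attention visualizations in the figures possible.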

2. Attention Decoder

2.1. Attention Decoder

Left: CNN Encoder, Right: Attention Decoder
Relationships between annotation vectors $a_i$ and weights $\alpha_{t,i}$ (Figure from …)
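At decoding step t, each annotation vector $a_i$ gets a weight $\alpha_{t,i}$. In the paper, an attention model $f_{att}$ (a multilayer perceptron conditioned on the previous LSTM hidden state $h_{t-1}$) scores every location. A softmax over locations then turns those scores into weights that sum to 1. The sketch below assumes the common additive form $e_{t,i} = v^\top \tanh(W_a a_i + W_h h_{t-1})$. The matrices `W_a`, `W_h`, the vector `v`, and all helper names are hypothetical placeholders, not the paper's code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(annotations: Matrix, h_prev: Vector,
                      W_a: Matrix, W_h: Matrix, v: Vector) -> Vector:
    """alpha_{t,i} = softmax over i of e_{t,i}, using an ASSUMED additive scorer
    e_{t,i} = v . tanh(W_a a_i + W_h h_{t-1})."""
    proj_h = matvec(W_h, h_prev)  # computed once, shared by every location i
    scores = []
    for a_i in annotations:
        hidden = [math.tanh(x + y) for x, y in zip(matvec(W_a, a_i), proj_h)]
        scores.append(sum(vk * hk for vk, hk in zip(v, hidden)))
    return softmax(scores)

# Toy example: 3 locations, D = 2, W_a = identity, W_h = zeros, v = [1, 1] (all made up).
annotations = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
alpha = attention_weights(annotations, h_prev=[0.5, -0.5],
                          W_a=[[1.0, 0.0], [0.0, 1.0]],
                          W_h=[[0.0, 0.0], [0.0, 0.0]],
                          v=[1.0, 1.0])
print([round(x, 3) for x in alpha])
```

Here the second location gets the largest weight because its projected score is highest. With a non-zero `W_h`, the same annotations would be weighted differently at each step as $h_{t-1}$ changes. That dependence is what lets the model shift its focus from word to word.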

2.2. Stochastic “Hard” Attention & Deterministic “Soft” Attention

Soft and Hard Attention (Figure from …)
Soft Attention (Figure from …)
Hard Attention (Figure from …)
Examples of soft (top) and hard (bottom) attention
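The two variants differ in how the weights $\alpha_{t,i}$ become a context vector $z_t$. Deterministic soft attention takes the expectation $z_t = \sum_i \alpha_{t,i} a_i$. Because that is differentiable, the whole model trains with ordinary backpropagation. Stochastic hard attention samples a single location $s_t$ from the multinoulli (categorical) distribution given by $\alpha_t$ and uses only that annotation vector. Sampling is not differentiable, so the paper trains this variant with a sampling-based (REINFORCE-style) estimate of a variational lower bound. The sketch below covers only the context computation. The function names are hypothetical, and `alpha` is assumed to be already normalized, for example by a softmax.

```python
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]

def soft_context(annotations: Matrix, alpha: Vector) -> Vector:
    """Deterministic 'soft' attention: z_t = sum_i alpha_{t,i} * a_i."""
    dim = len(annotations[0])
    return [sum(w * a_i[d] for w, a_i in zip(alpha, annotations)) for d in range(dim)]

def hard_context(annotations: Matrix, alpha: Vector,
                 rng: Optional[random.Random] = None) -> Vector:
    """Stochastic 'hard' attention: sample s_t ~ Multinoulli(alpha), return a copy of a_{s_t}."""
    rng = rng or random.Random()
    s_t = rng.choices(range(len(annotations)), weights=alpha, k=1)[0]
    return list(annotations[s_t])

annotations = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
alpha = [0.5, 0.25, 0.25]
print(soft_context(annotations, alpha))                    # [1.0, 0.75]
print(hard_context(annotations, alpha, random.Random(0)))  # one of the three rows
```

Soft attention blends all locations, so its attention maps look smooth. Hard attention commits to one location per word. This matches the sharper, more concentrated maps in the bottom row of the example figure.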

3. Experimental Results

BLEU-1, 2, 3, 4 / METEOR metrics compared to other methods, …
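BLEU-N, as reported in the table for N = 1 to 4, multiplies a brevity penalty by the geometric mean of clipped n-gram precisions for n = 1..N. The sketch below follows the standard BLEU formula for a single candidate sentence, with uniform weights and no smoothing. It is for intuition only. BLEU is usually aggregated over a whole test corpus, and this is not the scoring script behind the table. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """BLEU-max_n for one candidate: brevity penalty x geometric mean of clipped precisions.

    Uniform weights, no smoothing: returns 0.0 as soon as any n-gram precision is zero.
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    c = len(candidate)
    # Effective reference length: the reference length closest to c (ties go to the shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "a dog runs on the grass".split()
print(sentence_bleu("a dog runs on the grass".split(), [ref]))  # 1.0
print(sentence_bleu(["a", "dog"], [ref], max_n=2))  # brevity penalty exp(1 - 6/2) = exp(-2)
```

The second call shows why the brevity penalty exists. A two-word caption has perfect precision but is penalized for being much shorter than the reference.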
Examples of attending to the correct object
Examples of mistakes where we can use attention to gain intuition into what the model saw



PhD, Researcher. I share what I've learnt and done. :)