1. Neural Image Caption (NIC) Network Architecture

1.1. Objective

1.2. LSTM as RNN

1.3. BN-Inception / Inception-v2 as CNN

1.4. Overview

1.5. Training

1.6. Inference

2. Experimental Results

2.1. Datasets

The statistics of the datasets

2.2. BLEU

BLEU-1 scores

2.3. Sentence Diversity

N-best examples from the MSCOCO test set. Bold lines indicate novel sentences not present in the training set.

2.4. Qualitative Results

A selection of evaluation results, grouped by human rating

Nearest neighbors of a few example words


