# Brief Review — RCNN: Recurrent Convolutional Neural Network for Object Recognition

## RCNN, CNN+RNN for Image Classification

Recurrent Convolutional Neural Network for Object Recognition, by Tsinghua University

RCNN, 2015 CVPR, Over 1000 Citations (Sik-Ho Tsang @ Medium)

Image Classification, CNN, RNN

- A **Recurrent CNN (RCNN)** is proposed for image classification, where **the convolutional layer is applied recurrently multiple times**.

# Outline

1. **Recurrent CNN (RCNN)**
2. **Results**

# 1. Recurrent CNN (RCNN)

## 1.1. Recurrent Convolutional Layer (RCL)

- The key module of RCNN is the recurrent convolutional layer (RCL).
- For a unit located at (*i*, *j*) on the *k*-th feature map in an RCL, its net input *z_ijk*(*t*) at time step *t* is given by:
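Rendered here in LaTeX from the paper's definition, where **u**^(i,j)(t) is the vectorized feed-forward input patch from the previous layer, **x**^(i,j)(t−1) is the recurrent input from the same layer at the previous time step, and **w**_k^f, **w**_k^r and b_k are the feed-forward weights, recurrent weights and bias for the *k*-th feature map:

$$ z_{ijk}(t) = (\mathbf{w}_k^f)^\top \mathbf{u}^{(i,j)}(t) + (\mathbf{w}_k^r)^\top \mathbf{x}^{(i,j)}(t-1) + b_k $$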

- The **activity** or **state** of this unit is a function of its net input:
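In the paper's notation, with *f* and *g* as defined below:

$$ x_{ijk}(t) = g\big(f(z_{ijk}(t))\big) $$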

- where *f* is the **rectified linear activation (ReLU)** function:
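That is, the standard ReLU:

$$ f(z_{ijk}(t)) = \max(z_{ijk}(t),\, 0) $$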

- and *g* is the **local response normalization (LRN)** function in AlexNet:
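As in AlexNet's LRN, where *K* is the total number of feature maps, *N* is the size of the local neighborhood over which normalization is performed, and α and β are constants:

$$ g(f_{ijk}(t)) = \frac{f_{ijk}(t)}{\left(1 + \frac{\alpha}{N}\sum_{k'=\max(0,\,k-N/2)}^{\min(K,\,k+N/2)} f_{ijk'}(t)^2\right)^{\beta}} $$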

- It is claimed that LRN is used to prevent the states from exploding.
- **Left**: An example with *T* = 3. When *t* = 0, only the feedforward input is present.
- The **final gradient** of a shared weight is **the sum of its gradients over all time steps.**
- **Right**: To save computation, layer 1 is a **standard feed-forward convolutional layer** without recurrent connections, **followed by max pooling**. On top of this, **four RCLs are used** with a **max pooling layer in the middle**. Both pooling operations have stride 2 and size 3.
- The output of the fourth RCL is fed to a **global max pooling** layer, yielding a feature vector representing the image.
- Finally, a **softmax** layer is used to classify the feature vector into C categories, whose output is given by:
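In LaTeX, with **x** the globally max-pooled feature vector and **w**_k the softmax weights for category *k*:

$$ y_k = \frac{\exp(\mathbf{w}_k^\top \mathbf{x})}{\sum_{k'=1}^{C} \exp(\mathbf{w}_{k'}^\top \mathbf{x})} $$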

- The **cross-entropy loss** function is used for training.
- If we unfold the recurrent connections for *T* time steps, the model becomes a very deep feed-forward network with 4(*T*+1)+2 parameterized layers, where *T*+1 is the depth of each RCL.
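The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it uses 1×1 convolutions (per-pixel channel mixing) instead of the paper's 3×3 spatial kernels, and omits LRN, but the unfolding and weight sharing over time steps are the same.

```python
import numpy as np

# Toy Recurrent Convolutional Layer (RCL), assuming 1x1 convolutions
# for brevity; the paper uses 3x3 spatial kernels.
rng = np.random.default_rng(0)

C_in, C_out, H, W, T = 4, 8, 5, 5, 3   # channels, spatial size, iterations
w_f = rng.normal(0, 0.1, (C_out, C_in))   # feed-forward weights (shared over t)
w_r = rng.normal(0, 0.1, (C_out, C_out))  # recurrent weights (shared over t)
b = np.zeros(C_out)

def relu(z):
    return np.maximum(z, 0.0)

def rcl_forward(u, T):
    """Unfold the RCL for T recurrent iterations (T+1 conv applications)."""
    # t = 0: only the feed-forward input is present.
    x = relu(np.einsum('oc,chw->ohw', w_f, u) + b[:, None, None])
    for t in range(1, T + 1):
        # Same weights w_f, w_r reused at every time step.
        z = (np.einsum('oc,chw->ohw', w_f, u)
             + np.einsum('oc,chw->ohw', w_r, x)
             + b[:, None, None])
        x = relu(z)  # the paper also applies LRN here; omitted for brevity
    return x

u = rng.normal(size=(C_in, H, W))
x = rcl_forward(u, T)
print(x.shape)  # (8, 5, 5): same spatial size, C_out feature maps

# Unfolded depth check: 4 RCLs with T = 3 plus 2 standard layers
# gives 4*(T+1) + 2 = 18 parameterized layers.
print(4 * (T + 1) + 2)  # 18
```

Note that each RCL keeps the spatial resolution fixed while iterating, which is what lets the effective receptive field grow with *T* without adding new parameters.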

# 2. Results

## 2.1. CIFAR-10

- Three models with different *K*'s (number of feature maps) were tested: **RCNN-96**, **RCNN-128** and **RCNN-160**. The number of iterations was set to 3.

- All of them outperformed existing models such as Maxout and NIN, and the performance steadily improved with more feature maps.

## 2.2. CIFAR-100

- Again, RCNN-96 outperformed the state-of-the-art models with fewer parameters, and the performance kept improving as *K* increased.

## 2.3. MNIST

- RCNN-64 outperformed other models using only 0.30 million parameters.

## 2.4. SVHN

- RCNN-128 had far fewer parameters than NIN (1.19 million versus 1.98 million), and increasing *K* kept improving the accuracy.

A similar idea is used in the PolyInception modules of PolyNet, which was 2nd runner-up in ILSVRC 2016 image classification.

## Reference

[2015 CVPR] [RCNN] Recurrent Convolutional Neural Network for Object Recognition

## Image Classification

**1989 … 2015** [RCNN] **… 2021** [Learned Resizer] [Vision Transformer, ViT] [ResNet Strikes Back] [DeiT] [EfficientNetV2] [MLP-Mixer] [T2T-ViT] [Swin Transformer] [CaiT] [ResMLP] [ResNet-RS] [NFNet] [PVT, PVTv1] [CvT] [HaloNet] [TNT] [CoAtNet] [Focal Transformer] [TResNet] [CPVT] [Twins] [Exemplar-v1, Exemplar-v2] [RepVGG] **2022** [ConvNeXt] [PVTv2]