Brief Review — RCNN: Recurrent Convolutional Neural Network for Object Recognition

RCNN, CNN+RNN for Image Classification

4 min read · Aug 26, 2022

Recurrent Convolutional Neural Network for Object Recognition
RCNN, by Tsinghua University
2015 CVPR, Over 1000 Citations (Sik-Ho Tsang @ Medium)
Image Classification, CNN, RNN

A Recurrent CNN (RCNN) is proposed for image classification, in which a convolutional layer is applied recurrently over multiple time steps.

Outline

1. Recurrent CNN (RCNN)
2. Results

1. Recurrent CNN (RCNN)

1.1. Recurrent Convolutional Layer (RCL)

The key module of RCNN is the recurrent convolutional layer (RCL).
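For a unit located at (i, j) on the kth feature map in an RCL, its net input $z_{ijk}(t)$ at time step t is given by:

$$z_{ijk}(t) = (\mathbf{w}_k^f)^T \mathbf{u}^{(i,j)}(t) + (\mathbf{w}_k^r)^T \mathbf{x}^{(i,j)}(t-1) + b_k$$

where $\mathbf{u}^{(i,j)}(t)$ and $\mathbf{x}^{(i,j)}(t-1)$ are the feed-forward and recurrent inputs (the vectorized patches centered at (i, j) in the previous and current layer), $\mathbf{w}_k^f$ and $\mathbf{w}_k^r$ are the feed-forward and recurrent weights, and $b_k$ is the bias.

The activity or state of this unit is a function of its net input:

$$x_{ijk}(t) = g(f(z_{ijk}(t)))$$

where f is the rectified linear activation (ReLU) function:

$$f(z_{ijk}(t)) = \max(z_{ijk}(t), 0)$$

and g is the local response normalization (LRN) function in AlexNet:

$$g(f_{ijk}(t)) = \frac{f_{ijk}(t)}{\left(1 + \frac{\alpha}{N} \sum_{k'=\max(0,\,k-N/2)}^{\min(K,\,k+N/2)} \left(f_{ijk'}\right)^2\right)^{\beta}}$$

where K is the number of feature maps in the layer, N is the size of the local neighborhood of feature maps, and α and β are constants.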
It is claimed that LRN prevents the states from exploding.
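The RCL computation described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names (`conv2d_same`, `lrn`, `rcl_forward`), the zero padding, and the LRN constants `n`, `alpha`, and `beta` are assumptions chosen for the example, and the feed-forward input is assumed to stay the same at every time step.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive zero-padded convolution. x: (C_in, H, W); w: (C_out, C_in, k, k), k odd."""
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))   # zero padding keeps H and W
    _, h, wd = x.shape
    out = np.zeros((w.shape[0], h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]            # patch centered at (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out

def lrn(f, n=3, alpha=1e-3, beta=0.75):
    """Local response normalization across neighboring feature maps (constants are assumed)."""
    K = f.shape[0]
    out = np.empty_like(f)
    for k in range(K):
        lo, hi = max(0, k - n // 2), min(K - 1, k + n // 2)
        sq_sum = (f[lo:hi + 1] ** 2).sum(axis=0)
        out[k] = f[k] / (1.0 + alpha / n * sq_sum) ** beta
    return out

def rcl_forward(u, w_f, w_r, b, T=3):
    """Run one RCL for T recurrent steps and return the final state x(T)."""
    ff = conv2d_same(u, w_f) + b[:, None, None]        # feed-forward term, reused at every step
    x = lrn(np.maximum(ff, 0.0))                       # t = 0: feed-forward input only
    for _ in range(T):                                 # t = 1..T: add the recurrent input
        z = ff + conv2d_same(x, w_r)
        x = lrn(np.maximum(z, 0.0))                    # x(t) = g(f(z(t)))
    return x

rng = np.random.default_rng(0)
u = rng.standard_normal((2, 8, 8))                     # 2 input feature maps
w_f = 0.1 * rng.standard_normal((4, 2, 3, 3))          # feed-forward weights
w_r = 0.1 * rng.standard_normal((4, 4, 3, 3))          # recurrent weights (shared over time)
print(rcl_forward(u, w_f, w_r, np.zeros(4), T=3).shape)  # (4, 8, 8)
```

Note that the same `w_r` is used at every time step, so increasing `T` deepens the unfolded computation without adding parameters.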
Left: An RCL unfolded for T=3 time steps. At t=0, only the feed-forward input is present.
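Training uses backpropagation through time (BPTT): because the weights are shared across time steps, the final gradient of a shared weight is the sum of its gradients over all time steps.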
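The statement about shared weights can be checked numerically on a toy scalar recurrence. The sketch below is an illustration only: the scalar model x(t) = max(w_f·u + w_r·x(t-1), 0), the helper names, and the finite-difference check are assumptions made for the example, and LRN is omitted.

```python
def unrolled(u, w_f, w_r_steps):
    """Scalar recurrence where step t uses its own copy of the recurrent weight."""
    x = max(w_f * u, 0.0)                    # t = 0: feed-forward input only
    for w_r in w_r_steps:                    # t = 1..T
        x = max(w_f * u + w_r * x, 0.0)
    return x

def num_grad(fn, v, eps=1e-6):
    """Central finite-difference derivative of fn at v."""
    return (fn(v + eps) - fn(v - eps)) / (2 * eps)

u, w_f, w_r, T = 1.0, 0.5, 0.8, 3

# Gradient with respect to the shared weight: all T copies move together.
shared = num_grad(lambda w: unrolled(u, w_f, [w] * T), w_r)

# Gradient with respect to each time step's copy alone, holding the others fixed.
per_step = [
    num_grad(
        lambda w, t=t: unrolled(u, w_f, [w if s == t else w_r for s in range(T)]),
        w_r,
    )
    for t in range(T)
]

print(shared, sum(per_step))
```

The two printed values agree up to finite-difference error, which is the summation rule applied to the unfolded network.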
Right: To save computation, layer 1 is the standard feed-forward convolutional layer without recurrent connections, followed by max pooling. On top of this, four RCLs are used with a max pooling layer in the middle. Both pooling operations have stride 2 and size 3.
The output of the fourth RCL is followed by a global max pooling layer, yielding a feature vector representing the image.
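Finally, a softmax layer classifies the feature vector into C categories; its output is given by:

$$y_k = \frac{\exp(\mathbf{w}_k^T \mathbf{x})}{\sum_{k'} \exp(\mathbf{w}_{k'}^T \mathbf{x})}, \quad k = 1, 2, \dots, C$$

where $y_k$ is the predicted probability of the kth category and $\mathbf{x}$ is the feature vector produced by global max pooling.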
The cross-entropy loss function is used for training.
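The classification head (global max pooling, softmax, cross-entropy) can be sketched in plain Python. This is a toy illustration under assumptions: the function names, the bias-free linear layer, and the tiny hand-written feature maps and weights are all made up for the example.

```python
import math

def global_max_pool(feature_maps):
    """Reduce each H x W feature map to its maximum, giving one value per map."""
    return [max(max(row) for row in fmap) for fmap in feature_maps]

def softmax(logits):
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature_maps, weights):
    """weights[k] is the weight vector for category k (no bias in this sketch)."""
    x = global_max_pool(feature_maps)
    logits = [sum(wi * xi for wi, xi in zip(w_k, x)) for w_k in weights]
    return softmax(logits)

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true category."""
    return -math.log(probs[label])

# Toy example: 2 feature maps of size 2x2, C = 3 categories.
maps = [[[0.1, 0.7], [0.3, 0.2]], [[0.0, 0.4], [0.9, 0.5]]]
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
y = classify(maps, W)
print(y, cross_entropy(y, label=1))
```

Global max pooling makes the feature vector length equal to the number of feature maps, independent of the spatial size of the last RCL's output.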
If we unfold the recurrent connections for T time steps, the model becomes a very deep feed-forward network with 4(T+1)+2 parameterized layers, where T+1 is the depth of each RCL.
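The depth formula can be verified by listing the layers as described and counting the parameterized ones. The layer names and helper functions below are illustrative assumptions, not identifiers from the paper.

```python
def rcnn_layers():
    """Layer sequence as described in the text (names are illustrative)."""
    return [
        "conv",            # layer 1: standard feed-forward convolution
        "maxpool",
        "rcl", "rcl",
        "maxpool",         # pooling in the middle of the four RCLs
        "rcl", "rcl",
        "global_maxpool",
        "softmax",
    ]

def unfolded_depth(T):
    """Count parameterized layers after unfolding each RCL for T time steps."""
    depth = 0
    for layer in rcnn_layers():
        if layer == "rcl":
            depth += T + 1        # depth of one unfolded RCL
        elif layer in ("conv", "softmax"):
            depth += 1            # pooling layers have no parameters
    return depth

for T in (0, 3):
    print(T, unfolded_depth(T))   # 0 -> 6, 3 -> 18
```

For T=3, this gives 4(3+1)+2 = 18 parameterized layers, while the number of distinct weight sets stays the same because each RCL reuses its weights across time steps.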