Brief Review — RCNN: Recurrent Convolutional Neural Network for Object Recognition
RCNN, CNN+RNN for Image Classification
RCNN, by Tsinghua University
2015 CVPR, Over 1000 Citations (Sik-Ho Tsang @ Medium)
Image Classification, CNN, RNN
- A Recurrent CNN (RCNN) is proposed for image classification, where the same convolutional layer is applied recurrently over multiple time steps.
Outline
- Recurrent CNN (RCNN)
- Results
1. Recurrent CNN (RCNN)
1.1. Recurrent Convolutional Layer (RCL)
- The key module of RCNN is the recurrent convolutional layer (RCL).
- For a unit located at (i, j) on the kth feature map in an RCL, its net input z_{ijk}(t) at time step t is given by:

  z_{ijk}(t) = (w_k^f)^T u^{(i,j)}(t) + (w_k^r)^T x^{(i,j)}(t-1) + b_k

  where u^{(i,j)}(t) and x^{(i,j)}(t-1) are the vectorized feed-forward and recurrent input patches centered at (i, j), w_k^f and w_k^r are the vectorized feed-forward and recurrent weights, and b_k is the bias.
- The activity or state of this unit is a function of its net input:

  x_{ijk}(t) = g(f(z_{ijk}(t)))

- where f is the rectified linear activation (ReLU) function:

  f(z_{ijk}(t)) = max(z_{ijk}(t), 0)

- and g is the local response normalization (LRN) function in AlexNet:

  g(f_{ijk}(t)) = f_{ijk}(t) / (1 + (α/N) Σ_{k'=max(0, k−N/2)}^{min(K, k+N/2)} f_{ijk'}(t)^2)^β

  where K is the total number of feature maps and N is the number of neighboring feature maps used for normalization.
- It is claimed that LRN is used for preventing the states from exploding.
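The RCL update can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the helper names (`conv2d_same`, `lrn`, `rcl_forward`), the tiny tensor shapes, and the LRN hyperparameter values are all assumptions for the sketch.

```python
import numpy as np

def conv2d_same(x, w):
    # x: (C_in, H, W); w: (C_out, C_in, k, k); zero padding keeps H and W.
    C_out, C_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros((C_out, H, W))
    for o in range(C_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(w[o] * xp[:, i:i + k, j:j + k])
    return out

def lrn(f, alpha=1e-3, beta=0.75, N=5):
    # AlexNet-style local response normalization across the K feature maps.
    K = f.shape[0]
    out = np.empty_like(f)
    for k in range(K):
        lo, hi = max(0, k - N // 2), min(K, k + N // 2 + 1)
        denom = (1.0 + alpha / N * np.sum(f[lo:hi] ** 2, axis=0)) ** beta
        out[k] = f[k] / denom
    return out

def rcl_forward(u, w_f, w_r, b, T=3):
    # u: feed-forward input (C_in, H, W), held constant over time steps.
    ff = conv2d_same(u, w_f) + b[:, None, None]  # feed-forward term, computed once
    x = lrn(np.maximum(ff, 0.0))                 # t = 0: only the feed-forward input
    for t in range(1, T + 1):
        z = ff + conv2d_same(x, w_r)             # z(t) = w_f * u + w_r * x(t-1) + b
        x = lrn(np.maximum(z, 0.0))              # x(t) = g(f(z(t)))
    return x
```

For example, `rcl_forward(u, w_f, w_r, b, T=3)` with `u` of shape (2, 5, 5), `w_f` of shape (K, 2, 3, 3) and `w_r` of shape (K, K, 3, 3) returns a (K, 5, 5) state after three recurrent iterations.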
- Left: An example with T=3. When t=0 only the feed-forward input is present.
- The final gradient of a shared weight is the sum of its gradients over all time steps.
- Right: To save computation, layer 1 is the standard feed-forward convolutional layer without recurrent connections, followed by max pooling. On top of this, four RCLs are used with a max pooling layer in the middle. Both pooling operations have stride 2 and size 3.
- The output of the fourth RCL follows a global max pooling, yielding a feature vector representing the image.
- Finally a softmax layer is used to classify the feature vectors into C categories, with output given by:

  o_k = exp(w_k^T x) / Σ_{k'=1}^{C} exp(w_{k'}^T x),  k = 1, …, C

  where x is the feature vector produced by global max pooling and w_k is the softmax weight vector for category k.
- The cross-entropy loss function is used for training.
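The classification head (global max pooling, softmax, cross-entropy) can be sketched as follows; the function names and shapes are assumptions for illustration, not the paper's code.

```python
import numpy as np

def classify(feature_maps, W):
    # Global max pooling: one scalar per feature map -> feature vector x of length K.
    x = feature_maps.max(axis=(1, 2))
    # Softmax over C categories: o_k = exp(w_k . x) / sum_k' exp(w_k' . x).
    logits = W @ x                      # W: (C, K)
    logits = logits - logits.max()      # subtract max for numerical stability
    e = np.exp(logits)
    return e / e.sum()

def cross_entropy(probs, label):
    # Cross-entropy loss for the true category index `label`.
    return -np.log(probs[label])
```

A usage example: with feature maps of shape (K, H, W) and a (C, K) weight matrix, `classify` returns a length-C probability vector that sums to 1, and `cross_entropy` is the training loss on the true label.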
- If we unfold the recurrent connections for T time steps, the model becomes a very deep feed-forward network with 4(T+1)+2 parameterized layers, where T+1 is the depth of each RCL.
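The depth count above is simple arithmetic and can be checked directly (`unfolded_depth` is a hypothetical helper name):

```python
def unfolded_depth(T, num_rcls=4):
    # Each RCL unfolds into T+1 parameterized convolution steps; layer 1
    # (the plain convolutional layer) and the softmax layer add 2 more.
    return num_rcls * (T + 1) + 2

print(unfolded_depth(3))  # T = 3, as in the figure's example -> 18 layers
```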
2. Results
2.1. CIFAR-10
- Three models with different numbers of feature maps K were tested: RCNN-96, RCNN-128 and RCNN-160. The number of iterations was set to 3.
- All of them outperformed existing models such as Maxout and NIN, and the performance steadily improved with more feature maps.
2.2. CIFAR-100
- Again RCNN-96 outperformed the state-of-the-art models with fewer parameters, and the performance kept improving as K increased.
2.3. MNIST
- RCNN-64 outperformed other models using only 0.30 million parameters.
2.4. SVHN
- RCNN-128 had far fewer parameters than NIN (1.19 million versus 1.98 million), and increasing K kept improving the accuracy.
Reference
[2015 CVPR] [RCNN]
Recurrent Convolutional Neural Network for Object Recognition