Review — OHEM: Training Region-based Object Detectors with Online Hard Example Mining (Object Detection)

Using OHEM on Fast R-CNN, Heuristics Removed, Detection Accuracy Improved, Outperforms MR-CNN

In this story, Training Region-based Object Detectors with Online Hard Example Mining (OHEM), by Carnegie Mellon University and Facebook AI Research (FAIR), is reviewed.

  • OHEM eliminates several heuristics and hyperparameters in common use.


  1. Brief Review of Fast R-CNN (FRCN)
  2. Several Heuristics in FRCN
  3. Online Hard Example Mining (OHEM)
  4. Experimental Results

1. Brief Review of Fast R-CNN (FRCN)

  • For each object proposal, the RoI-pooling layer projects the proposal onto the conv feature map and extracts a fixed-length feature vector.
  • Each feature vector is fed into the fc layers, which finally give two outputs: softmax class probabilities and regressed bounding-box coordinates.
  • For each minibatch, N images are first sampled from the dataset, and then B/N RoIs are sampled from each image, with N = 2 and B = 128.
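
The two-stage sampling in the last bullet can be sketched as follows. This is only a minimal illustration: the function name, the dict-of-lists input format, and the uniform choice of RoIs are assumptions made here. FRCN itself does not sample RoIs uniformly but applies the heuristics described in the next section.

```python
import random

def sample_minibatch(rois_per_image, num_images=2, batch_size=128, rng=None):
    """Pick N images, then B/N RoIs from each image.

    rois_per_image: hypothetical dict mapping image id -> list of RoIs.
    RoIs are drawn uniformly here, which is a simplification.
    """
    if rng is None:
        rng = random.Random(0)
    per_image = batch_size // num_images  # B/N = 64 for N = 2, B = 128
    image_ids = rng.sample(sorted(rois_per_image), num_images)
    batch = []
    for image_id in image_ids:
        rois = rois_per_image[image_id]
        k = min(per_image, len(rois))  # an image may have fewer RoIs than B/N
        batch.extend((image_id, roi) for roi in rng.sample(rois, k))
    return batch
```

With N = 2 and B = 128, each minibatch therefore holds 64 RoIs from each of two images.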

2. Several Heuristics in FRCN

2.1. Foreground RoIs

  • For an example RoI to be labeled as foreground (fg), its intersection over union (IoU) overlap with a ground-truth bounding box should be at least 0.5. This is a very standard design choice.
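
For reference, IoU between two axis-aligned boxes can be computed as in the sketch below. The (x1, y1, x2, y2) box format with continuous coordinates is an assumption made for illustration; some detection codebases add 1 to widths and heights to account for pixel indices.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An RoI would then count as fg when its IoU with some ground-truth box is at least 0.5.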

2.2. Background RoIs

  • A region is labeled background (bg) if its maximum IoU with ground truth is in the interval [bg_lo, 0.5).
  • Although this heuristic helps convergence and detection accuracy, it is suboptimal because it ignores some infrequent, but important, difficult background regions.
  • OHEM removes the bg_lo threshold.
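
The fg and bg rules above can be summarized in a small sketch. The function name and the "ignored" label are hypothetical; bg_lo = 0.1 is the heuristic value quoted in the experiments below, and setting bg_lo = 0 corresponds to removing the threshold.

```python
def label_roi(max_iou, fg_thresh=0.5, bg_lo=0.1):
    """Label an RoI from its maximum IoU with any ground-truth box."""
    if max_iou >= fg_thresh:
        return "fg"
    if max_iou >= bg_lo:
        return "bg"       # IoU in [bg_lo, 0.5)
    return "ignored"      # below bg_lo: never sampled under the heuristic
```

With bg_lo = 0, every RoI that is not fg becomes a bg candidate, so low-overlap regions that the heuristic would discard remain available for mining.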

2.3. Balancing fg-bg RoIs

3. Online Hard Example Mining (OHEM)

3.1. OHEM

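  • The loss of each RoI indicates how well the current network performs on that RoI, so the RoIs are sorted by loss and the B/N highest-loss ones per image are selected as hard examples.
  • Since only this small subset of RoIs is used to update the model, the backward pass is no more expensive than before.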
  • Overlapping RoIs can project onto the same region. To deal with these redundant and correlated regions, non-maximum suppression (NMS) is used to perform deduplication.
  • OHEM also does not need a fg-bg ratio for data balancing: if any class were neglected, its loss would increase until it has a high probability of being sampled.
  • There can be images where the fg RoIs are easy (e.g. canonical view of a car), so the network is free to use only bg regions in a mini-batch; and vice versa when bg is trivial (e.g. sky, grass etc.), the mini-batch can be entirely fg regions.
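
The selection procedure described in this section can be sketched as loss-ranked NMS followed by taking the top RoIs. This is a minimal illustration, not the authors' implementation: the function names and box format are hypothetical, and the default NMS threshold of 0.7 is not stated in this review and is an assumption here.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def select_hard_rois(boxes, losses, num_select, nms_thresh=0.7):
    """Return indices of the hardest RoIs after loss-ranked NMS.

    nms_thresh=0.7 is an assumed value, not taken from this review.
    """
    # Visit RoIs from highest to lowest loss.
    order = sorted(range(len(boxes)), key=lambda i: losses[i], reverse=True)
    kept = []
    for i in order:
        # Drop an RoI that overlaps too much with a higher-loss kept RoI.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
            if len(kept) == num_select:
                break
    return kept
```

No fg-bg ratio appears anywhere in the selection: the returned set may be all fg, all bg, or any mix, depending only on the losses.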

3.2. Implementation Details

FRCN with Online Hard Example Mining (OHEM)
  • The readonly RoI network performs a forward pass and computes loss for all input RoIs (R) (green arrows).
  • Then the hard RoI sampling module uses OHEM to select hard examples (Rhard-sel), which are input to the regular RoI network (red arrows).
  • This network computes forward and backward passes only for Rhard-sel.
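
The three steps above can be mimicked with a toy model to make the data flow concrete. Everything in this sketch is an assumption for illustration: a linear logistic classifier stands in for the RoI network, the function names are hypothetical, both passes read the same weights, and the NMS deduplication step is omitted for brevity.

```python
import math

def roi_loss(weights, feature, label):
    """Logistic loss of one RoI under a toy linear classifier."""
    z = sum(w * f for w, f in zip(weights, feature))
    p = 1.0 / (1.0 + math.exp(-z))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def ohem_step(weights, features, labels, num_hard, lr=0.1):
    """One training step: loss on all RoIs, gradient on hard RoIs only."""
    # (1) Readonly pass: forward only, one loss per input RoI.
    losses = [roi_loss(weights, f, y) for f, y in zip(features, labels)]
    # (2) Hard RoI sampling: keep the num_hard highest-loss RoIs.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    hard = order[:num_hard]
    # (3) Regular pass: forward and backward for the hard RoIs only.
    grad = [0.0] * len(weights)
    for i in hard:
        z = sum(w * f for w, f in zip(weights, features[i]))
        p = 1.0 / (1.0 + math.exp(-z))
        for d, f in enumerate(features[i]):
            grad[d] += (p - labels[i]) * f / len(hard)
    new_weights = [w - lr * g for w, g in zip(weights, grad)]
    return new_weights, hard
```

The cost of step (3) depends only on num_hard, not on the total number of input RoIs, which mirrors why the backward pass stays cheap.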

4. Experimental Results

4.1. OHEM on PASCAL VOC 2007

Impact of hyperparameters on FRCN training
  • With bg_lo = 0 (rows 3–4), FRCN mAP drops by 2.4 points for VGGM, whereas for VGG16 it remains similar to rows 1–2.
  • Different settings are also tried (rows 5–10).
  • OHEM (rows 11–13) improves mAP by 2.4 points compared to FRCN with the bg_lo = 0.1 heuristic for VGGM, and 4.8 points without the heuristic.
  • This result demonstrates the sub-optimality of these heuristics and the effectiveness of the hard mining approach.
Training loss using VGG16
Computational statistics of training FRCN
  • The increase in training time is likely acceptable to most users.

4.2. PASCAL VOC 2007 & 2012

PASCAL VOC 2007 test detection average precision (%)
PASCAL VOC 2012 test detection average precision (%)

4.3. MS COCO, & Adding Bells and Whistles

MS COCO 2015 test-dev detection average precision (%)
  • With multi-scale for training and testing (M), 24.4% AP and 25.5% AP are obtained for different training sets.
Impact of multi-scale and iterative bbox reg
  • Using OHEM consistently results in higher mAP for all variants of these two additions (M and B).

