Review — CVNets: High Performance Library for Computer Vision

CVNets, Open-Source Library for Computer Vision

Sik-Ho Tsang
3 min read · Feb 10, 2023
CVNets can significantly improve the performance of different deep neural networks on the ImageNet dataset using simple training recipes (e.g., random resized cropping and horizontal flipping). The official MobileNetV1 and ResNet results are from TensorFlow Lite and Torchvision, respectively.

CVNets: High Performance Library for Computer Vision,
CVNets, by Apple,
2022 ACM MM (Sik-Ho Tsang @ Medium)
Deep Learning Libraries, Image Classification

  • CVNets, a high-performance open-source library, is proposed for training deep neural networks for visual recognition tasks; for example, it provides efficient data sampling methods.


  1. CVNets Library Designs
  2. CVNets Library Components
  3. Benchmarks

1. CVNets Library Designs

  • Modularity: CVNets provides independent components. For example, different classification backbones (e.g., ResNet-50) trained in CVNets can be seamlessly integrated with object detection (e.g., SSD) or semantic segmentation (e.g., DeepLabv3) pipelines for studying the generic nature of an architecture.
  • Flexibility: With CVNets, there are new use cases in research as well as production. New components (e.g., models, datasets, loss functions, data samplers, and optimizers) can be integrated easily.
  • Reproducibility: The pre-trained weights of each model are released online to enable future research.
  • Compatibility: CVNets is compatible with hardware accelerated frameworks (e.g., CoreML) and domain-specific libraries (e.g., PyTorchVideo).
An example of registering a video classification model from PyTorchVideo inside CVNets on the Kinetics-400 dataset
  • Models from domain-specific libraries can be easily consumed in CVNets, as shown above.
  • Beyond ImageNet: Any classification backbone in CVNets (either existing or new) can seamlessly be integrated with down-stream networks (e.g., PSPNet and SSD) and enables researchers to study the generic nature of different classification models.
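The modularity described above rests on a registry mechanism that maps names to model classes, so components from other libraries can be plugged in by config. The sketch below shows the general decorator-based registry pattern; all names here (`MODEL_REGISTRY`, `register_model`, `X3DWrapper`, `build_model`) are illustrative placeholders, not CVNets' actual API.

```python
# Minimal sketch of a decorator-based model registry, the pattern that
# lets an external (e.g., PyTorchVideo) model be registered and then
# built by name from a config. Names are hypothetical, not CVNets' API.

MODEL_REGISTRY = {}

def register_model(name):
    """Decorator that adds a model class to the registry under `name`."""
    def wrapper(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper

@register_model("pytorchvideo_x3d")
class X3DWrapper:
    """Hypothetical wrapper adapting a PyTorchVideo video model to the
    classification interface the training pipeline expects."""
    def __init__(self, num_classes=400):  # Kinetics-400 has 400 classes
        self.num_classes = num_classes

def build_model(name, **kwargs):
    """Look the model up by name (as a config file would) and build it."""
    return MODEL_REGISTRY[name](**kwargs)
```

With this pattern, swapping a backbone or consuming a third-party model is a one-line config change rather than a code change, which is what enables the "seamless integration" with detection and segmentation pipelines.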

2. CVNets Library Components

2.1. Data Samplers

Effect of training deep learning models with different sampling methods on the ImageNet dataset. Models trained with MSc-VBS deliver similar performance, train faster with fewer optimization updates, and generalize better (higher training loss; similar validation loss) compared to those trained with SSc-FBS and MSc-FBS.
  • CVNets offers data samplers with three sampling strategies: (1) single-scale with fixed batch size (SSc-FBS), (2) multi-scale with fixed batch size (MSc-FBS), and (3) multi-scale with variable batch size (MSc-VBS).

Compared to SSc-FBS and MSc-FBS, MSc-VBS is a memory-efficient sampler that speeds up training significantly while maintaining performance.
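The memory efficiency of MSc-VBS comes from scaling the batch size inversely with the sampled resolution, so that (batch size × height × width) stays roughly constant across batches. A minimal sketch of that idea, under assumed base values (the scale set, base resolution, and base batch size below are illustrative, not CVNets' defaults):

```python
import random

def msc_vbs_batches(num_samples, base_res=224, base_batch=128,
                    scales=(128, 160, 192, 224, 256, 288), seed=0):
    """Sketch of multi-scale variable-batch-size (MSc-VBS) sampling:
    for each batch, pick a resolution at random and scale the batch
    size so that batch_size * res^2 stays near a fixed memory budget."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    batches, i = [], 0
    while i < num_samples:
        res = rng.choice(scales)
        # keep batch_size * res^2 ~= base_batch * base_res^2
        bs = max(1, int(base_batch * (base_res / res) ** 2))
        batches.append((res, indices[i:i + bs]))
        i += bs
    return batches
```

Because low-resolution batches carry more samples and high-resolution batches fewer, GPU memory use is stable and no batch has to be sized for the worst-case resolution, which is where the training speed-up over MSc-FBS comes from.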

2.2. Sample Efficient Training

Sample efficient training for the ResNet-50 model. SET reduces the optimizer updates (a) while maintaining performance (b). Fluctuations in (c) represent easy samples that were re-classified as hard and added back to the training data.
  • Sample Efficient Training (SET) is proposed.

If the model predicts a training sample correctly with a confidence greater than a pre-defined threshold 𝜏 for a moving window of 𝑤 epochs, it is considered an easy sample and can be removed from the training data. At each epoch, the model trains only on the hard samples. (This bookkeeping, however, adds some overhead.)

As shown above, ResNet-50 trained without SET requires 22% more optimization updates while delivering similar performance.
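The SET rule above (correct prediction with confidence above 𝜏 for a window of 𝑤 epochs) can be sketched as a small per-sample bookkeeping class. This is an illustrative sketch of the criterion, not the library's actual implementation; the class and method names are hypothetical.

```python
from collections import deque

class SampleEfficientTraining:
    """Sketch of the SET criterion: a sample is 'easy' (and skipped) if
    it was predicted correctly with confidence > tau for each of the
    last w epochs; otherwise it stays in the hard (training) set."""

    def __init__(self, num_samples, tau=0.9, w=3):
        self.tau, self.w = tau, w
        # per-sample record of the last w epochs: True = "easy this epoch"
        self.history = {i: deque(maxlen=w) for i in range(num_samples)}

    def update(self, sample_id, correct, confidence):
        """Record one epoch's outcome for one sample."""
        self.history[sample_id].append(correct and confidence > self.tau)

    def hard_samples(self):
        """Samples to train on next epoch: anything without a full
        window of easy predictions. An easy sample whose record dips
        re-enters this set, matching the fluctuations in plot (c)."""
        return [i for i, h in self.history.items()
                if len(h) < self.w or not all(h)]
```

The overhead mentioned above is visible here: every sample's prediction must be checked and recorded each epoch, even for samples that end up being skipped.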

3. Benchmarks

3.1. Classification on ImageNet

Classification on the ImageNet dataset.

With CVNets, better performance (e.g., MobileNetV1/MobileNetV2) or similar performance (ResNet-50/ResNet-101) is achieved with fewer optimization updates (faster training).

3.2. Detection and Segmentation

For example, SSD with ResNet-101 backbone trained with CVNets at a resolution of 384×384 delivers a 1.6% better mAP than the same model trained at a resolution of 512×512 as reported in DSSD.

  • Similarly, on the task of semantic segmentation on the ADE20K dataset using DeepLabv3 with MobileNetV2 as the backbone, CVNets delivers 1.1% better performance than MMSegmentation library [1] with 2× fewer epochs and optimization updates.


