Review — TiledSoilingNet: Tile-level Soiling Detection on Automotive Surround-view Cameras Using Coverage Metric

TiledSoilingNet: Tile-Based Soiling Detection Using CNN

Sik-Ho Tsang
Aug 5, 2021
Automotive cameras get soiled

In this story, TiledSoilingNet: Tile-level Soiling Detection on Automotive Surround-view Cameras Using Coverage Metric, (TiledSoilingNet), by Valeo, is reviewed.

  • Automotive cameras, particularly surround-view cameras, tend to get soiled by mud, water, snow, etc.

In this paper:

  • Tile-based soiling detection is proposed to directly regress the area of each soiling type within a tile, which is referred to as coverage.
  • It is integrated into an object detection and semantic segmentation multi-task model.
  • As soiling camera image data is not common, a soiling dataset is constructed, where a portion of the dataset used will be released publicly as part of the WoodScape dataset.

This is a paper in 2020 ITSC. (Sik-Ho Tsang @ Medium)


  1. Dataset Construction
  2. CNN Soiling Detection
  3. Experimental Results

1. Dataset Construction

Soiling annotation using polygons (top) and tile-level per-class coverage values derived from the polygons (bottom)

1.1. Dataset

  • A total of 105,987 images from all four cameras around the car were collected.
  • Every 15th frame is extracted from short video recordings captured at 30 frames per second.
  • The annotations were generated manually as coarse polygonal segmentations and then converted into tile-level labels based on the dominating soiling class coverage in each tile.
  • The entire dataset was divided into three non-overlapping parts with 60/20/20 ratio for training, validation, and test sets.
  • A stratified sampling approach was followed to mostly retain the underlying distributions of the classes among the splits.
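A minimal sketch of such a stratified 60/20/20 split (illustrative, not the authors' exact procedure; here each sample is represented by a single dominant class label):

```python
import random
from collections import defaultdict

def stratified_split(labels, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/val/test while roughly
    preserving the per-class distribution (stratified sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, val, test = [], [], []
    for lab, idxs in by_class.items():
        rng.shuffle(idxs)
        n = len(idxs)
        n_train = int(n * ratios[0])
        n_val = int(n * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test

# Example: 10 samples of class "clean", 10 of class "opaque"
labels = ["clean"] * 10 + ["opaque"] * 10
train, val, test = stratified_split(labels)
```

Because the split is performed per class, each class contributes to all three sets in roughly the 60/20/20 proportions.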

1.2. Coverage

  • Each image is divided into a 4×4 grid of tiles.
  • The coverage values shown in each tile are listed in the order clean, transparent, semitransparent, and opaque.

Coverage is defined and calculated simply as the fraction of pixels in a tile that belong to a given class {clean, transparent, semitransparent, opaque}, i.e., the number of pixels of that class within the tile divided by the total number of pixels in the tile.

  • This definition ensures that the coverages of all classes within the tile sum up to 1.
  • A precise segmentation mask is not necessary; only the area of the soiling segmentation is. The percentage of each soiling class in a tile is sufficient to decide whether to activate the cleaning system.
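As a sketch of this definition, the per-tile coverage can be computed directly from a pixel-level class mask (a hypothetical helper, assuming the mask is given as nested lists of class labels):

```python
def tile_coverage(mask, grid=(4, 4),
                  classes=("clean", "transparent", "semitransparent", "opaque")):
    """Compute per-tile coverage: for every tile in the grid, the
    fraction of its pixels belonging to each soiling class.
    `mask` is an H x W grid of class labels (from the polygon annotation)."""
    h, w = len(mask), len(mask[0])
    th, tw = h // grid[0], w // grid[1]  # tile height and width in pixels
    coverages = []
    for r in range(grid[0]):
        row = []
        for c in range(grid[1]):
            counts = {cls: 0 for cls in classes}
            for y in range(r * th, (r + 1) * th):
                for x in range(c * tw, (c + 1) * tw):
                    counts[mask[y][x]] += 1
            total = th * tw
            # Fractions over one tile sum to 1 by construction.
            row.append({cls: counts[cls] / total for cls in classes})
        coverages.append(row)
    return coverages

# Toy example: an 8x8 mask, left half clean, right half opaque.
mask = [["clean"] * 4 + ["opaque"] * 4 for _ in range(8)]
cov = tile_coverage(mask)
```

Each tile's coverage values sum to 1, matching the property noted above.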

2. CNN Soiling Detection

Illustration of soiling integrated into a multi-task object detection and segmentation network
  • The proposed network is a multi-task learning framework with one shared encoder and dedicated decoders per task.
  • ResNet-10 network without classification head is used as the encoder.
  • Besides soiling detection, there are two vision tasks: semantic segmentation and object detection. An FCN8-style decoder was adapted for semantic segmentation, and the detection decoder is YOLOv2.
  • Batch normalization and ReLU activations are used for convolutions.
  • First, detection and segmentation tasks are trained jointly. Then the encoder is frozen (set as non-trainable), and the soiling decoder is trained on top of it.
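The second phase of this training schedule can be sketched in PyTorch as follows (the modules here are simplified stand-ins for the ResNet-10 encoder and the soiling decoder, not the paper's architecture):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the shared encoder and the soiling decoder.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)
soiling_decoder = nn.Conv2d(16, 4, 1)  # 4 coverage channels (illustrative)

# Phase 2: freeze the jointly trained encoder, train only the soiling decoder.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()  # also fixes the BatchNorm running statistics

# The optimizer only sees the soiling decoder's parameters.
optimizer = torch.optim.Adam(soiling_decoder.parameters(), lr=1e-3)
```

Freezing the encoder keeps the detection and segmentation performance unchanged while the soiling decoder is fitted on top of the shared features.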

2.1. Classification

  • Categorical cross-entropy and categorical accuracy were used as the loss and metric, respectively, for the classification output.

2.2. Coverage

  • A novel coverage metric is proposed.
  • An RMSE (Root Mean Square Error) value per class is used to measure the presence of each class per tile. For a class c, the RMSE is defined as:

RMSE_c = sqrt( (1/N) · Σ_{i=1}^{N} (t_{i,c} − p_{i,c})² )

  • in which t and p are the sets of true and predicted values coming out of the softsign function.
  • C is the set of classes {clean, transparent, semitransparent, opaque}.
  • N is the total number of tiles in one image.
  • The proposed coverage-based RMSE and weighted precision were applied as the loss and evaluation metrics for the coverage output.
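As a sketch, the per-class RMSE over the tiles of one image can be computed like this (pure Python, with each tile's coverage stored as a dict; variable names are illustrative):

```python
import math

def coverage_rmse(true, pred,
                  classes=("clean", "transparent", "semitransparent", "opaque")):
    """Per-class RMSE over the N tiles of one image, following the
    definition above: t and p are true/predicted coverage values per tile."""
    n = len(true)
    rmse = {}
    for cls in classes:
        se = sum((t[cls] - p[cls]) ** 2 for t, p in zip(true, pred))
        rmse[cls] = math.sqrt(se / n)
    return rmse

# Toy example with two tiles: one fully clean, one fully opaque.
true = [
    {"clean": 1.0, "transparent": 0.0, "semitransparent": 0.0, "opaque": 0.0},
    {"clean": 0.0, "transparent": 0.0, "semitransparent": 0.0, "opaque": 1.0},
]
pred = [
    {"clean": 0.8, "transparent": 0.0, "semitransparent": 0.0, "opaque": 0.2},
    {"clean": 0.0, "transparent": 0.0, "semitransparent": 0.0, "opaque": 1.0},
]
rmse = coverage_rmse(true, pred)
```

Because the coverage values are bounded in [0, 1], the resulting RMSE is bounded as well, which is why the values in the results section can be read as "close to 0 is good".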

3. Experimental Results

3.1. RMSE

Per class RMSE of tile level soiling detection (metrics rounded off) across camera views
  • The RMSE values per class for each camera view and overall are shown above.
  • The achieved RMSE values for all camera views are fairly reasonable across all classes as the errors are bounded and quite close to 0.
  • These values are more meaningful than classification-based metrics.

3.2. Tile Level Classification

Summary of results of tile-level soiling classification
  • The camera-view-based confusion matrices for all soiling classes, with and without normalization, are shown above.
  • The confusion is higher between transparent and semitransparent classes.
  • It is only a subtle change in visibility that changes a tile from semitransparent to transparent.
  • Furthermore, annotators are not always consistent across images as even humans struggle to annotate this kind of data.

3.3. Data Augmentation

Color codes: blue = trained with data augmentation, green = trained without; performance on the validation dataset; x-axis is epochs. From left to right: (a) RMSE; (b) weighted precision loss; (c) accuracy for class clean; (d) accuracy for class transparent; (e) accuracy for class semitransparent; (f) accuracy for class opaque
  • Some common augmentation techniques are applied, such as horizontal flips, contrast, brightness, and color (hue and saturation) modifications and adding Gaussian noise.
  • According to the trend across the graphs, the errors are larger and the accuracies lower when augmentation is used: the model trained without augmentation outperformed the one trained with it.
  • In practice, data augmentation would be expected to help generalization across different unseen weather and soiling conditions, but here it plays a negative role.
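For illustration, the augmentations listed above can be sketched in pure Python on a small image stored as nested lists with channel values in [0, 1] (a hypothetical helper, not the authors' pipeline):

```python
import random

def augment(image, rng=None):
    """Apply a subset of the augmentations mentioned above:
    horizontal flip, brightness shift, and additive Gaussian noise.
    `image` is an H x W x 3 nested list with values in [0, 1]."""
    rng = rng or random.Random(0)

    # Horizontal flip with probability 0.5
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]

    # Global brightness shift, clamped back into [0, 1]
    delta = rng.uniform(-0.1, 0.1)
    image = [[[min(1.0, max(0.0, ch + delta)) for ch in px] for px in row]
             for row in image]

    # Additive Gaussian noise per channel, also clamped
    image = [[[min(1.0, max(0.0, ch + rng.gauss(0.0, 0.02))) for ch in px]
              for px in row]
             for row in image]
    return image

# Toy 2x2 gray image
img = [[[0.5, 0.5, 0.5] for _ in range(2)] for _ in range(2)]
out = augment(img)
```

Contrast and hue/saturation modifications would follow the same clamp-and-transform pattern; they are omitted here for brevity.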

3.4. Practical Challenges

  • Some classes have less inter-class variance, such as transparent and semitransparent, which makes annotators’ job extremely difficult. It is often a matter of discussion to judge whether specific pixels are transparent or semitransparent, especially in transition zones.
  • Cloud structures in the sky are often confused with the opaque soiling class. This commonly occurs in highway scenes, where large open areas of the sky are visible in the image and contain unstructured, blurry patterns that look very similar to soiling patterns.
  • Sun glare is another issue that makes some areas overexposed in the image leading to artifacts. These artifacts are often misclassified as opaque soiling.


