Review — Desoiling Dataset: Restoring Soiled Areas on Automotive Fisheye Cameras (Desoiling)

Desoiling Dataset is Proposed; CycleGAN for Desoiling

Sik-Ho Tsang
Jul 25, 2021
Left: soiled camera lens mounted to the car body; Middle: the image quality of the soiled camera from the previous image; Right: an example of an image soiled by heavy rain.

In this story, Desoiling Dataset: Restoring Soiled Areas on Automotive Fisheye Cameras, (Desoiling Dataset), by Valeo, is reviewed.

  • Surround-view cameras can get soiled easily. When cameras get soiled, the degradation of performance is usually more dramatic compared to other sensors.

In this paper:

  • First, a Desoiling Dataset is constructed. It contains 40+ video sequences, each approximately 1 minute long, with paired clean and soiled images.
  • Then, CycleGAN is used for desoiling.

This is a paper in 2019 ICCVW. (Sik-Ho Tsang @ Medium)


  1. Motivation
  2. Desoiling Dataset
  3. CycleGAN for Desoiling
  4. Experimental Results

1. Motivation

  • Surround-view cameras are becoming the de facto standard for autonomous parking tasks, such as fishbone parking or detecting a free parking spot.
  • Surround-view cameras are directly exposed to the environment, which can sometimes be very harsh. In certain conditions, e.g. heavy rain, snow or off-road driving, they can get soiled quite easily. When that happens, performance usually degrades dramatically.

2. Desoiling Dataset

Left: Camera mount with one specific soiling setup. Right: Corresponding imagery from this particular soiling setup
  • The main goal of the proposed dataset is the restoration of soiled images.

2.1. Track Setup

  • The dataset consists of 40+ video captures. Each is approximately 1 minute long and contains low-speed maneuvering of the car in close proximity to a parking place.
  • The captures were obtained in 3 recording sessions, each conducted on a different day with slightly different weather conditions.
  • The data were collected on a small test track at the authors' facility. The test track speed limit is 20 km/h, and the data were collected within this speed limit.
  • Each capture consists of a short stay at a starting spot, then a short drive around the test track, then reverse parking between parked cars, and finally a short drive through the test track back to the original position.
  • The driving scenario covers typical classes used for semantic segmentation in autonomous driving, such as building, other vehicles, ground line markings, foliage, and sparsely also pedestrians.

2.2. Camera Setup

  • Each capture consists of image data from a setup of 4 cameras mounted in a row.
  • One camera is always kept clean, while the other 3 cameras are manually soiled.
  • Different types of soiling are used (e.g., ceramic mud of different consistency, ISO mud, muddy water, water, or foam formed by a cleaning agent).
  • The soiling is applied either with a toothbrush, which is used to randomly spatter the camera hood, or with an aerosol spray, which produces water drops of different sizes.

With this setup, it is possible to use both the pairing (a clean and a soiled image with a small shift in camera position) and temporal information (consecutive frames from the video streams). This is beneficial not only for image restoration, but also for soiling detection and other related tasks.
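To make the two kinds of information concrete, here is a minimal sketch of how clean/soiled pairs and short temporal windows could be assembled. It assumes the cameras are synchronized so that frames can be matched by frame index; the paper summary above does not state this, and the file names and the helper names `pair_frames` and `temporal_windows` are hypothetical.

```python
from typing import Dict, List, Tuple


def pair_frames(clean: Dict[int, str], soiled: Dict[int, str]) -> List[Tuple[str, str]]:
    """Pair clean and soiled frames that share a frame index (assumed synchronized)."""
    shared = sorted(set(clean) & set(soiled))
    return [(clean[i], soiled[i]) for i in shared]


def temporal_windows(frames: List[str], size: int) -> List[List[str]]:
    """Return sliding windows of `size` consecutive frames from one video stream."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [frames[i:i + size] for i in range(len(frames) - size + 1)]


# Hypothetical file names, keyed by frame index.
clean_frames = {0: "clean_000.png", 1: "clean_001.png", 2: "clean_002.png"}
soiled_frames = {1: "soiled_001.png", 2: "soiled_002.png", 3: "soiled_003.png"}

pairs = pair_frames(clean_frames, soiled_frames)
windows = temporal_windows(["f0", "f1", "f2", "f3"], size=3)
```

Only frame indices present in both streams produce a pair, so dropped frames on either camera are skipped rather than mismatched.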

2.3. Dataset Path

Dataset in GoogleDrive Provided by Valeo
The Folder is Empty
  • (When I tried to download the dataset on 25/7/2021, it did not seem to be open to the public yet. Maybe releasing the dataset involves some tedious work, such as providing the segmentation ground truth; I don't know. Anyway, I hope the dataset will become available in the future.)

3. CycleGAN for Desoiling

CycleGAN for Desoiling
  • Since the shift in the physical position of cameras introduces non-affine perturbations of the images, CycleGAN is used to deal with the non-aligned data.
  • It consists of a pair of generators and a pair of discriminators. For soiling restoration, the authors are interested in only one generator, which takes soiled data as its input and produces "de-soiled" (clean) images as its output.
  • 17,828 images (both clean and soiled) are sampled and split into a training set (8,913 images), a validation set (4,457 images), and a testing set (4,458 images).
  • The generator that takes soiled images as input and produces clean images as output already works quite reasonably.
  • The other generator is not as convincing. It is able to introduce only "water"-like soiling into the clean images.
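The reason a pair of generators is needed can be illustrated with the cycle-consistency term that CycleGAN adds to its adversarial losses: translating an image to the other domain and back should reproduce the original. The sketch below is a toy version on flattened pixel lists, assuming a mean L1 distance; the two generators are hypothetical stand-in functions, not the networks trained in the paper.

```python
from typing import Callable, List

Image = List[float]  # flattened pixel intensities, assumed to lie in [0, 1]
Generator = Callable[[Image], Image]


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two images of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(soiled: Image, clean: Image,
                           g_desoil: Generator, g_soil: Generator) -> float:
    """L1 cycle loss: soiled -> clean -> soiled plus clean -> soiled -> clean."""
    forward = l1(g_soil(g_desoil(soiled)), soiled)
    backward = l1(g_desoil(g_soil(clean)), clean)
    return forward + backward


# Toy stand-ins for the two generators (hypothetical, for illustration only).
def toy_desoil(img: Image) -> Image:
    return [min(1.0, p + 0.25) for p in img]


def toy_soil(img: Image) -> Image:
    return [max(0.0, p - 0.25) for p in img]


loss = cycle_consistency_loss([0.25, 0.5], [0.5, 0.75], toy_desoil, toy_soil)
```

Because the loss only compares each image with its own round trip, it does not need pixel-aligned clean/soiled pairs, which is why this family of models suits the shifted camera positions described above.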

4. Experimental Results

4.1. Qualitative Results

Left: Input, Right: Output
  • The qualitative results shown above indicate reasonable restoration.

4.2. Quantitative Results

Comparison of accuracy metrics on soiled data vs desoiled data
  • Two metrics are used for evaluation: the Structural Similarity Index (SSIM), a commonly used image similarity metric, and the mean Intersection over Union (mIoU), which is commonly used to express semantic segmentation accuracy.

4.2.1. SSIM

  • For SSIM, the soiled and de-soiled images are each compared against the clean image, which serves as the ground truth. We observe an improvement of 6% without any tuning of the algorithm.
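The comparison above can be sketched as follows. This is a simplified single-window ("global") SSIM computed over the whole image, using the usual stabilizing constants (0.01 L)^2 and (0.03 L)^2; standard SSIM averages the same expression over local windows, and the exact configuration used in the paper is not given here. The pixel values are made up for illustration.

```python
from typing import List


def global_ssim(x: List[float], y: List[float], dynamic_range: float = 1.0) -> float:
    """Single-window SSIM over two flattened grayscale images of equal size."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Made-up pixel values: the clean image is the reference for both comparisons.
clean = [0.2, 0.4, 0.6, 0.8]
soiled = [0.5, 0.5, 0.5, 0.5]        # structure washed out by soiling
desoiled = [0.25, 0.4, 0.6, 0.75]    # structure mostly restored

ssim_soiled = global_ssim(soiled, clean)
ssim_desoiled = global_ssim(desoiled, clean)
```

A restoration is counted as helpful when `ssim_desoiled` exceeds `ssim_soiled` for the same clean reference.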

4.2.2. IoU

  • The semantic segmentation network uses an encoder-decoder architecture with a ResNet-50 encoder and an FCN-8 decoder. The network is pre-trained on ImageNet and then trained on the authors' internal fisheye dataset.
  • Because these images have no segmentation ground truth, the same network is run on the clean images, and its output is used as the ground truth.
  • There is an improvement of 5% for the road class and 3% for the lanes and curb classes.
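The evaluation protocol above can be sketched as a per-class IoU between the segmentation of a soiled (or de-soiled) image and the segmentation of the corresponding clean image, which acts as pseudo ground truth. The label maps below are tiny made-up examples on flattened pixel lists, and the class ids are hypothetical.

```python
from typing import List, Optional


def class_iou(pred: List[int], target: List[int], cls: int) -> Optional[float]:
    """IoU for one class; None if the class appears in neither label map."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else None


def mean_iou(pred: List[int], target: List[int], classes: List[int]) -> float:
    """Mean IoU over the classes that appear in at least one label map."""
    scores = [class_iou(pred, target, c) for c in classes]
    present = [s for s in scores if s is not None]
    return sum(present) / len(present)


# Pseudo ground truth: the network's prediction on the clean image.
pseudo_gt = [0, 0, 1, 1, 2, 2]
pred_soiled = [0, 1, 1, 1, 2, 0]
pred_desoiled = [0, 0, 1, 1, 2, 0]

miou_soiled = mean_iou(pred_soiled, pseudo_gt, classes=[0, 1, 2])
miou_desoiled = mean_iou(pred_desoiled, pseudo_gt, classes=[0, 1, 2])
```

Since the reference is itself a network prediction, these scores measure agreement with the clean-image output rather than accuracy against human labels.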

4.3. Potential Extension to Utilize Temporal Information

Qualitative results of restored images using Video Inpainting
  • The first column shows the masked images, which simulate soiled camera frames. The second column shows the reconstruction results, and the third column shows the ground truth for comparison.
  • When an area in the Field of View (FOV) is blocked by a soiled part of the camera, it is likely to become visible as the vehicle moves, because it will then be captured by an unsoiled part.
  • In this case, a sequence of temporal images can be used to recover the parts hidden by soiling and therefore reconstruct the whole scene.
  • This reconstruction problem is formulated as an in-painting problem.
  • “Free-form video inpainting with 3d gated convolution and temporal patchgan” [1] is used as the video inpainting approach to generate the above preliminary results.
  • The above scene is being restored based on temporal neighbors where the car in the middle is completely or partially masked out.
  • The results show the benefit of utilizing the time information.
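The intuition behind using temporal neighbors can be shown with a deliberately naive sketch: each hidden pixel is copied from the nearest frame in time where that pixel is visible. This is not the learned video inpainting network cited above; it is a toy stand-in that assumes the frames are already aligned (motion-compensated), which real footage from a moving vehicle is not. All names and values are hypothetical.

```python
from typing import List, Optional

Frame = List[int]   # flattened pixel values
Mask = List[bool]   # True where the pixel is hidden by soiling


def nearest_visible(frames: List[Frame], masks: List[Mask], t: int, i: int) -> Optional[int]:
    """Value of pixel i from the temporally closest frame where it is visible."""
    for distance in range(1, len(frames)):
        for s in (t - distance, t + distance):
            if 0 <= s < len(frames) and not masks[s][i]:
                return frames[s][i]
    return None  # hidden in every frame: nothing to copy from


def temporal_fill(frames: List[Frame], masks: List[Mask], t: int) -> List[Optional[int]]:
    """Fill the hidden pixels of frame t from its temporal neighbors."""
    return [
        nearest_visible(frames, masks, t, i) if hidden else value
        for i, (value, hidden) in enumerate(zip(frames[t], masks[t]))
    ]


frames = [[1, 2, 3], [9, 9, 3], [1, 2, 9]]
masks = [[False, False, False], [True, True, False], [False, False, True]]

restored_middle = temporal_fill(frames, masks, t=1)
restored_last = temporal_fill(frames, masks, t=2)
```

A pixel that is hidden in every frame comes back as `None`, which is exactly the case where a learned inpainting model has to hallucinate content instead of copying it.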


