Brief Review — Log RGB Images Provide Invariance to Intensity and Color Balance Variation for Convolutional Networks

Using Linear RGB or Log RGB to Improve Accuracy

Sik-Ho Tsang
4 min read · Nov 17, 2024

Log RGB Images Provide Invariance to Intensity and Color Balance Variation for Convolutional Networks (Log RGB), by Northeastern University
2023 BMVC (Sik-Ho Tsang @ Medium)


  • Typical signal processing pipelines, conversion to sRGB, and JPEG compression break the physical rules that govern image formation.
  • Using linear or log RGB images instead is found to preserve these rules and to increase robustness to certain types of visual variation.
  • The authors later extended the idea in a paper published at CVPR 2024.

Outline

  1. Preliminaries
  2. Linear RGB or Log RGB
  3. Results

1. Preliminaries

  • A common model for body reflection in images is the multiplicative model I = LR, where I is the captured image value, L is the direct illuminant, and R is the body reflection.
  • The BIDR model adds an ambient illumination term A, and a direct illuminant modifier g that represents both geometric shading and shadows that modify the strength of L:
  • Taking the log of the refactored equation gives two terms in log space:
  • The first term is a constant for a single material. The second term varies according to the strength of the direct illuminant.
  • The result is an approximate line segment, or cylinder, in log space that represents the range of body reflection values for a single material.
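
The equations these bullets refer to can be written out as below. This is a reconstruction from the definitions given here (I, L, R, A, g), assuming every product is taken per color channel; the exact notation in the paper may differ.

```latex
\begin{align}
I &= R\,(gL + A) \\
  &= RA\left(\frac{gL}{A} + 1\right) \\
\log I &= \log(RA) + \log\!\left(\frac{gL}{A} + 1\right)
\end{align}
```

With this form, log(RA) is the term that stays fixed for a single material, and the second term moves with g, the strength of the direct illuminant.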

Conversion to sRGB, contrast enhancement, and JPEG compression all conspire to eliminate the linearity of the data and break the structure of material appearance in log space.
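
A small numerical sketch of this point, using NumPy and the standard sRGB transfer function. The pixel values and the gain `k` are hypothetical, chosen only for illustration; this is not the authors' experiment.

```python
import numpy as np

def srgb_encode(linear):
    """Standard sRGB transfer function for linear values in [0, 1]."""
    linear = np.asarray(linear, dtype=np.float64)
    return np.where(
        linear <= 0.0031308,
        12.92 * linear,
        1.055 * np.power(linear, 1.0 / 2.4) - 0.055,
    )

# Hypothetical linear intensities of one material under different shading.
linear = np.array([0.02, 0.1, 0.4])
k = 2.0  # hypothetical gain: the illuminant doubles in strength

# On linear data, a gain is the same additive shift in log space for every pixel.
log_shift_linear = np.log(k * linear) - np.log(linear)

# After sRGB encoding, the shift in log space depends on the pixel value.
log_shift_srgb = np.log(srgb_encode(k * linear)) - np.log(srgb_encode(linear))

print("log shift on linear data:", log_shift_linear)
print("log shift on sRGB data:  ", log_shift_srgb)
```

On the linear values, every entry of the shift equals log(k); on the sRGB-encoded values, the entries differ from one another, so the straight-line structure in log space is lost.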

2. Linear RGB or Log RGB

2.1. Dataset

  • To see if using linear RGB or log RGB is better than using JPEG, the authors captured 561 images that contain a Swedish Fish® candy box and 557 images taken in similar locations without the box.
  • 100 images are used for the test set, evenly split.
  • The RAW data was read and processed with the rawPy library [41] using the default deBayering algorithm.
  • The linear data was resized using the OpenCV resize function with the INTER_AREA flag so the minimum spatial dimension was 64 pixels and then saved as 16-bit TIFF files. The log of the linear data was generated from the resized linear data and saved as 32-bit EXR files.
  • From the original data, three variations of the original test set are created: (A) random intensity variation, (B) random color balance, and (C) both random intensity and color balance. The color balance is applied as a diagonal matrix on linear RGB.
  • The JPEG sRGB and linear data is normalized to the range [0, 1] by dividing by the max value for the data: 255 for JPEG and 65535 for the linear data.
  • The log data is not normalized or shifted and is in the range [0, 11.1].
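
The test-set variations and the scaling of the data can be sketched with NumPy as below. The gain range (0.5 to 1.0), the clamp at 1 before the log (which would give the stated lower bound of 0), and all function names are assumptions for illustration; the post does not give these details.

```python
import numpy as np

rng = np.random.default_rng(0)

def vary_intensity(linear, rng, low=0.5, high=1.0):
    """(A) Scale all channels by one random gain (the range is an assumption)."""
    return linear * rng.uniform(low, high)

def vary_color_balance(linear, rng, low=0.5, high=1.0):
    """(B) Apply a random diagonal matrix to linear RGB (the range is an assumption)."""
    balance = np.diag(rng.uniform(low, high, size=3))
    return linear @ balance  # (H, W, 3) @ (3, 3) scales each channel separately

def normalize_linear(linear16):
    """Map 16-bit linear data to [0, 1] by dividing by 65535."""
    return linear16.astype(np.float32) / 65535.0

def to_log(linear16):
    """Natural log of 16-bit linear data, clamped at 1 so the result is >= 0 (assumption)."""
    return np.log(np.maximum(linear16.astype(np.float32), 1.0))

# Hypothetical 64x64 16-bit linear RGB image standing in for a resized RAW frame.
image = rng.integers(0, 65536, size=(64, 64, 3)).astype(np.float64)

varied_a = vary_intensity(image, rng)
varied_b = vary_color_balance(image, rng)
varied_c = vary_color_balance(vary_intensity(image, rng), rng)  # (C) both

linear_input = normalize_linear(varied_c)
log_input = to_log(varied_c)
print(linear_input.min(), linear_input.max())
print(log_input.min(), log_input.max())
```

Since log(65535) is about 11.09, the log data produced this way stays inside the [0, 11.1] range quoted above.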

2.2. Model

  • A small CNN structure is used for the detection task, similar to LeNet.
  • The network, built in PyTorch, has three convolution layers with 5×5 filters, stride 1, valid convolution, with 16, 32, and 32 channels, respectively. Each convolution layer is followed by a 2×2 max pooling layer and ReLU activation.
  • The final pooling layer is 4×4 spatially with 32 channels and is fully connected to a 64 node linear layer, followed by the output layer with two nodes.
  • A Dropout layer with p = 0.7 sits after the final pooling layer.
  • The network was trained with negative log likelihood as the loss and Adam as the optimizer.
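
The description above can be sketched in PyTorch roughly as follows. This assumes a 3-channel 64×64 input (which yields the stated 4×4 final feature map), a ReLU after the 64-node layer, a log-softmax output to pair with the negative log likelihood loss, and Adam's default learning rate; none of these is stated in the post, and the class name `SmallCNN` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """LeNet-like detector sketched from the description; not the authors' code."""

    def __init__(self, in_channels=3, num_classes=2):
        super().__init__()
        # 5x5 filters, stride 1, no padding ("valid" convolution).
        self.conv1 = nn.Conv2d(in_channels, 16, kernel_size=5)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=5)
        self.pool = nn.MaxPool2d(2)
        self.dropout = nn.Dropout(p=0.7)
        self.fc1 = nn.Linear(32 * 4 * 4, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        x = F.relu(self.pool(self.conv1(x)))  # 64 -> 60 -> 30
        x = F.relu(self.pool(self.conv2(x)))  # 30 -> 26 -> 13
        x = F.relu(self.pool(self.conv3(x)))  # 13 -> 9 -> 4
        x = self.dropout(torch.flatten(x, 1))  # dropout after the final pooling layer
        x = F.relu(self.fc1(x))  # the activation here is an assumption
        return F.log_softmax(self.fc2(x), dim=1)  # log-probabilities for NLLLoss

model = SmallCNN()
optimizer = torch.optim.Adam(model.parameters())  # default learning rate (assumption)
criterion = nn.NLLLoss()

# One training step on a hypothetical random batch.
batch = torch.rand(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(batch), labels)
loss.backward()
optimizer.step()
```

The same module can be fed JPEG, linear, or log data, since only the value range of the input changes, not its shape.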

3. Results

Accuracy Performance
  • The JPEG network showed similar performance on the validation set and the unmodified data in experiment 1, but did not generalize as well in experiments 2 or 3.

The network trained on log data demonstrates invariance to color balance and intensity even when trained only on the original data.

  • The linear network showed a drop in performance on the modified test sets when trained only on the original data. It demonstrated more sensitivity to color balance than to intensity variation in all three experiments, but it was able to perform more consistently when trained on the modified data.
  • The linear network had the highest validation accuracy for all three experiments, but worse generalization to the unmodified test set than the log network, and the losses indicate the network was exhibiting more uncertainty despite the good accuracy.

It can be concluded that using log data provides invariance to color balance and intensity variation with no additional training.
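
One way to see why this can happen: an intensity change or a diagonal color balance multiplies each linear channel by a constant, which becomes a per-channel additive offset after the log, and a filter whose weights sum to zero within each channel ignores a constant offset. The NumPy sketch below checks this numerically. The patch, the gains, and the filter are hypothetical, and this is an illustration rather than the authors' analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5x5 linear RGB patch, strictly positive so the log is finite.
patch = rng.uniform(0.05, 1.0, size=(5, 5, 3))

# Hypothetical per-channel gains: a color balance combined with an intensity change.
gains = np.array([0.6, 0.9, 0.75])
patch_varied = patch * gains

# Hypothetical 5x5 filter, made zero-sum within each channel.
weights = rng.normal(size=(5, 5, 3))
weights -= weights.mean(axis=(0, 1), keepdims=True)

def response(image, weights):
    """Filter response at one location: the sum of elementwise products."""
    return float(np.sum(image * weights))

log_before = response(np.log(patch), weights)
log_after = response(np.log(patch_varied), weights)
lin_before = response(patch, weights)
lin_after = response(patch_varied, weights)

print("log input:   ", log_before, log_after)
print("linear input:", lin_before, lin_after)
```

The response on the log input is unchanged by the gains, while the response on the linear input shifts with them.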
