Review — UHCTD: A Comprehensive Dataset for Camera Tampering Detection (Camera Tampering Detection)

Camera Tampering Detection Using AlexNet, ResNet & DenseNet

Feb 7, 2021

In this story, UHCTD: A Comprehensive Dataset for Camera Tampering Detection (UHCTD), by University of Houston, is reviewed. In this paper:

A synthetic dataset with over 6.8 million annotated images is proposed.
The problem of tampering detection is formulated as a classification problem. Deep learning architectures, such as AlexNet, ResNet, and DenseNet, are used for evaluation.

This is a paper in 2019 AVSS. (Sik-Ho Tsang @ Medium)

Outline

Camera Tampering
UHCTD: University of Houston Camera Tampering Detection Dataset
Tampering Synthesis
Tampering Detection as a Classification Problem

1. Camera Tampering

There are many types of camera tampering. For example:
Covered tampering occurs when the view of the camera is blocked, e.g. by spray painting the lens or blocking it with a hand.
Defocused tampering occurs when the view of the camera is blurred, e.g. by a failure to focus or by intentionally changing the focus.
Moved tampering occurs when the viewpoint of the camera changes, e.g. due to strong wind or an intentional, malicious change in the direction of the lens.

2. UHCTD: University of Houston Camera Tampering Detection Dataset

**Left: Camera A Viewpoint, Right: Camera B Viewpoint**

The dataset is created from two outdoor surveillance cameras, Camera A and Camera B, whose viewpoints are shown above. Camera A has a resolution of 2048×1536, and Camera B has a resolution of 1280×960.
The videos are cropped to a third of their resolution during synthesis. Camera A has a framerate of 3 frames per second (fps), and Camera B has a framerate of 10 fps.
Camera tampering detection algorithms are sensitive to events that affect a large portion of pixels simultaneously. For example, illumination changes have a global effect, as do large objects passing in front of the camera. Both change a large number of pixels in the image at once. Such events can be mistaken for tampering and often generate false alarms.
The videos are captured over 6 consecutive days and include a variety of illumination changes.

Natural illumination changes: Outdoor surveillance cameras undergo regular variations.

Shadows are also indicative of regular scene changes.

**The view under rainy and overcast conditions.**

Weather related changes: Outdoor cameras undergo scene changes due to weather.

Crowded scenarios: Surveillance cameras can have objects that occupy a large portion of the view.

Scene change over extended period: The dataset consists of scenarios where there is a large deviation in the view occurring over an extended period of time. One such deviation is caused by tents being set up for extended periods of time.

3. Tampering Synthesis

3.1. Type of Tampering

Four classes of data are created (normal, covered, defocused, and moved), by applying spatial translation, spatial smoothing, and pixel copy operations to synthesize moved, defocused, and covered tampering respectively.
The center region is cropped and used as a region of interest (roi) for normal images.
The roi is translated and then cropped to simulate moved tampering.
The image is cropped first, and then a block of pixels is replaced by a random texture, to simulate covered tampering. Random textures from the Kylberg Texture Dataset are used to accomplish this.
The image is cropped first and then smoothed using a Gaussian kernel to simulate a defocused tampering.
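
The four operations above can be sketched in plain Python on a grayscale image stored as a list of rows. This is only an illustration under stated assumptions, not the authors' code: the function names, the top-left placement of the covering block, the tiny 9×12 stand-in frame, and the small 5×5 kernel in the demo are all hypothetical choices made to keep the example short.

```python
import math

def crop(img, top, left, h, w):
    """Return the h x w window of img whose top-left corner is (top, left)."""
    assert 0 <= top and top + h <= len(img), "window leaves the frame vertically"
    assert 0 <= left and left + w <= len(img[0]), "window leaves the frame horizontally"
    return [row[left:left + w] for row in img[top:top + h]]

def roi_origin(img, h, w):
    """Top-left corner of the centered h x w region of interest (roi)."""
    return (len(img) - h) // 2, (len(img[0]) - w) // 2

def normal(img, h, w):
    """Normal image: the centered roi, unchanged."""
    top, left = roi_origin(img, h, w)
    return crop(img, top, left, h, w)

def moved(img, h, w, dy, dx):
    """Moved tampering: translate the roi by (dy, dx), then crop."""
    top, left = roi_origin(img, h, w)
    return crop(img, top + dy, left + dx, h, w)

def covered(img, h, w, texture, block_h, block_w):
    """Covered tampering: crop first, then copy texture pixels over a block.

    Assumption: the block sits in the top-left corner and fits inside the roi.
    """
    out = normal(img, h, w)
    for y in range(block_h):
        for x in range(block_w):
            out[y][x] = texture[y % len(texture)][x % len(texture[0])]
    return out

def gaussian_kernel(size, sigma):
    """Normalized 1-D Gaussian weights; size is assumed to be odd."""
    half = size // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [v / total for v in weights]

def blur_line(line, kernel):
    """Convolve one row or column with the kernel, clamping at the borders."""
    half = len(kernel) // 2
    last = len(line) - 1
    return [
        sum(k * line[min(max(i + j - half, 0), last)] for j, k in enumerate(kernel))
        for i in range(len(line))
    ]

def defocused(img, h, w, size=5, sigma=1.5):
    """Defocused tampering: crop first, then apply a separable Gaussian blur."""
    kernel = gaussian_kernel(size, sigma)
    rows = [blur_line(r, kernel) for r in normal(img, h, w)]
    cols = [blur_line(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# Hypothetical stand-ins: a 9 x 12 gradient frame and a flat 2 x 2 "texture".
frame = [[(x + y) % 256 for x in range(12)] for y in range(9)]
texture = [[255, 255], [255, 255]]

print(normal(frame, 3, 4))                  # centered roi, a third of each side
print(moved(frame, 3, 4, 1, 2))             # same-size window, shifted
print(covered(frame, 3, 4, texture, 2, 2))  # top-left 2 x 2 block replaced
print(defocused(frame, 3, 4))               # smoothed roi
```

Because every variant is cut from the same frame with the same output size, the four classes differ only in the applied operation, which is what lets the dataset be labeled automatically.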

**Synthetic Data. a) Original, b) Covered, c) Defocused, and d) Moved images.**

Examples of the synthetic data are shown above.
Two parameters, extent and rate, are introduced to synthesize different synthetic effects for each class.

3.2. Extent

While inducing covered tampering, extent defines the ratio of the image area that is covered.
While inducing moved tampering, it defines the ratio of overlap between the original and moved image.
Four different values are used to induce covered and moved tampers: {0.25, 0.50, 0.75, 1.00}.
While inducing defocused tampering, extent represents the blurriness of the image. A 31×31 Gaussian kernel is used. Extent refers to the number of times the blurring operation is applied. Four values for extent are used to induce defocused tampering: {1, 11, 21, 31}.
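
To make the two ratio-based definitions of extent concrete, the sketch below measures them for a given image size. The review does not state the shape or position of the covering block, nor the direction of the shift, so the rectangular block, the axis-aligned shift, and the 480×640 image size used here are assumptions for illustration only.

```python
def covered_extent(block_h, block_w, h, w):
    """Fraction of an h x w image hidden by a block_h x block_w block."""
    return (block_h * block_w) / (h * w)

def overlap_ratio(h, w, dy, dx):
    """Fraction of an h x w view still shared after shifting it by (dy, dx)."""
    shared_h = max(0, h - abs(dy))
    shared_w = max(0, w - abs(dx))
    return (shared_h * shared_w) / (h * w)

# Hypothetical 480 x 640 image.
print(covered_extent(240, 320, 480, 640))  # 0.25: a quarter of the area is covered
print(overlap_ratio(480, 640, 0, 160))     # 0.75: a quarter-width horizontal shift
print(overlap_ratio(480, 640, 0, 640))     # 0.0: the two views no longer overlap
```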

3.3. Rate

Rate defines the time it takes for tampering to occur.
Six different durations are used to induce tampering: {0.1, 0.5, 1, 5, 30, 60} seconds.
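
Since the two cameras record at 3 fps and 10 fps, each duration corresponds to a different number of frames over which the tampering develops. The conversion below is a rough sketch: rounding to the nearest frame and using at least one frame are assumptions, and the review does not describe how the paper interpolates between the normal and the tampered frame.

```python
CAMERA_FPS = {"Camera A": 3, "Camera B": 10}
DURATIONS_S = [0.1, 0.5, 1, 5, 30, 60]

def transition_frames(duration_s, fps):
    """Frames spanned by a tampering that takes duration_s seconds (at least one)."""
    return max(1, round(duration_s * fps))

for camera, fps in CAMERA_FPS.items():
    frames = [transition_frames(d, fps) for d in DURATIONS_S]
    print(camera, frames)
```

At 3 fps, a 0.1 s tampering is shorter than one frame interval, so it is effectively instantaneous, while a 60 s tampering is spread over 180 frames.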

3.4. Dataset

Finally, the training images, training videos and testing videos are obtained as above.

4. Tampering Detection as a Classification Problem

Three questions are considered:

How well can we learn and classify images within a single camera?
Can we transfer learning from one camera to another?
Can we learn and classify images in multiple cameras simultaneously?

These questions lead to 3 types of experiments as shown above.
AlexNet, ResNet-18, ResNet-50, and DenseNet-161 are tried.

4.1. 2-Class Results {Normal, Tampered}

**Comparison of the performance under a two class assumption**

In the 2-class experiment, {Normal} is one class, while {Covered, Defocused, Moved} are merged into a single tampered class.
TPR: True Positive Rate
FPR: False Positive Rate
Acc: Accuracy
hFAR: Hourly False Alarm Rate
In Experiment 1, AlexNet performs best among the four models with an accuracy of 75%. The higher accuracy is attributed to the lower number of false positives it generates.
The order of performance for the rest of the models is ResNet-18, ResNet-50, and DenseNet-161.
DenseNet-161, with an FPR of 78%, is the least capable of detecting normal images and produces the largest number of false positives.
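
For reference, the metrics listed above can be computed from frame-level counts as sketched below. TPR, FPR, and accuracy follow their standard definitions. The hFAR function is an assumption: it counts every false-positive frame as one alarm and divides by the hours of video, which may differ from the paper's exact definition. All counts are made-up numbers.

```python
def to_two_class(label):
    """Collapse the four labels into {normal, tampered}."""
    return "normal" if label == "normal" else "tampered"

def binary_metrics(tp, fp, tn, fn):
    """Return (TPR, FPR, accuracy), treating 'tampered' as the positive class."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    return tpr, fpr, acc

def hourly_false_alarm_rate(false_alarms, n_frames, fps):
    """False alarms per hour of video (assumed definition of hFAR)."""
    hours = n_frames / (fps * 3600)
    return false_alarms / hours

# Hypothetical counts: 100 tampered frames and 100 normal frames.
tpr, fpr, acc = binary_metrics(tp=80, fp=30, tn=70, fn=20)
print(tpr, fpr, acc)

# 10800 frames at 3 fps is one hour of video.
print(hourly_false_alarm_rate(false_alarms=30, n_frames=10800, fps=3))
```

Even a moderate FPR turns into many alarms per hour, because a camera produces thousands of frames per hour and each one is classified independently.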

Interestingly, the idea that deeper models perform better may not translate to camera tampering detection.
The large values for hFAR indicate that there is much work needed towards realizing a method that can translate to real world applications.
Incorporating temporal information can assist in reducing the false alarms to a large extent.
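
One simple way to use temporal information is a majority vote over the most recent frame-level decisions. This is not the paper's method, only a hypothetical illustration of why temporal context suppresses isolated false positives; the window length of 5 is an arbitrary choice.

```python
from collections import deque

def smooth_alarms(frame_flags, window=5):
    """Flag a frame only if more than half of the last `window` frames were flagged."""
    recent = deque(maxlen=window)
    smoothed = []
    for flag in frame_flags:
        recent.append(bool(flag))
        smoothed.append(sum(recent) > len(recent) / 2)
    return smoothed

# Hypothetical per-frame classifier output: one isolated false positive
# at index 2, then a sustained tampering from index 5 onward.
flags = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0]
print(smooth_alarms(flags))
```

The isolated flag never reaches a majority and is dropped, while the sustained run raises an alarm after a short delay. The trade-off is that detection is delayed by a few frames.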

In Experiment 3, AlexNet again produces fewer false positives than the other models.
There is a sharp increase in false positives for the ResNet and DenseNet models.

4.2. 4-Class Results {Normal, Covered, Defocused, Moved}

**Comparison of performance under a four class assumption**

In 4-class experiment, {Normal, Covered, Defocused, Moved} are treated as 4 individual classes.
In Experiment 1, AlexNet is able to classify normal images with a higher accuracy compared to other models. Furthermore, it is capable of detecting covered tampering better, while producing fewer false alarms.
ResNet-18 has the highest accuracy in detecting defocused tampering.
Overall, DenseNet-161 tends to confuse normal images with defocused images, and ResNet-18/50 tend to confuse them with moved tampering.
AlexNet, while producing the fewest false positives, tends to confuse normal images with defocused tampering.
In Experiment 2, the ResNet based models outperform AlexNet and DenseNet-161.
AlexNet tends to produce a higher number of false positives than in experiment 1.
In Experiment 3, the results also indicate that the four models continue to detect covered tampering with the same accuracy as experiment 1.
ResNet-18/50 and DenseNet-161 models show a sharp degradation in the accuracy of detecting normal images.
ResNet tends to confuse normal images with moved tampering.
DenseNet tends to confuse normal images with defocused tampering.
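
Statements such as "confuses normal images with defocused tampering" are read off a 4-class confusion matrix. The sketch below shows how such a matrix and the most frequent confusion for a class could be computed; the labels and predictions are invented for illustration and are not results from the paper.

```python
CLASSES = ["normal", "covered", "defocused", "moved"]

def confusion_matrix(y_true, y_pred):
    """matrix[i][j] counts samples of true class i predicted as class j."""
    index = {name: i for i, name in enumerate(CLASSES)}
    matrix = [[0] * len(CLASSES) for _ in CLASSES]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly."""
    return {
        name: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (name, row) in enumerate(zip(CLASSES, matrix))
    }

def most_confused_with(matrix, true_class):
    """Wrong class most often predicted for true_class, or None if no errors."""
    row = matrix[CLASSES.index(true_class)]
    errors = [(count, name) for name, count in zip(CLASSES, row) if name != true_class]
    count, name = max(errors)
    return name if count > 0 else None

# Hypothetical ground truth and predictions for ten frames.
y_true = ["normal"] * 5 + ["covered"] * 2 + ["defocused"] * 2 + ["moved"]
y_pred = ["normal", "normal", "defocused", "defocused", "moved",
          "covered", "covered", "defocused", "normal", "moved"]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(per_class_accuracy(cm))
print(most_confused_with(cm, "normal"))
```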

Reference

[2019 AVSS] [UHCTD]
UHCTD: A Comprehensive Dataset for Camera Tampering Detection

Camera Tampering Detection

[UHCTD]