Brief Review — The Cityscapes Dataset for Semantic Urban Scene Understanding

Cityscapes, One of the Popular Semantic Segmentation Datasets

Dec 22, 2022

**Cityscapes Dataset** (Figure from https://www.cityscapes-dataset.com/)

The Cityscapes Dataset for Semantic Urban Scene Understanding,
Cityscapes, by Daimler AG R&D, TU Darmstadt, MPI Informatics, and TU Dresden, 2016 CVPR, Over 8600 Citations (Sik-Ho Tsang @ Medium)
Semantic Segmentation, Dataset

Cityscapes comprises a large, diverse set of stereo video sequences recorded in the streets of 50 different cities.
5000 of these images have high-quality pixel-level annotations.
20,000 additional images have coarse annotations to enable methods that leverage large volumes of weakly-labeled data.

Outline

Cityscapes Dataset
Results

1. Cityscapes Dataset

Several hundred thousand frames were acquired from a moving vehicle over a span of several months, covering spring, summer, and fall in 50 cities, primarily in Germany but also in neighboring countries. Recording in adverse weather conditions was deliberately avoided.
5000 images were manually selected from 27 cities for dense pixel-level annotation, aiming for high diversity of foreground objects, background, and overall scene layout. The annotations were done on the 20th frame of a 30-frame video snippet, and the full snippet is provided to supply context information.
For the remaining 23 cities, a single image every 20 s or 20 m of driving distance (whichever comes first) was selected for coarse annotation, yielding 20,000 images in total.
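
The "every 20 s or 20 m, whichever comes first" rule can be sketched in a few lines. This is only an illustration of the selection logic as described above, not the authors' tooling: the `Frame` record, its fields, and `select_for_coarse_annotation` are hypothetical names, and it assumes each frame carries a timestamp and a cumulative driving distance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    # All fields are hypothetical; the real recordings may store this differently.
    index: int         # position of the frame in the recording
    time_s: float      # seconds since the recording started
    odometer_m: float  # cumulative driving distance in meters


def select_for_coarse_annotation(frames: List[Frame],
                                 max_gap_s: float = 20.0,
                                 max_gap_m: float = 20.0) -> List[Frame]:
    """Pick a frame whenever 20 s or 20 m have passed since the last pick."""
    selected: List[Frame] = []
    for frame in frames:
        if not selected:
            selected.append(frame)  # always keep the first frame
            continue
        last = selected[-1]
        time_gap = frame.time_s - last.time_s
        dist_gap = frame.odometer_m - last.odometer_m
        # "Whichever comes first": either threshold triggers a selection.
        if time_gap >= max_gap_s or dist_gap >= max_gap_m:
            selected.append(frame)
    return selected


# Toy drive sampled once per second: 5 m/s for 10 s, then standing still.
drive = [Frame(i, float(i), 5.0 * min(i, 10)) for i in range(41)]
print([f.index for f in select_for_coarse_annotation(drive)])  # [0, 4, 8, 28]
```

While the toy vehicle moves, the distance threshold fires every 4 s; once it stops, only the 20 s timer produces the next pick.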

Densely annotated images are split into separate training, validation, and test sets.
Coarsely annotated images serve as additional training data only.

The above shows some statistics for each class in the dataset.

2. Results

**Quantitative results of baselines for semantic labeling**

FCN and other state-of-the-art approaches of the time, such as DPN [40], CRF-RNN [81], DeepLabv1 [9], and DilatedNet [79], are used to benchmark the dataset. Their IoU and iIoU scores are low, indicating that the dataset is challenging.
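
For readers unfamiliar with the metric, class-level IoU is TP / (TP + FP + FN), accumulated over all evaluated pixels. The sketch below is a plain-Python illustration, not the official benchmark script: the function name `per_class_iou` and the `ignore_label` value of 255 are assumptions made for the example. iIoU, which additionally weights pixels by instance size, is not shown.

```python
from typing import Dict, Sequence


def per_class_iou(pred: Sequence[int], truth: Sequence[int],
                  ignore_label: int = 255) -> Dict[int, float]:
    """Return IoU = TP / (TP + FP + FN) for every class seen in pred or truth."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of pixels")
    tp: Dict[int, int] = {}
    fp: Dict[int, int] = {}
    fn: Dict[int, int] = {}
    for p, t in zip(pred, truth):
        if t == ignore_label:
            continue  # assumed convention: unlabeled pixels are not scored
        if p == t:
            tp[t] = tp.get(t, 0) + 1
        else:
            fp[p] = fp.get(p, 0) + 1  # class p was predicted where it is absent
            fn[t] = fn.get(t, 0) + 1  # a pixel of class t was missed
    classes = sorted(set(tp) | set(fp) | set(fn))
    return {c: tp.get(c, 0) / (tp.get(c, 0) + fp.get(c, 0) + fn.get(c, 0))
            for c in classes}


# Flattened toy label maps: class 0 and class 1, plus one ignored pixel.
truth = [0, 0, 1, 1, 1, 255]
pred = [0, 1, 1, 1, 0, 0]
iou = per_class_iou(pred, truth)
print(iou)
print("mean IoU:", sum(iou.values()) / len(iou))
```

In the toy example, class 0 has one true positive, one false positive, and one false negative, so its IoU is 1/3; class 1 scores 2/4.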

**Quantitative results (avg. recall in percent) of half-resolution FCN-8s model trained on Cityscapes images and tested on CamVid and KITTI.**

A half-resolution FCN-8s model trained on Cityscapes images and tested on CamVid and KITTI obtains reasonable performance, which suggests that the dataset integrates well with existing ones and allows for cross-dataset research.
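
The cross-dataset table reports average recall in percent. A minimal sketch of that metric, assuming it is the per-class recall TP / (TP + FN) averaged over the classes present in the ground truth, is given below. The function name `average_recall_percent` and the `ignore_label` value are assumptions for illustration, and the label mapping between datasets used in the paper is not reproduced here.

```python
from collections import Counter
from typing import Sequence


def average_recall_percent(pred: Sequence[int], truth: Sequence[int],
                           ignore_label: int = 255) -> float:
    """Mean over ground-truth classes of TP / (TP + FN), in percent."""
    correct: Counter = Counter()  # correctly labeled pixels per class (TP)
    total: Counter = Counter()    # ground-truth pixels per class (TP + FN)
    for p, t in zip(pred, truth):
        if t == ignore_label:
            continue  # assumed convention: unlabeled pixels are not scored
        total[t] += 1
        if p == t:
            correct[t] += 1
    if not total:
        raise ValueError("no labeled pixels to evaluate")
    recalls = [correct[c] / total[c] for c in total]
    return 100.0 * sum(recalls) / len(recalls)


# Toy example: class 0 recall is 3/4, class 1 recall is 1/2.
truth = [0, 0, 0, 0, 1, 1]
pred = [0, 0, 0, 1, 1, 0]
print(average_recall_percent(pred, truth))  # 62.5
```

Unlike IoU, recall does not penalize false positives, so the two sets of numbers are not directly comparable.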