Review — Two-Stream ConvNet: Spatial and Temporal Networks (Video Classification)

Video Classification/Action Recognition Using AlexNet-Like Two-Stream Spatial and Temporal Networks

Outline

1. Two-Stream CNN: Network Architecture

Two-Stream CNN: Network Architecture
(a), (b): a pair of consecutive video frames; (c): a close-up of the dense optical flow; (d): horizontal component dx of the displacement vector field; (e): vertical component dy of the displacement vector field
ConvNet input derivation from the multi-frame optical flow
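
To make the input derivation concrete, here is a minimal sketch of optical flow stacking, in which the horizontal and vertical flow components of L consecutive frames are interleaved into a single input volume with 2L channels. The function name `stack_optical_flow`, the array layout (height, width, channels), and the toy sizes are assumptions made for illustration; the flow fields themselves are random placeholders rather than real optical flow.

```python
import numpy as np


def stack_optical_flow(flows_x, flows_y, tau, L):
    """Stack L consecutive flow fields, starting at frame tau, into an (H, W, 2L) volume.

    flows_x[t] and flows_y[t] are assumed to be (H, W) arrays holding the
    horizontal (dx) and vertical (dy) displacement between frames t and t+1.
    """
    if len(flows_x) != len(flows_y):
        raise ValueError("flows_x and flows_y must have the same length")
    if tau < 0 or tau + L > len(flows_x):
        raise ValueError("need L flow fields starting at index tau")

    h, w = flows_x[tau].shape
    volume = np.empty((h, w, 2 * L), dtype=np.float32)
    for k in range(L):
        # Even channels (0-based) hold dx, odd channels hold dy,
        # so each frame contributes one adjacent (dx, dy) channel pair.
        volume[:, :, 2 * k] = flows_x[tau + k]
        volume[:, :, 2 * k + 1] = flows_y[tau + k]
    return volume


# Toy usage: 12 random "flow fields" of size 4x5, stacked with L = 10.
rng = np.random.default_rng(0)
toy_dx = [rng.standard_normal((4, 5)) for _ in range(12)]
toy_dy = [rng.standard_normal((4, 5)) for _ in range(12)]
toy_volume = stack_optical_flow(toy_dx, toy_dy, tau=2, L=10)
print(toy_volume.shape)  # (4, 5, 20)
```

With L = 10, each temporal-stream input has 20 channels, in contrast to the 3 RGB channels of a single frame fed to the spatial stream.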

2. Experimental Results

Spatial Stream ConvNet accuracy on UCF-101 (split 1)
Temporal Stream ConvNet accuracy on UCF-101 (split 1)
Temporal ConvNet accuracy on HMDB-51
Two-Stream ConvNet accuracy on UCF-101 (split 1)
Mean accuracy (over three splits) on UCF-101 and HMDB-51
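
The two-stream results above come from combining the class scores of the spatial and temporal networks. One simple way to do this is to average the softmax scores of the two streams; the sketch below illustrates only that averaging variant. The function names and the three-class toy scores are assumptions for illustration, not values from the experiments.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_by_averaging(spatial_scores, temporal_scores):
    """Average the per-class softmax probabilities of the two streams."""
    if len(spatial_scores) != len(temporal_scores):
        raise ValueError("both streams must score the same set of classes")
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    return [(a + b) / 2 for a, b in zip(p_spatial, p_temporal)]


def predict_class(fused):
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


# Toy usage: the spatial stream favors class 0, the temporal stream
# favors class 1 more strongly, so the fused prediction is class 1.
fused_example = fuse_by_averaging([2.0, 0.0, 0.0], [0.0, 3.0, 0.0])
print(predict_class(fused_example))  # 1
```

Averaging needs no extra training, which makes it a convenient baseline; a learned fusion (for example, a classifier trained on the stacked scores) is a separate option not shown here.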

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.