SpeedNet: Learning the Speediness in Videos

Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Michal Irani, Tali Dekel

Research output: Contribution to journalConference articlepeer-review

196 Scopus citations

Abstract

We wish to automatically predict the 'speediness' of moving objects in videos-whether they move faster, at, or slower than their 'natural' speed. The core component in our approach is SpeedNet-a novel deep network trained to detect if a video is playing at normal rate, or if it is sped up. SpeedNet is trained on a large corpus of natural videos in a self-supervised manner, without requiring any manual annotations. We show how this single, binary classification network can be used to detect arbitrary rates of speediness of objects. We demonstrate prediction results by SpeedNet on a wide range of videos containing complex natural motions, and examine the visual cues it utilizes for making those predictions. Importantly, we show that through predicting the speed of videos, the model learns a powerful and meaningful space-Time representation that goes beyond simple motion cues. We demonstrate how those learned features can boost the performance of self supervised action recognition, and can be used for video retrieval. Furthermore, we also apply SpeedNet for generating time-varying, adaptive video speedups, which can allow viewers to watch videos faster, but with less of the jittery, unnatural motions typical to videos that are sped up uniformly.

Original languageEnglish
Article number9156879
Pages (from-to)9919-9928
Number of pages10
JournalProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOIs
StatePublished - 2020
Externally publishedYes
Event2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States
Duration: 14 Jun 202019 Jun 2020

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Fingerprint

Dive into the research topics of 'SpeedNet: Learning the Speediness in Videos'. Together they form a unique fingerprint.

Cite this