Compact CNN for indexing egocentric videos

Yair Poleg, Ariel Ephrat, Shmuel Peleg, Chetan Arora

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

74 Scopus citations

Abstract

While egocentric video is becoming increasingly popular, browsing it is very difficult. In this paper we present a compact 3D Convolutional Neural Network (CNN) architecture for long-term activity recognition in egocentric videos. Recognizing long-term activities enables us to temporally segment (index) long and unstructured egocentric videos. Existing methods for this task are based on hand-tuned features derived from visible objects, location of hands, as well as optical flow. Given a sparse optical flow volume as input, our CNN classifies the camera wearer's activity. We obtain classification accuracy of 89%, which outperforms the current state of the art by 19%. Additional evaluation is performed on an extended egocentric video dataset, classifying twice as many categories as the current state of the art. Furthermore, our CNN is able to recognize whether a video is egocentric or not with 99.2% accuracy, up by 24% from the current state of the art. To better understand what the network actually learns, we propose a novel visualization of CNN kernels as flow fields.

Original language: English
Title of host publication: 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509006410
State: Published - 23 May 2016
Event: IEEE Winter Conference on Applications of Computer Vision, WACV 2016 - Lake Placid, United States
Duration: 7 Mar 2016 to 10 Mar 2016

Publication series

Name: 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016

Conference

Conference: IEEE Winter Conference on Applications of Computer Vision, WACV 2016
Country/Territory: United States
City: Lake Placid
Period: 7/03/16 to 10/03/16

Bibliographical note

Publisher Copyright:
© 2016 IEEE.
