A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos
2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Abstract
Thanks to advances in low-cost digital cameras and the popularity of the self-recording culture, the amount of visual data on the Internet is growing at a pace that far outstrips users' available time and patience. Thus, most uploaded videos are doomed to remain unwatched and forgotten in a computer folder or on a website. In this work, we address the problem of creating smooth fast-forward videos without losing the relevant content. We present a new adaptive frame selection formulated as a weighted minimum reconstruction problem which, combined with a smoothing frame transition method, accelerates first-person videos, emphasizing the relevant segments while avoiding visual discontinuities. The experiments show that our method fast-forwards videos retaining as much relevant information and smoothness as state-of-the-art techniques, in less time. We also present a new 80-hour multimodal (RGB-D, IMU, and GPS) dataset of first-person videos with annotations for recorder profile, frame scene, activities, interaction, and attention.
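To illustrate the general idea of frame selection by weighted minimum reconstruction, the sketch below selects representative frames whose features can sparsely reconstruct the whole video, with per-frame weights making semantically relevant frames cheaper to keep. This is a minimal illustration of the family of sparse-reconstruction formulations, not the paper's exact optimization; the function name, the ISTA solver, and the thresholds are our own assumptions.

```python
import numpy as np

def select_frames(X, weights, lam=0.5, n_iter=200):
    """Illustrative weighted sparse-reconstruction frame selection.

    X       : (d, n) feature matrix, one column per frame.
    weights : (n,) per-frame penalty weights; semantically relevant
              frames get SMALLER weights, so keeping them is cheaper.
    Solves (approximately, via proximal gradient / ISTA):
        min_W 0.5*||X - X W||_F^2 + lam * sum_i weights[i] * ||W[i,:]||_2
    Frames whose coefficient rows survive the row-wise group
    soft-threshold are returned as the selected (kept) frames.
    """
    d, n = X.shape
    W = np.zeros((n, n))
    # Step size from the Lipschitz constant of the smooth term.
    lr = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = X.T @ (X @ W - X)  # gradient of 0.5*||X - XW||_F^2
        W = W - lr * grad
        # Row-wise group soft-thresholding, scaled by per-frame weights.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        shrink = np.maximum(
            0.0, 1.0 - lr * lam * weights[:, None] / (norms + 1e-12)
        )
        W = shrink * W
    row_norms = np.linalg.norm(W, axis=1)
    return np.where(row_norms > 1e-6 * row_norms.max())[0]
```

Lowering `weights[i]` for frames containing relevant content (e.g. faces or pedestrians) biases the selection toward them, which is the role semantics plays in the weighted formulation.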
- CVF CVPR’18 Open Access
- Methodology and Visual Results
Citation
@InProceedings{Silva2018,
  title     = {A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos},
  booktitle = {2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  author    = {Silva, Michel and Ramos, Washington and Ferreira, João and Chamone, Felipe and Campos, Mario and Nascimento, Erickson R.},
  year      = {2018},
  address   = {Salt Lake City, USA},
  month     = {Jun.},
  pages     = {2383-2392},
  doi       = {10.1109/CVPR.2018.00253},
  isbn      = {978-1-5386-6420-9}
}
Baselines
We compare the proposed methodology against the following methods:
- EgoSampling – Poleg et al., Egosampling: Fast-forward and stereo for egocentric videos, CVPR 2015.
- Microsoft Hyperlapse – Joshi et al., Real-time hyperlapse creation via optimal frame selection, ACM Trans. Graph. 2015.
- Stabilized Semantic Fast-Forward (SSFF) – Silva et al., Towards semantic fast-forward and stabilized egocentric videos, EPIC@ECCV 2016.
- Multi-Importance Semantic Fast-Forward (MIFF) – Silva et al., Making a long story short: A Multi-Importance fast-forwarding egocentric videos with the emphasis on relevant objects, JVCI 2017.
Datasets
We conducted the experimental evaluation using the following datasets:
- Semantic Dataset – Silva et al., Towards Semantic Fast-Forward and Stabilized Egocentric Videos, EPIC@ECCV 2016.
- Dataset of Multimodal Semantic Egocentric Videos (DoMSEV) – Silva et al., A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos, CVPR 2018.