Unsupervised learning from videos using temporal coherency deep networks

Redondo-Cabrera Carolina; Lopez-Sastre Roberto


Abstract

In this work we address the challenging problem of unsupervised learning from videos. Existing methods use the spatio-temporal continuity of contiguous video frames as a regularizer for the learning process. Typically, this temporal coherence of close frames serves as a free form of annotation, encouraging the learned representations to exhibit small differences between these frames. But this type of approach fails to capture the dissimilarity between videos with different content, and hence learns less discriminative features. Here we propose two Siamese architectures for Convolutional Neural Networks, and their corresponding novel loss functions, to learn from unlabeled videos. They jointly exploit the local temporal coherence between contiguous frames and a global discriminative margin used to separate representations of different videos. An extensive experimental evaluation is presented, where we validate the proposed models on various tasks. First, we show how the learned features can be used to discover actions and scenes in video collections. Second, we show the benefits of such unsupervised learning from just unlabeled videos: the learned features can be directly used as a prior for the supervised recognition of actions and objects in images, where our results further show that our features can even surpass a traditional and heavily supervised pre-training plus fine-tuning strategy.
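
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a generic contrastive-style pair loss: contiguous frames of one video are pulled together (local temporal coherence), and frames from different videos are pushed at least a margin apart (global discriminative margin). This is an illustrative assumption, not the loss functions proposed in the paper, which the abstract does not specify. The function names `euclidean` and `siamese_pair_loss`, the squared-distance form, and the toy feature vectors are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def siamese_pair_loss(feat_a, feat_b, same_video, margin=1.0):
    """Generic contrastive-style loss for one pair of frame embeddings.

    Assumption for illustration only (not the paper's formulation):
      same_video=True  -> contiguous frames of one video; penalize distance.
      same_video=False -> frames from different videos; penalize only when
                          they are closer than `margin` (hinge term).
    """
    d = euclidean(feat_a, feat_b)
    if same_video:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy embeddings (hypothetical values, chosen so distances are exact).
f_t = [0.0, 0.0]       # frame t of video A
f_t1 = [3.0, 4.0]      # frame t+1 of video A, distance 5 from f_t
f_other = [6.0, 8.0]   # a frame of video B, distance 10 from f_t

coherence_loss = siamese_pair_loss(f_t, f_t1, same_video=True)
margin_loss = siamese_pair_loss(f_t, f_other, same_video=False, margin=12.0)
print(coherence_loss, margin_loss)
```

In a Siamese setup both embeddings would come from the same network with shared weights, so minimizing the sum of the two terms over many pairs shapes a single feature extractor. A pair from different videos that is already farther apart than the margin contributes zero loss.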


Bibliographic details

Source
Computer vision and image understanding, 2019, issue 2, pp. 79-89 (11 pages)
Authors
Redondo-Cabrera Carolina; Lopez-Sastre Roberto
Author affiliations

Univ Alcala De Henares, GRAM, Alcala De Henares 28805, Spain;

Univ Alcala De Henares, GRAM, Alcala De Henares 28805, Spain;

Original format: PDF
Text language: English
Keywords
Unsupervised learning; Action discovery; Action recognition; Object recognition; Deep learning

