Unsupervised learning from videos using temporal coherency deep networks

Redondo-Cabrera Carolina; Lopez-Sastre Roberto

Abstract

In this work we address the challenging problem of unsupervised learning from videos. Existing methods utilize the spatio-temporal continuity in contiguous video frames as regularization for the learning process. Typically, this temporal coherence of close frames is used as a free form of annotation, encouraging the learned representations to exhibit small differences between these frames. But this type of approach fails to capture the dissimilarity between videos with different content, hence learning less discriminative features. We here propose two Siamese architectures for Convolutional Neural Networks, and their corresponding novel loss functions, to learn from unlabeled videos, which jointly exploit the local temporal coherence between contiguous frames, and a global discriminative margin used to separate representations of different videos. An extensive experimental evaluation is presented, where we validate the proposed models on various tasks. First, we show how the learned features can be used to discover actions and scenes in video collections. Second, we show the benefits of such an unsupervised learning from just unlabeled videos, which can be directly used as a prior for the supervised recognition tasks of actions and objects in images, where our results further show that our features can even surpass a traditional and heavily supervised pre-training plus fine-tuning strategy.
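
The abstract does not give the loss functions themselves, so the following is only a minimal sketch of the general idea it describes: a pairwise loss that pulls the embeddings of contiguous frames together (local temporal coherence) and pushes frames from different videos at least a margin apart (global discriminative margin). It uses a standard contrastive-style formulation as a stand-in, not the authors' losses. The function names, the `margin` parameter, and the toy embedding values are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def coherence_margin_loss(feat_a, feat_b, same_video, margin=1.0):
    """Contrastive-style loss for one pair of frame embeddings.

    Hypothetical stand-in, not the loss from the paper.
    same_video=True  : contiguous frames of one video, penalize distance.
    same_video=False : frames of different videos, penalize only when
                       they are closer than `margin` (hinge).
    """
    d = euclidean(feat_a, feat_b)
    if same_video:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy 2-D embeddings (made-up values, for illustration only).
frame_t = [0.0, 0.0]
frame_t1 = [0.3, 0.4]      # contiguous frame of the same video
other_video = [0.6, 0.8]   # frame taken from a different video

# Same-video pair: loss grows with the distance between the frames.
print(coherence_margin_loss(frame_t, frame_t1, same_video=True))

# Different-video pair: loss is non-zero only inside the margin.
print(coherence_margin_loss(frame_t, other_video, same_video=False, margin=2.0))
print(coherence_margin_loss(frame_t, other_video, same_video=False, margin=0.5))
```

In a Siamese setup, both embeddings would come from the same network with shared weights, and a loss of this kind would be summed over many sampled pairs. How the pairs are sampled and how the two terms are weighted is not stated in the abstract.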

Bibliographic details

Source
Computer vision and image understanding | 2019, Issue 2 | pp. 79-89 | 11 pages
Authors
Redondo-Cabrera Carolina; Lopez-Sastre Roberto
Author affiliations

Univ Alcala De Henares GRAM Alcala De Henares 28805 Spain;

Univ Alcala De Henares GRAM Alcala De Henares 28805 Spain;

Original format: PDF
Language: English
Keywords
Unsupervised learning; Action discovery; Action recognition; Object recognition; Deep learning;
