From Image Captioning to Video Summary using Deep Recurrent Networks and Unsupervised Segmentation


Abstract

Automatic captioning systems based on recurrent neural networks have been tremendously successful at providing realistic natural language captions for complex and varied image data. We explore methods for adapting existing models trained on large image caption data sets to a similar problem, that of summarising videos using natural language descriptions and frame selection. These architectures create internal high level representations of the input image that can be used to define probability distributions and distance metrics on these distributions. Specifically, we interpret each hidden unit inside a layer of the caption model as representing the un-normalised log probability of some unknown image feature of interest for the caption generation process. We can then apply well understood statistical divergence measures to express the difference between images and create an unsupervised segmentation of video frames, classifying consecutive images of low divergence as belonging to the same context, and those of high divergence as belonging to different contexts. To provide a final summary of the video, we provide a group of selected frames and a text description accompanying them, allowing a user to perform a quick exploration of large unlabeled video databases.
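The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which layer, normalisation, divergence measure, threshold, or key-frame rule is used. The code below assumes that a softmax over one layer's hidden activations turns the un-normalised log probabilities into a distribution, that the Jensen-Shannon divergence compares consecutive frames, and that the middle frame of each segment serves as its representative. The names `softmax`, `js_divergence`, `segment_frames`, and `key_frames`, as well as the threshold value, are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalise un-normalised log probabilities into a distribution (assumed normalisation)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats: symmetric, bounded above by ln 2."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a zero numerator contribute nothing to the sum.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0.0)

    mix = [(x + y) / 2.0 for x, y in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def segment_frames(activations: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Group consecutive frame indices; open a new segment when divergence exceeds the threshold."""
    if not activations:
        return []
    dists = [softmax(a) for a in activations]
    segments = [[0]]
    for i in range(1, len(dists)):
        if js_divergence(dists[i - 1], dists[i]) > threshold:
            segments.append([i])      # high divergence: different context
        else:
            segments[-1].append(i)    # low divergence: same context
    return segments


def key_frames(segments: Sequence[Sequence[int]]) -> List[int]:
    """Pick the middle frame of each segment as its representative (assumed rule)."""
    return [seg[len(seg) // 2] for seg in segments]


# Toy hidden activations for four frames: two from one context, two from another.
toy = [
    [5.0, 0.0, 0.0],
    [4.8, 0.1, 0.0],
    [0.0, 0.0, 5.0],
    [0.1, 0.0, 4.9],
]
segments = segment_frames(toy, threshold=0.1)
print(segments)              # [[0, 1], [2, 3]]
print(key_frames(segments))  # [1, 3]
```

In a full pipeline, each selected frame would then be passed to the caption model to produce the accompanying text description. The Jensen-Shannon divergence is chosen here only because it is symmetric and bounded, which makes a fixed threshold easier to reason about.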


Bibliographic details

Source
International Conference on Machine Vision | 2017 | 106960P.1-106960P.8 | 8 pages
Authors
Bogdan-Andrei Morosanu; Camelia Lemnaru
Original format: PDF
Keywords
Image captioning; video summarization; domain adaptation; divergence; clustering


Similar literature

1. VideoWhisper: Toward Discriminative Unsupervised Video Feature Learning With Attention-Based Recurrent Neural Networks [J]. Na Zhao, Hanwang Zhang, Richang Hong. IEEE Transactions on Multimedia, 2017, issue 9.
2. Unsupervised learning from videos using temporal coherency deep networks [J]. Redondo-Cabrera Carolina, Lopez-Sastre Roberto. Computer Vision and Image Understanding, 2019, issue FEBa.
3. From Image Captioning to Video Summary using Deep Recurrent Networks and Unsupervised Segmentation [C]. Bogdan-Andrei Morosanu, Camelia Lemnaru. International Conference on Machine Vision, 2018.
4. Automatic Video Captioning using Deep Neural Network [D]. Nguyen, Thang Huy. 2017.
5. Unsupervised Cerebrovascular Segmentation of TOF-MRA Images Based on Deep Neural Network and Hidden Markov Random Field Model [O]. Shengyu Fan, Yueyan Bian, Hao Chen. 2019.
6. Unsupervised Cerebrovascular Segmentation of TOF-MRA Images Based on Deep Neural Network and Hidden Markov Random Field Model [O]. Shengyu Fan, Yueyan Bian, Hao Chen. 2020.
