IEEE Transactions on Multimedia

A Generic Framework for Video Annotation via Semi-Supervised Learning



Abstract

Learning-based video annotation is essential for video analysis and understanding, and many approaches have been proposed to avoid the intensive labor cost of purely manual annotation. However, a generic framework is still lacking due to several difficulties, such as dependence on domain knowledge, insufficient training data, lack of precise localization, and inefficiency on large-scale video datasets. In this paper, we propose a novel approach based on semi-supervised learning that exploits information from the Internet to annotate interesting events in videos. Concretely, a Fast Graph-based Semi-Supervised Multiple Instance Learning (FGSSMIL) algorithm, which aims to tackle these difficulties simultaneously in a generic framework for various video domains (e.g., sports, news, and movies), is proposed to jointly explore small-scale expert-labeled videos and large-scale unlabeled videos to train the models. The expert-labeled videos are obtained from the analysis and alignment of well-structured video-related text (e.g., movie scripts, web-casting text, closed captions). The unlabeled data are obtained by querying related events from video search engines (e.g., YouTube, Google) to provide more distributional information for event modeling. Two critical issues of FGSSMIL are: 1) how to assign edge weights during graph construction, where the weight of an edge specifies the similarity between two data points; to tackle this problem, we propose a novel Multiple Instance Learning Induced Similarity (MILIS) measure that learns instance-sensitive classifiers; and 2) how to solve the algorithm efficiently for large-scale datasets through an optimization approach; to address this issue, the Concave-Convex Procedure (CCCP) and a nonnegative multiplicative updating rule are adopted. We perform extensive experiments in three popular video domains: movies, sports, and news. The results, compared with the state of the art, are promising and demonstrate the effectiveness and efficiency of our proposed approach.
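The graph-based semi-supervised formulation described in the abstract can be illustrated with a standard label-propagation sketch: build a similarity graph over labeled and unlabeled clips, then diffuse the expert labels along the graph edges. The sketch below is illustrative only and is not the paper's FGSSMIL: a plain Gaussian-kernel similarity stands in for the learned MILIS measure, and simple iterative label spreading stands in for the CCCP/multiplicative-update optimization. All function names and parameter values are assumptions for this example.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # Dense RBF similarity matrix; a stand-in for the paper's MILIS
    # measure, which instead scores similarity via learned
    # instance-sensitive classifiers.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def propagate_labels(W, Y, alpha=0.99, n_iter=200):
    # Iterative label spreading on the normalized graph S = D^-1/2 W D^-1/2:
    #   F <- alpha * S @ F + (1 - alpha) * Y
    # Rows of Y are one-hot for expert-labeled points, all-zero otherwise.
    d = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F.argmax(axis=1)

# Toy usage: six clips as 2-D feature vectors, two of them expert-labeled.
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
              [1.0, 1.1], [1.1, 1.0], [0.9, 1.0]])
Y = np.zeros((6, 2))
Y[0, 0] = 1.0  # expert-labeled "event" clip
Y[3, 1] = 1.0  # expert-labeled "non-event" clip
print(propagate_labels(gaussian_similarity(X, sigma=0.5), Y))
# The first three clips inherit label 0, the last three label 1.
```

In the paper's setting, the labeled rows would come from text-aligned expert annotations and the unlabeled rows from search-engine results; the key difference is that FGSSMIL operates at the bag (multiple-instance) level and solves its objective with CCCP rather than plain spreading.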

