IEEE Transactions on Multimedia

A Generic Framework for Video Annotation via Semi-Supervised Learning



Abstract

Learning-based video annotation is essential for video analysis and understanding, and many approaches have been proposed to avoid the intensive labor cost of purely manual annotation. However, a generic framework is still lacking due to several difficulties, such as dependence on domain knowledge, insufficient training data, lack of precise localization, and inefficiency on large-scale video datasets. In this paper, we propose a novel approach based on semi-supervised learning that exploits information from the Internet to annotate interesting events in videos. Concretely, a Fast Graph-based Semi-Supervised Multiple Instance Learning (FGSSMIL) algorithm, which aims to tackle these difficulties simultaneously in a generic framework for various video domains (e.g., sports, news, and movies), is proposed to jointly explore small-scale expert-labeled videos and large-scale unlabeled videos to train the models. The expert-labeled videos are obtained from the analysis and alignment of well-structured video-related text (e.g., movie scripts, web-casting text, closed captions). The unlabeled data are obtained by querying related events from video search engines (e.g., YouTube, Google) to provide more distributional information for event modeling. Two critical issues of FGSSMIL are: 1) how to assign edge weights during graph construction, where the weight of an edge specifies the similarity between two data points; to tackle this problem, we propose a novel Multiple Instance Learning Induced Similarity (MILIS) measure that learns instance-sensitive classifiers; and 2) how to solve the algorithm efficiently for large-scale datasets through an optimization approach; to address this issue, the Concave-Convex Procedure (CCCP) and a nonnegative multiplicative updating rule are adopted. We perform extensive experiments in three popular video domains: movies, sports, and news. The results, compared with the state of the art, are promising and demonstrate the effectiveness and efficiency of our proposed approach.
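The graph-based semi-supervised formulation described in the abstract can be illustrated with a standard label-propagation sketch: build a similarity graph over labeled and unlabeled clips, then diffuse the expert labels along the graph edges. The sketch below is illustrative only and is not the paper's FGSSMIL: a plain Gaussian-kernel similarity stands in for the learned MILIS measure, and simple iterative label spreading stands in for the CCCP/multiplicative-update optimization. All function names and parameter values are assumptions for this example.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    # Dense RBF similarity matrix; a stand-in for the paper's MILIS
    # measure, which instead scores similarity via learned
    # instance-sensitive classifiers.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def propagate_labels(W, Y, alpha=0.99, n_iter=200):
    # Iterative label spreading on the normalized graph S = D^-1/2 W D^-1/2:
    #   F <- alpha * S @ F + (1 - alpha) * Y
    # Rows of Y are one-hot for expert-labeled points, all-zero otherwise.
    d = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F.argmax(axis=1)

# Toy usage: six clips as 2-D feature vectors, two of them expert-labeled.
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
              [1.0, 1.1], [1.1, 1.0], [0.9, 1.0]])
Y = np.zeros((6, 2))
Y[0, 0] = 1.0  # expert-labeled "event" clip
Y[3, 1] = 1.0  # expert-labeled "non-event" clip
print(propagate_labels(gaussian_similarity(X, sigma=0.5), Y))
# The first three clips inherit label 0, the last three label 1.
```

In the paper's setting, the labeled rows would come from text-aligned expert annotations and the unlabeled rows from search-engine results; the key difference is that FGSSMIL operates at the bag (multiple-instance) level and solves its objective with CCCP rather than plain spreading.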

