Conference proceedings: Advances in Multimedia Information Processing - PCM 2009

A Graph Based Approach to Speaker Retrieval in Talk Show Videos with Transcript-Based Supervision


Abstract

This paper proposes a graph-based strategy for retrieving frames that contain the queried speakers in talk show videos. Using the "who is speaking and when" information from the audio transcript, an initial audio-based step, which restricts the queried person to the frames in which he/she is speaking, is combined with a second step that analyzes the visual features of shots. Specifically, based on the production properties of talk show videos: (1) a shot-based graph is constructed first, and the densest sub-graph is then returned as the final result; (2) instead of a direct search (DS) for the densest part, the intra-node and inter-node connections are modeled by a frame-layer degree map, which takes into account the duration information within each shot node; (3) a graph partition strategy with no restriction on the shape or number of sub-graphs is proposed, in which shots containing the same person are more similar to each other. Experiments on one episode of the French talk show "Le Grand Echiquier" show an average improvement of more than 10% over the audio-only method and more than 7.5% over the DS method.
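
The abstract does not specify how the densest sub-graph is computed, nor the details of the frame-layer degree map or the partition strategy. Purely as an illustration of the "densest sub-graph of a shot graph" idea, the sketch below uses greedy peeling, a standard approximation that may well differ from the authors' method. The shot names, the similarity weights, and the function name `densest_subgraph` are hypothetical, and density is assumed to mean total edge weight divided by the number of nodes.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def densest_subgraph(weights: Dict[Edge, float]) -> Tuple[Set[str], float]:
    """Greedy peeling on an undirected weighted graph.

    Assumes each undirected edge appears once in `weights`.
    Repeatedly removes the node with the smallest weighted degree and
    returns the intermediate node set with the highest density
    (total edge weight / number of nodes).
    """
    # Build a symmetric adjacency map from the edge list.
    adj: Dict[str, Dict[str, float]] = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    nodes = set(adj)
    if not nodes:
        return set(), 0.0

    degree = {u: sum(adj[u].values()) for u in nodes}
    total = sum(weights.values())
    best_nodes, best_density = set(nodes), total / len(nodes)

    while len(nodes) > 1:
        # Peel the weakest node; ties are broken by name for determinism.
        u = min(nodes, key=lambda n: (degree[n], n))
        nodes.remove(u)
        total -= degree[u]  # degree[u] counts only edges to remaining nodes
        for v, w in adj[u].items():
            if v in nodes:
                degree[v] -= w
        density = total / len(nodes)
        if density > best_density:
            best_nodes, best_density = set(nodes), density

    return best_nodes, best_density


# Hypothetical visual similarities between shots in which the queried
# person is speaking according to the transcript.
shot_similarity = {
    ("s1", "s2"): 0.9,
    ("s1", "s3"): 0.8,
    ("s2", "s3"): 0.85,
    ("s1", "s4"): 0.1,
    ("s4", "s5"): 0.2,
}
selected, density = densest_subgraph(shot_similarity)
print(sorted(selected), round(density, 3))
```

In this toy graph the three mutually similar shots form the dense core, while the weakly connected shots (for example, cutaways to the audience or another guest during the person's speech) are peeled away.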
