A Graph Based Approach to Speaker Retrieval in Talk Show Videos with Transcript-Based Supervision


Abstract

This paper proposes a graph-based strategy for retrieving frames that contain queried speakers in talk show videos. Using the "who is speaking and when" information from the audio transcript, an initial audio-based step restricts the queried person to the frames in which he or she is speaking; this is combined with a second step that analyzes the visual features of shots. Specifically, exploiting the production properties of talk show video: (1) a shot-based graph is constructed first, and its densest sub-graph is returned as the final result. Instead of a direct search (DS) for the densest part, however, (2) the intra-node and inter-node connections are modeled by a frame-layer degree map, so that the duration information within each shot node is taken into account; and (3) a graph partition strategy that places no restriction on the shape or number of sub-graphs is proposed, in which shots containing the same person are more similar to each other. Experiments on one episode of the French talk show "Le Grand Echiquier" show improvements of more than 10% over the audio-only method and more than 7.5% over the DS method on average.
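
The abstract does not spell out how the densest sub-graph is found, so the following is only a rough sketch of the direct-search idea, not the authors' method. It assumes that the transcript has already narrowed the video down to candidate shots and that pairwise visual similarities between those shots are available as edge weights. The function name `densest_subgraph`, the shot identifiers, and the similarity values are all hypothetical, and greedy peeling is a standard approximation that is swapped in here for illustration.

```python
def densest_subgraph(nodes, weights):
    """Approximate the densest sub-graph by greedy peeling.

    nodes:   iterable of shot identifiers (hypothetical names).
    weights: dict mapping (u, v) pairs to a similarity weight; the graph
             is undirected, so either key order is accepted.
    Density is the total edge weight divided by the number of nodes.
    Returns (best_node_set, best_density).
    """
    remaining = set(nodes)
    if not remaining:
        return set(), 0.0

    def w(u, v):
        # Look the edge up in either orientation; missing edges weigh 0.
        return weights.get((u, v), weights.get((v, u), 0.0))

    # Weighted degree of every node within the current node set.
    degree = {u: sum(w(u, v) for v in remaining if v != u) for u in remaining}
    total = sum(degree.values()) / 2.0  # each edge was counted twice

    best_nodes, best_density = set(remaining), total / len(remaining)
    while len(remaining) > 1:
        # Peel off the node that is least connected to the rest.
        u = min(remaining, key=lambda n: (degree[n], n))
        remaining.remove(u)
        total -= degree[u]
        for v in remaining:
            degree[v] -= w(u, v)
        density = total / len(remaining)
        if density > best_density:
            best_nodes, best_density = set(remaining), density
    return best_nodes, best_density


# Toy example (made-up numbers): s1..s3 show the speaker and look alike,
# s4 is a cutaway shot that happens to overlap the speech segment.
shots = ["s1", "s2", "s3", "s4"]
similarity = {
    ("s1", "s2"): 0.9, ("s1", "s3"): 0.9, ("s2", "s3"): 0.9,
    ("s1", "s4"): 0.1, ("s2", "s4"): 0.1, ("s3", "s4"): 0.1,
}
speaker_shots, density = densest_subgraph(shots, similarity)
print(sorted(speaker_shots), round(density, 3))
```

In this toy case the cutaway shot is peeled off first because it is only weakly connected to the others. Note that this sketch treats every shot as a single node of equal size; the frame-layer degree map and the graph partition strategy described in the abstract are not modeled here.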

Bibliographic details

Source
Advances in Multimedia Information Processing - PCM 2009 | 2009 | pp. 962-967 | 6 pages
Conference location: Bangkok (TH)
Authors
Yina Han; Guizhong Liu; Hichem Sahbi; Gerard Chollet
Author affiliations

The School of Electronic and Information Engineering, Xi'an Jiaotong University, 710049, Xi'an, China;

The School of Electronic and Information Engineering, Xi'an Jiaotong University, 710049, Xi'an, China;

CNRS LTCI UMR 5141, TELECOM-ParisTech, 75634, Paris, France;

CNRS LTCI UMR 5141, TELECOM-ParisTech, 75634, Paris, France

Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Keywords
speaker retrieval; talk show video; multi-modality; graph
