Modeling audio directional statistics using a probabilistic spatial dictionary for speaker diarization in real meetings

Abstract

Speaker diarization is the task of estimating “who spoke when” in a meeting. Accurate diarization of real meetings requires dealing with noise, speaker overlap, reverberation, and similar degradations. In this work, we propose to model the directional statistics of spatial clusters via a dictionary of probabilistic models. The dictionary is trained using spatial features of possible source locations. Observed mixtures of multiple source signals are statistically represented as a weighted sum of the trained models, where each weight defines the activity of a source associated with a spatial location or cluster. To detect the active clusters and perform speaker diarization, the weights are estimated by applying Bayes' rule. Furthermore, a Laplace distribution is proposed to model the background noise. The proposed method was evaluated on real meetings and provided high performance compared with a baseline method.
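
The weighted-sum idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a one-dimensional direction feature (an angle in degrees), uses Gaussian densities as stand-ins for the trained dictionary entries, and estimates the weights with standard EM-style updates in which Bayes' rule gives the posterior of each component for every observation. The Laplace background component follows the abstract, but its parameters, the candidate directions, the activity threshold, and all function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (assumed stand-in for a trained dictionary entry)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def laplace_pdf(x, loc, scale):
    """Density of a 1-D Laplace distribution (background-noise component)."""
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)


def estimate_weights(observations, component_pdfs, n_iter=50):
    """Estimate mixture weights for fixed components with EM-style updates."""
    k = len(component_pdfs)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        totals = [0.0] * k
        for x in observations:
            # Bayes' rule: posterior of each component given observation x.
            joint = [w * pdf(x) for w, pdf in zip(weights, component_pdfs)]
            norm = sum(joint)
            if norm > 0.0:
                for j in range(k):
                    totals[j] += joint[j] / norm
        total = sum(totals)
        if total == 0.0:
            break  # no observation is explained by any component
        # New weight = average posterior mass assigned to each component.
        weights = [t / total for t in totals]
    return weights


# Hypothetical dictionary: three candidate directions plus a noise component.
directions = [-60.0, 0.0, 60.0]
components = [lambda x, m=m: gaussian_pdf(x, m, 5.0) for m in directions]
components.append(lambda x: laplace_pdf(x, 0.0, 60.0))

# Synthetic observations: two active sources and some diffuse noise.
rng = random.Random(0)
observations = (
    [rng.gauss(-60.0, 5.0) for _ in range(200)]
    + [rng.gauss(60.0, 5.0) for _ in range(100)]
    + [rng.uniform(-90.0, 90.0) for _ in range(50)]
)

weights = estimate_weights(observations, components)
# A direction counts as active when its weight exceeds an assumed threshold.
active = [j for j in range(len(directions)) if weights[j] > 0.1]
print(weights, active)
```

In a diarization setting, such weights would presumably be estimated separately for each time frame, so that the set of active directions traces who is speaking over time; the sketch above handles a single block of observations only.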

Bibliographic details

Source
2016 IEEE International Workshop on Acoustic Signal Enhancement | 2016 | pp. 1-5 | 5 pages
Conference location: Xi'an, China
Authors
Mahmoud Fakhry; Nobutaka Ito; Shoko Araki; Tomohiro Nakatani;
Author affiliation
NTT Communication Science Laboratories, NTT Corporation, 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan (listed for all four authors)
Original format: PDF
Language: English
Keywords
Dictionaries; Estimation; Time-frequency analysis; Training; Noise measurement; Speech; Testing;

Similar literature

1. Development of a Speaker Diarization System for Speaker Tracking in Audio Broadcast News: a Case Study [J]. Mihelic France, Vesnicer Bostjan, Zibert Janez. Journal of Computing and Information Technology, 2008, No. 3
2. Development Of A Speaker Diarization System For Speaker Tracking In Audio Broadcast News: A Case Study [J]. Janez Zibert, Bostjan Vesnicer, France Mihelic. Journal of Computing and Information Technology, 2008, No. 3
3. Probabilistic Speaker Diarization With Bag-of-Words Representations of Speaker Angle Information [J]. Ishiguro K., Yamada T., Araki S. IEEE Transactions on Audio, Speech, and Language Processing, 2012, No. 2
4. Modeling audio directional statistics using a probabilistic spatial dictionary for speaker diarization in real meetings [C]. Mahmoud Fakhry, Nobutaka Ito, Shoko Araki. IEEE International Workshop on Acoustic Signal Enhancement, 2016
5. Use of speaker location features in meeting diarization [D]. Otterson, Scott. 2008
6. Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model [O]. Rehan Ahmad, Syed Zubair, Hani Alquhayz. 2019
7. Speaker Diarization of Meetings Based on Speaker Role N-gram Models [O]. Fabio Valente, Deepu Vijayasenan, Petr Motlicek. 2015
8. Speaker Indexing in Large Audio Databases Using Anchor Models [R]. Sturim, D. E., Reynolds, D. A., Singer, E. 2001
