IEEE International Conference on Multimedia and Expo

Lost in segmentation: Three approaches for speech/non-speech detection in consumer-produced videos

Abstract

Traditional speech/non-speech segmentation systems have been designed for specific acoustic conditions, such as broadcast news or meetings. However, little research has been done on consumer-produced audio. This type of media is constantly growing and has complex characteristics such as low-quality recordings, environmental noise, and overlapping sounds. This paper presents an evaluation of three approaches to speech/non-speech detection on consumer-produced audio. The approaches are state-of-the-art speech/non-speech detectors: one based on Gaussian Mixture Models (GMM), another on Support Vector Machines (SVM), and the last on Neural Networks (NN). Using the TRECVID MED 2012 database, we designed combinations of training and testing sets to help understand what speech/non-speech detection on consumer-produced media entails and how traditional approaches to this detection perform in this domain. The results revealed that, in testing, the cross-domain state-of-the-art GMM and SVM systems underperformed a one-layer NN algorithm, which had 20% higher accuracy and processed audio 5 times faster.