Lost in segmentation: Three approaches for speech/non-speech detection in consumer-produced videos



Abstract

Traditional speech/non-speech segmentation systems have been designed for specific acoustic conditions, such as broadcast news or meetings. However, little research has been done on consumer-produced audio. This type of media is constantly growing and has complex characteristics such as low-quality recordings, environmental noise and overlapping sounds. This paper discusses an evaluation of three different approaches for speech/non-speech detection on consumer-produced audio. The approaches are state-of-the-art speech/non-speech detectors: one based on Gaussian Mixture Models (GMM), another on Support Vector Machines (SVM), and the last on Neural Networks (NN). Using the TRECVID MED 2012 database, we designed combinations of training and testing sets to aid the understanding of what speech/non-speech detection on consumer-produced media entails and how traditional approaches to this detection perform in this domain. The results revealed that, in testing, the cross-domain state-of-the-art GMM and SVM systems underperformed a one-layer NN algorithm, which had 20% higher accuracy and processed audio 5 times faster.
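To make the idea of a one-layer NN detector concrete, the following is a minimal sketch and not the authors' system. The abstract does not specify the features, network size, or training procedure, so everything here is an assumption: each audio frame is represented by a synthetic 2-D feature vector, the network is a single sigmoid unit, and the helper names (`make_frames`, `train`, `predict`) are hypothetical.

```python
import math
import random

def make_frames(n, mean, rng):
    """Draw n synthetic 2-D feature vectors around `mean` (stand-ins for real acoustic features)."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(frames, labels, epochs=50, lr=0.1):
    """Fit a single sigmoid unit with stochastic gradient descent on cross-entropy loss."""
    w = [0.0] * len(frames[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(frames, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the loss with respect to the pre-activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return 1 (speech) if the unit's output is at least 0.5, else 0 (non-speech)."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Assumed toy data: "speech" frames cluster around (2, 2), "non-speech" around (-2, -2).
rng = random.Random(0)
speech = make_frames(200, [2.0, 2.0], rng)
non_speech = make_frames(200, [-2.0, -2.0], rng)
frames = speech + non_speech
labels = [1] * 200 + [0] * 200

w, b = train(frames, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(frames, labels)) / len(frames)
print(f"training accuracy: {accuracy:.3f}")
```

A real detector would replace the synthetic vectors with features extracted from audio and would be scored on held-out recordings rather than on its own training frames.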


Bibliographic details

Source
IEEE International Conference on Multimedia and Expo, 2013, pp. 1-6 (6 pages)
Authors
Benjamin Elizalde; Gerald Friedland
Original format: PDF
Keywords
audio segmentation; GMM; speech/non-speech; SVM; user-generated content; neural networks


