Speech endpoint detection with non-language speech sounds for generic speech processing applications


Abstract

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples include coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech and can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, which identify speech regions in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. To be usable in all such applications, NLSS detection must be performed without language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant when the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
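The abstract does not specify the acoustic features, the HMM topology, or the decoding procedure. As a rough illustration only of how an HMM can label frames as LSS or NLSS, the sketch below runs generic Viterbi decoding over a two-state model. The function name `viterbi`, the transition probabilities, and the per-frame likelihoods are all hypothetical values chosen for the example; they are not taken from the paper.

```python
import math

STATES = ("LSS", "NLSS")


def viterbi(log_emissions, log_start, log_trans):
    """Return the most likely state sequence for a run of frames.

    log_emissions: list of dicts, state -> log p(frame | state)
    log_start:     dict, state -> log p(first state)
    log_trans:     dict, (prev, next) -> log p(next | prev)
    """
    if not log_emissions:
        return []
    # best[s] is the best log score of any path ending in state s.
    best = {s: log_start[s] + log_emissions[0][s] for s in STATES}
    backpointers = []
    for frame in log_emissions[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + log_trans[(p, s)])
            new_best[s] = best[prev] + log_trans[(prev, s)] + frame[s]
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical per-frame probabilities that a frame is LSS; a real system
# would obtain likelihoods from acoustic features, which are not given here.
p_lss = [0.8, 0.8, 0.4, 0.8, 0.05, 0.05, 0.05]
log_emissions = [
    {"LSS": math.log(p), "NLSS": math.log(1.0 - p)} for p in p_lss
]
log_start = {s: math.log(0.5) for s in STATES}
# Assumed "sticky" transitions: a state persists with probability 0.9.
log_trans = {
    (a, b): math.log(0.9 if a == b else 0.1) for a in STATES for b in STATES
}

path = viterbi(log_emissions, log_start, log_trans)
print(path)
```

With these toy numbers the third frame, which taken alone slightly favors NLSS, is decoded as LSS because the transition probabilities penalize brief state switches, while the sustained low-probability run at the end is labeled NLSS.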


Bibliographic record

Source
Conference on Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense VIII; April 15-17, 2009; Orlando, FL (US) | 2009 | pp. 73051B.1-73051B.9 | 9 pages
Conference location: Orlando, FL (US)
Authors
Matthew McClain; Brian Romanowski;
Author affiliation

21st Century Technologies, 4515 Seton Center Pkwy, Suite 320, Austin, TX, USA 78759;


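Original format: PDF
Language: English
Chinese Library Classification: Applications of radio electronics; Strategic offense and defense
Keywords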
speech processing; speech endpoint detection; conversational speech analysis; non-language speech sounds;


