
Speech endpoint detection with non-language speech sounds for generic speech processing applications


Abstract

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples include coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, which identify speech regions in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. To be usable in all such applications, NLSS detection must be performed without language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant when the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require further research.
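
The abstract does not specify the acoustic features or the HMM topology, so the following is only a minimal sketch of the general idea of labelling frames as LSS or NLSS with an HMM. It assumes a single hypothetical scalar feature per frame, one Gaussian emission per state, and hand-picked parameters; the names `viterbi_labels`, `MEANS`, `VARIANCES`, and all numeric values are illustrative assumptions, not the authors' method.

```python
import math

STATES = ("LSS", "NLSS")

def gaussian_log_pdf(x, mean, var):
    """Log-density of a 1-D Gaussian; stands in for the real emission model."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_labels(frames, means, variances, log_trans, log_start):
    """Return the most likely LSS/NLSS label for each frame (Viterbi decoding)."""
    if not frames:
        return []
    n = len(STATES)
    # score[s]: best log-probability of any state path ending in state s
    score = [log_start[s] + gaussian_log_pdf(frames[0], means[s], variances[s])
             for s in range(n)]
    back = []  # back[t][s]: best predecessor of state s at frame t + 1
    for x in frames[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + gaussian_log_pdf(x, means[s], variances[s]))
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]

# Hypothetical parameters: the feature is near 0 for LSS and near 3 for NLSS.
MEANS = [0.0, 3.0]
VARIANCES = [1.0, 1.0]
# "Sticky" transitions discourage rapid switching between the two classes.
LOG_TRANS = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
LOG_START = [math.log(0.5), math.log(0.5)]

frames = [0.1, -0.2, 0.3, 2.9, 3.2, 3.1, 0.0, 0.2]  # made-up feature values
labels = viterbi_labels(frames, MEANS, VARIANCES, LOG_TRANS, LOG_START)
print(labels)
```

Because nothing in this sketch depends on phonology or a lexicon, it stays language-agnostic in the sense the abstract requires; in practice the scalar feature would be replaced by a multi-dimensional acoustic feature vector and the parameters would be estimated from labelled audio.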
