Speech repairs, intonational boundaries and discourse markers: Modeling speakers' utterances in spoken dialog.


Abstract

Interactive spoken dialog provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker's intended utterances: both segmenting a speaker's turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by speech repairs, which occur where the speaker goes back and changes (or repeats) something she just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. The two problems of segmenting the turn into utterances and resolving speech repairs are strongly intertwined with a third problem: identifying discourse markers. Lexical items that can function as discourse markers, such as "well" and "okay," are ambiguous as to whether they are introducing an utterance unit, signaling a speech repair, or are simply part of the content of an utterance, as in "that's okay." Spoken dialog systems need to address these three issues together and early on in the processing stream. In fact, just as these three issues are closely intertwined with each other, they are also intertwined with identifying the syntactic role or part-of-speech (POS) of each word and with the speech recognition problem of predicting the next word given the previous words.

In this thesis, we present a statistical language model for resolving these issues. Rather than finding the best word interpretation for an acoustic signal, we redefine the speech recognition problem so that it also identifies the POS tags, discourse markers, speech repairs and intonational phrase endings (a major cue in determining utterance units). Adding these extra elements to the speech recognition problem actually allows it to better predict the words involved, since we are able to make use of the predictions of boundary tones, discourse markers and speech repairs to better account for what word will occur next. Furthermore, we can take advantage of acoustic information, such as silence information, which tends to co-occur with speech repairs and intonational phrase endings, and which current language models can only regard as noise in the acoustic signal. The output of this language model is a much fuller account of the speaker's turn, with a part-of-speech tag assigned to each word, intonational phrase endings and discourse markers identified, and speech repairs detected and corrected. In fact, the identification of the intonational phrase endings and discourse markers and the resolution of the speech repairs allow the speech recognizer to model the speaker's utterances, rather than simply the words involved, and thus it can return a more meaningful analysis of the speaker's turn for later processing.
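
The redefinition of the speech recognition problem described in the abstract can be sketched roughly as follows. This is only an illustrative formulation under stated assumptions, not the thesis's own notation: the symbols A (acoustic signal), W (word sequence), S (POS tags), D (discourse marker labels), R (speech repair labels) and T (intonational phrase endings) are introduced here for illustration.

```latex
% Conventional speech recognition: find the best word sequence W for acoustics A.
\hat{W} = \arg\max_{W} \Pr(W \mid A)
        = \arg\max_{W} \Pr(A \mid W)\,\Pr(W)

% Redefined problem (assumed notation): jointly recover words, POS tags S,
% discourse markers D, speech repairs R and intonational phrase endings T.
(\hat{W}, \hat{S}, \hat{D}, \hat{R}, \hat{T})
  = \arg\max_{W,S,D,R,T} \Pr(W, S, D, R, T \mid A)
  = \arg\max_{W,S,D,R,T} \Pr(A \mid W, S, D, R, T)\,\Pr(W, S, D, R, T)
```

Under this reading, the language model supplies the joint prior over words and the additional labels, which is what would let predictions of boundary tones, discourse markers and repairs inform the prediction of the next word.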
