
Paraphrastic language models


Abstract

Natural languages are known for their expressive richness. Many sentences can be used to represent the same underlying meaning. Modelling only the observed surface word sequence can result in poor context coverage and generalization, for example, when using n-gram language models (LMs). This paper proposes a novel form of language model, the paraphrastic LM, that addresses these issues. A phrase-level paraphrase model, statistically learned from standard text data with no semantic annotation, is used to generate multiple paraphrase variants. LM probabilities are then estimated by maximizing their marginal probability. Multi-level language models estimated at both the word level and the phrase level are combined. An efficient weighted finite state transducer (WFST) based paraphrase generation approach is also presented. Significant error rate reductions of 0.5-0.6% absolute were obtained over the baseline n-gram LMs on two state-of-the-art recognition tasks, for English conversational telephone speech and Mandarin Chinese broadcast speech, using a paraphrastic multi-level LM modelling both word and phrase sequences. When it is further combined with word-level and phrase-level feed-forward neural network LMs, significant error rate reductions of 0.9% absolute (9% relative) and 0.5% absolute (5% relative) were obtained over the baseline n-gram and neural network LMs, respectively.
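The abstract's idea of generating paraphrase variants and weighting them by probability can be illustrated with a toy sketch. This is not the paper's implementation: the paraphrase table `PARAPHRASES`, its probabilities, and the helper names `paraphrase_variants` and `weighted_bigram_counts` are all hypothetical, and plain enumeration is used here in place of the WFST-based generation the abstract mentions. The sketch assumes each sentence is already segmented into phrases, and it accumulates bigram counts in which every variant contributes in proportion to its paraphrase probability.

```python
from itertools import product

# Hypothetical phrase-level paraphrase table (an assumption for illustration):
# phrase -> {alternative phrase: P(alternative | phrase)}
PARAPHRASES = {
    "a lot of": {"a lot of": 0.6, "many": 0.4},
    "movies": {"movies": 0.7, "films": 0.3},
}


def paraphrase_variants(phrases):
    """Yield (word_tuple, probability) for every combination of substitutions."""
    # Phrases without an entry in the table map to themselves with probability 1.
    options = [PARAPHRASES.get(p, {p: 1.0}).items() for p in phrases]
    for combo in product(*options):
        words, prob = [], 1.0
        for alternative, p in combo:
            words.extend(alternative.split())
            prob *= p
        yield tuple(words), prob


def weighted_bigram_counts(segmented_sentences):
    """Accumulate fractional bigram counts over all paraphrase variants."""
    counts = {}
    for phrases in segmented_sentences:
        for words, prob in paraphrase_variants(phrases):
            padded = ("<s>",) + words + ("</s>",)
            for bigram in zip(padded, padded[1:]):
                counts[bigram] = counts.get(bigram, 0.0) + prob
    return counts


if __name__ == "__main__":
    sentence = ["i", "watch", "a lot of", "movies"]
    for bigram, weight in sorted(weighted_bigram_counts([sentence]).items()):
        print(bigram, round(weight, 2))
```

In this toy setting the unseen bigram `("many", "films")` receives a fractional count of 0.4 × 0.3 = 0.12 even though it never occurs in the observed sentence, which is the kind of extra context coverage the abstract attributes to modelling paraphrases rather than only the surface word sequence.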

Bibliographic record

  • Source
    Computer Speech and Language | 2014, Issue 6 | pp. 1298-1316 | 19 pages
  • Author affiliations

    Cambridge University, Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England;

    Cambridge University, Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England;

    Cambridge University, Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Language modelling; Paraphrase; Speech recognition;
