9th International Conference on Language Resources and Evaluation

Creating and using large monolingual parallel corpora for sentential paraphrase generation


Abstract

In this paper we investigate the automatic generation of paraphrases using machine translation techniques. We make three contributions: the construction of a large paraphrase corpus for English and Dutch, a re-ranking heuristic for using machine translation to generate paraphrases, and an appropriate evaluation methodology. A large monolingual parallel corpus is constructed by aligning clustered headlines scraped from a news aggregator site. To generate sentential paraphrases, we use a standard phrase-based machine translation (PBMT) framework extended with a re-ranking component (henceforth PBMT-R). We demonstrate this approach for Dutch and English and evaluate it using human judgments collected from 76 participants. These judgments are compared with two automatic machine translation evaluation metrics. We observe that as the paraphrases deviate further from the source sentence, the performance of the PBMT-R system degrades less than that of the word substitution baseline system.
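The abstract does not specify how the PBMT-R re-ranking component works. The following is a minimal sketch of one plausible heuristic, under stated assumptions: the PBMT decoder is assumed to return an n-best list of candidate paraphrases sorted by model score, and the re-ranker is assumed to pick the candidate that differs most from the source sentence, with difference measured by word-level Levenshtein distance. The function names `word_levenshtein` and `rerank_nbest` and the example sentences are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def word_levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two token sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1]


def rerank_nbest(source: str, nbest: List[str]) -> str:
    """Hypothetical re-ranker: return the n-best candidate most dissimilar to the source.

    Assumes `nbest` is sorted by decoder score (best first). Candidates identical
    to the source have distance 0 and are never preferred; ties keep the
    decoder's original order. Falls back to the source if nothing differs.
    """
    src_tokens = source.split()
    best, best_dist = source, 0
    for cand in nbest:
        dist = word_levenshtein(src_tokens, cand.split())
        if dist > best_dist:  # strict '>' keeps the higher-ranked candidate on ties
            best, best_dist = cand, dist
    return best


if __name__ == "__main__":
    source = "police arrest man after bank robbery"
    nbest = [
        "police arrest man after bank robbery",
        "police detain man after bank robbery",
        "man arrested by police after bank heist",
    ]
    print(rerank_nbest(source, nbest))
```

Selecting purely on dissimilarity trades fluency and adequacy for variation. A real system would likely restrict the choice to the top few decoder hypotheses so that the model score still guards against ungrammatical output; how PBMT-R balances these is not stated in the abstract.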