Journal: Computer Engineering and Applications (《计算机工程与应用》)

Research on Joint Chinese-Japanese Word Segmentation for Phrase-Based Statistical Machine Translation

         

Abstract

Unknown words and word segmentation granularity are two main problems in Chinese-Japanese machine translation. Unlike English and other Western languages, Chinese and Japanese do not separate words with spaces, so word segmentation is an essential first step in processing both languages. Because the two languages differ in part-of-speech tagging schemes, grammar, and semantic expression, the granularity of the segmentation results needs further adjustment to improve the performance of Statistical Machine Translation (SMT). This paper proposes a method for adjusting Chinese and Japanese word segmentation granularity for SMT, based on a Hanzi-Kanji comparison table and Japanese-Chinese dictionary information. Experimental results show that the proposed method effectively adjusts segmentation granularity on both the source and target sides and improves SMT performance. By comparing experimental results, the paper analyzes and discusses the effect of word segmentation granularity on Chinese-Japanese phrase-based SMT.
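The abstract does not give the details of the adjustment procedure, so the following is only a minimal sketch of how a Hanzi-Kanji table and a Japanese-Chinese dictionary could be used to coarsen Chinese segmentation toward the Japanese side. The table entries, the dictionary entry, the function names, and the greedy longest-match merge strategy are all illustrative assumptions, not the paper's actual method or data.

```python
# Hypothetical toy resources: a real system would load a full Hanzi-Kanji
# comparison table and a Japanese-Chinese dictionary. These entries are
# illustrative assumptions only.
KANJI_TO_HANZI = {"機": "机", "械": "械", "翻": "翻", "訳": "译"}
JA_ZH_DICT = {"機械翻訳": {"机器翻译"}}


def kanji_to_hanzi(word):
    """Map each Kanji to its Hanzi counterpart; leave other characters unchanged."""
    return "".join(KANJI_TO_HANZI.get(ch, ch) for ch in word)


def chinese_candidates(ja_tokens):
    """Collect Chinese strings that the Japanese tokens may correspond to."""
    candidates = set()
    for tok in ja_tokens:
        candidates.add(kanji_to_hanzi(tok))        # character-level mapping
        candidates.update(JA_ZH_DICT.get(tok, ()))  # dictionary translations
    return candidates


def merge_to_match(zh_tokens, ja_tokens, max_span=4):
    """Greedily merge adjacent Chinese tokens whose concatenation is a candidate."""
    candidates = chinese_candidates(ja_tokens)
    merged, i = [], 0
    while i < len(zh_tokens):
        # Try the longest span first so the coarsest matching unit wins.
        for span in range(min(max_span, len(zh_tokens) - i), 1, -1):
            joined = "".join(zh_tokens[i:i + span])
            if joined in candidates:
                merged.append(joined)
                i += span
                break
        else:
            # No multi-token span matched: keep the token as it is.
            merged.append(zh_tokens[i])
            i += 1
    return merged
```

For example, with the Chinese tokens `["机器", "翻译", "系统"]` and the Japanese tokens `["機械翻訳", "システム"]`, this sketch merges the first two Chinese tokens into `机器翻译` through the dictionary entry, so both sides treat "machine translation" as a single unit. A fuller treatment would presumably also split over-coarse tokens and handle the Japanese side symmetrically.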
