Journal: Systems and Computers in Japan

Automatic Extraction of Bilingual Word Pairs from Parallel Corpora with Various Languages Using Learning for Adjacent Information



Abstract

This paper presents a learning method that uses adjacent information to extract bilingual word pairs efficiently from parallel corpora in various languages for which language resources are insufficient. The method automatically acquires information about the correspondence between source language words and target language words from the word strings adjacent to bilingual word pairs. The acquired information is then used to resolve ambiguity in the correspondence between source language words and target language words in bilingual sentence pairs. First, a system using the method automatically acquires templates, based on the word strings adjacent to bilingual word pairs, that indicate the correspondence between source language words and target language words. The system then uses the acquired templates to extract bilingual word pairs from bilingual sentence pairs. In evaluation experiments, the system extracted bilingual word pairs from parallel corpora in five languages with a total extraction rate of 60.1%. This rate is 8.0 percentage points higher than that of a system based only on the Dice coefficient without the method. These results confirm the effectiveness of the method.
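The abstract compares against a baseline that relies only on the Dice coefficient but does not say how that baseline is computed. As a rough illustration, the sketch below uses the standard sentence-level co-occurrence form, Dice(s, t) = 2·f(s, t) / (f(s) + f(t)), where f counts the sentence pairs that contain a word or a word pair. The function name `dice_scores`, the whitespace tokenization, and the toy English-French corpus are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score each co-occurring (source, target) word pair with the Dice coefficient.

    Assumption: frequencies are counted once per sentence pair, and words
    are split on whitespace. The paper's own setup may differ.
    """
    src_freq = Counter()   # number of sentence pairs containing each source word
    tgt_freq = Counter()   # number of sentence pairs containing each target word
    pair_freq = Counter()  # number of sentence pairs containing both words
    for src_sentence, tgt_sentence in sentence_pairs:
        src_words = set(src_sentence.split())
        tgt_words = set(tgt_sentence.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


# Toy corpus invented for illustration only.
toy_corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
]
scores = dice_scores(toy_corpus)

# Candidate target words for "cat", best score first.
cat_candidates = sorted(
    ((t, score) for (s, t), score in scores.items() if s == "cat"),
    key=lambda item: item[1],
    reverse=True,
)
print(cat_candidates)
```

On this toy corpus the pair ("cat", "chat") scores 1.0, while ("cat", "le") still scores 0.8 because "le" appears in every sentence. This kind of competing candidate is the correspondence ambiguity that co-occurrence statistics alone leave open, and that the abstract says the adjacent-information templates are meant to resolve.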
