Journal: Information Retrieval

Multilingual modeling of cross-lingual spelling variants

Abstract

Technical term translations are important for cross-lingual information retrieval. In many languages, new technical terms share a common origin but are rendered with different spellings of the underlying sounds; such terms are known as cross-lingual spelling variants (CLSV). To find the best CLSV in a text database index, we formulate the problem in a probabilistic framework and implement it as an instance of the general edit distance using weighted finite-state transducers. Estimating the costs for the general edit distance requires some training data. We demonstrate that after basic training our new multilingual model is robust and requires little or no adaptation to cover additional languages, because it exploits language-independent transliteration patterns. We train the model on medical terms in seven languages and test it on terms from varied domains in six languages, two of which are absent from the training data. Against a large text database index, we achieve 64-78% precision at 100% recall, a relative improvement of 22% over the simple edit distance.
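The abstract does not spell out the cost model, so the following is only a minimal sketch of the general idea, assuming single-character substitution, insertion and deletion costs stored in plain dictionaries. The names `general_edit_distance` and `rank_variants` and all cost values are hypothetical; this is not the authors' weighted finite-state transducer implementation. It illustrates how learned operation costs (readable as negative log probabilities in a probabilistic framework) change the ranking compared with the simple edit distance, where every operation costs 1.

```python
from typing import Dict, List, Tuple

# Hypothetical cost tables for single-character operations.
# In a probabilistic reading, a cost can be taken as -log P(operation).
SubCosts = Dict[Tuple[str, str], float]
IndelCosts = Dict[str, float]


def general_edit_distance(
    source: str,
    target: str,
    sub: SubCosts,
    ins: IndelCosts,
    dele: IndelCosts,
    default: float = 1.0,
) -> float:
    """Weighted (general) edit distance computed by dynamic programming.

    Identical characters match at zero cost; any operation missing from the
    cost tables falls back to `default`. With empty tables this reduces to
    the simple (Levenshtein) edit distance.
    """
    n, m = len(source), len(target)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # delete a prefix of the source
        d[i][0] = d[i - 1][0] + dele.get(source[i - 1], default)
    for j in range(1, m + 1):  # insert a prefix of the target
        d[0][j] = d[0][j - 1] + ins.get(target[j - 1], default)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = source[i - 1], target[j - 1]
            match = 0.0 if a == b else sub.get((a, b), default)
            d[i][j] = min(
                d[i - 1][j - 1] + match,          # match or substitute
                d[i - 1][j] + dele.get(a, default),  # delete a
                d[i][j - 1] + ins.get(b, default),   # insert b
            )
    return d[n][m]


def rank_variants(
    term: str,
    index: List[str],
    sub: SubCosts,
    ins: IndelCosts,
    dele: IndelCosts,
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k index words closest to `term`, cheapest first."""
    scored = sorted((general_edit_distance(term, w, sub, ins, dele), w) for w in index)
    return scored[:k]


# Hypothetical learned costs: c->k and dropping h are cheap transliteration steps.
SUB: SubCosts = {("c", "k"): 0.2}
INS: IndelCosts = {}
DEL: IndelCosts = {"h": 0.3}

print(general_edit_distance("cholera", "kolera", {}, {}, {}))  # 2.0 (simple edit distance)
print(general_edit_distance("cholera", "kolera", SUB, INS, DEL))  # 0.5
print(rank_variants("cholera", ["molar", "kolera", "koala"], SUB, INS, DEL, k=1))
```

A variant spelling such as `kolera` then ranks well ahead of unrelated index words, because the cheap operations encode a plausible transliteration pattern; a real system would estimate such costs from training pairs rather than set them by hand.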