Venue: International Conference on Web Information Systems Engineering

A Multilingual Approach to Discover Cross-Language Links in Wikipedia

Abstract

Wikipedia is a well-known public and collaborative encyclopaedia consisting of millions of articles. Initially in English, the popular website has grown to include versions in over 288 languages. These versions and their articles are interconnected via cross-language links, which not only facilitate navigation and understanding of concepts in multiple languages, but have also been used in natural language processing applications, developments in linked open data, and the expansion of minor Wikipedia language versions. These applications are the motivation for an automatic, robust, and accurate technique to identify cross-language links. In this paper, we present a multilingual approach called EurekaCL to automatically identify missing cross-language links in Wikipedia. More precisely, given a Wikipedia article (the source), EurekaCL uses the multilingual and semantic features of BabelNet 2.0 to efficiently identify a set of candidate articles in a target language that are likely to cover the same topic as the source. The Wikipedia graph structure is then exploited both to prune and to rank the candidates. Our evaluation, carried out on 42,000 pairs of articles in eight language versions of Wikipedia, shows that our candidate selection and pruning procedures select candidates effectively, which significantly helps determine the correct article in the target language version.
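
The three stages named in the abstract (candidate selection, pruning, ranking) can be illustrated with a toy sketch. This is not the EurekaCL implementation, and it does not call BabelNet: the `Article` class, the concept identifiers, the pruning rule (drop candidates that already carry a link into the source language), and the ranking score (overlap of linked neighbours) are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    lang: str
    # Hypothetical concept IDs, standing in for BabelNet synsets.
    concepts: set = field(default_factory=set)
    # Titles of same-language articles this article links to.
    links: set = field(default_factory=set)
    # Existing cross-language links: language code -> article title.
    cross_links: dict = field(default_factory=dict)


def select_candidates(source, target_articles):
    """Keep target articles sharing at least one concept with the source."""
    return [a for a in target_articles if a.concepts & source.concepts]


def prune(source, candidates):
    """Assumed rule: drop candidates already linked into the source language."""
    return [c for c in candidates if source.lang not in c.cross_links]


def rank(source, candidates, source_articles, target_lang):
    """Assumed score: how many of the source's neighbours map, through
    existing cross-language links, onto neighbours of the candidate."""
    by_title = {a.title: a for a in source_articles}
    mapped = set()
    for title in source.links:
        neighbour = by_title.get(title)
        if neighbour and target_lang in neighbour.cross_links:
            mapped.add(neighbour.cross_links[target_lang])
    return sorted(candidates, key=lambda c: len(c.links & mapped), reverse=True)


def find_missing_link(source, source_articles, target_articles, target_lang):
    candidates = prune(source, select_candidates(source, target_articles))
    ranked = rank(source, candidates, source_articles, target_lang)
    return ranked[0].title if ranked else None


# Toy data: an English source article with an ambiguous title.
source = Article("Jaguar", "en", {"animal:jaguar", "car:jaguar"},
                 {"Cat", "Amazon rainforest"})
en_articles = [
    source,
    Article("Cat", "en", {"animal:cat"}, set(), {"fr": "Chat"}),
    Article("Amazon rainforest", "en", {"place:amazon"}, set(),
            {"fr": "Amazonie"}),
]
fr_articles = [
    Article("Jaguar", "fr", {"animal:jaguar"}, {"Chat", "Amazonie"}),
    Article("Jaguar (homonymie)", "fr", {"animal:jaguar", "car:jaguar"},
            {"Chat"}),
    Article("Jaguar (entreprise)", "fr", {"car:jaguar"}, {"Automobile"},
            {"en": "Jaguar Cars"}),
    Article("Automobile", "fr", {"thing:car"}),
]

result = find_missing_link(source, en_articles, fr_articles, "fr")
```

In this toy data, the concept lookup keeps three French candidates, the pruning step discards the one that already has an English counterpart, and the neighbour overlap decides between the remaining two.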