Journal: Information Systems

Cross-lingual entity matching and infobox alignment in Wikipedia


Abstract

Wikipedia has grown to a huge, multi-lingual source of encyclopedic knowledge. Apart from textual content, a large and ever-increasing number of articles feature so-called infoboxes, which provide factual information about the articles' subjects. As the different language versions evolve independently, they provide different information on the same topics. Correspondences between infobox attributes in different language editions can be leveraged for several use cases, such as automatic detection and resolution of inconsistencies in infobox data across language versions, or the automatic augmentation of infoboxes in one language with data from other language versions. We present an instance-based schema matching technique that exploits information overlap in infoboxes across different language editions. As a prerequisite we present a graph-based approach to identify articles in different languages representing the same real-world entity using (and correcting) the interlanguage links in Wikipedia. To account for the untyped nature of infobox schemas, we present a robust similarity measure that can reliably quantify the similarity of strings with mixed types of data. The qualitative evaluation on the basis of manually labeled attribute correspondences between infoboxes in four of the largest Wikipedia editions demonstrates the effectiveness of the proposed approach.
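The graph-based entity matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes articles are `(language, title)` pairs and interlanguage links are undirected edges, groups articles into connected components, and flags a component as suspicious when it contains more than one article in the same language (a hint that some link is wrong). The function names `connected_components` and `is_conflicting`, and the sample links, are hypothetical.

```python
from collections import defaultdict


def connected_components(links):
    """Group articles connected by interlanguage links (treated as undirected).

    `links` is an iterable of (article_a, article_b) pairs, where each
    article is a (language, title) tuple. This is an assumed data model.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def is_conflicting(component):
    """True if the component holds two or more articles in one language."""
    languages = [lang for lang, _title in component]
    return len(languages) != len(set(languages))


# Illustrative (made-up) links: the last one wrongly ties a second
# English article into the "Paris" component.
sample_links = [
    (("en", "Berlin"), ("de", "Berlin")),
    (("de", "Berlin"), ("fr", "Berlin")),
    (("en", "Paris"), ("fr", "Paris")),
    (("fr", "Paris"), ("en", "Paris, Texas")),
]

components = connected_components(sample_links)
conflicts = [c for c in components if is_conflicting(c)]
```

In this sketch a conflicting component only signals that a correction is needed. How to decide which link to drop is left open here, since the abstract does not describe that step.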
