Catriple: Extracting Triples from Wikipedia Categories

Abstract

As an important step towards bootstrapping the Semantic Web, many efforts have been made to extract triples from Wikipedia because of its wide coverage, good organization and rich knowledge. One important kind of triple relates Wikipedia articles to their non-isa properties, e.g. (Beijing, country, China). Previous work has tried to extract such triples from Wikipedia infoboxes, article text and categories. The infobox-based and text-based extraction methods depend on the infoboxes and suffer from low article coverage. In contrast, the category-based extraction methods exploit the widespread categories. However, they rely on predefined properties, which demands too much manual effort and exploits only a very limited part of the knowledge in the categories. This paper automatically extracts properties and triples from the less explored Wikipedia categories so as to achieve wider article coverage with less manual effort. We realize this goal by utilizing the syntax and semantics of super-sub category pairs in Wikipedia. Our prototype implementation outputs about 10M triples at 12 confidence levels ranging from 47.0% to 96.4%, which together cover 78.2% of Wikipedia articles. Among them, 1.27M triples have a confidence of 96.4%. Applications can choose triples with a suitable confidence level on demand.
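
The abstract does not spell out the extraction rules, so the following is only a minimal sketch of the general idea of reading a property and a value off a super-sub category pair. It assumes a hypothetical naming pattern in which the supercategory looks like "<Class> by <property>" and the subcategory looks like "<value> <class>". The pattern, the function names and the example categories are illustrative assumptions, not the paper's actual rules, and no confidence scoring is modelled.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern (an assumption, not taken from the paper):
# a supercategory named "<Class> by <property>", e.g. "Songs by artist".
SUPER_PATTERN = re.compile(r"^(?P<cls>.+) by (?P<prop>.+)$")


def extract_property_value(super_cat: str, sub_cat: str) -> Optional[Tuple[str, str]]:
    """Return (property, value) implied by a super-sub category pair, or None."""
    m = SUPER_PATTERN.match(super_cat)
    if m is None:
        return None
    cls, prop = m.group("cls"), m.group("prop")
    # Assume the subcategory is named "<value> <class>", e.g. "The Beatles songs".
    suffix = " " + cls.lower()
    if not sub_cat.lower().endswith(suffix):
        return None
    value = sub_cat[: -len(suffix)].strip()
    if not value:
        return None
    return prop, value


def triples_for(super_cat: str, sub_cat: str, articles: List[str]) -> List[Tuple[str, str, str]]:
    """Attach the extracted (property, value) to every article in the subcategory."""
    pv = extract_property_value(super_cat, sub_cat)
    if pv is None:
        return []
    prop, value = pv
    return [(article, prop, value) for article in articles]


if __name__ == "__main__":
    # Illustrative input only; category membership would normally come from a Wikipedia dump.
    print(triples_for("Songs by artist", "The Beatles songs", ["Hey Jude", "Yesterday"]))
```

In this sketch, a pair that does not fit the assumed pattern simply yields no triples. A real system would need many more patterns plus a way to estimate confidence for each of them.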

Bibliographic information

  • Source
    The Semantic Web - ASWC 2008 | 2008 | pp. 330-344 | 15 pages
  • Conference location: Bangkok (TH)
  • Author affiliations

    Apex Data and Knowledge Management Lab Shanghai Jiao Tong University, Shanghai, 200240, China;

    Apex Data and Knowledge Management Lab Shanghai Jiao Tong University, Shanghai, 200240, China;

    IBM China Research Lab Beijing, 100094, China;

    Apex Data and Knowledge Management Lab Shanghai Jiao Tong University, Shanghai, 200240, China;

    Apex Data and Knowledge Management Lab Shanghai Jiao Tong University, Shanghai, 200240, China;

    IBM China Research Lab Beijing, 100094, China;

  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Computer networks