Generating knowledge graphs by employing Natural Language Processing and Machine Learning techniques within the scholarly domain

Abstract

The continuous growth of scientific literature brings innovations but, at the same time, raises new challenges. One of them is that analysing the literature has become difficult: so many papers are published that annotating and managing them manually requires considerable effort. Novel technological infrastructures are needed to help researchers, research policy makers, and companies browse, analyse, and forecast scientific research time-efficiently. Knowledge graphs, i.e., large networks of entities and relationships, have proved to be an effective solution in this space. Scientific knowledge graphs focus on the scholarly domain and typically contain metadata describing research publications, such as authors, venues, organizations, research topics, and citations. However, the current generation of knowledge graphs lacks an explicit representation of the knowledge presented in the research papers themselves. In this paper, we therefore present a new architecture that uses Natural Language Processing and Machine Learning methods to extract entities and relationships from research publications and integrates them into a large-scale knowledge graph. Within this research work, we (i) tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools, (ii) describe an approach for integrating the entities and relationships generated by these tools, (iii) show the advantage of such a hybrid system over alternative approaches, and (iv) as a chosen use case, generate a scientific knowledge graph of 109,105 triples extracted from 26,827 abstracts of papers in the Semantic Web domain. As our approach is general and can be applied to any domain, we expect that it can facilitate the management, analysis, dissemination, and processing of scientific knowledge.
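
The abstract describes integrating the entities and relationships produced by several extraction tools into one graph. As a minimal sketch, assuming each tool emits plain (subject, relation, object) string triples, that merge step could look like the code below. The tool names, the lowercase-and-collapse-whitespace `normalize` rule, and the `min_tools` support threshold are hypothetical illustrations, not the authors' actual integration method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# A triple is (subject, relation, object). All tool names are hypothetical.
Triple = Tuple[str, str, str]


def normalize(label: str) -> str:
    """Naive label normalization: lowercase and collapse whitespace.

    Assumption: a real pipeline might use lemmatization, acronym
    resolution, or ontology mapping instead of this simple rule.
    """
    return " ".join(label.lower().split())


def merge_triples(outputs: Dict[str, Iterable[Triple]]) -> Dict[Triple, Set[str]]:
    """Merge triples from several extractors, recording which tools produced each one."""
    support: Dict[Triple, Set[str]] = defaultdict(set)
    for tool, triples in outputs.items():
        for subj, rel, obj in triples:
            key = (normalize(subj), normalize(rel), normalize(obj))
            support[key].add(tool)
    return dict(support)


def build_graph(support: Dict[Triple, Set[str]], min_tools: int = 1) -> List[Triple]:
    """Keep only triples backed by at least `min_tools` distinct extractors."""
    return sorted(t for t, tools in support.items() if len(tools) >= min_tools)


if __name__ == "__main__":
    outputs = {
        "tool_a": [("Ontology Alignment", "uses", "Word Embeddings")],
        "tool_b": [
            ("ontology  alignment", "uses", "word embeddings"),
            ("SPARQL", "queries", "RDF data"),
        ],
    }
    support = merge_triples(outputs)
    print(build_graph(support, min_tools=2))
    # [('ontology alignment', 'uses', 'word embeddings')]
```

Requiring agreement between tools trades recall for precision; with `min_tools=1` the sketch simply takes the union of all normalized triples.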

Bibliographic details

  • Source
    Future Generation Computer Systems | 2021, no. 3 | pp. 253-264 | 12 pages
  • Author affiliations

    Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy; FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany; Karlsruhe Institute of Technology, Institute AIFB, Germany;

    Knowledge Media Institute, The Open University, Milton Keynes, UK;

    Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy;

    LIPN, CNRS (UMR 7030), University Paris 13, Villetaneuse, France;

    Knowledge Media Institute, The Open University, Milton Keynes, UK;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Full-text language: English