Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

GrantExtractor: Accurate Grant Support Information Extraction from Biomedical Fulltext Based on Bi-LSTM-CRF


Abstract

Grant support (GS) in the MEDLINE database refers to funding agencies and contract numbers. Funding organizations rely on GS information to track the outcomes of their funding. However, accurately and automatically extracting funding information from biomedical literature is challenging. In this paper, we present a pipeline system called GrantExtractor that accurately extracts GS information from full-text biomedical literature. GrantExtractor integrates several advanced machine learning techniques. In particular, we first use a sentence classifier to identify funding sentences in articles. A bidirectional LSTM with a CRF layer (BiLSTM-CRF) and pattern matching are then used to extract grant number and agency entities from the identified funding sentences. After removing noisy numbers with a multi-class model, we finally match each grant number with its corresponding agency. Experimental results on benchmark datasets demonstrate that GrantExtractor clearly outperforms all baseline methods. Furthermore, GrantExtractor won first place in Task 5C of the 2017 BioASQ challenge, achieving a micro-recall of 0.9526 on 22,610 articles. Moreover, GrantExtractor achieves a micro F-measure as high as 0.90 in extracting grant pairs.
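The stages named in the abstract (find funding sentences, extract agencies and grant numbers, pair them) can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: a keyword check stands in for the sentence classifier, a regular expression stands in for the BiLSTM-CRF tagger, the noise-removal model is omitted, and pairing uses a nearest-preceding-agency heuristic. The cue words, the agency list, the grant-number pattern, and all function names are assumptions made for illustration.

```python
import re

# Hypothetical cue words and agency names, chosen for illustration only.
FUNDING_CUES = ("supported by", "funded by", "grant", "funding")
AGENCIES = ("National Institutes of Health", "NIH", "Wellcome Trust")

# Assumed grant-number shape: 6+ uppercase letters/digits with at least one digit.
GRANT_NUMBER = re.compile(r"\b(?=[A-Z0-9]*\d)[A-Z0-9]{6,}\b")


def is_funding_sentence(sentence):
    """Keyword stand-in for a trained funding-sentence classifier."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in FUNDING_CUES)


def extract_pairs(sentence):
    """Pair each grant number with the closest agency mention before it."""
    agency_spans = []
    for name in AGENCIES:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", sentence):
            agency_spans.append((match.start(), name))
    agency_spans.sort()  # order agency mentions by position in the sentence

    pairs = []
    for match in GRANT_NUMBER.finditer(sentence):
        preceding = [name for start, name in agency_spans if start < match.start()]
        if preceding:
            pairs.append((preceding[-1], match.group()))
    return pairs


def extract_grant_support(text):
    """Run the simplified pipeline over raw text; return (agency, number) pairs."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pairs = []
    for sentence in sentences:
        if is_funding_sentence(sentence):
            pairs.extend(extract_pairs(sentence))
    return pairs


if __name__ == "__main__":
    sample = ("We studied mice. This work was supported by the Wellcome Trust "
              "(098051) and by NIH grant R01GM123456.")
    print(extract_grant_support(sample))
```

On the sample text, only the second sentence passes the keyword check, and each grant number is attached to the agency mentioned most recently before it. A learned tagger would replace the fixed agency list and number pattern, which is where hand-written rules like these tend to break down.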

Bibliographic details

  • Author affiliations

    Fudan Univ Sch Comp Sci Shanghai 200433 Peoples R China|Fudan Univ Shanghai Key Lab Intelligent Informat Proc Shanghai 200433 Peoples R China;

    Fudan Univ Sch Comp Sci Shanghai 200433 Peoples R China|Fudan Univ Shanghai Key Lab Intelligent Informat Proc Shanghai 200433 Peoples R China;

    Fudan Univ Sch Comp Sci Shanghai 200433 Peoples R China|Fudan Univ Shanghai Key Lab Intelligent Informat Proc Shanghai 200433 Peoples R China;

    Fudan Univ Sch Data Sci Shanghai 200433 Peoples R China;

    Charles Sturt Univ Dept Comp & Math Albury NSW 2640 Australia;

    Fudan Univ Sch Comp Sci Shanghai 200433 Peoples R China|Fudan Univ Shanghai Key Lab Intelligent Informat Proc Shanghai 200433 Peoples R China|Fudan Univ Inst Sci & Technol Brain Inspired Intelligence Shanghai 200433 Peoples R China|Fudan Univ Key Lab Computat Neurosci & Brain Inspired Intell Minist Educ Shanghai 200433 Peoples R China;

  • Original format: PDF
  • Text language: English
  • Keywords

    Grant support; BiLSTM-CRF; biomedical text mining; information extraction; biomedical fulltext;

