RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information

Abstract

We introduce RLIMS-P version 2.0, an enhanced rule-based information extraction (IE) system for mining kinase, substrate, and phosphorylation site information from scientific literature. Consisting of natural language processing and IE modules, the system integrates several new features, including the capability of processing full-text articles and generalizability to different post-translational modifications (PTMs). To evaluate the system, sets of abstracts and full-text articles containing a variety of textual expressions were annotated. On the abstract corpus, the system achieved F-scores of 0.91, 0.92, and 0.95 for kinases, substrates, and sites, respectively. The corresponding scores on the full-text corpus were 0.88, 0.91, and 0.92. The system was additionally evaluated on the corpus of the 2013 BioNLP-ST GE task, where it achieved an F-score of 0.87 for the phosphorylation core task, improving upon the results previously reported on that corpus. Full-scale processing of all abstracts in MEDLINE and all articles in the PubMed Central Open Access Subset has demonstrated scalability for mining rich information in literature, enabling its adoption for biocuration and for knowledge discovery. The new system is generalizable and will be adapted to tackle other major PTM types. The RLIMS-P 2.0 system is available online, and the developed corpora are available from iProLINK.
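
For readers unfamiliar with the metric, the F-scores quoted above combine precision and recall into a single number. The following is a minimal sketch, assuming the balanced F-score (F1) is meant; the abstract does not state which variant was used. The function name `f_score` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1) from counts of true positives,
    false positives, and false negatives.

    Assumption: F1 (equal weight on precision and recall).
    """
    # Precision: fraction of extracted items that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of annotated items that were extracted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
print(f_score(tp=8, fp=2, fn=2))
```

With these illustrative counts, precision and recall are both 0.8, so the harmonic mean is also 0.8.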