Journal: Database

How to link ontologies and protein–protein interactions to literature: text-mining approaches and the BioCreative experience



Abstract

There is increasing interest in developing ontologies and controlled vocabularies to improve the efficiency and consistency of manual literature curation, to enable more formal biocuration workflow results and ultimately to improve analysis of biological data. Two ontologies that have been successfully used for this purpose are the Gene Ontology (GO) for annotating aspects of gene products and the Molecular Interaction ontology (PSI-MI) used by databases that archive protein–protein interactions. The examination of protein interactions has proven to be extremely promising for the understanding of cellular processes. Manual mapping of information from the biomedical literature to bio-ontology terms is one of the most challenging components in the curation pipeline. It requires that expert curators interpret the natural language descriptions contained in articles and infer their semantic equivalents in the ontology (controlled vocabulary). Since manual curation is a time-consuming process, there is strong motivation to implement text-mining techniques to automatically extract annotations from free text. A range of text-mining strategies has been devised to assist in the automated extraction of biological data. These strategies either recognize technical terms used recurrently in the literature and propose them as candidates for inclusion in ontologies, or retrieve passages that serve as evidential support for annotating an ontology term, e.g. from the PSI-MI or GO controlled vocabularies. Here, we provide a general overview of current text-mining methods to automatically extract annotations of GO and PSI-MI ontology terms in the context of the BioCreative (Critical Assessment of Information Extraction Systems in Biology) challenge. Special emphasis is given to protein–protein interaction data and PSI-MI terms referring to interaction detection methods.
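As a rough illustration of the second strategy named in the abstract (retrieving passages that could support annotation with an ontology term), the sketch below does plain dictionary matching over sentences. It is not the method of any BioCreative system. The lexicon, its placeholder identifiers (`EX:0001`, `EX:0002`), the synonym lists and the function names are all assumptions made for this example; a real pipeline would load terms and synonyms from the PSI-MI or GO release files and would likely use more robust sentence splitting and matching.

```python
import re

# Hypothetical mini-lexicon. The IDs are placeholders, NOT real PSI-MI
# accessions, and the synonym lists are illustrative only.
LEXICON = {
    "EX:0001": ["two hybrid", "two-hybrid", "Y2H"],
    "EX:0002": ["coimmunoprecipitation", "co-immunoprecipitation", "co-IP"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_evidence(text, lexicon=LEXICON):
    """Return (term_id, matched_synonym, sentence) for each lexicon hit.

    At most one hit is reported per term per sentence.
    """
    hits = []
    for sentence in split_sentences(text):
        for term_id, synonyms in lexicon.items():
            for synonym in synonyms:
                # Whole-word, case-insensitive match of the literal synonym.
                pattern = r"\b" + re.escape(synonym) + r"\b"
                if re.search(pattern, sentence, flags=re.IGNORECASE):
                    hits.append((term_id, synonym, sentence))
                    break
    return hits


if __name__ == "__main__":
    sample = (
        "The binding was confirmed by co-immunoprecipitation. "
        "Cells were grown overnight. "
        "A yeast two-hybrid screen identified the partner."
    )
    for term_id, synonym, sentence in find_evidence(sample):
        print(term_id, "|", synonym, "|", sentence)
```

Each returned sentence is only a candidate evidence passage. Deciding whether it actually supports an annotation remains a curation step, which is the part the abstract describes as most challenging.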
