Journal: Comparative and Functional Genomics

Protein name tagging guidelines: lessons learned


Abstract

Interest in information extraction from the biomedical literature is motivated by the need to speed up the creation of structured databases representing the latest scientific knowledge about specific objects, such as proteins and genes. This paper addresses the lack of a standard definition of the protein name tagging problem. We describe the lessons learned in developing a set of guidelines and present the first set of inter-coder results, viewed as an upper bound on system performance. Problems coders face include: (a) the ambiguity of names that can refer to either genes or proteins; (b) the difficulty of getting the exact extents of long protein names; and (c) the complexity of the guidelines. These problems have been addressed in two ways: (a) defining the tagging targets as protein named entities used in the literature to describe proteins or protein-associated or -related objects, such as domains, pathways, expression or genes, and (b) using two types of tags, protein tags and long-form tags, with the latter being used to optionally extend the boundaries of the protein tag when the name boundary is difficult to determine. Inter-coder consistency for protein tags, measured across three annotators on 300 MEDLINE abstracts, is 0.868 F-measure. The guidelines and annotated datasets, along with automatic tools, are available for research use.
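As an illustration of how an inter-coder F-measure of this kind might be computed, the sketch below scores each pair of annotators by exact span match and averages the pairwise scores. The abstract does not say how the 0.868 figure was derived, so the exact-match criterion, the pairwise averaging, the (start, end) span representation, and the names f_measure and mean_pairwise_f are all assumptions made for this example, not the authors' procedure.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical representation: a tagged protein name is a (start, end)
# character-offset pair within one abstract.
Span = Tuple[int, int]


def f_measure(reference: Set[Span], candidate: Set[Span]) -> float:
    """Balanced F-measure between two annotators under exact span match."""
    if not reference and not candidate:
        return 1.0  # neither annotator tagged anything: full agreement
    matches = len(reference & candidate)
    if matches == 0:
        return 0.0
    precision = matches / len(candidate)
    recall = matches / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f(annotations: Dict[str, Set[Span]]) -> float:
    """Average the F-measure over every unordered pair of annotators."""
    pairs = list(combinations(sorted(annotations), 2))
    scores = [f_measure(annotations[a], annotations[b]) for a, b in pairs]
    return sum(scores) / len(scores)


# Made-up spans for three coders on a single abstract.
example = {
    "coder_a": {(0, 5), (10, 18), (30, 34)},
    "coder_b": {(0, 5), (10, 18), (40, 44)},
    "coder_c": {(0, 5), (12, 18), (30, 34)},
}
print(round(mean_pairwise_f(example), 3))
```

With exact matching the F-measure is symmetric, so it does not matter which annotator in a pair is treated as the reference. A scheme that credits a match against either the protein tag or its long-form extension would need a looser comparison than the set intersection used here.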
