Pacific Symposium on Biocomputing; January 4-8, 2008; Kohala Coast, HI (US)

INTRINSIC EVALUATION OF TEXT MINING TOOLS MAY NOT PREDICT PERFORMANCE ON REALISTIC TASKS


Abstract

Biomedical text mining and other automated techniques are beginning to achieve performance which suggests that they could be applied to aid database curators. However, few studies have evaluated how these systems might work in practice. In this article we focus on the problem of annotating mutations in Protein Data Bank (PDB) entries, and evaluate the relationship between performance of two automated techniques, a text-mining-based approach (MutationFinder) and an alignment-based approach, in intrinsic versus extrinsic evaluations. We find that high performance on gold standard data (an intrinsic evaluation) does not necessarily translate to high performance for database annotation (an extrinsic evaluation). We show that this is in part a result of lack of access to the full text of journal articles, which appears to be critical for comprehensive database annotation by text mining. Additionally, we evaluate the accuracy and completeness of manually annotated mutation data in the PDB, and find that it is far from perfect. We conclude that currently the most cost-effective and reliable approach for database annotation might incorporate manual and automatic annotation methods.
