Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Empirical Investigations into Full-Text Protein Interaction Article Categorization Task (ACT) in the BioCreative II.5 Challenge

Abstract

Selecting protein interaction documents is an important application in biology research and directly affects the quality of downstream BioNLP applications such as information extraction and retrieval, summarization, and question answering (QA). The BioCreative II.5 Challenge Article Categorization Task (ACT) is a binary text classification task: determine whether a given structured full-text article contains protein interaction information. This may be the first community-wide attempt at classifying full-text protein interaction documents. In this paper, we compare and evaluate the effectiveness of different section types in full-text articles for text classification. Moreover, in practice, the small number of true-positive samples leads to unstable performance and unreliable classifiers trained on them. Previous research on learning with skewed class distributions has altered the class distribution using oversampling and downsampling. We also investigate skewed protein interaction classification and analyze the effect of several factors, including the choice of external sources, oversampling of training sets, and the choice of classifier. Our results on these factors show that 1) a full-text biomedical article contains a wealth of scientific information important to users that may not be completely represented by abstracts and/or keywords, and using it improves classification accuracy; and 2) reinforcing true-positive samples significantly increases the accuracy and stability of classification.
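As an illustration of what oversampling a skewed training set can look like, the following is a minimal sketch of plain random oversampling with replacement. It is not the authors' procedure, which the abstract does not specify; the function name `oversample_positives`, its parameters, and the toy data are all hypothetical.

```python
import random


def oversample_positives(samples, labels, positive_label=1, seed=0):
    """Randomly duplicate positive samples until both classes are the same size.

    Hypothetical helper for illustration only; the abstract does not
    describe the exact oversampling scheme used in the paper.
    """
    rng = random.Random(seed)
    positives = [s for s, y in zip(samples, labels) if y == positive_label]
    n_negatives = len(labels) - len(positives)

    # Nothing to do if there are no positives or the set is not skewed
    # toward the negative class.
    if not positives or len(positives) >= n_negatives:
        return list(samples), list(labels)

    # Draw extra positive samples with replacement to close the gap.
    extra = [rng.choice(positives) for _ in range(n_negatives - len(positives))]
    return list(samples) + extra, list(labels) + [positive_label] * len(extra)


# Toy data: one positive document among five.
docs = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"]
labels = [1, 0, 0, 0, 0]
balanced_docs, balanced_labels = oversample_positives(docs, labels)
```

Oversampling of this kind should be applied to the training split only, so that duplicated positives do not leak into the evaluation data.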