Conference proceedings: Advances in Information Retrieval

A Machine Learning Approach for the Curation of Biomedical Literature


Abstract

In the field of the biomedical sciences there exists a vast repository of information located within large quantities of research papers. Very often, researchers need to spend considerable amounts of time reading through entire papers before being able to determine whether or not they should be curated (archived). In this paper, we present an automated text classification system for the classification of biomedical papers. This classification is based on whether there is experimental evidence for the expression of molecular gene products for specified genes within a given paper. The system performs preprocessing and data cleaning, followed by feature extraction from the raw text. It subsequently classifies the paper using the extracted features with a Naive Bayes Classifier. Our approach has made it possible to classify (and curate) biomedical papers automatically, thus potentially saving considerable time and resources. The system proved to be highly accurate, and won honourable mention in the KDD Cup 2002 task 1.
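The pipeline outlined in the abstract (preprocessing, feature extraction, Naive Bayes classification) can be illustrated with a minimal sketch. The abstract does not specify the authors' preprocessing or features, so everything below is an assumption made for illustration: the stop-word list, the bag-of-words features, the Laplace smoothing, the `curate`/`skip` labels, and the toy training sentences are all hypothetical and are not taken from the paper or from the KDD Cup 2002 data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "was", "were",
              "is", "for", "to", "by", "we"}


def tokenize(text):
    """Preprocessing: lowercase, keep alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; "curate" means the text reports expression evidence.
train_docs = [
    "Northern blot analysis showed expression of the gene transcript in embryos",
    "Western blot detected the protein product in larval tissue",
    "We review the genetic map and chromosomal location of the locus",
    "Sequence alignment and phylogenetic comparison of homologous genes",
]
train_labels = ["curate", "curate", "skip", "skip"]

model = NaiveBayes().fit(train_docs, train_labels)
```

With this toy training set, `model.predict("In situ hybridization and northern blot revealed transcript expression")` returns `"curate"`. A real system would need far more training data and domain-specific features than this sketch uses.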
