European Conference on Information Retrieval Research

A Machine Learning Approach for the Curation of Biomedical Literature

Abstract

In the biomedical sciences, a vast repository of information is contained in large quantities of research papers. Researchers often spend considerable time reading entire papers before they can decide whether a paper should be curated (archived). In this paper, we present an automated text classification system for biomedical papers. A paper is classified according to whether it contains experimental evidence for the expression of molecular gene products of specified genes. The system first performs preprocessing and data cleaning, then extracts features from the raw text, and finally classifies the paper from the extracted features with a Naive Bayes classifier. Our approach makes it possible to classify (and thus curate) biomedical papers automatically, potentially saving considerable time and resources. The system proved highly accurate and received an honourable mention in Task 1 of the KDD Cup 2002.
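The three-stage pipeline described above (preprocessing and cleaning, feature extraction from raw text, Naive Bayes classification) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: it assumes a multinomial Naive Bayes model over bag-of-words counts with add-one (Laplace) smoothing, a hypothetical stop-word list, hypothetical class labels `curate`/`skip`, and made-up training sentences. The abstract does not specify the actual features used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list (assumption; the paper's cleaning rules are not given here).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "was", "were", "is", "for", "to", "by", "with"}


def preprocess(text):
    """Lowercase, keep alphanumeric tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayesCurator:
    """Hypothetical multinomial Naive Bayes over word counts, add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)    # documents per class (for priors)
        self.word_counts = defaultdict(Counter)  # per-class token counts (features)
        self.vocab = set()
        for doc, label in zip(documents, labels):
            tokens = preprocess(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_posterior(self, text, label):
        """Unnormalized log P(label | text) = log P(label) + sum of log P(token | label)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for token in preprocess(text):
            if token in self.vocab:  # skip words never seen during training
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, text):
        """Return the class with the highest log posterior."""
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Made-up toy training data for illustration only.
train_docs = [
    "Northern blot analysis detected the transcript in embryos",
    "In situ hybridization revealed expression of the gene product",
    "We review the history of the field and open questions",
    "A computational model of population dynamics is proposed",
]
train_labels = ["curate", "curate", "skip", "skip"]
model = NaiveBayesCurator().fit(train_docs, train_labels)
print(model.predict("Western blot analysis detected expression of the protein"))  # → curate
```

Scores are accumulated in log space so that multiplying many small per-word probabilities does not underflow. Smoothing keeps a word that appears in only one class from driving the other class's probability to zero.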