Journal: Bioinformatics

Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion

Abstract

Biomedical texts can typically be represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches to automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naive Bayes classifier trained on manually annotated data; it achieved 91.95% accuracy and an average F-score of 91.55%, significantly higher than the baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/.
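The abstract names two techniques: a multinomial naive Bayes classifier for assigning sentences to IMRAD categories, and a Kappa score for inter-annotator agreement. The following is a minimal, standard-library-only sketch of both, under stated assumptions. It is not the authors' system: the bag-of-words tokenizer, add-one (Laplace) smoothing, and the toy `TRAIN` sentences are all hypothetical illustrations, and because the abstract does not say which Kappa variant was used, the `cohen_kappa` helper assumes Cohen's kappa for two annotators.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy sentences for illustration only; NOT the paper's annotated corpus.
TRAIN = [
    ("Little is known about the role of this gene in cancer.", "Introduction"),
    ("Previous studies have suggested a link to tumor growth.", "Introduction"),
    ("Cells were cultured and RNA was extracted using a standard kit.", "Methods"),
    ("Samples were analyzed by PCR and sequenced.", "Methods"),
    ("Expression increased threefold in treated cells.", "Results"),
    ("We observed a significant increase in tumor size.", "Results"),
    ("These findings suggest that the gene may promote tumor growth.", "Discussion"),
    ("Our results are consistent with previous reports.", "Discussion"),
]


def tokenize(sentence):
    """Lowercased alphanumeric bag-of-words tokens (an assumed feature set)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


class MultinomialNB:
    """Multinomial naive Bayes over word counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        label_counts = Counter(labels)
        # log P(class): fraction of training sentences carrying each label.
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        # log P(word | class), smoothed over the training vocabulary.
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, sentence):
        # Repeated tokens count repeatedly (multinomial model); unseen words are skipped.
        tokens = [t for t in tokenize(sentence) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators (assumes expected agreement < 1)."""
    n = len(labels_a)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    shared = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[k] * counts_b[k] for k in shared) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    model = MultinomialNB().fit([s for s, _ in TRAIN], [label for _, label in TRAIN])
    print(model.predict("RNA was extracted and analyzed by PCR."))
    print(cohen_kappa(["I", "M", "M", "R", "D", "D"], ["I", "M", "R", "R", "D", "M"]))
```

Scores are summed in log space so that long sentences do not underflow to zero probability. Kappa corrects raw agreement for chance: if two annotators agree on 4 of 6 sentences (0.667) and chance agreement is 0.25, kappa is (0.667 - 0.25) / (1 - 0.25) ≈ 0.556, which is why a kappa value sits below the corresponding raw agreement percentage.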