
Identifying Key Sentences for Precision Oncology Using Semi-Supervised Learning



Abstract

We present a machine learning pipeline that identifies key sentences in abstracts of oncological articles to aid evidence-based medicine. This problem is characterized by the lack of gold-standard datasets, data imbalance, and thematic differences between the available silver-standard corpora. Additionally, the available training and target data differ in domain (professional summaries vs. sentences in abstracts). This makes supervised machine learning inapplicable. We propose two semi-supervised machine learning approaches. To mitigate difficulties arising from heterogeneous data sources, overcome data imbalance, and create reliable training data, we propose transductive learning from positive and unlabelled data (PU Learning). To obtain a realistic classification model, we propose using abstracts summarised in relevant sentences as unlabelled examples through Self-Training. The best model achieves 84% accuracy and an F1 score of 0.84 on our dataset.
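The Self-Training step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier, features, or confidence threshold, so everything here is an assumption made for illustration: the toy `NaiveBayes` class, the `self_train` helper, the `threshold` and `max_rounds` parameters, and the example sentences are all hypothetical and are not the authors' implementation. Label 1 stands for "key sentence" and label 0 for "other".

```python
import math
from collections import Counter


def tokenize(sentence):
    # Whitespace tokenisation; a real pipeline would use a proper tokenizer.
    return sentence.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.counts[label].update(tokenize(sentence))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, sentence):
        total = sum(self.priors.values())
        log_scores = {}
        for c in self.classes:
            denom = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.priors[c] / total)
            for tok in tokenize(sentence):
                if tok in self.vocab:  # ignore out-of-vocabulary tokens
                    score += math.log((self.counts[c][tok] + 1) / denom)
            log_scores[c] = score
        # Normalise log scores into probabilities in a numerically stable way.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


def self_train(labelled, unlabelled, threshold=0.9, max_rounds=5):
    """Pseudo-label confident unlabelled sentences and retrain on them.

    `threshold` and `max_rounds` are assumed values, not taken from the paper.
    Returns the final model and the sentences that were never pseudo-labelled.
    """
    sentences = [s for s, _ in labelled]
    labels = [y for _, y in labelled]
    pool = list(unlabelled)
    model = NaiveBayes().fit(sentences, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for s in pool:
            label, p = max(model.predict_proba(s).items(), key=lambda kv: kv[1])
            if p >= threshold:
                sentences.append(s)
                labels.append(label)
                added += 1
            else:
                remaining.append(s)
        pool = remaining
        if added == 0:  # nothing confident enough: stop early
            break
        model = NaiveBayes().fit(sentences, labels)
    return model, pool


# Toy data, invented for this sketch only.
labelled = [
    ("mutation confers resistance to therapy", 1),
    ("variant predicts response to treatment", 1),
    ("patients were recruited from two centres", 0),
    ("samples were stored at low temperature", 0),
]
unlabelled = [
    "mutation predicts resistance to treatment",
    "patients samples were stored",
    "the study was conducted",
]
model, leftover = self_train(labelled, unlabelled)
print(leftover)
print(model.predict_proba("mutation predicts response"))
```

In this sketch, sentences the model cannot classify confidently stay in the unlabelled pool and are not forced into a class, which limits the noise that pseudo-labels add to the training set. A PU Learning step, as described in the abstract, would come before this to build the initial labelled set from positive and unlabelled data.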
