BMC Medical Informatics and Decision Making

Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning



Abstract

Eligibility criteria are the primary means of screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text with machine learning methods can improve recruitment efficiency and reduce the cost of clinical research. However, existing methods suffer from poor classification performance because eligibility criteria text data are complex and imbalanced.

An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models: Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal Loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features of different categories are easier to distinguish. Soft Voting is applied to produce the final classification of the ensemble model. The dataset comes from standard evaluation task 3 of the 5th China Health Information Processing Conference and contains 38,341 eligibility criteria texts in 44 categories.

Our ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 under a standard t-test, indicating that the improvement was statistically significant.

A model for classifying eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the ensemble model significantly improved classification performance. In addition, metric learning improved the word embedding representation, and focal loss reduced the impact of data imbalance on model performance.
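The abstract names Focal Loss and Soft Voting but does not give their formulas or settings. The following is a minimal sketch of the standard definitions of both, not the authors' implementation. The function names, the `gamma = 2.0` default, and the toy logits are assumptions made for illustration, and the metric learning step is omitted because the abstract does not say which objective was used.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    gamma = 2.0 is an assumed default, not a value reported in the abstract.
    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def soft_vote(model_probs: Sequence[Sequence[float]]) -> int:
    """Average class probabilities across models and return the argmax class."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


# Toy example: three hypothetical base models scoring one text over 3 classes.
probs_a = softmax([2.0, 1.0, 0.1])  # favours class 0
probs_b = softmax([0.5, 2.5, 0.3])  # strongly favours class 1
probs_c = softmax([1.8, 1.6, 0.2])  # narrowly favours class 0
ensemble_label = soft_vote([probs_a, probs_b, probs_c])
```

In this toy case two of the three models rank class 0 first, yet `ensemble_label` is 1, because soft voting weighs how confident each model is rather than counting top-1 votes. The `(1 - p_t)^gamma` factor in `focal_loss` shrinks the loss on examples the model already classifies confidently, which is how it shifts training effort toward hard and under-represented categories.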
