Study on the semi-supervised learning-based patient similarity from heterogeneous electronic medical records

Wang Ni; Huang Yanqun; Liu Honglei; Zhang Zhiqiang; Wei Lan; Fei Xiaolu; Chen Hui


Abstract

A new learning-based patient similarity measurement was proposed to measure patient similarity from heterogeneous electronic medical record (EMR) data. We first calculated feature-level similarities according to each feature's attributes. A domain expert provided patient similarity scores for 30 randomly selected patients. These scores, together with the feature-level similarities for the same 30 patients, formed the labeled sample set from which a semi-supervised learning algorithm learned the patient-level similarities for all patients. We then used the k-nearest neighbor (kNN) classifier to predict four liver conditions and compared predictive performance in four different situations. We also compared the personalized kNN models with other machine learning models. Predictive performance was assessed by the area under the receiver operating characteristic curve (AUC), F1-score, and cross-entropy (CE) loss.

As the size of the random training samples increased, the kNN models that used the learned patient similarity to select near neighbors consistently outperformed those that used the Euclidean distance (all P values < 0.001). The kNN models that used the learned patient similarity to identify the top k nearest neighbors from the random training samples also had a better best performance than those using the Euclidean distance (AUC: 0.95 vs. 0.89; F1-score: 0.84 vs. 0.67; CE loss: 1.22 vs. 1.82). As the size of the similar training samples increased, where these samples consisted of the most similar samples as determined by the learned patient similarity, the performance of the kNN models that used the simple Euclidean distance to select near neighbors degraded gradually. When the roles of the Euclidean distance and the learned patient similarity in selecting the near neighbors and the similar training samples were exchanged, the performance of the kNN models gradually increased. These two kinds of kNN models had the same best performance: AUC 0.95, F1-score 0.84, and CE loss 1.22. Among the four reference models, the highest AUC and F1-score were 0.94 and 0.80, respectively, both lower than those of the simple and similarity-based kNN models.

This learning-based method opens an opportunity for similarity measurement based on heterogeneous EMR data and supports the secondary use of EMR data.
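To make the idea of selecting near neighbors by patient similarity rather than Euclidean distance concrete, the following is a minimal Python sketch. It is not the authors' method: the abstract does not specify the semi-supervised algorithm, so the patient-level similarity here is assumed to be a simple weighted average of feature-level similarities, with hypothetical weights standing in for whatever a learning step would produce. The feature schema, toy records, and labels (`condition_a`, `condition_b`) are also invented for illustration.

```python
from collections import Counter


def feature_similarity(a, b, kind, value_range=None):
    """Similarity in [0, 1] for a single feature.

    Numeric features use 1 - |a - b| / range (clipped at 0);
    categorical features use exact match. Both rules are assumptions.
    """
    if kind == "numeric":
        return 1.0 - min(abs(a - b) / value_range, 1.0)
    return 1.0 if a == b else 0.0


def patient_similarity(p, q, schema, weights):
    """Weighted average of feature-level similarities.

    `weights` are hypothetical placeholders for learned values.
    """
    score = sum(
        w * feature_similarity(x, y, kind, rng)
        for x, y, (kind, rng), w in zip(p, q, schema, weights)
    )
    return score / sum(weights)


def knn_predict(query, train, labels, schema, weights, k=3):
    """Majority vote among the k training patients most similar to `query`."""
    sims = [patient_similarity(query, p, schema, weights) for p in train]
    top = sorted(range(len(train)), key=lambda i: sims[i], reverse=True)[:k]
    votes = Counter(labels[i] for i in top)
    return votes.most_common(1)[0][0]


# Toy heterogeneous records: (age in years, sex, a lab value). Invented data.
schema = [("numeric", 60.0), ("categorical", None), ("numeric", 200.0)]
weights = [1.0, 1.0, 2.0]  # hypothetical feature weights
train = [
    (40, "F", 30.0),
    (42, "F", 35.0),
    (70, "M", 180.0),
    (68, "M", 160.0),
    (45, "M", 40.0),
]
labels = ["condition_a", "condition_a", "condition_b", "condition_b", "condition_a"]

query = (66, "M", 170.0)
prediction = knn_predict(query, train, labels, schema, weights, k=3)
print(prediction)
```

Because the similarity function handles numeric and categorical features separately, mixed-type records can be compared without first forcing every feature onto a common numeric scale, which a plain Euclidean distance would require.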

Bibliographic record

Source: BMC Medical Informatics and Decision Making, 2021, Issue 2, 13 pages
Authors: Wang Ni; Huang Yanqun; Liu Honglei; Zhang Zhiqiang; Wei Lan; Fei Xiaolu; Chen Hui
Original format: PDF
Chinese Library Classification: Medicine and health
Keywords: Patient similarity; Electronic medical records; Semi-supervised learning; k-nearest neighbors; Liver diseases

