JMIR Medical Informatics

Utilization of Electronic Medical Records and Biomedical Literature to Support the Diagnosis of Rare Diseases Using Data Fusion and Collaborative Filtering Approaches

Abstract

Background: In the United States, a rare disease is defined as one affecting no more than 200,000 patients at any given time. Patients with rare diseases are often misdiagnosed or left undiagnosed, possibly because clinical practitioners lack sufficient knowledge of, or experience with, the specific rare disease. As the volume of electronically accessible medical data grows exponentially, a large amount of information on thousands of rare diseases and their potentially associated diagnostic features is buried in electronic medical records (EMRs) and the medical literature.

Objective: This study aimed to leverage information contained in heterogeneous datasets to assist rare disease diagnosis. Phenotypic information about patients, available in EMRs and the biomedical literature, could be fully leveraged to speed up the diagnosis of diseases.

Methods: In our previous work, we advanced the use of a collaborative filtering recommendation system to support rare disease diagnostic decision making based on phenotypes derived solely from EMR data. However, the influence of applying collaborative filtering to heterogeneous data was not discussed, which is an essential question when facing large volumes of data from various resources. In this study, to further investigate the performance of collaborative filtering on heterogeneous datasets, we studied EMR data generated at Mayo Clinic as well as published article abstracts retrieved from the Semantic MEDLINE Database. Specifically, we designed different strategies for fusing data from these heterogeneous resources and integrated them with the collaborative filtering model.

Results: We evaluated the performance of the proposed system using characterizations derived from various combinations of EMR data and literature, as well as from EMR data alone.
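
The fusion and recommendation steps described in the Methods can be illustrated with a minimal sketch. It assumes each profile is a set of phenotype terms, that literature-derived disease profiles are fused by appending them as extra "pseudo-patient" rows beside EMR patients (a hypothetical strategy, not necessarily one of the study's), and that similarity is the log-likelihood ratio score in the form used by Apache Mahout's LogLikelihoodSimilarity, 1 - 1/(1 + LLR), followed by k-nearest-neighbor scoring. The names `fuse_profiles`, `recommend_diseases`, `n_terms`, and the toy data are illustrative assumptions.

```python
import math
from collections import defaultdict


def x_log_x(x):
    # x * ln(x), using the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy term used in Dunning's G^2 formulation
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # Dunning's G^2 statistic for a 2x2 contingency table of term counts
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off


def llr_similarity(a, b, n_terms):
    # a, b: sets of phenotype terms; n_terms: size of the phenotype vocabulary
    # (must be at least len(a | b) so the "neither" cell is non-negative)
    both = len(a & b)
    k11 = both                               # terms in both profiles
    k12 = len(b) - both                      # terms only in b
    k21 = len(a) - both                      # terms only in a
    k22 = n_terms - len(a) - len(b) + both   # terms in neither profile
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


def fuse_profiles(emr_rows, literature_rows, use_literature=True):
    # Hypothetical fusion strategy: literature-derived disease profiles are
    # appended as extra "pseudo-patient" rows next to the EMR patient rows.
    # Each row is a (phenotype_set, disease_label) pair.
    rows = list(emr_rows)
    if use_literature:
        rows.extend(literature_rows)
    return rows


def recommend_diseases(query, rows, n_terms, k=2):
    # User-based k-nearest-neighbor collaborative filtering: keep the k rows
    # most similar to the query, then rank their diseases by summed similarity.
    neighbors = sorted(
        ((llr_similarity(query, phenotypes, n_terms), disease)
         for phenotypes, disease in rows),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, disease in neighbors:
        scores[disease] += similarity
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the study.
    emr = [
        ({"seizure", "ataxia", "hypotonia"}, "Disease A"),
        ({"seizure", "hypotonia"}, "Disease A"),
        ({"rash", "fever", "arthralgia"}, "Disease B"),
    ]
    literature = [
        ({"ataxia", "nystagmus"}, "Disease A"),
        ({"rash", "arthralgia", "photosensitivity"}, "Disease B"),
    ]
    query = {"seizure", "ataxia"}
    print(recommend_diseases(query, fuse_profiles(emr, literature), n_terms=1000, k=2))
```

Passing `use_literature=False` reproduces the EMR-only setting, so the same recommender can be compared across fusion choices. The vocabulary size matters: with a realistically large `n_terms`, profiles that share no terms score near zero, whereas a tiny vocabulary can make the LLR reward disjoint profiles, since the statistic measures any departure from independence.

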
We extracted nearly 13 million EMRs from the patient cohort generated between 2010 and 2015 at Mayo Clinic and retrieved all article abstracts in the semistructured Semantic MEDLINE Database that were published through the end of 2016. We applied a collaborative filtering model and compared the performance of different similarity metrics. Log-likelihood ratio similarity combined with k-nearest neighbors on heterogeneous datasets showed the best performance in patient recommendation, with an area under the precision-recall curve (PRAUC) of 0.475 (string match), 0.511 (Systematized Nomenclature of Medicine [SNOMED] match), and 0.752 (Genetic and Rare Diseases Information Center [GARD] match). Log-likelihood ratio similarity also performed best in mean average precision, at 0.465 (string match), 0.5 (SNOMED match), and 0.749 (GARD match). We also evaluated rare disease prediction using the optimal algorithm: the macro-average F-measures for string, SNOMED, and GARD match were 0.32, 0.42, and 0.63, respectively.

Conclusions: This study demonstrated the potential of using heterogeneous datasets in a collaborative filtering model to support rare disease diagnosis. Beyond phenotype-based analysis, we plan in future work to further resolve the heterogeneity issue and reduce miscommunication between EMRs and the literature by mining genotypic information, with the goal of establishing a comprehensive disease-phenotype-gene network for rare disease diagnosis.
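
The ranking and classification metrics reported in the Results can be sketched as follows. This is a minimal illustration under stated assumptions: relevance is given as a list of booleans per query patient, where the matching criterion (string, SNOMED, or GARD) decides whether a recommended diagnosis counts as correct and is not modeled here; average precision is normalized by the number of correct items retrieved, which is one of several conventions; and average precision is used as a common estimator of the area under the precision-recall curve, since the abstract does not say how PRAUC was computed. All function names are hypothetical.

```python
def average_precision(ranked_relevance):
    # ranked_relevance: booleans in rank order, True where the recommended
    # item counts as a correct diagnosis under the chosen matching criterion.
    # Normalizes by the number of correct items retrieved (one common convention).
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cutoff
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    # Mean of per-query average precision over all query patients
    return sum(average_precision(r) for r in rankings) / len(rankings)


def macro_f1(true_labels, predicted_labels):
    # Unweighted mean of per-class F1, so rare classes count as much as common ones
    classes = set(true_labels) | set(predicted_labels)
    pairs = list(zip(true_labels, predicted_labels))
    f_scores = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f_scores.append(f1)
    return sum(f_scores) / len(f_scores)


if __name__ == "__main__":
    print(average_precision([True, False, True]))           # (1/1 + 2/3) / 2
    print(mean_average_precision([[True], [False, True]]))  # (1.0 + 0.5) / 2
    print(macro_f1(["A", "A", "B"], ["A", "B", "B"]))       # (2/3 + 2/3) / 2
```

Macro-averaging gives each disease equal weight regardless of how many patients have it, which suits a rare disease setting where per-class counts are small and uneven.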