An automated data cleaning method for Electronic Health Records by incorporating clinical knowledge

Shi Xi; Prins Charlotte; Van Pottelbergh Gijs; Mamouris Pavlos; Vaes Bert; De Moor Bart



Abstract

The use of Electronic Health Records (EHR) data in clinical research is increasing rapidly, but the abundance of data resources raises the challenge of data cleaning. Automating the cleaning can save time. However, automated data cleaning tools built for other domains often process all variables uniformly, so they serve clinical data poorly, where variable-specific information needs to be considered. This paper proposes an automated data cleaning method for EHR data that takes clinical knowledge into consideration. We used EHR data collected from primary care in Flanders, Belgium during 1994–2015. We constructed a Clinical Knowledge Database to store all the variable-specific information necessary for data cleaning. We applied fuzzy search to automatically detect and replace misspelled units, and performed unit conversion following the variable-specific conversion formula. The numeric values were then corrected and outliers were detected using the clinical knowledge. In total, 52 clinical variables were cleaned, and the percentage of non-missing values (completeness) and the percentage of values within the normal range (correctness) before and after the cleaning process were compared. All variables were 100% complete before data cleaning. After cleaning, completeness dropped by less than 1% for 42 variables and by 1–10% for 9 variables. Only 1 variable experienced a large decline in completeness (13.36%). All variables had more than 50% of values within the normal range after cleaning, and 43 of them had a percentage higher than 70%. We propose a general method for clinical variables that achieves a high degree of automation and can handle large-scale data. The method greatly improved the efficiency of data cleaning and removed technical barriers for non-technical users.
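The pipeline described in the abstract (fuzzy matching of misspelled units, variable-specific unit conversion, outlier removal, then completeness and correctness scores) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `KNOWLEDGE` dictionary is a hypothetical stand-in for their Clinical Knowledge Database, the glucose factor and ranges are illustrative values only, `difflib` is swapped in for whatever fuzzy search the paper used, and computing correctness over non-missing values is an assumption.

```python
from difflib import get_close_matches

# Hypothetical stand-in for the Clinical Knowledge Database (illustrative values).
KNOWLEDGE = {
    "glucose": {
        "unit": "mg/dL",                # standard unit for this variable
        "convert": {"mmol/L": 18.0},    # approximate factor: mmol/L -> mg/dL
        "normal": (70.0, 100.0),        # assumed normal range, in mg/dL
        "plausible": (10.0, 1000.0),    # values outside are treated as outliers
    },
}


def normalize_unit(raw, known_units, cutoff=0.6):
    """Map a possibly misspelled unit to the closest known unit, or None."""
    lookup = {u.lower(): u for u in known_units}
    match = get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None


def clean_value(variable, value, raw_unit):
    """Return the value in the standard unit, or None if it cannot be trusted."""
    spec = KNOWLEDGE[variable]
    unit = normalize_unit(raw_unit, [spec["unit"], *spec["convert"]])
    if unit is None:
        return None                      # unit could not be recognised
    if unit != spec["unit"]:
        value = value * spec["convert"][unit]
    low, high = spec["plausible"]
    return value if low <= value <= high else None   # drop outliers


def completeness(values):
    """Percentage of non-missing values."""
    return 100.0 * sum(v is not None for v in values) / len(values)


def correctness(variable, values):
    """Percentage of non-missing values inside the normal range (assumed definition)."""
    low, high = KNOWLEDGE[variable]["normal"]
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return 100.0 * sum(low <= v <= high for v in present) / len(present)


# Made-up records: a lowercase unit, a misspelled unit, an outlier, an unknown unit.
records = [(90, "mg/dl"), (5.0, "mmoll"), (5000, "mg/dL"), (95, "xyz")]
cleaned = [clean_value("glucose", v, u) for v, u in records]
print(cleaned, completeness(cleaned), correctness("glucose", cleaned))
```

In this sketch, cleaning can only lower completeness, because unrecognised units and outliers become missing values. That matches the abstract's report of 100% completeness before cleaning and small drops afterwards.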


Bibliographic details

Source: BMC Medical Informatics and Decision Making, 2021, Issue 1, 10 pages
Authors: Shi Xi; Prins Charlotte; Van Pottelbergh Gijs; Mamouris Pavlos; Vaes Bert; De Moor Bart
Original format: PDF
CLC classification: Medicine and health
Keywords: Data cleaning; Automated method; Clinical decision support

