BMC Medical Informatics and Decision Making

An automated data cleaning method for Electronic Health Records by incorporating clinical knowledge



Abstract

The use of Electronic Health Records (EHR) data in clinical research is increasing rapidly, but the abundance of data resources raises the challenge of data cleaning. Cleaning the data automatically can save time. However, automated data cleaning tools built for other domains often process all variables uniformly, so they serve clinical data poorly, where variable-specific information must be considered. This paper proposes an automated data cleaning method for EHR data that takes clinical knowledge into account. We used EHR data collected from primary care in Flanders, Belgium, during 1994–2015. We constructed a Clinical Knowledge Database to store all the variable-specific information necessary for data cleaning. We applied fuzzy search to automatically detect and replace misspelled units, and performed unit conversion following the variable-specific conversion formulas. The numeric values were then corrected and outliers were detected using the clinical knowledge. In total, 52 clinical variables were cleaned, and the percentage of non-missing values (completeness) and the percentage of values within the normal range (correctness) before and after cleaning were compared. All variables were 100% complete before data cleaning. After cleaning, 42 variables showed a drop of less than 1% in completeness, and 9 variables declined by 1–10%. Only 1 variable experienced a large decline in completeness (13.36%). All variables had more than 50% of values within the normal range after cleaning, and 43 of them had more than 70%. We propose a general method for clinical variables that achieves a high degree of automation and is capable of handling large-scale data. The method greatly improves the efficiency of data cleaning and removes technical barriers for non-technical users.
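
The pipeline described in the abstract (fuzzy unit correction, variable-specific unit conversion, outlier removal, then completeness and correctness) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `KNOWLEDGE` dictionary is a hypothetical stand-in for one entry of the Clinical Knowledge Database, and the conversion factors, ranges, and function names are illustrative assumptions. Fuzzy matching uses the standard library's `difflib.get_close_matches`, which may differ from the fuzzy search used in the paper. Correctness is computed over non-missing values, which is also an assumption.

```python
from difflib import get_close_matches

# Hypothetical knowledge entry; all numbers are illustrative assumptions.
KNOWLEDGE = {
    "glucose": {
        # Factor that converts a value in the given unit to mg/dL.
        "conversions": {"mg/dL": 1.0, "mmol/L": 18.0, "g/L": 100.0},
        "plausible_range": (10.0, 1000.0),  # outside this: treated as an error
        "normal_range": (70.0, 100.0),      # used for the correctness metric
    }
}


def fix_unit(raw_unit, known_units, cutoff=0.6):
    """Return the closest known unit spelling, or None if nothing is close."""
    lookup = {u.lower(): u for u in known_units}  # case-insensitive matching
    hits = get_close_matches(raw_unit.strip().lower(), list(lookup),
                             n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


def clean_value(variable, value, raw_unit):
    """Convert to the standard unit; return None if the value is unusable."""
    spec = KNOWLEDGE[variable]
    unit = fix_unit(raw_unit, spec["conversions"])
    if unit is None:
        return None  # unit could not be recognised
    converted = value * spec["conversions"][unit]
    low, high = spec["plausible_range"]
    return converted if low <= converted <= high else None  # drop outliers


def completeness(values):
    """Percentage of non-missing values."""
    return 100.0 * sum(v is not None for v in values) / len(values)


def correctness(variable, values):
    """Percentage of non-missing values that fall within the normal range."""
    low, high = KNOWLEDGE[variable]["normal_range"]
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return 100.0 * sum(low <= v <= high for v in present) / len(present)


# Made-up records: (value, unit as typed by the clinician).
records = [(5.0, "mmol/l"), (90.0, "mg/dl"), (0.9, "g/L"),
           (9000.0, "mg/dL"), (5.5, "mmoll/L")]
cleaned = [clean_value("glucose", value, unit) for value, unit in records]

print(completeness(cleaned))            # 80.0: the 9000 mg/dL record is dropped
print(correctness("glucose", cleaned))  # 100.0
```

Keeping every variable-specific rule in one lookup structure mirrors the role the abstract assigns to the Clinical Knowledge Database: the cleaning functions stay generic, and supporting another variable only requires another entry.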
