JMIR Medical Informatics

Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study


Abstract

Background: Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, EHR data remain difficult to standardize because of nonidentical duplicates, typographical errors, and inconsistencies. To address this, standardization efforts have targeted both the collection of data in a standardized format and the curation of data already stored in EHRs. For clinical big data research, the data stored in EHRs should be standardized, starting with laboratory results, given their importance. However, most previous efforts have relied on labor-intensive manual methods.

Objective: We aimed to develop an automatic standardization method that removes noise from categorical laboratory data, groups the cleaned data, and maps them to standard terminology.

Methods: We developed a method called standardization algorithm for laboratory test–categorical result (SALT-C) that can process categorical laboratory data, such as pos +, 250 4+ (urinalysis results), and reddish (urinalysis color results). SALT-C consists of five steps. First, it applies data cleaning rules to categorical laboratory data. Second, it categorizes the cleaned data into 5 predefined groups (urine color, urine dipstick, blood type, presence-finding, and pathogenesis tests). Third, all data in each group are vectorized. Fourth, similarity is calculated between the vector of each data item and the vector of each value in the predefined value sets. Finally, the value closest to the data item is assigned.

Results: The performance of SALT-C was validated using 59,213,696 data points (167,938 unique values) generated over 23 years at a tertiary hospital. Excluding data whose original meaning could not be interpreted (eg, ** and _^), SALT-C mapped unique raw values to the correct reference value for each group with an accuracy of 97.6% (123/126; urine color tests), 97.5% (198/203; urine dipstick tests), 95% (53/56; blood type tests), 99.68% (162,291/162,805; presence-finding tests), and 99.61% (4643/4661; pathogenesis tests).

Conclusions: The proposed SALT-C standardized categorical laboratory test results with high reliability. SALT-C can benefit clinical big data research by reducing laborious manual standardization efforts.
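The five steps described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the cleaning rules, the vectorization scheme, the similarity measure, or the contents of the value sets. The sketch assumes character-bigram count vectors and cosine similarity, and every name in it (`REFERENCE_SETS`, `ABBREVIATIONS`, `standardize`, the `min_similarity` threshold) is hypothetical. The group of each result is assumed to be known in advance and is passed in by the caller.

```python
import math
import re
from collections import Counter

# Hypothetical value sets for two of the five groups (not taken from the paper).
REFERENCE_SETS = {
    "urine_color": ["yellow", "red", "brown", "colorless"],
    "urine_dipstick": ["negative", "trace", "positive 1+", "positive 2+",
                       "positive 3+", "positive 4+"],
}

# Hypothetical cleaning rule: expand common abbreviations.
ABBREVIATIONS = {"pos": "positive", "neg": "negative"}


def clean(raw: str) -> str:
    """Step 1: lowercase, drop noise characters, expand abbreviations."""
    text = re.sub(r"[^a-z0-9+\- ]", " ", raw.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())


def vectorize(text: str, n: int = 2) -> Counter:
    """Step 3: character n-gram counts (an assumed vectorization)."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Step 4: cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def standardize(raw: str, group: str, min_similarity: float = 0.2):
    """Steps 2 and 5: pick the group's value set and assign the closest value.

    Returns None when nothing is similar enough, so that uninterpretable
    inputs are set aside instead of being forced onto a reference value.
    """
    vec = vectorize(clean(raw))
    scored = [(cosine(vec, vectorize(ref)), ref) for ref in REFERENCE_SETS[group]]
    best_score, best_ref = max(scored)
    return best_ref if best_score >= min_similarity else None


if __name__ == "__main__":
    for raw, group in [("pos 4+", "urine_dipstick"), ("250 4+", "urine_dipstick"),
                       ("reddish", "urine_color"), ("**", "urine_dipstick")]:
        print(repr(raw), "->", standardize(raw, group))
```

The similarity threshold is a design choice of this sketch: without it, a nearest-value rule would map even meaningless strings such as ** onto some reference value.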
