JMIR Medical Informatics

Anomaly Detection Algorithm for Real-World Data and Evidence in Clinical Research: Implementation, Evaluation, and Validation Study

Abstract

Background: Statistical analysis, an integral part of evidence-based medicine, depends heavily on data quality, which is of critical importance in modern clinical research. Input data are at risk not only of being falsified or fabricated but also of being mishandled by investigators.

Objective: The urgent need to ensure the highest possible data quality has led to various auditing strategies designed to monitor clinical trials and to detect the errors of different origin that frequently occur in the field. The objective of this study was to describe a machine learning-based algorithm that detects anomalous patterns in data arising from carelessness, systematic error, or the intentional entry of fabricated values.

Methods: A specific electronic data capture (EDC) system used for data management in clinical registries is presented, including its architecture and data structure. The EDC system includes a machine learning-based algorithm designed to detect anomalous patterns in quantitative data. The algorithm combines clustering with a series of 7 distance metrics that determine the strength of an anomaly. Detection relies on thresholds for these metrics and on combinations of them; detection performance was evaluated and validated in experiments involving simulated anomalous data and real-world data.

Results: Five neuroscience-related clinical registries, all running on this EDC system, are presented. Two of them were selected for the evaluation experiments and also served to validate detection performance on an independent data set. The best-performing combination of distance metrics was Canberra, Manhattan, and Mahalanobis, whereas the Cosine and Chebyshev metrics were excluded from further analysis because they performed worst when used as single-metric classifiers.
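
A minimal Python sketch of the detection idea follows, assuming details that the abstract does not give: records are numeric vectors, clustering is plain k-means seeded with the first k records, each record is scored by its distance to its own cluster centroid under the three best-performing metrics (Canberra, Manhattan, Mahalanobis), each metric's threshold is an empirical quantile of its scores, and a record is flagged when at least two of the three metrics exceed their thresholds. The function names (`detect`, `kmeans`, and so on), the quantile and the voting rule are hypothetical illustrations, not the authors' implementation. Only the standard library is used, so the small matrix inversion needed for Mahalanobis distance is written out by hand.

```python
import math

# Hypothetical sketch. The abstract states that clustering is combined with
# distance metrics and thresholds, but not which clustering method, threshold
# values or combination rule are used; those choices below are assumptions.


def manhattan(x, c):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, c))


def canberra(x, c):
    """Canberra distance; coordinates where both values are 0 are skipped."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, c) if abs(a) + abs(b) > 0)


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def covariance(pts):
    """Sample covariance matrix (n - 1 denominator); needs >= 2 records."""
    n = len(pts)
    mean = [sum(col) / n for col in zip(*pts)]
    dim = len(mean)
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pts) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def mahalanobis(x, c, inv_cov):
    """sqrt((x - c)^T S^-1 (x - c)) with a precomputed inverse covariance."""
    d = [a - b for a, b in zip(x, c)]
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(0.0, q))  # clamp tiny negative rounding errors


def nearest(p, centroids):
    """Index of the centroid closest to p in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))


def kmeans(points, k, iters=50):
    """Lloyd's k-means, seeded with the first k records (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return [nearest(p, centroids) for p in points], centroids


def detect(points, k=2, quantile=0.9, min_votes=2):
    """Flag records whose distance to their own cluster centroid exceeds the
    per-metric quantile threshold for at least `min_votes` of the 3 metrics."""
    labels, centroids = kmeans(points, k)
    inv = {j: invert(covariance([p for p, lab in zip(points, labels) if lab == j]))
           for j in set(labels)}
    pairs = list(zip(points, labels))
    scores = {
        "canberra": [canberra(p, centroids[lab]) for p, lab in pairs],
        "manhattan": [manhattan(p, centroids[lab]) for p, lab in pairs],
        "mahalanobis": [mahalanobis(p, centroids[lab], inv[lab]) for p, lab in pairs],
    }
    thresholds = {m: sorted(s)[int(quantile * (len(s) - 1))] for m, s in scores.items()}
    votes = [sum(scores[m][i] > thresholds[m] for m in scores) for i in range(len(points))]
    return [v >= min_votes for v in votes]


if __name__ == "__main__":
    # Hypothetical records with two quantitative fields; the last one is an outlier.
    records = [(50, 50), (100, 100), (49, 50), (51, 50), (50, 49), (50, 51),
               (99, 100), (101, 100), (100, 99), (100, 101), (56, 56)]
    flags = detect(records)
    print([i for i, f in enumerate(flags) if f])  # [10]
```

On these eleven hypothetical records, only the last one (index 10) exceeds the thresholds, and it does so for all three metrics. Its Mahalanobis distance is only about 2.0, because the outlier also inflates its own cluster's covariance; relying on a single metric can therefore be fragile, which is one plausible reason to combine several.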

Conclusions: The experimental results demonstrate that the algorithm is general-purpose, can therefore be implemented in other EDC systems, and detects anomalous data with a sensitivity exceeding 85%.
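
Sensitivity here is the proportion of truly anomalous records that the detector flags, TP / (TP + FN). The short Python sketch below computes it from a detector's boolean flags and reference labels, such as those obtained by injecting simulated anomalies into a data set; the function name and the toy counts are hypothetical and are not results from the study.

```python
def sensitivity(flags, truth):
    """Share of truly anomalous records that were flagged: TP / (TP + FN)."""
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fn = sum(1 for f, t in zip(flags, truth) if t and not f)
    if tp + fn == 0:
        raise ValueError("reference labels contain no anomalies")
    return tp / (tp + fn)


if __name__ == "__main__":
    # Hypothetical evaluation: 20 injected anomalies among 100 records; the
    # detector catches 18 of them and also raises 3 false alarms.
    truth = [True] * 20 + [False] * 80
    flags = [True] * 18 + [False] * 2 + [True] * 3 + [False] * 77
    print(sensitivity(flags, truth))  # 0.9
```

The three false alarms do not lower sensitivity; they would appear in specificity or precision, which the abstract does not report.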