Creating longitudinal datasets and cleaning existing data identifiers in a cystic fibrosis registry using a novel Bayesian probabilistic approach from astronomy

Abstract

Patient registry data are commonly collected as annual snapshots that must be amalgamated to follow the longitudinal progress of each patient. However, when longitudinal data are collated from patients living in different countries, patient identifiers can change or may be unavailable for legal reasons. Here, we apply astronomical statistical matching techniques to link individual patient records; the approach can be used where identifiers are absent or to validate uncertain identifiers. We adopt a Bayesian model framework used for probabilistically linking records in astronomy, adapt it, and validate it on blinded, annually collected data: a high-quality (Danish) subset of the data held in the European Cystic Fibrosis Society Patient Registry (ECFSPR). Our initial experiments achieved a precision of 0.990 at a recall of 0.987. However, detailed investigation of the discrepancies uncovered typing errors in 27 of the identifiers in the original Danish subset. After these errors were fixed to create a new gold standard, our algorithm correctly linked individual records across years without recourse to identifiers, achieving a precision of 0.997 at a recall of 0.987. Our Bayesian framework provides the probability that a pair of records belongs to the same patient. Unlike other record linkage approaches, our algorithm can also use physical models, such as body mass index curves, as prior information for record linkage. We have shown that our framework can create longitudinal samples where none existed and validate pre-existing patient identifiers. We have demonstrated that, in this specific case, this automated approach is better than the existing identifiers.
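
To make the idea of a pairwise match probability concrete, the following is a minimal Python sketch and not the authors' model. It assumes a single hypothetical feature (the difference in one measured quantity between two annual records), Gaussian likelihoods under the "same patient" and "different patient" hypotheses, and illustrative values for the two widths and the prior. The names `match_probability` and `precision_recall` are invented for this illustration. The second function shows how precision and recall of a set of linked pairs could be scored against a gold standard.

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def match_probability(diff, sigma_match, sigma_nonmatch, prior):
    """Posterior probability that two records belong to the same patient.

    diff           -- difference in the chosen feature between the two records
    sigma_match    -- assumed spread of diff for true matches (illustrative)
    sigma_nonmatch -- assumed spread of diff for non-matches (illustrative)
    prior          -- assumed prior probability of a match, strictly in (0, 1)
    """
    # Bayes factor: how much more likely the data are under "same patient".
    bayes_factor = gaussian_pdf(diff, sigma_match) / gaussian_pdf(diff, sigma_nonmatch)
    # Posterior odds = Bayes factor * prior odds.
    posterior_odds = bayes_factor * prior / (1.0 - prior)
    return posterior_odds / (1.0 + posterior_odds)


def precision_recall(predicted_pairs, true_pairs):
    """Precision and recall of predicted record links against a gold standard."""
    predicted = set(predicted_pairs)
    truth = set(true_pairs)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

With these assumed inputs, a small difference between two records pushes the posterior towards 1 and a large one pushes it towards 0. A real linkage would combine several features and, as the abstract notes, could take its prior from a physical model such as a body mass index curve.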