
Creating longitudinal datasets and cleaning existing data identifiers in a cystic fibrosis registry using a novel Bayesian probabilistic approach from astronomy

Abstract

Patient registry data are commonly collected as annual snapshots that need to be amalgamated to understand the longitudinal progress of each patient. However, when longitudinal data are collated from patients living in different countries, patient identifiers may change or may be unavailable for legal reasons. Here, we apply astronomical statistical matching techniques to link individual patient records; the approach can be used where identifiers are absent or to validate uncertain identifiers. We adopt a Bayesian model framework used for probabilistically linking records in astronomy, adapt it, and validate it on blinded, annually collected data: a high-quality (Danish) subset of the data held in the European Cystic Fibrosis Society Patient Registry (ECFSPR). Our initial experiments achieved a precision of 0.990 at a recall of 0.987. However, detailed investigation of the discrepancies uncovered typing errors in 27 of the identifiers in the original Danish subset. After fixing these errors to create a new gold standard, our algorithm correctly linked individual records across years, achieving a precision of 0.997 at a recall of 0.987 without recourse to identifiers. Our Bayesian framework provides the probability that a pair of records belongs to the same patient. Unlike other record linkage approaches, our algorithm can also use physical models, such as body mass index curves, as prior information for record linkage. We have shown that our framework can create longitudinal samples where none existed and can validate pre-existing patient identifiers. We have demonstrated that, in this specific case, the automated approach is better than the existing identifiers.
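As an illustration of the two quantities the abstract reports, the following is a toy sketch and not the authors' model. It assumes a single numeric attribute per record (for example, height), Gaussian scatter for both the "same patient" and "different patient" hypotheses, and a user-supplied prior; the function names, parameters, and numbers are all hypothetical. It shows how Bayes' theorem turns a likelihood ratio into a match probability for a pair of records, and how precision and recall are computed against a gold standard of true links.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def match_probability(diff, sd_match, sd_nonmatch, prior_match):
    """Posterior probability that two records belong to the same patient.

    diff        -- difference in one attribute between the two records
    sd_match    -- assumed scatter of diff if the records are the same patient
    sd_nonmatch -- assumed (broader) scatter of diff for unrelated patients
    prior_match -- assumed prior probability that a random pair is a match
    """
    like_match = gaussian_pdf(diff, 0.0, sd_match)
    like_nonmatch = gaussian_pdf(diff, 0.0, sd_nonmatch)
    numerator = like_match * prior_match
    return numerator / (numerator + like_nonmatch * (1.0 - prior_match))


def precision_recall(predicted_links, true_links):
    """Precision and recall of predicted record pairs against a gold standard."""
    true_positives = len(predicted_links & true_links)
    precision = true_positives / len(predicted_links) if predicted_links else 0.0
    recall = true_positives / len(true_links) if true_links else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical pair: attribute differs by 0.5 units between two snapshots.
    print(match_probability(0.5, sd_match=1.0, sd_nonmatch=10.0, prior_match=0.5))
    # Hypothetical links expressed as (record_id_year1, record_id_year2) pairs.
    predicted = {(1, 2), (3, 4), (5, 6)}
    truth = {(1, 2), (3, 4), (7, 8)}
    print(precision_recall(predicted, truth))
```

In a sketch like this, a physical model such as a body mass index growth curve would enter by replacing the zero-mean assumption for matched pairs with the change the curve predicts between two annual snapshots.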

Bibliographic details

Journal: other
Authors: Peter Donald Hurley; Seb Oliver; Anil Mehta
Volume (issue): 13 (7)
Pages: e0199815
Total pages: 15
Original format: PDF

