Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach

Erica Tavazzi; Sebastian Daberdaku; Rosario Vasta; Andrea Calvo; Adriano Chiò; Barbara Di Camillo

Abstract

Clinical registers are an invaluable resource for data-driven medical decision making. Accurate machine learning and data mining approaches applied to these data can lead to faster diagnosis, the definition of tailored interventions, and improved outcome prediction. A typical issue when implementing such approaches is the almost unavoidable presence of missing values in the collected data.

In this work, we propose an imputation algorithm based on a mutual information-weighted k-nearest neighbours approach that can handle the simultaneous presence of missing information in different types of variables. We developed and validated the method on a clinical register comprising the information collected over subsequent screening visits of a cohort of patients affected by amyotrophic lateral sclerosis. For each subject with missing data to be imputed, we create a feature vector from the information collected over his or her first three months of visits. This vector is used as a sample in a k-nearest neighbours procedure to select, among the other patients, those with the most similar temporal evolution of the disease. An ad hoc similarity metric was implemented for the sample comparison; it handles the different nature of the data and the presence of multiple missing values, and it includes the cross-information among features captured by the mutual information statistic.

We validated the proposed imputation method on an independent test set, comparing its performance with that of three state-of-the-art competitors; the proposed method performed better. We further assessed the validity of our algorithm by comparing the performance of a survival classifier built on the data imputed with our method against one built on the data imputed with the best-performing competitor.

Imputation of missing data is a crucial, and often mandatory, step when working with real-world datasets. The algorithm proposed in this work could effectively impute an amyotrophic lateral sclerosis clinical dataset by handling the temporal and mixed-type nature of the data and by exploiting the cross-information among features. We also showed how the imputation quality can affect a machine learning task.
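The general idea of a mutual information-weighted k-nearest neighbours imputation over mixed-type data can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity metric, the mutual information estimator, or how k is adapted, so everything below is an assumption made for illustration. The function names (`discretize`, `mutual_information`, `weighted_distance`, `impute`), the equal-width binning of continuous features, the Gower-style per-feature distance, the fixed `k`, and the mean/mode aggregation are all hypothetical choices. Missing values are represented as `None`, and each row stands for one patient's feature vector (for dynamic features, this would be the flattened measurements from the first visits).

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Equal-width binning of a continuous column (assumed choice); None stays None."""
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant columns
    return [None if v is None else min(int((v - lo) / width), bins - 1)
            for v in values]


def mutual_information(xs, ys):
    """Plug-in mutual information (nats) over pairs where both values are observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = sum((c / n) * math.log(c * n / (px[x] * py[y]))
             for (x, y), c in pxy.items())
    return max(0.0, mi)  # clip tiny negative rounding errors


def weighted_distance(a, b, weights, is_categorical, ranges):
    """Weighted Gower-style distance using only features observed in both rows."""
    num = den = 0.0
    for j, w in enumerate(weights):
        if w == 0 or a[j] is None or b[j] is None:
            continue
        if is_categorical[j]:
            d = 0.0 if a[j] == b[j] else 1.0
        else:
            d = abs(a[j] - b[j]) / ranges[j] if ranges[j] else 0.0
        num += w * d
        den += w
    return num / den if den else math.inf  # no shared informative feature


def impute(data, row, col, is_categorical, k=3, bins=3):
    """Impute data[row][col] from the k nearest rows that have `col` observed."""
    n_features = len(data[0])
    columns = [[r[j] for r in data] for j in range(n_features)]
    discrete = [c if is_categorical[j] else discretize(c, bins)
                for j, c in enumerate(columns)]
    # Weight each feature by its mutual information with the target feature.
    weights = [0.0 if j == col else mutual_information(discrete[j], discrete[col])
               for j in range(n_features)]
    ranges = []
    for j, c in enumerate(columns):
        observed = [v for v in c if v is not None]
        ranges.append(0.0 if is_categorical[j] else max(observed) - min(observed))
    donors = sorted(
        (weighted_distance(data[row], r, weights, is_categorical, ranges), i)
        for i, r in enumerate(data) if i != row and r[col] is not None
    )[:k]
    values = [data[i][col] for _, i in donors]
    if not values:
        return None
    if is_categorical[col]:
        return Counter(values).most_common(1)[0][0]  # mode for categorical targets
    return sum(values) / len(values)  # mean for continuous targets


# Toy example with made-up values: one continuous and two categorical features.
toy = [
    [1.0, "a", "x"], [1.2, "a", "x"], [0.9, "a", "x"],
    [5.0, "b", "y"], [5.5, "b", "y"], [5.2, "b", "y"],
    [1.1, "a", None],
]
print(impute(toy, row=6, col=2, is_categorical=[False, True, True]))
```

Weighting by mutual information with the target feature means that features carrying more information about the missing variable dominate the neighbour search, while features missing in either row are skipped instead of being filled in first.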

Bibliographic record

Source: BMC Medical Informatics and Decision Making, 2020, Issue 5, 23 pages
Authors: Erica Tavazzi; Sebastian Daberdaku; Rosario Vasta; Andrea Calvo; Adriano Chiò; Barbara Di Camillo
Original format: PDF
Keywords: Imputation; Missing data; K-nearest neighbours; Mutual information; Naïve Bayes; Clinical datasets; Amyotrophic lateral sclerosis
