A comparison of various software tools for dealing with missing data via imputation

Jose Cortinas Abrahantes; Cristina Sotto; Geert Molenberghs; Geert Vromman; Bart Bierinckx


Abstract

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation encompasses a broad range of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation of, for instance, posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations has frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may well lie in the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight into the performance of the different routines, procedures, and packages in this respect.
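To make the idea of judging imputations "at the level of the individual" concrete, the following is a minimal sketch and not the authors' simulation design. It assumes a single continuous outcome `y` linearly related to a fully observed covariate `x`, imposes missing-at-random missingness on `y` through a logistic function of `x`, and compares mean imputation with a single regression-based imputation by the root mean squared error between the imputed and the true (held-back) values. All function names, sample sizes, and coefficients are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def simulate(n=2000, seed=0):
    """Draw (x, y) pairs with y linear in x plus Gaussian noise (assumed model)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 + 1.5 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys, rng


def mar_mask(xs, rng):
    """MAR: the probability that y is missing depends only on the observed x."""
    return [rng.random() < 1.0 / (1.0 + math.exp(-x)) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(truth, imputed):
    """Root mean squared error between true and imputed individual values."""
    return math.sqrt(statistics.fmean([(t - i) ** 2 for t, i in zip(truth, imputed)]))


def compare(n=2000, seed=0):
    """Return the individual-level RMSE of mean and regression imputation."""
    xs, ys, rng = simulate(n, seed)
    missing = mar_mask(xs, rng)

    # Split into observed cases (used for fitting) and missing cases (to impute).
    obs_x = [x for x, m in zip(xs, missing) if not m]
    obs_y = [y for y, m in zip(ys, missing) if not m]
    mis_x = [x for x, m in zip(xs, missing) if m]
    truth = [y for y, m in zip(ys, missing) if m]  # known only because we simulated

    mean_y = statistics.fmean(obs_y)
    a, b = fit_line(obs_x, obs_y)
    return {
        "mean": rmse(truth, [mean_y] * len(truth)),
        "regression": rmse(truth, [a + b * x for x in mis_x]),
    }


if __name__ == "__main__":
    for method, error in compare().items():
        print(f"{method:>10}: RMSE of imputed values = {error:.3f}")
```

Under this assumed set-up the regression-based imputation tracks the individual values more closely than the observed mean, because the missingness depends on `x` and the mean ignores it. A missing-not-at-random variant could be sketched by letting the mask depend on `y` itself instead of `x`.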


Bibliographic details

Source
Journal of Statistical Computation and Simulation, 2011, issue 12, pp. 1653-1675 (23 pages)
Authors
Jose Cortinas Abrahantes; Cristina Sotto; Geert Molenberghs; Geert Vromman; Bart Bierinckx
Author affiliations

Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Universiteit Hasselt, Agoralaan I, B-3590 Diepenbeek, Belgium

Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Universiteit Hasselt, Agoralaan I, B-3590 Diepenbeek, Belgium; School of Statistics, University of the Philippines, Diliman, Quezon City, Philippines

Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Universiteit Hasselt, Agoralaan I, B-3590 Diepenbeek, Belgium; Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Katholieke Universiteit Leuven, Kapucijnenvoer 35, B-3000 Leuven, Belgium

IM Associates BVBA, Sales and Marketing Effectiveness, Brusselsesteenweg 52, B-3000 Leuven, Belgium

IM Associates BVBA, Sales and Marketing Effectiveness, Brusselsesteenweg 52, B-3000 Leuven, Belgium

Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language of text: English
Keywords
multiple imputation; missing data; missing at random; missing not at random; random forest;


