
A comparison of various software tools for dealing with missing data via imputation

Abstract

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation encompasses a broad range of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimating, for instance, posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementations of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluations of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may well lie in the quality of the imputed values at the level of the individual, an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight into the performance of the different routines, procedures, and packages in this respect.
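To make the idea of individual-level imputation quality concrete, here is a minimal sketch in Python. It is not the procedure or software evaluated in the paper: the data-generating model, the logistic missingness rule, the function names, and the use of RMSE as the quality measure are all assumptions chosen for illustration, and both methods shown are single (deterministic) imputations rather than multiple imputation. It simulates a variable that is missing at random given a fully observed covariate, imputes it by the observed mean and by a simple regression on the covariate, and compares each imputed value with the true value that was hidden.

```python
import math
import random
import statistics


def simulate(n=500, seed=0):
    """Simulate (x, y) pairs and a missing-at-random indicator for y.

    Assumed toy model: y = 2x + noise. The chance that y is missing
    depends only on the always-observed x, which is what makes it MAR.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
    missing = [rng.random() < 1.0 / (1.0 + math.exp(-xi)) for xi in x]
    return x, y, missing


def mean_impute(y, missing):
    """Replace every missing y with the mean of the observed y values."""
    observed = [yi for yi, m in zip(y, missing) if not m]
    mu = statistics.fmean(observed)
    return [mu for m in missing if m]


def regression_impute(x, y, missing):
    """Replace every missing y with its least-squares prediction from x."""
    obs = [(xi, yi) for xi, yi, m in zip(x, y, missing) if not m]
    mx = statistics.fmean([p[0] for p in obs])
    my = statistics.fmean([p[1] for p in obs])
    sxy = sum((xi - mx) * (yi - my) for xi, yi in obs)
    sxx = sum((xi - mx) ** 2 for xi, _ in obs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * xi for xi, m in zip(x, missing) if m]


def rmse(imputed, truth):
    """Root mean squared error between imputed and true values."""
    return math.sqrt(statistics.fmean([(a - b) ** 2 for a, b in zip(imputed, truth)]))


x, y, missing = simulate()
# True values that were hidden; only available because the data are simulated.
truth = [yi for yi, m in zip(y, missing) if m]

print(f"missing values: {len(truth)} of {len(y)}")
print(f"RMSE, mean imputation:       {rmse(mean_impute(y, missing), truth):.3f}")
print(f"RMSE, regression imputation: {rmse(regression_impute(x, y, missing), truth):.3f}")
```

In this toy setting the regression imputation tracks the hidden values more closely than the mean, because the mean ignores the covariate that drives both the outcome and the missingness. Under a missing-not-at-random mechanism, where missingness depends on the unobserved value itself, neither method would be expected to recover individual values as well.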

Bibliographic details

  • Source
    Journal of Statistical Computation and Simulation | 2011, Issue 12 | pp. 1653-1675 | 23 pages
  • Author affiliations

    Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Universiteit Hasselt, Agoralaan I, B-3590 Diepenbeek, Belgium;

    Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Universiteit Hasselt, Agoralaan I, B-3590 Diepenbeek, Belgium, School of Statistics, University of the Philippines, Diliman, Quezon City, Philippines;

    Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Universiteit Hasselt, Agoralaan I, B-3590 Diepenbeek, Belgium, Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Katholieke Universiteit Leuven, Kapucijnenvoer 35, B-3000 Leuven, Belgium;

    IM Associates BVBA, Sales and Marketing Effectiveness, Brusselsesteenweg 52, B-3000 Leuven, Belgium;

    IM Associates BVBA, Sales and Marketing Effectiveness, Brusselsesteenweg 52, B-3000 Leuven, Belgium;

  • Indexed in: Science Citation Index (SCI), USA;
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    multiple imputation; missing data; missing at random; missing not at random; random forest;

