Journal: Statistics and Computing

MIMCA: multiple imputation for categorical variables with multiple correspondence analysis

Abstract

We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Because MCA reduces dimensionality, multiple imputation using MCA (MIMCA) requires estimating only a small number of parameters. This allows users to impute a wide range of data sets. In particular, a large number of categories per variable, a large number of variables, or a small number of individuals is not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest work on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main-effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage of being substantially less time-consuming than the other multiple imputation methods on high-dimensional data sets.
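The abstract describes MIMCA only at a high level: impute with an MCA-based model, and reflect parameter uncertainty with a non-parametric bootstrap. The sketch below illustrates those two ingredients, assuming NumPy. It is not the authors' implementation. The function names (`to_indicator`, `regularized_mca_impute`, `mimca_sketch`) are hypothetical. Several choices are simplifying assumptions: the shrinkage rule for the singular values, expressing the bootstrap as multinomial row weights, and drawing each imputed category from the clipped, renormalized "fuzzy" indicator values.

```python
# Hypothetical, simplified sketch of MCA-based multiple imputation.
# Not the reference MIMCA implementation; shrinkage and bootstrap details are assumptions.
import numpy as np


def to_indicator(data):
    """Code rows of category labels (None = missing) as an indicator matrix.

    A missing entry becomes NaN in every column of its variable's block.
    """
    n_vars = len(data[0])
    levels = [sorted({row[j] for row in data if row[j] is not None})
              for j in range(n_vars)]
    blocks, start = [], 0
    for lv in levels:
        blocks.append(slice(start, start + len(lv)))
        start += len(lv)
    Z = np.zeros((len(data), start))
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is None:
                Z[i, blocks[j]] = np.nan
            else:
                Z[i, blocks[j].start + levels[j].index(value)] = 1.0
    return Z, levels, blocks


def regularized_mca_impute(Z, weights, n_components, n_vars, n_iter=200, tol=1e-6):
    """Fill NaN cells of Z by iterating a shrunk rank-S MCA reconstruction."""
    missing = np.isnan(Z)
    filled = np.where(missing, np.nanmean(Z, axis=0), Z)  # start from proportions
    r = weights / weights.sum()                            # row (bootstrap) weights
    S = n_components
    for _ in range(n_iter):
        p = np.maximum(r @ filled, 1e-8)     # weighted category proportions
        X = (filled - p) / np.sqrt(p)        # MCA centring and column scaling
        _, d, Vt = np.linalg.svd(np.sqrt(r)[:, None] * X, full_matrices=False)
        # Assumed noise estimate: mean of the K - J - S discarded eigenvalues.
        sigma2 = np.sum(d[S:] ** 2) / max(Z.shape[1] - n_vars - S, 1)
        shrink = np.clip(1.0 - sigma2 / np.maximum(d[:S] ** 2, 1e-12), 0.0, None)
        shrink[d[:S] < 1e-10] = 0.0          # ignore numerically null axes
        # Projecting every row on the axes also covers rows with zero weight.
        fitted = (X @ Vt[:S].T * shrink) @ Vt[:S]
        new = np.where(missing, fitted * np.sqrt(p) + p, Z)
        converged = np.max(np.abs(new - filled)) < tol
        filled = new
        if converged:
            break
    return filled


def mimca_sketch(data, n_components=2, n_imputations=5, seed=0):
    """Return n_imputations completed copies of data (lists of rows)."""
    rng = np.random.default_rng(seed)
    Z, levels, blocks = to_indicator(data)
    n = Z.shape[0]
    completed = []
    for _ in range(n_imputations):
        # Non-parametric bootstrap written as multinomial counts used as row weights.
        weights = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        fuzzy = regularized_mca_impute(Z, weights, n_components, len(blocks))
        table = [list(row) for row in data]
        for i, row in enumerate(data):
            for j, value in enumerate(row):
                if value is None:
                    # Treat the fuzzy block as probabilities (clip, renormalize).
                    probs = np.clip(fuzzy[i, blocks[j]], 0.0, None)
                    total = probs.sum()
                    probs = probs / total if total > 0 else np.full(probs.size, 1.0 / probs.size)
                    table[i][j] = levels[j][rng.choice(probs.size, p=probs)]
        completed.append(table)
    return completed
```

Because the low-rank fit uses only a few axes, the number of estimated quantities grows with the chosen number of components rather than with every interaction between categories, which is the intuition behind the abstract's claim about few parameters. Each completed table would then be analyzed separately (for example with a logistic regression) and the results pooled, as in any multiple imputation workflow.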
