Conference: National Symposium on Mathematical Sciences

A Comparison of Model-Based Imputation Methods for Handling Missing Predictor Values in a Linear Regression Model: A Simulation Study

Abstract

Missing covariate data are a common problem in regression analysis. Many researchers handle the problem with ad hoc methods because they are easy to implement. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods, such as Maximum Likelihood (ML) using the expectation maximization (EM) algorithm and Multiple Imputation (MI), are more promising for dealing with the difficulties caused by missing data. Even so, an inappropriate imputation method can introduce serious bias into the parameter estimates. The main objective of this study is to provide a better understanding of missing data concepts, so that researchers can select an appropriate missing data imputation method. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated from an underlying multivariate normal distribution, and the dependent variable was generated as a combination of the explanatory variables. Missing values in the covariates were simulated under a missing at random (MAR) mechanism. Four levels of missingness (10%, 20%, 30% and 40%) were imposed. The ML and MI techniques available within SAS software were investigated. A linear regression model was fitted, and two model performance measures, MSE and R-squared, were obtained. The results showed that MI was superior in handling missing data, with the highest R-squared and lowest MSE, when the percentage of missingness was less than 30%. Neither method was able to handle levels of missingness larger than 30%.
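The simulation design described in the abstract can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the study's SAS procedure: the sample size, covariance matrix, regression coefficients, the logistic MAR rule and all function names (`simulate`, `impose_mar`, `impute_once`, `run`) are assumptions made for the example. The imputation step is a simplified bootstrap-based stochastic regression imputation standing in for MI, and MSE and R-squared are simply averaged across the imputed data sets.

```python
import numpy as np

def simulate(n=500, seed=0):
    """Covariates from a multivariate normal; y is a linear combination plus noise."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.5, 0.3],
                    [0.5, 1.0, 0.4],
                    [0.3, 0.4, 1.0]])       # assumed covariance matrix
    X = rng.multivariate_normal(np.zeros(3), cov, size=n)
    beta = np.array([1.0, 2.0, -1.0])       # assumed coefficients
    y = 0.5 + X @ beta + rng.normal(0.0, 1.0, size=n)
    return X, y

def impose_mar(X, rate, seed=0):
    """Delete values in column 0 with a probability that depends only on
    column 1, which stays fully observed (a MAR mechanism)."""
    rng = np.random.default_rng(seed)
    w = 1.0 / (1.0 + np.exp(-X[:, 1]))      # logistic weight in the observed covariate
    p = np.clip(w * rate / w.mean(), 0.0, 1.0)  # rescale so the mean is about `rate`
    X_miss = X.copy()
    X_miss[rng.random(len(X)) < p, 0] = np.nan
    return X_miss

def ols(A, b):
    """Least-squares fit with an intercept; returns coefficients and residual SD."""
    A1 = np.column_stack([np.ones(len(A)), A])
    coef, *_ = np.linalg.lstsq(A1, b, rcond=None)
    resid = b - A1 @ coef
    sigma = np.sqrt(resid @ resid / (len(b) - A1.shape[1]))
    return coef, sigma

def impute_once(X_miss, y, rng):
    """One stochastic regression imputation of column 0, fitted on a
    bootstrap sample of the complete cases (an approximation to proper MI)."""
    miss = np.isnan(X_miss[:, 0])
    preds = np.column_stack([X_miss[:, 1:], y])   # predictors of the incomplete covariate
    obs_idx = np.flatnonzero(~miss)
    boot = rng.choice(obs_idx, size=len(obs_idx), replace=True)
    coef, sigma = ols(preds[boot], X_miss[boot, 0])
    A1 = np.column_stack([np.ones(miss.sum()), preds[miss]])
    X_imp = X_miss.copy()
    X_imp[miss, 0] = A1 @ coef + rng.normal(0.0, sigma, size=miss.sum())
    return X_imp

def fit_metrics(X, y):
    """Fit the analysis model and return its coefficients, MSE and R-squared."""
    coef, _ = ols(X, y)
    fitted = np.column_stack([np.ones(len(X)), X]) @ coef
    sse = np.sum((y - fitted) ** 2)
    return coef, sse / len(y), 1.0 - sse / np.sum((y - y.mean()) ** 2)

def run(rate, m=5, seed=0):
    """Simulate, impose MAR missingness, impute m times and average the results."""
    X, y = simulate(seed=seed)
    X_miss = impose_mar(X, rate, seed=seed + 1)
    rng = np.random.default_rng(seed + 2)
    results = [fit_metrics(impute_once(X_miss, y, rng), y) for _ in range(m)]
    coef = np.mean([r[0] for r in results], axis=0)
    mse = float(np.mean([r[1] for r in results]))
    r2 = float(np.mean([r[2] for r in results]))
    return coef, mse, r2

if __name__ == "__main__":
    for rate in (0.10, 0.20, 0.30, 0.40):   # the four missingness levels in the abstract
        coef, mse, r2 = run(rate)
        print(f"missing={rate:.0%}  MSE={mse:.3f}  R2={r2:.3f}")
```

Comparing these figures with a fit on the complete data from `simulate` gives a rough sense of how much each missingness level degrades the model. Numbers from this sketch are not expected to match the study's results.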
