Journal of Applied Statistics

Addressing the problem of missing data in decision tree modeling


Abstract

Tree-based models (TBMs) can handle missing data using the surrogate approach (SUR). The aim of this study is to compare the performance of statistical imputation with that of SUR in TBMs. A TBM was constructed from empirical data. Thereafter, 10%, 20%, and 40% of the values of the variable that appeared as the first split were deleted and then imputed with and without the outcome variable in the imputation model (IMP+ and IMP-, respectively). This was repeated one thousand times. An absolute relative bias above 0.10 was defined as severe (SARB). Subsequently, in a series of simulations, the following parameters were varied: the degree of correlation among variables, the number of variables truly associated with the outcome, and the missing rate. At a 10% missing rate, the proportion of times SARB was observed in either SUR or IMP- was about twice that in IMP+ (28% versus 13%). When the missing rate was increased to 20%, all these proportions approximately doubled. Irrespective of the missing rate, IMP+ was about 65% less likely to produce SARB than SUR. Results of IMP- and SUR were comparable up to a 20% missing rate. At a high missing rate, IMP- was 76% more likely to produce SARB estimates. Statistical imputation of missing data, with the outcome variable included in the imputation model, is recommended even in the context of TBMs.
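The severity criterion in the abstract can be illustrated with a minimal Python sketch. It assumes that absolute relative bias is computed as |estimate - reference| / |reference| and that SARB is counted per repetition; the abstract does not say which quantity is being estimated, so the function names, the reference value, and the sample estimates below are hypothetical.

```python
from typing import Sequence

# From the abstract: an absolute relative bias above 0.10 counts as severe (SARB).
SARB_THRESHOLD = 0.10


def absolute_relative_bias(estimate: float, reference: float) -> float:
    """Return |estimate - reference| / |reference| (assumed definition)."""
    if reference == 0:
        raise ValueError("relative bias is undefined for a zero reference value")
    return abs(estimate - reference) / abs(reference)


def sarb_proportion(estimates: Sequence[float], reference: float,
                    threshold: float = SARB_THRESHOLD) -> float:
    """Share of repetitions whose absolute relative bias exceeds the threshold."""
    if not estimates:
        raise ValueError("need at least one estimate")
    severe = sum(absolute_relative_bias(e, reference) > threshold
                 for e in estimates)
    return severe / len(estimates)


# Hypothetical estimates from four repetitions against a reference value of 2.0.
estimates = [2.1, 1.7, 2.5, 2.0]
print(sarb_proportion(estimates, reference=2.0))  # prints 0.5
```

In the study design described above, one such proportion would be computed per method (SUR, IMP-, IMP+) and per missing rate over the one thousand repetitions.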
