
Handling Incomplete High-Dimensional Multivariate Longitudinal Data with Mixed Data Types by Multiple Imputation Using a Longitudinal Factor Analysis Model.



Abstract

We developed an imputation model that addresses the missing-data problem in high-dimensional longitudinal data sets with mixed data types (continuous and ordinal). The model combines a factor-analysis model with a linear mixed-effects model. It is fitted by Markov chain Monte Carlo, which iteratively draws the parameters, the latent variables, and the missing values. The imputation model is implemented as an R package.

We tested the newly developed imputation model on simulated data sets under 32 scenarios and 2 hypothetical missing-data mechanisms. Two competing methods, PAN (Multiple Imputation for Multivariate Panel or Clustered Data) and MICE (Multiple Imputation using Chained Equations), were tested in the same way for comparison, to show why the high dimensionality and the mix of continuous and ordinal data types need to be addressed.

Part of our effort went into accelerating the simulation by using C++ (a lower-level language than R) and parallel computing on the Hoffman2 Cluster. Compared with running the simulation evaluation as an R program on a single computer, the program we used runs approximately 600 times faster.

We also tested the robustness of the newly developed imputation model when its assumptions are violated. We found that assuming fewer factors than the true number leads to invalid inferences, whereas assuming more factors than the true number still yields reasonable inferences. We also found that omitting an underlying quadratic trend in the factor scores hurts the imputation-based inferences only when that trend is very strong. In the most unfavorable scenario we tested, where the underlying quadratic coefficient is as large as 0.8 times the linear coefficient, the actual coverage rates of 95% interval estimates start falling below 90%.

An application to a dentistry data set is shown, with comparisons to PAN, NORM, and a forerunner of the newly developed method.
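The coverage rates mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how estimates from the imputed data sets were combined, so the code below assumes the standard combining rules for multiple imputation (Rubin's rules) and a normal-approximation 95% interval instead of Rubin's t reference distribution. The function names `pool_rubin`, `interval_95`, and `coverage_rate` are hypothetical and are not taken from the dissertation or its R package.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules (an assumption here).

    estimates: point estimates, one per imputed data set
    variances: their squared standard errors, in the same order
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)           # pooled point estimate
    u_bar = mean(variances)           # within-imputation variance
    b = variance(estimates)           # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b   # total variance
    return q_bar, total


def interval_95(q_bar, total):
    """Normal-approximation 95% interval (simplification of Rubin's t interval)."""
    half_width = 1.96 * math.sqrt(total)
    return q_bar - half_width, q_bar + half_width


def coverage_rate(intervals, true_value):
    """Fraction of simulation replicates whose interval contains the true value."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return hits / len(intervals)
```

In a simulation study of this kind, one interval would be computed per simulated data set, and `coverage_rate` over all replicates would be compared with the nominal 95% level.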

Bibliographic record

  • Author

    Lu, Xiang

  • Author affiliation

    University of California, Los Angeles

  • Degree-granting institution: University of California, Los Angeles
  • Subject: Biostatistics
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 114 p.
  • Total pages: 114
  • Original format: PDF
  • Language of text: English
