Handling Incomplete High-Dimensional Multivariate Longitudinal Data with Mixed Data Types by Multiple Imputation Using a Longitudinal Factor Analysis Model.

Abstract

We developed an imputation model that addresses the missing-data problem in high-dimensional longitudinal data sets with mixed data types (continuous and ordinal), based on a factor-analysis model and a linear mixed-effects model. Markov chain Monte Carlo is used to fit the model, drawing parameters, latent variables, and missing values iteratively. The imputation model is implemented in an R package.

We tested the newly developed imputation model on simulated data sets under 32 scenarios and 2 hypothetical missing-data mechanisms. Two competing methods, PAN (Multiple Imputation for Multivariate Panel or Clustered Data) and MICE (Multiple Imputation using Chained Equations), were tested in the same way for comparison, to show the necessity of addressing the high dimensionality and the mixture of continuous and ordinal data types.

Part of our effort went into accelerating the simulation using C++ (a lower-level language than R) and parallel computing on the Hoffman2 Cluster. Compared with running the simulation evaluation in an R program on a single computer, the program we use for the simulation evaluation runs approximately 600 times faster.

We also tested the robustness of the newly developed imputation model when its assumptions are violated. We found that assuming fewer factors than the true number leads to invalid inferences, while assuming more factors than the true number still yields reasonable inferences. We also found that omitting an underlying quadratic trend in the factor scores hurts the imputation-based inferences only when that trend is very strong. In the most unfavorable scenario we tested, where the underlying quadratic coefficient is as large as 0.8 times the linear coefficient, the actual coverage rates of 95% interval estimates start to fall below 90%.

An application to a dentistry data set is shown, in comparison with PAN, NORM, and a forerunner of the newly developed method.
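The abstract reports coverage rates of 95% interval estimates but does not say how the estimates from the imputed data sets are pooled. The sketch below assumes the standard Rubin's rules for a scalar parameter and shows how a coverage rate could be tallied across simulation replicates. The function names `pool_rubin` and `coverage_rate` are hypothetical and are not taken from the dissertation or its R package, and the interval uses a normal quantile in place of the t quantile to avoid external dependencies.

```python
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, variances, level=0.95):
    """Pool m completed-data estimates of a scalar with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance
    # Rubin's degrees of freedom; infinite when the estimates do not vary.
    df = float("inf") if b == 0 else (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    # Assumption: a normal quantile replaces the t quantile on df degrees of
    # freedom, so the interval is slightly too narrow when df is small.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * t ** 0.5
    return {
        "estimate": q_bar,
        "total_var": t,
        "df": df,
        "interval": (q_bar - half, q_bar + half),
    }


def coverage_rate(intervals, true_value):
    """Share of interval estimates that contain the true parameter value."""
    hits = sum(lo <= true_value <= hi for lo, hi in intervals)
    return hits / len(intervals)
```

In a simulation study of the kind described, `pool_rubin` would be called once per simulated data set, and `coverage_rate` would then be applied to the resulting intervals together with the known true value for that scenario.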


Bibliographic record

Author: Lu, Xiang
Author affiliation: University of California, Los Angeles
Degree-granting institution: University of California, Los Angeles
Subject: Biostatistics
Degree: Ph.D.
Year: 2016
Pages: 114 p.
Total pages: 114
Original format: PDF
Language of text: English (eng)

