A Continuous Latent Factor Model for Non-ignorable Missing Data in Longitudinal Studies.

Abstract

Many longitudinal studies, especially clinical trials, suffer from missing data. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption is an unrealistic simplification and is implausible in many cases. For example, suppose an investigator is examining the effect of treatment on depression. Subjects see their doctors on a regular schedule and are asked about their recent emotional state. Patients who are experiencing severe depression are more likely to miss an appointment, leaving the data for that visit missing. In such cases the missingness mechanism is related to the unobserved responses, and data that are not missing at random may bias the results if the mechanism is not taken into account.

Missing data are said to be non-ignorable if the probability of missingness depends on quantities that might not be included in the model. Classical pattern-mixture models for non-ignorable missing values are widely used in longitudinal data analysis because they do not require explicit specification of the missingness mechanism: the data are stratified by missing-data pattern and a model is specified for each stratum. However, this usually results in under-identification, because many stratum-specific parameters must be estimated even though the eventual interest is usually in the marginal parameters. Pattern-mixture models also have the drawback that a large sample is usually required.

In this thesis, two studies are presented. The first study is motivated by an open problem in pattern-mixture models. Simulation studies in this part show that the information in the missing-data indicators can be well summarized by a simple continuous latent structure, indicating that a large number of missing-data patterns may be accounted for by a simple latent factor. The simulation findings of the first study lead to a novel model, the continuous latent factor model (CLFM). The second study develops the CLFM, which models the joint distribution of the missing values and the longitudinal outcomes. The proposed CLFM is feasible even for small-sample applications. The detailed estimation theory, including estimation techniques from both frequentist and Bayesian perspectives, is presented. Model performance is evaluated through designed simulations and three applications. The simulation and application settings range from a correctly specified missing-data mechanism to a misspecified one and include different sample sizes from longitudinal studies. Among the three applications, an AIDS study includes non-ignorable missing values; the Peabody Picture Vocabulary Test data give no indication of the missing-data mechanism and are used for a sensitivity analysis; and the Growth of Language and Early Literacy Skills in Preschoolers with Developmental Speech and Language Impairment study has complete data and is used for a robustness analysis. The CLFM is shown to provide more precise estimators, specifically for intercept- and slope-related parameters, than Roy's latent class model and the classic linear mixed model. This advantage is more pronounced in small samples, where Roy's model has difficulty with estimation convergence. The proposed CLFM is also robust when the missing data are ignorable, as demonstrated through the study on Growth of Language and Early Literacy Skills in Preschoolers.
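
The depression example in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not code from the thesis: the score distribution, the logistic missingness rule, and the function name `simulate_visit` are all hypothetical and chosen for illustration. It shows that when the chance of missing a visit rises with the unobserved score, the mean of the observed scores underestimates the true mean.

```python
import math
import random
import statistics


def simulate_visit(n=20000, seed=0, slope=0.15):
    """Simulate one visit at which more severely depressed patients
    are more likely to be missing (hypothetical setup)."""
    rng = random.Random(seed)
    # Hypothetical depression scores: higher means more severe.
    scores = [rng.gauss(50.0, 10.0) for _ in range(n)]
    observed = []
    for y in scores:
        # Non-ignorable mechanism: the probability of missing the visit
        # depends on the score itself, which is unobserved when missing.
        p_miss = 1.0 / (1.0 + math.exp(-slope * (y - 50.0)))
        if rng.random() >= p_miss:
            observed.append(y)
    true_mean = statistics.mean(scores)
    observed_mean = statistics.mean(observed)
    frac_observed = len(observed) / n
    return true_mean, observed_mean, frac_observed


true_mean, observed_mean, frac_observed = simulate_visit()
print(f"true mean of all scores:   {true_mean:.2f}")
print(f"mean of observed scores:   {observed_mean:.2f}")
print(f"fraction of visits kept:   {frac_observed:.2f}")
```

Setting `slope=0` in this sketch makes missingness independent of the score, and the two means then agree up to sampling noise, which corresponds to the ignorable case.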

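Bibliographic record

  • Author

    Zhang, Jun

  • Author affiliation

    Arizona State University

  • Degree-granting institution: Arizona State University
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 153 p.
  • Total pages: 153
  • Original format: PDF
  • Language of text: English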
