We look at fitting regression models using data from stratified cluster samples when the strata may depend in some way on the observed responses within clusters. One important subclass of examples is that of family studies in genetic epidemiology, where the probability of selecting a family into the study depends on the incidence of disease within the family. We develop the survey-weighted estimating equation approach for this problem, with particular emphasis on the estimation of superpopulation parameters. Full maximum likelihood for this class of problems involves modelling the population distribution of the covariates, which is simply not feasible when there are a large number of potential covariates. We discuss efficient semiparametric maximum likelihood methods in which the covariate distribution is left completely unspecified. We further discuss the relative efficiencies of these two approaches.