Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling

Abstract

One of the challenges in using Markov chain Monte Carlo for model analysis in studies with very large datasets is the need to scan the full dataset at each iteration of the sampler, which can be computationally prohibitive. Several approaches have been developed to address this, typically drawing computationally manageable subsamples of the data. Here we consider the specific case where most of the data from a mixture model provides little or no information about the parameters of interest, and we aim to select subsamples such that the information extracted is most relevant. The motivating application arises in flow cytometry, where several measurements from a vast number of cells are available. Interest lies in identifying specific rare cell subtypes and characterizing them according to their corresponding markers. We present a Markov chain Monte Carlo approach where an initial subsample of the full dataset is used to guide selection sampling of a further set of observations targeted at a scientifically interesting, low probability region. We define a Sequential Monte Carlo strategy in which the targeted subsample is augmented sequentially as estimates improve, and introduce a stopping rule for determining the size of the targeted subsample. An example from flow cytometry illustrates the ability of the approach to increase the resolution of inferences for rare cell subtypes.
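The abstract does not spell out the selection mechanism or the stopping rule. The following is therefore only a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a one-dimensional, two-component Gaussian mixture whose parameters (`W_RARE`, `MU_RARE`, `SD_RARE`, `MU_BULK`, `SD_BULK`, all hypothetical) stand in for rough estimates from an initial uniform subsample. Each observation is kept with probability a(x). This probability is high where the rare component dominates and never falls below a small floor. Each kept observation carries the inverse-probability weight 1/a(x), so that weighted sums over the subsample approximate sums over the scanned data. The targeted subsample grows in random batches. It stops, under a hypothetical rule, once the selected points carry about `target_rare` rare-component observations.

```python
import math
import random

# Hypothetical mixture parameters, standing in for rough estimates
# obtained from an initial uniform subsample of the full dataset.
W_RARE, MU_RARE, SD_RARE = 0.01, 4.0, 0.5
MU_BULK, SD_BULK = 0.0, 1.0


def normal_pdf(v, mu, sd):
    """Density of N(mu, sd^2) at v."""
    return math.exp(-0.5 * ((v - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def rare_responsibility(x):
    """Posterior probability that x belongs to the rare component."""
    rare = W_RARE * normal_pdf(x, MU_RARE, SD_RARE)
    bulk = (1.0 - W_RARE) * normal_pdf(x, MU_BULK, SD_BULK)
    total = rare + bulk
    return rare / total if total > 0 else 0.0


def selection_prob(x, floor=0.01):
    """Inclusion probability a(x): close to 1 in the rare region and never
    below `floor`, so the weight 1/a(x) stays finite."""
    return floor + (1.0 - floor) * rare_responsibility(x)


def targeted_subsample(data, batch_size, target_rare, rng):
    """Scan the data in random batches, keeping each point with probability
    a(x) and weight 1/a(x). Stop (hypothetical rule) once the selected points'
    summed rare responsibility reaches `target_rare`."""
    order = list(range(len(data)))
    rng.shuffle(order)
    selected = []      # (x, inverse-probability weight) pairs
    rare_count = 0.0   # expected number of rare observations among selected
    n_scanned = 0
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        for i in batch:
            x = data[i]
            a = selection_prob(x)
            if rng.random() < a:
                selected.append((x, 1.0 / a))
                rare_count += rare_responsibility(x)
        n_scanned += len(batch)
        if rare_count >= target_rare:
            break
    return selected, n_scanned


def weighted_rare_mean(selected):
    """Inverse-probability-weighted, self-normalized estimate of the rare
    component's mean (a single EM-style M-step on the subsample)."""
    num = sum(w * rare_responsibility(x) * x for x, w in selected)
    den = sum(w * rare_responsibility(x) for x, w in selected)
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data from the assumed mixture: about 1% rare, 99% bulk.
    data = [rng.gauss(MU_RARE, SD_RARE) if rng.random() < W_RARE
            else rng.gauss(MU_BULK, SD_BULK) for _ in range(200_000)]
    subsample, n_scanned = targeted_subsample(
        data, batch_size=10_000, target_rare=300, rng=rng)
    print(f"kept {len(subsample)} of {n_scanned} scanned points; "
          f"rare-mean estimate {weighted_rare_mean(subsample):.2f}")
```

The floor on a(x) is a deliberate design choice. Inverse-probability weighting needs every observation to have a positive inclusion probability. Without the floor, bulk observations could never enter the subsample, and weighted estimates could not correct for the targeting. The source describes estimates being improved via Sequential Monte Carlo as the targeted subsample grows. For simplicity, this sketch holds the mixture parameters fixed.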

Bibliographic details

Journal: other
Authors: Ioanna Manolopoulou; Cliburn Chan; Mike West
Volume (issue): 5 (3); year not given
Pages: 1–22
Total pages: 22
Original format: PDF
Keywords: flow cytometry; large data sets; mixture models; rare events; resampling; selection sampling; sequential Monte Carlo

Similar articles

1. New Workflow for QSAR Model Development from Small Data Sets: Small Dataset Curator and Small Dataset Modeler. Integration of Data Curation, Exhaustive Double Cross-Validation, and a Set of Optimal Model Selection Techniques [J]. Ambure Pravin, Gajewicz-Skretna Agnieszka, Cordeiro M. Natalia D. S. Journal of Chemical Information and Modeling, 2019, no. 10.

2. Data-adaptive longitudinal model selection in causal inference with collaborative targeted minimum loss-based estimation [J]. Schnitzer Mireille E., Sango Joel, Ferreira Guerra Steve. Biometrics: Journal of the Biometric Society: An International Society Devoted to the Mathematical and Statistical Aspects of Biology, 2020, no. 1.

3. On the Robustness of Conceptual Rainfall-Runoff Models to Calibration and Evaluation Data Set Splits Selection: A Large Sample Investigation [J]. Guo Danlu, Zheng Feifei, Gupta Hoshin. Water Resources Research, 2020, no. 3.

4. The Impact of Sampling and Rule Set Size on Generated Fuzzy Inference System Predictive Accuracy: Analysis of a Software Engineering Data Set [C]. Stephen G. MacDonell. Artificial Intelligence Applications and Innovations, 2011.

5. Order selection in classical finite mixture models and variable selection and inference in finite mixture of regression models [D]. Khalili Mahmoudabadi, Abbasali. 2006.

6. Bayesian Inference for Growth Mixture Models with Latent Class Dependent Missing Data [O]. Zhenqiu Laura Lu, Zhiyong Zhang, Gitta Lubke.

7. Selection Sampling from Large Data Sets for Targeted Inference in Mixture Modeling [O]. Ioanna Manolopoulou, Cliburn Chan, Mike West. 2009.

