Bolstering CART and Bayesian variable selection methods for classification.

Abstract

An important problem in many areas is exploring the relationship between object categories and their observed characteristics. In particular, it is important to understand which measurements are related to a specific category. One way of tackling this kind of discriminant problem is a nonparametric method known as Classification and Regression Trees (CART). In this thesis, a stochastic step is added to the CART algorithm and an annealing schedule is used to find 'optimal' models. Two approaches to model selection are proposed to avoid overfitting.

For problems with high-dimensional and collinear data sets, we propose a Bayesian variable selection approach to multinomial probit models. Motivated by the binary probit model with latent variables, we build a multivariate extension to the case of more than two categories and use latent variables to specialize the general distributional setting to the linear model with Gaussian errors. We then apply Bayesian variable selection techniques that adopt natural conjugate prior distributions. We integrate some of the parameters out of the posterior and perform inference on the marginal posterior distribution of individual models using MCMC methods, with truncated normal or Student-t sampling techniques used to draw the multivariate latent vectors. We apply the methodology to problems in both chemometrics and functional genomics, first to a dataset with three wheat varieties and 100 near-infrared absorbances as regressors, then to two datasets involving microarray data.
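The latent-variable idea behind the binary probit model mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard data-augmentation formulation, in which each observation has a latent z_i ~ N(x_i'β, 1) and the observed category is 1 when z_i > 0 and 0 otherwise; the thesis's multinomial extension and its variable selection step are not reproduced here. The function name `sample_latent`, the example values, and the inverse-CDF sampling method are illustrative assumptions, not taken from the thesis.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1
EPS = 1e-12                # keeps inverse-CDF arguments strictly inside (0, 1)


def sample_latent(mu, y, rng):
    """Draw z ~ N(mu, 1) truncated to z > 0 if y == 1, else to z <= 0.

    Uses inverse-CDF sampling, which is adequate for moderate mu only;
    far-tail cases would need a dedicated truncated normal sampler.
    """
    cut = STD_NORMAL.cdf(-mu)  # P(z <= 0) when z ~ N(mu, 1)
    if y == 1:
        u = cut + (1.0 - cut) * rng.random()  # uniform on [cut, 1)
    else:
        u = cut * rng.random()                # uniform on [0, cut)
    u = min(max(u, EPS), 1.0 - EPS)           # guard the inverse CDF
    return mu + STD_NORMAL.inv_cdf(u)


rng = random.Random(0)
linear_predictors = [0.3, -1.2, 2.0]  # hypothetical values of x_i' beta
labels = [1, 0, 1]                    # hypothetical observed binary categories
latent = [sample_latent(m, y, rng) for m, y in zip(linear_predictors, labels)]
```

In a full sampler of this kind, a draw like this would alternate with updates of the regression parameters given the latent values, which is what reduces the problem to a linear model with Gaussian errors.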

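Bibliographic record

  • Author: Sha, Naijun
  • Author affiliation: Texas A&M University
  • Degree-granting institution: Texas A&M University
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2002
  • Pages: 87 p.
  • Total pages: 87
  • Original format: PDF
  • Language: English (eng)
  • Chinese Library Classification: Statistics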
