Generalized statistical methods for mixed exponential families.

Abstract

This dissertation considers the problem of learning the underlying statistical structure of complex data sets for fitting a generative model, and for both supervised and unsupervised data-driven decision making purposes. Using properties of exponential family distributions, a new unified theoretical model called Generalized Linear Statistics is established.

The complexity of data is generally a consequence of the existence of a large number of components and the fact that the components are often of mixed data types (i.e., some components might be continuous, with different underlying distributions, while other components might be discrete, such as categorical, count or Boolean). Such complex data sets are typical in drug discovery, health care, or fraud detection.

The proposed statistical modeling approach is a generalization and amalgamation of techniques from classical linear statistics placed into a unified framework referred to as Generalized Linear Statistics (GLS). This framework includes techniques drawn from latent variable analysis as well as from the theory of Generalized Linear Models (GLMs), and is based on the use of exponential family distributions to model the various mixed types (continuous and discrete) of complex data sets. The methodology exploits the connection between data space and parameter space present in exponential family distributions and solves a nonlinear problem by using classical linear statistical tools applied to data that have been mapped into parameter space.

One key aspect of the GLS framework is that often the natural parameter of the exponential family distributions is assumed to be constrained to a lower dimensional latent variable subspace, modeling the belief that the intrinsic dimensionality of the data is smaller than the dimensionality of the observation space.

The framework is equivalent to a computationally tractable, mixed data-type hierarchical Bayes graphical model assumption with latent variables constrained to a low-dimensional parameter subspace. We demonstrate that exponential family Principal Component Analysis, Semi-Parametric exponential family Principal Component Analysis, and Bregman soft clustering are not separate unrelated algorithms, but different manifestations of model assumptions and parameter choices taken within this common GLS framework. Because of this insight, these algorithms are readily extended to deal with the important mixed data-type case. This framework has the critical advantage of allowing one to transfer high-dimensional mixed-type data components to low-dimensional common-type latent variables, which are then, in turn, used to perform regression or classification in a much simpler manner using well-known continuous-parameter classical linear techniques.

Classification results on synthetic data and data sets from the University of California, Irvine machine learning repository are presented.
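
The low-dimensional natural-parameter constraint described in the abstract can be illustrated with a small sketch. This is not the dissertation's algorithm: it is a minimal exponential family PCA-style fit, assuming unit-variance Gaussian, Bernoulli, and Poisson columns, with the natural parameters written as `A @ V + b` and fitted by plain gradient descent with step halving (a technique chosen here only for simplicity). All function names (`fit_epca`, `make_mixed_data`, and so on) and the synthetic data are hypothetical.

```python
import numpy as np


def log_partition(t, kind):
    """Log-partition function G(theta) for each assumed column type."""
    if kind == "gaussian":        # unit variance is an assumption of this sketch
        return 0.5 * t ** 2
    if kind == "bernoulli":
        return np.logaddexp(0.0, t)
    if kind == "poisson":
        return np.exp(t)
    raise ValueError(f"unknown kind: {kind}")


def mean_from_natural(t, kind):
    """Expectation parameter G'(theta): maps parameter space to data space."""
    if kind == "gaussian":
        return t
    if kind == "bernoulli":
        return np.exp(-np.logaddexp(0.0, -t))   # numerically stable sigmoid
    if kind == "poisson":
        return np.exp(t)
    raise ValueError(f"unknown kind: {kind}")


def neg_log_lik(X, theta, kinds):
    """Sum of G(theta) - x * theta over all entries (theta-free terms dropped)."""
    with np.errstate(over="ignore"):            # an overflowing trial step is just rejected
        return sum(np.sum(log_partition(theta[:, j], k) - X[:, j] * theta[:, j])
                   for j, k in enumerate(kinds))


def residual(X, theta, kinds):
    """Gradient of neg_log_lik with respect to theta: G'(theta) - x."""
    return np.column_stack([mean_from_natural(theta[:, j], k) - X[:, j]
                            for j, k in enumerate(kinds)])


def fit_epca(X, kinds, q=2, steps=300, lr=0.1, seed=0):
    """Fit theta = A @ V + b, with A the (n, q) low-dimensional latent coordinates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = 0.1 * rng.standard_normal((n, q))
    V = 0.1 * rng.standard_normal((q, d))
    b = np.zeros(d)
    loss = neg_log_lik(X, A @ V + b, kinds)
    history = [loss]
    for _ in range(steps):
        R = residual(X, A @ V + b, kinds)
        # Gradients for V and b are averaged over rows (a positive rescaling).
        gA, gV, gb = R @ V.T, A.T @ R / n, R.mean(axis=0)
        while lr > 1e-12:                        # halve the step until the loss drops
            A_new, V_new, b_new = A - lr * gA, V - lr * gV, b - lr * gb
            new_loss = neg_log_lik(X, A_new @ V_new + b_new, kinds)
            if new_loss < loss:
                break
            lr *= 0.5
        else:
            break                                # no improving step found; stop early
        A, V, b, loss = A_new, V_new, b_new, new_loss
        history.append(loss)
    return A, V, b, history


def make_mixed_data(n=200, seed=0):
    """Synthetic mixed-type data whose natural parameters lie in a 2-D subspace."""
    rng = np.random.default_rng(seed)
    kinds = ["gaussian"] * 2 + ["bernoulli"] * 2 + ["poisson"] * 2
    theta = rng.standard_normal((n, 2)) @ (0.5 * rng.standard_normal((2, len(kinds))))
    X = np.empty_like(theta)
    for j, kind in enumerate(kinds):
        mu = mean_from_natural(theta[:, j], kind)
        if kind == "gaussian":
            X[:, j] = mu + rng.standard_normal(n)
        elif kind == "bernoulli":
            X[:, j] = rng.binomial(1, mu)
        else:
            X[:, j] = rng.poisson(mu)
    return X, kinds


if __name__ == "__main__":
    X, kinds = make_mixed_data()
    A, V, b, history = fit_epca(X, kinds, q=2)
    print(A.shape, history[0], history[-1])
```

The rows of `A` play the role of the low-dimensional common-type latent variables mentioned above; in a setup like this they could be passed to an ordinary linear classifier or regressor in place of the original mixed-type columns.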

Bibliographic record

  • Author: Levasseur, Cecile
  • Author affiliation: University of California, San Diego
  • Degree-granting institution: University of California, San Diego
  • Subject: Engineering Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 249
  • Original format: PDF
  • Language: English
