
Small and stable descriptors of distributions for geometric statistical problems.



Abstract

This thesis explores how to sparsely represent distributions of points for geometric statistical problems. A coreset C is a small summary of a point set P such that if a certain statistic is computed on both P and C, then the difference between the results is guaranteed to be bounded by a parameter ε. Two examples of coresets are ε-samples and ε-kernels. An ε-sample can estimate the density of a point set in any range from a geometric family of ranges (e.g., disks, axis-aligned rectangles). An ε-kernel approximates the width of a point set in all directions. Both coresets have a size that depends only on the error parameter ε, not on the size of the original data set. We demonstrate several improvements to these coresets and how they are useful for geometric statistical problems.

We reduce the size of ε-samples for density queries in axis-aligned rectangles to nearly the square root of the size needed when the queries are with respect to more general families of shapes, such as disks. We also show how to construct ε-samples of probability distributions.

We show how to maintain "stable" ε-kernels; that is, if the point set P changes by a small amount, then the ε-kernel also changes by a small amount. This is useful in surveillance and tracking problems, and the stability property leads to more efficient algorithms for maintaining ε-kernels.

We next study the case where the input point sets are uncertain and their uncertainty is modeled by probability distributions. Statistics on these point sets (e.g., the radius of the smallest enclosing ball) do not have exact answers, but rather distributions of answers. We describe data structures to represent approximations of these distributions and algorithms to compute them. We also show how to create distributions of ε-kernels and ε-samples for these uncertain data sets.

Finally, we examine a spatial anomaly detection problem: computing a spatial scan statistic. The input is a point set P and measurements on the point set. The spatial scan statistic finds the range (e.g., an axis-aligned bounding box) where the measurements inside the range are the most different from the measurements outside the range. We show how to compute this statistic efficiently while allowing for a bounded amount of approximation error. This result generalizes to several statistical models and types of input point sets.
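To make the ε-sample definition in the abstract concrete, here is a minimal Python sketch. It is not the construction from the thesis: it uses a uniform random sample, a standard baseline that is known to give an ε-sample with high probability when the sample is large enough, and it checks the density error only on a finite set of randomly drawn axis-aligned rectangles rather than on all ranges. The point counts, the seed, and the helper names (`density`, `random_rect`, `max_density_error`) are assumptions chosen for illustration.

```python
import random


def density(points, rect):
    """Fraction of points that fall inside the axis-aligned rectangle rect."""
    x_lo, y_lo, x_hi, y_hi = rect
    inside = sum(1 for (x, y) in points
                 if x_lo <= x <= x_hi and y_lo <= y <= y_hi)
    return inside / len(points)


def random_rect(rng):
    """Draw a random axis-aligned rectangle inside the unit square."""
    x1, x2 = sorted((rng.random(), rng.random()))
    y1, y2 = sorted((rng.random(), rng.random()))
    return (x1, y1, x2, y2)


def max_density_error(P, C, rects):
    """Largest gap between the density in P and in C over the given ranges."""
    return max(abs(density(P, r) - density(C, r)) for r in rects)


rng = random.Random(0)                       # fixed seed (illustrative)
P = [(rng.random(), rng.random()) for _ in range(10000)]  # full point set
C = rng.sample(P, 1000)                      # candidate summary: random sample
rects = [random_rect(rng) for _ in range(100)]            # test ranges only

err = max_density_error(P, C, rects)
print(f"largest observed density error over {len(rects)} rectangles: {err:.4f}")
```

The printed value is an empirical lower bound on the ε for which C behaves as an ε-sample of P with respect to rectangles, since only 100 ranges are tested. The abstract's claim is that for axis-aligned rectangles a much smaller summary than a random sample can achieve the same error.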

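Bibliographic record

  • Author

    Phillips, Jeff M.

  • Author affiliation

    Duke University

  • Degree-granting institution: Duke University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 174 p.
  • Total pages: 174
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology
  • Keywords

  • Date added to database: 2022-08-17 11:38:26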
