Conference: Scientific and statistical database management

Stratified Reservoir Sampling over Heterogeneous Data Streams



Abstract

Reservoir sampling is a well-known technique for random sampling over data streams. In many streaming applications, however, an input stream may be naturally heterogeneous, i.e., composed of sub-streams whose statistical properties may vary considerably. For this class of applications, the conventional reservoir sampling technique does not guarantee that a statistically sufficient number of tuples from each sub-stream is included in the reservoir, and this can damage the statistical quality of the sample. In this paper, we deal with this heterogeneity problem by stratifying the reservoir sample among the underlying sub-streams. In particular, we consider situations in which the stratified reservoir sample is needed to obtain reliable estimates at the level of either the entire data stream or individual sub-streams. The first challenge in this stratification is to achieve an optimal allocation of a fixed-size reservoir to individual sub-streams. The second challenge is to adaptively adjust the allocation as sub-streams appear in, or disappear from, the input stream and as their statistical properties change over time. We present a stratified reservoir sampling algorithm designed to meet these challenges, and demonstrate through experiments the superior sample quality and the adaptivity of the algorithm.
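
The heterogeneity problem described in the abstract can be illustrated with a minimal sketch. The first function is the conventional reservoir sampling technique (Algorithm R). The second keeps one independent reservoir per sub-stream under a fixed allocation; that fixed allocation, the function names, and the two-sub-stream test stream are assumptions made for illustration only, and this is not the optimal or adaptive allocation algorithm the paper presents.

```python
import random
from collections import defaultdict


def reservoir_sample(stream, k, rng):
    """Conventional reservoir sampling (Algorithm R): uniform sample of k items."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def stratified_reservoir_sample(stream, allocation, rng):
    """One independent reservoir per sub-stream, with a FIXED allocation.

    `stream` yields (stratum, value) pairs; `allocation` maps stratum -> slots.
    Assumption of this sketch: the allocation is given up front and never
    changes, unlike the adaptive allocation described in the abstract.
    """
    reservoirs = defaultdict(list)
    seen = defaultdict(int)  # number of tuples observed so far per stratum
    for stratum, value in stream:
        k = allocation.get(stratum, 0)
        if k == 0:
            continue  # no slots reserved for this sub-stream
        n = seen[stratum]
        seen[stratum] += 1
        if n < k:
            reservoirs[stratum].append(value)
        else:
            j = rng.randint(0, n)
            if j < k:
                reservoirs[stratum][j] = value
    return dict(reservoirs)


def make_stream():
    # Hypothetical heterogeneous stream: 9,900 tuples from "A", 100 from "B".
    return [("A", i) for i in range(9900)] + [("B", i) for i in range(100)]


if __name__ == "__main__":
    rng = random.Random(0)
    plain = reservoir_sample(make_stream(), 20, rng)
    strat = stratified_reservoir_sample(make_stream(), {"A": 10, "B": 10}, rng)
    print("tuples from B, conventional:", sum(1 for s, _ in plain if s == "B"))
    print("tuples from B, stratified:  ", len(strat["B"]))
```

With a reservoir of 20 slots over this 10,000-tuple stream, the conventional sample contains each tuple with probability 20/10,000, so it holds only 0.2 tuples from sub-stream "B" in expectation, whereas the stratified variant always holds 10. How the slots should be divided among sub-streams, and how that division should change over time, is the question the paper addresses.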

Bibliographic details

  • Source
  • Conference location: Heidelberg (DE)
  • Author affiliations

    Department of Computer Science, The University of Vermont, Burlington VT, USA;

    Department of Computer Science, The University of Vermont, Burlington VT, USA;

  • Conference organizer
  • Original format: PDF
  • Language: English (eng)
  • Chinese Library Classification: TP311.13
  • Keywords

