Journal: IEEE Transactions on Knowledge and Data Engineering

OLAP over Probabilistic Data Cubes II: Parallel Materialization and Extended Aggregates



Abstract

On-Line Analytical Processing (OLAP) enables powerful analytics by quickly computing aggregate values of numerical measures over multiple hierarchical dimensions for massive datasets. However, many types of source data, e.g., from GPS, sensors, and other measurement devices, are intrinsically inaccurate (imprecise and/or uncertain), so OLAP cannot be readily applied to them. In this paper, we address the resulting data veracity problem in OLAP by proposing the concept of probabilistic data cubes. Such a cube comprises a set of probabilistic cuboids that summarize the aggregated values in the form of probability mass functions (pmfs for short) and thus offer insights into the underlying data quality and enable confidence-aware query evaluation and analysis. However, the probabilistic nature of the data poses computational challenges, since a probabilistic database can have an exponential number of possible worlds under the possible-world semantics. Even worse, it is hard to share computations among different cuboids, as aggregation functions that are distributive for traditional data cubes, e.g., $\mathtt{SUM}$, become holistic in probabilistic settings. We propose a complete set of techniques for probabilistic data cubes, from cuboid aggregation, through cube materialization, to query evaluation. We study two types of aggregation, convolution-based and sketch-based, which run in polynomial time and jointly enable efficient query processing. Also, our proposal is versatile in terms of: 1) its support for common aggregation functions, i.e., $\mathtt{SUM}$, $\mathtt{COUNT}$, $\mathtt{MAX}$, and $\mathtt{AVG}$; 2) its adaptivity to different materialization strategies, e.g., full versus partial materialization, supported by our cost models and parallelization framework; 3) its coverage of common OLAP operations, i.e., probabilistic slicing and dicing queries. Extensive experiments over real and synthetic datasets show that our techniques are effective and
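
To make the contrast between possible-world enumeration and convolution-based aggregation concrete, the following is a minimal sketch and not the paper's algorithm. It assumes independent uncertain measures that take small non-negative integer values, each given as a pmf list where index v holds P(value = v). The function names `convolve`, `sum_pmf`, and `sum_pmf_possible_worlds` and the sample readings are hypothetical, chosen only for illustration.

```python
from itertools import product


def convolve(p, q):
    """Discrete convolution: pmf of X + Y for independent X ~ p and Y ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def sum_pmf(pmfs):
    """pmf of SUM over independent measures, built by repeated convolution.

    The cost is polynomial in the number of measures and the value range.
    """
    result = [1.0]  # pmf of the empty sum: value 0 with probability 1
    for pmf in pmfs:
        result = convolve(result, pmf)
    return result


def sum_pmf_possible_worlds(pmfs):
    """Same pmf by enumerating every possible world.

    The number of worlds is the product of the pmf lengths, so this is
    exponential in the number of measures and only useful as a check.
    """
    result = [0.0] * (sum(len(p) - 1 for p in pmfs) + 1)
    for world in product(*(range(len(p)) for p in pmfs)):
        prob = 1.0
        for value, pmf in zip(world, pmfs):
            prob *= pmf[value]
        result[sum(world)] += prob
    return result


# Hypothetical uncertain readings (assumed data, not from the paper).
readings = [
    [0.0, 0.5, 0.5],       # value 1 or 2, each with probability 0.5
    [0.2, 0.8],            # value 0 with probability 0.2, value 1 with 0.8
    [0.0, 0.0, 0.1, 0.9],  # value 2 with probability 0.1, value 3 with 0.9
]
pmf = sum_pmf(readings)
```

Both functions return the same distribution, but the enumeration visits every combination of values while the convolution only combines one pmf at a time. The result for a group of measures cannot be reduced to a single number per sub-group, which is one way to see why SUM stops being a simple distributive aggregate once results are pmfs.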
