
Petabyte Scale Data Mining: Dream or Reality?

Abstract

Science is becoming very data-intensive. Today's astronomy datasets, with tens of millions of galaxies, already present substantial challenges for data mining. In less than 10 years the catalogs are expected to grow to billions of objects, and image archives will reach Petabytes. Imagine having a 100GB database in 1996, when disk scanning speeds were 30MB/s and database tools were immature. Such a task is trivial today, almost manageable with a laptop. We expect a PB database to present a very similar situation six years from now. In this paper we project our current experiments in data archiving and analysis on the Sloan Digital Sky Survey data six years into the future. We analyze these projections and examine the requirements for performing data mining on such datasets. We conclude that the task scales rather well: we could do the job today, although it would be expensive. There do not seem to be any show-stoppers that would prevent us from storing and using a Petabyte dataset six years from today.
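To make the 1996 comparison concrete, the following back-of-envelope sketch computes sequential scan times. The 100 GB size and 30 MB/s rate come from the abstract; decimal units (1 GB = 10^9 bytes), a single sequential stream per disk, and perfectly linear parallel speedup are illustrative assumptions, not figures from the paper. The helper names `scan_time_s` and `streams_to_match` are hypothetical.

```python
"""Back-of-envelope sequential scan times (illustrative sketch).

Assumptions (not from the paper): decimal units, one sequential
stream per disk at the quoted 30 MB/s, and linear parallel speedup.
"""

MB = 10**6
GB = 10**9
PB = 10**15


def scan_time_s(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Seconds to read size_bytes sequentially at a fixed rate."""
    return size_bytes / rate_bytes_per_s


def streams_to_match(size_bytes: int, reference_bytes: int) -> int:
    """Parallel streams needed so each reads only reference_bytes (ceiling division)."""
    return -(-size_bytes // reference_bytes)


DISK_RATE = 30 * MB  # 1996 disk scanning speed quoted in the abstract

t_1996 = scan_time_s(100 * GB, DISK_RATE)           # one full scan of the 1996 database
t_pb_one_disk = scan_time_s(1 * PB, DISK_RATE)      # same rate, one stream, 1 PB
n_streams = streams_to_match(1 * PB, 100 * GB)      # streams to keep the 1996 scan time

if __name__ == "__main__":
    print(f"100 GB at 30 MB/s: {t_1996 / 60:.1f} min")
    print(f"1 PB at 30 MB/s on one disk: {t_pb_one_disk / 86400:.0f} days")
    print(f"parallel 30 MB/s streams to scan 1 PB in the same time: {n_streams}")
```

Under these assumptions a full scan of 100 GB took just under an hour in 1996, while a single 30 MB/s stream would need over a year for 1 PB; keeping the scan near one hour would require about 10,000 such streams in parallel. This only illustrates why scan bandwidth and parallelism dominate the question; it does not reproduce the paper's own projections.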