Data Processing Factory for the Sloan Digital Sky Survey

Abstract

Data handling for the Sloan Digital Sky Survey (SDSS) presents two challenges: a large data volume and the need to produce spectroscopic plates from imaging data in a timely way. A data processing factory, built from both old and new technologies, handles this flow. Data reach end users through disk farms, which serve corrected images and calibrated spectra, and through a database that processes catalog queries efficiently. Modest amounts of data are moved from Apache Point Observatory to Fermilab by scripts that use rsync to update files, while larger transfers are made by shipping magnetic tapes through a commercial carrier. Every data processing pipeline is wrapped in scripts that handle four consecutive phases: preparation, submission, checking, and quality control. We constructed the factory by chaining these pipelines together, using an operational database to hold the processed imaging catalogs. The science database catalogs all imaging and spectroscopic objects, with pointers to the various external files associated with them. Different computing systems handle particular processing phases. UNIX computers handle tape reading and writing, as well as calibration steps that need access to a large amount of data but make relatively modest computational demands. Commodity CPUs run the steps that need access to only a limited amount of data but have heavier computational requirements. Disk servers optimized for cost per gigabyte serve terabytes of processed data, while servers optimized for disk read speed run SQL Server to process queries on the catalogs. This factory produced the data for the SDSS Early Data Release in June 2001 and is currently producing Data Release One, scheduled for January 2003.
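The abstract describes each pipeline as wrapped in four consecutive phases (preparation, submission, checking, quality control) and the factory as a chain of such wrapped pipelines. The following is a minimal Python sketch of that structure under stated assumptions. It is not the SDSS implementation. The class `WrappedPipeline`, the function `run_factory`, the stage names `imaging` and `targeting`, and the toy selection rule are all hypothetical. A plain list stands in for the operational database that records progress.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Record = dict  # hypothetical: a stage's inputs/outputs as a plain dictionary


@dataclass
class WrappedPipeline:
    """One pipeline wrapped in four consecutive phases (hypothetical sketch)."""
    name: str
    prepare: Callable[[Record], Record]      # stage inputs, build parameters
    submit: Callable[[Record], Record]       # run the pipeline itself
    check: Callable[[Record], bool]          # did the run complete?
    quality_control: Callable[[Record], bool]  # are the outputs acceptable?

    def run(self, inputs: Record, log: List[str]) -> Record:
        job = self.prepare(inputs)
        log.append(f"{self.name}:prepare")
        outputs = self.submit(job)
        log.append(f"{self.name}:submit")
        if not self.check(outputs):
            raise RuntimeError(f"{self.name}: run did not complete")
        log.append(f"{self.name}:check")
        if not self.quality_control(outputs):
            raise RuntimeError(f"{self.name}: outputs failed quality control")
        log.append(f"{self.name}:quality_control")
        return outputs


def run_factory(pipelines: List[WrappedPipeline], inputs: Record) -> Tuple[Record, List[str]]:
    """Chain wrapped pipelines so each stage's outputs feed the next stage.

    The returned log is a stand-in for an operational database that records
    which phases have finished.
    """
    log: List[str] = []
    data = inputs
    for pipeline in pipelines:
        data = pipeline.run(data, log)
    return data, log


# Toy stages (hypothetical): imaging yields "objects", targeting selects a subset.
imaging = WrappedPipeline(
    name="imaging",
    prepare=lambda r: {**r, "params": "imaging.par"},
    submit=lambda job: {"objects": list(range(job["n_frames"] * 10))},
    check=lambda out: "objects" in out,
    quality_control=lambda out: len(out["objects"]) > 0,
)
targeting = WrappedPipeline(
    name="targeting",
    prepare=lambda r: {"candidates": r["objects"]},
    submit=lambda job: {"targets": [o for o in job["candidates"] if o % 3 == 0]},
    check=lambda out: "targets" in out,
    quality_control=lambda out: len(out["targets"]) > 0,
)

result, log = run_factory([imaging, targeting], {"n_frames": 3})
print(result["targets"])  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

The design keeps each phase's failure local. A stage that fails checking or quality control stops the chain before its outputs reach the next pipeline. The log shows exactly which phase was reached.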