Symposium on Mass Storage Systems and Technologies

DRepl: Optimizing Access to Application Data for Analysis and Visualization


Abstract

Until recently, most scientific applications produced data that was saved, then analyzed and visualized at a later time. In recent years, with the large increase in the amount of data and available computational power, there is demand for applications to support data access in situ, or close to the simulation, to provide application steering, analytics, and visualization. The data access patterns required for these activities usually differ from the data layout produced by the application. In most large HPC clusters, scientific data is stored in parallel file systems rather than locally on the cluster nodes. To increase reliability, the data is replicated using standard RAID schemes. Parallel file server nodes usually have more processing power than they need, so it is feasible to offload some of the data-intensive processing to them. DRepl replaces the standard methods of data replication with replicas that have different layouts, each optimized for the most commonly used access patterns. Replicas can be complete (i.e., any other replica can be reconstructed from them) or incomplete. DRepl consists of a language for describing the dataset and the necessary data layouts, and tools for creating a user-space file server that serves the data and keeps it consistent and up to date in all optimized layouts. DRepl decouples data producers and consumers, and the data layouts they use, from the way the data is stored on the storage system. DRepl has shown up to a 2x improvement in cumulative performance when data is accessed through optimized replicas.
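The idea of keeping several differently laid-out replicas of one dataset consistent can be illustrated with a small in-memory sketch. It is not DRepl's description language or file server: the class `ReplicatedDataset`, its methods, and the field names are hypothetical, chosen only to show a record-major replica and a field-major replica being updated by the same write, and one complete replica being rebuilt from the other.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedDataset:
    """Toy store (hypothetical, not DRepl's API) holding two layouts of the same records."""
    fields: tuple
    rows: list = field(default_factory=list)     # record-major replica, as a producer might write it
    columns: dict = field(default_factory=dict)  # field-major replica, suited to per-field reads

    def __post_init__(self):
        self.columns = {name: [] for name in self.fields}

    def append(self, record):
        # A single logical write updates every replica, so all layouts stay consistent.
        values = tuple(record[name] for name in self.fields)
        self.rows.append(values)
        for name, value in zip(self.fields, values):
            self.columns[name].append(value)

    def read_record(self, i):
        # Served from the record-major replica.
        return dict(zip(self.fields, self.rows[i]))

    def read_field(self, name):
        # Served from the field-major replica without touching other fields.
        return list(self.columns[name])

    def rebuild_rows_from_columns(self):
        # Both replicas here are complete: either can be reconstructed from the other.
        return list(zip(*(self.columns[name] for name in self.fields)))


ds = ReplicatedDataset(fields=("x", "y", "temperature"))
ds.append({"x": 0.0, "y": 1.0, "temperature": 300.0})
ds.append({"x": 0.5, "y": 1.5, "temperature": 310.0})
print(ds.read_field("temperature"))  # [300.0, 310.0]
```

A replica that stored only the `temperature` column would be incomplete in the abstract's sense, since the other replicas could not be reconstructed from it.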
