Data Replication in Data Intensive Scientific Applications with Performance Guarantee

Nukarapu Dharma; Tang Bin; Wang Liqiang; Lu Shiyong

Abstract

Data replication has been widely adopted in data-intensive scientific applications to reduce data file transfer time and bandwidth consumption. However, the problem of data replication in Data Grids, an enabling technology for data-intensive applications, has proven to be NP-hard and even non-approximable, making it difficult to solve. Meanwhile, most previous research in this field is either theoretical investigation without practical consideration, or heuristics-based with little or no theoretical performance guarantee. In this paper, we propose a data replication algorithm that not only has a provable theoretical performance guarantee, but can also be implemented in a distributed and practical manner. Specifically, we design a polynomial-time centralized replication algorithm that reduces the total data file access delay by at least half of the reduction achieved by the optimal replication solution. Based on this centralized algorithm, we also design a distributed caching algorithm, which can be easily adopted in a distributed environment such as Data Grids. Extensive simulations are performed to validate the efficiency of our proposed algorithms. Using our own simulator, we show that our centralized replication algorithm performs comparably to the optimal algorithm and other intuitive heuristics under different network parameters. Using GridSim, a popular distributed Grid simulator, we demonstrate that the distributed caching technique significantly outperforms an existing popular file caching technique in Data Grids, and that it is more scalable and more adaptive to dynamic changes in file access patterns in Data Grids.
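
The performance guarantee stated in the abstract can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from the paper: $\tau_0$ is the total data file access delay with no replicas placed, $\tau_{\mathrm{alg}}$ is the delay under the centralized algorithm, and $\tau_{\mathrm{opt}}$ is the delay under the optimal replication solution.

```latex
% Assumed notation (not from the paper): \tau_0, \tau_{alg}, \tau_{opt}.
% "Reduces the delay by at least half of the optimal reduction":
\[
  \tau_0 - \tau_{\mathrm{alg}} \;\ge\; \tfrac{1}{2}\,\bigl(\tau_0 - \tau_{\mathrm{opt}}\bigr)
  \quad\Longleftrightarrow\quad
  \tau_{\mathrm{alg}} \;\le\; \tfrac{1}{2}\,\bigl(\tau_0 + \tau_{\mathrm{opt}}\bigr).
\]
```

As a purely illustrative reading with made-up numbers: if $\tau_0 = 100$ and $\tau_{\mathrm{opt}} = 40$, the guarantee implies $\tau_{\mathrm{alg}} \le 70$.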

Bibliographic record

Source
Parallel and Distributed Systems, IEEE Transactions on | 2011, Issue 8 | pp. 1299-1306 | 8 pages
Authors
Nukarapu Dharma; Tang Bin; Wang Liqiang; Lu Shiyong
Author affiliation

Wichita State University, Wichita

Indexing information
Original format: PDF
Language: English (eng)
Chinese Library Classification
Keywords
Data Grids; Data intensive applications; algorithm design and analysis; data replication; simulations

Similar literature

1. Efficient location-aware data placement for data-intensive applications in geo-distributed scientific data centers [J]. Jinghui Zhang, Jian Chen, Junzhou Luo. Tsinghua Science and Technology. 2016, Issue 5
2. Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers [J]. Jinghui Zhang, Jian Chen, Junzhou Luo. 清华大学学报（英文版）. 2016, Issue 005
3. A data replication strategy with tenant performance and provider economic profit guarantees in Cloud data centers [J]. Mokadem Riad, Hameurlain Abdelkader. The Journal of Systems and Software. 2020, Issue Jana
4. FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems [C]. Dongfang Zhao, Zhao Zhang, Xiaobing Zhou. IEEE International Congress on Big Data. 2014
5. Specification, configuration and execution of data-intensive scientific applications [D]. Kumar, Vijay S. 2010
6. Strategies of data layout and cache writing for input-output optimization in high performance scientific computing: Applications to the forward electrocardiographic problem [O]. Louie Cardone-Noott, Blanca Rodriguez, Alfonso Bueno-Orovio. 2012
7. Data replication in data intensive scientific applications with performance guarantee [O]. Dharma Teja Nukarapu, Bin Tang. 2012
