Journal: IEEE Transactions on Parallel and Distributed Systems

Data Replication in Data Intensive Scientific Applications with Performance Guarantee



Abstract

Data replication has been widely adopted in data intensive scientific applications to reduce data file transfer time and bandwidth consumption. However, the problem of data replication in Data Grids, an enabling technology for data intensive applications, has been proven to be NP-hard and even non-approximable, making it difficult to solve. Meanwhile, most previous research in this field is either theoretical investigation without practical consideration, or heuristics-based with little or no theoretical performance guarantee. In this paper, we propose a data replication algorithm that not only has a provable theoretical performance guarantee, but can also be implemented in a distributed and practical manner. Specifically, we design a polynomial time centralized replication algorithm that reduces the total data file access delay by at least half of the reduction achieved by the optimal replication solution. Based on this centralized algorithm, we also design a distributed caching algorithm, which can be easily adopted in a distributed environment such as Data Grids. Extensive simulations are performed to validate the efficiency of our proposed algorithms. Using our own simulator, we show that our centralized replication algorithm performs comparably to the optimal algorithm and other intuitive heuristics under different network parameters. Using GridSim, a popular distributed Grid simulator, we demonstrate that the distributed caching technique significantly outperforms an existing popular file caching technique in Data Grids, and that it is more scalable and adaptive to dynamic changes in file access patterns in Data Grids.
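The abstract does not spell out the centralized algorithm, so the following is only an illustrative sketch of one common approach to this kind of problem: a greedy placement that repeatedly adds the single replica giving the largest drop in total access delay, subject to per-node storage limits. Everything here is an assumption made for illustration, including the names `total_delay` and `greedy_replicate`, the delay matrix `dist`, the access frequencies `freq`, and the unit-size files. Whether this matches the paper's algorithm, or carries its one-half guarantee, is not something the abstract confirms.

```python
from itertools import product

def total_delay(dist, freq, holders):
    """Sum over nodes and files of access frequency times delay to the nearest copy."""
    return sum(
        freq[i][f] * min(dist[i][h] for h in holders[f])
        for i in range(len(dist))
        for f in range(len(holders))
    )

def greedy_replicate(dist, freq, origin, capacity):
    """Hypothetical greedy placement: add the replica with the largest delay reduction.

    dist[i][j]  : delay between nodes i and j (assumed known and symmetric)
    freq[i][f]  : how often node i accesses file f
    origin[f]   : node that permanently holds file f
    capacity[i] : number of replicas node i can store (files assumed unit size)
    """
    n, m = len(dist), len(origin)
    holders = [{origin[f]} for f in range(m)]
    free = list(capacity)
    current = total_delay(dist, freq, holders)
    while True:
        best_gain, best = 0, None
        for node, f in product(range(n), range(m)):
            if free[node] == 0 or node in holders[f]:
                continue
            # Tentatively place the replica and measure the delay reduction.
            holders[f].add(node)
            gain = current - total_delay(dist, freq, holders)
            holders[f].remove(node)
            if gain > best_gain:
                best_gain, best = gain, (node, f)
        if best is None:  # no free space left or no placement helps
            return holders, current
        node, f = best
        holders[f].add(node)
        free[node] -= 1
        current -= best_gain

# Toy instance: four nodes on a line, two files stored at node 0.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
freq = [[1, 0], [1, 0], [1, 0], [1, 3]]
holders, delay = greedy_replicate(dist, freq, origin=[0, 0], capacity=[0, 0, 1, 1])
print(holders, delay)  # [{0, 2}, {0, 3}] 2
```

In this toy instance the total delay starts at 15. The sketch first copies file 1 to node 3, its heaviest reader, and then copies file 0 to node 2, which leaves a total delay of 2. Each round re-evaluates every candidate placement, so the run time is polynomial in the number of nodes and files but far from tuned.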
