Leveraging the checkpoint-restart technique for optimizing CPU efficiency of ATLAS production applications on opportunistic platforms

D Cameron; J Elmsheuser; L Heinrich; W Lavrijsen; P Nilsson; V Tsulaia; M Vogel; ATLAS Collaboration

Abstract

Data processing applications of the ATLAS experiment, such as event simulation and reconstruction, spend a considerable amount of time in the initialization phase. This phase includes loading a large number of shared libraries, reading detector geometry and conditions data from external databases, building a transient representation of the detector geometry, and initializing various algorithms and services. In some cases the initialization step can take as long as 10-15 minutes. Such slow initialization has a significant negative impact on the overall CPU efficiency of the production job, especially when the job is executed on opportunistic, often short-lived, resources such as commercial clouds or volunteer computing. To improve this situation, we can take advantage of the fact that ATLAS runs large numbers of production jobs with similar configuration parameters (e.g. jobs within the same production task). This allows us to checkpoint one job at the end of its configuration step and then use the resulting checkpoint image for rapid startup of thousands of production jobs. By applying this technique we can reduce the initialization time of a job from tens of minutes to just a few seconds. In addition, we can leverage container technology to restart checkpointed applications on a variety of computing platforms, in particular on platforms different from the one on which the checkpoint image was created. We describe the mechanism for creating checkpoint images of Geant4 simulation jobs with AthenaMP (the multi-process version of Athena, the ATLAS data simulation, reconstruction and analysis framework) and the use of these images for running ATLAS Simulation production jobs on volunteer computing resources (ATLAS@Home) and on supercomputers.
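Why a long initialization hurts most on short-lived resources can be made concrete with a toy model. The Python sketch below assumes a job runs in an opportunistic slot of fixed wall-clock length, that initialization is pure overhead, and that the rest of the slot is spent on event processing. The function names `useful_fraction` and `overhead_saved_s` and all numbers (15 min cold start, 5 s restart, 30 min slot, 1000 jobs) are hypothetical illustrations, not measurements from the paper; only the orders of magnitude follow the abstract.

```python
"""Toy model: share of a fixed-length, short-lived slot left for event
processing, with and without restarting from a checkpoint image.

Hypothetical numbers throughout; only the orders of magnitude (minutes of
cold initialization vs. seconds of restart) follow the abstract."""


def useful_fraction(init_s: float, slot_s: float) -> float:
    """Fraction of a slot of length slot_s (seconds) that remains for event
    processing after init_s seconds of initialization.

    Assumes initialization is pure overhead and the rest of the slot is
    fully productive; returns 0.0 if initialization uses up the whole slot."""
    if init_s < 0 or slot_s <= 0:
        raise ValueError("init_s must be >= 0 and slot_s must be > 0")
    return max(slot_s - init_s, 0.0) / slot_s


def overhead_saved_s(n_jobs: int, cold_init_s: float, restart_s: float) -> float:
    """Total initialization time saved when n_jobs start from one checkpoint
    image instead of each initializing from scratch. The one-off cost of
    creating the checkpoint image is not included."""
    return n_jobs * (cold_init_s - restart_s)


if __name__ == "__main__":
    cold_init = 15 * 60  # hypothetical: 15 min cold initialization
    restart = 5          # hypothetical: 5 s restart from a checkpoint image
    slot = 30 * 60       # hypothetical: 30 min lifetime of an opportunistic slot

    print(f"cold start: {useful_fraction(cold_init, slot):.3f}")  # 0.500
    print(f"restart:    {useful_fraction(restart, slot):.3f}")    # 0.997
    saved_h = overhead_saved_s(1000, cold_init, restart) / 3600
    print(f"saved over 1000 jobs: {saved_h:.1f} h")               # 248.6 h
```

Under these assumed values a cold start leaves only half of a 30-minute slot for event processing, while a 5-second restart leaves more than 99% of it. The model deliberately ignores the one-off cost of producing the checkpoint image, which is amortized over every job restarted from it.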

Bibliographic record

Source: Journal of Physics: Conference Series, 2018, No. 3
Full-text format: PDF
Chinese Library Classification: Physics

Similar documents

1. Analyzing power efficiency of optimization techniques and algorithm design methods for applications on heterogeneous platforms [J]. Yash Ukidave, Amir Kavyan Ziabari, Perhaad Mistry. Experimental Mechanics, 2014, No. 3.
2. Scaling up ATLAS Event Service to production levels on opportunistic computing platforms [J]. D Benjamin, J Caballero, M Ernst. Journal of Physics: Conference Series, 2016, No. 1.
3. Leveraging the accelerated processing units for seismic imaging: A performance and power efficiency comparison against CPUs and GPUs [J]. Said Issam, Fortin Pierre, Lamotte Jean-Luc. Experimental Mechanics, 2018, No. 6.
4. A Timing Aware Connectivity Optimization Technique for Improving Energy Efficiency of High-Performance CPUs [C]. Ayan Datta, Karanvir Singh, Arpita Dutta. IEEE Symposium in Low-Power and High-Speed Chips and Systems, 2021.
5. Optimization techniques for mapping algorithms and applications onto CUDA GPU platforms and CPU-GPU heterogeneous platforms [D]. Wu, Jing. 2014.
6. Production Efficiency and Market Orientation in Food Crops in North West Ethiopia: Application of Matching Technique for Impact Assessment [O]. Habtamu Yesigat Ayenew.