2009 Fault Tolerance for Extreme-Scale Computing Workshop, Albuquerque, NM - March 19-20, 2009

Abstract

This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address the challenges and opportunities of making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems running large, possibly long-running applications. Its main goal was to bring together systems people, middleware people (including fault-tolerance experts), and applications people to discuss the issues and determine what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems without faults causing large numbers of application failures. The workshop found considerable interest in the fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through some form of redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting from the most recent checkpoint after a fault causes an application to crash or halt, is the most common tool used in applications today. However, it is unclear how long this will remain a good solution, because system sizes and memory capacities are growing faster than I/O bandwidth to disk, so each checkpoint takes longer to write. There is interest both in modifications of this approach, such as checkpoints to memory, partial checkpoints, and message logging, and in alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been a topic of discussion in the HPC community for at least the past 10 years, but, as with other issues, the community has managed to put off addressing it during this period.
There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where it has no choice but to address this through R&D efforts.
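The concern about checkpoint/restart can be made concrete with a back-of-the-envelope model. The sketch below is not taken from the report: it assumes Young's first-order approximation for the optimal checkpoint interval, τ ≈ √(2δM), where δ is the time to write one checkpoint and M is the mean time between failures, and it assumes every checkpoint saves all of memory. The function names and all machine parameters (memory size, disk bandwidth, MTBF) are hypothetical, chosen only for illustration.

```python
import math


def checkpoint_time_s(memory_bytes: float, io_bandwidth_bytes_per_s: float) -> float:
    """Seconds to write one checkpoint, assuming all of memory is saved to disk."""
    return memory_bytes / io_bandwidth_bytes_per_s


def young_interval_s(checkpoint_s: float, mtbf_s: float) -> float:
    """Young's first-order optimal compute time between checkpoints: sqrt(2 * delta * M)."""
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)


def waste_fraction(checkpoint_s: float, mtbf_s: float) -> float:
    """First-order fraction of run time lost at Young's interval.

    Overhead ~= delta / tau        (time spent writing checkpoints)
              + tau / (2 * M)      (expected rework after a failure),
    which at tau = sqrt(2 * delta * M) simplifies to sqrt(2 * delta / M).
    Valid only when delta << M; restart time is ignored.
    """
    tau = young_interval_s(checkpoint_s, mtbf_s)
    return checkpoint_s / tau + tau / (2.0 * mtbf_s)


if __name__ == "__main__":
    TB, GB = 1e12, 1e9
    mtbf_s = 24 * 3600.0  # hypothetical: one application-killing fault per day
    disk_bw = 100 * GB    # hypothetical aggregate disk bandwidth, bytes/s
    # Hypothetical growth: memory quadruples while disk bandwidth stays fixed.
    for mem_tb in (100, 400):
        delta = checkpoint_time_s(mem_tb * TB, disk_bw)
        tau = young_interval_s(delta, mtbf_s)
        print(f"{mem_tb} TB: checkpoint {delta / 60:.1f} min, "
              f"interval {tau / 3600:.2f} h, "
              f"waste {waste_fraction(delta, mtbf_s):.1%}")
```

Because the first-order waste scales as √(2δ/M), quadrupling memory at fixed disk bandwidth (so δ quadruples) doubles the fraction of time lost in this model. That is one way to read the worry that disk-based checkpointing stops scaling. Faster failure rates (smaller M) push in the same direction.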

Bibliographic Record

Authors
Katz, D. A.; Daly, J.; Kramer, B.; Lathrop, S.; Nystrom, N.; DeBardeleben, M.;
Year: 2010
Pages: 1-25
Total pages: 25
Original format: PDF
Language: English
CLC classification: Industrial technology
Keywords
Supercomputers; Errors; Mathematical methods; Fault tolerance; Reliability; Mitigation; Computing

Similar Documents

1. Physical and Philosophical Perspectives on Probability, Explanation and Time (Workshop of the ESF Programme "The Philosophy of Science in a European Perspective", Utrecht University, 19-20 October 2009) [J]. Dennis Dieks. Journal for General Philosophy of Science. 2010, no. 2.

2. Workshop 2009—modern techniques in pHPT surgery: an evidence based perspective, European Society of Endocrine Surgeons—19–21 March 2009, Lund, Sweden [J]. Langenbeck's Archives of Surgery. 2009, no. 2.

3. Autism Speaks: meeting on folate metabolism and Autism spectrum disorders, March 19-20, 2009, Washington, DC [J]. Kamen BA, Chukoskie L. Journal of Pediatric Hematology/Oncology: Official Journal of the American Society of Pediatric Hematology/Oncology. 2011, no. 3.

4. EEA CONFERENCE EXHIBITION 2009, 19-20 JUNE 2009, CHRISTCHURCH [C]. Andrew Jones. Electricity Engineers' Association Conference. 2009.

5. The 3 May 2006 (Mw 8.0) and 19 March 2009 (Mw 7.6) Tonga Earthquakes: Intraslab Compressional Faulting Below the Megathrust [D]. Meng, Qingjun. 2015.

6. Abstracts from Venom Week 2009 June 1–4 2009 Albuquerque NM [O]. Steven A. Seifert. 2010.

7. How do we convert the transport sector to renewable energy and improve the sector's interplay with the energy system?: Background paper for the workshop on transport - renewable energy in the transport sector and planning, Technical University of Denmark, 17-18 March 2009 [O]. Larsen Hans Hvidtfeldt, Kristensen Niels Buus, Sønderberg Petersen Leif. 2009.
