Conference: IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale

A Comprehensive Informative Metric for Analyzing HPC System Status Using the LogSCAN Platform


Abstract

Log processing by Spark and Cassandra-based ANalytics (LogSCAN) is a newly developed analytical platform that provides flexible and scalable data gathering, transformation, and computation. One major challenge is to effectively summarize the status of a complex computer system, such as the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). Although plenty of operational and maintenance information is collected and stored in real time, and may yield insights into short- and long-term system status, it is difficult to present this information in a comprehensive form. In this work, we present system information entropy (SIE), a newly developed metric that leverages traditional machine learning techniques and information theory. By compressing the multivariate, multi-dimensional event information recorded during the operation of the target system into a single SIE time series, we demonstrate that historical system status can be represented sensitively, concisely, and comprehensively. Given a sharp indicator such as SIE, we argue that follow-up analytics based on SIE, using other sophisticated approaches such as pattern recognition in the temporal domain or causality analysis that incorporates additional independent system metrics, will reveal in-depth knowledge about system status.
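The abstract does not give the formula for SIE, so the following is only an illustrative sketch of the general idea of reducing categorical event logs to a single entropy time series. It assumes, purely for illustration, that each log event carries a timestamp and an event type, and it computes plain Shannon entropy of the event-type distribution in fixed time windows. The function names `shannon_entropy` and `entropy_series`, the window size, and the event tuples are hypothetical and are not taken from LogSCAN or from the paper; the authors' metric also involves machine learning steps that are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


def entropy_series(events, window_seconds):
    """Reduce (timestamp_seconds, event_type) pairs to one entropy value per window.

    Returns a list of (window_start_seconds, entropy_bits), sorted by time.
    Windows with no events are omitted. This is NOT the paper's SIE definition,
    only a hypothetical stand-in that shows the log-to-time-series reduction.
    """
    buckets = defaultdict(Counter)
    for ts, event_type in events:
        buckets[int(ts // window_seconds)][event_type] += 1
    return [
        (window * window_seconds, shannon_entropy(buckets[window].values()))
        for window in sorted(buckets)
    ]


# Hypothetical events: two different types in the first minute,
# one repeated type in the second minute.
sample_events = [(0, "mce"), (10, "lustre"), (70, "mce"), (80, "mce")]
series = entropy_series(sample_events, window_seconds=60)
```

In this sketch a window dominated by a single event type yields low entropy, while a window with many equally frequent event types yields high entropy, which is one simple way a multi-dimensional event stream can be collapsed into a single indicator over time.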
