Failure prediction plays an important role for many tasks such as optimal resource management in large-scale system. However, accurately failure number prediction of repairable large-scale long-running computing (RLLC) is a challenge because of the reparability and large-scale. To address the challenge, a general Bayesian serial revision prediction method based on Bootstrap approach and moving average approach is put forward, which can make an accurately prediction for the failure number. To demonstrate the performance gains of our method, extensive experiments on the data of Los Alamos National Laboratory (LANL) cluster is implemented, which is a typical RLLC system. And experimental results show that the prediction accuracy of our method is 80.2 %, and it is a greatly improvement with 4 % compared with some typical methods. Finally, the managerial implications of the models are discussed.
展开▼