Close this search box.

Campos J.; Costa E.

2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE)


Due to the complexity of modern software, identifying every fault before deployment is extremely difficult or even not possible. Such residual faults can ultimately lead to failures, often incurring considerable risks or costs. Online Failure Prediction (OFP) is a fault-tolerance technique that attempts to predict the occurrence of failures in the near future and thus prevent/mitigate their consequences. Combined with recent technological developments, Machine Learning (ML) has been successfully used to create predictive models for OFP. However, as failures are rare events, failure data are often not available for building accurate models. Although fault injection has been accepted as a viable solution to generate realistic failure data, fault injectors are difficult to implement/update and thus research on Operating System (OS)-level OFP has become stale, with most works using data from outdated OSs. In this paper, we conduct a comprehensive fault injection campaign on an up-to-date Linux kernel and thoroughly study its behavior in the presence of faults. We then transform the data to explore and assess the predictive performance of various ML techniques for OFP. Finally, we study the influence of different OFP parameters (i.e., lead-time, prediction-window) and compare the results with existing related work. Results suggest that the various failures observed can be grouped into categories that can then be accurately predicted and distinguished by diverse ML models.