
Understanding Unsuccessful Executions in Big-Data Systems

Publication type Peer-reviewed
Publication form Conference paper (peer-reviewed)
Authors Rosà Andrea, Chen Lydia Y., Binder Walter
Project LoadOpt - Workload Characterization and Optimization for Multicore Systems

Conference paper (peer-reviewed)

Proceedings title Proceedings of the 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Location Shenzhen, China
DOI 10.1109/CCGrid.2015.138


Big-data applications are increasingly used in today’s large-scale datacenters for a wide variety of purposes, such as solving scientific problems, running enterprise services, and computing data-intensive tasks. Due to the growing scale of these systems and the complexity of the applications running on them, jobs in big-data systems experience unsuccessful terminations of different kinds. While a large body of existing studies sheds light on failures occurring in large-scale datacenters, the current literature overlooks the characteristics and the performance impairment of a broader class of unsuccessful executions, which can arise due to application failures, dependency violations, machine constraints, job kills, and task preemption. Nonetheless, deepening our understanding in this field is of paramount importance, as unsuccessful executions can lower user satisfaction, impair reliability, and lead to high resource waste. In this paper, we describe the problem of unsuccessful executions in big-data systems and highlight the critical importance of improving our knowledge of this subject. We review the existing literature in this field, discuss its limitations, and present our own contributions to the problem, along with our research plan for the future.