
Catching the Response Time Tail in the Cloud

Type of publication Peer-reviewed
Form of publication Proceedings (peer-reviewed)
Author Spicuglia Sebastiano, Björkqvist Mathias, Chen Lydia Y., Binder Walter
Project LoadOpt - Workload Characterization and Optimization for Multicore Systems


Title of proceedings 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM-2015)
Place Ottawa, ON, Canada


As modern service systems are pressured to offer competitive prices through cost-effective capacity planning, especially in cloud computing, their service level agreements (SLAs) are becoming ever more sophisticated, specifying targets on several different percentiles of response time. However, predicting even the average response time is no mean feat, whether for real systems or for abstracted queueing models that typically simplify system details, and managing SLAs defined over multiple response-time percentiles is harder still. To capture these percentiles efficiently, we first develop a novel, autonomic methodology, termed Burst Based Simulation, which combines burst profiling on real systems with complex, state-dependent simulations. Building on this methodology, we then develop an analysis for SLA management: predicting SLA violations for a given request pattern. We evaluate our approach on two types of service systems, virtualized and bare-metal, across wide ranges of SLAs and traffic loads. Our evaluation shows that the methodology predicts different response-time percentiles with an average error below 15% and accurately captures SLA violations.
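The abstract does not spell out how percentile targets are checked or how prediction error is measured. As a minimal illustration only (this is not the Burst Based Simulation methodology itself), the Python sketch below assumes that an SLA is a set of maximum response times, one per percentile, that percentiles use the nearest-rank definition, and that prediction error is the mean relative error across percentiles. The function names `percentile`, `sla_violations`, and `mean_relative_error` are hypothetical, and the paper's exact error metric may differ.

```python
import math
from typing import Dict, List, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (assumed definition): the smallest sample
    such that at least p% of all samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def sla_violations(samples: Sequence[float], targets: Dict[float, float]) -> List[float]:
    """Return the percentiles (ascending) whose observed response time
    exceeds the SLA target. `targets` maps percentile -> max response time
    (a hypothetical SLA representation)."""
    return [p for p, limit in sorted(targets.items())
            if percentile(samples, p) > limit]


def mean_relative_error(predicted: Dict[float, float],
                        measured: Dict[float, float]) -> float:
    """Average of |predicted - measured| / measured over the percentiles
    present in `measured` (assumed error metric)."""
    errors = [abs(predicted[p] - m) / m for p, m in measured.items()]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical response times in milliseconds, with a bursty tail.
    response_times_ms = [12, 15, 11, 40, 13, 95, 14, 16, 12, 210]
    sla = {50: 20.0, 90: 100.0, 99: 150.0}
    for p in sorted(sla):
        print(f"p{p}: {percentile(response_times_ms, p)} ms")  # 14, 95, 210
    print("violated percentiles:", sla_violations(response_times_ms, sla))  # [99]
```

In this example the median and 90th percentile meet their targets while the 99th percentile does not, which is the kind of tail-only violation that an average-response-time prediction alone would miss.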