Project overview

The widespread availability of hardware virtualization technologies has accelerated a trend towards cloud computing, which continues to transform industries and our society. Given stringent performance and availability requirements, applications deployed in cloud environments are typically built as distributed systems comprising multiple tiers of loosely coupled, communicating, and geo-distributed clusters of components that adapt to demand. Consistently low latency is critical for many cloud applications, yet it is difficult to achieve because many complex sources of performance variability skew the tail of the latency distribution even in well-provisioned systems. Execution in multi-tenant environments exacerbates the problem: imperfect resource isolation leads to contention for shared resources, which is known to degrade performance.

The goal of this project is to enable cloud-based applications to achieve more predictable performance. Our main focus is on modern, highly distributed applications that are realized as service-oriented architectures.

Because uncertainty in the cloud environment is a fact of life, applications increasingly implement mechanisms such as load balancing, straggler mitigation, and request routing in order to stay performant. These mechanisms are typically implemented as control loops that use performance measurements of the environment as feedback for making a control decision. We study how such mechanisms in real systems such as Cassandra and Hadoop may interfere with each other "in the wild" and degrade performance. A better understanding of the problem will enable system designers to implement adaptive mechanisms that are more robust to external interference. We use measurements derived from extensive performance evaluation to define performance and cost models of cloud applications. These models inform the design of novel mechanisms that improve the systems' robustness to the environment and achieve more predictable performance.
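
To make the notion of a control loop concrete, the following is a minimal sketch of one such mechanism: a client that routes each request to the replica with the lowest smoothed latency. It is an illustration only, not the algorithm used by Cassandra, Hadoop, or any of the publications listed on this page. The class name ReplicaSelector, its methods, and the smoothing factor alpha are hypothetical names chosen for this example.

```python
class ReplicaSelector:
    """Hypothetical feedback-driven replica selection (illustrative only).

    Feedback: observed request latencies, smoothed with an exponentially
    weighted moving average (EWMA).
    Control decision: send the next request to the replica whose smoothed
    latency is currently lowest.
    """

    def __init__(self, replicas, alpha=0.2):
        # alpha is an assumed smoothing factor; higher values react faster
        # to new measurements but are more sensitive to noise.
        self.alpha = alpha
        self.ewma = {replica: None for replica in replicas}

    def choose(self):
        # Probe replicas that have no measurement yet, so every replica
        # has an estimate before estimates are compared.
        for replica, estimate in self.ewma.items():
            if estimate is None:
                return replica
        return min(self.ewma, key=self.ewma.get)

    def observe(self, replica, latency_ms):
        # Fold the new measurement into the running estimate.
        previous = self.ewma[replica]
        if previous is None:
            self.ewma[replica] = latency_ms
        else:
            self.ewma[replica] = (
                self.alpha * latency_ms + (1 - self.alpha) * previous
            )


if __name__ == "__main__":
    selector = ReplicaSelector(["replica-a", "replica-b"])
    selector.observe("replica-a", 10.0)
    selector.observe("replica-b", 50.0)
    print(selector.choose())
```

A loop of this kind reacts only to what a single client observes. When many clients run the same loop independently, they may all shift load to the same replica at once, which is one way in which adaptive mechanisms can interfere with each other.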

In summary, the difficulty of achieving predictable performance is a major obstacle for many enterprises that are considering migrating their workloads to the cloud. By addressing this problem, our research will empower organizations to confidently run their applications in cloud environments while enjoying significant reductions in operating costs.

Project members


W. Reda, M. Canini, L. Suresh, D. Kostic and S. Braithwaite. Rein: Taming Tail Latency in Key-Value Stores via Multiget Scheduling. Proceedings of EuroSys'17, Apr 2017.

W. Reda, L. Suresh, M. Canini and S. Braithwaite. BRB: BetteR Batch Scheduling to Reduce Tail Latencies in Cloud Data Stores. Proceedings of SIGCOMM'15 (Poster), Aug 2015.

L. Suresh, M. Canini, S. Schmid and A. Feldmann. C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection. Proceedings of NSDI'15, May 2015.

L. Suresh and M. Canini. Application Aware Placement and Scheduling for Multi-tenant Clouds. To appear as HPI Technical Report for Spring 2013 Future SOC Lab.

Last updated: Sunday, 26-Feb-2017 10:33:27 | © 2012-2017 Marco Canini, all rights reserved