Project


The Datacenter Observatory

English title: The Datacenter Observatory
Applicant: Zwaenepoel Willy
Number: 133835
Funding scheme: R'EQUIP
Research institution: Laboratoire de systèmes d'exploitation EPFL - IC - IIF - LABOS
Institution of higher education: EPF Lausanne - EPFL
Main discipline: Computer Science
Start/End: 01.10.2012 - 30.09.2013
Approved amount: 300'000.00

Lay Summary (English)


The Datacenter Observatory is an experimental facility for cloud computing research. Although small by the standards of industrial datacenters, it supports experiments at a scale that is rare in the academic world and relevant to industrial practice. It contains 7 racks hosting 178 dual-processor servers with 8-core processors, for a total of 2848 processing cores. The system further contains 22.3TB of memory, 408TB of local disk storage (52TB of SSD and 356TB of SATA disks), and a 72TB storage array, for an aggregate of around 0.5PB of storage. The components will be connected by two 96-port 10 Gigabit Ethernet switches.
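The capacity totals quoted above follow from simple arithmetic on the per-component figures. The short sketch below recomputes them; the function and constant names are illustrative only, and the memory-per-core figure assumes decimal units (1 TB = 1000 GB), which the summary does not state.

```python
# Figures quoted in the lay summary.
SERVERS = 178
PROCESSORS_PER_SERVER = 2
CORES_PER_PROCESSOR = 8
MEMORY_TB = 22.3
SSD_TB = 52
SATA_TB = 356
STORAGE_ARRAY_TB = 72


def total_cores(servers, processors_per_server, cores_per_processor):
    """Total processing cores across all servers."""
    return servers * processors_per_server * cores_per_processor


def aggregate_tb(*parts_tb):
    """Sum of storage components, in TB."""
    return sum(parts_tb)


cores = total_cores(SERVERS, PROCESSORS_PER_SERVER, CORES_PER_PROCESSOR)
local_disk_tb = aggregate_tb(SSD_TB, SATA_TB)
total_storage_tb = aggregate_tb(local_disk_tb, STORAGE_ARRAY_TB)

# Assumption: decimal units (1 TB = 1000 GB, 1 PB = 1000 TB).
memory_per_core_gb = MEMORY_TB * 1000 / cores
total_storage_pb = total_storage_tb / 1000

print(f"cores:            {cores}")
print(f"local disk (TB):  {local_disk_tb}")
print(f"all storage (TB): {total_storage_tb} (~{total_storage_pb:.2f} PB)")
print(f"memory per core:  {memory_per_core_gb:.2f} GB")
```

The same helpers can be reused with other per-component figures to compare configurations on a per-core basis.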

For various reasons, including economies of scale, datacenters in current use are very large, and this trend will continue and even intensify. Many of the interesting research problems therefore concern scalability: how to build systems that scale to installations of this size. Very few academic researchers have access to such a facility. Many use smaller clusters in their labs, which can serve as prototyping facilities, but they need a large datacenter for experimentation and measurement. Commercially available cloud computing services and datacenters are not an option, because the researchers want to experiment with the hardware, the network, the operating system, and all other levels of the software stack. Experimentation at such low levels of the system could not be done without upsetting production use.

Last updated: 21.02.2013

Responsible applicant and co-applicants

Publications

Publication
GentleRain: Cheap and Scalable Causal Consistency with Physical Clocks
Du Jiaqing, Iorgulescu Calin, Roy Amitabha, Zwaenepoel Willy (2014), GentleRain: Cheap and Scalable Causal Consistency with Physical Clocks, in Proceedings of the ACM Symposium on Cloud Computing, ACM, 2014.
Scale-up Graph Processing in the Cloud: Challenges and Solutions
Malicevic Jasmina, Roy Amitabha, Zwaenepoel Willy (2014), Scale-up Graph Processing in the Cloud: Challenges and Solutions, in Proceedings of the Fourth International Workshop on Cloud Data and Platforms, CloudDP’14: Fourth International Workshop on Cloud Data and Platforms, Amsterdam, Netherlands, April.
Closing The Performance Gap between Causal Consistency and Eventual Consistency
Du Jiaqing, Iorgulescu Calin, Roy Amitabha, Zwaenepoel Willy (2014), Closing The Performance Gap between Causal Consistency and Eventual Consistency, in 1st Workshop on Principles and Practice of Eventual Consistency (PaPEC 2014), Amsterdam, The Netherlands, 13 April, 2014.
X-Stream: Edge-centric Graph Processing using Streaming Partitions
Roy Amitabha, Mihailovic Ivo, Zwaenepoel Willy (2013), X-Stream: Edge-centric Graph Processing using Streaming Partitions, in Proceedings of the 24th ACM Symposium on Operating Systems Principles, The 24th ACM Symposium on Operating Systems Principles, Farmington, Pennsylvania, USA, November 3-6.
Orbe: Scalable Causal Consistency Using Dependency Matrices and Physical Clocks
Du Jiaqing, Elnikety Sameh, Roy Amitabha, Zwaenepoel Willy (2013), Orbe: Scalable Causal Consistency Using Dependency Matrices and Physical Clocks, in 2013 ACM Symposium on Cloud Computing (SOCC), Santa Clara, California, USA, October 1-3, 2013.
Clock-SI: Snapshot Isolation for Partitioned Data Stores Using Loosely Synchronized Clocks
Du Jiaqing, Elnikety Sameh, Zwaenepoel Willy (2013), Clock-SI: Snapshot Isolation for Partitioned Data Stores Using Loosely Synchronized Clocks, in 2013 IEEE 32nd International Symposium on Reliable Distributed Systems (SRDS), Braga, Portugal, October 1-3, 2013.
Profiling Software for Energy Consumption
Schubert Simon, Kostic Dejan, Zwaenepoel Willy, Shin Kang (2012), Profiling Software for Energy Consumption, in Proceedings of the IEEE International Conference on Green Computing and Communications (GreenCom), The IEEE International Conference on Green Computing and Communications (GreenCom), November 2012.
Building global and scalable systems with Atomic Multicast
Samuel Benz, Parisa Jalili Marandi, Fernando Pedone, Benoît Garbinato (2014), Building global and scalable systems with Atomic Multicast, in Proc. of the 15th International Middleware Conference (Middleware 2014), Bordeaux, France, December 2014.
Scalable State-Machine Replication
C. E. Bezerra, F. Pedone, R. van Renesse (2014), Scalable State-Machine Replication, in Proc. of the 44th International Conference on Dependable Systems and Networks (DSN 2014), Atlanta, USA, June 2014.
From A to E: Analyzing TPC’s OLTP Benchmarks -- The obsolete, the ubiquitous, the unexplored
Tözün Pinar, Pandis Ippokratis, Kaynak Ilknur Cansu, Jevdic Dorde, Ailamaki Anastasia (2013), From A to E: Analyzing TPC’s OLTP Benchmarks -- The obsolete, the ubiquitous, the unexplored, in Proceedings of the 16th International Conference on Extending Database Technology, p. 17-28, 16th International Conference on Extending Database Technology, Genoa, Italy, March 18-22, 2013.
OLTP in Wonderland -- Where do cache misses come from in major OLTP components?
Tözün Pinar, Gold Brian, Ailamaki Anastasia (2013), OLTP in Wonderland -- Where do cache misses come from in major OLTP components?, in Proceedings of the 9th International Workshop on Data Management on New Hardware, p. 8:1--8:6, 9th International Workshop on Data Management on New Hardware, New York, New York, USA, June 24, 201.
ADDICT: Advanced Instruction Chasing for Transactions
Tözün Pinar, Atta Islam, Ailamaki Anastasia, Moshovos Andreas (2014), ADDICT: Advanced Instruction Chasing for Transactions, in Proceedings of the 40th International Conference on Very Large Databases, vol. 7, num. 14, 41st International Conference on Very Large Databases, Waikoloa, Hawaii, USA, August 31 - September.
SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads
Atta Islam, Tözün Pinar, Ailamaki Anastasia, Moshovos Andreas (2012), SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads, in Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, The 45th Annual IEEE/ACM International Symposium on Microarchitecture, Vancouver, BC, Canada, Decemb.
STREX: Boosting Instruction Cache Reuse in OLTP Workloads Through Stratified Transaction Execution
Atta Islam, Tözün Pinar, Tong Xin, Ailamaki Anastasia, Moshovos Andreas (2013), STREX: Boosting Instruction Cache Reuse in OLTP Workloads Through Stratified Transaction Execution, in Proceedings of the 40th International Symposium on Computer Architecture, 40th International Symposium on Computer Architecture, Tel-Aviv, Israel, June 23-27, 2013.
OLTP on Hardware Islands
Porobic Danica, Pandis Ippokratis, Branco Miguel, Tözün Pinar, Ailamaki Anastasia (2012), OLTP on Hardware Islands, in Proceedings of the VLDB Endowment (PVLDB), vol. 5, num. 11, p. 1447-1458, 38th International Conference on Very Large Databases, Istanbul, Turkey, August 27 - 31, 2012.
ATraPos: Adaptive Transaction Processing on Hardware Islands
Porobic Danica, Liarou Erietta, Tözün Pinar, Ailamaki Anastasia (2014), ATraPos: Adaptive Transaction Processing on Hardware Islands, in Proceedings of the 30th IEEE International Conference on Data Engineering, 30th IEEE International Conference on Data Engineering, Chicago, IL, USA, March 31 - Apr 4, 2014.
A Semi-Analytical Thermal Modeling Framework for Liquid-Cooled ICs
Sridhar Arvind, Aly Mohamed Mostafa Sabry, Atienza Alonso David (2014), A Semi-Analytical Thermal Modeling Framework for Liquid-Cooled ICs, in IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems (ISSN: 0278-0070), vo, Piscataway: Institute of Electrical and Electronics Engineers, 2014.
Elastic online analytical processing on RAMCloud
Christian Tinnefeld, Donald Kossmann, Martin Grund, Joos-Hendrik Boese, Frank Renkes, Vishal Sikka, H (2013), Elastic online analytical processing on RAMCloud, in EDBT '13 Proceedings of the 16th International Conference on Extending Database Technology.
Workload Optimization using SharedDB
Giorgos Giannikis, Darko Makreshanski, Gustavo Alonso, Donald Kossmann (2013), Workload Optimization using SharedDB, in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), New York, NY, USA, June 22-27, 2013.
A generic database benchmarking service
Martin Kaufmann, Peter M. Fischer, Donald Kossmann, Norman May (2013), A generic database benchmarking service, in 29th IEEE International Conference on Data Engineering, ICDE, Brisbane, Australia.
Global Fan Speed Control Considering Non-Ideal Temperature Measurements in Enterprise Servers
Kim Jungsoo, Aly Mohamed Mostafa Sabry, Atienza Alonso David, Vaidyanathan Kalyan, Gross Kenny (2014), Global Fan Speed Control Considering Non-Ideal Temperature Measurements in Enterprise Servers, in Proceedings of the IEEE/ACM 2014 Design Automation and Test in Europe (DATE) Conference, vol. 1, num, IEEE/ACM 2014 Design Automation and Test in Europe (DATE) Conference, Dresden, Germany, March 24-28.
The Tests-versus-Proofs Conundrum
George Candea (2014), The Tests-versus-Proofs Conundrum, in IEEE Security & Privacy, Vol. 12 (ISSN: 154), p. 65-68.
Execution Synthesis: A Technique for Automating the Debugging of Software
Zamfir Cristian (2013), Execution Synthesis: A Technique for Automating the Debugging of Software, EPFL, Lausanne.
Making Automated Testing of Cloud Applications an Integral Component of PaaS
Bucur Stefan, Kinder Johannes, Candea George (2013), Making Automated Testing of Cloud Applications an Integral Component of PaaS, in 4th Asia-Pacific Workshop on Systems, Singapore, July 29-30, 2013.
A Case for Specialized Processors for Scale-Out Workloads
Ferdman Michael, Adileh Almutaz, Kocberber Onur, Volos Stavros, Alisafaee Mohammad, Jevdjic Djordje (2014), A Case for Specialized Processors for Scale-Out Workloads, in IEEE Micro Top Picks, Vol. 34(Num. 3), 31-42.
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware
Ferdman Michael, Adileh Almutaz, Kocberber Onur, Volos Stavros, Alisafaee Mohammad, Jevdjic Dj (2012), Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware, in Proceedings of the Seventeenth International Conference on Architectural Support for Programming Lan, Seventeenth International Conference on Architectural Support for Programming Languages and Operatin.
(Mis)Understanding the NUMA Memory System Performance of Multithreaded Workloads
Zoltan Majo, Thomas R. Gross (2013), (Mis)Understanding the NUMA Memory System Performance of Multithreaded Workloads, in Proceedings of IISWC '13.
Matching Memory Access Patterns and Data Placement for NUMA Systems
Zoltan Majo, Thomas R. Gross (2012), Matching Memory Access Patterns and Data Placement for NUMA Systems, in International Symposium on Code Generation and Optimization (CGO).
A Template Library to Integrate Thread Scheduling and Locality Management for NUMA Multiprocessors
Zoltan Majo, Thomas R. Gross (2012), A Template Library to Integrate Thread Scheduling and Locality Management for NUMA Multiprocessors, in 4th USENIX Workshop on Hot Topics in Parallelism (HotPar).
On Limitations of Network Acceleration
Animesh Trivedi, Bernard Metzler, Patrick Stuedi, Thomas R. Gross (2013), On Limitations of Network Acceleration, in CoNEXT'13: 9th ACM International Conference on emerging Networking EXperiments and Technologies, ACM.
Unified High-Performance I/O: One Stack to Rule Them All
Animesh Trivedi, Patrick Stuedi, Bernard Metzler, Roman Pletka, Blake G. Fitch, Thomas R. Gross (2013), Unified High-Performance I/O: One Stack to Rule Them All, in HotOS'13: Proceedings of the 14th USENIX Conference on Hot Topics in Operating Systems.
High availability, elasticity, and strong consistency for massively parallel scans over relational data
Philipp Unterbrunner, Gustavo Alonso (2013), High availability, elasticity, and strong consistency for massively parallel scans over relational data, in The VLDB Journal, s00778-013.
Parallel join executions in RAMCloud
Christian Tinnefeld, Donald Kossmann, Joos-Hendrik Boese, Hasso Plattner (2014), Parallel join executions in RAMCloud, in Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference, Chicago, IL.
A Library for Portable and Composable Data Locality Optimizations for NUMA Systems
Zoltan Majo, Thomas R. Gross (accepted), A Library for Portable and Composable Data Locality Optimizations for NUMA Systems, in Proc. 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '15).
Shared Workload Optimization
Georgios Giannikis, Darko Makreshanski, Gustavo Alonso, Donald Kossmann (accepted), Shared Workload Optimization, in VLDB 2014.
S²E: A Platform for In-Vivo Multi-Path Analysis of Software Systems
Chipounov Vitaly (accepted), S²E: A Platform for In-Vivo Multi-Path Analysis of Software Systems, EPFL, Lausanne.
Modeling of Two-Phase Evaporative Heat Transfer in Three-Dimensional Multicavity High Performance Microprocessor Chip Stacks
Madhour Yassir, D'Entremont Brian P., Marcinichen Jackson Braz, Michel Bruno, Thome John Richard (accepted), Modeling of Two-Phase Evaporative Heat Transfer in Three-Dimensional Multicavity High Performance Microprocessor Chip Stacks, in Journal of Electronic Packaging (ISSN: 1043-7398), Vol. 136 (num. 2).
Passive Thermosyphon Cooling System for High Heat
S. Szczukiewicz, N. Lamaison, J. B. Marcinichen, J. R. Thome (accepted), Passive Thermosyphon Cooling System for High Heat, in final preparation for journal submission.

Awards

Title | Year
IEEE CEDA Early Career Award | 2013
EuroSys Jochen Liedtke Young Researcher Award | 2014

Associated projects

Number | Title | Start | Funding scheme
153560 | Fundamentals of Parallel Programming for Platform-as-a-Service Clouds | 01.11.2014 | Project funding (Div. I-III)
162084 | ARM: Advanced High-Performance, Power-Efficient and Application-Specific Resource Management for Big Data Science | 01.03.2016 | South Korea
136225 | FAN: Foundations of dynamic program ANalysis | 01.04.2012 | Sinergia

Abstract

Information technology is undergoing a major paradigm shift. In the future, most or all information will be housed in datacenters rather than on users' desktops or corporate servers. Computing will be co-located with information. High-performance, low-latency networks will make this information and computation available to end users anytime, anywhere. Users will access this information through a variety of devices, mostly mobile ones, with powerful display capabilities and high-bandwidth, low-latency network access. This vision of the future, called “cloud computing”, will lead to tremendous efficiency gains. Gone will be the need for large, energy-hungry in-house computing and storage facilities with armies of system administrators. All this functionality will be provided transparently by the cloud, leaving end users to concentrate on their core business.

All PIs in this proposal are active in cloud computing research. For our research to be competitive and for our results to be influential in industry, we need access to a datacenter on the order of a few thousand processors and a petabyte of storage. For various reasons, including economies of scale, datacenters in current use are very large, and this trend will continue and even intensify. Many of the interesting research problems therefore concern scalability: how to build systems that scale to installations of this size. Currently, very few academic researchers, and none of us, have access to this kind of facility. Many of us use smaller clusters in our labs, and these can serve as prototyping facilities, but we need access to a large datacenter for experimentation and measurement. Individually, we could not justify such a facility for our research groups, but collectively we can make extremely good use of a datacenter of this scale. We cannot use commercially available cloud computing services or datacenters, because we want to experiment with the hardware, the network, the operating system, and all other levels of the software stack. We could not possibly hope to do experimentation at such low levels of the system without upsetting production use.

We propose to acquire 8 racks hosting 180 dual-processor servers with 4-core processors, for a total of 1440 processing cores. The system further contains 2.1TB of memory, 180TB of local disk storage, a storage array of 368TB of disks, and 800GB of flash, for an aggregate of over 0.5PB of storage. The various components will be connected by eight 48-port network switches. The ratio of processing to memory and storage capacity (one core to 1.5GB of memory and 300GB of disk) is representative of existing enterprise datacenter installations and strikes a balance between speed, capacity and power. While relatively small on the scale of industrial datacenters, the system will nonetheless allow us to carry out cloud computing experiments on a scale that is relevant to industrial practice. The equipment will support our research programs in energy-efficient computing, storage and enterprise systems, networks, fault tolerance, program instrumentation, and testing.