
ARM: Advanced High-Performance, Power-Efficient and Application-Specific Resource Management for Big Data Science

English title ARM: Advanced High-Performance, Power-Efficient and Application-Specific Resource Management for Big Data Science
Applicant Gross Thomas
Number 162084
Funding scheme Bilateral programmes
Research institution Institut für Computersysteme ETH Zürich
Institution of higher education ETH Zurich - ETHZ
Main discipline Information Technology
Start/End 01.03.2016 - 31.07.2019
Approved amount 183'837.00

Keywords (6)

parallel programming; resource management; compilation; application-centric resource management; operating system; runtime systems

Lay Summary (translated from German)

Lead
The efficient management of the mass of digitally available data poses major challenges for computer science. On the one hand, data volumes continue to grow ("Big Data") and will grow further, since modern information and communication devices connected to networks (in particular the Internet) generate ever larger amounts of data that must be stored and processed. On the other hand, the programming effort and the energy demand of the computing infrastructure must be kept within bounds if society, public administrations, and companies are to benefit from the opportunities of Big Data. In this project we develop algorithms, abstractions, and concepts for the system software that can be deployed in a datacenter. In particular, we aim to exploit specific requirements of applications such as data analysis or modeling as the basis of an efficient solution.
Lay summary

Aim and content of the research project

Our overarching goal is to develop concepts, algorithms, and abstractions for the resource-efficient operation of datacenters, and to study them in practical use, in order to gain a better understanding of the methods that can be employed to process and store large volumes of data. The efficient management of this mass of digitally available data ("Big Data") poses major challenges for computer science: unless the management of these data is efficient (i.e., both the programming and the energy demand for operation can be realized with reasonable effort), the many benefits that Big Data promises in areas such as personalized medicine, environmental monitoring, and the control of traffic flows will not materialize. The applications that work with Big Data will all use parallel computers installed in datacenters. Our approach is to investigate properties of the applications (e.g., patterns in the access to datasets) as the basis for adaptations in the operating system of these datacenters.

 

Scientific and societal context of the research project

Our work touches the interface between the development and the operation of datacenters, which play an increasingly important role in processing the masses of digitally available data. Our software will be open source, so that interested parties can read it and convince themselves that data are used only in the announced way. Through the collaboration (in a Korean-Swiss research group) we can react early to technology developments that raise new questions both in the collection and in the processing/storage of Big Data.


Last update: 10.02.2016

Publications

Publication
Understanding Parallelization Tradeoffs for Linear Pipelines
Mastoras Aristeidis, Gross Thomas R. (2018), Understanding Parallelization Tradeoffs for Linear Pipelines, in Proceedings of the 9th International Workshop, Vienna, Austria, ACM, New York, NY.

Collaboration

Group / person Country
Types of collaboration
Purdue University United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Exchange of personnel
EPFL and USI Switzerland (Europe)
- Research Infrastructure
Seoul National University / ROSAEC Center Republic of Korea (South Korea) (Asia)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Research Infrastructure
- Exchange of personnel

Scientific events



Self-organised

Title Date Place
AI inference accelerators in practice 05.04.2018 ETH Zurich, Switzerland
SNU-ETH Workshop 04.07.2016 ETH Zurich, Switzerland

Associated projects

Number Title Start Funding scheme
136225 FAN: Foundations of dynamic program ANalysis 01.04.2012 Sinergia
133835 The Datacenter Observatory 01.10.2012 R'EQUIP

Abstract

The platforms for big data science (modern datacenters with support for analytics and information storage, and high-performance computing (HPC) systems with floating-point accelerators) are converging: common to all such systems is that clusters of nodes (multicore multiprocessors) are connected via high-bandwidth, low-latency networks. Large-scale systems are expensive to build and to operate, so a number of research efforts have targeted increasing performance and/or improving energy efficiency. This research project aims to exploit the application/operating-system interface to improve program execution: using a smaller number of cores, or the same number of cores for a shorter time, may reduce energy consumption or allow handling larger datasets or problems.

The starting point of our investigation is the observation that current systems contain too many barriers between software layers. The application developer does not know the details of the target platform (nor should they be required to know them). So even for applications that are easy to partition (into some number of work units, e.g., tasks or threads), it is difficult to determine where to allocate the data for each work unit. Nor is it clear how many work units there should be. If the number is too large, then the application may have to wait a long time until the appropriate hardware resources are available, or the application is executed by a virtualized system (cores or processors are shared, either through software or hardware features like hyper-threading) - but virtualization may annihilate all efforts to obtain data locality. The situation is worse when the operating system's concerns are included. An operating system wants to improve resource utilization by sharing cores or processors. But such sharing and reclaiming of resources incurs a significant overhead - and this overhead could be avoided in many situations if the operating system had a better understanding of the application's needs. For example, reclaiming a core shortly before the end of a parallel region should be avoided. At the same time, an application may benefit from better information from the operating system: if there are not enough cores, an application may run more efficiently if a smaller number of work units is produced.

The management of resources (cores, processors, memory, I/O, and network bandwidth) remains a central topic of computer science. There is always a tension between the desire to manage resources statically (by the compiler, user hints, or analysis of past executions) and the necessity to manage resources dynamically (by the runtime or operating system, or by the hardware). Static management incurs no runtime overhead; dynamic management lacks the global view afforded by static approaches. As processor technologies and programming language concepts evolve, the balance between static and dynamic approaches must be (re)considered anew. Our research effort investigates how "widening" the interfaces between an application, the compiler and (language) runtime system, and the operating system allows us to increase performance and/or reduce resource consumption. By passing the results of an off-line compiler analysis, or by collecting profile information, an application can assist the operating system in providing better resource management. At the same time, the operating system can provide information on actual resource usage, and interact gracefully with an application-specific scheduler, to improve the efficiency of application execution.

The research will be conducted jointly by a team from SNU and a team from ETH. The two teams have complementary expertise: the Swiss team has worked on compilation techniques as well as on parallel programming. For example, the Swiss side has made significant contributions to reconciling the NUMA features of modern multiprocessors with common libraries like TBB, and developed TBB-NUMA, the first library that supports composable and portable data-locality optimizations. The Korean team has researched techniques for distributed and manycore systems, focusing on adaptive process migration (on both distributed and manycore systems) and on dynamic, application-specific resource allocation for big data systems on a given hardware platform. The complementary background of these teams, as well as the common focus on working with real hardware systems, ensures that the research will be both groundbreaking and yield practical results. The solutions developed in the context of this project provide the insights and approaches that allow programming and management of future systems for big data science.

Looking ahead, we observe that large-scale systems cannot provide a coherent shared data space, so a central problem of resource management (where to schedule a task or thread) will continue to challenge system designers and operators. At the same time, as we move from multicore to manycore processors, the cores' capabilities diverge: the address space may be partitioned and/or the cost of memory accesses may differ for some of the cores. So it is important to investigate tools and abstractions that assist application developers in designing portable, efficient applications.

As science embraces data-driven approaches (in addition to experimentation and simulation) and big data science plays a major role in many disciplines, it is important to use the available platforms efficiently and to understand their bottlenecks, to guide the design of the next generation of big data science platforms. Big data science crucially depends on advances in computing systems. This effort aims at improving resource utilization by leveraging the interface between the application and the runtime system/operating system. The end result of this collaborative research will consist of tools and techniques for big data science. It will help big data scientists to execute applications more efficiently in terms of performance and/or cost. At the same time, the techniques will also enable providers of the computational platforms to better utilize the hardware resources, thereby reducing operational cost.
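The abstract's point that an application can adapt its number of work units to the resources the operating system actually grants can be illustrated with a minimal sketch. This is not project code; the function names are ours, and it only shows the general idea of sizing work units to the granted core set instead of over-partitioning:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def available_cores():
    """Ask the OS how many cores this process may actually use.

    On Linux, sched_getaffinity(0) reflects the current allocation
    (e.g. after cpuset/affinity restrictions), which a plain core
    count does not; elsewhere we fall back to os.cpu_count().
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # platforms without sched_getaffinity
        return os.cpu_count() or 1

def process_chunks(data, worker):
    # Create one work unit per granted core rather than a fixed large
    # number, avoiding idle work units and scheduling overhead.
    n = available_cores()
    size = max(1, len(data) // n)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(worker, chunks))

results = process_chunks(list(range(100)), sum)
print(sum(results))  # 4950, independent of how many chunks were formed
```

A real implementation along the lines the project proposes would additionally let the operating system notify the runtime when the granted core set changes, so the partitioning can be redone instead of being fixed at startup.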