Project

Back to overview

Building Flexible Large-Graph Processing Systems on Commodity Hardware

Applicant Zwaenepoel Willy
Number 167157
Funding scheme NRP 75 Big Data
Research institution Laboratoire de systèmes d'exploitation EPFL - IC - IIF - LABOS
Institution of higher education EPF Lausanne - EPFL
Main discipline Information Technology
Start/End 01.01.2017 - 30.06.2020
Approved amount 547'904.00
Show all

Keywords (1)

Graph processing, commodity systems, out-of-core

Lay Summary (German)

Lead
Die heutigen Verfahren zur Aufbereitung der in Netzwerken enthaltenen Informationen sind teuer und nicht sehr leistungsfähig. Mit der Entwicklung neuer Rechenplattformen könnte sich das ändern: Die Verfahren würden effizienter und stünden mehr Menschen zur Verfügung.
Lay summary

In der Regel werden Berechnungen für Netzwerke durchgeführt, die sich während der gesamten Rechenoperation nicht verändern. Kontinuierlich neu hinzukommende Knotenpunkte werden dann erst beim Start der nächsten Rechenoperation berücksichtigt. Somit hinkt das Rechenergebnis der Realität ständig hinterher.

Dieses Projekt soll dagegen neue Knotenpunkte in die bereits laufende Berechnung einbeziehen und, so dass sie sich praktisch in Echtzeit auf das Ergebnis auswirken. Erhöht sich etwa aufgrund eines Unfalls die Fahrtzeit auf einem bestimmten Autobahnabschnitt, kann dieser Umstand direkt berücksichtigt und rasch eine Ausweichstrecke vorgeschlagen werden. Die Rechenoperationen sollen nicht nur für Supercomputer, sondern auch für herkömmliche Server verfügbar gemacht werden.

Netzwerkbasierte Berechnungen extrahieren Informationen aus den zwischen den Einheiten eines Netzwerks bestehenden Verbindungen. Welche Werbung ein Facebook-Nutzer angezeigt bekommt, wird durch die Gesamtheit seiner Kontakte innerhalb dieses sozialen Netzwerks bestimmt. Diese Thematik wird seit Beginn des IT-Zeitalters untersucht und erlebt seit dem Aufkommen von Big-Data-Netzwerken eine Renaissance. Allerdings werden zur Auswertung dieser Datenmengen neue Rechenplattformen benötigt.

Ziel dieses Projekts ist die Entwicklung einer flexiblen Plattform für Hochleistungsberechnungen in Big-Data-Netzwerken. Die Plattform soll insbesondere die sogenannten dynamischen Netzwerke unterstützen, deren Struktur sich während der Rechenoperationen ändern kann, und sie soll auf herkömmlichen Rechnern laufen können.

Durch den Einsatz von kostengünstigen Plattformen, die auf ganz normalen Computern laufen, werden dynamische Berechnungen in Big-Data-Netzwerken demokratisiert und in Zukunft für eine grössere Anzahl von Laboren oder Unternehmen erschwinglich.


Direct link to Lay Summary Last update: 26.07.2017

Lay Summary (French)

Lead
Rendre intelligible l’information contenue dans les réseaux est actuellement un processus coûteux et relativement peu performant. Le développement de nouvelles plateformes de calcul aurait pour avantage d’améliorer les performances du processus et de le rendre accessible au plus grand nombre
Lay summary

Le calcul se fait habituellement avec un réseau dont l’état ne varie pas durant l’entier du processus. Mais de nouvelles connexions apparaissent constamment et ces dernières ne sont prises en compte que lors de l’initiation d’un nouveau calcul. Elles ne sont ainsi reflétées que tardivement dans le résultat.

Ce projet vise à inclure de nouvelles connexions lors du processus de calcul. Leur effet devient ainsi visible quasi immédiatement : si, par exemple, la durée de voyage sur un tronçon d’autoroute augmente suite à un accident, on peut immédiatement en tenir compte et indiquer une voie alternative plus rapide. Les chercheurs travaillent également à rendre ce mode de calcul possible non seulement sur de coûteux super-ordinateurs mais aussi sur des serveurs conventionnels.

Le calcul sur les réseaux cherche à extraire de l’intelligence à partir des connexions qu’établissent entre elles les entités d’un réseau. Le choix des annonces publicitaires présentées, par exemple, à un utilisateur de Facebook se base sur l’ensemble de ses connexions au sein de ce réseau. Sujet de recherche dès les premiers jours de l’informatique, ce domaine connait une renaissance depuis l’arrivée de réseaux de grande taille, dont l’exploitation des données nécessite de nouvelles plateformes de calcul.

Ce projet vise à développer une plateforme flexible pour le calcul à haute performance sur les réseaux de grande taille. Elle soutiendra, en particulier, des réseaux dits dynamiques, dont la structure peut évoluer lors du processus de calcul et fonctionnera, de plus, sur des ordinateurs conventionnels.

La démocratisation du calcul dynamique sur les réseaux de très grandes tailles, par le biais d’une plateforme fonctionnant sur des ordinateurs conventionnels, rend, de par son coût réduit, ce mode de calcul accessible à un plus grand nombre de laboratoires ou de sociétés.

Direct link to Lay Summary Last update: 26.07.2017

Lay Summary (English)

Lead
Making sense of the information contained in networks is currently an expensive and relatively inefficient process. The development of new computing platforms would have the advantage of improving the process’s performance and making it accessible to as many people as possible.
Lay summary

Computing is usually performed with a network whose state does not vary during the entire process. Yet new connections are constantly appearing, and these are only taken into account when a new computing process is launched. So there is a delay before they are reflected in the result. This project aims to include new connections in currently running computing processes. Their effect therefore becomes apparent almost immediately: if, for example, the journey time on a section of motorway increases following an accident, this can be taken into account immediately and the system can suggest a faster alternative route. Furthermore, the researchers are working to make this mode of computing available on conventional servers and not only on expensive supercomputers.

Network computing seeks to retrieve information using the connections established between the entities in a network. The selection of advertisements presented to Facebook users, for example, is based on their total connections within this network. Having been the subject of research from the earliest days of information technology, this field has been experiencing a revival since the advent of large networks. But new computing platforms are needed to exploit their data.

The aim of this project is to develop a flexible platform for high-performance computing on large networks. In particular, it will support what are known as dynamic networks, whose structure can evolve during the computing process, and will also function on conventional computers.

If dynamic computing on very large networks becomes available to more people by means of a platform that works on conventional computers, it will be more affordable and accessible to more laboratories and companies.


Direct link to Lay Summary Last update: 26.07.2017

Responsible applicant and co-applicants

Employees

Publications

Publication
Fewer Cores, More Hertz: Leveraging High-Fequency Cores in the OS Scheduler for Improved Application Performance
GouicemRedha, CarverDamien, LoziJean-Pierre, SopenaJulien, LepersBaptiste, ZwaenepoelWilly, PalixNicolas, LawallJulia, MullerGilles (2020), Fewer Cores, More Hertz: Leveraging High-Fequency Cores in the OS Scheduler for Improved Application Performance, in Proceedings ATC 2020, Usenix ATC 2020, USA.
Provable Multicore Schedulers with Ipanema: Application to Work Conservation
Lepers Baptiste, Gouicem Redha, Carver Damien, Lozi Jean-Pierre, Palix Nicolas, Apont Maria-Virginia, Zwaenepoel Willy, Sopena Julien, Lawall Julia, Muller Gilles (2020), Provable Multicore Schedulers with Ipanema: Application to Work Conservation, in EuroSys 2020, EuroSys 2020, Heraklion, Crete, Greece.
Hailstorm: Disaggregated Compute and Storage for Distributed LSM-based Databases
Bindschaedler Laurent, Goel Ashvin, Zwaenepoel Willy (2020), Hailstorm: Disaggregated Compute and Storage for Distributed LSM-based Databases, in ASPLOS 2020, ASPLOS 2020 25th ACM International Conference on Architectural Support for Programming Languages and, Lausanne, Switzerland.
Kvell: The Design and Implementation of a Fast Persistent Key-Value Store
LepersBaptiste, BalmauOana, GuptaKaran, ZwaenepoelWilly (2019), Kvell: The Design and Implementation of a Fast Persistent Key-Value Store, in Proceedings SOSP, SOSP 2019 - The 27th ACM Symposium on Operating Systems Principles, Huntsville, Ontario, Canada.
The Battle of the Schedulers: FreeBSD ULE vs Linux CFS
BouronJustinien, ChevalleySébastien, LepersBaptiste, ZwaenepoelWilly (2018), The Battle of the Schedulers: FreeBSD ULE vs Linux CFS, in Proceedings Usenix 2018, 1.
Rock You like a Hurricane: Taming Skew in Large Scale Analytics
Laurent Bindschaedler Jasmina Malicevic Nicolas Schiper Ashvin Goel and Willy Zwaenepoel (2018), Rock You like a Hurricane: Taming Skew in Large Scale Analytics, in Eurosys 2018, Eurosys 2018, Eurosys 2018.
Everything you always wanted to know about multicore graph processing but were afraid to ask
Jasmina Malicevic Baptiste Lepers and Willy Zwaenepoel (2017), Everything you always wanted to know about multicore graph processing but were afraid to ask, in USENIX ATC'17, Conference USENIX ATC'17, Santa Clara, California, USA.

Scientific events

Active participation

Title Type of contribution Title of article or contribution Date Place Persons involved
EuroSys 2020 Talk given at a conference Provable Multicore Schedulers with Ipanema: Application to Work Conservation 27.04.2020 Heraklion, Greece Lepers Baptiste;
ASPLOS 2020 Talk given at a conference Hailstorm: Disaggregated Compute and Storage Deployment for LSM-Based Key-Value Stores 16.03.2020 Lausanne, Switzerland Bindschaedler Laurent;
SOSP 2019 The 27th ACM Symposium on Operating Systems Principles Talk given at a conference KVell: The Design and Implementation of a Fast Persistent Key-Value Store 27.10.2019 Huntsville, Ontario, Canada Lepers Baptiste;
WOS7: 7th INRIA/Technicolor Workshop On Scalable computing Individual talk Everything You Always Wanted to Know about Multicore Graph Processing but Were Afraid to Ask 30.11.2017 Rennes, France Malicevic Jasmina;
Usenix ATC'17 Talk given at a conference Everything You Always Wanted to Know about Multicore Graph Processing but Were Afraid to Ask 13.07.2017 Santa Clara, United States of America Malicevic Jasmina;
Seminar Individual talk Analytics on Graphs with Trillions of Edges 15.06.2017 Kyunghee, Korean Republic (South Korea) Zwaenepoel Willy;
Seminar Individual talk Analytics on Graphs with Trillions of Edges 12.05.2017 Crete, Greece Zwaenepoel Willy;


Associated projects

Number Title Start Funding scheme
195136 Fast, Scalable Graph Pattern Mining on Evolving Graphs 01.09.2020 Early Postdoc.Mobility

Abstract

Analytics over large graphs is attracting increasing attention. Part of the reason for this upsurge of interest is the great variety of information that is naturally encoded as graphs. Large graphs are obviously present in social networks, but they also occur naturally in many other applications, for example, in biology, forensics, or logistics. Graph processing poses an interesting systems challenge: graph algorithms tend to exhibit little locality, making it difficult to build platforms that exhibit good performance. Much progress has been made in recent years in building such systems on platforms ranging from supercomputers over large clusters to single (multicore) machines, using either in-memory or out-of-core approaches. Many of these first-generation systems are, however, rather inflexible, restricting users to a particular environment and computation on static graphs.Based on our earlier work on out-of-core graph processing systems, we propose to advance the state of the art in graph processing by building systems that gracefully scale between memory and storage and that are capable of dealing with dynamic graphs. In addition, we intend to further optimize out-of-core performance, both in terms of performance and capacity.
-