Project

Polypheny-DB: Cost- and Workload-aware Adaptive Data Management

English title: Polypheny-DB: Cost- and Workload-aware Adaptive Data Management
Applicant: Schuldt Heiko
Number: 172763
Funding scheme: Project funding (Div. I-III)
Research institution: Fachbereich Informatik Departement Mathematik und Informatik Universität Basel
Institution of higher education: Universität Basel - BS
Main discipline: Computer science
Start/End: 01.05.2017 - 30.04.2021
Approved amount: 464'422.00

Keywords (8)

Data Partitioning, Dynamic adaptation, Data storage, Data replication, Data access, Cost-based data management, Data distribution, Cloud data management

Lay Summary (translated from German)

Lead
Traditionally, database systems are developed independently of the applications in which they are used. Given the demands that novel applications place on data management, however, this "one-size-fits-all" approach no longer fits. Either one needs a multitude of specialized database systems tailored to the respective needs, or one solves the problem with a novel approach in which data management adapts dynamically to changing requirements. The latter is the basic idea of the Polypheny-DB project.
Lay summary

Our goal is to lay the foundations for developing novel database systems that adapt dynamically to changing requirements, and to implement and test such systems. Building on an estimate of the expected access pattern, we want to develop a cost model that enables these dynamic adaptations. We want to examine two areas in more detail: i.) data storage and access, and ii.) data distribution. Data storage examines how different storage technologies, data models, and database architectures can be used. Data distribution examines dynamic data partitioning, replication, and a hybrid of the two approaches.

This work is highly relevant in the context of cloud computing in particular, where cloud providers with limited resources must serve a broad spectrum of customers, and thus heterogeneous customer requirements, as well as possible.

Last updated: 04.04.2017

Associated projects

Number | Title | Start | Funding scheme
150061 | ClouDMan: Cost-based Data Management in Cloud Environments | 01.11.2013 | Project funding (Div. I-III)

Abstract

In the last few years, it has become obvious that the “one-size-fits-all” paradigm, according to which database systems have been designed for several decades, has come to an end. The reason is that across the very broad spectrum of applications, ranging from business and science to the private life of individuals, demands are increasingly heterogeneous. As a consequence, the data to be managed differs significantly in many ways: from immutable data to data that is frequently updated; from highly structured data to unstructured data; from applications that need precise data to applications that can tolerate approximate and/or outdated results; from applications that demand always-consistent data to applications that can tolerate lower levels of consistency. Even worse, many applications feature heterogeneous data and/or workloads, i.e., they intrinsically come with data (sub-collections) with different requirements for data storage, access, consistency, etc. that have to be dealt with in the same system. This development can be handled either with a large zoo of specialized systems (one for each subset of data with different properties) or with a new type of flexible database system that automatically adapts to the potentially dynamically changing needs and characteristics of applications. Such behaviour is especially important in the Cloud, where several applications need to be hosted on a shared infrastructure in order to make economic use of the available resources. In addition, the infrastructure costs incurred in a Cloud are made transparent in a very fine-grained way.

The Polypheny-DB project will address these challenges by dynamically optimizing the data management layer, taking into account the resources needed and an estimate of the expected workload of applications. The core will be a comprehensive cost model that seamlessly addresses these criteria and that will be used, in two subprojects, to optimize i.) data storage and access, and ii.) data distribution. Each of the two subprojects will be addressed by one PhD student.

Data Storage and Access: different physical storage media such as spinning disks, flash storage, or main memory come with different properties and performance guarantees for data access, but also differ significantly in the cost of the necessary hardware and the volume of data that can be managed at a certain price. In addition, the access characteristics of applications (e.g., read-only, mostly reads, balanced read/write ratio, high update frequencies) may favour one solution over another or even require hybrid solutions. The same applies to the data model to be used and the choice between structured, semi-structured, or schema-less data management. Similarly, applications might require data to be approximated to speed up queries or to be compressed to save storage space. In Polypheny-DB, we will jointly address these choices and alternatives by devising a comprehensive cost model for data storage that makes it possible to decide, if necessary at the level of sub-collections, how data can best be stored given an available budget and the requirements for data access. The costs for data storage and access will be continuously monitored, together with an analysis of the expected workload of applications, and the storage strategy will be updated dynamically if necessary.

Data Distribution: a high degree of data availability requires that data is replicated across different sites in a distributed system. For read-only applications, replication also makes it possible to balance the (read) load across sites. However, in the presence of updates, additional costs for distributed transactions are incurred to keep replicas consistent, unless weaker consistency models can be tolerated. In our previous work, we have designed, implemented, and evaluated a protocol that manages data in a distributed environment either by dynamically replicating data and managing consistency based on the costs incurred and the available budget, or by partitioning data, in particular by co-locating data items that are frequently accessed jointly in order to minimize the number of distributed transactions. While both approaches are very effective, neither of them alone can address dynamic workloads. In Polypheny-DB, we will develop a comprehensive cost model that seamlessly combines replication and partitioning of (subsets of) data. Again, the cost model will be used to dynamically re-assess replication and/or partitioning decisions and to adapt data distribution from a cost-effectiveness point of view.
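
The abstract describes choosing a storage medium per sub-collection under a budget and an expected access pattern. As a purely illustrative sketch of that idea (not the project's actual cost model), the following Python snippet picks, among the tiers that fit the budget, the one with the lowest estimated access cost. All names, fields, and numbers (`StorageTier`, `SubCollection`, `choose_tier`, the prices and latencies) are hypothetical assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StorageTier:
    """Hypothetical storage medium with made-up price and latency figures."""
    name: str
    cost_per_gb: float       # assumed storage price per GB and billing period
    read_latency_ms: float   # assumed average latency of one read
    write_latency_ms: float  # assumed average latency of one write


@dataclass(frozen=True)
class SubCollection:
    """Hypothetical sub-collection with an estimated workload."""
    name: str
    size_gb: float
    reads_per_s: float
    writes_per_s: float


def access_cost(tier: StorageTier, coll: SubCollection) -> float:
    """Estimated time (ms) spent on storage access per second of workload."""
    return (coll.reads_per_s * tier.read_latency_ms
            + coll.writes_per_s * tier.write_latency_ms)


def choose_tier(coll: SubCollection,
                tiers: Sequence[StorageTier],
                budget: float) -> Optional[StorageTier]:
    """Return the affordable tier with the lowest access cost, or None."""
    affordable = [t for t in tiers if t.cost_per_gb * coll.size_gb <= budget]
    if not affordable:
        return None
    return min(affordable, key=lambda t: access_cost(t, coll))


# Illustrative tiers; the figures are assumptions, not measurements.
TIERS = [
    StorageTier("disk", cost_per_gb=0.02, read_latency_ms=8.0, write_latency_ms=10.0),
    StorageTier("flash", cost_per_gb=0.10, read_latency_ms=0.2, write_latency_ms=0.5),
    StorageTier("memory", cost_per_gb=2.00, read_latency_ms=0.001, write_latency_ms=0.001),
]

if __name__ == "__main__":
    orders = SubCollection("orders", size_gb=100, reads_per_s=1000, writes_per_s=10)
    chosen = choose_tier(orders, TIERS, budget=50.0)
    print(chosen.name if chosen else "no tier fits the budget")
```

Re-running `choose_tier` whenever the monitored workload estimate or the budget changes mirrors, in a very simplified way, the dynamic re-assessment the abstract describes; a realistic model would also need to account for migration costs and hybrid placements.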