
COLLOC 2: Local Replication Strategies for Global Content Management in Large-Scale Distributed Systems

Applicant Garbinato Benoit
Number 120188
Funding scheme Project funding
Research institution Département des systèmes d'information Faculté des Hautes Etudes Commerciales
Institution of higher education University of Lausanne - LA
Main discipline Information Technology
Start/End 01.06.2008 - 30.11.2009
Approved amount CHF 107'786.00

Keywords (6)

adaptive systems; replication protocols; large-scale distributed systems; peer-to-peer content management; content management; replication strategy

Lay Summary (English)

This project is the direct follow-up to SNF project 200021-108191, a joint research effort between the University of Lausanne (Unil) and the University of Lugano (USI).

Peer-to-Peer Content Management Systems (P2P-CMS) aim to support the collaborative work of scattered and fluctuating communities, and thus face many challenges. These include (1) finding the right level of consistency to achieve optimal performance, as the need for fast data access often outweighs the need for data consistency; (2) coping with the constantly changing origins and frequencies of read and write requests, as interest in specific pieces of content continuously shifts; and (3) managing the distributed storage of enormous volumes of data, as it is difficult, if not impossible, to store all the data in a single place. A promising approach is to tap idle computing power and storage, since most computers in large organizations are underused.

The objective of the Colloc project is to lay the foundations of a P2P-CMS that meets the challenges sketched above. This requires a P2P-CMS to constantly adapt its answers to the following questions: "When to replicate?" (replication condition), "What to replicate?" (replica granularity), "Where to replicate?" (replica location), and "How to replicate?" (replication scheme). More specifically, a P2P-CMS must be able (a) to accommodate dynamic changes in resource availability as it scales up, and (b) to dynamically place replicas according to read/write access patterns. Solutions to these problems can then be seen as distinct layers in the P2P-CMS protocol stack.
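To make the four questions concrete, the following Python sketch shows one hypothetical way a per-item policy could answer them from observed access counts. All names (AccessStats, ReplicationDecision, decide) and thresholds (read_write_ratio, max_new) are illustrative assumptions rather than part of Colloc. Granularity is fixed to the whole item, and the policy simply adds replicas near heavy readers when reads dominate writes, in the spirit of Problem (b).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AccessStats:
    """Per-node read/write counts observed for one content item (hypothetical)."""
    reads: Dict[str, int]
    writes: Dict[str, int]


@dataclass
class ReplicationDecision:
    replicate: bool       # "When to replicate?"  (replication condition)
    granularity: str      # "What to replicate?"  (here: always the whole item)
    locations: List[str]  # "Where to replicate?" (replica location)
    scheme: str           # "How to replicate?"   (replication scheme)


def decide(item_id: str, stats: AccessStats, current: List[str],
           read_write_ratio: float = 4.0, max_new: int = 2) -> ReplicationDecision:
    """Toy policy: add replicas near the heaviest readers when reads dominate writes."""
    total_reads = sum(stats.reads.values())
    total_writes = sum(stats.writes.values())
    # Condition: every extra replica must also absorb every write,
    # so only replicate when reads clearly outweigh writes.
    replicate = total_reads >= read_write_ratio * max(total_writes, 1)
    # Location: most active readers that do not already hold a replica.
    candidates = sorted((n for n in stats.reads if n not in current),
                        key=lambda n: stats.reads[n], reverse=True)
    locations = candidates[:max_new] if replicate else []
    # Scheme: accept stale reads (lazy propagation) only when writes are rare.
    scheme = "lazy" if total_writes * 10 <= total_reads else "eager"
    return ReplicationDecision(replicate, item_id, locations, scheme)
```

For instance, decide("doc-42", AccessStats({"a": 10, "b": 30, "c": 5}, {"a": 2}), current=["a"], max_new=1) places one new, lazily updated replica at node "b". A realistic policy would also have to account for fluctuating storage availability, which relates to Problem (a).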

In the first phase of the project, we focused mainly on Problem (a), proposing solutions encapsulated in an environment modeling layer and in a probabilistically reliable communication layer. However, a significant amount of work is still required to complete this research and, in doing so, to conclude the two PhD theses begun in the first phase. More precisely, we plan to have one student further explore the impact of locality on performance and propose solutions to mitigate it, while the other student addresses Problem (b) by building on the layers developed to solve Problem (a). Our approach to achieving these goals is detailed in the rest of this project proposal.
Last update: 21.02.2013

Associated projects

Number Title Start Funding scheme
108191 Colloc: A Replication Engine for Peer-to-Peer Content Management in Large-Scale Distributed Systems 01.06.2005 Project funding

Abstract

This project is the direct follow-up to SNF project 200021-108191, entitled "Colloc: A Replication Engine for Peer-to-Peer Content Management in Large-Scale Distributed Systems", a joint research effort between the University of Lausanne (Unil) and the University of Lugano (USI). We refer to this first project as Colloc 1 in the rest of this proposal, and simply use the term Colloc when discussing the overall approach followed across both projects.

Colloc 1, which began in 2005 and is planned to end in 2008, was submitted as a 3-year project for two PhD students, but SNF funding was granted for only one. Nevertheless, Unil and USI agreed to share the funding and made up the shortfall in order to support two PhD students, Mouna Allani and Marija Stamenkovic, one at each institution. After about 2.5 years, Colloc 1 has gained momentum, with several papers accepted, submitted, or in preparation. The Colloc 2 project described hereafter will complete this research by capitalizing on the work already invested, and will allow the two students mentioned above to complete their PhD theses. The results obtained so far have also led us to partially redefine the initial project, which accounts for the 2 years we are requesting for the follow-up project described in this proposal.

Peer-to-Peer Content Management Systems (P2P-CMS) aim to support the collaborative work of scattered and fluctuating communities, and thus face many challenges. Researchers in bioinformatics, for instance, who share and collaboratively maintain information on DNA and amino acid sequences worldwide, form such a highly distributed yet highly focused community.
The challenges facing P2P-CMS include (1) finding the right level of consistency to achieve optimal performance, as the need for fast data access often outweighs the need for data consistency; (2) coping with the constantly changing origins and frequencies of read and write requests, as interest in specific pieces of content continuously shifts; and (3) managing the distributed storage of enormous volumes of data, as it is difficult, if not impossible, to store all the data in a single place. A promising approach is to tap idle computing power and storage, since most computers in large organizations are underused.

The objective of the Colloc 2 project, in line with that of Colloc 1, is to lay the foundations of peer-to-peer content management systems that meet the challenges sketched above. This requires a P2P-CMS to constantly adapt its answers to the following questions: "When to replicate?" (replication condition), "What to replicate?" (replica granularity), "Where to replicate?" (replica location), and "How to replicate?" (replication scheme). More specifically, a P2P-CMS must be able (a) to accommodate dynamic changes in resource availability as it scales up, and (b) to dynamically place replicas according to read/write access patterns. Solutions to these problems can then be seen as distinct layers in the P2P-CMS protocol stack.

In Colloc 1, we focused mainly on Problem (a), proposing solutions encapsulated in an environment modeling layer and in a probabilistically reliable communication layer. In particular, we proposed various probabilistic communication strategies that aim to achieve near-optimal reliability for a given quota of messages when only limited (local) information about the neighborhood is available.
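As an illustration of the reliability versus message-quota trade-off, here is a hedged Python sketch of one simple formulation, assumed purely for illustration and not the Colloc strategies themselves: a node knows (or estimates) only the success probability of each link to its direct neighbors, and must spend a quota of sends so as to maximize the expected number of neighbors that receive a message. Assuming independent losses, each additional send on a link brings a smaller gain than the previous one, so greedily spending every message on the largest marginal gain is optimal for this particular objective. The names allocate_quota and expected_reached are hypothetical.

```python
import heapq
from typing import Dict


def allocate_quota(link_success: Dict[str, float], quota: int) -> Dict[str, int]:
    """Spread `quota` sends over neighbor links to maximize the expected
    number of neighbors reached at least once.

    Assumes independent losses: with success probability p and r sends,
    a neighbor is reached with probability 1 - (1 - p) ** r. The gain of
    the (r+1)-th send, (1 - p) ** r * p, shrinks as r grows, so the greedy
    choice below is optimal for this separable concave objective.
    """
    sends = {n: 0 for n in link_success}
    # Max-heap via negated gains; the first send on a link gains p.
    heap = [(-p, n) for n, p in link_success.items() if p > 0]
    heapq.heapify(heap)
    for _ in range(quota):
        if not heap:
            break
        _, n = heapq.heappop(heap)
        sends[n] += 1
        p = link_success[n]
        next_gain = (1 - p) ** sends[n] * p
        if next_gain > 1e-12:  # stop considering links that are already certain
            heapq.heappush(heap, (-next_gain, n))
    return sends


def expected_reached(link_success: Dict[str, float], sends: Dict[str, int]) -> float:
    """Expected number of neighbors that receive at least one copy."""
    return sum(1 - (1 - link_success[n]) ** r for n, r in sends.items())
```

With links of success probability 0.9 and 0.5 and a quota of 3, the sketch sends once over the first link and twice over the second, for an expected 1.65 neighbors reached. Multi-hop dissemination, where a message is relayed beyond the direct neighborhood, would require a richer model than this one-hop objective.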
We also proposed various strategies for building up this limited knowledge of the neighborhood in a way that lets the system scale, i.e., that keeps memory and bandwidth usage under strict control. However, a significant amount of work is still required to complete this research and, in doing so, to conclude the two PhD theses begun in Colloc 1. More precisely, we plan to have one student further explore the impact of locality on performance and propose solutions to mitigate it, while the other student addresses Problem (b) by building on the layers developed to solve Problem (a). Our approach to achieving these goals is detailed in the rest of this project proposal.
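The idea of keeping neighborhood knowledge bounded can be sketched with a generic gossip-style partial view, loosely inspired by well-known peer-sampling protocols rather than by Colloc's own strategies. In this hypothetical Python sketch, PartialView, capacity and shuffle_len are illustrative names: memory is capped by capacity, and each exchange ships at most shuffle_len entries, so per-node costs stay constant as the system grows.

```python
import random
from typing import Dict, List, Optional, Tuple


class PartialView:
    """Fixed-capacity view of the neighborhood (hypothetical sketch).

    Memory is bounded by `capacity`; each gossip message carries at most
    `shuffle_len` entries (assumed >= 1), bounding per-round bandwidth.
    """

    def __init__(self, self_id: str, capacity: int, shuffle_len: int,
                 rng: Optional[random.Random] = None):
        self.self_id = self_id
        self.capacity = capacity
        self.shuffle_len = shuffle_len
        self.ages: Dict[str, int] = {}  # peer id -> rounds since last heard of
        self.rng = rng if rng is not None else random.Random()

    def tick(self) -> None:
        """Age every known entry by one gossip round."""
        for peer in self.ages:
            self.ages[peer] += 1

    def sample(self) -> List[Tuple[str, int]]:
        """Entries to send to a gossip partner, led by a fresh self entry."""
        k = min(self.shuffle_len - 1, len(self.ages))
        picked = self.rng.sample(sorted(self.ages), k)
        return [(self.self_id, 0)] + [(p, self.ages[p]) for p in picked]

    def merge(self, received: List[Tuple[str, int]]) -> None:
        """Keep the freshest age per peer, then evict the oldest beyond capacity."""
        for peer, age in received:
            if peer == self.self_id:
                continue
            if peer not in self.ages or age < self.ages[peer]:
                self.ages[peer] = age
        while len(self.ages) > self.capacity:
            oldest = max(self.ages, key=lambda p: (self.ages[p], p))
            del self.ages[oldest]
```

Evicting the oldest entries biases the view toward peers heard of recently, which is one simple way to follow a fluctuating population; other eviction rules, such as random or locality-aware ones, would fit the same structure.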