Project

INSTINCT: Improving Database Interactions in NoSQL Applications

English title: INSTINCT: Improving Database Interactions in NoSQL Applications
Applicant: Lanza Michele
Number: 190113
Funding scheme: Project funding (Div. I-III)
Research institution: Istituto del Software (SI), Facoltà di scienze informatiche
Institution of higher education: Università della Svizzera italiana - USI
Main discipline: Information Technology
Start/End: 01.09.2020 - 31.08.2024
Approved amount: 572'575.00

Keywords (5)

visual analytics; software maintenance; data engineering; reverse engineering; software evolution

Lay Summary (translated from German)

Lead
NoSQL databases are widely used today. A prominent feature is that such databases come without a schema, which on the one hand offers greater freedom but on the other hand makes them harder to maintain. The main goal of the project is to investigate how developers interact with such databases from the source code of a software system, and to improve these interactions.

Last update: 15.09.2020

Abstract

NoSQL technologies have co-existed with relational databases since the first Database Management Systems appeared in the 1960s. The term “NoSQL”, however, has only recently gained popularity, as modern Big Data and Web 2.0 technologies triggered the need for alternative database solutions. In particular, these solutions offer greater scalability, satisfy a widespread preference for free and open source software, support special query operations that are poorly supported in relational databases, and overcome the restrictiveness of strict schemas. So many database systems are now considered NoSQL databases that the term has been retroactively reinterpreted as “Not Only SQL”. Examples of NoSQL databases are document stores (e.g., MongoDB, CouchDB), graph databases (e.g., Neo4j), and column family databases (e.g., HBase, Cassandra). According to the DB-Engines ranking, at the time of writing our proposal, half of the Top-10 most popular database management systems are NoSQL technologies, and their popularity is increasing. Many of the world’s largest tech companies, such as LinkedIn, Amazon, and eBay, are known to use these technologies. The same holds for large metropolises, Swiss banks, and insurance companies.
Despite its clear benefits, NoSQL poses new and unique challenges for both developers and researchers. For example, a prominent feature of such databases is that they are “schema-less”, offering greater flexibility to handle data without the limitations of a strict data model. This freedom often backfires when it comes to maintaining an evolving data-intensive application. An advantage of relational databases is that they are an established technology, so many tools are readily available for maintenance tasks. This is not (yet) the case for NoSQL.
The main research goal of our project is to fill this gap by examining how developers interact with NoSQL databases from the application code and by developing techniques and tools to help developers improve these interactions. Database interactions play a crucial role in data-centric applications, as they determine how the system communicates with its database(s). When the application sends a query to its database, it is the database’s responsibility to handle the query as efficiently as it can, and the developer has very limited control over this. However, if the query is not well-formed or is not handled correctly in the program code, it generates extra load on the database side, which degrades the performance of the application. In the worst case, it can lead to errors, bugs, or even security vulnerabilities such as code injection. This is exactly what we target with our research. We address our goal from multiple directions: (i) we develop a method to identify the interaction points through which an application communicates with its underlying NoSQL database, and we extract/recover the dynamically generated NoSQL queries at these locations; (ii) we analyze the extracted/recovered queries to infer (non-unique) database schemas; (iii) we identify frequent/critical antipatterns that can lead to potential vulnerabilities, bugs, or performance issues; and (iv) we develop analytics solutions (e.g., visualization techniques) to ease maintenance and evolution tasks for NoSQL database applications. We aim to achieve scalable, fully automatic analysis of an application that interacts intensively with a NoSQL database, and to provide developers with different ways to improve the code of the application and to easily perform related maintenance tasks.
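As a rough illustration of direction (ii), the following minimal Python sketch shows one way a non-unique schema could be inferred from recovered query payloads or documents. It is not the project's method: the function name `infer_schema`, the dotted field-path notation, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict


def infer_schema(documents):
    """Map each dotted field path to the set of value type names observed.

    Hypothetical helper for illustration only; a field seen with several
    types is what makes the inferred schema "non-unique".
    """
    schema = defaultdict(set)

    def visit(value, path):
        if isinstance(value, dict):
            # Descend into nested documents, extending the dotted path.
            for key, child in value.items():
                visit(child, f"{path}.{key}" if path else key)
        else:
            # Leaf value: record the name of its Python type.
            schema[path].add(type(value).__name__)

    for doc in documents:
        visit(doc, "")
    return dict(schema)


# Hypothetical payloads, standing in for queries recovered from application code.
recovered = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": "unknown", "address": {"city": "Lugano"}},
]

for field, types in sorted(infer_schema(recovered).items()):
    print(field, sorted(types))
```

In this sketch the field `age` is reported with two types, which is the kind of inconsistency a schema-less store permits and a maintenance tool might want to surface.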