Data-driven Contemporary Code Review

Applicant Bacchelli Alberto
Number 170529
Funding scheme SNSF Professorships
Research institution Institut für Informatik Universität Zürich
Institution of higher education University of Zurich - ZH
Main discipline Other disciplines of Engineering Sciences
Start/End 01.08.2017 - 31.05.2023
Approved amount CHF 1'390'655.00

Keywords (7)

software analytics; software metrics; human-centered software engineering; peer code review; mining software repositories; empirical software engineering; code inspection

Lay Summary (translated from Italian)

Lead
Our society runs on software: in economic transactions, airport security, medical treatment, and many other contexts, we rely on software systems. A large share of the dependability and extensibility of these systems rests on the outcome of manual code review (peer code review). However, this manual review practice has no scientific foundation and depends heavily on the skills of the individual software engineers who act as reviewers. This project proposes to build the scientific basis to understand, improve, and make the best use of manual code review.
Lay summary

Subject and objective

Researchers have shown that code review has great potential to support software quality and dependability. However, my recent research has shown that this potential is not reached in contemporary practice and that the effectiveness of code review depends solely on the zeal of the reviewers and on their limited time. My goal is to investigate how to turn code review into a systematic, precise, and quantifiable process, so as to fully exploit its potential in developing dependable, high-quality software.

This project aims to create a new, empirically grounded scientific basis for code review and to investigate how to ease the reviewers' tasks through automation, by taking advantage of the software-based nature of this process.

Socio-scientific context

The long-term benefits of this project are a deeper scientific understanding of the software review process, greater review efficiency, and the development of higher-quality software, benefiting every aspect of our society that relies on software applications.

Last update: 07.07.2017

Publications

The indolent lambdification of Java: Understanding the support for lambda expressions in the Java ecosystem
Petrulio Fernando, Sawant Anand Ashok, Bacchelli Alberto (2021), The indolent lambdification of Java: Understanding the support for lambda expressions in the Java ecosystem, in Empirical Software Engineering, 26(6), 134-134.
Why Don’t Developers Detect Improper Input Validation? '; DROP TABLE Papers; --
Braz Larissa, Fregnan Enrico, Çalikli Gül, Bacchelli Alberto (2021), Why Don’t Developers Detect Improper Input Validation? '; DROP TABLE Papers; --, in 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, ES, IEEE, Washington, DC, United States.
Do Explicit Review Strategies Improve Code Review Performance?
Gonçalves Pavlína Wurzel, Fregnan Enrico, Baum Tobias, Schneider Kurt, Bacchelli Alberto (2020), Do Explicit Review Strategies Improve Code Review Performance?, in MSR '20: 17th International Conference on Mining Software Repositories, Seoul, Republic of Korea, ACM, New York, NY, United States.
Investigating Severity Thresholds for Test Smells
Spadini Davide, Schvarcbacher Martin, Oprescu Ana-Maria, Bruntink Magiel, Bacchelli Alberto (2020), Investigating Severity Thresholds for Test Smells, in MSR '20: 17th International Conference on Mining Software Repositories, Seoul, Republic of Korea, ACM, New York, NY, United States.
Primers or reminders? The effects of existing review comments on code review
Spadini Davide, Çalikli Gül, Bacchelli Alberto (2020), Primers or reminders? The effects of existing review comments on code review, in ICSE '20: 42nd International Conference on Software Engineering, Seoul, South Korea, ACM, New York, NY, United States.
UI Dark Patterns and Where to Find Them: A Study on Mobile Applications and User Perception
Di Geronimo Linda, Braz Larissa, Fregnan Enrico, Palomba Fabio, Bacchelli Alberto (2020), UI Dark Patterns and Where to Find Them: A Study on Mobile Applications and User Perception, in CHI '20: CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, Association for Computing Machinery, New York, NY, USA.
On the performance of method-level bug prediction: A negative result
Pascarella Luca, Palomba Fabio, Bacchelli Alberto (2020), On the performance of method-level bug prediction: A negative result, in Journal of Systems and Software, 161, 110493-110493.
Understanding Flaky Tests: The Developer's Perspective
Eck Moritz, Palomba Fabio, Castelluccio Marco, Bacchelli Alberto (2019), Understanding Flaky Tests: The Developer's Perspective, in 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Tallinn, Estonia, 830-840, ACM, New York, NY, USA.
Classifying code comments in Java software systems
Pascarella Luca, Bruntink Magiel, Bacchelli Alberto (2019), Classifying code comments in Java software systems, in Empirical Software Engineering, 24(3), 1499-1537.
A Survey on Software Engineering Coupling Relations and Tools
Fregnan Enrico, Baum Tobias, Palomba Fabio, Bacchelli Alberto (2019), A Survey on Software Engineering Coupling Relations and Tools, in Information and Software Technology, 107, 159-178.
A Large-Scale Empirical Exploration on Refactoring Activities in Open Source Software Projects
Vassallo Carmine, Grano Giovanni, Palomba Fabio, Gall Harald, Bacchelli Alberto (2019), A Large-Scale Empirical Exploration on Refactoring Activities in Open Source Software Projects, in Science of Computer Programming, 180, 1-15.
Associating Working Memory Capacity and Code Change Ordering with Code Review Performance
Baum Tobias, Schneider Kurt, Bacchelli Alberto (2019), Associating Working Memory Capacity and Code Change Ordering with Code Review Performance, in Empirical Software Engineering, 24(4), 1762-1798.
Characterizing Women (Not) Contributing To Open-Source
Wurzelova Pavlina, Palomba Fabio, Bacchelli Alberto (2019), Characterizing Women (Not) Contributing To Open-Source, in 2nd Workshop on Gender Equality in Software Engineering, 5-8, IEEE Press, Piscataway, NJ, USA.
Does reviewer recommendation help developers?
Kovalenko Vladimir, Tintarev Nava, Pasynkov Evgeny, Bird Christian, Bacchelli Alberto (2019), Does reviewer recommendation help developers?, in IEEE Transactions on Software Engineering.
Fine-Grained Just-In-Time Defect Prediction
Pascarella Luca, Palomba Fabio, Bacchelli Alberto (2019), Fine-Grained Just-In-Time Defect Prediction, in Journal of Systems and Software, 150, 22-36.
Mock objects for testing Java systems: Why and how developers use them, and how they evolve
Spadini Davide, Aniche Mauricio, Bruntink Magiel, Bacchelli Alberto (2019), Mock objects for testing Java systems: Why and how developers use them, and how they evolve, in Empirical Software Engineering, 24(3), 1461-1498.
On the Effectiveness of Manual and Automatic Unit Test Generation: Ten Years Later
Serra Domenico, Grano Giovanni, Palomba Fabio, Ferrucci Filomena, Gall Harald C., Bacchelli Alberto (2019), On the Effectiveness of Manual and Automatic Unit Test Generation: Ten Years Later, in 16th International Conference on Mining Software Repositories, 121-125, IEEE Press, Piscataway, NJ, USA.
PathMiner: A Library for Mining of Path-Based Representations of Code
Kovalenko Vladimir, Bogomolov Egor, Bryksin Timofey, Bacchelli Alberto (2019), PathMiner: A Library for Mining of Path-Based Representations of Code, in 16th International Conference on Mining Software Repositories, 13-17, IEEE Press, Piscataway, NJ, USA.
Test-Driven Code Review: An Empirical Study
Spadini Davide, Palomba Fabio, Baum Tobias, Hanenberg Stefan, Bruntink Magiel, Bacchelli Alberto (2019), Test-Driven Code Review: An Empirical Study, in 41st ACM/IEEE International Conference on Software Engineering, 1061-1072, IEEE Press, Piscataway, NJ, USA.
To react, or not to react: Patterns of reaction to API deprecation
Sawant Anand, Robbes Romain, Bacchelli Alberto (2019), To react, or not to react: Patterns of reaction to API deprecation, in Empirical Software Engineering.
When Code Completion Fails: A Case Study On Real-World Completions
Hellendoorn Vincent, Proksch Sebastian, Gall Harald C., Bacchelli Alberto (2019), When Code Completion Fails: A Case Study On Real-World Completions, in 41st ACM/IEEE International Conference on Software Engineering, 960-970, IEEE Press, Piscataway, NJ, USA.
What Makes A Code Change Easier To Review? An Empirical Investigation On Code Change Reviewability
Ram Achyudh, Sawant Anand, Castelluccio Marco, Bacchelli Alberto (2018), What Makes A Code Change Easier To Review? An Empirical Investigation On Code Change Reviewability, in 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ACM, New York, NY, USA.
Information Needs in Contemporary Code Review
Pascarella Luca, Spadini Davide, Palomba Fabio, Bruntink Magiel, Bacchelli Alberto (2018), Information Needs in Contemporary Code Review, in 21st ACM conference on Computer-Supported Cooperative Work and Social Computing, 2(CSCW), 135:1-135:27.
PyDriller: Python framework for mining software repositories
Spadini Davide, Aniche Maurício, Bacchelli Alberto (2018), PyDriller: Python framework for mining software repositories, in ESEC/FSE '18: 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Lake Buena Vista, FL, USA, ACM, New York, NY, USA.
On the Relation of Test Smells to Software Code Quality
Spadini Davide, Palomba Fabio, Zaidman Andy, Bruntink Magiel, Bacchelli Alberto (2018), On the Relation of Test Smells to Software Code Quality, in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), Madrid, IEEE, Washington, DC, United States.
Why are Features Deprecated? An Investigation Into the Motivation Behind Deprecation
Sawant Anand Ashok, Huang Guangzhe, Vilen Gabriel, Stojkovski Stefan, Bacchelli Alberto (2018), Why are Features Deprecated? An Investigation Into the Motivation Behind Deprecation, in 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), Madrid, IEEE, Washington, DC, United States.
A graph-based dataset of commit history of real-world Android apps
Geiger Franz-Xaver, Malavolta Ivano, Pascarella Luca, Palomba Fabio, Di Nucci Dario, Bacchelli Alberto (2018), A graph-based dataset of commit history of real-world Android apps, in the 15th International Conference, Gothenburg, Sweden, ACM, New York, NY, USA.
Code review for newcomers: is it different?
Kovalenko Vladimir, Bacchelli Alberto (2018), Code review for newcomers: is it different?, in the 11th International Workshop, Gothenburg, Sweden, ACM, New York, NY, USA.
How is video game development different from software development in open source?
Pascarella Luca, Palomba Fabio, Di Penta Massimiliano, Bacchelli Alberto (2018), How is video game development different from software development in open source?, in the 15th International Conference, Gothenburg, Sweden, ACM, New York, NY, USA.
Modern code review: a case study at Google
Sadowski Caitlin, Söderberg Emma, Church Luke, Sipko Michal, Bacchelli Alberto (2018), Modern code review: a case study at Google, in the 40th International Conference, Gothenburg, Sweden, ACM, New York, NY, USA.
Understanding developers' needs on deprecation as a language feature
Sawant Anand, Aniche Maurício, van Deursen Arie, Bacchelli Alberto (2018), Understanding developers' needs on deprecation as a language feature, in 40th ACM/IEEE International Conference on Software Engineering, ACM, New York, NY, USA.
Investigating type declaration mismatches in Python
Pascarella Luca, Ram Achyudh, Nadeem Azqa, Bisesser Dinesh, Knyazev Norman, Bacchelli Alberto (2018), Investigating type declaration mismatches in Python, in 2018 IEEE Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE), Campobasso, IEEE, Piscataway, NJ, USA.
Re-evaluating method-level bug prediction
Pascarella Luca, Palomba Fabio, Bacchelli Alberto (2018), Re-evaluating method-level bug prediction, in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), Campobasso, IEEE, Piscataway, NJ, USA.
Continuous Code Quality: Are We (Really) Doing That?
Vassallo Carmine, Palomba Fabio, Bacchelli Alberto, Gall Harald C. (2018), Continuous Code Quality: Are We (Really) Doing That?, in 33rd IEEE/ACM International Conference on Automated Software Engineering, New Ideas Track, 790-795, ACM, New York, NY, USA.
Mining file histories: should we consider branches?
Kovalenko Vladimir, Palomba Fabio, Bacchelli Alberto (2018), Mining file histories: should we consider branches?, in the 33rd ACM/IEEE International Conference, Montpellier, France, ACM, New York, NY, USA.
Self-reported activities of Android developers
Pascarella Luca, Geiger Franz-Xaver, Palomba Fabio, Di Nucci Dario, Malavolta Ivano, Bacchelli Alberto (2018), Self-reported activities of Android developers, in the 5th International Conference, Gothenburg, Sweden, ACM, New York, NY, USA.
When testing meets code review: why and how developers review tests
Spadini Davide, Aniche Maurício, Storey Margaret-Anne, Bruntink Magiel, Bacchelli Alberto (2018), When testing meets code review: why and how developers review tests, in the 40th International Conference, Gothenburg, Sweden, ACM, New York, NY, USA.
On the Optimal Order of Reading Source Code Changes for Review
Baum Tobias, Schneider Kurt, Bacchelli Alberto (2017), On the Optimal Order of Reading Source Code Changes for Review, in 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), Shanghai, IEEE, Piscataway, NJ, USA.

Datasets

Dataset of "Test-Driven Code Review: An Empirical Study"

Author Spadini, Davide; Palomba, Fabio; Baum, Tobias; Hanenberg, Stefan; Bruntink, Magiel; Bacchelli, Alberto
Publication date 30.01.2019
Persistent Identifier (PID) https://doi.org/10.5281/zenodo.2553139
Repository Zenodo
Abstract
The content of this replication package is the following:
1 - Interview TDR: script of the interview
2 - regression.R: R script to run the statistical model
3 - studied-project: an anonymized snapshot of the system the experiment is based on
4 - ./extracted-data: a CSV file with all the participants' review details
5 - Patches: the 2 patches of the experiment. Each file contains the version before and after the change. To display them as a git diff, they can be pasted into the Mergely website.
6 - raw-logs: all the logs collected during the experiment
7 - TestDrivenReviewAnalysis: the tool used to analyse the results
8 - TestDrivenReviewExperimentUI: the tool used to run the experiment

Benchmark Data and Model Description for "When code completion fails: a case study on real-world completions"

Author Hellendoorn, Vincent; Proksch, Sebastian; Gall, Harald; Bacchelli, Alberto
Publication date 14.03.2019
Persistent Identifier (PID) https://doi.org/10.5281/zenodo.2562249
Repository Zenodo
Abstract
This release includes all the processed and compressed benchmark data, as well as a description of (most of) the models used. It can be used for evaluating other code completion tools and comparing their results against those reported in our paper, as well as against those obtained on artificial data. This release only contains descriptions for running the models (we mainly used off-the-shelf implementations); in a subsequent release, we will publish replication scripts for running the models as well.

Continuous Code Quality: Are We (Really) Doing That? Online Appendix

Author Vassallo, Carmine; Palomba, Fabio; Bacchelli, Alberto; Gall, Harald
Publication date 24.07.2018
Persistent Identifier (PID) http://doi.org/10.5281/zenodo.1341015
Repository Zenodo
Abstract
Online appendix for paper "Continuous Code Quality: Are We (Really) Doing That?" by Carmine Vassallo, Fabio Palomba, Alberto Bacchelli, Harald C. Gall. The paper will appear in the proceedings of the 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), Montpellier, France, 2018.

Dataset for "What Makes A Code Change Easier To Review: An Empirical Investigation On Code Change Reviewability"

Author Ram, Achyudh; Sawant, Anand; Castelluccio, Marco; Bacchelli, Alberto
Publication date 04.11.2018
Persistent Identifier (PID) https://doi.org/10.5281/zenodo.1323659
Repository Zenodo
Abstract
This is the data and material for the paper: "What Makes A Code Change Easier To Review: An Empirical Investigation On Code Change Reviewability." The paper has been accepted for inclusion in the proceedings of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2018).

Data and Materials for "Why Don’t Developers Detect Improper Input Validation?'; DROP TABLE Papers; --"

Author Braz, Larissa; Fregnan, Enrico; Çalikli, Gül; Bacchelli, Alberto
Publication date 15.01.2021
Persistent Identifier (PID) https://doi.org/10.5281/zenodo.3996696
Repository Zenodo
Abstract
Artifacts package of the accepted ICSE 2021 paper: "Why Don’t Developers Detect Improper Input Validation? '; DROP TABLE Papers; --".

Dataset of "Primers or Reminders? The Effects of Existing Review Comments on Code Review"

Author Spadini, Davide; Çalikli, Gül; Bacchelli, Alberto
Publication date 20.01.2020
Persistent Identifier (PID) 10.5281/zenodo.3653856
Repository Zenodo
Abstract
Dataset of "Primers or Reminders? The Effects of Existing Review Comments on Code Review".

Collaboration

Group / person Country
Types of collaboration
RMoD Team - INRIA Lille France (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Exchange of personnel
Software Improvement Group Netherlands (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Industry/business/other use-inspired collaboration
Fachgebiet Software Engineering Germany (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
Google United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Industry/business/other use-inspired collaboration
JetBrains Research Netherlands (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Industry/business/other use-inspired collaboration
Software Engineering Research Group - Delft University of Technology Netherlands (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Exchange of personnel
DECAL Lab - University of California, Davis United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Exchange of personnel
CHISEL Group - University of Victoria, Canada Canada (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Exchange of personnel
Mozilla Foundation Great Britain and Northern Ireland (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Industry/business/other use-inspired collaboration
Microsoft Research, Redmond United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Industry/business/other use-inspired collaboration
s.e.a.l. - University of Zurich Switzerland (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication

Scientific events

Active participation

17th International Conference on Mining Software Repositories
- Talk given at a conference: "Do Explicit Review Strategies Improve Code Review Performance?"
- 29.06.2020, Seoul (online), Korean Republic (South Korea)
- Persons involved: Fregnan Enrico; Wurzelova Pavlina
CHI '20: CHI Conference on Human Factors in Computing Systems
- Talk given at a conference: "UI Dark Patterns and Where to Find Them: A Study on Mobile Applications and User Perception"
- 25.04.2020, Honolulu (online), United States of America
- Persons involved: Di Geronimo Linda
ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
- Talk given at a conference: "Understanding Flaky Tests: The Developer's Perspective"
- 26.08.2019, Tallinn, Estonia
- Persons involved: Palomba Fabio
IEEE/ACM 2nd International Workshop on Gender Equality in Software Engineering (GE)
- Talk given at a conference: "Characterizing Women (Not) Contributing To Open-Source"
- 27.05.2019, Montreal, Canada
- Persons involved: Wurzelova Pavlina


Knowledge transfer events

Active participation

Contemporary Code Review Practices
- Talk
- 22.07.2019, Shenzhen, China
- Persons involved: Bacchelli Alberto
What Do Code Reviews at Microsoft and in OSS Have in Common?
- Talk
- 16.05.2018, Zurich, Switzerland
- Persons involved: Bacchelli Alberto


Awards

Title Year
ACM SIGSOFT Distinguished Paper Award 2021
ACM SIGSOFT Distinguished Artifact Award 2020
Ric Holt Early Career Achievement Award 2020
Best Paper Honorable Mention (CSCW 2018): “Information Needs in Contemporary Code Review” 2018

Associated projects

197227 Enhanced Code Review: Using Context and Learning from Review Experience
- Start: 01.02.2021
- Funding scheme: Project funding (Div. I-III)

Abstract

A significant portion of the dependability and maintainability of today's software systems is based on the outcome of software peer code review, i.e., a manual inspection of source code by developers other than the author.
The academic software engineering community conducted seminal work in the 1980s and 1990s on code inspection, a formal type of code review, and identified inspections as a key element to ensure software quality and reliability. Due to their rigidity and low efficiency, however, code inspections progressively lost ground to informal, asynchronous, tool-based, and change-driven code review practices, which are nowadays widespread. Recent studies, including our work, provided evidence that contemporary code review practices, despite their recognized potential, do not conform to the theories previously formed on code inspections, as they create a new context and new challenges. The academic community has highlighted the need to develop both an original body of fundamental scientific knowledge on contemporary code review, as opposed to code inspections, and novel techniques to support the practitioners who use this process.
To fulfill this need, this project aims to develop an original, empirically based body of fundamental scientific knowledge on contemporary code review and to investigate how to support and facilitate the review task through automation, by taking advantage of the tool-based nature of this process. The expected long-term benefits of this research are a deeper scientific understanding of contemporary code review, improved review effectiveness, and better-quality software.
To this end, I propose to investigate two angles: (1) the Laws of Contemporary Code Review, which will describe the review process and its effects, and (2) a Theory of Data-driven Support for Contemporary Code Review, which will investigate how data can be used to support the review process.
Our investigation will benefit from the idea of augmenting the software tools used during code review and software development with mechanisms that systematically collect high-dimensional data at high frequency. Unlike current research, which relies on data collected sparsely and unsystematically (e.g., code changes manually saved by developers), this approach yields data that differ both quantitatively and qualitatively. Together with the proposed in-depth investigation approach, these data will allow us to obtain, for the first time, a rigorous and comprehensive understanding of contemporary code review and to discover and use new properties and behaviors.
The scientific challenges of the proposed work include devising techniques to collect a high-volume, high-dimensional knowledge base of software development and review; modeling the review process and its outcome; filtering and exploiting high-volume information relevant to review; creating conceptual models for the hypotheses to test; overcoming the low recruitability of participants in controlled experiments; devising experimental prototypes of data-based support for code review for in vivo evaluation; and assessing causal effects in an in vivo setting.