
Makar: A Framework for Multi-source Studies based on Unstructured Data

Type of publication Peer-reviewed
Publication form Proceedings (peer-reviewed)
Authors Birrer Mathias, Rani Pooja, Panichella Sebastiano, Nierstrasz Oscar
Project Agile Software Assistance


Page(s) 577 - 581
Title of proceedings 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
DOI 10.1109/saner50967.2021.00069

Open Access

Type of Open Access Repository (Green Open Access)


To perform development and maintenance tasks, developers frequently seek information from sources such as mailing lists, Stack Overflow (SO), and Quora. Researchers analyze these sources to understand developers' information needs in these tasks. However, extracting and preprocessing unstructured data from multiple sources, and building and maintaining a reusable dataset from it, is often a time-consuming and iterative process. Additionally, the lack of tools that automate this data analysis process makes it harder to reproduce previous results or datasets. To address these concerns, we propose Makar, which provides various data extraction and preprocessing methods to support researchers in conducting reproducible multi-source studies. To evaluate Makar, we conduct a case study that analyzes code-comment-related discussions from SO, Quora, and mailing lists. Our results show that Makar helps researchers prepare reproducible datasets from multiple sources with little effort, and identify the data relevant to specific research questions in less time than state-of-the-art tools, which is of critical importance for studies based on unstructured data. Tool webpage:
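To make the kind of multi-source dataset preparation described in the abstract more concrete, here is a minimal Python sketch of a generic pipeline. It normalizes posts from several sources into one record type, strips HTML tags and extra whitespace, keeps only posts that mention a keyword (here "comment", echoing the case study), and computes a fingerprint so that a rerun can be checked for reproducibility. This is not Makar's implementation or API. The `Post` schema, the `preprocess`, `build_dataset`, and `fingerprint` functions, and the sample posts are hypothetical. The keyword filter is also only an assumed stand-in for whatever relevance selection a real study would use.

```python
import hashlib
import html
import json
import re
from dataclasses import asdict, dataclass
from typing import Iterable, List, Sequence, Tuple


# Hypothetical common record schema (an assumption, not Makar's data model).
@dataclass(frozen=True)
class Post:
    source: str   # e.g. "stackoverflow", "quora", "mailing_list"
    post_id: str
    text: str


TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def preprocess(raw: str) -> str:
    """Strip HTML tags, then unescape entities, collapse whitespace, and lowercase."""
    text = TAG_RE.sub(" ", raw)   # remove tags first so escaped "&lt;" is not treated as a tag
    text = html.unescape(text)
    return WS_RE.sub(" ", text).strip().lower()


def build_dataset(raw_posts: Iterable[Tuple[str, str, str]],
                  keywords: Sequence[str]) -> List[Post]:
    """Normalize (source, id, raw_text) tuples and keep posts mentioning any keyword."""
    records = []
    for source, post_id, raw in raw_posts:
        text = preprocess(raw)
        if any(k in text for k in keywords):
            records.append(Post(source, post_id, text))
    # Deterministic order: the same input always yields the same dataset.
    records.sort(key=lambda p: (p.source, p.post_id))
    return records


def fingerprint(records: List[Post]) -> str:
    """SHA-256 over a canonical JSON dump, used to compare dataset versions."""
    payload = json.dumps([asdict(r) for r in records], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    raw = [
        ("stackoverflow", "42", "<p>How should I write a <b>comment</b> for this method?</p>"),
        ("quora", "7", "What is the best IDE?"),
        ("mailing_list", "m1", "Javadoc   comments &amp; API docs"),
    ]
    data = build_dataset(raw, keywords=("comment",))
    for post in data:
        print(post.source, post.post_id, post.text)
    print(fingerprint(data))
```

Sorting before hashing is the design choice that matters most in this sketch. Without a deterministic order, two runs over the same sources could produce the same records but different fingerprints, which would defeat the reproducibility check.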