Publication
Makar: A Framework for Multi-source Studies based on Unstructured Data
Type of publication
Peer-reviewed
Form of publication
Proceedings (peer-reviewed)
Author
Birrer Mathias, Rani Pooja, Panichella Sebastiano, Nierstrasz Oscar
Project
Agile Software Assistance
Page(s)
577 - 581
Title of proceedings
2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
DOI
10.1109/saner50967.2021.00069
Open Access
URL
http://scg.unibe.ch/archive/papers/Rani21c.pdf
Type of Open Access
Repository (Green Open Access)
Abstract
To perform various development and maintenance tasks, developers frequently seek information in various sources such as mailing lists, Stack Overflow (SO), and Quora. Researchers analyze these sources to understand developers' information needs in these tasks. However, extracting and preprocessing unstructured data from various sources, and building and maintaining a reusable dataset, is often a time-consuming and iterative process. Additionally, the lack of tools for automating this data analysis process makes it harder to reproduce previous results or datasets. To address these concerns, we propose Makar, which provides various data extraction and preprocessing methods to support researchers in conducting reproducible multi-source studies. To evaluate Makar, we conduct a case study that analyzes code-comment-related discussions from SO, Quora, and mailing lists. Our results show that Makar helps prepare reproducible datasets from multiple sources with little effort, and helps identify the data relevant to specific research questions in less time than state-of-the-art tools, which is of critical importance for studies based on unstructured data. Tool webpage: https://github.com/maethub/makar
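The abstract describes Makar only at a high level. As a rough illustration of what a reproducible multi-source preprocessing step can look like, the Python sketch below represents records from several sources in one common schema, applies an ordered list of named cleaning steps, keeps records that mention a keyword, and hashes the pipeline description so the same dataset can be regenerated later. This is not Makar's API or implementation: `Record`, `run_pipeline`, `pipeline_fingerprint`, the step names, and the sample records are all hypothetical.

```python
import hashlib
import html
import json
import re
from dataclasses import dataclass


# Hypothetical common record schema; Makar's actual data model may differ.
@dataclass(frozen=True)
class Record:
    source: str  # e.g. "stackoverflow", "quora", "mailing_list"
    doc_id: str
    text: str


def strip_html(text: str) -> str:
    """Replace HTML tags with spaces and unescape entities (crude regex cleanup)."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def lowercase(text: str) -> str:
    return text.lower()


# Registry of named steps, so a pipeline can be described as plain data.
STEPS = {
    "strip_html": strip_html,
    "normalize_whitespace": normalize_whitespace,
    "lowercase": lowercase,
}


def run_pipeline(records, step_names, keywords):
    """Apply the named steps in order, then keep records containing any keyword.

    Keyword matching is a plain substring test on the processed text, so it is
    case-sensitive unless the "lowercase" step is part of the pipeline.
    """
    kept = []
    for rec in records:
        text = rec.text
        for name in step_names:
            text = STEPS[name](text)
        if any(k in text for k in keywords):
            kept.append(Record(rec.source, rec.doc_id, text))
    return kept


def pipeline_fingerprint(step_names, keywords) -> str:
    """Hash a canonical JSON description of the pipeline for later re-runs."""
    spec = json.dumps(
        {"steps": list(step_names), "keywords": sorted(keywords)}, sort_keys=True
    )
    return hashlib.sha256(spec.encode("utf-8")).hexdigest()[:12]


# Usage with made-up sample records from three source types.
raw = [
    Record("stackoverflow", "so-1", "<p>How should I write a <b>code comment</b>?</p>"),
    Record("quora", "q-7", "Best IDE for Java?"),
    Record("mailing_list", "ml-3", "Re: Javadoc   comment style"),
]
steps = ["strip_html", "normalize_whitespace", "lowercase"]
kept = run_pipeline(raw, steps, keywords=["comment"])
print([r.doc_id for r in kept], pipeline_fingerprint(steps, ["comment"]))
```

Describing the pipeline as a list of step names rather than ad hoc code is one way to make a dataset reproducible: storing the step list, the keywords, and the fingerprint next to the output lets someone else re-run exactly the same preprocessing.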