Borders and boundaries in Bosnian, Croatian, Montenegrin and Serbian: Twitter data to the rescue

Type of publication Peer-reviewed
Publication form Original article (peer-reviewed)
Author Ljubešić Nikola, Miličević Petrović Maja, Samardžić Tanja
Project Regional Linguistic Data Initiative

Journal Journal of Linguistic Geography
Volume (Issue) 6(2)
Page(s) 100 - 124
DOI 10.1017/jlg.2018.9


In this paper we examine the spatial distribution of 16 linguistic features known to vary between Bosnian, Croatian, Montenegrin, and Serbian. Our analyses use a dataset of geo-encoded Twitter status messages collected from mid-2013 to the end of 2016. We perform two types of analysis. The first finds boundaries in the spatial distribution of the levels of each linguistic variable by smoothing with kernel density estimation; these boundaries are then plotted over the state borders for visual comparison. The second measures linguistic distance between the states: groupings of linguistic variables and countries are calculated from the state borders and the Jensen-Shannon divergence between the distributions of the 16 variables within each state. This analysis is complemented by a measure of variable consistency for each country. Together, the analyses are intended to show the extent to which current state borders correspond to linguistic boundaries. They suggest that Croatia and Serbia still represent the two extremes, reflecting a history of normative divergence, while Bosnia-Herzegovina and Montenegro lean toward one side or the other depending on the variable.
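The Jensen-Shannon divergence mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the state labels, the counts, and the helper names `normalize`, `js_divergence`, and `pairwise` are all hypothetical, and the sketch covers a single two-level variable only. The abstract does not say how the 16 variables are combined, so no aggregation is attempted here. Base-2 logarithms are assumed, which bounds the divergence between 0 and 1.

```python
import math
from itertools import combinations


def normalize(counts):
    """Turn raw counts of variable levels into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (base-2 logs)."""
    # Mixture distribution: the midpoint of p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pairwise(dists):
    """Divergence for every unordered pair of states."""
    return {
        (s, t): js_divergence(dists[s], dists[t])
        for s, t in combinations(sorted(dists), 2)
    }


# Made-up counts of the two levels of one variable in three unnamed states.
counts = {"state_a": [90, 10], "state_b": [15, 85], "state_c": [60, 40]}
dists = {state: normalize(c) for state, c in counts.items()}

for pair, value in pairwise(dists).items():
    print(pair, round(value, 3))
```

A divergence of 0 means two states use the levels of the variable in identical proportions, and 1 means they never share a level. A matrix of such pairwise values is the kind of input a grouping or clustering step could take.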