
Modeling Meaning in Text Data for the Social Sciences

English title Modeling Meaning in Text Data for the Social Sciences
Applicant Edelmann Achim
Number 197533
Funding scheme Project funding (Div. I-III)
Research institution Institut für Soziologie Universität Bern
Institution of higher education University of Berne - BE
Main discipline Sociology
Start/End 01.09.2021 - 31.08.2024
Approved amount 634'779.00

Keywords (5)

computational social science; sociology of culture; sociology of knowledge; text analysis; network analysis

Lay Summary (translated from German)

Modern data sources, especially large text databases, together with statistical methods for analyzing them, open up new opportunities for scientific research. In the social sciences, however, these opportunities have not yet been fully exploited. What is lacking in particular are approaches adapted to discipline-specific questions and data, for example for studying the meaning of concepts in texts.
Lay summary
This project develops methodological approaches for studying meaning in texts for the social sciences. It addresses two challenges. The first is the meaning of individual concepts and how that meaning changes. Scientific concepts such as "causality" have been understood differently over time. Databases today provide systematic access to all scientific publications of the last 100 years. This part of the project develops automated text-processing methods that use such data to trace and study how individual concepts change. The second challenge concerns the study of meaning in interviews. Meanings are understood as meaningful reference structures that are reflected at the semantic, linguistic, and narrative levels. The second part of the project develops computational methods for capturing such structures. In various empirical applications, the project investigates questions of history, politics, and culture. Through the analytical approaches it develops and their application, the project ultimately opens up numerous opportunities to make new text data usable for answering social science questions.
Last update: 29.04.2021

New forms of data and advanced statistical techniques provide enormous research opportunities, yet the social sciences fall short of taking full advantage of them. A major obstacle is the lack of analytical approaches that are appropriate to both the questions and the data pertinent to the social sciences. This especially concerns the analysis of meaning in text, understood as shared reference structures that are reflected at the semantic, linguistic, and narrative levels. In the two subprojects described below, this project will develop methodological solutions to two core challenges in the analysis of text for the social sciences: (A) analyzing how conceptual meaning changes within time-stamped documents and (B) capturing distinct forms of meaning in interviews. Empirical applications of these solutions will explore important questions of history, politics, and culture and provide methodological guidance for the use of computational techniques and new forms of data more generally. Beyond immediate results, this project will lay the foundation and open new opportunities for leveraging modern forms of data and tools for answering questions in the social sciences.

In subproject A, we will develop machine learning methods tailored to the task of tracing the changing meaning and usage of concepts in large corpora of time-stamped documents. Although this is probably the most common format in which massive quantities of textual data are released now and will be released in the near future (e.g. news feeds, media posts, publications, archival data), we still lack tools to analyze the changing meaning of particular concepts and ideas in such data. We will, therefore, develop a novel framework to trace the meaning of concepts through time. We will apply this framework to (i) a corpus of 1,300,000 academic articles to understand and compare changes in how “causality” has been understood in Philosophy, Economics, and Sociology since the 19th century and to (ii) about 400,000 news reports to provide a data-driven description of the portrayal of “migrants” in the German public media leading up to the Syrian refugee crisis in 2015/16.

In subproject B, we will develop computational solutions to capture and analyze distinct forms of meaning in interview transcripts. Researcher-led interviews have long served as a fundamental source of high-quality data for answering social science questions. Many of these interviews, as well as comparable digital conversations on new media, are readily available for computational analysis. Despite the enormous potential of this data, we still lack approaches that take advantage of the structures that occur “naturally” in interviews, such as narrative flow and turn-taking, to account for the distinct forms of meaning in them. We will build on advances in natural language processing and network analysis to develop techniques that are precisely tailored to capturing different aspects of meaning in both structured and unstructured interviews. We will apply these techniques to (i) unstructured interviews with renowned US chefs to explore different levels of what it “means” to make a career as an elite chef and to (ii) structured interviews from the National Study of Youth and Religion to capture adolescents’ understandings of morality in relation to their socio-economic background. We will provide a detailed description of the methodological challenges and step-by-step documentation of the respective solutions.

Both subprojects will combine new forms of data from various sources and apply hitherto untested algorithms. Providing methodological guidance and best practices for combining and using data from various sources and in various formats has been identified as one of the most pressing needs of the newly emerging field of Computational Social Science. In both subprojects, we will, therefore, detail the related challenges, analytic reflections, and solutions. As contributions to textbooks, these accounts will help improve how students are taught to use computational tools to collect, combine, and utilize both classic and modern, large-scale text corpora for answering social science questions.

This project will advance computational solutions to leverage text as data for the analysis of meaning in the social sciences. In doing so, it will explore important questions of history, politics, and culture and provide methodological guidance for scholars in Computational Social Science.