Media Monitoring of the Past

Applicant Kaplan Frédéric
Number 173719
Funding scheme Sinergia
Research institution Laboratoire d'humanités digitales EPFL CDH CDH-CH DHLAB
Institution of higher education EPF Lausanne - EPFL
Main discipline Interdisciplinary
Start/End 01.09.2017 - 31.12.2020
Approved amount 1'770'709.00

All Disciplines (4)

Information Technology
Applied linguistics
General history (without pre- and early history)

Keywords (15)

visualization; mutual learning; multilingual natural language processing; Switzerland; information extraction; access to cultural heritage documents; interface design; computational linguistics; digital history; historical methodology; historical newspapers; digital literacy; diachronic lexical processing; named entity processing; co-design

Lay Summary (translated from French)

Historical newspapers are mirrors of our past societies. Published regularly for centuries, they recorded wars as well as minor news items, reported on international, national and local matters, and documented day-to-day life. In a word, they followed both the great and the small history. Press archives also reflect the political, economic and moral environments in which they were produced. They therefore carry multifaceted, dense information that is continuous over time, which can help us understand how contemporaries experienced their present.

Content and objectives of the research

The project “Media Monitoring of the Past” will pursue three objectives. First, it will develop and systematically evaluate several natural language processing components for extracting information from historical texts. Second, it will build visualization interfaces for the seamless exploration of large quantities of historical data. Finally, it will actively and continuously evaluate the resulting tools on a historical case study, resistance to Europe. This evaluation will be complemented by a reflection on the use of digital tools in the historical sciences that considers methodological, epistemological and pedagogical aspects.

Scientific and social context of the research project

Newspaper archives have been kept on library and archive shelves for decades and are now the subject of major digitization campaigns. While this represents a major advance for the preservation of and access to documents, many challenges must still be met before sophisticated access to the content of these digital resources can be provided.

A network of associated partners, including libraries, archives, newspaper publishers and historians, will support a consortium of computational linguists, digital humanists, digital historians and designers. Together, they will work to meet these challenges and to advance research in the emerging fields of digital history and digital humanities.

Last update: 17.07.2017

Associated projects

Number Title Start Funding scheme
149758 How algorithms shape language 01.05.2014 Project funding (Div. I-III)
187333 Monitoring Task and Skill Profiles in the Digital Economy: Employers' Changing Skill Demand and Workers' Career Outcomes 01.05.2020 NRP 77


Historical newspapers are mirrors of past societies. Published regularly over centuries, they record wars and minor events, report on international, national and local matters, and document day-to-day life; in a word, they keep track of both great and small history. They reflect the political, moral, and economic environments in which they were produced. They also hold dense, continuous, and multi-level information that can help us understand how contemporaries experienced their present. This makes them indispensable for historians.

How can newspapers help us understand the past? How can we explore them? Long held on library and archive shelves, newspapers are currently undergoing mass digitization. Millions of facsimiles, along with their machine-readable content acquired via Optical Character Recognition (OCR), are becoming accessible through a variety of online portals. While this represents a major step forward for the preservation of and access to documents, much remains to be done to provide extensive and sophisticated access to the content of these digital resources. In this regard, we still face many challenges.

To begin with, not all historical newspapers are digitized. Heterogeneous schemes of availability and accessibility have also produced an opaque and complex landscape of ‘historical media silos’. Next, the quality of OCR output often makes subsequent automatic text processing difficult and unreliable. This content accessibility issue is closely related to the more fundamental, and promising, challenge of content exploitation and exploration: how can we make sense of the vast amount of available unstructured text? To achieve this, we need to semantically enrich the contents of historical newspapers, i.e. to extract, process, and link the information they contain. A further challenge concerns data visualization and exploration. These must keep pace with enhanced text analysis capacities and comply with the imperatives of historical research. Ultimately, these challenges can only be met through close interplay between computer science and history, which is essential for enabling new and methodologically reflected digital history scholarship.

In this context, the proposed interdisciplinary project “Media Monitoring of the Past” aims to develop a methodologically reflected technological framework. This framework will enable new ways of engaging with the multilingual digital content of historical newspapers and new approaches to historical questions. More precisely, the project will:
- apply text mining techniques to transform noisy and unstructured textual content into semantically indexed, structured, and linked data;
- develop innovative visualization interfaces that enable the seamless exploration of vast and complex historical data;
- identify historians' needs, which may in turn translate into new text mining applications and new ways to study history;
- synergistically reflect on the use of digital tools in the historical sciences from a practical, methodological, and epistemological point of view.

The proposed project lies at the interface of computer science and the humanities, and teams with different skills and expertise will work hand in hand to achieve these objectives. A network of eight associated partners, including libraries, archives, newspaper publishers and historians, will support a consortium of computational linguists, digital humanists, digital historians and designers. This consortium will work jointly and concurrently on three main tasks. First, it will develop and systematically evaluate several natural language processing components that provide innovative historical text mining capacities at the lexical, referential, and conceptual levels. The result will be a fully traceable and interoperable historical semantic knowledge base. Second, it will co-design novel visualization interfaces that integrate text analysis research tools and support their use by humanities scholars. Finally, it will actively and continuously assess the resulting historical media monitoring tool suite. The assessment will explore a historical use case (resistance against Europe), consider methodological and epistemological aspects, and include the tool suite's pedagogical use in the classroom. Such an endeavour will advance specific research questions in each of the fields involved and will greatly foster the development of scholarship in the emerging field of digital history.