Media Monitoring of the Past

Applicant Kaplan Frédéric
Number 173719
Funding scheme Sinergia
Research institution Laboratoire d'humanités digitales EPFL CDH CDH-CH DHLAB
Institution of higher education EPF Lausanne - EPFL
Main discipline Interdisciplinary
Start/End 01.09.2017 - 31.03.2021
Approved amount 1'770'709.00
All Disciplines (4)

Discipline
Interdisciplinary
Information Technology
Applied linguistics
General history (without pre-and early history)

Keywords (15)

visualization; mutual learning; multilingual natural language processing; Switzerland; information extraction; access to cultural heritage documents; interface design; computational linguistics; digital history; historical methodology; historical newspapers; digital literacy; diachronic lexical processing; named entity processing; co-design

Lay Summary (translated from French)

Lead
Historical newspapers are mirrors of our past societies. Published regularly over centuries, they recorded wars as well as minor news items, reported on international, national and local matters, and documented day-to-day life. In a word, they followed history both great and small. As reflections of the political, economic and moral environments in which they were produced, newspaper archives carry information that is multifaceted, dense and continuous over time, and that can help us understand how contemporaries experienced their present.
Lay summary

Content and objectives of the research work

The "Media Monitoring of the Past" project will pursue three objectives. First, the development and systematic evaluation of several natural language processing components for extracting information from historical texts. Second, the creation of visualization interfaces for the seamless exploration of vast amounts of historical data. Finally, the active and continuous evaluation of the tools produced, using a historical case study (resistance to Europe), complemented by a reflection on the use of digital tools in the historical sciences that considers methodological, epistemological and pedagogical aspects.

Scientific and social context of the research project

Kept for decades on the shelves of libraries and archives, newspaper archives are currently the subject of major digitization campaigns. While this represents a major advance in terms of preservation of and access to documents, many challenges remain in providing sophisticated access to the content of these digital resources.

Supported by a network of associated partners featuring libraries, archives, newspaper publishers and historians, a consortium of computational linguists, digital humanists, digital historians and designers will work to meet these challenges and to advance research in the emerging fields of digital history and digital humanities.

Last update: 17.07.2017

Publications

Publication
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
Barman Raphaël, Ehrmann Maud, Clematide Simon, Oliveira Sofia, Kaplan Frédéric (2021), Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers, in Journal of Data Mining & Digital Humanities, 2021(Special I), 1-26.
Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers
Ehrmann Maud, Romanello Matteo, Flückiger Alex, Clematide Simon (2020), Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers, in Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, CEUR-WS, Online.
The impresso system architecture in a nutshell
Romanello Matteo, Ehrmann Maud, Clematide Simon, Guido Daniele (2020), The impresso system architecture in a nutshell, Europeana Pro, The Hague.
Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers
Ehrmann Maud, Romanello Matteo, Flückiger Alex, Clematide Simon (2020), Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers, in Experimental IR Meets Multilinguality, Multimodality, and Interaction, Thessaloniki, Greece, Springer International Publishing, Cham.
How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR
Ströbel Phillip Benjamin, Clematide Simon, Volk Martin (2020), How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR, in Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 3551-3559, European Language Resources Association, Paris, France.
Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers
Ehrmann Maud, Romanello Matteo, Bircher Stefan, Clematide Simon (2020), Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers, in 42nd European Conference on IR Research, Proceedings, Part II, Lisbon, Portugal, Springer International Publishing, Cham.
HIPE - Shared Task Participation Guidelines
Ehrmann Maud, Romanello Matteo, Clematide Simon, Flückiger Alex (2020), HIPE - Shared Task Participation Guidelines, (report), Switzerland.
Impresso Named Entity Annotation Guidelines
Ehrmann Maud, Watter Camille, Romanello Matteo, Clematide Simon, Flückiger Alex (2020), Impresso Named Entity Annotation Guidelines, Self published guidelines, Switzerland.
CLUZH at SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion
Makarov Peter, Clematide Simon (2020), CLUZH at SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion, in Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and M, Online, 171-176, Association for Computational Linguistics, Online.
Semi-supervised Contextual Historical Text Normalization
Makarov Peter, Clematide Simon (2020), Semi-supervised Contextual Historical Text Normalization, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Association for Computational Linguistics, Online.
Collections of Digitised Newspapers as Historical Resources
Bunout Estelle (2019), Collections of Digitised Newspapers as Historical Resources, Digital Humanities Research Questions and Methods, PARTHENOS.
Historical Newspaper User Interfaces: A Review
Ehrmann Maud, Bunout Estelle, Düring Marten (2019), Historical Newspaper User Interfaces: A Review, Proceedings of 85th IFLA General Conference and Assembly, Athens, Greece.
Les stratégies de légitimation de la présence de l’Allemagne et de la Pologne en Europe Orientale (1939-1945)
Bunout Estelle (2019), Les stratégies de légitimation de la présence de l’Allemagne et de la Pologne en Europe Orientale (1939-1945), in Guerres mondiales et conflits contemporains, N° 275(3), 31-42.
Crowdsourcing the OCR Ground Truth of a German and French Cultural Heritage Corpus
Clematide Simon, Furrer Lenz, Volk Martin (2018), Crowdsourcing the OCR Ground Truth of a German and French Cultural Heritage Corpus, in Journal for Language Technology and Computational Linguistics (JLCL), 33(1), 25-47.
Neural Transition-based String Transduction for Limited-Resource Setting in Morphology
Makarov Peter, Clematide Simon (2018), Neural Transition-based String Transduction for Limited-Resource Setting in Morphology, in Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, 83-93, Association for Computational Linguistics, Online.
Supervised OCR Error Detection and Correction Using Statistical and Neural Machine Translation Methods
Amrhein Chantal, Clematide Simon (2018), Supervised OCR Error Detection and Correction Using Statistical and Neural Machine Translation Methods, in Journal for Language Technology and Computational Linguistics (JLCL), 33(1), 49-76.
UZH at CoNLL-SIGMORPHON 2018 Shared Task on Universal Morphological Reinflection
Makarov Peter, Clematide Simon (2018), UZH at CoNLL-SIGMORPHON 2018 Shared Task on Universal Morphological Reinflection, in CoNLL-SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection, Brussels, 69-75, Association for Computational Linguistics, Online.

Datasets

Survey of digitized newspaper interfaces

Author Ehrmann, Maud; Bunout, Estelle; Düring, Marten
Publication date 16.08.2019
Persistent Identifier (PID) 10.5281/zenodo.3369875
Repository Zenodo
Abstract
Data set related to the survey of digitized newspaper interfaces conducted in the frame of the impresso project.

Ground truth for Neue Zürcher Zeitung black letter period

Author Ströbel, Phillip; Clematide, Simon
Publication date 12.07.2019
Persistent Identifier (PID) 10.5281/zenodo.3333627
Repository Zenodo
Abstract
The Neue Zürcher Zeitung (NZZ) was published in black letter from its very first issue in 1780 until 1947. From this time period, we randomly sampled one frontpage per year, resulting in a total of 167 pages. We chose frontpages because they typically contain highly relevant material and because we wanted to avoid sampling pages containing exclusively advertisements or stock information. During certain periods, the NZZ was published several times a day, and there were supplements, too. Due to incomplete metadata, the sampling included frontpages from supplements. We then manually corrected the pages so that they can be used as a ground truth to improve the OCR of black letter in historical newspapers.

Datasets and Models for Historical Newspaper Article Segmentation

Author Barman, Raphaël; Ehrmann, Maud; Clematide, Simon; Oliveira, Sofia
Publication date 30.01.2021
Persistent Identifier (PID) 10.5281/zenodo.3706863
Repository Zenodo
Abstract
This record contains the datasets and models used and produced for the work reported in the paper "Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers". Please see information on:
- Zenodo: https://zenodo.org/record/3706863
- EPFL infoscience: https://infoscience.epfl.ch/record/283048?&ln=en
- GitHub repository: https://github.com/dhlab-epfl/dhSegment-text

German-French Parallel Corpus of the Swiss Federal Gazette (19c.-20c.)

Author Flückiger, Alex; Clematide, Simon; Meyer Broyn, Adriano
Publication date 17.12.2020
Persistent Identifier (PID) 10.5281/zenodo.4748988
Repository Zenodo
Abstract
The Federal Gazette is a journal published by the Swiss Government. The journal is a political newsletter concerned with resolutions and laws of the Swiss Confederation. First published in 1849, shortly after the foundation of the Swiss Federal State, it is provided in the following official languages: German ("Bundesblatt"), French ("Feuille fédérale"), and Italian ("Foglio federale"). The present parallel corpus covers the period between 1849 and 2017 for French and German. The alignment of articles across languages was done using an improved version of a UZH-ICL in-house script written by Chantal Amrhein, which extends Bleualign's sentence alignment to the level of documents. The parallel corpus *FedGaz German-French* consists of two line-aligned files that also preserve the boundaries of documents. The resulting corpus contains more than 3.5 million sentences from over 53k documents. For more information, please refer to the explanations (https://github.com/impresso/federal-gazette#aligning-documents) and the related code (https://github.com/impresso/federal-gazette).

Annotation of press articles with topic modelling and naïve Bayes classifiers

Author Bunout, Estelle
Publication date 01.03.2021
Persistent Identifier (PID) 10.5281/zenodo.4571431
Repository Zenodo
Abstract
Annotation of historic press articles, organised per title, with topic modelling and naïve Bayes classifiers (NBC2+4). List of the 100 most significant words for the 4th round of NBC for the antimodern conception of Europe. List of topics with labels and distribution per title, trained on the article collection of each selected Swiss press title. A related publication has been accepted and is about to be published.

Collaboration

Group / person Country
Types of collaboration
Journal "Die Tat" (Migros) Switzerland (Europe)
- Research Infrastructure
UNIA (various newspaper archives) Switzerland (Europe)
- Research Infrastructure
Journal "Freiburger Nachrichten" Switzerland (Europe)
- Research Infrastructure
Bibliothèque de la Ville de La Chaux-de-Fonds Switzerland (Europe)
- Research Infrastructure
Numapresse project (http://www.numapresse.org/) France (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
Journal "Le Confédéré" Switzerland (Europe)
- Research Infrastructure
RERO, Réseau des bibliothèques de Suisse occidentale Switzerland (Europe)
- Research Infrastructure
Bibliothèque Cantonale et Universitaire de Lausanne Switzerland (Europe)
- Research Infrastructure
Oceanic Exchanges project United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
Médiathèque du Valais Switzerland (Europe)
- Research Infrastructure
Journal "L'Essor" Switzerland (Europe)
- Research Infrastructure
Bibliothèque Cantonale et Universitaire de Fribourg Switzerland (Europe)
- Research Infrastructure
Journal "La Liberté" Switzerland (Europe)
- Research Infrastructure
Schweizerisches Sozialarchiv Switzerland (Europe)
- Research Infrastructure
Journal "Le Peuple / La Sentinelle" (Parti Socialiste de Neuchâtel) Switzerland (Europe)
- Research Infrastructure
Groupe ArcInfo Switzerland (Europe)
- Research Infrastructure
NewsEye project (https://www.newseye.eu/) France (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
Das Wienerische Diarium, Oesterreichische Akademie der Wissenschaften Austria (Europe)
- in-depth/constructive exchanges on approaches, methods or results
Inception - TU Darmstadt Germany (Europe)
- Research Infrastructure

Scientific events

Active participation

Title Type of contribution Title of article or contribution Date Place Persons involved
Applying and deploying AI in GLAMs Individual talk Historical Newspaper Content Mining: findings from the impresso project 31.03.2021 Online, Netherlands Ehrmann Maud; Bunout Estelle;
Open Community Calls of the AI4LAM Individual talk Named Entity Processing on Historical Documents: Challenges and insights from CLEF-HIPE-2020 16.02.2021 Online, United States of America Ehrmann Maud;
"Info am Mittag": Internal Seminar of the Swiss National Library and Federal Cultural Office (invitation from Ms Liliane Regamey) Individual talk 200 Jahre Zeitungsarchiv 03.12.2020 Bern, Switzerland Ehrmann Maud;
Invited Seminar (3h) in the context of the ERC project "Mapmodern – Social Networks of the Past" at the Open University of Catalunya (invitation from Ms Diana Roig Sanz) Individual talk The impresso Factory 01.12.2020 Online, Spain Ehrmann Maud;
CLEF 2020 Conference and Labs of the Evaluation Forum Talk given at a conference CLEF-HIPE-2020 Named Entity Recognition and Linking on Historical Newspapers 22.09.2020 Thessaloniki (Online), Greece Romanello Matteo; Clematide Simon; Flückiger Alex; Ehrmann Maud;
Informationen – digital verpackt Talk given at a conference Quantitative Ideengeschichte: Zeitungen, Parlamentsdebatten und medizinische Forschung vergangener Jahrhunderte 22.09.2020 Zurich (online), Switzerland Schneider Gerold;
Workshop zu Natural Language Processing Talk given at a conference NLP in "Impresso – Media Monitoring of the Past" 14.08.2020 Bern, Switzerland Clematide Simon;
Digital Humanities Conference 2020 Talk given at a conference Historical Newspaper Content Mining: Revisiting the impresso Project’s Challenges in Text and Image Processing, Design and Historical Scholarship 22.07.2020 Ottawa (online due to Covid-19), Canada Ehrmann Maud; Bunout Estelle;
DH2020 Talk given at a conference Historical Newspaper Content Mining: Revisiting the impresso Project’s Challenges in Text and Image Processing, Design and Historical Scholarship 20.07.2020 Ottawa (online), Canada Bunout Estelle; Fickers Andreas; Kaplan Frédéric; Romanello Matteo; Schroeder Paul; Clematide Simon; Volk Martin; Ehrmann Maud; Ströbel Phillip;
DH Benelux 2020 Talk given at a conference Deep-diving in NLP enhanced digitised newspapers: A hands-on session with the impresso interface. 03.06.2020 Leiden (online), Netherlands Bunout Estelle;
42nd European Conference on IR Research Talk given at a conference Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers 14.04.2020 Lisbon, Portugal Clematide Simon; Ehrmann Maud; Romanello Matteo;
OCR - Herausforderungen und Lösungen für Zeitungen & Zeitschriften Talk given at a conference Bericht über das Training und die Transferierbarkeit des deutschen Frakturschriftmodells innerhalb von Transkribus 11.11.2019 Frankfurt, Germany Clematide Simon;
C2DH Research Seminar Individual talk Grasping the anti-modern: how to identify anti-modern discourses on Europe in a digitized newspaper collection (using a naïve bayes classifier and topic modelling) 23.10.2019 Esch-sur-Alzette, Luxembourg Bunout Estelle;
Current advances in text mining Talk given at a conference Introducing the impresso project: How NLP tools and interface design support historians with the exploration of large-scale digitized newspaper collections 25.09.2019 Belval, Luxembourg van Beek Thijs;
DHBenelux 2019 Talk given at a conference Grasping the anti-modern: training a naïve bayes classifier to expand a sub-corpus of Swiss newspaper articles (1939-1945) on anti-modern discourses 11.09.2019 Liège, Belgium Bunout Estelle;
CLEF 2019 Conference Poster Shared Task on Named Entity Recognition and Linking on Historical Newspapers 09.09.2019 Lugano, Switzerland Ehrmann Maud; Romanello Matteo; Clematide Simon;
World Library and Information Congress. 85th IFLA Congress and Assembly Talk given at a conference Historical Newspaper User Interfaces: A Review 24.08.2019 Athens, Greece Ehrmann Maud; Bunout Estelle;
Digital Humanities Conference, Utrecht Talk given at a conference Improving OCR of Black Letter in Historical Newspapers: The Unreasonable Effectiveness of HTR Models on Low-Resolution Images 10.07.2019 Utrecht, Netherlands Clematide Simon; Ströbel Phillip;
Digital Humanities 2019 Conference Talk given at a conference Complexities in the Use, Analysis, and Representation of Historical Digital Periodicals 10.07.2019 Utrecht, Netherlands Bunout Estelle;
Digital Humanities 2019 Conference Talk given at a conference Tutorial on Named Entities processing for Digital Humanities 09.07.2019 Utrecht, Netherlands Ehrmann Maud; Romanello Matteo; Clematide Simon;
Digital Humanities 2019 Conference Talk given at a conference The Past, Present and Future of Digital Scholarship with Newspaper Collections 09.07.2019 Utrecht, Netherlands Ströbel Phillip; Volk Martin; Fickers Andreas; van Beek Thijs; Clematide Simon; Bunout Estelle; Romanello Matteo; Schroeder Paul; Kaplan Frédéric; Ehrmann Maud;
Tensions of Europe conference - Decoding Europe: Technological Pasts in the Digital Age Talk given at a conference Can the digitised newspapers enable the reconstruction of the European debates in Switzerland and Luxembourg, from 1918-1945? 27.06.2019 Belval, Luxembourg Bunout Estelle;
Journée d'étude annuelle du projet Numapresse Individual talk Media Monitoring of the Past – Fouiller deux siècles de journaux historiques 20.06.2019 Nîmes, France Ehrmann Maud;
Swiss Texts 2019 4th Swiss Text Analytics Conference Poster Historical Media Monitoring with impresso 19.06.2019 Winterthur, Switzerland Ströbel Phillip;
Stronger Multilateralism through Knowledge Heritage and Culture Talk given at a conference Text Mining of Cultural Heritage Archives: the Case of Newspapers 17.06.2019 Geneva, Switzerland Ehrmann Maud;
15th annual HEIRS Conference: Experts, knowledge and the (de)legitimization of European politics Talk given at a conference The European debate in the press of Switzerland (1900-1945): who were the experts on Europe? 23.05.2019 Paris, France Bunout Estelle;
DH Nord Conference Individual talk Beyond Keyword Search - Semantic Indexing and Exploration of Large Collections of Historical Newspapers 06.03.2019 Copenhagen, Denmark Ehrmann Maud;
An Asymmetric Europe: Ideologies during the Second World War Individual talk Regards croisés d’experts sur la place de l’Europe orientale dans les imaginaires collectifs allemands et polonais (1939-1945) : une asymétrie évolutive ? 06.12.2018 Moscow, Russia Bunout Estelle;
Les magazines illustrés de la première moitié du 20e siècle à l’ère des humanités numériques – Allemagne / France en regard et acteurs en dialogue Talk given at a conference Impresso project – From historic newspaper facsimiles to knowledge base, information extraction in the service of digital history 29.11.2018 Paris, France Ehrmann Maud;
Workshop DARIAH-CH Poster impresso – Media Monitoring of the Past – Mining 200 years of historical newspapers 29.11.2018 Neuchâtel, Switzerland Ehrmann Maud; Volk Martin; Fickers Andreas; van Beek Thijs; Schroeder Paul; Bunout Estelle; Ströbel Phillip; Clematide Simon; Kaplan Frédéric; Romanello Matteo;
Time Machine Conference Talk given at a conference Spotlight presentation of impresso project 31.10.2018 Lausanne, Switzerland Ehrmann Maud;
Digital Hermeneutics in History: Theory and Practice Talk given at a conference Implementing Transparency 26.10.2018 Esch-sur-Alzette, Luxembourg Bunout Estelle;
Journée d’étude - L’histoire contemporaine à l’ère numérique : sources, méthodologies, critiques, Lausanne Talk given at a conference Une Recherche plus Fouillée Dans Un Corpus Imparfait? L’étude de La Question Européenne Dans La Presse Numérisée Suisse et Luxembourgeoise (1848-1945). 04.07.2018 Lausanne, Switzerland Bunout Estelle;
DH Benelux 2018 Talk given at a conference Corpus Creation and Digitised Newspapers: Perspectives from Research and Libraries 08.06.2018 Amsterdam, Netherlands Bunout Estelle;
DH Benelux 2018 Talk given at a conference Transparency as a Prerequisite for Digital Source Criticism of Digitized Newspapers 08.06.2018 Amsterdam, Netherlands Bunout Estelle;
Vom DIARIUM zum DIGITARIUM Talk given at a conference Computerlinguistische Methoden Für Bessere Zugänglichkeit von Historischen Zeitungsberichten. Die NZZ Im Wandel Der Zeit 24.04.2018 Vienna, Austria Ströbel Phillip;
C2DH/LCSB Data Visualisation Workshop Talk given at a conference Impresso. Media Monitoring of the Past 24.04.2018 Esch-sur-Alzette, Luxembourg Bunout Estelle; Schroeder Paul;
INCEpTION: Towards Interactive Semantic Annotation Talk given at a conference Annotating Named Entities in Historical Newspapers and Scholarly Publications. 13.03.2018 Darmstadt, Germany Romanello Matteo;
Hacking the News Workshop in conjunction with DHN 2018 Talk given at a conference Detecting Text Reuse in Newspapers Data with Passim 06.03.2018 Helsinki, Finland Romanello Matteo;
Digital Humanities Austria Talk given at a conference Bridging Literature and Information Science 04.12.2017 Innsbruck, Austria Ströbel Phillip;
Digital Humanities Austria Individual talk Bridging Literature and Information Science 04.12.2017 Innsbruck, Austria Ströbel Phillip;


Self-organised

Title Date Place
Online Final Meeting 30.09.2020 Online, Switzerland
CLEF-HIPE-2020 Workshop 23.09.2020 Thessaloniki (Online), Greece
impresso workshop: Digitised newspapers - a new Eldorado for historians ? 23.04.2020 Lausanne (online due to covid19), Switzerland
Community call #2: Topic Modeling in the impresso web application 17.05.2019 Zürich, Switzerland
Community call #1: Introducing the impresso Interface 09.11.2018 Esch-sur-Alzette, Luxembourg
impresso Talks Series: The Omnipresence of the Nation: Preliminary Remarks in Studying Nationhood through Digitized Newspapers 30.10.2018 Esch-sur-Alzette, Luxembourg
Topic modelling applied to the European Commission digitised Archives 30.10.2018 Esch-sur-Alzette, Luxembourg
Sequential pattern mining for robust event detection 11.10.2018 Lausanne, Switzerland
impresso workshop #1bis 'Edelweiss' Interface co-design 23.02.2018 Basel, Swiss Economic Archives, Switzerland

Knowledge transfer events

Active participation

Title Type of contribution Date Place Persons involved
La fabrique d'une interface d'exploration de journaux, UNINE Talk 22.03.2021 Neuchatel University (online), Switzerland Ehrmann Maud;
Information retrieval and text mining Workshop 19.03.2021 Zentralbibliothek Zürich (ZB) (online), Switzerland Clematide Simon;
La fabrique d'une interface d'exploration de journaux, EPFL Talk 09.03.2021 EPFL (online), Switzerland Ehrmann Maud;
Entretien avec Maud Ehrmann sur le projet impresso / UNIL-EPFL class Performances, exhibitions (e.g. for education institutions) 30.11.2020 Lausanne, Switzerland Ehrmann Maud;
La fabrique d'une interface d'exploration de journaux Talk 25.11.2020 Online (EPFL), Switzerland Ehrmann Maud;
Introduction to Digital History (on the usage of the impresso application, Master in European Contemporary History) Talk 11.11.2020 Online (University of Luxembourg), Luxembourg Bunout Estelle;
Introduction to Digital History (on the usage of the impresso application, Bachelor in European Cultures) Talk 05.11.2020 Online (University of Luxembourg), Luxembourg Bunout Estelle;
Data preparation and guest lecture for EPFL SHS Master class on Digital History, with Martin Grandjean and Sandra Bott Talk 25.09.2019 Lausanne, EPFL, Switzerland Ehrmann Maud; Romanello Matteo;
1. Workshop mit der wissenschaftlichen Begleitgruppe des Zeitungsportals der Deutschen Digitalen Bibliothek Talk 13.06.2019 Berlin, Germany Bunout Estelle;
A 90-minute crash course on the study of digitized historical newspapers Talk 18.02.2019 Luxembourg, Switzerland Bunout Estelle;
Lost and found. Netwerkdag Oorlogsbronnen 2018 Talk 15.11.2018 Amsterdam, Netherlands van Beek Thijs;
The digitisation of newspapers: how to turn a page Performances, exhibitions (e.g. for education institutions) 25.10.2018 Esch-sur-Alzette, Luxembourg van Beek Thijs;
Guest lecture for the seminar "History in a Digitized World" of Prof. Martin Dusinberre and Dr. Tobias Hodel, Zurich History department. Talk 23.10.2018 Zurich, Switzerland Ehrmann Maud; Ströbel Phillip;
Data preparation and guest lecture for EPFL SHS Master class on Digital History, with Martin Grandjean and Sandra Bott Talk 03.10.2018 Lausanne, Switzerland Ehrmann Maud; Romanello Matteo;


Communication with the public

Communication Title Media Place Year
Talks/events/exhibitions Reading yesterday's news in the digital age International 2020
Talks/events/exhibitions Carnotzet Scientifique: Technologies et patrimoine: l’Histoire 2.0 Western Switzerland 2019
Media relations: print media, online media Cela ouvre des horizons La Liberté Western Switzerland 2019
Talks/events/exhibitions dhCenter: Comment explorer 200 ans d'archives de journaux Western Switzerland 2019
New media (web, blogs, podcasts, news feeds etc.) Mining and exploring 200 years of newspapers: the impresso project Europeana Pro International 2019
New media (web, blogs, podcasts, news feeds etc.) Nibbling at text: identifying discourses on Europe in a large collection of historical newspapers us C2DH Blog International 2019
Talks/events/exhibitions Soirée de la Rotonde BCUF - Comment explorer 200 ans d'archives de journaux Western Switzerland 2019
Media relations: radio, television Am C2DH geet et ëm d'Lëtzebuerger Zäitgeschicht an nei innovativ Weeër an der Fuerschung. radio 100,7 International 2017
Media relations: print media, online media Analyse critique de 200 ans de journaux historiques luxembourgeois, suisses, français et allemands innovation.public.lu International 2017
Media relations: print media, online media C2DH Collaborative Project Looks at 200 Years of International Newspapers Chronicle.lu International 2017
New media (web, blogs, podcasts, news feeds etc.) Critical text mining in historical newspapers from Luxembourg, Germany, France and Sw EurekAlert! International 2017
Media relations: print media, online media Mit einem Klick durch 200 Jahre Zeitung Tageblatt.lu International 2017
Media relations: print media, online media Neue Auswertungsmethoden entwickeln. Forschungsprojekt: "Textmining" von historischen Zeitungen aus Lëtzebuerger Journal International 2017

Associated projects

Number Title Start Funding scheme
149758 How algorithms shape language 01.05.2014 Project funding (Div. I-III)
187333 Monitoring Task and Skill Profiles in the Digital Economy: Employers' Changing Skill Demand and Workers' Career Outcomes 01.05.2020 NRP 77 Digital Transformation

Abstract

Historical newspapers are mirrors of past societies. Published over centuries on a regular basis, they record wars and minor events, report on international, national and local matters, and document day-to-day life; in a word, they keep track of history both great and small. They reflect the political, moral, and economic environments in which they were produced and they hold dense, continuous, and multi-level information which can help us understand how contemporaries experienced their present. This makes them indispensable for historians.

How can newspapers help us understand the past? How can they be explored? Long held on library and archive shelves, newspapers are currently undergoing mass digitization, and millions of facsimiles, along with their machine-readable content acquired via Optical Character Recognition (OCR), are becoming accessible via a variety of online portals. While this represents a major step forward in terms of preservation of and access to documents, much remains to be done in order to provide extensive and sophisticated access to the content of these digital resources. In this regard, we are still facing many challenges.

To begin with, not all historical newspapers are digitized, and heterogeneous schemes of availability and accessibility lead to an opaque and complex landscape of ‘historical media silos’. Next, the quality of OCR outputs often makes subsequent automatic text processing difficult and unreliable. This content accessibility issue closely relates to the more fundamental, and promising, challenge of content exploitation and exploration: how to make sense of the vast amount of available unstructured text? To achieve this, we need to semantically enrich the contents of historical newspapers, i.e. to extract, process, and link the information they contain. Another challenge relates to data visualization and exploration, which need to accompany enhanced text analysis capacities and comply with historical research imperatives. Finally, these challenges can only be met through the close interplay between computer sciences and history, an essential factor for enabling new and methodologically reflected digital history scholarship.

In this context, the proposed interdisciplinary project “Media Monitoring of the Past” aims at the development of a methodologically reflected technological framework to enable new ways of engaging with multilingual digital content of historical newspapers and new approaches to address historical questions. More precisely, the project will apply text mining techniques to transform noisy and unstructured textual content into semantically indexed, structured, and linked data; develop innovative visualization interfaces to enable the seamless exploration of complex and vast amounts of historical data; identify needs on the side of historians which may also translate into new text mining applications and new ways to study history; and synergistically reflect on the usage of digital tools in historical sciences from a practical, methodological, and epistemological point of view.

The proposed project lies at the interface of several scientific disciplines (computer sciences and humanities), and teams with different skills and expertise will work hand in hand to achieve the desired objectives. Supported by a network of 8 associated partners, featuring libraries, archives, newspaper editors and historians, a consortium composed of computational linguists, digital humanists, digital historians and designers will jointly and concurrently work on three main tasks. First, the development and systematic evaluation of several natural language processing components for innovative historical text mining capacities at lexical, referential, and conceptual levels, resulting in a fully traceable and interoperable historical semantic knowledge base. Second, the co-design of novel visualization interfaces to accommodate text analysis research tools and their usage by humanities scholars. Finally, the active and continuous assessment of the produced historical media monitoring tool suite, with the exploration of a historical use case (resistance against Europe), the consideration of methodological and epistemological aspects, and its pedagogical usage in the classroom. While benefiting each involved field on specific research aspects, such an endeavour will greatly foster the development of scholarship in the emerging field of digital history.