Project

HisDoc 2.0 : Towards Computer-Assisted Paleography

English title An Integrated Approach Incorporating Text Localization, Scribe Identification, and Semantics for Historical Documents
Applicant Ingold Rolf
Number 150173
Funding scheme Project funding (Div. I-III)
Research institution Département d'Informatique Université de Fribourg
Institution of higher education University of Fribourg - FR
Main discipline Information Technology
Start/End 01.01.2014 - 31.03.2017
Approved amount 337'092.00

Keywords (7)

semantic data; ontology modeling; script discrimination; historical documents; text localization; document image analysis; scribe identification

Lay Summary (translated from German)

Lead
Written texts, such as those handed down in medieval manuscripts, are an essential part of our cultural heritage. They are kept in libraries and can usually be consulted only under special conditions. To make these texts and books available to a broad audience, there are increasing efforts to digitize manuscripts and present them on the Web. A prominent example is the Swiss platform e-codices.
Lay summary

Content and goal of the research project

The main goal of the research project is to develop computer-based methods that support cataloguers in the paleographic analysis of manuscripts. To this end, we are developing an integrated approach, novel for the computational processing of historical documents, that combines (i) text segmentation, (ii) script identification, and (iii) scribe identification into a generic holistic method. In addition, we incorporate existing semantic information into our automatic analysis processes (iv), so that catalogue entries can be produced more precisely and more accurately.

Scientific and societal context of the research project

Our work will produce new integrated methods for computer-assisted document image analysis that are tailored in particular to the complex appearance of historical manuscripts. This makes cultural heritage easier to access for both specialists and interested laypeople. Applying computer-assisted methods to the scholarly cataloguing of manuscripts can improve the quality of catalogue records, especially when large volumes of data have to be handled. This benefits specialists as well as interested laypeople who use platforms such as e-codices to gain insight into our cultural heritage and to discover new cross-connections between individual manuscripts.

Keywords

Cultural heritage preservation, computer-assisted paleography, document image analysis, script recognition, scribe identification, electronic manuscript catalogues, Web 2.0

Last update: 20.11.2013

Publications

A User-Centered Segmentation Method for Complex Historical Manuscripts Based on Document Graphs
Garz Angelika, Seuret Mathias, Fischer Andreas, Ingold Rolf (2017), A User-Centered Segmentation Method for Complex Historical Manuscripts Based on Document Graphs, in IEEE Transactions on Human-Machine Systems, 47(2), 181-193.
Creating Ground Truth for Historical Manuscripts with Document Graphs and Scribbling Interaction
Garz Angelika, Seuret Mathias, Simistira Fotini, Fischer Andreas, Ingold Rolf (2016), Creating Ground Truth for Historical Manuscripts with Document Graphs and Scribbling Interaction, in 2016 12th IAPR Workshop on Document Analysis Systems (DAS), IEEE, Santorini, Greece.
DIVA-HisDB: A Precisely Annotated Large Dataset of Challenging Medieval Manuscripts
Simistira Foteini, Seuret Mathias, Eichenberger Nicole, Garz Angelika, Liwicki Marcus, Ingold Rolf (2016), DIVA-HisDB: A Precisely Annotated Large Dataset of Challenging Medieval Manuscripts, in 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), IEEE, Shenzhen, China.
DivaServices—A RESTful web service for Document Image Analysis methods
Würsch Marcel, Ingold Rolf, Liwicki Marcus (2016), DivaServices—A RESTful web service for Document Image Analysis methods, in Digital Scholarship in the Humanities, 051-051.
DIVAServices-Spotlight – Experimenting with Document Image Analysis Methods in the Web
Würsch Marcel, Bärtschi Michael, Ingold Rolf, Liwicki Marcus (2016), DIVAServices-Spotlight – Experimenting with Document Image Analysis Methods in the Web, in Digital Humanities, Digital Humanities, Krakow, Poland.
GraphManuscribble: Interact Intuitively with Digital Facsimiles
Garz Angelika, Seuret Mathias, Fischer Andreas, Ingold Rolf (2016), GraphManuscribble: Interact Intuitively with Digital Facsimiles, in Second International Conference on Natural Sciences and Technology in Manuscript Analysis, University of Hamburg, Hamburg, Germany.
N-light-N
Seuret Mathias, Alberti Michele, Liwicki Marcus (2016), N-light-N, University of Fribourg, University of Fribourg.
N-Light-N: A Highly-Adaptable Java Library for Document Analysis with Convolutional Auto-Encoders and Related Architectures
Seuret Mathias, Ingold Rolf, Liwicki Marcus (2016), N-Light-N: A Highly-Adaptable Java Library for Document Analysis with Convolutional Auto-Encoders and Related Architectures, in 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), IEEE, Shenzhen, China.
Recognition of Greek Polytonic on Historical Degraded Texts Using HMMs
Katsouros Vassilis, Papavassiliou Vassilis, Simistira Fotini, Gatos Basilis (2016), Recognition of Greek Polytonic on Historical Degraded Texts Using HMMs, in 2016 12th IAPR Workshop on Document Analysis Systems (DAS), IEEE, Santorini, Greece.
SDK Reinvented: Document Image Analysis Methods as RESTful Web Services
Würsch Marcel, Ingold Rolf, Liwicki Marcus (2016), SDK Reinvented: Document Image Analysis Methods as RESTful Web Services, in 2016 12th IAPR Workshop on Document Analysis Systems (DAS), IEEE, Santorini, Greece.
Simple and Fast Geometrical Descriptors for Writer Identification
Garz Angelika, Würsch Marcel, Fischer Andreas, Ingold Rolf (2016), Simple and Fast Geometrical Descriptors for Writer Identification, in Electronic Imaging, 2016(17), 1-12.
Clustering Historical Documents Based on the Reconstruction Error of Autoencoders
Seuret Mathias, Fischer Andreas, Garz Angelika, Liwicki Marcus, Ingold Rolf (2015), Clustering Historical Documents Based on the Reconstruction Error of Autoencoders, in 2015 13th International Conference on Document Analysis and Recognition (ICDAR), ACM Press, Nancy, France.
Combining Learned Script Points and Combinatorial Optimization for Text Line Extraction
Pastor-Pellicer Joan, Garz Angelika, Ingold Rolf, Castro-Bleda María-José (2015), Combining Learned Script Points and Combinatorial Optimization for Text Line Extraction, in Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing - HIP '1, ACM Press, Nancy, France.
DIVADesk – A Holistic Digital Workspace for Analyzing Historical Document Images
Eichenberger Nicole, Garz Angelika, Chen Kai, Wei Hao, Ingold Rolf, Liwicki Marcus (2015), DIVADesk – A Holistic Digital Workspace for Analyzing Historical Document Images, in Manuscript Cultures, 7, 69-82.
DIVADIAWI - A Web-based Interface for Semi-automatic Labeling of Historical Document Images
Wei Hao, Chen Kai, Seuret Mathias, Würsch Marcel, Liwicki Marcus, Ingold Rolf (2015), DIVADIAWI - A Web-based Interface for Semi-automatic Labeling of Historical Document Images, in Digital Humanities, Digital Humanities, Sydney, Australia.
DIVAServices – A RESTful Web Service for Document Image Analysis Methods
Würsch Marcel, Ingold Rolf, Liwicki Marcus (2015), DIVAServices – A RESTful Web Service for Document Image Analysis Methods, in Digital Humanities, SPIE Digital Library, Sydney, Australia.
Gradient-domain degradations for improving historical documents images layout analysis
Seuret Mathias, Chen Kai, Eichenberger Nicole, Liwicki Marcus, Ingold Rolf (2015), Gradient-domain degradations for improving historical documents images layout analysis, in 2015 13th International Conference on Document Analysis and Recognition (ICDAR), IEEE, Nancy, France.
Ground truth model, tool, and dataset for layout analysis of historical documents
Chen Kai, Seuret Mathias, Wei Hao, Liwicki Marcus, Hennebert Jean, Ingold Rolf (2015), Ground truth model, tool, and dataset for layout analysis of historical documents, in Document Recognition and Retrieval XXII, SPIE Digital Library, San Francisco.
HisDoc 2.0: Toward Computer-assisted Paleography
Garz Angelika, Eichenberger Nicole, Liwicki Marcus, Ingold Rolf (2015), HisDoc 2.0: Toward Computer-assisted Paleography, in Manuscript Cultures, 7, 19-28.
Recognition of historical Greek polytonic scripts using LSTM networks
Simistira Fotini, Ul-Hassan Adnan, Papavassiliou Vassilis, Gatos Basilis, Katsouros Vassilis, Liwicki Marcus (2015), Recognition of historical Greek polytonic scripts using LSTM networks, in 2015 13th International Conference on Document Analysis and Recognition (ICDAR), IEEE, Nancy, France.
Selecting Autoencoder Features for Layout Analysis of Historical Documents
Wei Hao, Seuret Mathias, Chen Kai, Fischer Andreas, Liwicki Marcus, Ingold Rolf (2015), Selecting Autoencoder Features for Layout Analysis of Historical Documents, in Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing - HIP '1, ACM Press, Nancy, France.
Training- and Segmentation-Free Intuitive Writer Identification with Task-Adapted Interest Points
Garz Angelika, Würsch Marcel, Ingold Rolf (2015), Training- and Segmentation-Free Intuitive Writer Identification with Task-Adapted Interest Points, in 17th Biennial Conference of the International Graphonomics Society, Pointe-à-Pitre, Guadeloupe.
A Combined System for Text Line Extraction and Handwriting Recognition in Historical Documents
Fischer Andreas, Baechler Micheal, Garz Angelika, Liwicki Marcus, Ingold Rolf (2014), A Combined System for Text Line Extraction and Handwriting Recognition in Historical Documents, in 2014 11th IAPR International Workshop on Document Analysis Systems, IEEE, Tours, France.

Collaboration

Group / person Country
Types of collaboration
University of Science and Technology Houari Boumediene Algeria (Africa)
- in-depth/constructive exchanges on approaches, methods or results
- Exchange of personnel
EPHE France (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Exchange of personnel
laboratory L3i, La Rochelle France (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Exchange of personnel
DFKI, Kaiserslautern Germany (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Exchange of personnel
Loria, Nancy, France France (Europe)
- in-depth/constructive exchanges on approaches, methods or results
e-Codices, University of Fribourg Switzerland (Europe)
- in-depth/constructive exchanges on approaches, methods or results
CENPARMI, Concordia Univ., Montreal Canada (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication

Scientific events

Active participation

Title Type of contribution Title of article or contribution Date Place Persons involved
Texterfassung historischer Dokumente Talk given at a conference Recent Advances in Historical Document Analysis 06.09.2016 Berlin, Germany Eichenberger-Liwicki Marcus


Self-organised

Title Date Place
on Historical Document Analysis in Switzerland 18.11.2016 Fribourg, Switzerland
HisDoc Workshop and Hackathon 21.09.2015 Kaiserslautern, Germany

Communication with the public

Communication Title Media Place Year
Media relations: radio, television Il Giardino di Albert Radiotelevisione svizzera Italian-speaking Switzerland 2014
Media relations: print media, online media Professor Rolf Ingold: An Integrated Approach International Innovation, Disseminating Science, Research, and Technology International 2014

Awards

Title Year
Best Student Paper Award @ DAS2016 2016
Best Student Forensic Paper Award @ IGS 2015 2015
The master’s thesis “Geometric Relations in Interest Point Based Writer Identification” by Marcel Würsch won the prize of the Joint Alumni Association in Computer Science (JAACS) and was nominated for the Informatikpreis 2014 of the University of Fribourg. 2015
At the International Summer School on Document Image Processing in Fourni, Greece, we won the best paper award and an IAPR grant covering part of the registration cost for a paper about our text line localization method. 2014

Associated projects

Number Title Start Funding scheme
125220 HisDoc: Historical Document Analysis, Recognition, and Retrieval 01.05.2009 Sinergia
169618 HisDoc III : Large-Scale Historical Document Classification 01.01.2017 Project funding (Div. I-III)

Abstract

In HisDoc 2.0 we will investigate the ingredients still missing for automatic large-scale analysis of historical documents, and how to make the results useful for historians. The project will build on the foundations laid in the HisDoc project and continue research on textual heritage preservation in a new direction. The previous project aimed at layout and textual content analysis of historical documents, that is, it focused on philological studies. HisDoc 2.0 will take the approach a step further: it will be dedicated to paleographical studies and will incorporate semantic domain knowledge, automatically extracted from existing document databases, into Document Image Analysis (DIA) methods in order to facilitate large-scale processing. In HisDoc 2.0, we formulate several novel research perspectives, propose new contributions to document image analysis, and move research one step further towards holistic systems rather than solving sub-tasks in restricted environments. This integrated approach opens the way to self-improving DIA methods and beyond.