Project

HisDoc: Historical Document Analysis, Recognition, and Retrieval

Applicant Ingold Rolf
Number 125220
Funding scheme Sinergia
Research institution Département d'Informatique Université de Fribourg
Institution of higher education University of Fribourg - FR
Main discipline Information Technology
Start/End 01.05.2009 - 30.06.2013
Approved amount 494'864.00

Keywords (3)

text recognition; information retrieval; document image analysis

Lay Summary (English)

The project brings together the expertise and experience of three research groups in the fields of document image analysis, handwriting recognition, and information retrieval, respectively. The objective is to develop tools to support cultural heritage preservation. In particular, the project aims to make historical documents electronically available for access via the Internet.

Many historical documents are becoming available in the form of digital images. However, to become searchable via the Internet, they require two major analysis steps: first, the extraction of meta-information, and second, the document transcription. The extraction of meta-information includes segmentation, which serves to locate individual text lines on a page. This step is a prerequisite for automatic document transcription, whose goal is to extract strings of Unicode characters from the corresponding image. The tools being developed in this project aim to perform these two steps automatically. Once a document has been transcribed, its content and meta-information can be made searchable via a browser. However, to take into account possibly misrecognized words in the transcription, novel retrieval and browsing paradigms must be investigated.
Last update: 21.02.2013

Publications

Publication
A novel word spotting method based on recurrent neural networks
Frinken V., Fischer A., Manmatha R., Bunke H. (2012), A novel word spotting method based on recurrent neural networks, in IEEE Trans. PAMI, 34(2), 211-224.
Binarization-free text line segmentation for historical documents based on interest point clustering
Garz A., Fischer A., Sablatnig R., Bunke H. (2012), Binarization-free text line segmentation for historical documents based on interest point clustering, in Proc. 10th Int. Workshop on Document Analysis Systems.
Etude comparative de l'efficacité du dépistage de l'information dans des manuscrits médiévaux
Naji N., Savoy J. (2012), Etude comparative de l'efficacité du dépistage de l'information dans des manuscrits médiévaux, in Proc. 11th Int. Conf. on Statistical Analysis of Textual Data.
Lexicon-free handwritten word spotting using character HMMs
Fischer A., Keller A., Frinken V., Bunke H. (2012), Lexicon-free handwritten word spotting using character HMMs, in Pattern Recognition Letters, 33(7), 934-942.
A keyword spotting approach using blurred shape model-based descriptors
Fornés A., Frinken V., Fischer A., Almazan J., Jackson G., Bunke H. (2011), A keyword spotting approach using blurred shape model-based descriptors, in Proc. 1st Int. Workshop on Historical Document Imaging and Processing.
Character prototype selection for handwriting recognition in historical documents with graph similarity features
Fischer A., Bunke H. (2011), Character prototype selection for handwriting recognition in historical documents with graph similarity features, in Proc. 19th European Signal Processing Conference.
Comparative Information Retrieval Evaluation for Scanned Documents
Naji N., Savoy J. (2011), Comparative Information Retrieval Evaluation for Scanned Documents, in Proc. 15th WSEAS Int. Conf. on Computers.
Co-training for handwritten word recognition
Frinken V., Fischer A., Bunke H., Fornés A. (2011), Co-training for handwritten word recognition, in Proc. 11th Int. Conf. on Document Analysis and Recognition.
HMM-based alignment of inaccurate transcriptions for historical documents
Fischer A., Indermühle E., Frinken V., Bunke H. (2011), HMM-based alignment of inaccurate transcriptions for historical documents, in Proc. 11th Int. Conf. on Document Analysis and Recognition.
Information Retrieval Strategies for Digitized Handwritten Medieval Documents
Naji N., Savoy J. (2011), Information Retrieval Strategies for Digitized Handwritten Medieval Documents, in Proc. 7th Asia Information Retrieval Societies Conference.
Recherche d'information dans un corpus bruité (OCR)
Naji N., Savoy J., Dolamic L. (2011), Recherche d'information dans un corpus bruité (OCR), in Proc. 8ème Conférence en Recherche d’Information et Applications.
Transcription alignment of Latin manuscripts using hidden Markov models
Fischer A., Frinken V., Fornés A., Bunke H. (2011), Transcription alignment of Latin manuscripts using hidden Markov models, in Proc. 1st Int. Workshop on Historical Document Imaging and Processing.
A Binarization-Free Clustering Approach to Segment Curved Text Lines in Historical Manuscripts
Garz A, Fischer A, Bunke H, Ingold R, A Binarization-Free Clustering Approach to Segment Curved Text Lines in Historical Manuscripts, in Int. Conf. on Document Analysis and Recognition, Washington.
Back to Our Roots for Retrieving Very Short Passages
Naji N, Savoy J, Back to Our Roots for Retrieving Very Short Passages, in Annual Meeting of the Association for Information Science and Technology.
Evaluation of SVM, MLP and GMM Classifiers for Layout Analysis of Historical Documents
Wei H, Baechler B, Slimane F, Ingold R, Evaluation of SVM, MLP and GMM Classifiers for Layout Analysis of Historical Documents, in Int. Conf. on Document Analysis and Recognition, Washington.
HisDoc: Historical Document Analysis, Recognition, and Retrieval
Fischer A., Bunke H., Baechler M., Naji N., Ingold R., Savoy J., HisDoc: Historical Document Analysis, Recognition, and Retrieval, in Proc. Digital Humanities.
Long-Short Term Memory Neural Networks Language Modeling for Handwriting Recognition
Frinken V., Zamora-Martinez F., España-Boquera S., Castro-Bleda M. J., Fischer A., Bunke H., Long-Short Term Memory Neural Networks Language Modeling for Handwriting Recognition, in Proc. 21st Int. Conf. on Pattern Recognition.
Multi Resolution Layout Analysis of Medieval Manuscripts Using Dynamic MLP
Baechler M., Ingold R., Multi Resolution Layout Analysis of Medieval Manuscripts Using Dynamic MLP, in Proc. 11th Int. Conf. on Document Analysis and Recognition.
Semi-Supervised Learning for Cursive Handwriting Recognition using Keyword Spotting
Frinken V., Baumgartner M., Fischer A., Bunke H., Semi-Supervised Learning for Cursive Handwriting Recognition using Keyword Spotting, in Proc. 13th Int. Conf. on Frontiers in Handwriting Recognition.
Text Line Extraction using DMLP Classifiers for Historical Manuscripts
Baechler M, Liwicki M, Ingold R, Text Line Extraction using DMLP Classifiers for Historical Manuscripts, in Int. Conf. on Document Analysis and Recognition, Washington.
The HisDoc Project: Automatic Analysis, Recognition, and Retrieval of Handwritten Historical Documents for Digital Libraries
Fischer A., Bunke H., Baechler M., Naji N., Ingold R., Savoy J., The HisDoc Project: Automatic Analysis, Recognition, and Retrieval of Handwritten Historical Documents for Digital Libraries, in Proc. InterNational and InterDisciplinary Aspects of Scholarly Editing.

Scientific events

Self-organised

Title Date Place
HisDoc Seminar III 22.06.2012 Fribourg

Associated projects

Number Title Start Funding scheme
150173 HisDoc 2.0 : Towards Computer-Assisted Paleography 01.01.2014 Project funding
169618 HisDoc III : Large-Scale Historical Document Classification 01.01.2017 Project funding
157931 Wolfram von Eschenbach, ›Parzival‹. Eine neue textkritische Ausgabe in elektronischer und gedruckter Form 01.12.2017 Editionen
152977 Die Fassungen von Wolframs ›Parzival‹ in Bezug zur Textgenese und zur französischen Vorlage. Eine Ausgabe in synoptischer Form 01.12.2014 Project funding (special)
134925 Die Fassung *m im Kontext der Fassungen von Wolframs ›Parzival‹. Eine Ausgabe in synoptischer Form (D-A-CH) 01.12.2011 Project funding

Abstract

This research proposal brings together the expertise and experience of three research groups in the fields of document image analysis, handwriting recognition, and information retrieval, respectively. The objective of the proposal is to develop the tools needed to build systems that aid cultural heritage preservation. In particular, we address the problem of making historical documents electronically available for access via the Internet.