Lay summary
The project brings together the expertise and experience of three research groups in the fields of document image analysis, handwriting recognition, and information retrieval, respectively. Its objective is to develop tools that support cultural heritage preservation, in particular by making historical documents electronically accessible via the Internet.

Many historical documents are becoming available in the form of digital images. However, before they can be searched via the Internet, two major analysis steps are required: first, the extraction of meta-information, and second, the transcription of the document. The extraction of meta-information includes segmentation, which locates the individual text lines on a page. This step is a prerequisite for automatic transcription, whose goal is to extract a string of Unicode characters from the corresponding image. The tools being developed in this project aim to perform both steps automatically. Once a document has been transcribed, its content and meta-information can be made searchable via a web browser. However, because the transcription may contain misrecognized words, novel retrieval and browsing paradigms must be investigated.