Brief Description & Project Goals

Coordinated Funding Initiative for the Further Development of Methods for Optical Character Recognition (OCR)

Short name: OCR-D

Project Partners

[Duke August Library Wolfenbüttel](http://www.hab.de/), Berlin-Brandenburg Academy of Sciences and Humanities, State Library of Berlin Prussian Cultural Heritage, Karlsruhe Institute of Technology
(For project managers and contact persons, see Contact.)

Project Duration

2015–2020

Funded by

German Research Foundation (DFG), Scientific Literature and Information Systems (LIS)

Project Progress

The “Coordinated Funding Initiative for the Further Development of Optical Character Recognition Methods” (OCR-D) began its first project phase in the third quarter of 2015. Requirements for the further development of automatic text recognition were collected and analyzed in six work packages. This work ultimately led to the DFG’s call for proposals “Scalable methods of text and structure recognition for the full text digitisation of historical prints” in March 2017. The DFG’s approval of eight module projects at the end of December 2017 marked the end of the first project phase and the beginning of the second, in which the module projects are coordinated and supported, and their results are tested and integrated. So that the coordination project can fulfill its tasks over the entire duration of all module projects, the DFG approved an extension of the project until July 2020.

Goals

A main goal of OCR-D is the conceptual preparation for transforming VD prints (16th–19th century) into machine-readable form, together with the provision of the necessary tools.

To achieve this, the coordination project and the module projects pursue the following objectives:

- the creation of reference corpora for training and testing
- the development of standards in the areas of metadata, documentation and ground truth
- the further development of individual processing steps, with a particular focus on Optical Layout Recognition (OLR)
- the analysis of existing tools and their further development
- the creation of an OCR-D framework
- the establishment of quality assurance procedures

At the end of the overall project, a software package for the OCR processing of digital copies of the printed German cultural heritage of the 16th to 19th centuries will be made available. Furthermore, an accompanying concept will address technical, information science and organisational questions regarding the possible mass processing of these data.