Call for OCR-D module project proposals

Mar 6, 2017

The call for module projects within the framework of OCR-D is now available online on the website of the German Research Foundation (DFG) (link to the call).

The aim of the OCR-D coordination project, which was launched in autumn 2015, is to describe procedures and develop guidelines in order to achieve an optimal workflow and the greatest possible standardisation of OCR-related processes and metadata. In addition, the project is to prepare conceptually the complete transformation of the written German cultural heritage into a machine-readable form (structured full text). The primary focus is on works from the Union Catalogue of Books Printed in German Speaking Countries in the 16th-18th century (VD), as well as books published in the 19th century in the German-speaking area. The VD projects comprise about 1 million titles that are currently being digitized and are to be processed by means of OCR in the future.

In the first project phase of OCR-D, development needs for automatic text recognition processes were identified. Based on this, the DFG is now issuing calls for proposals for six module project topics, which will be managed in terms of content and technology by the OCR-D coordination project. The following topics are announced:

Image presorting
Layout recognition
Text optimization
Model training
Long-term archiving and persistence
Quality assurance

To give an impression of the material to be processed, we provide ground-truth data (link to the data).