OCR-D Phase III started

Lena Hinrichsen Aug 6, 2021 🏷 Phase 3

On 30 July, our kick-off workshop took place, heralding phase III of OCR-D.

The day before, the project participants met internally to get to know each other and coordinate their work. On the public workshop day, the Coordination Project team gave an introduction to the objectives of phase III and the public communication channels of OCR-D, the current status and plans for the OCR-D software, the Web API, and the handling of ground truth data in OCR-D. The Coordination Project also shared insights into best practices in software development from the previous phase of OCR-D, as well as ideas for how the community can contribute.

In addition, the implementation and module projects introduced themselves in short presentations to the interested community and our cooperation partners.

UB Braunschweig, SLUB Dresden and UB Mannheim are extending both OCR-D and Kitodo for productive mass digitisation; SUB Göttingen and GWDG are working on Performance Optimisation and Integration, deploying OCR-D on a high-performance cluster; GEI Braunschweig, and HCI and ZPD of the University of Würzburg, will implement OCR-D features in OCR4all, making OCR-D available via their software; and ULB Sachsen-Anhalt will implement OCR-D in its open-source mass digitisation infrastructure.

While these project partners work on four implementation scenarios, three module projects are improving OCR-D processors: UB Mannheim is enabling work-specific training with Tesseract and Calamari; JGU Mainz and FAU Erlangen-Nürnberg are improving font group recognition for better-fitting OCR models; and OLA-HD, by SUB Göttingen and GWDG, is optimising reliability, searchability and fine-grained referencing of the OLA-HD long-term archiving repository.

In our chat channel, the Gitter lobby, we keep you informed about public OCR-D events. Further information on how to stay in touch and contribute to OCR-D can be found in our overview of platforms.