Decisions in OCR-D

In a software project, especially a highly distributed one like OCR-D, decisions need to be made about which technology to use, how interfaces interoperate, and how the software as a whole is designed.

In this document, such decisions on key aspects of OCR-D are discussed for the benefit of all OCR-D stakeholders.

Terminology

General decisions

Workflow format

Web API

QUIVER

Benchmarking

OCR-D/core

METS server

The current approach to file management requires every processor to access a single METS file on disk, which turns file management into a bottleneck for workflows.

To alleviate this, in Q4 2022 we will develop an HTTP server that provides asynchronous and parallel access to the METS.
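The idea can be illustrated with a minimal sketch, assuming a single process that keeps the METS state in memory and serializes concurrent updates with a lock. Everything here is hypothetical and uses only the Python standard library: the class `MetsState`, the helpers `start_server`, `add_file` and `list_files`, the JSON payload and the single `/` endpoint are illustrative assumptions, not the interface of the planned OCR-D METS server.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MetsState:
    """Hypothetical in-memory stand-in for one parsed METS file."""

    def __init__(self):
        self._lock = threading.Lock()
        self._files = []

    def add_file(self, entry):
        # The lock serializes writers, so clients never race on the document.
        with self._lock:
            self._files.append(entry)

    def list_files(self):
        with self._lock:
            return list(self._files)


def make_handler(state):
    """Build a request handler class bound to one MetsState instance."""

    class Handler(BaseHTTPRequestHandler):
        def _send_json(self, payload, status=200):
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            # Return all file entries known to the server.
            self._send_json(state.list_files())

        def do_POST(self):
            # Register one new file entry sent as a JSON object.
            length = int(self.headers.get("Content-Length", 0))
            entry = json.loads(self.rfile.read(length))
            state.add_file(entry)
            self._send_json(entry, status=201)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return Handler


def start_server(state, host="127.0.0.1", port=0):
    """Serve the state in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def add_file(host, port, entry):
    """Client side: what a processor would call instead of writing to disk."""
    conn = HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", "/", body=json.dumps(entry),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


def list_files(host, port):
    conn = HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", "/")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

In this sketch, several processors can call `add_file` at the same time; only the server touches the shared state, so the file on disk would need to be parsed and written once per workflow rather than once per processor call.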

Decentralized resource list

We currently maintain a list of processor resources centrally in OCR-D/core.

In Q3 2022, to allow processor developers to maintain their own separate lists of resources, we implemented mechanisms to store resource lists in a processor’s ocrd-tool.json and to bundle resources in the processor’s own module directory.

By Q1 2023, all processors should have been updated and the central list reduced until it is mostly empty.
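How a lookup might combine the two sources during the transition can be sketched as follows. This is a minimal illustration under stated assumptions: the JSON excerpt, the tool names, the URLs and the field names (`resources`, `name`, `url`, `description`) are made up for the example and should be checked against the actual ocrd-tool.json schema, and the rule that a processor's own list takes precedence over the central one is likewise an assumption.

```python
import json

# Hypothetical ocrd-tool.json excerpt; tool name, URL and field names are
# assumptions for illustration only.
OCRD_TOOL_JSON = """
{
  "tools": {
    "ocrd-example-recognize": {
      "executable": "ocrd-example-recognize",
      "resources": [
        {
          "name": "example-model.bin",
          "url": "https://example.org/models/example-model.bin",
          "description": "Hypothetical recognition model"
        }
      ]
    }
  }
}
"""

# Hypothetical remainder of the central list for a not-yet-updated processor.
CENTRAL_RESOURCES = {
    "ocrd-legacy-binarize": [
        {"name": "legacy-model.bin",
         "url": "https://example.org/models/legacy-model.bin"}
    ]
}


def resources_from_tool_json(ocrd_tool):
    """Collect the resource lists that processors declare themselves."""
    found = {}
    for executable, tool in ocrd_tool.get("tools", {}).items():
        resources = tool.get("resources", [])
        if resources:
            found[executable] = list(resources)
    return found


def list_resources(executable, decentralized, central):
    """Prefer the processor's own list; fall back to the central list."""
    if executable in decentralized:
        return decentralized[executable]
    return central.get(executable, [])


decentralized = resources_from_tool_json(json.loads(OCRD_TOOL_JSON))
```

As more processors declare their own resources, fewer lookups reach the fallback branch, which is what allows the central list to shrink.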

Page-wise processing

Currently, each processor iterates through the files of a workspace itself, looping over all the files in its input file group(s).

In Q1 2023 we will refactor the processor API, deprecate the current approach of processors iterating in a process method and enable processors to process individual pages in a process_page method.
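The shift in responsibility can be sketched in a few lines. This is an illustration only, not the OCR-D/core API: `InputFile`, `group_by_page`, `PagewiseProcessor`, `process_workspace` and `CountingProcessor` are hypothetical names, and only the method name `process_page` is taken from the text above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InputFile:
    """Hypothetical stand-in for one file entry of a workspace."""
    page_id: str
    file_grp: str
    path: str


def group_by_page(files, file_grp):
    """Group the files of one input file group by their page ID."""
    pages = defaultdict(list)
    for f in files:
        if f.file_grp == file_grp:
            pages[f.page_id].append(f)
    return dict(pages)


class PagewiseProcessor:
    """Hypothetical base class: the framework owns the loop over pages."""

    def __init__(self, input_file_grp):
        self.input_file_grp = input_file_grp

    def process_workspace(self, files):
        # Iteration lives here once, instead of in every processor.
        return {
            page_id: self.process_page(page_id, page_files)
            for page_id, page_files
            in group_by_page(files, self.input_file_grp).items()
        }

    def process_page(self, page_id, page_files):
        """Processors implement only this per-page step."""
        raise NotImplementedError


class CountingProcessor(PagewiseProcessor):
    """Toy processor that reports how many input files a page has."""

    def process_page(self, page_id, page_files):
        return len(page_files)
```

Because the loop sits in the base class in this sketch, a processor only has to describe what happens to one page, and decisions such as ordering, skipping or distributing pages could be made in one place.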

ocrd_all Docker deployment

Supported Python versions

Base OS image

Software libraries

calamari

pillow

tensorflow

torch

bash