Decisions in OCR-D

[2022] To run the benchmarks, we will create several corpora with different characteristics (font, creation date, layout, …) and run different workflows on each of them. The results are then displayed in the QUIVER workflow tab. For transparency, the corpora will be made publicly available.

[2022] Relevant metrics for the minimum viable product (MVP) will be:

CER (character error rate)
WER (word error rate)
Bag of words
Reading order
IoU (intersection over union)
CPU time
Wall time
I/O
Memory usage
Disk usage
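The text-based metrics in this list can be illustrated with a short Python sketch. It assumes plain-text ground truth and OCR output, computes CER and WER as Levenshtein distance divided by the length of the ground truth, uses one possible bag-of-words definition (the share of ground-truth words without a counterpart in the OCR output, ignoring order), and simplifies IoU to axis-aligned bounding boxes instead of the polygons used in PAGE-XML. The function names are hypothetical; this is not the implementation used by OCR-D or QUIVER, and reading order is not covered.

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on strings (characters) as well as on lists (e.g. words).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(gt, ocr):
    """Character error rate: edit distance divided by ground-truth length."""
    return levenshtein(gt, ocr) / max(len(gt), 1)


def wer(gt, ocr):
    """Word error rate: the same computation on whitespace-separated words."""
    gt_words = gt.split()
    return levenshtein(gt_words, ocr.split()) / max(len(gt_words), 1)


def bag_of_words_error(gt, ocr):
    """Share of ground-truth words (with multiplicity) missing from the OCR output.

    Word order is ignored; this is one possible definition, assumed here.
    """
    gt_bag, ocr_bag = Counter(gt.split()), Counter(ocr.split())
    total = sum(gt_bag.values())
    matched = sum((gt_bag & ocr_bag).values())
    return 1 - matched / total if total else 0.0


def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```

For example, `cer("Fraktur", "Fraktnr")` is 1/7 (one substitution in seven characters), `bag_of_words_error("a b c", "c b a")` is 0 because order is ignored, and two 10×10 boxes overlapping by half give an IoU of 1/3.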

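The resource metrics can be collected for each workflow run. The following sketch assumes a Unix system (Python's `resource` module is not available on Windows) and a workflow that is started as a single command; it records wall time, CPU time, peak memory and block I/O of the child process, and sums the file sizes in an output directory as disk usage. The helper names `measure` and `disk_usage` are hypothetical.

```python
import os
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd as a child process and report its resource usage (Unix only)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_time_s": wall,
        # user + system CPU time consumed by the child
        "cpu_time_s": (after.ru_utime - before.ru_utime)
                      + (after.ru_stime - before.ru_stime),
        # peak RSS of the largest child waited for so far (not a delta);
        # kilobytes on Linux, bytes on macOS
        "max_rss": after.ru_maxrss,
        # number of block input/output operations
        "blocks_in": after.ru_inblock - before.ru_inblock,
        "blocks_out": after.ru_oublock - before.ru_oublock,
    }


def disk_usage(path):
    """Total size in bytes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


if __name__ == "__main__":
    # stand-in for a real workflow command
    print(measure([sys.executable, "-c", "sum(range(10**6))"]))
```

Because `RUSAGE_CHILDREN` accumulates over all finished children, the sketch takes differences before and after the run; only the peak memory value cannot be split per run this way.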
[2022] The benchmarks will be executed automatically at regular intervals to measure whether changes to the processors improve the results. This might be done via CI (for example GitHub Actions) or as a cron job on a separate server.
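To decide whether a change in a processor improved the results, each scheduled run can be compared with the previous one. The sketch below assumes that every run produces a flat dictionary of metric values under hypothetical names such as `cer` or `iou`, treats lower values as better except for IoU, and ignores differences up to a configurable tolerance. A CI job or cron script could then exit with a non-zero status whenever a metric regressed.

```python
# Hypothetical metric names; for all others, lower values are better.
HIGHER_IS_BETTER = {"iou"}


def compare(previous, current, tolerance=0.0):
    """Classify each metric present in both runs as improved, regressed or unchanged."""
    verdict = {}
    for name in previous.keys() & current.keys():
        delta = current[name] - previous[name]
        if name in HIGHER_IS_BETTER:
            delta = -delta  # flip so that a negative delta always means "better"
        if delta < -tolerance:
            verdict[name] = "improved"
        elif delta > tolerance:
            verdict[name] = "regressed"
        else:
            verdict[name] = "unchanged"
    return verdict


def exit_code(verdict):
    """Non-zero if any metric regressed, so a CI job or cron script can flag it."""
    return 1 if "regressed" in verdict.values() else 0
```

For example, comparing `{"cer": 0.05, "wer": 0.12, "iou": 0.80}` with `{"cer": 0.04, "wer": 0.12, "iou": 0.75}` reports CER as improved, WER as unchanged and IoU as regressed, so `exit_code` returns 1.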
