OCR-D Quick Start Guide

Open your Ubuntu Terminal

On Ubuntu, open your Terminal.
On Windows, install WSL, Ubuntu and Docker Desktop by following these steps:

Install WSL 2 by opening PowerShell and running:
```
wsl --install
```
Download and install Ubuntu 22.04.2 LTS from the Microsoft Store.
Open Ubuntu 22.04.2 LTS and follow the instructions.
Install Docker Desktop and set it up for WSL 2.
Make sure Docker Desktop is running.

Install and set up Docker

In the Ubuntu shell, run:

docker ps

If the command is not found, you may need to install Docker first.
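
A failing docker ps can mean either that the client is missing or that the daemon is not reachable. The following is a minimal sketch that tells these cases apart; the variable name docker_status is chosen for this example only and is not part of Docker or OCR-D.

```shell
# Classify the local Docker setup into one of three states.
# "docker_status" is a name invented for this sketch.
if ! command -v docker >/dev/null 2>&1; then
    docker_status=missing        # the docker client is not installed
elif ! docker info >/dev/null 2>&1; then
    docker_status=unreachable    # client found, but the daemon did not answer
                                 # (not running, or no permission to access it)
else
    docker_status=ok
fi
echo "Docker status: $docker_status"
```

If the status is unreachable on Windows, checking that Docker Desktop is running is a reasonable first step.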

Further Requisites

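Set up Docker to start at boot and add your user to the docker group (log out and back in afterwards so that the group change takes effect):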

sudo systemctl enable docker
sudo usermod -aG docker $USER
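
Group changes made with usermod usually apply only to new login sessions. As a rough check, the sketch below compares the group database with the groups of the current session; the helper has_docker and the variables configured and active are names invented for this example.

```shell
# Compare the configured groups (group database) with the groups of the
# current session. All names below are invented for this sketch.
user=$(id -un)

# Succeeds if "docker" appears in a space-separated group list on stdin.
has_docker() { tr ' ' '\n' | grep -qx docker; }

if id -nG "$user" | has_docker; then configured=yes; else configured=no; fi
if id -nG | has_docker; then active=yes; else active=no; fi

echo "docker group configured for $user: $configured"
echo "docker group active in this session: $active"
```

If the group is configured but not yet active, logging out and back in should be enough.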

Install OCR-D via Docker
```
docker pull ocrd/all:maximum
```

Download example data from GitHub

mkdir ocr-d
wget https://github.com/OCR-D/gt_structure_text/releases/download/v1.5.0/euler_rechenkunst01_1738.ocrd.zip
unzip euler_rechenkunst01_1738.ocrd.zip -d ocr-d/euler_rechenkunst01_1738
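
An OCR-D workspace is described by a METS file, conventionally named mets.xml. To confirm that the archive was unpacked as expected, the sketch below searches for that file without assuming any particular layout inside the archive; find_workspaces is a helper name invented for this example.

```shell
# Print every directory below "$1" that contains a mets.xml file.
# "find_workspaces" is a helper name invented for this sketch.
find_workspaces() {
    find "$1" -type f -name mets.xml -exec dirname {} \;
}

# Only search if the download directory from the steps above exists.
if [ -d ocr-d ]; then
    find_workspaces ocr-d
fi
```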

Start interactive shell in Docker

docker run --volume "$PWD/ocr-d":/data --volume ocrd-resources:/models -it ocrd/all:maximum bash
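
For one-off commands, an interactive shell is not strictly needed. The sketch below wraps the same image and volume mounts in a shell function and adds --rm so the container is removed after it exits. The function name ocrd_docker and the DOCKER override are assumptions made for this example; setting DOCKER=echo prints the command line instead of running it.

```shell
# Run a single command in a throwaway OCR-D container.
# "ocrd_docker" and the DOCKER override are names invented for this sketch.
ocrd_docker() {
    "${DOCKER:-docker}" run --rm \
        --volume "$PWD/ocr-d":/data \
        --volume ocrd-resources:/models \
        ocrd/all:maximum "$@"
}

# Dry run: print what would be executed for "ocrd --version".
DOCKER=echo ocrd_docker ocrd --version
```

Without the DOCKER=echo prefix, the same call would start the container, run ocrd --version in it and exit.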

Download some models:

ocrd resmgr download ocrd-tesserocr-recognize '*'

First minimal workflow with OCR-D

ocrd-tesserocr-recognize -w euler_rechenkunst01_1738 -I OCR-D-IMG -O OCR-D-OCR-TESS -P segmentation_level region -P find_tables true -P model frak2021

Congratulations! You ran your first (minimal) OCR-D Workflow.

You will find the results in the directory /data/euler_rechenkunst01_1738 (in the container, where ocr-d is mounted at /data) or ocr-d/euler_rechenkunst01_1738 (on the host side).
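
As a quick sanity check on the host side, the sketch below counts the XML files in the output file group directory. It assumes that the processor writes its PAGE-XML results with an .xml extension into a directory named after the output file group; count_page_files is a helper name invented for this example.

```shell
# Count the XML files in one file group directory of a workspace.
# "count_page_files" is a helper name invented for this sketch.
# $1: workspace directory, $2: file group name
count_page_files() {
    find "$1/$2" -type f -name '*.xml' | wc -l | tr -d ' '
}

workspace=ocr-d/euler_rechenkunst01_1738   # host-side path from the text above
if [ -d "$workspace/OCR-D-OCR-TESS" ]; then
    echo "PAGE-XML files: $(count_page_files "$workspace" OCR-D-OCR-TESS)"
fi
```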

Consult the Setup Guide for more details and other installation methods or jump into the User Guide to learn more about OCR-D.

Next, we explain the ocrd-tesserocr-recognize command used above.

Explanation

The command that called the recognition processor consists of the following parts (the -w option, which selects the workspace directory, is left out here):

ocrd-tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR-TESS -P segmentation_level region -P find_tables true -P model frak2021
╰─────── 1 ────────────╯ ╰─── 2 ────╯ ╰───── 3 ───────╯ ╰────────── 4 ─────────────╯ ╰────── 5 ────────╯ ╰───── 6 ───────╯

1. ocrd-tesserocr-recognize is the name of the processor executable used.
2. -I is followed by the name of the input file group (and directory); here: images.
3. -O is followed by the name of the output file group (and directory); here: binarised images and PAGE-XML files with the recognised text.
4. -P segmentation_level region is a parameter name/value pair; here: tells the processor to start segmentation on the level of regions (so no prior layout analysis annotating text lines in PAGE-XML is required).
5. -P find_tables true … here: enables layout detection of tables.
6. -P model frak2021 … here: use the named resource frak2021.traineddata for recognition.
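
To make the role of each part more concrete, the sketch below assembles the same command from its parts, together with the -w workspace option from the full invocation, and takes the workspace and model as arguments. The function name tess_recognize and the DRY_RUN switch are assumptions made for this example; with DRY_RUN set, the command is printed rather than executed.

```shell
# Assemble the recognition command from the parts explained above.
# "tess_recognize" and DRY_RUN are names invented for this sketch.
# $1: workspace directory, $2: model name (defaults to frak2021)
tess_recognize() {
    workspace=$1
    model=${2:-frak2021}
    # With DRY_RUN set, "echo" is prepended and the command is only printed.
    ${DRY_RUN:+echo} ocrd-tesserocr-recognize \
        -w "$workspace" \
        -I OCR-D-IMG \
        -O OCR-D-OCR-TESS \
        -P segmentation_level region \
        -P find_tables true \
        -P model "$model"
}

# Dry run: print the command for the example workspace.
DRY_RUN=1 tess_recognize euler_rechenkunst01_1738
```

Inside the container, calling tess_recognize euler_rechenkunst01_1738 without DRY_RUN would run the processor as in the minimal workflow above.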