OCR-D Quick Start Guide

Open your Ubuntu Terminal

On Ubuntu, open your Terminal.
On Windows, install WSL, Ubuntu, and Docker Desktop by following these steps:

  1. Install WSL 2 by opening PowerShell and running:
    wsl --install
    
  2. Download and install Ubuntu 22.04.2 LTS from the Microsoft Store.

  3. Open Ubuntu 22.04.2 LTS and follow the instructions.

  4. Install Docker Desktop and set it up for WSL 2.

  5. Make sure Docker Desktop is running.

Install and set up Docker

In the Ubuntu shell, run:

docker ps

If the command is not found (or the daemon is not reachable), install and start Docker first.
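The check above can also be scripted. A minimal sketch, where have_command is an illustrative helper and not part of Docker or OCR-D:

```shell
# Minimal sketch: test for the Docker CLI before continuing.
# have_command is an illustrative helper, not part of Docker or OCR-D.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command docker; then
    echo "Docker CLI found"
else
    echo "Docker CLI not found -- install Docker first" >&2
fi
```

Note that a present CLI does not guarantee a running daemon; docker ps checks both.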

Further Prerequisites

  1. Set up Docker:
    sudo systemctl enable docker
    sudo usermod -aG docker $USER
    (log out and back in so the group change takes effect)
    
  2. Install OCR-D via Docker:
    docker pull ocrd/all:maximum
    
  3. Download example data from GitHub:
    mkdir ocr-d
    wget https://github.com/OCR-D/gt_structure_text/releases/download/v1.5.0/euler_rechenkunst01_1738.ocrd.zip
    unzip euler_rechenkunst01_1738.ocrd.zip -d ocr-d/euler_rechenkunst01_1738
    
  4. Start an interactive shell in Docker:
    docker run --volume $PWD/ocr-d:/data --volume ocrd-resources:/models -it ocrd/all:maximum bash
    
  5. Download some models:
    ocrd resmgr download ocrd-tesserocr-recognize '*'
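
The docker run line in step 4 is long; a small wrapper keeps the mount points in one place. This is a sketch, and ocrd_run_cmd is an illustrative helper, not an OCR-D tool; it only prints the command so you can inspect it before running.

```shell
# Sketch: wrap the step-4 invocation so the host data directory is a
# parameter. ocrd_run_cmd only prints the command; it does not run it.
ocrd_run_cmd() {
    data_dir="$1"  # host directory that will appear as /data in the container
    printf 'docker run --volume %s:/data --volume ocrd-resources:/models -it ocrd/all:maximum bash' "$data_dir"
}

ocrd_run_cmd "$PWD/ocr-d"
```

To actually start the container, run the printed command (or eval "$(ocrd_run_cmd "$PWD/ocr-d")").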
    

First minimal workflow with OCR-D

ocrd-tesserocr-recognize -w euler_rechenkunst01_1738 -I OCR-D-IMG -O OCR-D-OCR-TESS -P segmentation_level region -P find_tables true -P model frak2021

Congratulations! You ran your first (minimal) OCR-D workflow.

You will find the results in the directory /data/euler_rechenkunst01_1738 (in the container) or ocr-d/euler_rechenkunst01_1738 (on the host side).

Consult the Setup Guide for more details and other installation methods or jump into the User Guide to learn more about OCR‑D.

Next, we explain the ocrd-tesserocr-recognize command used above.

Explanation

The command that called the recognition processor consists of the following parts:

ocrd-tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR-TESS -P segmentation_level region -P find_tables true -P model frak2021
╰─────── 1 ────────────╯ ╰─── 2 ────╯ ╰───── 3 ───────╯ ╰────────── 4 ─────────────╯ ╰────── 5 ────────╯ ╰───── 6 ───────╯
  1. ocrd-tesserocr-recognize is the name of the processor executable used.
  2. -I is followed by the name of the input file group (and directory); here: images.
  3. -O is followed by the name of the output file group (and directory); here: binarised images and PAGE-XML files with the recognised text.
  4. -P segmentation_level region is a parameter name/value pair; here: tells the processor to start segmentation on the level of regions (so no prior layout analysis annotating text lines in PAGE-XML is required).
  5. -P find_tables true … here: enables layout detection of tables.
  6. -P model frak2021 … here: use the named resource frak2021.traineddata for recognition.
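
Parameter lists like the one in parts 4-6 can be assembled programmatically. A sketch, where build_params is an illustrative helper and not part of the OCR-D CLI:

```shell
# Sketch: turn key/value pairs into the -P flags a processor expects.
# build_params is an illustrative helper, not part of OCR-D.
build_params() {
    out=""
    while [ "$#" -ge 2 ]; do
        out="$out -P $1 $2"
        shift 2
    done
    printf '%s' "${out# }"   # drop the leading space
}

# prints: -P segmentation_level region -P find_tables true -P model frak2021
build_params segmentation_level region find_tables true model frak2021
```

It could then be used as, e.g., ocrd-tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR-TESS $(build_params segmentation_level region find_tables true model frak2021).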