OCR-D Quick Start Guide
Open your Ubuntu Terminal
On Ubuntu, open your Terminal.
On Windows, install WSL, Ubuntu and Docker Desktop by following these steps:
- Install WSL 2 by opening PowerShell and running:
wsl --install
- Download and install Ubuntu 22.04.2 LTS from the Microsoft Store.
- Open Ubuntu 22.04.2 LTS and follow the instructions.
- Make sure Docker Desktop is running.
Install and set up Docker
In the Ubuntu shell, run:
docker ps
If the command is not found, you may need to install Docker first.
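Whether Docker is installed at all can also be tested before running docker ps. The following is a minimal sketch; the helper name have_cmd is hypothetical and not part of Docker or OCR-D, and the check only looks for the executable on the PATH, not for a running Docker daemon:

```shell
# Return success if the named command can be found on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
  echo "docker found: $(command -v docker)"
else
  echo "docker not found; install Docker (or start Docker Desktop on Windows) first"
fi
```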
Further Requisites
- Set up Docker:
sudo systemctl enable docker
sudo usermod -aG docker $USER
- Install OCR-D via Docker
docker pull ocrd/all:maximum
- Download example data from GitHub
mkdir ocr-d
wget https://github.com/OCR-D/gt_structure_text/releases/download/v1.5.0/euler_rechenkunst01_1738.ocrd.zip
unzip euler_rechenkunst01_1738.ocrd.zip -d ocr-d/euler_rechenkunst01_1738
- Start interactive shell in Docker
docker run --volume "$PWD/ocr-d":/data --volume ocrd-resources:/models -it ocrd/all:maximum bash
- Download some models:
ocrd resmgr download ocrd-tesserocr-recognize '*'
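The host-side steps above can be sanity-checked before the container is started. The sketch below is meant to be run on the host, not inside the container. The helper names in_group and find_workspaces are hypothetical, and the check assumes that the extracted example data contains a mets.xml file, which is what marks a directory as an OCR-D workspace:

```shell
# Return success if the current user belongs to the named group.
in_group() {
  id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

# Print every directory below the given directory that contains a mets.xml file.
find_workspaces() {
  find "$1" -type f -name mets.xml -exec dirname {} \;
}

# Group changes made with usermod only apply to new login sessions.
in_group docker || echo "not in the docker group yet; log out and back in after usermod"

# Show where the example workspace ended up (prints nothing if ocr-d is missing).
find_workspaces ocr-d 2>/dev/null || true
```

On Windows with Docker Desktop, the group check may not apply, since Docker Desktop manages access to Docker itself.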
First minimal workflow with OCR-D
ocrd-tesserocr-recognize -w euler_rechenkunst01_1738 -I OCR-D-IMG -O OCR-D-OCR-TESS -P segmentation_level region -P find_tables true -P model frak2021
Congratulations! You ran your first (minimal) OCR-D workflow.
You will find the results in the directory /data/euler_rechenkunst01_1738 (in the container, because the host directory ocr-d is mounted at /data) or ocr-d/euler_rechenkunst01_1738 (on the host side).
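To look at what the processor wrote, the output file group directory can be listed from the host. This is a sketch under assumptions: the helper name list_results is hypothetical, and it assumes that the output file group directory (here OCR-D-OCR-TESS) sits directly inside the workspace directory named above:

```shell
# List the files a processor wrote into an output file group directory.
# Usage: list_results WORKSPACE_DIR FILE_GROUP
list_results() {
  dir="$1/$2"
  if [ ! -d "$dir" ]; then
    echo "no such output directory: $dir" >&2
    return 1
  fi
  # Print one file name per line.
  ls -1 "$dir"
}

# Host-side path from this guide; reports an error and carries on
# if the directory does not exist.
list_results ocr-d/euler_rechenkunst01_1738 OCR-D-OCR-TESS || true
```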
Consult the Setup Guide for more details and other installation methods or jump into the User Guide to learn more about OCR‑D.
Next, we explain the ocrd-tesserocr-recognize command used above.
Explanation
The command that called the recognition processor consists of the following parts (the -w option, which names the workspace directory, is left out here):
ocrd-tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR-TESS -P segmentation_level region -P find_tables true -P model frak2021
╰─────── 1 ────────────╯ ╰─── 2 ────╯ ╰───── 3 ───────╯ ╰────────── 4 ─────────────╯ ╰────── 5 ────────╯ ╰───── 6 ───────╯
1. ocrd-tesserocr-recognize is the name of the processor executable used.
2. -I is followed by the name of the input file group (and directory); here: images.
3. -O is followed by the name of the output file group (and directory); here: binarised images and PAGE-XML files with the recognised text.
4. -P segmentation_level region is a parameter name/value pair; here: tells the processor to start segmentation on the level of regions (so no prior layout analysis annotating text lines in PAGE-XML is required).
5. -P find_tables true is likewise a parameter name/value pair; here: enables layout detection of tables.
6. -P model frak2021 is likewise a parameter name/value pair; here: use the named resource frak2021.traineddata for recognition.
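The structure described above (executable, input file group, output file group, then any number of parameter name/value pairs) can be illustrated with a small shell function that assembles such a call and prints it without running it. This is only a sketch; build_call is a hypothetical helper and not part of OCR-D:

```shell
# Print (not run) a processor call assembled from its parts.
# Usage: build_call PROCESSOR INPUT_GROUP OUTPUT_GROUP [NAME VALUE]...
build_call() {
  call="$1 -I $2 -O $3"
  shift 3
  # The remaining arguments are parameter name/value pairs, two at a time.
  while [ "$#" -ge 2 ]; do
    call="$call -P $1 $2"
    shift 2
  done
  printf '%s\n' "$call"
}

build_call ocrd-tesserocr-recognize OCR-D-IMG OCR-D-OCR-TESS \
  segmentation_level region find_tables true model frak2021
```

With the arguments shown, the printed line matches the command explained above. Swapping in other file group names or parameter pairs shows how the same pattern extends to other processor calls.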