Transcription Guidelines for Ground Truth
OCR-D: DFG-funded Initiative for Optical Character Recognition Development


Page-number

Page numbers are treated as a distinct text region and are marked as page numbers no matter where they are placed on the page. Although a page number is, strictly speaking, a "dead" column header (that is, a column header consisting only of the page number), in the context of Ground Truth transcription it is always treated as a page number, separately from any other column headers.
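
As an illustration of this rule, the following minimal sketch shows how a page number and a column header might sit side by side as separate text regions in PAGE XML, and how the page-number regions could be picked out by their `type` attribute. The sample document, the region ids, the coordinates, the image file name and the helper `page_number_regions` are hypothetical, and the 2019-07-15 schema namespace is an assumption about the PAGE version in use; only the `type` values `page-number` and `header` come from `pc:TextTypeSimpleType`.

```python
import xml.etree.ElementTree as ET

# Assumed PAGE XML namespace (2019-07-15 schema version).
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

# Hypothetical page: the column header and the page number share the top
# margin, but each one is its own TextRegion with its own type.
SAMPLE = f"""<PcGts xmlns="{PAGE_NS}">
  <Page imageFilename="example_0001.tif" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r_header" type="header">
      <Coords points="100,50 700,50 700,100 100,100"/>
      <TextEquiv><Unicode>Chapter title</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r_pagenum" type="page-number">
      <Coords points="800,50 900,50 900,100 800,100"/>
      <TextEquiv><Unicode>17</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>
"""


def page_number_regions(page_xml: str) -> list[tuple[str, str]]:
    """Return (region id, text) for every TextRegion typed as page-number."""
    ns = {"pc": PAGE_NS}
    root = ET.fromstring(page_xml)
    found = []
    # Select by type only: the position of the region on the page is irrelevant.
    for region in root.iterfind(".//pc:TextRegion[@type='page-number']", ns):
        text = region.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=ns)
        found.append((region.get("id"), text))
    return found


if __name__ == "__main__":
    print(page_number_regions(SAMPLE))
```

Selecting on the `type` attribute rather than on coordinates mirrors the guideline: the region is a page number wherever it appears, and the neighbouring `header` region is left out of the result.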

Parent topic: TextRegion
Related information
  • Column header (header)
  • Complex Type pc:PageType
  • Simple Type pc:TextTypeSimpleType
Published by OCR-D.

The guidelines for Ground Truth transcription are based on the OCR-D specs v3.4.0
