Transcription Guidelines for Ground Truth
OCR-D: DFG-funded Initiative for Optical Character Recognition Development
How to Transcribe in Level 2
1. If the text to be transcribed can be recorded with Unicode characters, these must be used.
2. If the character can only be formed by combining two characters, this combination must be used.
3. Apart from the vowel ligatures (for example æ and œ), all ligatures are split. Typographical peculiarities, including all non-vowel ligatures, are to be documented as formatting details.
4. If the character cannot be formed by combining several characters and a MUFI equivalent exists, use MUFI.
5. If options 1, 2 and 4 are not possible, a code definition shall be used in consultation with the OCR-D coordination project, following the joint agreements reached in major international projects such as IMPACT, EEBO and ECCO.
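
The Unicode, combination and ligature rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the guidelines: the ligature table, the function names and the use of NFC normalization are hypothetical choices made for the example. An explicit table is used instead of NFKC normalization because NFKC would also replace the long s (ſ) with a plain s. The private-use check is a rough hint only: many MUFI characters are encoded in the Unicode Private Use Area, but the check cannot tell whether a given code point is actually defined by MUFI.

```python
import unicodedata

# Hypothetical table of non-vowel ligatures and their split forms.
# The guidelines do not prescribe this exact list.
NON_VOWEL_LIGATURES = {
    "\uFB00": "ff",
    "\uFB01": "fi",
    "\uFB02": "fl",
    "\uFB03": "ffi",
    "\uFB04": "ffl",
    "\uFB05": "\u017Ft",  # long s + t, keeping the long s
    "\uFB06": "st",
}


def split_ligatures(text):
    """Split non-vowel ligatures; vowel ligatures such as æ and œ are kept."""
    return "".join(NON_VOWEL_LIGATURES.get(ch, ch) for ch in text)


def transcribe(text):
    """Split ligatures, then apply NFC (an assumption, not a stated rule).

    NFC uses a precomposed character where Unicode has one and otherwise
    leaves the base character plus combining mark in place.
    """
    return unicodedata.normalize("NFC", split_ligatures(text))


def is_private_use(ch):
    """Return True if ch lies in a Unicode private-use area (category Co)."""
    return unicodedata.category(ch) == "Co"
```

For example, `transcribe("a\u0308")` yields the single character ä, while `transcribe("a\u0364")` keeps the two-character combination because Unicode has no precomposed form for it. Recording the split ligature as a formatting detail would happen in the annotation format and is not shown here.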