The corpus contains only structural data. On the digital copy, individual regions are labelled according to the PAGE scheme.
The OCR-D structure Ground Truth corpus contains publications from the period 1500 to 1900. On the digital copy, individual regions are labelled according to the PAGE scheme. In addition, individual pages are categorized according to their content.