ocrd_models.ocrd_page module

API to PAGE-XML, generated with generateDS from XML schema.

ocrd_models.ocrd_page.parse(inFileName, silence=False, print_warnings=True)[source]

Parse a file, create the object tree, and export it.

Parameters:
  • inFileName (str)

  • print_warnings (boolean)

Returns:

The root object in the tree.

ocrd_models.ocrd_page.parseEtree(inFileName, silence=False, print_warnings=True, mapping=None, reverse_mapping=None, nsmap=None)[source]

Parse a file, create the object tree, and export it. Return tree and mappings, too.

Parameters:
  • inFileName (str)

  • print_warnings (boolean)

Returns:

A tuple of
  • The root object in the tree.

  • The full node tree.

  • A mapping from object IDs to tree nodes.

  • A reverse mapping from tree nodes to object IDs.

ocrd_models.ocrd_page.parseString(inString, silence=False, print_warnings=True)[source]

Parse a string, create the object tree, and export it.

Parameters:
  • inString (str)

Returns:

The root object in the tree.
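
To give a rough idea of the kind of input parseString() takes, the sketch below builds a minimal PAGE-XML string and reads it with the standard library's xml.etree.ElementTree instead of ocrd_models, so it runs without the package installed. The sample document, the region ID, the coordinates and the helper region_points() are all invented for illustration; only the namespace URI is taken from the export() signatures in this module.

```python
import xml.etree.ElementTree as ET

# Namespace as it appears in the export() signatures of this module.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

# Invented sample document (illustration only).
SAMPLE = f"""<?xml version="1.0" encoding="UTF-8"?>
<pc:PcGts xmlns:pc="{PAGE_NS}">
  <pc:Page imageFilename="page.tif" imageWidth="100" imageHeight="200">
    <pc:TextRegion id="r1">
      <pc:Coords points="0,0 50,0 50,20 0,20"/>
    </pc:TextRegion>
  </pc:Page>
</pc:PcGts>"""


def region_points(xml_text):
    """Hypothetical helper: map each TextRegion id to its points string."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ns = {"pc": PAGE_NS}
    result = {}
    for region in root.iterfind(".//pc:TextRegion", ns):
        coords = region.find("pc:Coords", ns)
        result[region.get("id")] = (
            coords.get("points") if coords is not None else None
        )
    return result


print(region_points(SAMPLE))
```

With ocrd_models installed, the same string would presumably be passed to parseString() to obtain the generated object tree rather than a plain element tree.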

class ocrd_models.ocrd_page.OcrdPage(pcgts: PcGtsType, etree: _Element, mapping: Dict[str, _Element], revmap: Dict[_Element, Any])[source]

Bases: object

Proxy object for ocrd_models.PcGtsType (i.e. PRImA PAGE-XML for page content, rendered as an object model by generateDS) that also offers access to the underlying etree, the element-node mapping and the reverse mapping (cf. ocrd_models.ocrd_page.parseEtree()).
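
The proxy idea can be sketched generically as follows. This is not the OcrdPage source: the class names ProxySketch and FakePcGts are hypothetical, and delegating through __getattr__ is only one plausible way to build such a proxy.

```python
class ProxySketch:
    """Hypothetical stand-in: wraps an object and carries extra data with it."""

    def __init__(self, wrapped, etree, mapping, revmap):
        self._wrapped = wrapped
        self.etree = etree
        self.mapping = mapping
        self.revmap = revmap

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so etree, mapping
        # and revmap are served directly and everything else is delegated.
        return getattr(self._wrapped, name)


class FakePcGts:
    """Invented placeholder for the generated root object."""

    def get_pcGtsId(self):
        return "page-1"


proxy = ProxySketch(FakePcGts(), etree="<tree>", mapping={}, revmap={})
print(proxy.get_pcGtsId())  # delegated to the wrapped object
print(proxy.etree)          # served by the proxy itself
```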

class ocrd_models.ocrd_page.AdvertRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, bgColour=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

AdvertRegionType – Regions containing advertisements.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • bgColour – The background colour of the region

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
get_bgColour()[source]
set_bgColour(bgColour)[source]
validate_ColourSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='AdvertRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='AdvertRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set the deskewing angle to the given orientation number. Also invalidate the pc:AlternativeImage entries of this element, because they were rotated and enlarged using the previous angle.
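
The documented orientation range (-179.999 to 180) suggests that angles are kept within one full turn. A minimal sketch of wrapping an arbitrary angle into that range is shown below; normalize_orientation() is a hypothetical helper, not part of this module, and reading the range as the half-open interval (-180, 180] is an assumption.

```python
def normalize_orientation(angle):
    """Map any angle in degrees into the interval (-180, 180]."""
    wrapped = angle % 360.0   # now in [0, 360)
    if wrapped > 180.0:
        wrapped -= 360.0      # now in (-180, 180]
    return wrapped


print(normalize_orientation(270))   # -90.0
print(normalize_orientation(-180))  # 180.0
```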

class ocrd_models.ocrd_page.AlternativeImageType(filename=None, comments=None, conf=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • conf – Confidence value (between 0 and 1)

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_filename()[source]
set_filename(filename)[source]
get_comments()[source]
set_comments(comments)[source]
get_conf()[source]
set_conf(conf)[source]
validate_ConfSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='AlternativeImageType', pretty_print=True)[source]
to_etree(parent_element=None, name_='AlternativeImageType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.BaselineType(points=None, conf=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • conf – Confidence value (between 0 and 1)

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_points()[source]
set_points(points)[source]
get_conf()[source]
set_conf(conf)[source]
validate_PointsType(value)[source]
validate_PointsType_patterns_ = [['^(([0-9]+,[0-9]+ )+([0-9]+,[0-9]+))$']]
validate_ConfSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='BaselineType', pretty_print=True)[source]
to_etree(parent_element=None, name_='BaselineType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
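
The pattern listed under validate_PointsType_patterns_ can be tried out directly with the standard re module. The helper is_valid_points() below is hypothetical and only applies the regular expression; it does not reproduce whatever else validate_PointsType() may do.

```python
import re

# Pattern copied from validate_PointsType_patterns_ above.
POINTS_PATTERN = re.compile(r'^(([0-9]+,[0-9]+ )+([0-9]+,[0-9]+))$')


def is_valid_points(points):
    """Return True if the string matches the documented points pattern."""
    return POINTS_PATTERN.match(points) is not None


print(is_valid_points("0,0 10,0 10,5"))  # True
print(is_valid_points("0,0"))            # False: needs at least two pairs
print(is_valid_points("-1,0 2,3"))       # False: no sign allowed
```

Read this way, the pattern accepts only non-negative integer pairs separated by single spaces, and requires at least two pairs.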
class ocrd_models.ocrd_page.BorderType(Coords=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

BorderType – Border of the actual page (if the scanned image contains parts not belonging to the page).

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Coords()[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='BorderType', pretty_print=True)[source]
to_etree(parent_element=None, name_='BorderType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_Coords(Coords)[source]

Set the coordinate polygon from the given CoordsType object. Also invalidate the pc:AlternativeImage entries of this element, because they were cropped with a bounding box of the previous polygon.
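
The setter behaviour described here (changing the outline discards images derived from the old outline) can be sketched with a small stand-in class. SegmentSketch and its attribute names are hypothetical and much simpler than the generated classes.

```python
class SegmentSketch:
    """Hypothetical segment holding a polygon and images derived from it."""

    def __init__(self, coords, alternative_images=None):
        self._coords = coords
        self.alternative_images = list(alternative_images or [])

    def get_coords(self):
        return self._coords

    def set_coords(self, coords):
        # Derived images were cropped to the old outline, so they no
        # longer match once the outline changes; drop them.
        self.alternative_images = []
        self._coords = coords


segment = SegmentSketch("0,0 10,0 10,10 0,10", ["cropped.png"])
segment.set_coords("0,0 20,0 20,10 0,10")
print(segment.alternative_images)  # []
```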

class ocrd_models.ocrd_page.ChartRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, type_=None, numColours=None, bgColour=None, embText=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

ChartRegionType – Regions containing charts or graphs of any type should be marked as chart regions.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • type – The type of chart in the region

  • numColours – An approximation of the number of colours used in the region

  • bgColour – The background colour of the region

  • embText – Specifies whether the region also contains text

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
get_type()[source]
set_type(type_)[source]
get_numColours()[source]
set_numColours(numColours)[source]
get_bgColour()[source]
set_bgColour(bgColour)[source]
get_embText()[source]
set_embText(embText)[source]
validate_ChartTypeSimpleType(value)[source]
validate_ColourSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='ChartRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='ChartRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set the deskewing angle to the given orientation number. Also invalidate the pc:AlternativeImage entries of this element, because they were rotated and enlarged using the previous angle.

class ocrd_models.ocrd_page.ChemRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, bgColour=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

ChemRegionType – Regions containing chemical formulas.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • bgColour – The background colour of the region

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
get_bgColour()[source]
set_bgColour(bgColour)[source]
validate_ColourSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='ChemRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='ChemRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set the deskewing angle to the given orientation number. Also invalidate the pc:AlternativeImage entries of this element, because they were rotated and enlarged using the previous angle.

class ocrd_models.ocrd_page.CoordsType(points=None, conf=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • points – Polygon outline of the element as a path of points. No points may lie outside the outline of its parent, which in the case of Border is the bounding rectangle of the root image. Paths are closed by convention, i.e. the last point logically connects with the first (and at least 3 points are required to span an area). Paths must be planar (i.e. must not self-intersect).

  • conf – Confidence value (between 0 and 1)

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_points()[source]
get_conf()[source]
set_conf(conf)[source]
validate_PointsType(value)[source]
validate_PointsType_patterns_ = [['^(([0-9]+,[0-9]+ )+([0-9]+,[0-9]+))$']]
validate_ConfSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='CoordsType', pretty_print=True)[source]
to_etree(parent_element=None, name_='CoordsType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_points(points)[source]

Set the coordinate polygon from the given string. Also invalidate the pc:AlternativeImage entries of the parent, because they were cropped with a bounding box of the previous polygon.
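
To make the points format and the bounding box mentioned above concrete, here is a minimal sketch that parses a points string and computes its bounding box. Both helper names are hypothetical and written for illustration; they are not the functions this module uses internally.

```python
def polygon_from_points_string(points):
    """Parse 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    polygon = []
    for pair in points.split(" "):
        x, y = pair.split(",")
        polygon.append((int(x), int(y)))
    return polygon


def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) of a polygon with >= 3 points."""
    if len(polygon) < 3:
        raise ValueError("at least 3 points are required to span an area")
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


outline = polygon_from_points_string("10,20 110,20 110,70 10,70")
print(bounding_box(outline))  # (10, 20, 110, 70)
```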

class ocrd_models.ocrd_page.CustomRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, type_=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

CustomRegionType – Regions containing content that is not covered by the default types (text, graphic, image, line drawing, chart, table, separator, maths, map, music, chem, advert, noise, unknown).

  • type – Information on the type of content represented by this region

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_type()[source]
set_type(type_)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='CustomRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='CustomRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.GlyphType(id=None, ligature=None, symbol=None, script=None, production=None, custom=None, comments=None, AlternativeImage=None, Coords=None, Graphemes=None, TextEquiv=None, TextStyle=None, UserDefined=None, Labels=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • script – The script used for the glyph

  • production – Overrides the production attribute of the parent word / text line / text region.

  • custom – For generic use

  • AlternativeImage – Alternative glyph images (e.g. black-and-white)

  • Graphemes – Container for graphemes, grapheme groups and non-printing characters

  • Labels – Semantic labels / tags

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_AlternativeImage()[source]
set_AlternativeImage(AlternativeImage)[source]
add_AlternativeImage(value)[source]
insert_AlternativeImage_at(index, value)[source]
replace_AlternativeImage_at(index, value)[source]
get_Coords()[source]
get_Graphemes()[source]
set_Graphemes(Graphemes)[source]
get_TextEquiv()[source]
set_TextEquiv(TextEquiv)[source]
add_TextEquiv(value)[source]
insert_TextEquiv_at(index, value)[source]
replace_TextEquiv_at(index, value)[source]
get_TextStyle()[source]
set_TextStyle(TextStyle)[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_ligature()[source]
set_ligature(ligature)[source]
get_symbol()[source]
set_symbol(symbol)[source]
get_script()[source]
set_script(script)[source]
get_production()[source]
set_production(production)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
validate_ScriptSimpleType(value)[source]
validate_ProductionSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='GlyphType', pretty_print=True)[source]
to_etree(parent_element=None, name_='GlyphType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
invalidate_AlternativeImage(feature_selector=None)[source]

Remove derived images from this segment (due to changed coordinates).

If feature_selector is not None, remove only images with matching @comments, e.g. feature_selector="cropped,deskewed".
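
One plausible reading of "matching @comments" is sketched below: an image is removed when any of the selected features appears in its comma-separated @comments value. This matching rule is an assumption, and split_by_features() is a hypothetical helper operating on plain strings rather than on AlternativeImageType objects.

```python
def split_by_features(image_comments, feature_selector=None):
    """Split a list of @comments strings into (kept, removed)."""
    if not feature_selector:
        # No selector: every derived image is removed.
        return [], list(image_comments)
    wanted = {f for f in feature_selector.split(",") if f}
    kept, removed = [], []
    for comments in image_comments:
        features = set((comments or "").split(","))
        # Assumed rule: one shared feature is enough to count as a match.
        (removed if features & wanted else kept).append(comments)
    return kept, removed


comments = ["binarized", "binarized,cropped", "binarized,cropped,deskewed"]
print(split_by_features(comments, "cropped,deskewed"))
```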

set_Coords(Coords)[source]

Set the coordinate polygon from the given CoordsType object. Also invalidate the pc:AlternativeImage entries of this element, because they were cropped with a bounding box of the previous polygon.

class ocrd_models.ocrd_page.GraphemeBaseType(id=None, index=None, ligature=None, charType=None, custom=None, comments=None, TextEquiv=None, extensiontype_=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

GraphemeBaseType – Base type for graphemes, grapheme groups and non-printing characters.

  • index – Order index of grapheme, group, or non-printing character within the parent container (graphemes or glyph or grapheme group).

  • charType – Type of character represented by the grapheme, group, or non-printing character element.

  • custom – For generic use

  • comments – For generic use

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_TextEquiv()[source]
set_TextEquiv(TextEquiv)[source]
add_TextEquiv(value)[source]
insert_TextEquiv_at(index, value)[source]
replace_TextEquiv_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_index()[source]
set_index(index)[source]
get_ligature()[source]
set_ligature(ligature)[source]
get_charType()[source]
set_charType(charType)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
get_extensiontype_()[source]
set_extensiontype_(extensiontype_)[source]
validate_indexType2(value)[source]
validate_charTypeType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='GraphemeBaseType', pretty_print=True)[source]
to_etree(parent_element=None, name_='GraphemeBaseType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.GraphemeGroupType(id=None, index=None, ligature=None, charType=None, custom=None, comments=None, TextEquiv=None, Grapheme=None, NonPrintingChar=None, gds_collector_=None, **kwargs_)[source]

Bases: GraphemeBaseType

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of GraphemeBaseType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Grapheme()[source]
set_Grapheme(Grapheme)[source]
add_Grapheme(value)[source]
insert_Grapheme_at(index, value)[source]
replace_Grapheme_at(index, value)[source]
get_NonPrintingChar()[source]
set_NonPrintingChar(NonPrintingChar)[source]
add_NonPrintingChar(value)[source]
insert_NonPrintingChar_at(index, value)[source]
replace_NonPrintingChar_at(index, value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='GraphemeGroupType', pretty_print=True)[source]
to_etree(parent_element=None, name_='GraphemeGroupType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.GraphemeType(id=None, index=None, ligature=None, charType=None, custom=None, comments=None, TextEquiv=None, Coords=None, gds_collector_=None, **kwargs_)[source]

Bases: GraphemeBaseType

GraphemeType – Represents a sub-element of a glyph. Smallest graphical unit that can be assigned a Unicode code point.

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of GraphemeBaseType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Coords()[source]
set_Coords(Coords)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='GraphemeType', pretty_print=True)[source]
to_etree(parent_element=None, name_='GraphemeType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.GraphemesType(Grapheme=None, NonPrintingChar=None, GraphemeGroup=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

GraphemesType – Container for graphemes, grapheme groups and non-printing characters.

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Grapheme()[source]
set_Grapheme(Grapheme)[source]
add_Grapheme(value)[source]
insert_Grapheme_at(index, value)[source]
replace_Grapheme_at(index, value)[source]
get_NonPrintingChar()[source]
set_NonPrintingChar(NonPrintingChar)[source]
add_NonPrintingChar(value)[source]
insert_NonPrintingChar_at(index, value)[source]
replace_NonPrintingChar_at(index, value)[source]
get_GraphemeGroup()[source]
set_GraphemeGroup(GraphemeGroup)[source]
add_GraphemeGroup(value)[source]
insert_GraphemeGroup_at(index, value)[source]
replace_GraphemeGroup_at(index, value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='GraphemesType', pretty_print=True)[source]
to_etree(parent_element=None, name_='GraphemesType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.GraphicRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, type_=None, numColours=None, embText=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

GraphicRegionType – Regions containing simple graphics, such as a company logo, should be marked as graphic regions.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • type – The type of graphic in the region

  • numColours – An approximation of the number of colours used in the region

  • embText – Specifies whether the region also contains text.

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
get_type()[source]
set_type(type_)[source]
get_numColours()[source]
set_numColours(numColours)[source]
get_embText()[source]
set_embText(embText)[source]
validate_GraphicsTypeSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='GraphicRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='GraphicRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set the deskewing angle to the given orientation number. Also invalidate the pc:AlternativeImage entries of this element, because they were rotated and enlarged using the previous angle.

class ocrd_models.ocrd_page.GridPointsType(index=None, points=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

GridPointsType – Points with x,y coordinates.

  • index – The grid row index

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_index()[source]
set_index(index)[source]
get_points()[source]
set_points(points)[source]
validate_PointsType(value)[source]
validate_PointsType_patterns_ = [['^(([0-9]+,[0-9]+ )+([0-9]+,[0-9]+))$']]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='GridPointsType', pretty_print=True)[source]
to_etree(parent_element=None, name_='GridPointsType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.GridType(GridPoints=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

GridType – Matrix of grid points defining the table grid on the page.

  • GridPoints – One row in the grid point matrix. Points with x,y coordinates. (note: for a table with n table rows there should be n+1 grid rows)
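
The n+1 rule can be illustrated for a simple axis-aligned table. The helper regular_grid_rows() is hypothetical, and producing m+1 points per grid row for m columns is an assumption made by analogy with the row rule stated above.

```python
def regular_grid_rows(x0, y0, col_widths, row_heights):
    """Return one points string per grid row for an axis-aligned table."""
    xs = [x0]
    for width in col_widths:
        xs.append(xs[-1] + width)
    ys = [y0]
    for height in row_heights:
        ys.append(ys[-1] + height)
    # One grid row per horizontal line: len(row_heights) + 1 in total.
    return [" ".join(f"{x},{y}" for x in xs) for y in ys]


rows = regular_grid_rows(0, 0, col_widths=[50, 50], row_heights=[20, 20, 20])
for index, points in enumerate(rows):
    print(index, points)
```

For the three table rows in this call, the sketch yields four grid rows, each of which could serve as the points value of one GridPoints entry with the matching index.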

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_GridPoints()[source]
set_GridPoints(GridPoints)[source]
add_GridPoints(value)[source]
insert_GridPoints_at(index, value)[source]
replace_GridPoints_at(index, value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='GridType', pretty_print=True)[source]
to_etree(parent_element=None, name_='GridType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.ImageRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, colourDepth=None, bgColour=None, embText=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

ImageRegionType – An image is considered to be more intricate and complex than a graphic. These can be photos or drawings.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • colourDepth – The colour bit depth required for the region

  • bgColour – The background colour of the region

  • embText – Specifies whether the region also contains text

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
get_colourDepth()[source]
set_colourDepth(colourDepth)[source]
get_bgColour()[source]
set_bgColour(bgColour)[source]
get_embText()[source]
set_embText(embText)[source]
validate_ColourDepthSimpleType(value)[source]
validate_ColourSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='ImageRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='ImageRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set the deskewing angle to the given orientation number. Also invalidate the pc:AlternativeImage entries of this element, because they were rotated and enlarged using the previous angle.

class ocrd_models.ocrd_page.LabelType(value=None, type_=None, comments=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

LabelType – Semantic label.

  • value – The label / tag (e.g. ‘person’). Can be an RDF resource identifier (e.g. object of an RDF triple).

  • type – Additional information on the label (e.g. ‘YYYY-mm-dd’ for a date label). Can be used as predicate of an RDF triple.

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_value()[source]
set_value(value)[source]
get_type()[source]
set_type(type_)[source]
get_comments()[source]
set_comments(comments)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='LabelType', pretty_print=True)[source]
to_etree(parent_element=None, name_='LabelType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.LabelsType(externalModel=None, externalId=None, prefix=None, comments=None, Label=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • externalModel – Reference to external model / ontology / schema

  • externalId – E.g. an RDF resource identifier (to be used as subject or object of an RDF triple)

  • prefix – Prefix for all labels (e.g. first part of a URI)

  • Label – A semantic label / tag
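
The RDF roles mentioned in the LabelType and LabelsType descriptions (externalId as subject, type as predicate, value as object) can be combined as in the sketch below. The dataclasses are hypothetical stand-ins for the generated classes, the example URIs are invented, and joining prefix and value by plain concatenation is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LabelSketch:
    value: str
    type_: Optional[str] = None


@dataclass
class LabelsSketch:
    external_id: str
    prefix: str = ""
    labels: List[LabelSketch] = field(default_factory=list)

    def triples(self) -> List[Tuple[str, Optional[str], str]]:
        """Return one (subject, predicate, object) tuple per label."""
        return [
            (self.external_id, label.type_, self.prefix + label.value)
            for label in self.labels
        ]


group = LabelsSketch(
    external_id="http://example.org/doc/1",
    prefix="http://example.org/vocab/",
    labels=[LabelSketch("person", "mentions")],
)
print(group.triples())
```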

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Label()[source]
set_Label(Label)[source]
add_Label(value)[source]
insert_Label_at(index, value)[source]
replace_Label_at(index, value)[source]
get_externalModel()[source]
set_externalModel(externalModel)[source]
get_externalId()[source]
set_externalId(externalId)[source]
get_prefix()[source]
set_prefix(prefix)[source]
get_comments()[source]
set_comments(comments)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='LabelsType', pretty_print=True)[source]
to_etree(parent_element=None, name_='LabelsType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.LayerType(id=None, zIndex=None, caption=None, RegionRef=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_RegionRef()[source]
set_RegionRef(RegionRef)[source]
add_RegionRef(value)[source]
insert_RegionRef_at(index, value)[source]
replace_RegionRef_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_zIndex()[source]
set_zIndex(zIndex)[source]
get_caption()[source]
set_caption(caption)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='LayerType', pretty_print=True)[source]
to_etree(parent_element=None, name_='LayerType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.LayersType(Layer=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

LayersType – Can be used to express the z-index of overlapping regions. An element with a greater z-index is always in front of another element with lower z-index.
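The z-index rule can be illustrated by sorting layers from back to front. This is a minimal sketch using a stand-in dataclass, not the generated LayerType class; the names Layer, z_index, region_refs and back_to_front are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    """Hypothetical stand-in for a layer: an id, a z-index and region references."""
    id: str
    z_index: int
    region_refs: List[str] = field(default_factory=list)


def back_to_front(layers: List[Layer]) -> List[Layer]:
    """Order layers so that later entries lie in front of earlier ones."""
    return sorted(layers, key=lambda layer: layer.z_index)


layers = [
    Layer("l_stamp", 2, ["r3"]),
    Layer("l_text", 0, ["r1", "r2"]),
    Layer("l_image", 1, ["r4"]),
]
ordered = back_to_front(layers)
print([layer.id for layer in ordered])  # the frontmost layer comes last
```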

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Layer()[source]
set_Layer(Layer)[source]
add_Layer(value)[source]
insert_Layer_at(index, value)[source]
replace_Layer_at(index, value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='LayersType', pretty_print=True)[source]
to_etree(parent_element=None, name_='LayersType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.LineDrawingRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, penColour=None, bgColour=None, embText=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

LineDrawingRegionType – A line drawing is a single colour illustration without solid areas.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • penColour – The pen (foreground) colour of the region

  • bgColour – The background colour of the region

  • embText – Specifies whether the region also contains text
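The stated orientation range excludes -180 and includes 180. A minimal sketch of wrapping an arbitrary angle into that interval follows; normalize_orientation is a hypothetical helper, and nothing here implies that ocrd_models performs this normalization itself.

```python
def normalize_orientation(angle: float) -> float:
    """Map an angle in degrees into the half-open interval (-180, 180]."""
    wrapped = angle % 360.0  # Python's % yields a value in [0, 360) here
    if wrapped > 180.0:
        wrapped -= 360.0
    return wrapped


for raw in (190, -180, 360, -90):
    print(raw, normalize_orientation(raw))
```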

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
get_penColour()[source]
set_penColour(penColour)[source]
get_bgColour()[source]
set_bgColour(bgColour)[source]
get_embText()[source]
set_embText(embText)[source]
validate_ColourSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='LineDrawingRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='LineDrawingRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set deskewing angle to given orientation number. Moreover, invalidate self’s pc:AlternativeImage elements (because they will have been rotated and enlarged with the angle of the previous value).
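The coupling between orientation and derived images can be pictured with a small stand-in class. RegionSketch and its attributes are hypothetical, and dropping every derived image is a simplification: which images the real method discards is not stated here.

```python
from typing import List, Optional


class RegionSketch:
    """Hypothetical stand-in for a region with an orientation and derived images."""

    def __init__(self, orientation: Optional[float] = None):
        self.orientation = orientation
        self.alternative_images: List[str] = []

    def set_orientation(self, orientation: float) -> None:
        # Derived images were produced for the old angle, so this sketch
        # simply drops all of them before storing the new value.
        self.alternative_images.clear()
        self.orientation = orientation


region = RegionSketch(orientation=1.5)
region.alternative_images.append("region_deskewed_1.5.png")
region.set_orientation(-0.5)
print(region.orientation, region.alternative_images)
```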

class ocrd_models.ocrd_page.MapRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

MapRegionType – Regions containing maps.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='MapRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='MapRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set deskewing angle to given orientation number. Moreover, invalidate self’s pc:AlternativeImage elements (because they will have been rotated and enlarged with the angle of the previous value).

class ocrd_models.ocrd_page.MathsRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, bgColour=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

MathsRegionType – Regions containing equations and mathematical symbols should be marked as maths regions.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • bgColour – The background colour of the region

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
get_bgColour()[source]
set_bgColour(bgColour)[source]
validate_ColourSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='MathsRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='MathsRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set deskewing angle to given orientation number. Moreover, invalidate self’s pc:AlternativeImage elements (because they will have been rotated and enlarged with the angle of the previous value).

class ocrd_models.ocrd_page.MetadataItemType(type_=None, name=None, value=None, date=None, Labels=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • type – Type of metadata (e.g. author)

  • name – E.g. imagePhotometricInterpretation

  • value – E.g. RGB

  • Labels – Semantic labels / tags

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_type()[source]
set_type(type_)[source]
get_name()[source]
set_name(name)[source]
get_value()[source]
set_value(value)[source]
get_date()[source]
set_date(date)[source]
validate_typeType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='MetadataItemType', pretty_print=True)[source]
to_etree(parent_element=None, name_='MetadataItemType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.MetadataType(externalRef=None, Creator=None, Created=None, LastChange=None, Comments=None, UserDefined=None, MetadataItem=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • externalRef – External reference of any kind

  • Created – The timestamp has to be in UTC (Coordinated Universal Time) and not local time.

  • LastChange – The timestamp has to be in UTC (Coordinated Universal Time) and not local time.
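Since both timestamps must be in UTC, values taken from a local clock need converting before they are stored. The following sketch uses only the Python standard library; utc_now and to_utc are hypothetical helper names, not ocrd_models functions.

```python
from datetime import datetime, timedelta, timezone


def utc_now() -> datetime:
    """Current time as a timezone-aware UTC value, truncated to whole seconds."""
    return datetime.now(timezone.utc).replace(microsecond=0)


def to_utc(moment: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC; reject naive values."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: the time zone is unknown")
    return moment.astimezone(timezone.utc)


# A local time two hours ahead of UTC, converted before use as a timestamp.
local = datetime(2024, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
converted = to_utc(local)
print(converted.isoformat())
```

Rejecting naive datetimes is a deliberate choice in this sketch: guessing the zone would silently produce wrong timestamps.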

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Creator()[source]
set_Creator(Creator)[source]
get_Created()[source]
set_Created(Created)[source]
get_LastChange()[source]
set_LastChange(LastChange)[source]
get_Comments()[source]
set_Comments(Comments)[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_MetadataItem()[source]
set_MetadataItem(MetadataItem)[source]
add_MetadataItem(value)[source]
insert_MetadataItem_at(index, value)[source]
replace_MetadataItem_at(index, value)[source]
get_externalRef()[source]
set_externalRef(externalRef)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:None="http://www.w3.org/2001/XMLSchema" ', name_='MetadataType', pretty_print=True)[source]
to_etree(parent_element=None, name_='MetadataType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.MusicRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, bgColour=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

MusicRegionType – Regions containing musical notations.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • bgColour – The background colour of the region

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
get_bgColour()[source]
set_bgColour(bgColour)[source]
validate_ColourSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='MusicRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='MusicRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set deskewing angle to given orientation number. Moreover, invalidate self’s pc:AlternativeImage elements (because they will have been rotated and enlarged with the angle of the previous value).

class ocrd_models.ocrd_page.NoiseRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

NoiseRegionType – Noise regions are regions where no real data lies, only false data created by artifacts on the document or scanner noise.

member_data_items_ = []
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='NoiseRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='NoiseRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.NonPrintingCharType(id=None, index=None, ligature=None, charType=None, custom=None, comments=None, TextEquiv=None, gds_collector_=None, **kwargs_)[source]

Bases: GraphemeBaseType

NonPrintingCharType – A glyph component without visual representation but with Unicode code point. Non-visual / non-printing / control character. Part of grapheme container (of glyph) or grapheme sub group.

member_data_items_ = []
subclass = None
superclass

alias of GraphemeBaseType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='NonPrintingCharType', pretty_print=True)[source]
to_etree(parent_element=None, name_='NonPrintingCharType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.OrderedGroupIndexedType(id=None, regionRef=None, index=None, caption=None, type_=None, continuation=None, custom=None, comments=None, UserDefined=None, Labels=None, RegionRefIndexed=None, OrderedGroupIndexed=None, UnorderedGroupIndexed=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

OrderedGroupIndexedType – Indexed group containing ordered elements

  • regionRef – Optional link to a parent region of nested regions. The parent region doubles as reading order group. Only the nested regions should be allowed as group members.

  • index – Position (order number) of this item within the current hierarchy level.

  • continuation – Is this group a continuation of another group (from previous column or page, for example)?

  • custom – For generic use

  • Labels – Semantic labels / tags

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_RegionRefIndexed()[source]
set_RegionRefIndexed(RegionRefIndexed)[source]
add_RegionRefIndexed(value)[source]
insert_RegionRefIndexed_at(index, value)[source]
replace_RegionRefIndexed_at(index, value)[source]
get_OrderedGroupIndexed()[source]
set_OrderedGroupIndexed(OrderedGroupIndexed)[source]
add_OrderedGroupIndexed(value)[source]
insert_OrderedGroupIndexed_at(index, value)[source]
replace_OrderedGroupIndexed_at(index, value)[source]
get_UnorderedGroupIndexed()[source]
set_UnorderedGroupIndexed(UnorderedGroupIndexed)[source]
add_UnorderedGroupIndexed(value)[source]
insert_UnorderedGroupIndexed_at(index, value)[source]
replace_UnorderedGroupIndexed_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_regionRef()[source]
set_regionRef(regionRef)[source]
get_index()[source]
set_index(index)[source]
get_caption()[source]
set_caption(caption)[source]
get_type()[source]
set_type(type_)[source]
get_continuation()[source]
set_continuation(continuation)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
validate_GroupTypeSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='OrderedGroupIndexedType', pretty_print=True)[source]
to_etree(parent_element=None, name_='OrderedGroupIndexedType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
get_AllIndexed(classes=None, index_sort=True)[source]

Get all indexed children sorted by their @index.

Parameters:
  • classes (list) – Type of children (sans Indexed) to return. Default: ['RegionRef', 'OrderedGroup', 'UnorderedGroup']

  • index_sort (boolean) – Whether to sort by @index

Returns:

a list of RegionRefIndexedType, OrderedGroupIndexedType, and UnorderedGroupIndexedType
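The selection and sorting described for get_AllIndexed can be sketched with stand-in objects. IndexedChild and all_indexed are hypothetical names; only the parameter names classes and index_sort and the default class list are taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexedChild:
    """Hypothetical stand-in for an indexed child of a reading-order group."""
    kind: str   # 'RegionRef', 'OrderedGroup' or 'UnorderedGroup'
    ref: str
    index: int


DEFAULT_KINDS = ["RegionRef", "OrderedGroup", "UnorderedGroup"]


def all_indexed(children: List[IndexedChild],
                classes: Optional[List[str]] = None,
                index_sort: bool = True) -> List[IndexedChild]:
    """Select children of the requested kinds, optionally sorted by index."""
    kinds = DEFAULT_KINDS if classes is None else classes
    selected = [child for child in children if child.kind in kinds]
    if index_sort:
        selected.sort(key=lambda child: child.index)
    return selected


children = [
    IndexedChild("RegionRef", "r2", 2),
    IndexedChild("OrderedGroup", "g1", 0),
    IndexedChild("RegionRef", "r1", 1),
]
print([c.ref for c in all_indexed(children)])
print([c.ref for c in all_indexed(children, classes=["RegionRef"])])
```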

clear_AllIndexed()[source]
extend_AllIndexed(elements, validate_continuity=False)[source]

Add all elements in list elements, respecting @index order. With validate_continuity, check that all new elements come after all old elements (or raise an exception). Otherwise, ensure this condition silently (by increasing @index accordingly).
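The two modes of extend_AllIndexed can be pictured as follows. This is one plausible reading of the description, written against a stand-in dataclass; IndexedRef and extend_indexed are hypothetical, and the exact way the real method increases @index is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexedRef:
    """Hypothetical stand-in for an element carrying an @index."""
    ref: str
    index: int


def extend_indexed(existing: List[IndexedRef], new: List[IndexedRef],
                   validate_continuity: bool = False) -> List[IndexedRef]:
    """Append new elements so that each one comes after all earlier ones."""
    next_free = max((e.index for e in existing), default=-1) + 1
    result = list(existing)
    for elem in sorted(new, key=lambda e: e.index):
        if elem.index < next_free:
            if validate_continuity:
                raise ValueError(f"index {elem.index} of {elem.ref} is not after existing elements")
            # Silent mode: move the element to the next free position.
            elem = IndexedRef(elem.ref, next_free)
        result.append(elem)
        next_free = elem.index + 1
    return result


old = [IndexedRef("a", 0), IndexedRef("b", 1)]
merged = extend_indexed(old, [IndexedRef("d", 5), IndexedRef("c", 0)])
print([(e.ref, e.index) for e in merged])
```

With validate_continuity=True, the same call would raise instead of renumbering c.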

sort_AllIndexed(validate_uniqueness=True)[source]

Sort all indexed children in-place.

class ocrd_models.ocrd_page.OrderedGroupType(id=None, regionRef=None, caption=None, type_=None, continuation=None, custom=None, comments=None, UserDefined=None, Labels=None, RegionRefIndexed=None, OrderedGroupIndexed=None, UnorderedGroupIndexed=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

OrderedGroupType – Numbered group (contains ordered elements)

  • regionRef – Optional link to a parent region of nested regions. The parent region doubles as reading order group. Only the nested regions should be allowed as group members.

  • continuation – Is this group a continuation of another group (from previous column or page, for example)?

  • custom – For generic use

  • Labels – Semantic labels / tags

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_RegionRefIndexed()[source]
set_RegionRefIndexed(RegionRefIndexed)[source]
add_RegionRefIndexed(value)[source]
insert_RegionRefIndexed_at(index, value)[source]
replace_RegionRefIndexed_at(index, value)[source]
get_OrderedGroupIndexed()[source]
set_OrderedGroupIndexed(OrderedGroupIndexed)[source]
add_OrderedGroupIndexed(value)[source]
insert_OrderedGroupIndexed_at(index, value)[source]
replace_OrderedGroupIndexed_at(index, value)[source]
get_UnorderedGroupIndexed()[source]
set_UnorderedGroupIndexed(UnorderedGroupIndexed)[source]
add_UnorderedGroupIndexed(value)[source]
insert_UnorderedGroupIndexed_at(index, value)[source]
replace_UnorderedGroupIndexed_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_regionRef()[source]
set_regionRef(regionRef)[source]
get_caption()[source]
set_caption(caption)[source]
get_type()[source]
set_type(type_)[source]
get_continuation()[source]
set_continuation(continuation)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
validate_GroupTypeSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='OrderedGroupType', pretty_print=True)[source]
to_etree(parent_element=None, name_='OrderedGroupType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
get_AllIndexed(classes=None, index_sort=True)[source]

Get all indexed children sorted by their @index.

Parameters:
  • classes (list) – Type of children (sans Indexed) to return. Default: ['RegionRef', 'OrderedGroup', 'UnorderedGroup']

  • index_sort (boolean) – Whether to sort by @index

Returns:

a list of RegionRefIndexedType, OrderedGroupIndexedType, and UnorderedGroupIndexedType

clear_AllIndexed()[source]
extend_AllIndexed(elements, validate_continuity=False)[source]

Add all elements in list elements, respecting @index order. With validate_continuity, check that all new elements come after all old elements (or raise an exception). Otherwise, ensure this condition silently (by increasing @index accordingly).

sort_AllIndexed(validate_uniqueness=True)[source]

Sort all indexed children in-place.

class ocrd_models.ocrd_page.PageType(imageFilename=None, imageWidth=None, imageHeight=None, imageXResolution=None, imageYResolution=None, imageResolutionUnit=None, custom=None, orientation=None, type_=None, primaryLanguage=None, secondaryLanguage=None, primaryScript=None, secondaryScript=None, readingDirection=None, textLineOrder=None, conf=None, AlternativeImage=None, Border=None, PrintSpace=None, ReadingOrder=None, Layers=None, Relations=None, TextStyle=None, UserDefined=None, Labels=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, MapRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • imageFilename – Contains the image file name including the file extension.

  • imageWidth – Specifies the width of the image.

  • imageHeight – Specifies the height of the image.

  • imageXResolution – Specifies the image resolution in width.

  • imageYResolution – Specifies the image resolution in height.

  • imageResolutionUnit – Specifies the unit of the resolution information referring to a standardised unit of measurement (pixels per inch, pixels per centimeter or other).

  • custom – For generic use

  • orientation – The angle the rectangle encapsulating the page (or its Border) has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). (The rotated image can be further referenced via “AlternativeImage”.) Range: -179.999,180

  • type – The type of the page within the document (e.g. cover page).

  • primaryLanguage – The primary language used in the page (lower-level definitions override the page-level definition).

  • secondaryLanguage – The secondary language used in the page (lower-level definitions override the page-level definition).

  • primaryScript – The primary script used in the page (lower-level definitions override the page-level definition).

  • secondaryScript – The secondary script used in the page (lower-level definitions override the page-level definition).

  • readingDirection – The direction in which text within lines should be read (order of words and characters), in addition to “textLineOrder” (lower-level definitions override the page-level definition).

  • textLineOrder – The order of text lines within a block, in addition to “readingDirection” (lower-level definitions override the page-level definition).

  • conf – Confidence value for whole page (between 0 and 1)

  • AlternativeImage – Alternative document page images (e.g. black-and-white).

  • ReadingOrder – Order of blocks within the page.

  • Layers – Unassigned regions are considered to be in the (virtual) default layer which is to be treated as below any other layers.

  • TextStyle – Default text style

  • Labels – Semantic labels / tags
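Several attributes above note that lower-level definitions override the page-level definition. A minimal sketch of that lookup rule follows; effective_value is a hypothetical helper, and the three levels used (line, region, page) are assumed for illustration.

```python
from typing import Optional


def effective_value(*levels: Optional[str]) -> Optional[str]:
    """Return the first value that is set, scanning from most to least specific."""
    for value in levels:
        if value is not None:
            return value
    return None


page_language = "German"
region_language = None      # not set on the region, so it does not override
line_language = "Latin"     # set on the line, so it wins
print(effective_value(line_language, region_language, page_language))
```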

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ 
object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_AlternativeImage()[source]
set_AlternativeImage(AlternativeImage)[source]
add_AlternativeImage(value)[source]
insert_AlternativeImage_at(index, value)[source]
replace_AlternativeImage_at(index, value)[source]
get_Border()[source]
get_PrintSpace()[source]
set_PrintSpace(PrintSpace)[source]
get_ReadingOrder()[source]
set_ReadingOrder(ReadingOrder)[source]
get_Layers()[source]
set_Layers(Layers)[source]
get_Relations()[source]
set_Relations(Relations)[source]
get_TextStyle()[source]
set_TextStyle(TextStyle)[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_TextRegion()[source]
set_TextRegion(TextRegion)[source]
add_TextRegion(value)[source]
insert_TextRegion_at(index, value)[source]
replace_TextRegion_at(index, value)[source]
get_ImageRegion()[source]
set_ImageRegion(ImageRegion)[source]
add_ImageRegion(value)[source]
insert_ImageRegion_at(index, value)[source]
replace_ImageRegion_at(index, value)[source]
get_LineDrawingRegion()[source]
set_LineDrawingRegion(LineDrawingRegion)[source]
add_LineDrawingRegion(value)[source]
insert_LineDrawingRegion_at(index, value)[source]
replace_LineDrawingRegion_at(index, value)[source]
get_GraphicRegion()[source]
set_GraphicRegion(GraphicRegion)[source]
add_GraphicRegion(value)[source]
insert_GraphicRegion_at(index, value)[source]
replace_GraphicRegion_at(index, value)[source]
get_TableRegion()[source]
set_TableRegion(TableRegion)[source]
add_TableRegion(value)[source]
insert_TableRegion_at(index, value)[source]
replace_TableRegion_at(index, value)[source]
get_ChartRegion()[source]
set_ChartRegion(ChartRegion)[source]
add_ChartRegion(value)[source]
insert_ChartRegion_at(index, value)[source]
replace_ChartRegion_at(index, value)[source]
get_MapRegion()[source]
set_MapRegion(MapRegion)[source]
add_MapRegion(value)[source]
insert_MapRegion_at(index, value)[source]
replace_MapRegion_at(index, value)[source]
get_SeparatorRegion()[source]
set_SeparatorRegion(SeparatorRegion)[source]
add_SeparatorRegion(value)[source]
insert_SeparatorRegion_at(index, value)[source]
replace_SeparatorRegion_at(index, value)[source]
get_MathsRegion()[source]
set_MathsRegion(MathsRegion)[source]
add_MathsRegion(value)[source]
insert_MathsRegion_at(index, value)[source]
replace_MathsRegion_at(index, value)[source]
get_ChemRegion()[source]
set_ChemRegion(ChemRegion)[source]
add_ChemRegion(value)[source]
insert_ChemRegion_at(index, value)[source]
replace_ChemRegion_at(index, value)[source]
get_MusicRegion()[source]
set_MusicRegion(MusicRegion)[source]
add_MusicRegion(value)[source]
insert_MusicRegion_at(index, value)[source]
replace_MusicRegion_at(index, value)[source]
get_AdvertRegion()[source]
set_AdvertRegion(AdvertRegion)[source]
add_AdvertRegion(value)[source]
insert_AdvertRegion_at(index, value)[source]
replace_AdvertRegion_at(index, value)[source]
get_NoiseRegion()[source]
set_NoiseRegion(NoiseRegion)[source]
add_NoiseRegion(value)[source]
insert_NoiseRegion_at(index, value)[source]
replace_NoiseRegion_at(index, value)[source]
get_UnknownRegion()[source]
set_UnknownRegion(UnknownRegion)[source]
add_UnknownRegion(value)[source]
insert_UnknownRegion_at(index, value)[source]
replace_UnknownRegion_at(index, value)[source]
get_CustomRegion()[source]
set_CustomRegion(CustomRegion)[source]
add_CustomRegion(value)[source]
insert_CustomRegion_at(index, value)[source]
replace_CustomRegion_at(index, value)[source]
get_imageFilename()[source]
set_imageFilename(imageFilename)[source]
get_imageWidth()[source]
set_imageWidth(imageWidth)[source]
get_imageHeight()[source]
set_imageHeight(imageHeight)[source]
get_imageXResolution()[source]
set_imageXResolution(imageXResolution)[source]
get_imageYResolution()[source]
set_imageYResolution(imageYResolution)[source]
get_imageResolutionUnit()[source]
set_imageResolutionUnit(imageResolutionUnit)[source]
get_custom()[source]
set_custom(custom)[source]
get_orientation()[source]
get_type()[source]
set_type(type_)[source]
get_primaryLanguage()[source]
set_primaryLanguage(primaryLanguage)[source]
get_secondaryLanguage()[source]
set_secondaryLanguage(secondaryLanguage)[source]
get_primaryScript()[source]
set_primaryScript(primaryScript)[source]
get_secondaryScript()[source]
set_secondaryScript(secondaryScript)[source]
get_readingDirection()[source]
set_readingDirection(readingDirection)[source]
get_textLineOrder()[source]
set_textLineOrder(textLineOrder)[source]
get_conf()[source]
set_conf(conf)[source]
validate_imageResolutionUnitType(value)[source]
validate_PageTypeSimpleType(value)[source]
validate_LanguageSimpleType(value)[source]
validate_ScriptSimpleType(value)[source]
validate_ReadingDirectionSimpleType(value)[source]
validate_TextLineOrderSimpleType(value)[source]
validate_ConfSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='PageType', pretty_print=True)[source]
to_etree(parent_element=None, name_='PageType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
property id
get_AllRegions(classes=None, order='document', depth=0)[source]

Get all the *Region elements, or only those provided by classes. Return in document order, unless order is reading-order.

Parameters:
  • classes (list) – Classes of regions that shall be returned, e.g. ['Text', 'Image']

  • order ("document"|"reading-order"|"reading-order-only") – Whether to return regions sorted by document order (document, default) or by reading order with regions not in the reading order at the end of the returned list (reading-order) or regions not in the reading order omitted (reading-order-only)

  • depth (int) – Recursive depth to look for regions at, set to 0 for all regions at any depth. Default: 0

Returns:

a list of TextRegionType, ImageRegionType, LineDrawingRegionType, GraphicRegionType, TableRegionType, ChartRegionType, MapRegionType, SeparatorRegionType, MathsRegionType, ChemRegionType, MusicRegionType, AdvertRegionType, NoiseRegionType, UnknownRegionType, and/or CustomRegionType

For example, to get all text anywhere on the page in reading order, use:

'\n'.join(line.get_TextEquiv()[0].Unicode
          for region in page.get_AllRegions(classes=['Text'], depth=0, order='reading-order')
          for line in region.get_TextLine())
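
The three order modes described above can be illustrated with a small stand-alone sketch. It does not call ocrd_models: order_regions, document_ids and reading_order_ids are hypothetical names used only for illustration, and the real method operates on region objects rather than on ID strings.

```python
def order_regions(document_ids, reading_order_ids, order="document"):
    """Mimic the three `order` modes on plain lists of region IDs."""
    if order == "document":
        return list(document_ids)
    known = set(document_ids)
    # Regions referenced by the reading order, in reading order
    ordered = [rid for rid in reading_order_ids if rid in known]
    if order == "reading-order-only":
        return ordered
    if order == "reading-order":
        # Regions missing from the reading order go to the end,
        # keeping their document order
        seen = set(ordered)
        return ordered + [rid for rid in document_ids if rid not in seen]
    raise ValueError(f"unknown order: {order!r}")
```

With document_ids ['r1', 'r2', 'r3'] and a reading order of ['r3', 'r1'], the region r2 is appended last in reading-order mode and dropped in reading-order-only mode.
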
get_AllAlternativeImages(page=True, region=True, line=True, word=True, glyph=True)[source]

Get all pc:AlternativeImage elements in the document.

Parameters:
  • page (boolean) – Get images on pc:Page level

  • region (boolean) – Get images on pc:*Region level

  • line (boolean) – Get images on pc:TextLine level

  • word (boolean) – Get images on pc:Word level

  • glyph (boolean) – Get images on pc:Glyph level

Returns:

a list of AlternativeImageType

invalidate_AlternativeImage(feature_selector=None)[source]

Remove derived images from this segment (due to changed coordinates).

If feature_selector is not None, remove only images with matching @comments, e.g. feature_selector="cropped,deskewed".
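
How a comma-separated selector might be matched against @comments can be sketched without the library. The helper split_by_feature is hypothetical, and the rule that a single shared feature is enough for a match is an assumption of this sketch, not something the text above states.

```python
def split_by_feature(comments_list, feature_selector=None):
    """Return (kept, removed) lists of @comments strings."""
    if not feature_selector:
        # No selector: every derived image is removed
        return [], list(comments_list)
    wanted = {f for f in feature_selector.split(",") if f}
    kept, removed = [], []
    for comments in comments_list:
        features = set((comments or "").split(","))
        # Assumption: one shared feature is enough for a match
        (removed if features & wanted else kept).append(comments)
    return kept, removed
```

Under that assumption, an image annotated only as binarized survives feature_selector="cropped,deskewed", while one annotated binarized,cropped is removed.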

set_Border(Border)[source]

Set the coordinate polygon from the given BorderType object. This also invalidates the element's pc:AlternativeImage entries, because they were cropped with the bounding box of the previous polygon.

get_AllTextLines(region_order='document', respect_textline_order=True)[source]

Return all TextLine elements in the document.

Parameters:
  • region_order ("document"|"reading-order"|"reading-order-only") – Whether to return regions sorted by document order (document, default) or by reading order with regions not in the reading order at the end of the returned list (reading-order) or regions not in the reading order omitted (reading-order-only)

  • respect_textline_order (boolean) – Whether to respect @textLineOrder attribute

Returns:

a list of TextLineType
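
A possible way to use this method is sketched below. It assumes page is a PageType object (for example the result of get_Page() on a parsed document); line_texts is a hypothetical helper, and taking the first TextEquiv of each line is a simplification.

```python
def line_texts(page):
    """Return (line id, text) pairs for all text lines of a page."""
    pairs = []
    for line in page.get_AllTextLines(region_order="reading-order",
                                      respect_textline_order=True):
        equivs = line.get_TextEquiv()
        # A line without any TextEquiv has no recognised text yet
        text = equivs[0].get_Unicode() if equivs else None
        pairs.append((line.get_id(), text))
    return pairs
```

Lines without text are reported with None instead of raising an IndexError.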

get_ReadingOrderGroups() → dict[source]

Aggregate the recursive ReadingOrder into a dictionary, mapping each regionRef (i.e. segment @id) to its referring group object, i.e. one of:

  • RegionRefType

  • RegionRefIndexedType

  • OrderedGroupType

  • OrderedGroupIndexedType

  • UnorderedGroupType

  • UnorderedGroupIndexedType
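
The returned dictionary can be used to look up where a region sits in the reading order. The following is a minimal sketch, assuming page is a PageType object; reading_order_index is a hypothetical helper, and only the *Indexed types are expected to offer get_index().

```python
def reading_order_index(page, region_id):
    """Return the index of a region within its group, or None."""
    groups = page.get_ReadingOrderGroups()
    ref = groups.get(region_id)
    if ref is None:
        return None  # region is not part of the reading order
    # Only the *Indexed types carry an index
    get_index = getattr(ref, "get_index", None)
    return get_index() if callable(get_index) else None
```

A return value of None therefore covers both regions outside the reading order and members of unordered groups.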

set_orientation(orientation)[source]

Set the deskewing angle to the given orientation value. This also invalidates the element's pc:AlternativeImage entries, because they were rotated and enlarged with the previous angle.

class ocrd_models.ocrd_page.PcGtsType(pcGtsId=None, Metadata=None, Page=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Metadata()[source]
set_Metadata(Metadata)[source]
get_Page()[source]
set_Page(Page)[source]
get_pcGtsId()[source]
set_pcGtsId(pcGtsId)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='PcGtsType', pretty_print=True)[source]
to_etree(parent_element=None, name_='PcGtsType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
property id
get_AllAlternativeImagePaths(page=True, region=True, line=True, word=True, glyph=True)[source]

Get all the pc:AlternativeImage/@filename paths referenced in the PAGE-XML document.

Parameters:
  • page (boolean) – Get images on pc:Page level

  • region (boolean) – Get images on pc:*Region level

  • line (boolean) – Get images on pc:TextLine level

  • word (boolean) – Get images on pc:Word level

  • glyph (boolean) – Get images on pc:Glyph level

Returns:

a list of image filename strings
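
One use of this list is a consistency check for derived images. The sketch below assumes pcgts is a PcGtsType object and that the returned paths are relative to some base directory; missing_alternative_images and base_dir are hypothetical names introduced for illustration.

```python
import os


def missing_alternative_images(pcgts, base_dir="."):
    """List referenced AlternativeImage files that do not exist on disk."""
    paths = pcgts.get_AllAlternativeImagePaths(
        page=True, region=True, line=True, word=False, glyph=False)
    # Assumption: relative paths are resolved against base_dir
    return [path for path in paths
            if not os.path.exists(os.path.join(base_dir, path))]
```

Word and glyph levels are switched off here only to show the flags; set them to True to check every level.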

prune_ReadingOrder()[source]

Remove any empty ReadingOrder elements

class ocrd_models.ocrd_page.PrintSpaceType(Coords=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

PrintSpaceType – Determines the effective area on the paper of a printed page. Its size is equal for all pages of a book (exceptions: title page, multi-page pictures). It contains all living elements (except marginals) like body type, footnotes, headings, running titles. It does not contain page numbers (if not part of the running title), marginals, signature marks, preview words.

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Coords()[source]
set_Coords(Coords)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='PrintSpaceType', pretty_print=True)[source]
to_etree(parent_element=None, name_='PrintSpaceType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.ReadingOrderType(conf=None, OrderedGroup=None, UnorderedGroup=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

ReadingOrderType – Definition of the reading order within the page. To express a reading order between elements they have to be included in an OrderedGroup. Groups may contain further groups.

  • conf – Confidence value (between 0 and 1)

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_OrderedGroup()[source]
set_OrderedGroup(OrderedGroup)[source]
get_UnorderedGroup()[source]
set_UnorderedGroup(UnorderedGroup)[source]
get_conf()[source]
set_conf(conf)[source]
validate_ConfSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='ReadingOrderType', pretty_print=True)[source]
to_etree(parent_element=None, name_='ReadingOrderType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.RegionRefIndexedType(index=None, regionRef=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

RegionRefIndexedType – Numbered region

  • index – Position (order number) of this item within the current hierarchy level.

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_index()[source]
set_index(index)[source]
get_regionRef()[source]
set_regionRef(regionRef)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='RegionRefIndexedType', pretty_print=True)[source]
to_etree(parent_element=None, name_='RegionRefIndexedType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.RegionRefType(regionRef=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_regionRef()[source]
set_regionRef(regionRef)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='RegionRefType', pretty_print=True)[source]
to_etree(parent_element=None, name_='RegionRefType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.RegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, extensiontype_=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

custom – For generic use

  • continuation – Is this region a continuation of another region (in previous column or page, for example)?

  • AlternativeImage – Alternative region images (e.g. black-and-white).

  • Labels – Semantic labels / tags

  • Roles – Roles the region takes (e.g. in context of a parent region).

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_AlternativeImage()[source]
set_AlternativeImage(AlternativeImage)[source]
add_AlternativeImage(value)[source]
insert_AlternativeImage_at(index, value)[source]
replace_AlternativeImage_at(index, value)[source]
get_Coords()[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_Roles()[source]
set_Roles(Roles)[source]
get_TextRegion()[source]
set_TextRegion(TextRegion)[source]
add_TextRegion(value)[source]
insert_TextRegion_at(index, value)[source]
replace_TextRegion_at(index, value)[source]
get_ImageRegion()[source]
set_ImageRegion(ImageRegion)[source]
add_ImageRegion(value)[source]
insert_ImageRegion_at(index, value)[source]
replace_ImageRegion_at(index, value)[source]
get_LineDrawingRegion()[source]
set_LineDrawingRegion(LineDrawingRegion)[source]
add_LineDrawingRegion(value)[source]
insert_LineDrawingRegion_at(index, value)[source]
replace_LineDrawingRegion_at(index, value)[source]
get_GraphicRegion()[source]
set_GraphicRegion(GraphicRegion)[source]
add_GraphicRegion(value)[source]
insert_GraphicRegion_at(index, value)[source]
replace_GraphicRegion_at(index, value)[source]
get_TableRegion()[source]
set_TableRegion(TableRegion)[source]
add_TableRegion(value)[source]
insert_TableRegion_at(index, value)[source]
replace_TableRegion_at(index, value)[source]
get_ChartRegion()[source]
set_ChartRegion(ChartRegion)[source]
add_ChartRegion(value)[source]
insert_ChartRegion_at(index, value)[source]
replace_ChartRegion_at(index, value)[source]
get_SeparatorRegion()[source]
set_SeparatorRegion(SeparatorRegion)[source]
add_SeparatorRegion(value)[source]
insert_SeparatorRegion_at(index, value)[source]
replace_SeparatorRegion_at(index, value)[source]
get_MathsRegion()[source]
set_MathsRegion(MathsRegion)[source]
add_MathsRegion(value)[source]
insert_MathsRegion_at(index, value)[source]
replace_MathsRegion_at(index, value)[source]
get_ChemRegion()[source]
set_ChemRegion(ChemRegion)[source]
add_ChemRegion(value)[source]
insert_ChemRegion_at(index, value)[source]
replace_ChemRegion_at(index, value)[source]
get_MusicRegion()[source]
set_MusicRegion(MusicRegion)[source]
add_MusicRegion(value)[source]
insert_MusicRegion_at(index, value)[source]
replace_MusicRegion_at(index, value)[source]
get_AdvertRegion()[source]
set_AdvertRegion(AdvertRegion)[source]
add_AdvertRegion(value)[source]
insert_AdvertRegion_at(index, value)[source]
replace_AdvertRegion_at(index, value)[source]
get_NoiseRegion()[source]
set_NoiseRegion(NoiseRegion)[source]
add_NoiseRegion(value)[source]
insert_NoiseRegion_at(index, value)[source]
replace_NoiseRegion_at(index, value)[source]
get_UnknownRegion()[source]
set_UnknownRegion(UnknownRegion)[source]
add_UnknownRegion(value)[source]
insert_UnknownRegion_at(index, value)[source]
replace_UnknownRegion_at(index, value)[source]
get_CustomRegion()[source]
set_CustomRegion(CustomRegion)[source]
add_CustomRegion(value)[source]
insert_CustomRegion_at(index, value)[source]
replace_CustomRegion_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
get_continuation()[source]
set_continuation(continuation)[source]
get_extensiontype_()[source]
set_extensiontype_(extensiontype_)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='RegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='RegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
invalidate_AlternativeImage(feature_selector=None)[source]

Remove derived images from this segment (due to changed coordinates).

If feature_selector is not None, remove only images with matching @comments, e.g. feature_selector="cropped,deskewed".

set_Coords(Coords)[source]

Set the coordinate polygon from the given CoordsType object. This also invalidates the element's pc:AlternativeImage entries, because they were cropped with the bounding box of the previous polygon.

class ocrd_models.ocrd_page.RelationType(id=None, type_=None, custom=None, comments=None, Labels=None, SourceRegionRef=None, TargetRegionRef=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

RelationType – One-to-one relation between two layout objects. Use ‘link’ for loose relations and ‘join’ for strong relations (where something is fragmented, for instance).

Examples for ‘link’:

  • caption - image

  • floating - paragraph

  • paragraph - paragraph (when a paragraph is split across columns and the last word of the first paragraph DOES NOT continue in the second paragraph)

  • drop-cap - paragraph (when the drop-cap is a whole word)

Examples for ‘join’:

  • word - word (separated word at the end of a line)

  • drop-cap - paragraph (when the drop-cap is not a whole word)

  • paragraph - paragraph (when a paragraph is split across columns and the last word of the first paragraph DOES continue in the second paragraph)

  • custom – For generic use

  • Labels – Semantic labels / tags

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_SourceRegionRef()[source]
set_SourceRegionRef(SourceRegionRef)[source]
get_TargetRegionRef()[source]
set_TargetRegionRef(TargetRegionRef)[source]
get_id()[source]
set_id(id)[source]
get_type()[source]
set_type(type_)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
validate_typeType1(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='RelationType', pretty_print=True)[source]
to_etree(parent_element=None, name_='RelationType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.RelationsType(Relation=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

RelationsType – Container for one-to-one relations between layout objects (for example: DropCap - paragraph, caption - image).

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Relation()[source]
set_Relation(Relation)[source]
add_Relation(value)[source]
insert_Relation_at(index, value)[source]
replace_Relation_at(index, value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='RelationsType', pretty_print=True)[source]
to_etree(parent_element=None, name_='RelationsType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.RolesType(TableCellRole=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

TableCellRole – Data for a region that takes on the role of a table cell within a parent table region.

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_TableCellRole()[source]
set_TableCellRole(TableCellRole)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='RolesType', pretty_print=True)[source]
to_etree(parent_element=None, name_='RolesType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.SeparatorRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, colour=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

SeparatorRegionType – Separators are lines that lie between columns and paragraphs and can be used to logically separate different articles from each other.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • colour – The colour of the separator
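
The orientation range given above excludes -180 and includes 180. A caller that computes angles elsewhere may want to wrap them into that interval before storing them. This is a caller-side sketch only; normalize_orientation is a hypothetical helper, and whether set_orientation performs any such wrapping itself is not stated here.

```python
def normalize_orientation(angle):
    """Wrap an angle in degrees into the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0  # now in [-180, 180)
    # Move the excluded lower bound to the included upper bound
    return 180.0 if wrapped == -180.0 else wrapped
```

For instance, 190 degrees clockwise is stored as -170, and both -180 and 180 end up as 180.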

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_orientation()[source]
get_colour()[source]
set_colour(colour)[source]
validate_ColourSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='SeparatorRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='SeparatorRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set the deskewing angle to the given orientation value. This also invalidates the element's pc:AlternativeImage entries, because they were rotated and enlarged with the previous angle.

class ocrd_models.ocrd_page.TableCellRoleType(rowIndex=None, columnIndex=None, rowSpan=None, colSpan=None, header=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

rowIndex – Cell position in table starting with row 0

  • columnIndex – Cell position in table starting with column 0

  • rowSpan – Number of rows the cell spans (optional; default is 1)

  • colSpan – Number of columns the cell spans (optional; default is 1)

  • header – Is the cell a column or row header?
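
The zero-based indices and optional spans translate into grid positions as sketched below. The helper covered_cells is hypothetical; it assumes role is a TableCellRoleType object and treats an unset span as 1, following the defaults stated above.

```python
def covered_cells(role):
    """Return the (row, column) grid positions a table cell occupies."""
    row, col = role.get_rowIndex(), role.get_columnIndex()
    # Spans are optional; an unset span counts as 1
    row_span = role.get_rowSpan() or 1
    col_span = role.get_colSpan() or 1
    return [(r, c)
            for r in range(row, row + row_span)
            for c in range(col, col + col_span)]
```

A cell at row 1, column 2 with rowSpan 2 and no colSpan thus covers positions (1, 2) and (2, 2).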

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_rowIndex()[source]
set_rowIndex(rowIndex)[source]
get_columnIndex()[source]
set_columnIndex(columnIndex)[source]
get_rowSpan()[source]
set_rowSpan(rowSpan)[source]
get_colSpan()[source]
set_colSpan(colSpan)[source]
get_header()[source]
set_header(header)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='TableCellRoleType', pretty_print=True)[source]
to_etree(parent_element=None, name_='TableCellRoleType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.TableRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, rows=None, columns=None, lineColour=None, bgColour=None, lineSeparators=None, embText=None, Grid=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

TableRegionType – Tabular data in any form is represented with a table region. Rows and columns may or may not have separator lines; these lines are not separator regions.

  • orientation – The angle the rectangle encapsulating a region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). Range: -179.999,180

  • rows – The number of rows present in the table

  • columns – The number of columns present in the table

  • lineColour – The colour of the lines used in the region

  • bgColour – The background colour of the region

  • lineSeparators – Specifies the presence of line separators

  • embText – Specifies whether the region also contains text

  • Grid – Table grid (visible or virtual grid lines)

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_Grid()[source]
set_Grid(Grid)[source]
get_orientation()[source]
get_rows()[source]
set_rows(rows)[source]
get_columns()[source]
set_columns(columns)[source]
get_lineColour()[source]
set_lineColour(lineColour)[source]
get_bgColour()[source]
set_bgColour(bgColour)[source]
get_lineSeparators()[source]
set_lineSeparators(lineSeparators)[source]
get_embText()[source]
set_embText(embText)[source]
validate_ColourSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='TableRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='TableRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set the deskewing angle to the given orientation value. This also invalidates the element's pc:AlternativeImage entries, because they were rotated and enlarged with the previous angle.

class ocrd_models.ocrd_page.TextEquivType(index=None, conf=None, dataType=None, dataTypeDetails=None, comments=None, PlainText=None, Unicode=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

index – Used for sort order in case multiple TextEquivs are defined. The text content with the lowest index should be interpreted as the main text content.

  • conf – OCR confidence value (between 0 and 1)

  • dataType – Type of text content (is it free text or a number, for instance). This is only a descriptive attribute, the text type is not checked during XML validation.

  • dataTypeDetails – Refinement for dataType attribute. Can be a regular expression, for instance.

  • PlainText – Text in a “simple” form (ASCII or extended ASCII as mostly used for typing). I.e. no use of special characters for ligatures (should be stored as two separate characters) etc.

  • Unicode – Correct encoding of the original, always using the corresponding Unicode code point. I.e. ligatures have to be represented as one character etc.
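
The rule that the lowest index marks the main text content can be applied as in the following sketch. main_text_equiv is a hypothetical helper that takes a list of TextEquivType objects; falling back to the first entry when no index is set is an assumption of this sketch, not part of the description above.

```python
def main_text_equiv(text_equivs):
    """Pick the TextEquiv that should be read as the main text."""
    if not text_equivs:
        return None
    indexed = [te for te in text_equivs if te.get_index() is not None]
    if indexed:
        # The lowest index marks the main text content
        return min(indexed, key=lambda te: te.get_index())
    # Assumption: without any index, use document order
    return text_equivs[0]
```

The result can then be read with get_Unicode() or get_PlainText(), depending on which form is needed.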

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_PlainText()[source]
set_PlainText(PlainText)[source]
get_Unicode()[source]
set_Unicode(Unicode)[source]
get_index()[source]
set_index(index)[source]
get_conf()[source]
set_conf(conf)[source]
get_dataType()[source]
set_dataType(dataType)[source]
get_dataTypeDetails()[source]
set_dataTypeDetails(dataTypeDetails)[source]
get_comments()[source]
set_comments(comments)[source]
validate_indexType(value)[source]
validate_ConfSimpleType(value)[source]
validate_TextDataTypeSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:None="http://www.w3.org/2001/XMLSchema" ', name_='TextEquivType', pretty_print=True)[source]
to_etree(parent_element=None, name_='TextEquivType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.TextLineType(id=None, primaryLanguage=None, primaryScript=None, secondaryScript=None, readingDirection=None, production=None, custom=None, comments=None, index=None, AlternativeImage=None, Coords=None, Baseline=None, Word=None, TextEquiv=None, TextStyle=None, UserDefined=None, Labels=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • primaryLanguage – Overrides primaryLanguage attribute of parent text region

  • primaryScript – The primary script used in the text line

  • secondaryScript – The secondary script used in the text line

  • readingDirection – The direction in which text within the line should be read (order of words and characters).

  • production – Overrides the production attribute of the parent text region

  • custom – For generic use

  • index – Position (order number) of this text line within the parent text region.

  • AlternativeImage – Alternative text line images (e.g. black-and-white)

  • Baseline – Multiple connected points that mark the baseline of the glyphs

  • Labels – Semantic labels / tags

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_AlternativeImage()[source]
set_AlternativeImage(AlternativeImage)[source]
add_AlternativeImage(value)[source]
insert_AlternativeImage_at(index, value)[source]
replace_AlternativeImage_at(index, value)[source]
get_Coords()[source]
get_Baseline()[source]
set_Baseline(Baseline)[source]
get_Word()[source]
set_Word(Word)[source]
add_Word(value)[source]
insert_Word_at(index, value)[source]
replace_Word_at(index, value)[source]
get_TextEquiv()[source]
set_TextEquiv(TextEquiv)[source]
add_TextEquiv(value)[source]
insert_TextEquiv_at(index, value)[source]
replace_TextEquiv_at(index, value)[source]
get_TextStyle()[source]
set_TextStyle(TextStyle)[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_primaryLanguage()[source]
set_primaryLanguage(primaryLanguage)[source]
get_primaryScript()[source]
set_primaryScript(primaryScript)[source]
get_secondaryScript()[source]
set_secondaryScript(secondaryScript)[source]
get_readingDirection()[source]
set_readingDirection(readingDirection)[source]
get_production()[source]
set_production(production)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
get_index()[source]
set_index(index)[source]
validate_LanguageSimpleType(value)[source]
validate_ScriptSimpleType(value)[source]
validate_ReadingDirectionSimpleType(value)[source]
validate_ProductionSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='TextLineType', pretty_print=True)[source]
to_etree(parent_element=None, name_='TextLineType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
invalidate_AlternativeImage(feature_selector=None)[source]

Remove derived images from this segment (due to changed coordinates).

If feature_selector is not None, remove only images with matching @comments, e.g. feature_selector="cropped,deskewed".
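How such a selector could be matched against the @comments of derived images is sketched below. This is an illustration only and does not call ocrd_models: the function select_for_removal is hypothetical, and the matching rule (an image matches if it carries at least one of the selected features) is an assumption that should be checked against the library source.

```python
from typing import List, Optional


def select_for_removal(image_comments: List[str],
                       feature_selector: Optional[str] = None) -> List[bool]:
    """Return one flag per image: True if the image would be removed.

    Assumption: @comments holds a comma-separated feature list, and an image
    matches when it carries at least one of the selected features.
    """
    if not feature_selector:
        # No selector given: every derived image is removed.
        return [True] * len(image_comments)
    wanted = {feature for feature in feature_selector.split(",") if feature}
    return [bool(wanted & set(comments.split(",")))
            for comments in image_comments]


# Hypothetical @comments values of three derived images.
comments = ["binarized", "binarized,cropped", "cropped,deskewed"]
flags = select_for_removal(comments, "cropped,deskewed")
print(flags)
```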

set_Coords(Coords)[source]

Set the coordinate polygon from the given CoordsType object. Moreover, invalidate this segment’s pc:AlternativeImage elements (because they will have been cropped with a bounding box of the previous polygon).

class ocrd_models.ocrd_page.TextRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, orientation=None, type_=None, leading=None, readingDirection=None, textLineOrder=None, readingOrientation=None, indented=None, align=None, primaryLanguage=None, secondaryLanguage=None, primaryScript=None, secondaryScript=None, production=None, TextLine=None, TextEquiv=None, TextStyle=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

TextRegionType – Pure text is represented as a text region. This includes drop capitals, but practically ornate text may be considered as a graphic.

  • orientation – The angle by which the rectangle encapsulating the region has to be rotated in clockwise direction in order to correct the present skew (negative values indicate anti-clockwise rotation). (The rotated image can be further referenced via “AlternativeImage”.) Range: -179.999 to 180

  • type – The nature of the text in the region

  • leading – The degree of space in points between the lines of text (line spacing)

  • readingDirection – The direction in which text within lines should be read (order of words and characters), in addition to “textLineOrder”.

  • textLineOrder – The order of text lines within the block, in addition to “readingDirection”.

  • readingOrientation – The angle by which the baseline of text within the region has to be rotated (relative to the rectangle encapsulating the region) in clockwise direction in order to correct the present skew, in addition to “orientation” (negative values indicate anti-clockwise rotation). Range: -179.999 to 180

  • indented – Defines whether a region of text is indented or not

  • align – Text align

  • primaryLanguage – The primary language used in the region

  • secondaryLanguage – The secondary language used in the region

  • primaryScript – The primary script used in the region

  • secondaryScript – The secondary script used in the region
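Since orientation and readingOrientation are restricted to the range given above, an angle computed elsewhere may need to be wrapped before it is stored. The following is a minimal sketch: normalize_orientation is a hypothetical helper (not part of ocrd_models), and it treats the range as the half-open interval (-180, 180], which is an assumption about how the stated bounds are meant.

```python
def normalize_orientation(angle: float) -> float:
    """Map an angle in degrees into the interval (-180, 180]."""
    wrapped = angle % 360.0      # now in [0, 360)
    if wrapped > 180.0:
        wrapped -= 360.0         # move (180, 360) down to (-180, 0)
    return wrapped


# 270 degrees clockwise is the same rotation as 90 degrees anti-clockwise.
print(normalize_orientation(270))
```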

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_TextLine()[source]
set_TextLine(TextLine)[source]
add_TextLine(value)[source]
insert_TextLine_at(index, value)[source]
replace_TextLine_at(index, value)[source]
get_TextEquiv()[source]
set_TextEquiv(TextEquiv)[source]
add_TextEquiv(value)[source]
insert_TextEquiv_at(index, value)[source]
replace_TextEquiv_at(index, value)[source]
get_TextStyle()[source]
set_TextStyle(TextStyle)[source]
get_orientation()[source]
get_type()[source]
set_type(type_)[source]
get_leading()[source]
set_leading(leading)[source]
get_readingDirection()[source]
set_readingDirection(readingDirection)[source]
get_textLineOrder()[source]
set_textLineOrder(textLineOrder)[source]
get_readingOrientation()[source]
set_readingOrientation(readingOrientation)[source]
get_indented()[source]
set_indented(indented)[source]
get_align()[source]
set_align(align)[source]
get_primaryLanguage()[source]
set_primaryLanguage(primaryLanguage)[source]
get_secondaryLanguage()[source]
set_secondaryLanguage(secondaryLanguage)[source]
get_primaryScript()[source]
set_primaryScript(primaryScript)[source]
get_secondaryScript()[source]
set_secondaryScript(secondaryScript)[source]
get_production()[source]
set_production(production)[source]
validate_TextTypeSimpleType(value)[source]
validate_ReadingDirectionSimpleType(value)[source]
validate_TextLineOrderSimpleType(value)[source]
validate_AlignSimpleType(value)[source]
validate_LanguageSimpleType(value)[source]
validate_ScriptSimpleType(value)[source]
validate_ProductionSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='TextRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='TextRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
set_orientation(orientation)[source]

Set the deskewing angle to the given orientation number. Moreover, invalidate this region’s pc:AlternativeImage elements (because they will have been rotated and enlarged with the angle of the previous value).

class ocrd_models.ocrd_page.TextStyleType(fontFamily=None, serif=None, monospace=None, fontSize=None, xHeight=None, kerning=None, textColour=None, textColourRgb=None, bgColour=None, bgColourRgb=None, reverseVideo=None, bold=None, italic=None, underlined=None, underlineStyle=None, subscript=None, superscript=None, strikethrough=None, smallCaps=None, letterSpaced=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

TextStyleType – Monospace (fixed-pitch, non-proportional) or proportional font.

  • fontFamily – For instance: Arial, Times New Roman. Add more information if necessary (e.g. blackletter, antiqua).

  • serif – Serif or sans-serif typeface.

  • fontSize – The size of the characters in points.

  • xHeight – The x-height or corpus size refers to the distance between the baseline and the mean line of lower-case letters in a typeface. The unit is assumed to be pixels.

  • kerning – The degree of space (in points) between the characters in a string of text.

  • textColourRgb – Text colour in RGB encoded format (red value) + (256 x green value) + (65536 x blue value).

  • bgColour – Background colour

  • bgColourRgb – Background colour in RGB encoded format (red value) + (256 x green value) + (65536 x blue value).

  • reverseVideo – Specifies whether the colour of the text appears reversed against a background colour.

  • underlineStyle – Line style details if “underlined” is TRUE
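The integer encoding described for textColourRgb and bgColourRgb can be written out as a short sketch. The helper names encode_rgb and decode_rgb are hypothetical and not part of ocrd_models; the 0..255 channel range is an assumption implied by the factors 256 and 65536 rather than stated in the schema text.

```python
from typing import Tuple


def encode_rgb(red: int, green: int, blue: int) -> int:
    """Pack channels as red + 256 * green + 65536 * blue."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            # Assumption: each channel is an 8-bit value.
            raise ValueError("channel values must be in 0..255")
    return red + 256 * green + 65536 * blue


def decode_rgb(value: int) -> Tuple[int, int, int]:
    """Unpack an encoded value into (red, green, blue)."""
    return value % 256, (value // 256) % 256, (value // 65536) % 256


# Round trip for an arbitrary colour.
encoded = encode_rgb(18, 52, 86)
print(encoded, decode_rgb(encoded))
```

Note that red occupies the lowest byte here, so the resulting integer is not the same as the common 0xRRGGBB hexadecimal notation.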

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_fontFamily()[source]
set_fontFamily(fontFamily)[source]
get_serif()[source]
set_serif(serif)[source]
get_monospace()[source]
set_monospace(monospace)[source]
get_fontSize()[source]
set_fontSize(fontSize)[source]
get_xHeight()[source]
set_xHeight(xHeight)[source]
get_kerning()[source]
set_kerning(kerning)[source]
get_textColour()[source]
set_textColour(textColour)[source]
get_textColourRgb()[source]
set_textColourRgb(textColourRgb)[source]
get_bgColour()[source]
set_bgColour(bgColour)[source]
get_bgColourRgb()[source]
set_bgColourRgb(bgColourRgb)[source]
get_reverseVideo()[source]
set_reverseVideo(reverseVideo)[source]
get_bold()[source]
set_bold(bold)[source]
get_italic()[source]
set_italic(italic)[source]
get_underlined()[source]
set_underlined(underlined)[source]
get_underlineStyle()[source]
set_underlineStyle(underlineStyle)[source]
get_subscript()[source]
set_subscript(subscript)[source]
get_superscript()[source]
set_superscript(superscript)[source]
get_strikethrough()[source]
set_strikethrough(strikethrough)[source]
get_smallCaps()[source]
set_smallCaps(smallCaps)[source]
get_letterSpaced()[source]
set_letterSpaced(letterSpaced)[source]
validate_ColourSimpleType(value)[source]
validate_UnderlineStyleSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='TextStyleType', pretty_print=True)[source]
to_etree(parent_element=None, name_='TextStyleType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.UnknownRegionType(id=None, custom=None, comments=None, continuation=None, AlternativeImage=None, Coords=None, UserDefined=None, Labels=None, Roles=None, TextRegion=None, ImageRegion=None, LineDrawingRegion=None, GraphicRegion=None, TableRegion=None, ChartRegion=None, SeparatorRegion=None, MathsRegion=None, ChemRegion=None, MusicRegion=None, AdvertRegion=None, NoiseRegion=None, UnknownRegion=None, CustomRegion=None, gds_collector_=None, **kwargs_)[source]

Bases: RegionType

UnknownRegionType – To be used if the region type cannot be ascertained.

member_data_items_ = []
subclass = None
superclass

alias of RegionType

static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='UnknownRegionType', pretty_print=True)[source]
to_etree(parent_element=None, name_='UnknownRegionType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.UnorderedGroupIndexedType(id=None, regionRef=None, index=None, caption=None, type_=None, continuation=None, custom=None, comments=None, UserDefined=None, Labels=None, RegionRef=None, OrderedGroup=None, UnorderedGroup=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

UnorderedGroupIndexedType – Indexed group containing unordered elements

  • regionRef – Optional link to a parent region of nested regions. The parent region doubles as reading order group. Only the nested regions should be allowed as group members.

  • index – Position (order number) of this item within the current hierarchy level.

  • continuation – Is this group a continuation of another group (from previous column or page, for example)?

  • custom – For generic use

  • Labels – Semantic labels / tags

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_RegionRef()[source]
set_RegionRef(RegionRef)[source]
add_RegionRef(value)[source]
insert_RegionRef_at(index, value)[source]
replace_RegionRef_at(index, value)[source]
get_OrderedGroup()[source]
set_OrderedGroup(OrderedGroup)[source]
add_OrderedGroup(value)[source]
insert_OrderedGroup_at(index, value)[source]
replace_OrderedGroup_at(index, value)[source]
get_UnorderedGroup()[source]
set_UnorderedGroup(UnorderedGroup)[source]
add_UnorderedGroup(value)[source]
insert_UnorderedGroup_at(index, value)[source]
replace_UnorderedGroup_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_regionRef()[source]
set_regionRef(regionRef)[source]
get_index()[source]
set_index(index)[source]
get_caption()[source]
set_caption(caption)[source]
get_type()[source]
set_type(type_)[source]
get_continuation()[source]
set_continuation(continuation)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
validate_GroupTypeSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='UnorderedGroupIndexedType', pretty_print=True)[source]
to_etree(parent_element=None, name_='UnorderedGroupIndexedType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
get_UnorderedGroupChildren()[source]

List all non-metadata children of this group.

class ocrd_models.ocrd_page.UnorderedGroupType(id=None, regionRef=None, caption=None, type_=None, continuation=None, custom=None, comments=None, UserDefined=None, Labels=None, RegionRef=None, OrderedGroup=None, UnorderedGroup=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

UnorderedGroupType – Numbered group (contains unordered elements)

  • regionRef – Optional link to a parent region of nested regions. The parent region doubles as reading order group. Only the nested regions should be allowed as group members.

  • continuation – Is this group a continuation of another group (from previous column or page, for example)?

  • custom – For generic use

  • Labels – Semantic labels / tags

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_RegionRef()[source]
set_RegionRef(RegionRef)[source]
add_RegionRef(value)[source]
insert_RegionRef_at(index, value)[source]
replace_RegionRef_at(index, value)[source]
get_OrderedGroup()[source]
set_OrderedGroup(OrderedGroup)[source]
add_OrderedGroup(value)[source]
insert_OrderedGroup_at(index, value)[source]
replace_OrderedGroup_at(index, value)[source]
get_UnorderedGroup()[source]
set_UnorderedGroup(UnorderedGroup)[source]
add_UnorderedGroup(value)[source]
insert_UnorderedGroup_at(index, value)[source]
replace_UnorderedGroup_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_regionRef()[source]
set_regionRef(regionRef)[source]
get_caption()[source]
set_caption(caption)[source]
get_type()[source]
set_type(type_)[source]
get_continuation()[source]
set_continuation(continuation)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
validate_GroupTypeSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='UnorderedGroupType', pretty_print=True)[source]
to_etree(parent_element=None, name_='UnorderedGroupType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
get_UnorderedGroupChildren()[source]

List all non-metadata children of an UnorderedGroupType

class ocrd_models.ocrd_page.UserAttributeType(name=None, description=None, type_=None, value=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

UserAttributeType – Structured custom data defined by name, type and value.

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_name()[source]
set_name(name)[source]
get_description()[source]
set_description(description)[source]
get_type()[source]
set_type(type_)[source]
get_value()[source]
set_value(value)[source]
validate_typeType3(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='UserAttributeType', pretty_print=True)[source]
to_etree(parent_element=None, name_='UserAttributeType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.UserDefinedType(UserAttribute=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

UserDefinedType – Container for user-defined attributes
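To show how the container and its attributes relate in the serialized document, here is a minimal sketch that builds the corresponding XML with the standard library instead of ocrd_models. The namespace URI is the one appearing in the export() signatures of this module; the attribute name, description and value are hypothetical, and "xsd:string" as a type value is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the export() signatures of this module.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
ET.register_namespace("pc", PAGE_NS)

# One UserDefined container holding a single UserAttribute.
user_defined = ET.Element(f"{{{PAGE_NS}}}UserDefined")
ET.SubElement(
    user_defined,
    f"{{{PAGE_NS}}}UserAttribute",
    {
        "name": "scanOperator",                   # hypothetical attribute name
        "description": "Who digitised the page",  # hypothetical description
        "type": "xsd:string",                     # assumed type value
        "value": "jdoe",                          # hypothetical value
    },
)

xml_text = ET.tostring(user_defined, encoding="unicode")
print(xml_text)
```

With ocrd_models itself, the equivalent would presumably go through the UserAttributeType constructor and add_UserAttribute() listed for these classes.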

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_UserAttribute()[source]
set_UserAttribute(UserAttribute)[source]
add_UserAttribute(value)[source]
insert_UserAttribute_at(index, value)[source]
replace_UserAttribute_at(index, value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='UserDefinedType', pretty_print=True)[source]
to_etree(parent_element=None, name_='UserDefinedType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
class ocrd_models.ocrd_page.WordType(id=None, language=None, primaryScript=None, secondaryScript=None, readingDirection=None, production=None, custom=None, comments=None, AlternativeImage=None, Coords=None, Glyph=None, TextEquiv=None, TextStyle=None, UserDefined=None, Labels=None, gds_collector_=None, **kwargs_)[source]

Bases: GeneratedsSuper

  • language – Overrides primaryLanguage attribute of parent line and/or text region

  • primaryScript – The primary script used in the word

  • secondaryScript – The secondary script used in the word

  • readingDirection – The direction in which text within the word should be read (order of characters).

  • production – Overrides the production attribute of the parent text line and/or text region.

  • custom – For generic use

  • AlternativeImage – Alternative word images (e.g. black-and-white)

  • Labels – Semantic labels / tags

member_data_items_ = [<ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>, <ocrd_models.ocrd_page_generateds.MemberSpec_ object>]
subclass = None
superclass = None
static factory(*args_, **kwargs_)[source]
get_ns_prefix_()[source]
set_ns_prefix_(ns_prefix)[source]
get_AlternativeImage()[source]
set_AlternativeImage(AlternativeImage)[source]
add_AlternativeImage(value)[source]
insert_AlternativeImage_at(index, value)[source]
replace_AlternativeImage_at(index, value)[source]
get_Coords()[source]
get_Glyph()[source]
set_Glyph(Glyph)[source]
add_Glyph(value)[source]
insert_Glyph_at(index, value)[source]
replace_Glyph_at(index, value)[source]
get_TextEquiv()[source]
set_TextEquiv(TextEquiv)[source]
add_TextEquiv(value)[source]
insert_TextEquiv_at(index, value)[source]
replace_TextEquiv_at(index, value)[source]
get_TextStyle()[source]
set_TextStyle(TextStyle)[source]
get_UserDefined()[source]
set_UserDefined(UserDefined)[source]
get_Labels()[source]
set_Labels(Labels)[source]
add_Labels(value)[source]
insert_Labels_at(index, value)[source]
replace_Labels_at(index, value)[source]
get_id()[source]
set_id(id)[source]
get_language()[source]
set_language(language)[source]
get_primaryScript()[source]
set_primaryScript(primaryScript)[source]
get_secondaryScript()[source]
set_secondaryScript(secondaryScript)[source]
get_readingDirection()[source]
set_readingDirection(readingDirection)[source]
get_production()[source]
set_production(production)[source]
get_custom()[source]
set_custom(custom)[source]
get_comments()[source]
set_comments(comments)[source]
validate_LanguageSimpleType(value)[source]
validate_ScriptSimpleType(value)[source]
validate_ReadingDirectionSimpleType(value)[source]
validate_ProductionSimpleType(value)[source]
has__content()[source]
export(outfile, level, namespaceprefix_='', namespacedef_='xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"', name_='WordType', pretty_print=True)[source]
to_etree(parent_element=None, name_='WordType', mapping_=None, reverse_mapping_=None, nsmap_=None)[source]
build(node, gds_collector_=None)[source]
invalidate_AlternativeImage(feature_selector=None)[source]

Remove derived images from this segment (due to changed coordinates).

If feature_selector is not None, remove only images with matching @comments, e.g. feature_selector="cropped,deskewed".

set_Coords(Coords)[source]

Set the coordinate polygon from the given CoordsType object. Moreover, invalidate this segment’s pc:AlternativeImage elements (because they will have been cropped with a bounding box of the previous polygon).

ocrd_models.ocrd_page.to_xml(el, skip_declaration=False) → str[source]

Serialize pc:PcGts document as string.