ocrd.processor.builtin.merge_processor module

ocrd.processor.builtin.merge_processor.get_border_bbox(pcgts)
ocrd.processor.builtin.merge_processor.rename_segments(pcgts, start=1)
class ocrd.processor.builtin.merge_processor.MergeProcessor(workspace: Workspace | None, ocrd_tool=None, parameter=None, input_file_grp=None, output_file_grp=None, page_id=None, version=None)

Bases: Processor

Instantiate, but do not set up (neither for processing nor other usage). If parameter is given, parse and validate it.

Parameters:

workspace (Workspace) – The workspace to process. If not None, then chdir to that directory. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

Keyword Arguments:
  • parameter (string) – JSON of the runtime choices for ocrd-tool parameters. Can be None even for processing, but then needs to be set before running.

  • input_file_grp (string) – comma-separated list of METS fileGrp used for input. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

  • output_file_grp (string) – comma-separated list of METS fileGrp used for output. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

  • page_id (string) – comma-separated list of METS physical page IDs to process (or empty for all pages). Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

process_page_pcgts(*input_pcgts: OcrdPage | None, page_id: str | None = None) → OcrdPageResultVariadicListWrapper

Merge PAGE segment hierarchy elements from all input file groups.

For each page, open and deserialise PAGE input files. Rename all elements of the segment hierarchy to new (clash-free) identifiers. Redefine the Border coordinates as the convex hull of all input borders. Then add all regions from all input files, concatenating them into a single ReadingOrder in the order of input file groups.

Produce a new PAGE output file by serialising the resulting hierarchy.
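
The merge steps described above can be illustrated with a minimal, library-independent sketch. It is not the actual implementation: pages are modelled as plain dicts instead of OcrdPage objects, only regions are renamed (not the deeper segment hierarchy), and the names convex_hull, merge_pages and the dict keys are hypothetical. The hull is computed with Andrew's monotone chain, which is an assumption chosen for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the convex hull vertices of a point set (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> int:
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


def merge_pages(pages: List[Dict], start: int = 1) -> Dict:
    """Merge simplified page dicts in input order (hypothetical data model).

    Each page has 'border' (list of points), 'regions' (list of dicts with
    'id' and 'points') and 'reading_order' (list of region ids).
    """
    counter = start
    regions: List[Dict] = []
    reading_order: List[str] = []
    border_points: List[Point] = []
    for page in pages:
        id_map: Dict[str, str] = {}
        for region in page["regions"]:
            # assign a new identifier so ids from different inputs cannot clash
            id_map[region["id"]] = f"r{counter}"
            counter += 1
            regions.append({"id": id_map[region["id"]], "points": region["points"]})
        # concatenate reading orders in the order of the inputs
        reading_order.extend(id_map[rid] for rid in page["reading_order"])
        border_points.extend(page["border"])
    return {
        "border": convex_hull(border_points),
        "regions": regions,
        "reading_order": reading_order,
    }


page_a = {
    "border": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "regions": [{"id": "r1", "points": [(1, 1), (4, 1), (4, 4), (1, 4)]}],
    "reading_order": ["r1"],
}
page_b = {
    "border": [(5, 5), (20, 5), (20, 20), (5, 20)],
    "regions": [{"id": "r1", "points": [(6, 6), (9, 6), (9, 9), (6, 9)]}],
    "reading_order": ["r1"],
}
merged = merge_pages([page_a, page_b])
```

Both inputs use the region id r1; after merging, the regions carry distinct ids and the reading order lists the first input's region before the second's.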

property metadata_filename

Relative location of the ocrd-tool.json file inside the package.

Used by metadata_location.

(Override if ocrd-tool.json is not in the root of the module, e.g. namespace/ocrd-tool.json or data/ocrd-tool.json).

property executable

The executable name of this processor tool. Taken from the runtime filename.

Used by ocrd_tool for lookup in metadata.

(Override if your entry-point name deviates from the executable name, or the processor gets instantiated from another runtime.)
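
As a rough sketch of what overriding these two properties might look like, the following uses a stand-in base class so that it runs without the ocrd package installed. The stand-in's default values, the subclass name MyProcessor and the executable name ocrd-my-processor are hypothetical; in real code the subclass would derive from the actual Processor class.

```python
class Processor:
    """Stand-in for the real base class, reduced to the two properties above."""

    @property
    def metadata_filename(self) -> str:
        # placeholder default: ocrd-tool.json in the root of the module
        return "ocrd-tool.json"

    @property
    def executable(self) -> str:
        # placeholder default; the real property derives it from the runtime filename
        return "ocrd-placeholder"


class MyProcessor(Processor):
    """Hypothetical processor that keeps ocrd-tool.json in a data/ subdirectory."""

    @property
    def metadata_filename(self) -> str:
        return "data/ocrd-tool.json"

    @property
    def executable(self) -> str:
        # fixed name, for when the entry point differs from the runtime filename
        return "ocrd-my-processor"


processor = MyProcessor()
```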