ocrd.processor.builtin.filter_processor module

class ocrd.processor.builtin.filter_processor.FilterProcessor(workspace: Workspace | None, ocrd_tool=None, parameter=None, input_file_grp=None, output_file_grp=None, page_id=None, version=None)

Bases: Processor

Instantiate, but do not set up (neither for processing nor for other usage). If given, parse and validate parameter.

Parameters:

workspace (Workspace) – The workspace to process. If not None, then chdir to that directory. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

Keyword Arguments:
  • parameter (string) – JSON of the runtime choices for ocrd-tool parameters. Can be None even for processing, but then needs to be set before running.

  • input_file_grp (string) – comma-separated list of METS fileGrp used for input. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

  • output_file_grp (string) – comma-separated list of METS fileGrp used for output. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

  • page_id (string) – comma-separated list of METS physical page IDs to process (or empty for all pages). Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
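To illustrate the parameter argument, the JSON string can be built from a plain Python dict. This is a minimal sketch: the select and plot keys are taken from the process_page_pcgts description on this page, while the helper name build_parameter_json and the concrete XPath expression are hypothetical.

```python
import json


def build_parameter_json(select: str, plot: bool = False) -> str:
    """Serialise runtime parameter choices into a JSON string.

    Hypothetical helper for illustration; not part of the ocrd package.
    """
    return json.dumps({"select": select, "plot": plot})


# Hypothetical query: select every text region on the page
param = build_parameter_json("//pc:TextRegion")
print(param)
```

The resulting string is what would be passed as parameter; the remaining deprecated arguments would stay None at construction time and be set before processing.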

process_page_pcgts(*input_pcgts: OcrdPage | None, page_id: str | None = None) → OcrdPageResultVariadicListWrapper

Remove PAGE segment hierarchy elements based on flexible selection criteria.

Open and deserialise PAGE input file, then iterate over the segment hierarchy down to the level required for select (which could be multiple levels at once).

Remove any segments matching XPath query select from that hierarchy (and from the ReadingOrder if it is a region type).
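The removal step can be sketched with the standard library alone. This is not the processor's own code: xml.etree.ElementTree only supports a small XPath subset (not XPath 2.0), and the sample document, the region IDs and the helper name remove_matching are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Toy PAGE document (hypothetical content) with two regions in reading order
PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="100">
    <ReadingOrder>
      <OrderedGroup id="ro">
        <RegionRefIndexed index="0" regionRef="r1"/>
        <RegionRefIndexed index="1" regionRef="r2"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1" type="paragraph"/>
    <TextRegion id="r2" type="page-number"/>
  </Page>
</PcGts>"""


def remove_matching(root, query):
    """Remove elements matching `query` and any reading-order refs to them."""
    # ElementTree has no parent pointers, so build a child -> parent map first
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed_ids = []
    for elem in root.findall(query, NS):
        parent_of[elem].remove(elem)
        removed_ids.append(elem.get("id"))
    # Drop reading-order entries that point at a removed region
    for ref in root.findall(".//pc:RegionRefIndexed", NS):
        if ref.get("regionRef") in removed_ids:
            parent_of[ref].remove(ref)
    return removed_ids


root = ET.fromstring(PAGE_XML)
removed = remove_matching(root, ".//pc:TextRegion[@type='page-number']")
print(removed)
```

After the call, only region r1 and its reading-order entry remain in the tree, mirroring the two-part removal described above.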

Besides full XPath 2.0 syntax, this supports extra predicates:
  • pc:pixelarea() for the number of pixels of the bounding box (or sum area on node sets),

  • pc:textequiv() for the first TextEquiv unicode string (or concatenated string on node sets).
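What the two extra predicates compute can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the bounding box is taken as width times height from the extreme coordinates of a PAGE Coords points string, node-set text is joined without a separator, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def parse_points(points: str) -> List[Tuple[int, ...]]:
    """Parse a PAGE Coords/@points string such as '0,0 10,0 10,5 0,5'."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def bbox_area(points: str) -> int:
    """Area of the axis-aligned bounding box (assumed: width * height)."""
    xs, ys = zip(*parse_points(points))
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def total_area(node_set: Sequence[str]) -> int:
    """Summed bounding-box area over several segments (a node set)."""
    return sum(bbox_area(points) for points in node_set)


def first_text(text_equivs: Sequence[str]) -> str:
    """First TextEquiv string of one segment, or '' if it has none."""
    return text_equivs[0] if text_equivs else ""


def joined_text(node_set: Sequence[Sequence[str]]) -> str:
    """Concatenate the first TextEquiv of each segment (assumed: no separator)."""
    return "".join(first_text(text_equivs) for text_equivs in node_set)


area_single = bbox_area("0,0 10,0 10,5 0,5")
area_sum = total_area(["0,0 10,0 10,5 0,5", "0,0 2,0 2,2 0,2"])
text = joined_text([["foo", "f00"], ["bar"], []])
print(area_single, area_sum, text)
```

Only the first alternative of each segment contributes to the text, so the second reading "f00" is ignored.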

If plot is true, then extract and write an image file for all removed segments to the output fileGrp (without reference to the PAGE).

Produce a new PAGE output file by serialising the resulting hierarchy.

property metadata_filename

Relative location of the ocrd-tool.json file inside the package.

Used by metadata_location.

(Override if ocrd-tool.json is not in the root of the module, e.g. namespace/ocrd-tool.json or data/ocrd-tool.json).
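The override pattern might look like the following sketch. ProcessorStandIn is a hypothetical stand-in written for this example so that it runs without the ocrd package; its default of ocrd-tool.json in the package root and its package_root attribute are assumptions inferred from the description above, and a real subclass would derive from Processor instead.

```python
from pathlib import PurePosixPath


class ProcessorStandIn:
    """Hypothetical stand-in for the real base class (not the ocrd API)."""

    # Assumed package directory, for illustration only
    package_root = PurePosixPath("my_package")

    @property
    def metadata_filename(self) -> str:
        # Assumed default: ocrd-tool.json sits in the root of the module
        return "ocrd-tool.json"

    @property
    def metadata_location(self) -> PurePosixPath:
        # Resolves the relative filename against the package directory
        return self.package_root / self.metadata_filename


class DataDirProcessor(ProcessorStandIn):
    """Processor whose ocrd-tool.json lives in a data/ subdirectory."""

    @property
    def metadata_filename(self) -> str:
        return "data/ocrd-tool.json"


default_location = str(ProcessorStandIn().metadata_location)
custom_location = str(DataDirProcessor().metadata_location)
print(default_location, custom_location)
```

Only metadata_filename is overridden; the lookup through metadata_location picks up the new relative path unchanged.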

property executable

The executable name of this processor tool. Taken from the runtime filename.

Used by ocrd_tool for lookup in metadata.

(Override if your entry-point name deviates from the executable name, or the processor gets instantiated from another runtime.)