ocrd.processor.builtin.filter_processor module
- class ocrd.processor.builtin.filter_processor.FilterProcessor(workspace: Workspace | None, ocrd_tool=None, parameter=None, input_file_grp=None, output_file_grp=None, page_id=None, version=None)
Bases: Processor
Instantiate, but do not set up (neither for processing nor other usage). If given, do parse and validate parameter.
- Parameters:
workspace (Workspace) – The workspace to process. If not None, then chdir to that directory. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
- Keyword Arguments:
parameter (string) – JSON of the runtime choices for ocrd-tool parameters. Can be None even for processing, but then needs to be set before running.
input_file_grp (string) – comma-separated list of METS fileGrp used for input. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
output_file_grp (string) – comma-separated list of METS fileGrp used for output. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
page_id (string) – comma-separated list of METS physical page IDs to process (or empty for all pages). Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
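As a rough illustration of the parameter keyword argument, the sketch below builds such a JSON string with the standard library. The keys select and plot are the parameter names mentioned in the process_page_pcgts description; the XPath expression and threshold are made-up values that have not been checked against the processor, and passing the string to the constructor is deliberately left out.

```python
import json

# Hypothetical runtime choices. "select" and "plot" are the parameter names
# used in the method description; the values are illustrative assumptions.
params = {
    "select": "//pc:TextLine[pc:pixelarea() < 500]",
    "plot": False,
}

# The constructor documents `parameter` as a JSON string.
parameter_json = json.dumps(params)

# Round-trip to confirm the string is well-formed JSON.
assert json.loads(parameter_json) == params
print(parameter_json)
```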
- process_page_pcgts(*input_pcgts: OcrdPage | None, page_id: str | None = None) OcrdPageResultVariadicListWrapper
Remove PAGE segment hierarchy elements based on flexible selection criteria.
Open and deserialise PAGE input file, then iterate over the segment hierarchy down to the level required for select (which could be multiple levels at once).
Remove any segments matching XPath query select from that hierarchy (and from the ReadingOrder if it is a region type).
Besides full XPath 2.0 syntax, this supports extra predicates:
- pc:pixelarea() for the number of pixels of the bounding box (or sum area on node sets),
- pc:textequiv() for the first TextEquiv unicode string (or concatenated string on node sets).
If plot is true, then extract and write an image file for all removed segments to the output fileGrp (without reference to the PAGE).
Produce a new PAGE output file by serialising the resulting hierarchy.
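The removal step can be pictured with a small standalone sketch. This is not the processor's implementation: it uses only Python's xml.etree.ElementTree (which has no XPath 2.0 support), and the helpers bbox_area, first_text and remove_matching are hypothetical stand-ins for the pc:pixelarea() and pc:textequiv() predicates. The sample PAGE document is invented, and ReadingOrder handling and image extraction are omitted.

```python
import xml.etree.ElementTree as ET

NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Invented minimal PAGE document with one large and one tiny text line.
PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="1000" imageHeight="1000">
    <TextRegion id="r1">
      <Coords points="0,0 100,0 100,100 0,100"/>
      <TextLine id="l1">
        <Coords points="0,0 100,0 100,50 0,50"/>
        <TextEquiv><Unicode>Hello</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2">
        <Coords points="0,50 10,50 10,60 0,60"/>
        <TextEquiv><Unicode>.</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def bbox_area(segment):
    """Area of the bounding box around the segment's Coords polygon."""
    points = segment.find("pc:Coords", NS).get("points").split()
    xs, ys = zip(*(map(int, p.split(",")) for p in points))
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def first_text(segment):
    """Text of the first TextEquiv/Unicode child, or an empty string."""
    node = segment.find("pc:TextEquiv/pc:Unicode", NS)
    return (node.text or "") if node is not None else ""


def remove_matching(root, predicate):
    """Remove every TextLine for which predicate(line) is true."""
    removed = []
    for region in root.iterfind(".//pc:TextRegion", NS):
        # Copy the child list first, since we mutate the region while looping.
        for line in list(region.findall("pc:TextLine", NS)):
            if predicate(line):
                region.remove(line)
                removed.append((line.get("id"), first_text(line)))
    return removed


root = ET.fromstring(PAGE_XML)
removed = remove_matching(root, lambda seg: bbox_area(seg) < 500)
remaining = [line.get("id") for line in root.iterfind(".//pc:TextLine", NS)]
print("removed:", removed)
print("remaining:", remaining)
```

In this sample, only the second line (a 10 by 10 box) falls below the 500-pixel threshold, so it is the one dropped from the hierarchy.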
- property metadata_filename
Relative location of the ocrd-tool.json file inside the package.
Used by metadata_location.
(Override if ocrd-tool.json is not in the root of the module, e.g. namespace/ocrd-tool.json or data/ocrd-tool.json).
- property executable
The executable name of this processor tool. Taken from the runtime filename.
Used by ocrd_tool for lookup in metadata.
(Override if your entry-point name deviates from the executable name, or the processor gets instantiated from another runtime.)
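The override pattern for the metadata_filename and executable properties might look like the sketch below. To keep it runnable without ocrd installed, it uses a hypothetical StandInProcessor in place of the real base class; its default return values are assumptions, and the names MyFilter and ocrd-my-filter are invented for illustration.

```python
import os
import sys


class StandInProcessor:
    """Hypothetical stand-in for the real base class (not the ocrd API)."""

    @property
    def metadata_filename(self):
        # Assumed default: ocrd-tool.json sits in the package root.
        return "ocrd-tool.json"

    @property
    def executable(self):
        # Assumed approximation of "taken from the runtime filename".
        return os.path.basename(sys.argv[0])


class MyFilter(StandInProcessor):
    """Invented subclass showing both overrides."""

    @property
    def metadata_filename(self):
        # ocrd-tool.json lives in a data/ subdirectory of the package.
        return "data/ocrd-tool.json"

    @property
    def executable(self):
        # Entry-point name that differs from the runtime filename.
        return "ocrd-my-filter"


proc = MyFilter()
print(proc.metadata_filename, proc.executable)
```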