ocrd.processor.builtin.dummy_processor module

class ocrd.processor.builtin.dummy_processor.DummyProcessor(workspace: Workspace | None, ocrd_tool=None, parameter=None, input_file_grp=None, output_file_grp=None, page_id=None, download_files=True, version=None)

Bases: Processor

Bare-bones processor that creates PAGE-XML and optionally copies the input file from the input fileGrp to the output fileGrp.

Instantiate, but do not set up (neither for processing nor for any other usage). If given, parse and validate parameter.

Parameters:

workspace (Workspace) – The workspace to process. If not None, then chdir to that directory. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

Keyword Arguments:
  • parameter (string) – JSON of the runtime choices for ocrd-tool parameters. Can be None even for processing, but then needs to be set before running.

  • input_file_grp (string) – comma-separated list of METS fileGrp used for input. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

  • output_file_grp (string) – comma-separated list of METS fileGrp used for output. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

  • page_id (string) – comma-separated list of METS physical page IDs to process (or empty for all pages). Deprecated since version 3.0: Should be None here, but then needs to be set before processing.

  • download_files (boolean) – Whether input files will be downloaded prior to processing. Defaults to ocrd_utils.config.OCRD_DOWNLOAD_INPUT, which is True by default.
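The keyword arguments above are plain strings: a JSON object for parameter and comma-separated lists for the fileGrps and page IDs. A minimal sketch of how they could be normalized, assuming a hypothetical helper `parse_processor_args` and container `ProcessorArgs` (neither is part of ocrd); `copy_files` is used only as an illustrative parameter name:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ProcessorArgs:
    """Hypothetical container for the normalized keyword arguments."""
    parameter: Dict[str, Any] = field(default_factory=dict)
    input_file_grp: List[str] = field(default_factory=list)
    output_file_grp: List[str] = field(default_factory=list)
    page_id: List[str] = field(default_factory=list)  # empty list means all pages
    download_files: bool = True


def split_csv(value: Optional[str]) -> List[str]:
    """Split a comma-separated string, dropping empty items; None or '' yields []."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_processor_args(parameter: Optional[str] = None,
                         input_file_grp: Optional[str] = None,
                         output_file_grp: Optional[str] = None,
                         page_id: Optional[str] = None,
                         download_files: bool = True) -> ProcessorArgs:
    """Normalize the string-typed keyword arguments (hypothetical helper)."""
    params = json.loads(parameter) if parameter else {}
    if not isinstance(params, dict):
        raise ValueError("parameter must be a JSON object")
    return ProcessorArgs(
        parameter=params,
        input_file_grp=split_csv(input_file_grp),
        output_file_grp=split_csv(output_file_grp),
        page_id=split_csv(page_id),
        download_files=download_files,
    )


args = parse_processor_args(
    parameter='{"copy_files": true}',
    input_file_grp="OCR-D-IMG",
    output_file_grp="OCR-D-DUMMY",
    page_id="PHYS_0001,PHYS_0002",
)
print(args.input_file_grp, args.page_id, args.parameter)
# ['OCR-D-IMG'] ['PHYS_0001', 'PHYS_0002'] {'copy_files': True}
```

Passing no page_id yields an empty list, matching the "empty for all pages" convention described above.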

process_page_pcgts(*input_pcgts: OcrdPage | None, page_id: str | None = None) → OcrdPageResult

Process the given input_pcgts of the workspace, representing one physical page (passed as one parsed OcrdPage per input fileGrp) under the given parameter, and return the resulting OcrdPageResult.

Optionally, add instances of OcrdPageResultImage to the images attribute of the resulting OcrdPageResult. These have the required fields pil (PIL.Image image data), file_id_suffix (used for generating the ID of the saved image) and alternative_image (reference to the ocrd_models.ocrd_page.AlternativeImageType whose filename gets set for the saved image).

(This contains the main functionality and must be overridden by subclasses, unless it is not called by an overridden process_page_file().)
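A minimal sketch of the override contract, using hypothetical stand-ins `PageResult`, `PageResultImage` and `BaseProcessor` in place of OcrdPageResult, OcrdPageResultImage and Processor so that it runs without ocrd installed. Only the field names pil, file_id_suffix and alternative_image come from the description above; the suffix value and the placeholder objects are illustrative:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class PageResultImage:
    """Stand-in for OcrdPageResultImage (field names taken from the docs)."""
    pil: Any                # would be a PIL.Image in real code
    file_id_suffix: str     # used to derive the saved image's ID
    alternative_image: Any  # AlternativeImageType whose filename gets set on save


@dataclass
class PageResult:
    """Stand-in for OcrdPageResult: the output page plus optional images."""
    pcgts: Any
    images: List[PageResultImage] = field(default_factory=list)


class BaseProcessor:
    """Hypothetical base class mimicking the 'must be overridden' contract."""

    def process_page_pcgts(self, *input_pcgts: Optional[Any],
                           page_id: Optional[str] = None) -> PageResult:
        raise NotImplementedError("subclasses must override process_page_pcgts")


class PassThroughProcessor(BaseProcessor):
    """Hypothetical subclass: returns the first input page unchanged
    and attaches one derived image."""

    def process_page_pcgts(self, *input_pcgts, page_id=None):
        pcgts = input_pcgts[0]  # one parsed page per input fileGrp
        result = PageResult(pcgts)
        result.images.append(PageResultImage(
            pil=f"image-for-{page_id}",               # placeholder for image data
            file_id_suffix=".IMG-BIN",                 # illustrative suffix
            alternative_image={"comments": "binarized"},  # placeholder reference
        ))
        return result


result = PassThroughProcessor().process_page_pcgts({"page": 1}, page_id="PHYS_0001")
print(result.pcgts, len(result.images))  # {'page': 1} 1
```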

process_page_file(*input_files: OcrdFile | ClientSideOcrdFile | None) → None

Process the given input_files of the workspace, representing one physical page (passed as one opened OcrdFile per input fileGrp) under the given parameter, and make sure the results get added accordingly.

(This uses process_page_pcgts(), but should be overridden by subclasses to handle cases like multiple output fileGrps, non-PAGE input etc.)
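How an override of process_page_file might branch on non-PAGE input, sketched with hypothetical stand-ins (`InputFile`, `SketchProcessor` and a `copy_files` flag) rather than the real OcrdFile and DummyProcessor code. The PAGE-XML mimetype string is the one OCR-D uses; the parsing step is only a placeholder:

```python
from dataclasses import dataclass
from typing import List, Optional

# Mimetype OCR-D uses for PAGE-XML files
MIMETYPE_PAGE = "application/vnd.prima.page+xml"


@dataclass
class InputFile:
    """Stand-in for OcrdFile with only the attributes this sketch needs."""
    ID: str
    mimetype: str
    pageId: str


class SketchProcessor:
    """Hypothetical processor showing how process_page_file might branch."""

    def __init__(self, copy_files: bool = False):
        self.copy_files = copy_files
        self.log: List[str] = []  # records what happened, for illustration

    def process_page_pcgts(self, *input_pcgts, page_id: Optional[str] = None):
        # placeholder for the real per-page work
        self.log.append(f"pcgts:{page_id}")

    def process_page_file(self, *input_files: Optional[InputFile]) -> None:
        first = input_files[0]
        assert first is not None, "sketch expects the first input to exist"
        if self.copy_files and first.mimetype != MIMETYPE_PAGE:
            # non-PAGE input (e.g. an image): handle the file itself first
            self.log.append(f"copy:{first.ID}")
        # then "parse" each input (placeholder strings) and delegate
        pcgts = [f"parsed:{f.ID}" if f is not None else None for f in input_files]
        self.process_page_pcgts(*pcgts, page_id=first.pageId)


proc = SketchProcessor(copy_files=True)
proc.process_page_file(InputFile("IMG_0001", "image/tiff", "PHYS_0001"))
print(proc.log)  # ['copy:IMG_0001', 'pcgts:PHYS_0001']
```

A PAGE-XML input skips the copy branch and goes straight to process_page_pcgts().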

property metadata_filename

Relative location of the ocrd-tool.json file inside the package.

Used by metadata_location.

(Override if ocrd-tool.json is not in the root of the module, e.g. namespace/ocrd-tool.json or data/ocrd-tool.json).
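A sketch of how an overridden metadata_filename might combine with a package directory to give metadata_location, assuming a hypothetical fixed `package_dir`. The real property resolves the installed package location, which this stand-in does not attempt:

```python
from pathlib import PurePosixPath


class SketchProcessor:
    """Hypothetical stand-in; package_dir would be the installed module's directory."""
    package_dir = PurePosixPath("/site-packages/ocrd_example")  # hypothetical path

    @property
    def metadata_filename(self) -> str:
        return "ocrd-tool.json"  # default: root of the package

    @property
    def metadata_location(self) -> PurePosixPath:
        # relative filename resolved against the package directory
        return self.package_dir / self.metadata_filename


class DataDirProcessor(SketchProcessor):
    """Override for a package that ships ocrd-tool.json in a subdirectory."""

    @property
    def metadata_filename(self) -> str:
        return "data/ocrd-tool.json"


print(SketchProcessor().metadata_location)   # /site-packages/ocrd_example/ocrd-tool.json
print(DataDirProcessor().metadata_location)  # /site-packages/ocrd_example/data/ocrd-tool.json
```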

property executable

The executable name of this processor tool. Taken from the runtime filename.

Used by ocrd_tool for lookup in metadata.

(Override if your entry-point name deviates from the executable name, or the processor gets instantiated from another runtime.)
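A sketch of the executable lookup, approximating "the runtime filename" with `os.path.basename(sys.argv[0])` (an assumption; ocrd may determine it differently). The `metadata` dict mirrors the `tools` mapping of an ocrd-tool.json, which is keyed by executable name; `ocrd-example` and its entry are hypothetical:

```python
import os
import sys


class SketchProcessor:
    """Hypothetical stand-in for the executable/ocrd_tool lookup."""
    metadata = {
        "tools": {
            "ocrd-example": {
                "executable": "ocrd-example",
                "steps": ["preprocessing/optimization"],
            },
        }
    }

    @property
    def executable(self) -> str:
        # default: basename of the script that was started (e.g. /usr/bin/ocrd-example)
        return os.path.basename(sys.argv[0])

    @property
    def ocrd_tool(self) -> dict:
        # look up this tool's entry in the metadata by executable name
        return self.metadata["tools"][self.executable]


class EmbeddedProcessor(SketchProcessor):
    """Instantiated from another runtime (e.g. a test runner), so pin the name."""

    @property
    def executable(self) -> str:
        return "ocrd-example"


print(EmbeddedProcessor().ocrd_tool["steps"])  # ['preprocessing/optimization']
```

Without the override, running under a test runner would make the default name resolve to the runner's script, and the lookup in metadata would fail with a KeyError.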