ocrd.processor.builtin.dummy_processor module
- class ocrd.processor.builtin.dummy_processor.DummyProcessor(workspace: Workspace | None, ocrd_tool=None, parameter=None, input_file_grp=None, output_file_grp=None, page_id=None, download_files=True, version=None)
Bases: Processor

Bare-bones processor that creates PAGE-XML and optionally copies files from the input group to the output group.

Instantiate, but do not set up (neither for processing nor other usage). If given, parse and validate parameter.

- Parameters:
  workspace (Workspace) – The workspace to process. If not None, then chdir to that directory. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
- Keyword Arguments:
  parameter (string) – JSON of the runtime choices for ocrd-tool parameters. Can be None even for processing, but then needs to be set before running.
  input_file_grp (string) – comma-separated list of METS fileGrp used for input. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
  output_file_grp (string) – comma-separated list of METS fileGrp used for output. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
  page_id (string) – comma-separated list of METS physical page IDs to process (or empty for all pages). Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
  download_files (boolean) – Whether input files will be downloaded prior to processing. Defaults to ocrd_utils.config.OCRD_DOWNLOAD_INPUT, which is True by default.
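The deferred-configuration contract described above (pass None at construction, assign the values before processing) can be sketched roughly as follows. This is a minimal stand-in and not the ocrd implementation: SketchProcessor, split_list, check_ready and the parameter key some_option are hypothetical names introduced for illustration. Only the argument names and the comma-separated format come from the documentation above, and the sketch parses the JSON parameter without validating it against any schema.

```python
import json


def split_list(value):
    """Split a comma-separated string into a list; None or empty yields []."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


class SketchProcessor:
    """Hypothetical stand-in mimicking the constructor contract described above."""

    def __init__(self, workspace=None, parameter=None,
                 input_file_grp=None, output_file_grp=None, page_id=None):
        self.workspace = workspace
        # If given, parse the JSON parameter string right away (no validation here).
        self.parameter = json.loads(parameter) if parameter is not None else None
        self.input_file_grp = input_file_grp
        self.output_file_grp = output_file_grp
        self.page_id = page_id  # may stay empty, meaning "all pages"

    def check_ready(self):
        """Return the names of attributes that still must be set before processing."""
        required = ("workspace", "parameter", "input_file_grp", "output_file_grp")
        return [name for name in required if getattr(self, name) is None]


# Construct with almost everything deferred ...
proc = SketchProcessor(parameter='{"some_option": 1}')
missing_before = proc.check_ready()

# ... then assign the remaining attributes before processing.
proc.workspace = "/path/to/workspace"  # placeholder; a Workspace object in practice
proc.input_file_grp = "OCR-D-IMG,OCR-D-SEG"
proc.output_file_grp = "OCR-D-OUT"
missing_after = proc.check_ready()
input_groups = split_list(proc.input_file_grp)
```

The point of the pattern is that construction stays cheap and side-effect free, while anything that depends on a concrete workspace is supplied later.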
- process_page_pcgts(*input_pcgts: OcrdPage | None, page_id: str | None = None) → OcrdPageResult
Process the given input_pcgts of the workspace, representing one physical page (passed as one parsed OcrdPage per input fileGrp) under the given parameter, and return the resulting OcrdPageResult.

Optionally, add to the images attribute of the resulting OcrdPageResult instances of OcrdPageResultImage, which have required fields for pil (PIL.Image image data), file_id_suffix (used for generating IDs of the saved image) and alternative_image (reference of the ocrd_models.ocrd_page.AlternativeImageType for setting the filename of the saved image).

(This contains the main functionality and must be overridden by subclasses, unless it does not get called by some overridden process_page_file().)
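The shape of such an override can be illustrated with stand-in types. The sketch below does not import ocrd: SketchResult and SketchResultImage are hypothetical dataclasses that mirror the fields named above (images, pil, file_id_suffix, alternative_image), the pcgts field name is an assumption, and the image data and suffix values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SketchResultImage:
    """Stand-in for OcrdPageResultImage with its three required fields."""
    pil: Any                # PIL.Image data in the real class
    file_id_suffix: str     # used for generating the ID of the saved image
    alternative_image: Any  # AlternativeImageType reference in the real class


@dataclass
class SketchResult:
    """Stand-in for OcrdPageResult: the output page plus optional images."""
    pcgts: Any  # assumed field name for the resulting page
    images: List[SketchResultImage] = field(default_factory=list)


class SketchBase:
    """Stand-in base class; subclasses must override process_page_pcgts()."""

    def process_page_pcgts(self, *input_pcgts, page_id: Optional[str] = None):
        raise NotImplementedError


class PassThrough(SketchBase):
    def process_page_pcgts(self, *input_pcgts, page_id: Optional[str] = None):
        # One positional argument per input fileGrp; entries may be None.
        result = SketchResult(pcgts=input_pcgts[0])
        # Optionally attach a derived image to be saved alongside the page.
        result.images.append(
            SketchResultImage(pil=b"placeholder-image-data",
                              file_id_suffix=".IMG-SKETCH",
                              alternative_image={"comments": "sketch"}))
        return result


result = PassThrough().process_page_pcgts({"page": "p0001"}, page_id="p0001")
```

Returning a result object instead of writing files directly keeps the page-level logic free of workspace bookkeeping.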
- process_page_file(*input_files: OcrdFile | ClientSideOcrdFile | None) → None
Process the given input_files of the workspace, representing one physical page (passed as one opened OcrdFile per input fileGrp) under the given parameter, and make sure the results get added accordingly.

(This uses process_page_pcgts(), but should be overridden by subclasses to handle cases like multiple output fileGrps, non-PAGE input etc.)
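How the two hooks relate, and why one might override the file-level hook for multiple output fileGrps, can be sketched with plain Python objects. Everything here is hypothetical: input files are modelled as dictionaries, results are recorded in a list instead of being added to a workspace, and the default flow shown is an assumption about the general idea rather than the library's actual code.

```python
class SketchProcessor:
    """Hypothetical stand-in showing how the two hooks relate."""

    def __init__(self, output_file_grp):
        self.output_file_grp = output_file_grp
        self.saved = []  # records (fileGrp, content) instead of writing files

    def process_page_pcgts(self, *input_pcgts, page_id=None):
        # Main functionality: here, simply upper-case the first input.
        return input_pcgts[0].upper()

    def process_page_file(self, *input_files):
        # Assumed default flow: "parse" each file, delegate, save one result.
        parsed = [f["content"] if f is not None else None for f in input_files]
        result = self.process_page_pcgts(*parsed, page_id=input_files[0]["page_id"])
        self.saved.append((self.output_file_grp, result))


class TwoOutputs(SketchProcessor):
    def process_page_file(self, *input_files):
        # Override: produce one result per output fileGrp, bypassing process_page_pcgts().
        content = input_files[0]["content"]
        for grp in self.output_file_grp.split(","):
            self.saved.append((grp, f"{content} -> {grp}"))


page = {"page_id": "p0001", "content": "hello"}

default = SketchProcessor("OCR-D-OUT")
default.process_page_file(page)

multi = TwoOutputs("OCR-D-OUT-A,OCR-D-OUT-B")
multi.process_page_file(page)
```

In the second class the page-level hook is never called, which matches the note above that process_page_pcgts() only needs overriding if process_page_file() actually uses it.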
- property metadata_filename
Relative location of the ocrd-tool.json file inside the package.

Used by metadata_location.

(Override if ocrd-tool.json is not in the root of the module, e.g. namespace/ocrd-tool.json or data/ocrd-tool.json.)
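An override of this property might look like the sketch below. The base class is a hypothetical stand-in, package_root is a made-up path, and the way metadata_location joins the two is an assumption made only to show why the relative filename matters.

```python
from pathlib import PurePosixPath


class SketchBase:
    """Hypothetical stand-in for the processor base class."""

    # Made-up stand-in for the installed package directory.
    package_root = PurePosixPath("/site-packages/my_package")

    @property
    def metadata_filename(self) -> str:
        # Default case: ocrd-tool.json sits in the root of the module.
        return "ocrd-tool.json"

    @property
    def metadata_location(self) -> PurePosixPath:
        # Assumed behaviour: resolve the relative filename against the package.
        return self.package_root / self.metadata_filename


class NamespacedProcessor(SketchBase):
    @property
    def metadata_filename(self) -> str:
        # The tool description lives in a subdirectory of the package.
        return "namespace/ocrd-tool.json"


default_location = str(SketchBase().metadata_location)
namespaced_location = str(NamespacedProcessor().metadata_location)
```

Only the relative filename changes in the subclass; the lookup logic itself stays in the base class.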
- property executable
The executable name of this processor tool. Taken from the runtime filename.

Used by ocrd_tool for lookup in metadata.

(Override if your entry-point name deviates from the executable name, or the processor gets instantiated from another runtime.)
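The override case can be sketched as follows. The base class is a hypothetical stand-in, deriving the name from sys.argv[0] is only one plausible reading of "taken from the runtime filename", and ocrd-my-processor is a made-up tool name.

```python
import os
import sys


class SketchBase:
    """Hypothetical stand-in for the processor base class."""

    @property
    def executable(self) -> str:
        # Assumed default: derive the name from the running script's filename.
        script = sys.argv[0] if sys.argv else ""
        return os.path.basename(script)


class MyProcessor(SketchBase):
    @property
    def executable(self) -> str:
        # Fixed name, independent of how the interpreter was started,
        # e.g. when instantiated from a test runner or another program.
        return "ocrd-my-processor"


runtime_name = SketchBase().executable
fixed_name = MyProcessor().executable
```

A fixed return value is useful whenever the process name differs from the key under which the tool is registered in its metadata.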