ocrd.processor.builtin.dummy_processor module
- class ocrd.processor.builtin.dummy_processor.DummyProcessor(workspace: Workspace | None, ocrd_tool=None, parameter=None, input_file_grp=None, output_file_grp=None, page_id=None, download_files=True, version=None)
Bases: Processor
Bare-bones processor that creates PAGE-XML and optionally copies files from the input group to the output group.
Instantiate, but do not set up (neither for processing nor other usage). If given, do parse and validate parameter.
Parameters:
workspace (Workspace) – The workspace to process. If not None, then chdir to that directory. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
Keyword Arguments:
parameter (string) – JSON of the runtime choices for ocrd-tool parameters. Can be None even for processing, but then needs to be set before running.
input_file_grp (string) – comma-separated list of METS fileGrp used for input. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
output_file_grp (string) – comma-separated list of METS fileGrp used for output. Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
page_id (string) – comma-separated list of METS physical page IDs to process (or empty for all pages). Deprecated since version 3.0: Should be None here, but then needs to be set before processing.
download_files (boolean) – Whether input files will be downloaded prior to processing. Defaults to ocrd_utils.config.OCRD_DOWNLOAD_INPUT, which is True by default.
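The keyword arguments above are passed as plain strings. The following is a minimal sketch, using only the standard library, of how such values could be interpreted. The helper names `parse_parameter` and `split_list` are hypothetical and not part of ocrd, and the sketch handles only the plain JSON-object and comma-separated forms described above.

```python
import json


def parse_parameter(parameter):
    """Hypothetical helper: decode a JSON string of runtime parameters."""
    if parameter is None:
        # None is allowed at construction time; it must be set before running.
        return None
    value = json.loads(parameter)
    if not isinstance(value, dict):
        raise ValueError("parameter JSON must be an object")
    return value


def split_list(value):
    """Hypothetical helper: split a comma-separated string into a list."""
    if not value:
        # None or an empty string yields an empty list (e.g. "all pages").
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


# Illustrative values only; the names are made up for this sketch.
params = parse_parameter('{"copy_files": true}')
groups = split_list("OCR-D-IMG, OCR-D-SEG")
```

Validation against the tool's declared parameter schema is left out here; the sketch only shows the decoding step.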
- process_page_pcgts(*input_pcgts: OcrdPage | None, page_id: str | None = None) → OcrdPageResult
Process the given input_pcgts of the workspace, representing one physical page (passed as one parsed OcrdPage per input fileGrp) under the given parameter, and return the resulting OcrdPageResult.
Optionally, add to the images attribute of the resulting OcrdPageResult instances of OcrdPageResultImage, which have required fields for pil (PIL.Image image data), file_id_suffix (used for generating IDs of the saved image) and alternative_image (reference of the ocrd_models.ocrd_page.AlternativeImageType for setting the filename of the saved image).
(This contains the main functionality and must be overridden by subclasses, unless it does not get called by some overridden process_page_file().)
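The contract described for process_page_pcgts() can be sketched with stand-in types, so that the example runs without ocrd installed. `PageResult` and `PageResultImage` below are hypothetical stand-ins, not the ocrd classes; the field name `pcgts` is an assumption, while `images`, `pil`, `file_id_suffix` and `alternative_image` follow the description above.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class PageResultImage:
    """Stand-in for OcrdPageResultImage with its three required fields."""
    pil: Any                 # image data (a PIL.Image in real use)
    file_id_suffix: str      # used to build the ID of the saved image
    alternative_image: Any   # reference that later receives the filename


@dataclass
class PageResult:
    """Stand-in for OcrdPageResult; the field name `pcgts` is an assumption."""
    pcgts: Any
    images: List[PageResultImage] = field(default_factory=list)


def process_page_pcgts(*input_pcgts: Optional[Any],
                       page_id: Optional[str] = None) -> PageResult:
    """Pass the page of the first input fileGrp through unchanged."""
    first = input_pcgts[0] if input_pcgts else None
    if first is None:
        raise ValueError(f"no input page for page_id={page_id!r}")
    result = PageResult(first)
    # A real processor would modify the page here and could attach
    # derived images by appending PageResultImage objects to result.images.
    return result
```

In a subclass of the real Processor, the same logic would live in a method of the same name, taking `self` as its first argument.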
- process_page_file(*input_files: OcrdFile | ClientSideOcrdFile | None) → None
Process the given input_files of the workspace, representing one physical page (passed as one opened OcrdFile per input fileGrp) under the given parameter, and make sure the results get added accordingly.
(This uses process_page_pcgts(), but should be overridden by subclasses to handle cases like multiple output fileGrps, non-PAGE input etc.)
- property metadata_filename
Relative location of the ocrd-tool.json file inside the package.
Used by metadata_location.
(Override if ocrd-tool.json is not in the root of the module, e.g. namespace/ocrd-tool.json or data/ocrd-tool.json.)
- property executable
The executable name of this processor tool. Taken from the runtime filename.
Used by ocrd_tool for lookup in metadata.
(Override if your entry-point name deviates from the executable name, or the processor gets instantiated from another runtime.)
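Overriding the two properties follows the usual Python pattern. The sketch below uses a stand-in base class (`BaseTool`, hypothetical, not ocrd's Processor) whose defaults only approximate the behaviour described above; `MyTool` and the name `ocrd-my-tool` are likewise made up for illustration.

```python
import os
import sys


class BaseTool:
    """Stand-in base class; defaults approximate the documented behaviour."""

    @property
    def metadata_filename(self):
        # Default: ocrd-tool.json in the root of the module.
        return "ocrd-tool.json"

    @property
    def executable(self):
        # Simplification: derive the name from the running script's filename.
        return os.path.basename(sys.argv[0])


class MyTool(BaseTool):
    """Hypothetical tool that keeps its metadata in a data/ subdirectory."""

    @property
    def metadata_filename(self):
        return "data/ocrd-tool.json"

    @property
    def executable(self):
        # Fixed name, independent of how the process was started.
        return "ocrd-my-tool"


tool = MyTool()
```

Pinning `executable` to a fixed string is useful when the class is instantiated from a test runner or another program, where the runtime filename would not match the entry-point name.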