ocrd.processor.helpers module

Helper methods for running and documenting processors

ocrd.processor.helpers.run_cli(executable, mets_url=None, resolver=None, workspace=None, page_id=None, overwrite=None, debug=None, log_level=None, log_filename=None, input_file_grp=None, output_file_grp=None, parameter=None, working_dir=None, mets_server_url=None)

Open a workspace and run a processor on the command line.

If workspace is not None, reuse it. Otherwise, instantiate a Workspace for mets_url (and working_dir) using ocrd.Resolver.workspace_from_url() (i.e. open or clone a local workspace).

Run the processor CLI executable on the workspace, passing:

- the workspace,
- page_id,
- input_file_grp,
- output_file_grp,
- parameter (after applying any parameter_override settings).

(This creates output files and updates the METS in the filesystem.)
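The argument passing described above can be sketched as a small helper that assembles a command line. This is an illustration only: build_cli_args is a hypothetical name, not part of ocrd.processor.helpers, the long option names are assumed to follow the usual OCR-D command-line conventions, and the executable name, file group names and parameter are made-up example values.

```python
import json


def build_cli_args(executable, mets_url, working_dir=None, page_id=None,
                   input_file_grp=None, output_file_grp=None, parameter=None):
    """Assemble an argument list for a processor CLI (hypothetical helper)."""
    args = [executable, "--mets", mets_url]
    if working_dir:
        args += ["--working-dir", working_dir]
    if page_id:
        args += ["--page-id", page_id]
    if input_file_grp:
        args += ["--input-file-grp", input_file_grp]
    if output_file_grp:
        args += ["--output-file-grp", output_file_grp]
    if parameter:
        # Assumption: the parameter dict is passed as one JSON string.
        args += ["--parameter", json.dumps(parameter)]
    return args


# Example values only; none of these names come from the documentation above.
args = build_cli_args(
    "ocrd-dummy",
    "mets.xml",
    input_file_grp="OCR-D-IMG",
    output_file_grp="OCR-D-OUT",
    parameter={"level": "line"},
)
```

A list built this way could then be handed to subprocess.run(); how run_cli itself invokes the executable is not specified here.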

Parameters:

executable (string) – Executable name of the module processor.

ocrd.processor.helpers.run_processor(processorClass, mets_url=None, resolver=None, workspace=None, page_id=None, log_level=None, input_file_grp=None, output_file_grp=None, parameter=None, working_dir=None, mets_server_url=None, instance_caching=False)

Instantiate a Pythonic processor, open a workspace, run the processor and save the workspace.

If workspace is not None, reuse it. Otherwise, instantiate a Workspace for mets_url (and working_dir) using ocrd.Resolver.workspace_from_url() (i.e. open or clone a local workspace).

Instantiate a Python object for processorClass, passing:

- the workspace,
- page_id,
- input_file_grp,
- output_file_grp,
- parameter (after applying any parameter_override settings).

Run the processor on the workspace (creating output files in the filesystem).

Finally, write back the workspace (updating the METS in the filesystem).
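The sequence of steps above can be sketched with stand-in classes. This is a minimal illustration of the control flow, not the actual implementation: StubWorkspace, StubResolver, StubProcessor and run_processor_sketch are all hypothetical names, and the real classes do considerably more (logging, validation, METS server handling).

```python
class StubWorkspace:
    """Stand-in for a workspace; only records whether it was saved."""

    def __init__(self, mets_url):
        self.mets_url = mets_url
        self.saved = False

    def save_mets(self):
        self.saved = True


class StubResolver:
    """Stand-in for a resolver that opens a workspace from a METS URL."""

    def workspace_from_url(self, mets_url, dst_dir=None):
        return StubWorkspace(mets_url)


class StubProcessor:
    """Stand-in processor; a real one would create output files in process()."""

    def __init__(self, workspace, page_id=None, input_file_grp=None,
                 output_file_grp=None, parameter=None):
        self.workspace = workspace
        self.parameter = parameter or {}
        self.processed = False

    def process(self):
        self.processed = True


def run_processor_sketch(processor_class, mets_url=None, resolver=None,
                         workspace=None, working_dir=None, **kwargs):
    # Step 1: reuse the given workspace, or open one for mets_url.
    if workspace is None:
        workspace = resolver.workspace_from_url(mets_url, working_dir)
    # Step 2: instantiate the processor with the workspace and its settings.
    processor = processor_class(workspace, **kwargs)
    # Step 3: run the processor on the workspace.
    processor.process()
    # Step 4: write back the workspace.
    workspace.save_mets()
    return processor


processor = run_processor_sketch(
    StubProcessor,
    mets_url="mets.xml",
    resolver=StubResolver(),
    input_file_grp="OCR-D-IMG",
    output_file_grp="OCR-D-OUT",
)
```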

If instance_caching is True, processor instances are cached internally, keyed by their set of parameter values. On a match, the cached object (with all its memory resources, such as loaded models) is re-used instead of re-instantiated, for as long as the program runs. An instance is deleted (and its resources freed) only when OCRD_MAX_PROCESSOR_CACHE instances have already been cached and its parameter set is the one re-used least frequently. (See ProcessingWorker for use cases.)

Parameters:

processorClass (object) – Python class of the module processor.