ocrd.processor.helpers module
Helper methods for running and documenting processors
- ocrd.processor.helpers.run_cli(executable, mets_url=None, resolver=None, workspace=None, page_id=None, overwrite=None, debug=None, log_level=None, log_filename=None, input_file_grp=None, output_file_grp=None, parameter=None, working_dir=None, mets_server_url=None)
Open a workspace and run a processor on the command line.
If workspace is not None, reuse that. Otherwise, instantiate a Workspace for mets_url (and working_dir) by using ocrd.Resolver.workspace_from_url() (i.e. open or clone a local workspace).
Run the processor CLI executable on the workspace, passing:
- the workspace,
- page_id,
- input_file_grp,
- output_file_grp,
- parameter (after applying any parameter_override settings).
(This will create output files and update the METS in the filesystem.)
- Parameters:
executable (string) – Executable name of the module processor.
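The hand-off from run_cli to the processor executable can be pictured as assembling a command line from the arguments listed above. The following is a minimal sketch, not the library's implementation: build_cli_args is a hypothetical helper, and the long option names are assumed to follow the OCR-D command-line interface specification.

```python
import json


def build_cli_args(executable, mets_url, page_id=None, input_file_grp=None,
                   output_file_grp=None, parameter=None, overwrite=False):
    """Hypothetical helper: assemble an argv list for a processor CLI.

    The option names are an assumption based on the OCR-D CLI conventions.
    """
    args = [executable, "--mets", mets_url]
    if page_id:
        args += ["--page-id", page_id]
    if input_file_grp:
        args += ["--input-file-grp", input_file_grp]
    if output_file_grp:
        args += ["--output-file-grp", output_file_grp]
    if parameter:
        # Parameters are passed as a single JSON string
        args += ["--parameter", json.dumps(parameter)]
    if overwrite:
        args.append("--overwrite")
    return args


# Example: arguments for a processor named "ocrd-dummy" (name chosen for illustration)
argv = build_cli_args("ocrd-dummy", "mets.xml",
                      input_file_grp="OCR-D-IMG",
                      output_file_grp="OCR-D-OUT")
```

A list like argv could then be passed to subprocess.run; building it as a list rather than a string avoids shell quoting problems with the JSON parameter value.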
- ocrd.processor.helpers.run_processor(processorClass, mets_url=None, resolver=None, workspace=None, page_id=None, log_level=None, input_file_grp=None, output_file_grp=None, parameter=None, working_dir=None, mets_server_url=None, instance_caching=False)
Instantiate a Pythonic processor, open a workspace, run the processor and save the workspace.
If workspace is not None, reuse that. Otherwise, instantiate a Workspace for mets_url (and working_dir) by using ocrd.Resolver.workspace_from_url() (i.e. open or clone a local workspace).
Instantiate a Python object for processorClass, passing:
- the workspace,
- page_id,
- input_file_grp,
- output_file_grp,
- parameter (after applying any parameter_override settings).
Run the processor on the workspace (creating output files in the filesystem).
Finally, write back the workspace (updating the METS in the filesystem).
If instance_caching is True, then processor instances (for the same set of parameter values) will be cached internally. Thus, these objects (and all their memory resources, like loaded models) get re-used instead of re-instantiated when a match occurs, for as long as the program is running. An instance only gets deleted (and its resources freed) when OCRD_MAX_PROCESSOR_CACHE instances have already been cached and its particular parameter set is the one re-used least frequently. (See ProcessingWorker for use-cases.)
- Parameters:
processorClass (object) – Python class of the module processor.
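The instance caching described for instance_caching can be illustrated with a bounded cache keyed by the parameter values. This is a minimal sketch, not the ocrd implementation: DummyProcessor, get_instance and MAX_CACHE are hypothetical names, and the cache is the standard library's functools.lru_cache, which evicts the least recently used entry and so may differ from the eviction policy described above.

```python
import json
from functools import lru_cache

MAX_CACHE = 2  # hypothetical stand-in for OCRD_MAX_PROCESSOR_CACHE


class DummyProcessor:
    """Hypothetical processor standing in for a class that loads heavy models."""
    instances_created = 0

    def __init__(self, parameter):
        DummyProcessor.instances_created += 1
        self.parameter = parameter


@lru_cache(maxsize=MAX_CACHE)
def _cached_instance(processor_class, parameter_json):
    # The body only runs on a cache miss; hits return the stored object
    return processor_class(json.loads(parameter_json))


def get_instance(processor_class, parameter):
    # Dicts are unhashable, so a canonical JSON string serves as the cache key
    return _cached_instance(processor_class, json.dumps(parameter, sort_keys=True))


first = get_instance(DummyProcessor, {"model": "a", "dpi": 300})
again = get_instance(DummyProcessor, {"dpi": 300, "model": "a"})  # same values, same object
```

Sorting the keys before serializing means two parameter dicts with the same values map to the same cached instance regardless of key order.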