ocrd_utils.config module
Most behavior of OCR-D is controlled via command-line flags or keyword args. Some behavior is global or too cumbersome to handle via explicit code and better solved by using environment variables.
OcrdEnvConfig is a base class that streamlines this. It is meant to be subclassed in the ocrd package, which supplies the actual values.
OCRD_METS_CACHING
If set to true, access to the METS file is cached, speeding up in-memory search and modification.
OCRD_MAX_PROCESSOR_CACHE
Maximum number of processor instances (for each set of parameters) to be kept in memory (including loaded models) for processing workers or processor servers. (Default: “128”)
OCRD_MAX_PARALLEL_PAGES
Maximum number of processor threads for page-parallel processing (within each Processor’s selected page range, independent of the number of Processing Workers or Processor Servers). If set >1, then a METS Server must be used for METS synchronisation. (Default: “1”)
OCRD_PROCESSING_PAGE_TIMEOUT
Timeout in seconds for processing a single page. If set >0, a page that exceeds the timeout is handled the same way as a missing output (see OCRD_MISSING_OUTPUT). (Default: “0”)
OCRD_PROFILE
Whether to enable gathering runtime statistics on the ocrd.profile logger (comma-separated):
CPU: yields CPU and wall-time
RSS: also yields peak memory (resident set size)
PSS: also yields peak memory (proportional set size)
(Default: “”)
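To illustrate the comma-separated format, the following sketch parses an OCRD_PROFILE value into a set of flags. The function name parse_profile_flags, the case-sensitive matching, and the rejection of unknown flags are assumptions made for this example. They do not describe how ocrd_utils itself reads the variable.

```python
import os

# Flag names taken from the description above.
ALLOWED_PROFILE_FLAGS = {"CPU", "RSS", "PSS"}


def parse_profile_flags(raw):
    """Split a comma-separated OCRD_PROFILE value into a set of flags.

    Hypothetical helper: an empty string yields an empty set (profiling off),
    and unknown flags raise ValueError (an assumption of this sketch).
    """
    flags = {part.strip() for part in raw.split(",") if part.strip()}
    unknown = flags - ALLOWED_PROFILE_FLAGS
    if unknown:
        raise ValueError(f"unsupported OCRD_PROFILE flags: {sorted(unknown)}")
    return flags


# Read the variable, falling back to the documented default of "".
active_flags = parse_profile_flags(os.environ.get("OCRD_PROFILE", ""))
```

With OCRD_PROFILE=CPU,RSS in the environment, this sketch would yield the set containing CPU and RSS.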
OCRD_PROFILE_FILE
If set, the CPU profile is written to this file for later perusal with analysis tools such as snakeviz.
OCRD_DOWNLOAD_RETRIES
Number of times to retry failed attempts for downloads of resources or workspace files.
OCRD_DOWNLOAD_TIMEOUT
Timeout in seconds for connecting or reading (comma-separated) when downloading.
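The comma-separated form pairs a connect timeout with a read timeout. A minimal sketch of how such a value could be split is shown below. The helper name parse_download_timeout, the requirement of exactly two fields, and the treatment of an empty field as "no timeout" are assumptions of this example, not confirmed behaviour of ocrd_utils.

```python
def parse_download_timeout(raw):
    """Parse 'CONNECT,READ' into a (connect, read) tuple of floats.

    Hypothetical helper: an empty field maps to None (no timeout).
    """
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected 'CONNECT,READ', got {raw!r}")
    connect, read = (float(part) if part else None for part in parts)
    return connect, read


# Example: 10 seconds to connect, 20 seconds to read.
connect_timeout, read_timeout = parse_download_timeout("10,20")
```

A (connect, read) tuple of this shape is also the form that HTTP clients such as requests accept for their timeout argument.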
OCRD_DOWNLOAD_INPUT
Whether to download files not present locally during processing (Default: “True”)
OCRD_MISSING_INPUT
How to deal with missing input files (for some fileGrp/pageId) during processing:
SKIP: ignore and proceed with next page’s input
ABORT: throw MissingInputFile
(Default: “SKIP”)
OCRD_MISSING_OUTPUT
How to deal with missing output files (for some fileGrp/pageId) during processing:
SKIP: ignore and proceed processing next page
COPY: fall back to copying input PAGE to output fileGrp for page
ABORT: re-throw whatever caused processing to fail
(Default: “SKIP”)
OCRD_MAX_MISSING_OUTPUTS
Maximal rate of skipped/fallback pages among all processed pages before aborting (decimal fraction, ignored if negative). (Default: “0.1”)
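The rate check described here can be sketched as follows. The function name, the strict "greater than" comparison, and the handling of zero processed pages are assumptions of this example. The actual check in the processor code may differ in these details.

```python
def exceeds_missing_output_rate(num_missing, num_processed, max_rate=0.1):
    """Return True if the share of skipped/fallback pages is above max_rate.

    Hypothetical helper: a negative max_rate disables the check,
    as stated in the description of OCRD_MAX_MISSING_OUTPUTS.
    """
    if max_rate < 0:
        return False  # check is ignored for negative values
    if num_processed == 0:
        return False  # nothing processed yet, so no rate to compare
    return num_missing / num_processed > max_rate


# 2 of 10 pages skipped gives a rate of 0.2, above the default of 0.1.
should_abort = exceeds_missing_output_rate(2, 10)
```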
OCRD_EXISTING_OUTPUT
How to deal with already existing output files (for some fileGrp/pageId) during processing:
SKIP: ignore and proceed processing next page
OVERWRITE: force writing result to output fileGrp for page
ABORT: re-throw FileExistsError
(Default: “SKIP”)
OCRD_NETWORK_SERVER_ADDR_PROCESSING
Default address of Processing Server to connect to (for ocrd network client processing). (Default: “”)
OCRD_NETWORK_CLIENT_POLLING_SLEEP
How many seconds to sleep before trying again. (Default: “10”)
OCRD_NETWORK_CLIENT_POLLING_TIMEOUT
Timeout for a blocking ocrd network client (in seconds). (Default: “3600”)
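The interplay of the polling sleep and the polling timeout can be pictured with a generic polling loop. This is only a sketch: poll_until_done and check_status are hypothetical names, and the real ocrd network client may structure its loop differently. The sleep and clock parameters exist only so the loop can be exercised without actually waiting.

```python
import time


def poll_until_done(check_status, sleep_seconds=10, timeout_seconds=3600,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check_status() repeatedly until it returns a non-None result.

    Raises TimeoutError if another sleep would pass the deadline.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = check_status()
        if status is not None:
            return status
        if clock() + sleep_seconds > deadline:
            raise TimeoutError(f"no result after {timeout_seconds} seconds")
        sleep(sleep_seconds)  # wait before trying again
```

The defaults of 10 and 3600 mirror the documented defaults of OCRD_NETWORK_CLIENT_POLLING_SLEEP and OCRD_NETWORK_CLIENT_POLLING_TIMEOUT.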
OCRD_NETWORK_SERVER_ADDR_WORKFLOW
Default address of Workflow Server to connect to (for ocrd network client workflow). (Default: “”)
OCRD_NETWORK_SERVER_ADDR_WORKSPACE
Default address of Workspace Server to connect to (for ocrd network client workspace). (Default: “”)
OCRD_NETWORK_RABBITMQ_CLIENT_CONNECT_ATTEMPTS
Number of attempts for a RabbitMQ client to connect before failing. (Default: “3”)
OCRD_NETWORK_RABBITMQ_HEARTBEAT
Controls AMQP heartbeat timeout (in seconds) negotiation during connection tuning. An integer value always overrides the value proposed by the broker. Use 0 to deactivate heartbeat. (Default: “0”)
OCRD_NETWORK_SOCKETS_ROOT_DIR
The root directory where all METS Server related socket files are created (Default: “/tmp/ocrd_network_sockets”)
OCRD_NETWORK_LOGS_ROOT_DIR
The root directory where all ocrd_network related log files are stored (Default: “/tmp/ocrd_network_logs”)
HOME
Directory to look for ocrd_logging.conf, fallback for unset XDG variables. (Default: “/home/kba”)
XDG_DATA_HOME
Directory to look for ./ocrd-resources/* (i.e. ocrd resmgr data location) (Default: “/home/kba/.local/share”)
XDG_CONFIG_HOME
Directory to look for ./ocrd/resources.yml (i.e. ocrd resmgr user database) (Default: “/home/kba/.config”)
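The fallback from the XDG variables to HOME can be sketched as below. The function name resolve_ocrd_dirs and the dictionary keys are hypothetical, and treating an empty XDG variable like an unset one is an assumption of this example. The paths themselves follow the defaults listed above.

```python
import os
from pathlib import Path


def resolve_ocrd_dirs(environ):
    """Resolve the resmgr data location and user database from an environment mapping."""
    home = Path(environ.get("HOME") or Path.home())
    # Fall back to $HOME/.local/share and $HOME/.config when the XDG variables are unset.
    data_home = Path(environ.get("XDG_DATA_HOME") or home / ".local" / "share")
    config_home = Path(environ.get("XDG_CONFIG_HOME") or home / ".config")
    return {
        "resources_dir": data_home / "ocrd-resources",
        "user_database": config_home / "ocrd" / "resources.yml",
    }


# Resolve against the current process environment.
ocrd_dirs = resolve_ocrd_dirs(os.environ)
```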
OCRD_LOGGING_DEBUG
Print information about the logging setup to STDERR (Default: “False”)
- class ocrd_utils.config.OcrdEnvVariable(name, description, parser=<class 'str'>, validator=<function OcrdEnvVariable.<lambda>>, default=[False, None])
Bases: object
An environment variable for use in OCR-D.
- Parameters:
name (str) – Name of the environment variable
description (str) – Description of what the variable is used for.
- Keyword Arguments:
parser (callable) – Function to transform the raw (string) value to whatever is needed.
validator (callable) – Function to validate that the raw (string) value is parseable.
default (tuple(bool, any)) – 2-tuple, first element is a bool whether there is a default value defined and second element contains that default value, which can be a callable for deferred evaluation
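The roles of parser, validator, and the 2-tuple default can be illustrated with a small stand-alone class. EnvVariableSketch and its value method are hypothetical and written only for this example. They mimic the documented constructor arguments but are not the implementation of OcrdEnvVariable, whose error handling and lookup logic may differ.

```python
import os


class EnvVariableSketch:
    """Stand-alone illustration of an environment variable with deferred default."""

    def __init__(self, name, description, parser=str,
                 validator=lambda raw: True, default=(False, None)):
        self.name = name
        self.description = description
        self.parser = parser        # transforms the raw string
        self.validator = validator  # checks the raw string is parseable
        self.default = default      # (has_default, value_or_callable)

    def value(self, environ=None):
        environ = os.environ if environ is None else environ
        if self.name in environ:
            raw = environ[self.name]
            if not self.validator(raw):
                raise ValueError(f"{self.name}: invalid value {raw!r}")
            return self.parser(raw)
        has_default, default = self.default
        if not has_default:
            raise KeyError(f"{self.name} is unset and has no default")
        # A callable default is evaluated only when it is needed.
        return default() if callable(default) else default


# Hypothetical variable: integer-valued, with a deferred default of 1.
demo = EnvVariableSketch("DEMO_MAX_PAGES", "Illustrative variable",
                         parser=int, validator=str.isdigit,
                         default=(True, lambda: 1))
```

Passing the default as a callable defers its evaluation to lookup time, which is useful when the default depends on state that is not known at import time, such as the current HOME directory.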