ocrd_utils.config module

Most behavior of OCR-D is controlled via command-line flags or keyword arguments. Some behavior is global, or too cumbersome to pass around explicitly in code, and is better controlled through environment variables.

OcrdEnvConfig is a base class that streamlines this. It is meant to be subclassed in the ocrd package, which supplies the actual values.

  • OCRD_METS_CACHING

If set to true, access to the METS file is cached, speeding up in-memory search and modification.

  • OCRD_MAX_PROCESSOR_CACHE

Maximum number of processor instances (for each set of parameters) to be kept in memory (including loaded models) for processing workers or processor servers. (Default: “128”)

  • OCRD_MAX_PARALLEL_PAGES

Maximum number of processor threads for page-parallel processing (within each Processor’s selected page range, independent of the number of Processing Workers or Processor Servers). If set >1, then a METS Server must be used for METS synchronisation. (Default: “1”)

  • OCRD_PROCESSING_PAGE_TIMEOUT

Timeout in seconds for processing a single page. If set >0 and the timeout is exceeded, the page is handled according to OCRD_MISSING_OUTPUT. (Default: “0”)

  • OCRD_PROFILE

Whether to enable gathering runtime statistics on the ocrd.profile logger (comma-separated):

  • CPU: yields CPU and wall-time

  • RSS: also yields peak memory (resident set size)

  • PSS: also yields peak memory (proportional set size)

 (Default: “”)

  • OCRD_PROFILE_FILE

If set, then the CPU profile is written to this file for later perusal with an analysis tool like snakeviz.
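To illustrate what such a profile file can look like, here is a minimal sketch that writes a CPU profile with Python's standard cProfile module and reads it back with pstats. It assumes the file is an ordinary cProfile dump, which the text above does not state explicitly; the function `work` and the file name `example.prof` are hypothetical.

```python
import cProfile
import os
import pstats
import tempfile


def work():
    """Placeholder workload standing in for page processing."""
    return sum(i * i for i in range(10_000))


# Hypothetical output location; in practice this would be the configured file.
profile_path = os.path.join(tempfile.mkdtemp(), "example.prof")

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()
profiler.dump_stats(profile_path)  # binary pstats format

# Quick look without extra tools: the five most expensive calls by cumulative time.
pstats.Stats(profile_path).sort_stats("cumulative").print_stats(5)
```

A file written this way can then be opened with `snakeviz example.prof`.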

  • OCRD_DOWNLOAD_RETRIES

Number of times to retry failed attempts for downloads of resources or workspace files.

  • OCRD_DOWNLOAD_TIMEOUT

Timeout in seconds for connecting or reading (comma-separated) when downloading.
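As an illustration of the comma-separated format, the following sketch splits such a value into a (connect, read) pair. The helper name `parse_timeouts` is hypothetical, and the rule that a single number applies to both phases is an assumption, not necessarily how OCR-D parses the value.

```python
from typing import Tuple


def parse_timeouts(raw: str) -> Tuple[float, float]:
    """Parse 'connect,read' (or a single number) into a pair of seconds."""
    parts = [float(part) for part in raw.split(",")]
    if len(parts) == 1:
        # Assumption: a lone value is used for both connecting and reading.
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected one or two numbers, got {raw!r}")


print(parse_timeouts("10,20"))  # (10.0, 20.0)
```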

  • OCRD_DOWNLOAD_INPUT

Whether to download files not present locally during processing (Default: “True”)

  • OCRD_MISSING_INPUT

How to deal with missing input files (for some fileGrp/pageId) during processing: 

  • SKIP: ignore and proceed with next page’s input

  • ABORT: throw MissingInputFile

 (Default: “SKIP”)

  • OCRD_MISSING_OUTPUT

How to deal with missing output files (for some fileGrp/pageId) during processing: 

  • SKIP: ignore and proceed processing next page

  • COPY: fall back to copying input PAGE to output fileGrp for page

  • ABORT: re-throw whatever caused processing to fail

 (Default: “SKIP”)

  • OCRD_MAX_MISSING_OUTPUTS

Maximal rate of skipped/fallback pages among all processed pages before aborting (decimal fraction, ignored if negative). (Default: “0.1”)
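The rate check described above can be sketched as follows. The function name `too_many_missing` is hypothetical, and the choice of a strict "greater than" comparison is an assumption; the text only says that the limit is a decimal fraction and that negative values disable it.

```python
def too_many_missing(n_missing: int, n_processed: int, max_rate: float) -> bool:
    """Return True if the share of skipped/fallback pages exceeds max_rate."""
    if max_rate < 0 or n_processed == 0:
        # A negative limit disables the check; without pages there is no rate yet.
        return False
    return n_missing / n_processed > max_rate


print(too_many_missing(2, 10, 0.1))   # True: 20% of pages exceed the 10% limit
print(too_many_missing(5, 10, -1.0))  # False: negative limit is ignored
```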

  • OCRD_EXISTING_OUTPUT

How to deal with already existing output files (for some fileGrp/pageId) during processing: 

  • SKIP: ignore and proceed processing next page

  • OVERWRITE: force writing result to output fileGrp for page

  • ABORT: re-throw FileExistsError

 (Default: “SKIP”)

  • OCRD_NETWORK_SERVER_ADDR_PROCESSING

Default address of Processing Server to connect to (for ocrd network client processing). (Default: “”)

  • OCRD_NETWORK_CLIENT_POLLING_SLEEP

How many seconds to sleep before trying again. (Default: “10”)

  • OCRD_NETWORK_CLIENT_POLLING_TIMEOUT

Timeout for a blocking ocrd network client (in seconds). (Default: “3600”)
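How a polling interval and an overall timeout interact can be sketched as a simple loop. This is not the client's actual code: `poll_until_done` and its `check` callback are hypothetical names, and the policy of giving up when the next sleep would overrun the deadline is an assumption.

```python
import time


def poll_until_done(check, sleep_seconds: float, timeout_seconds: float):
    """Call check() repeatedly until it returns a non-None result or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() + sleep_seconds > deadline:
            raise TimeoutError(f"no result within {timeout_seconds} seconds")
        time.sleep(sleep_seconds)


# Toy status source that reports success on the third poll.
attempts = {"count": 0}


def fake_job_status():
    attempts["count"] += 1
    return "SUCCESS" if attempts["count"] >= 3 else None


print(poll_until_done(fake_job_status, sleep_seconds=0.01, timeout_seconds=1.0))
```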

  • OCRD_NETWORK_SERVER_ADDR_WORKFLOW

Default address of Workflow Server to connect to (for ocrd network client workflow). (Default: “”)

  • OCRD_NETWORK_SERVER_ADDR_WORKSPACE

Default address of Workspace Server to connect to (for ocrd network client workspace). (Default: “”)

  • OCRD_NETWORK_RABBITMQ_CLIENT_CONNECT_ATTEMPTS

Number of attempts for a RabbitMQ client to connect before failing. (Default: “3”)

  • OCRD_NETWORK_RABBITMQ_HEARTBEAT

Controls the AMQP heartbeat timeout (in seconds) negotiated during connection tuning. An integer value always overrides the value proposed by the broker. Use 0 to deactivate the heartbeat. (Default: “0”)

  • OCRD_NETWORK_SOCKETS_ROOT_DIR

The root directory where all METS Server related socket files are created (Default: “/tmp/ocrd_network_sockets”)

  • OCRD_NETWORK_LOGS_ROOT_DIR

The root directory where all ocrd_network related file logs are stored (Default: “/tmp/ocrd_network_logs”)

  • HOME

Directory to look for ocrd_logging.conf, fallback for unset XDG variables. (Default: “/home/kba”)

  • XDG_DATA_HOME

Directory to look for ./ocrd-resources/* (i.e. ocrd resmgr data location) (Default: “/home/kba/.local/share”)

  • XDG_CONFIG_HOME

Directory to look for ./ocrd/resources.yml (i.e. ocrd resmgr user database) (Default: “/home/kba/.config”)
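The fallback from the XDG variables to HOME can be sketched like this. The helper `xdg_dir` is hypothetical, and the subdirectories `.local/share` and `.config` are taken from the defaults listed above; the sketch passes an explicit mapping instead of the real environment so that the result does not depend on the machine it runs on.

```python
import os
from pathlib import Path


def xdg_dir(variable: str, fallback_subdir: str, environ=os.environ) -> Path:
    """Return the directory named by `variable`, else a path below HOME."""
    value = environ.get(variable)
    if value:
        return Path(value)
    # Assumption: if HOME is unset too, use the current user's home directory.
    home = Path(environ.get("HOME") or Path.home())
    return home / fallback_subdir


env = {"HOME": "/home/user"}  # hypothetical environment without XDG variables
print(xdg_dir("XDG_DATA_HOME", ".local/share", env) / "ocrd-resources")
print(xdg_dir("XDG_CONFIG_HOME", ".config", env) / "ocrd" / "resources.yml")
```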

  • OCRD_LOGGING_DEBUG

Print information about the logging setup to STDERR (Default: “False”)

class ocrd_utils.config.OcrdEnvVariable(name, description, parser=<class 'str'>, validator=<function OcrdEnvVariable.<lambda>>, default=[False, None])

Bases: object

An environment variable for use in OCR-D.

Parameters:
  • name (str) – Name of the environment variable

  • description (str) – Description of what the variable is used for.

Keyword Arguments:
  • parser (callable) – Function to transform the raw (string) value to whatever is needed.

  • validator (callable) – Function to validate that the raw (string) value is parseable.

  • default (tuple(bool, any)) – 2-tuple, first element is a bool whether there is a default value defined and second element contains that default value, which can be a callable for deferred evaluation
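How these arguments might work together can be sketched with a small stand-in class. This is not the OcrdEnvVariable implementation: the class `EnvVar`, its `value` method, and the variable name `EXAMPLE_RETRIES` are hypothetical, chosen only to show a parser, a validator, and a 2-tuple default with deferred evaluation.

```python
import os


class EnvVar:
    """Hypothetical stand-in mirroring the documented constructor arguments."""

    def __init__(self, name, description, parser=str,
                 validator=lambda raw: True, default=(False, None)):
        self.name = name
        self.description = description
        self.parser = parser
        self.validator = validator
        # 2-tuple: (is a default defined?, the default or a callable producing it)
        self.has_default, self.default = default

    def value(self, environ=os.environ):
        """Return the parsed value, or the default if the variable is unset."""
        if self.name in environ:
            raw = environ[self.name]
            if not self.validator(raw):
                raise ValueError(f"{self.name}: invalid value {raw!r}")
            return self.parser(raw)
        if self.has_default:
            # Deferred evaluation: call the default only when it is needed.
            return self.default() if callable(self.default) else self.default
        raise KeyError(f"{self.name} is unset and has no default")


retries = EnvVar(
    "EXAMPLE_RETRIES",  # hypothetical variable name
    "Number of retries",
    parser=int,
    validator=lambda raw: raw.isdigit(),
    default=(True, 3),
)
print(retries.value({}))                         # 3 (default)
print(retries.value({"EXAMPLE_RETRIES": "5"}))   # 5 (parsed from the raw string)
```

Keeping the validator separate from the parser lets a bad value be rejected with a message naming the variable, rather than surfacing as an unrelated parsing exception.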

describe(wrap_text=True, indent_text=True)

Output help information on a config option.

If option.description is a multiline string with complex formatting (e.g. markdown lists), replace empty lines with  and set wrap_text to False.

class ocrd_utils.config.OcrdEnvConfig

Bases: object

add(name, *args, **kwargs)
has_default(name)
reset_defaults()
describe(name, *args, **kwargs)
is_set(name)
raw_value(name)
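The methods above are listed without descriptions. Going only by their names, a registry of this kind might behave roughly like the following sketch. The class `EnvRegistry` is hypothetical and the semantics shown for `add`, `has_default`, `is_set`, and `raw_value` are guesses, not the documented behavior of OcrdEnvConfig.

```python
import os


class EnvRegistry:
    """Hypothetical registry of known environment variables, keyed by name."""

    def __init__(self, environ=os.environ):
        self._environ = environ
        self._defaults = {}  # name -> (has_default, default)

    def add(self, name, default=(False, None)):
        """Register a variable together with its optional default."""
        self._defaults[name] = default

    def has_default(self, name):
        """Whether the registered variable defines a default value."""
        return self._defaults[name][0]

    def is_set(self, name):
        """Whether the variable is present in the environment."""
        return name in self._environ

    def raw_value(self, name):
        """The unparsed string from the environment."""
        return self._environ[name]


registry = EnvRegistry({"EXAMPLE_A": "1"})  # hypothetical environment
registry.add("EXAMPLE_A")
registry.add("EXAMPLE_B", default=(True, "fallback"))
print(registry.is_set("EXAMPLE_A"), registry.raw_value("EXAMPLE_A"))
print(registry.is_set("EXAMPLE_B"), registry.has_default("EXAMPLE_B"))
```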