ocrd_utils.str module

Utility functions for strings, paths and URLs.

ocrd_utils.str.assert_file_grp_cardinality(grps, n, msg=None)

Assert that a string of comma-separated fileGrps contains exactly n entries.
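The check described above can be illustrated with a minimal sketch. The function name check_file_grp_cardinality and the wording of the error message are assumptions made for illustration, not the library's own code.

```python
def check_file_grp_cardinality(grps, n, msg=None):
    """Hypothetical stand-in: assert that `grps` names exactly `n` fileGrps."""
    # Split the comma-separated string into individual fileGrp names.
    entries = grps.split(',')
    # Optional extra context for the error message (assumed format).
    detail = " (%s)" % msg if msg else ""
    assert len(entries) == n, \
        "Expected exactly %d fileGrp entries%s, but '%s' has %d" % (
            n, detail, grps, len(entries))


# Passes silently: two names, two expected.
check_file_grp_cardinality("OCR-D-IMG,OCR-D-SEG", 2)
```

A mismatch raises AssertionError, so a caller can fail early before doing any work on the wrong number of groups.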

ocrd_utils.str.concat_padded(base, *args)

Concatenate a string and a zero-padded 4-digit number.
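As a rough illustration of zero-padded concatenation, the sketch below appends each number as four digits. The name concat_padded_sketch and the underscore separator are assumptions, since the description above does not state how the parts are joined.

```python
def concat_padded_sketch(base, *numbers):
    """Hypothetical stand-in: append each number to `base`, zero-padded to 4 digits."""
    result = base
    for number in numbers:
        # "_" as separator is an assumption made for this sketch.
        result += "_%04d" % number
    return result


print(concat_padded_sketch("OCR-D-IMG", 7))
```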

ocrd_utils.str.get_local_filename(url, start=None)

Return local filename, optionally relative to start

  • url (string) – filename or URL

  • start (string) – Base path to remove from filename. Raises an exception if start is not a prefix of url.
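The behaviour described by these two parameters might look roughly like the following sketch. The name local_filename_sketch, the handling of a file:// prefix and the choice of ValueError are assumptions for illustration only.

```python
def local_filename_sketch(url, start=None):
    """Hypothetical stand-in: return a local filename, optionally relative to `start`."""
    # Assumption: a file:// scheme prefix is dropped to obtain a plain path.
    if url.startswith("file://"):
        url = url[len("file://"):]
    if start is not None:
        # Normalize the base path so that it ends with exactly one separator.
        if not start.endswith("/"):
            start += "/"
        if not url.startswith(start):
            raise ValueError("%r is not a prefix of %r" % (start, url))
        url = url[len(start):]
    return url


print(local_filename_sketch("file:///data/ws/img.tif", start="/data/ws"))
```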


Whether a url is a local filename.


Return whether a value is a str.

ocrd_utils.str.make_file_id(ocrd_file, output_file_grp)

Derive a new file ID for an output file from an existing input file ocrd_file and the name of the output file’s fileGrp/@USE, output_file_grp. If ocrd_file’s ID contains the input file’s fileGrp name, then replace it by output_file_grp. Otherwise use output_file_grp together with the position of ocrd_file within the input fileGrp (as a fallback counter). Increment counter until there is no more ID conflict.
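The derivation steps above can be sketched with plain Python objects. Everything here is hypothetical: the FakeFile class, the explicit input_group_files and existing_ids arguments, the 1-based position and the "_%04d" suffix stand in for whatever the library actually does, and the conflict check is applied only in the fallback branch as an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeFile:
    """Hypothetical stand-in for a METS file entry."""
    ID: str
    fileGrp: str


def make_file_id_sketch(input_file, output_file_grp, input_group_files, existing_ids):
    # Case 1: the input ID contains the input fileGrp name, so swap it out.
    if input_file.fileGrp in input_file.ID:
        return input_file.ID.replace(input_file.fileGrp, output_file_grp)
    # Case 2: fall back to the output fileGrp plus the file's position
    # within its input fileGrp (1-based here, by assumption).
    counter = input_group_files.index(input_file) + 1
    candidate = "%s_%04d" % (output_file_grp, counter)
    # Increment the counter until the ID no longer conflicts.
    while candidate in existing_ids:
        counter += 1
        candidate = "%s_%04d" % (output_file_grp, counter)
    return candidate


first = FakeFile("OCR-D-IMG_0001", "OCR-D-IMG")
second = FakeFile("page7", "OCR-D-IMG")
print(make_file_id_sketch(first, "OCR-D-BIN", [first, second], set()))
print(make_file_id_sketch(second, "OCR-D-BIN", [first, second], {"OCR-D-BIN_0002"}))
```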

ocrd_utils.str.nth_url_segment(url, n=-1)

Return the n-th /-delimited segment of a URL-like string (by default the last one).

  • url (string) –

  • n (integer) – index of segment, default: -1
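Segment selection by index can be sketched as below. The name url_segment_sketch, the stripping of a trailing slash and the empty-string result for an out-of-range index are assumptions for illustration.

```python
def url_segment_sketch(url, n=-1):
    """Hypothetical stand-in: return the n-th /-delimited segment of `url`."""
    # Assumption: a trailing slash does not count as an empty last segment.
    segments = url.rstrip('/').split('/')
    try:
        return segments[n]
    except IndexError:
        # Assumption: an out-of-range index yields an empty string.
        return ''


print(url_segment_sketch("https://example.org/a/b/c"))
print(url_segment_sketch("https://example.org/a/b/c", n=-2))
```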


Parse a string as either the path to a JSON object or a literal JSON object.

Empty strings are equivalent to ‘{}’
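One way to picture the "path or literal" distinction is the sketch below. The name parse_json_or_path_sketch and the rule of treating anything starting with "{" as a literal are assumptions; the library may decide differently.

```python
import json


def parse_json_or_path_sketch(value):
    """Hypothetical stand-in: parse `value` as literal JSON or as a path to a JSON file."""
    value = value.strip()
    # Empty input is treated like an empty JSON object.
    if not value:
        return {}
    # Assumption: a leading brace marks a literal JSON object.
    if value.startswith('{'):
        return json.loads(value)
    # Otherwise treat the string as a path to a JSON file.
    with open(value, encoding='utf-8') as handle:
        return json.load(handle)


print(parse_json_or_path_sketch('{"dpi": 300}'))
```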


Parse a string of JSON interspersed with #-prefixed full-line comments
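A minimal sketch of comment-tolerant JSON parsing follows, assuming that a comment line is any line whose first non-whitespace character is "#". The name parse_json_with_comments_sketch is hypothetical.

```python
import json


def parse_json_with_comments_sketch(text):
    """Hypothetical stand-in: drop #-prefixed full-line comments, then parse JSON."""
    kept = [
        line for line in text.splitlines()
        # Only whole-line comments are removed; '#' inside values is untouched.
        if not line.lstrip().startswith('#')
    ]
    return json.loads('\n'.join(kept))


example = '# resolution settings\n{\n  # dots per inch\n  "dpi": 300\n}'
print(parse_json_with_comments_sketch(example))
```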


Remove everything from URL after path.
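As an illustration, the sketch below uses the standard library's urllib.parse and assumes that "everything after the path" means the query string and the fragment. The name strip_after_path_sketch is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_after_path_sketch(url):
    """Hypothetical stand-in: keep scheme, host and path; drop query and fragment."""
    parts = urlsplit(url)
    # Rebuild the URL with empty query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))


print(strip_after_path_sketch("https://example.org/a/b.xml?x=1#frag"))
```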


Sanitize input to be safely used as the basename of a local file.
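A possible approach to sanitizing is sketched below, assuming an allowlist of ASCII letters, digits, dot, underscore and hyphen, with every other run of characters collapsed to a single underscore. Both the name safe_basename_sketch and the replacement rule are assumptions; the library's actual character set may differ.

```python
import re


def safe_basename_sketch(name):
    """Hypothetical stand-in: replace characters that are risky in a file basename."""
    # Any run of characters outside the allowlist becomes one underscore,
    # which also removes path separators such as '/'.
    return re.sub(r'[^A-Za-z0-9._-]+', '_', name)


print(safe_basename_sketch("https://example.org/scan 01.tif"))
```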