ocrd_utils.str module

Utility functions for strings, paths and URLs.

ocrd_utils.str.assert_file_grp_cardinality(grps, n, msg=None)

Assert that a string of comma-separated fileGrps contains exactly n entries.
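The check described above can be sketched in a few lines. This is an illustration only, assuming grps is a plain comma-separated string; check_file_grp_cardinality is a hypothetical stand-in and not the library's implementation.

```python
def check_file_grp_cardinality(grps, n, msg=None):
    """Hypothetical stand-in: assert that `grps` names exactly `n` fileGrps."""
    count = len(grps.split(","))
    detail = f" ({msg})" if msg else ""
    assert count == n, (
        f"Expected exactly {n} fileGrp(s), got {count}: '{grps}'{detail}"
    )


# Passes silently: two comma-separated entries, two expected.
check_file_grp_cardinality("OCR-D-IMG,OCR-D-SEG", 2)
```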

ocrd_utils.str.concat_padded(base, *args)

Concatenate a string and a zero-padded 4-digit number.
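A minimal sketch of this kind of concatenation follows. The underscore separator and the pass-through of string arguments are assumptions made for illustration; concat_padded_sketch is a hypothetical name, not the library function.

```python
def concat_padded_sketch(base, *args):
    # Assumption: parts are joined with "_"; strings are appended as-is,
    # integers are zero-padded to 4 digits.
    result = base
    for arg in args:
        if isinstance(arg, str):
            result = f"{result}_{arg}"
        else:
            result = f"{result}_{arg:04d}"
    return result


print(concat_padded_sketch("FILE", 7))  # FILE_0007
```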

ocrd_utils.str.get_local_filename(url, start=None)

Return local filename, optionally relative to start

  • url (string) – filename or URL

  • start (string) – Base path to remove from the filename. An exception is raised if start is not a prefix of url.
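The behaviour of the two parameters can be illustrated as follows. This is a sketch under stated assumptions: only a "file://" scheme prefix is stripped, and ValueError is chosen arbitrarily as the exception type. local_filename_sketch is a hypothetical name.

```python
def local_filename_sketch(url, start=None):
    # Assumption: only the "file://" scheme prefix needs stripping.
    if url.startswith("file://"):
        url = url[len("file://"):]
    if start is not None:
        # Compare against a directory-style prefix so that "/data/ws"
        # does not accidentally match "/data/ws2/...".
        prefix = start if start.endswith("/") else start + "/"
        if not url.startswith(prefix):
            raise ValueError(f"'{start}' is not a prefix of '{url}'")
        url = url[len(prefix):]
    return url


print(local_filename_sketch("file:///data/ws/img.tif", "/data/ws"))  # img.tif
```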


Whether a url is a local filename.


Return whether a value is a str.

ocrd_utils.str.make_file_id(ocrd_file, output_file_grp)

Derive a new file ID for an output file from an existing input file ocrd_file and the name of the output file’s fileGrp/@USE, output_file_grp. If ocrd_file’s ID contains the input file’s fileGrp name, then replace it by output_file_grp. Else if ocrd_file has a pageId but it is not contained in the ocrd_file.ID, then concatenate output_file_grp and ocrd_file.pageId. Otherwise concatenate output_file_grp with the ocrd_file.ID.

Note: make_file_id cannot guarantee that the new ID is unique within an actual ocrd_models.ocrd_mets.OcrdMets. The caller is responsible for ensuring uniqueness of files to be added. Ultimately, ID conflicts will lead to ocrd_models.ocrd_mets.OcrdMets.add_file() raising an exception. This can be avoided if all processors use make_file_id consistently for ID generation.

Note: make_file_id generates page-specific IDs. For IDs representing page segments or pc:AlternativeImage files, the output of make_file_id may need to be concatenated with a unique string for that sub-page element, such as “.IMG” or the segment ID.
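The three-way rule for make_file_id described above can be sketched as follows. FakeFile is a hypothetical stand-in for the input file object, and the underscore separator used when concatenating is an assumption; the actual implementation may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFile:
    """Hypothetical stand-in for an input file; only the fields used here."""
    ID: str
    fileGrp: str
    pageId: Optional[str] = None


def make_file_id_sketch(ocrd_file, output_file_grp):
    # Case 1: the ID embeds the input fileGrp name, so swap it out.
    if ocrd_file.fileGrp in ocrd_file.ID:
        return ocrd_file.ID.replace(ocrd_file.fileGrp, output_file_grp)
    # Case 2: a pageId exists but is not part of the ID.
    if ocrd_file.pageId and ocrd_file.pageId not in ocrd_file.ID:
        return f"{output_file_grp}_{ocrd_file.pageId}"  # "_" is an assumption
    # Case 3: fall back to prefixing the old ID.
    return f"{output_file_grp}_{ocrd_file.ID}"


f = FakeFile(ID="OCR-D-IMG_0001", fileGrp="OCR-D-IMG", pageId="PHYS_0001")
print(make_file_id_sketch(f, "OCR-D-BIN"))  # OCR-D-BIN_0001
```

As the notes above point out, the result is page-specific and not guaranteed unique, so a caller writing several files per page would still append its own suffix.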

ocrd_utils.str.nth_url_segment(url, n=-1)

Return the n-th /-delimited segment of a URL-like string (by default, the last one).

  • url (string) – URL-like string to split

  • n (integer) – index of segment, default: -1
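Segment lookup of this kind can be sketched with the standard library. The sketch assumes that only the path component is split, that trailing slashes are dropped, and that an out-of-range index yields an empty string; none of these choices is confirmed by the text above, and nth_segment_sketch is a hypothetical name.

```python
from urllib.parse import urlsplit


def nth_segment_sketch(url, n=-1):
    # Assumptions: query and fragment are ignored, trailing slashes are
    # dropped, and an out-of-range index yields an empty string.
    path = urlsplit(url).path.rstrip("/")
    segments = path.split("/")
    try:
        return segments[n]
    except IndexError:
        return ""


print(nth_segment_sketch("http://example.org/a/b/c.xml?x=1"))      # c.xml
print(nth_segment_sketch("http://example.org/a/b/c.xml", n=-2))    # b
```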


Parse a string as either the path to a JSON object or a literal JSON object.

Empty strings are equivalent to ‘{}’
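One way to realise the "path or literal" behaviour is sketched below. The rule used to tell the two apart (a literal starts with an opening brace) is an assumption for illustration, and parse_json_value_sketch is a hypothetical name.

```python
import json


def parse_json_value_sketch(value):
    # Empty (or whitespace-only) input is treated like '{}'.
    if not value.strip():
        return {}
    # Assumption: anything that does not start with '{' is a file path.
    if not value.lstrip().startswith("{"):
        with open(value, encoding="utf-8") as f:
            value = f.read()
    return json.loads(value)


print(parse_json_value_sketch(""))            # {}
print(parse_json_value_sketch('{"dpi": 300}'))  # {'dpi': 300}
```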


Parse a string of JSON interspersed with #-prefixed full-line comments
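Full-line comment stripping can be sketched with a multi-line regular expression followed by a normal JSON parse. The function name parse_commented_json_sketch is hypothetical, and the sketch only removes lines whose first non-blank character is #.

```python
import json
import re


def parse_commented_json_sketch(text):
    # Blank out every line whose first non-blank character is '#'.
    # A '#' inside a JSON string on an ordinary line is left untouched.
    stripped = re.sub(r"^[ \t]*#.*$", "", text, flags=re.MULTILINE)
    return json.loads(stripped)


example = """{
  # resolution in dots per inch
  "dpi": 300,
  "tag": "#keep"
}"""
print(parse_commented_json_sketch(example))  # {'dpi': 300, 'tag': '#keep'}
```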


Remove everything from URL after path.
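Dropping the query and fragment can be sketched with urllib.parse. This is an illustration and not necessarily how the library does it; strip_query_and_fragment_sketch is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query_and_fragment_sketch(url):
    parts = urlsplit(url)
    # Rebuild the URL with empty query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(strip_query_and_fragment_sketch("http://example.org/a/b.xml?x=1#frag"))
# http://example.org/a/b.xml
```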


Sanitize input to be safely used as the basename of a local file.
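A sanitizer of this kind might look like the sketch below. The set of characters kept and the replacement character are assumptions for illustration; safe_basename_sketch is a hypothetical name.

```python
import re


def safe_basename_sketch(name):
    # Assumption: keep ASCII letters, digits, '.', '_' and '-';
    # replace every other run of characters with a single '.'.
    return re.sub(r"[^A-Za-z0-9._-]+", ".", name)


print(safe_basename_sketch("a/b c.tif"))  # a.b.c.tif
```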