ocrd_utils.str module

Utility functions for strings, paths and URLs.

ocrd_utils.str.assert_file_grp_cardinality(grps, n, msg=None)[source]

Assert that a string of comma-separated fileGrps contains exactly n entries.
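The check described above can be sketched in a few lines. This is an illustrative stand-in, not the library's implementation: the name check_file_grp_cardinality and the wording of the error message are assumptions.

```python
def check_file_grp_cardinality(grps, n, msg=None):
    """Hypothetical stand-in: assert that a comma-separated string names exactly n fileGrps."""
    count = len(grps.split(","))
    # The message format is an assumption made for this sketch.
    detail = " (%s)" % msg if msg else ""
    assert count == n, "Expected exactly %d fileGrp(s)%s, got %d: '%s'" % (
        n, detail, count, grps)


# Both calls pass silently because the counts match.
check_file_grp_cardinality("OCR-D-IMG", 1)
check_file_grp_cardinality("OCR-D-IMG,OCR-D-SEG", 2, "image and segmentation")
```

A processor that expects, say, one input and one output fileGrp could run such a check early, before any files are read.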

ocrd_utils.str.concat_padded(base, *args)[source]

Concatenate a string and a zero-padded 4-digit number.
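A minimal sketch of the padding behaviour follows. The function name concat_padded_sketch, the underscore separator, and the pass-through of string arguments are assumptions for illustration, not confirmed details of the library.

```python
def concat_padded_sketch(base, *args):
    """Hypothetical re-implementation: append each argument to base."""
    result = base
    for part in args:
        if isinstance(part, str):
            # Assumption: strings are appended unchanged.
            result = "%s_%s" % (result, part)
        else:
            # Integers are zero-padded to four digits.
            result = "%s_%04d" % (result, part)
    return result


print(concat_padded_sketch("OCR-D-IMG", 1))      # OCR-D-IMG_0001
print(concat_padded_sketch("p", 12, "region"))   # p_0012_region
```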

ocrd_utils.str.get_local_filename(url, start=None)[source]

Return local filename, optionally relative to start

Parameters:
  • url (string) – filename or URL

  • start (string) – Base path to remove from filename. Raises an exception if start is not a prefix of url.

ocrd_utils.str.is_local_filename(url)[source]

Whether a url is a local filename.

ocrd_utils.str.partition_list(lst, chunks, chunk_index=None)[source]

Partition a list into roughly equally-sized chunks

Parameters:
  • lst (list) – list to partition

  • chunks (int) – number of chunks to generate (not per chunk!)

Keyword Arguments:
  • chunk_index (None|int) – If provided, return only a list consisting of this chunk

Returns:

list(list())
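The partitioning can be sketched in pure Python as below. The name partition_list_sketch is hypothetical, and two choices are assumptions of this sketch rather than documented behaviour: any remainder goes to the earliest chunks, and requesting more chunks than there are items is rejected.

```python
def partition_list_sketch(lst, chunks, chunk_index=None):
    """Hypothetical re-implementation: split lst into `chunks` contiguous parts."""
    if not lst:
        return []
    if chunks > len(lst):
        # Assumption: avoid empty chunks by refusing the request.
        raise ValueError("more chunks requested than list items")
    size, extra = divmod(len(lst), chunks)
    parts = []
    start = 0
    for i in range(chunks):
        # The first `extra` chunks each receive one additional item.
        end = start + size + (1 if i < extra else 0)
        parts.append(lst[start:end])
        start = end
    if chunk_index is not None:
        # Still a list of lists, containing only the requested chunk.
        return [parts[chunk_index]]
    return parts


print(partition_list_sketch(list(range(7)), 3))                 # [[0, 1, 2], [3, 4], [5, 6]]
print(partition_list_sketch(list(range(7)), 3, chunk_index=1))  # [[3, 4]]
```

Returning a one-element list of lists for chunk_index keeps the return type the same in both cases, matching the documented list(list()).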

ocrd_utils.str.is_string(val)[source]

Return whether a value is a str.

ocrd_utils.str.make_file_id(ocrd_file, output_file_grp)[source]

Derive a new file ID for an output file from an existing input file ocrd_file and the name of the output file’s fileGrp/@USE, output_file_grp. If ocrd_file’s ID contains the input file’s fileGrp name, then replace it by output_file_grp. Else if ocrd_file has a pageId but it is not contained in ocrd_file.ID, then concatenate output_file_grp and ocrd_file.pageId. Otherwise concatenate output_file_grp with ocrd_file.ID.

Note: make_file_id cannot guarantee that the new ID is unique within an actual ocrd_models.ocrd_mets.OcrdMets. The caller is responsible for ensuring uniqueness of files to be added. Ultimately, ID conflicts will lead to ocrd_models.ocrd_mets.OcrdMets.add_file() raising an exception. This can be avoided if all processors use make_file_id consistently for ID generation.

Note: make_file_id generates page-specific IDs. For IDs representing page segments or pc:AlternativeImage files, the output of make_file_id may need to be concatenated with a unique string for that sub-page element, such as “.IMG” or the segment ID.
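The three-way rule for deriving the ID can be sketched as follows. FakeFile is a hypothetical stand-in for an OcrdFile. The attribute name fileGrp, the underscore separator, and the function name make_file_id_sketch are assumptions for illustration. The sketch does not attempt any uniqueness check or further sanitizing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFile:
    """Hypothetical stand-in exposing only the attributes used below."""
    ID: str
    fileGrp: str
    pageId: Optional[str] = None


def make_file_id_sketch(ocrd_file, output_file_grp):
    # Case 1: the ID embeds the input fileGrp name, so swap it out.
    if ocrd_file.fileGrp in ocrd_file.ID:
        return ocrd_file.ID.replace(ocrd_file.fileGrp, output_file_grp)
    # Case 2: a pageId exists but is not part of the ID.
    if ocrd_file.pageId and ocrd_file.pageId not in ocrd_file.ID:
        return output_file_grp + "_" + ocrd_file.pageId
    # Case 3: fall back to prefixing the old ID.
    return output_file_grp + "_" + ocrd_file.ID


print(make_file_id_sketch(FakeFile("OCR-D-IMG_0001", "OCR-D-IMG", "PHYS_0001"), "OCR-D-BIN"))
print(make_file_id_sketch(FakeFile("img1", "OCR-D-IMG", "PHYS_0001"), "OCR-D-BIN"))
```

For a sub-page element, a caller would append its own suffix to the result, for example ".IMG" as the note above suggests.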

ocrd_utils.str.make_xml_id(idstr: str) → str[source]

Turn idstr into a valid xml:id literal by replacing : with _, removing every character that is not alphanumeric, . or -, and prepending id_ if idstr starts with a number.
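A rough sketch of this sanitizing step, using only the standard library, is shown below. The name make_xml_id_sketch is hypothetical. The sketch assumes that underscores are kept (otherwise replacing : with _ would have no effect) and that "alphanumeric" means ASCII letters and digits.

```python
import re


def make_xml_id_sketch(idstr: str) -> str:
    """Hypothetical re-implementation of the described clean-up."""
    ret = idstr.replace(":", "_")
    # Drop everything except ASCII letters, digits, '_', '.' and '-'.
    ret = re.sub(r"[^A-Za-z0-9_.-]", "", ret)
    # An xml:id must not begin with a digit, so add a prefix if needed.
    if re.match(r"[0-9]", ret):
        ret = "id_" + ret
    return ret


print(make_xml_id_sketch("12:page 3"))           # id_12_page3
print(make_xml_id_sketch("OCR-D-IMG_0001.tif"))  # OCR-D-IMG_0001.tif
```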

ocrd_utils.str.nth_url_segment(url, n=-1)[source]

Return the n-th /-delimited segment of a URL-like string (by default, the last one)

Parameters:
  • url (string)

  • n (integer) – index of segment, default: -1
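Segment selection can be sketched as below. The name nth_url_segment_sketch is hypothetical. Stripping the query string, fragment and trailing slashes before splitting, and returning an empty string for an out-of-range index, are assumptions of this sketch.

```python
def nth_url_segment_sketch(url, n=-1):
    """Hypothetical re-implementation: pick one /-delimited segment."""
    # Assumption: anything after the path is ignored.
    path = url.split("?", 1)[0].split("#", 1)[0].rstrip("/")
    segments = path.split("/")
    try:
        return segments[n]
    except IndexError:
        # Assumption: an out-of-range index yields an empty string.
        return ""


url = "https://example.org/data/page_0001.tif?download=1"
print(nth_url_segment_sketch(url))      # page_0001.tif
print(nth_url_segment_sketch(url, -2))  # data
```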

ocrd_utils.str.parse_json_file_with_comments(val)[source]

Parse a file of JSON interspersed with #-prefixed full-line comments

ocrd_utils.str.parse_json_string_or_file(*values, resolve_preset_file=None)[source]

Parse a string as either the path to a JSON object or a literal JSON object.

Empty strings are equivalent to ‘{}’
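A simplified, single-value sketch of the path-or-literal decision follows. The name parse_json_string_or_file_sketch is hypothetical, the sketch ignores resolve_preset_file and comment handling, and testing for an existing file before falling back to literal parsing is an assumption about the order of checks.

```python
import json
import os


def parse_json_string_or_file_sketch(value):
    """Hypothetical simplification handling one value only."""
    if not value.strip():
        # Empty strings are treated like '{}'.
        return {}
    if os.path.isfile(value):
        # Assumption: an existing file path takes precedence.
        with open(value, "r", encoding="utf-8") as handle:
            return json.load(handle)
    # Otherwise treat the value as literal JSON.
    return json.loads(value)


print(parse_json_string_or_file_sketch(""))           # {}
print(parse_json_string_or_file_sketch('{"dpi": 300}'))  # {'dpi': 300}
```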

ocrd_utils.str.parse_json_string_with_comments(val)[source]

Parse a string of JSON interspersed with #-prefixed full-line comments
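Dropping full-line comments before parsing can be sketched as below. The name parse_json_with_comments_sketch is hypothetical, and allowing leading whitespace before the # is an assumption of this sketch.

```python
import json


def parse_json_with_comments_sketch(text):
    """Hypothetical re-implementation: ignore lines whose first non-blank character is '#'."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("#")]
    return json.loads("\n".join(kept))


text = """# preset for binarization
{
  "dpi": 300,
  # full-line comments may appear between entries
  "model": "default"
}
"""
print(parse_json_with_comments_sketch(text))  # {'dpi': 300, 'model': 'default'}
```

Because JSON string values cannot span lines, filtering whole lines this way does not risk cutting into a value that merely contains a # character.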

ocrd_utils.str.remove_non_path_from_url(url)[source]

Remove everything after the path from a URL.
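One way to sketch this with the standard library is via urllib.parse. The name strip_query_and_fragment is hypothetical, and the sketch assumes that "everything after the path" means the query string and the fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query_and_fragment(url):
    """Hypothetical sketch: keep scheme, host and path only."""
    parts = urlsplit(url)
    # Rebuild the URL with empty query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(strip_query_and_fragment("https://example.org/a/b?x=1#frag"))  # https://example.org/a/b
```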

ocrd_utils.str.safe_filename(url)[source]

Sanitize input to be safely used as the basename of a local file.