ocrd_utils.str module
Utility functions for strings, paths and URLs.
- ocrd_utils.str.assert_file_grp_cardinality(grps, n, msg=None)
Assert that a string of comma-separated fileGrps contains exactly n entries.
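A minimal sketch of the check described above, written without importing ocrd_utils. The name check_file_grp_cardinality and the wording of the error message are hypothetical; only the comma-separated input and the exact-count requirement come from the description.

```python
def check_file_grp_cardinality(grps, n, msg=None):
    """Hypothetical stand-in: assert that `grps` lists exactly `n` fileGrps."""
    found = grps.split(',')
    detail = f" ({msg})" if msg else ""
    # The message format below is an assumption, not the library's wording.
    assert len(found) == n, (
        f"Expected exactly {n} fileGrp(s){detail}, got {len(found)}: {grps!r}"
    )


check_file_grp_cardinality("OCR-D-IMG,OCR-D-SEG", 2)  # passes silently
```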
- ocrd_utils.str.concat_padded(base, *args)
Concatenate a string and a zero-padded 4-digit number.
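The padding can be illustrated with a small stand-alone sketch. The name concat_padded_sketch is hypothetical, and joining the parts with an underscore is an assumption made for this example; the description only states that numbers are zero-padded to 4 digits.

```python
def concat_padded_sketch(base, *args):
    """Hypothetical illustration of zero-padded concatenation."""
    result = base
    for arg in args:
        if isinstance(arg, int):
            # Integers are rendered with at least 4 digits, zero-padded.
            result = f"{result}_{arg:04d}"
        else:
            # Assumption: non-numeric parts are appended unchanged.
            result = f"{result}_{arg}"
    return result


print(concat_padded_sketch("FILE", 7))  # FILE_0007
```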
- ocrd_utils.str.get_local_filename(url, start=None)
Return local filename, optionally relative to start.
- Parameters:
url (string) – filename or URL
start (string) – Base path to remove from filename. Raises an exception if it is not a prefix of url.
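The prefix handling can be sketched as follows. This is an illustration under stated assumptions, not the library code: the name local_filename_sketch is hypothetical, and both the stripping of a leading file:// scheme and the choice of ValueError as the exception type are assumptions.

```python
def local_filename_sketch(url, start=None):
    """Hypothetical illustration of making a filename relative to `start`."""
    # Assumption: a leading "file://" scheme is dropped first.
    if url.startswith("file://"):
        url = url[len("file://"):]
    if start:
        if not url.startswith(start):
            # Assumption: ValueError stands in for the unspecified exception.
            raise ValueError(f"{start!r} is not a prefix of {url!r}")
        # Assumption: the path separator after the base path is removed too.
        url = url[len(start):].lstrip("/")
    return url


print(local_filename_sketch("file:///data/ws/img/0001.tif", "/data/ws"))  # img/0001.tif
```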
- ocrd_utils.str.partition_list(lst, chunks, chunk_index=None)
Partition a list into roughly equally-sized chunks.
- Parameters:
lst (list) – list to partition
chunks (int) – number of chunks to generate (not per chunk!)
- Keyword Arguments:
chunk_index (None|int) – If provided, return only a list consisting of this chunk
- Returns:
list(list())
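One way to produce roughly equal chunks is to hand the remainder out to the first chunks, one item each. The sketch below does that in pure Python; the name partition_sketch is hypothetical, and the exact distribution of leftover items is an assumption that may differ from the library.

```python
def partition_sketch(lst, chunks, chunk_index=None):
    """Hypothetical illustration: split `lst` into `chunks` contiguous parts."""
    size, extra = divmod(len(lst), chunks)
    parts = []
    start = 0
    for i in range(chunks):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        parts.append(lst[start:end])
        start = end
    if chunk_index is not None:
        # Still a list of lists, but containing only the requested chunk.
        return [parts[chunk_index]]
    return parts


print(partition_sketch([1, 2, 3, 4, 5, 6, 7], 3))     # [[1, 2, 3], [4, 5], [6, 7]]
print(partition_sketch([1, 2, 3, 4, 5, 6, 7], 3, 1))  # [[4, 5]]
```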
- ocrd_utils.str.make_file_id(ocrd_file, output_file_grp)
Derive a new file ID for an output file from an existing input file ocrd_file and the name of the output file’s fileGrp/@USE, output_file_grp. If ocrd_file’s ID contains the input file’s fileGrp name, then replace it by output_file_grp. Else if ocrd_file has a pageId but it is not contained in the ocrd_file.ID, then concatenate output_file_grp and ocrd_file.pageId. Otherwise concatenate output_file_grp with the ocrd_file.ID.
Note: make_file_id cannot guarantee that the new ID is unique within an actual ocrd_models.ocrd_mets.OcrdMets. The caller is responsible for ensuring uniqueness of files to be added. Ultimately, ID conflicts will lead to ocrd_models.ocrd_mets.OcrdMets.add_file() raising an exception. This can be avoided if all processors use make_file_id consistently for ID generation.
Note: make_file_id generates page-specific IDs. For IDs representing page segments or pc:AlternativeImage files, the output of make_file_id may need to be concatenated with a unique string for that sub-page element, such as “.IMG” or the segment ID.
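The three cases above can be sketched with a small stand-in for the input file. FakeFile and file_id_sketch are hypothetical names, and the underscore used to join the parts is an assumption; the description only says "concatenate".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFile:
    """Hypothetical stand-in carrying the three attributes the rule uses."""
    ID: str
    fileGrp: str
    pageId: Optional[str] = None


def file_id_sketch(ocrd_file, output_file_grp, sep="_"):
    # Case 1: the ID contains the input fileGrp name, so swap it out.
    if ocrd_file.fileGrp in ocrd_file.ID:
        return ocrd_file.ID.replace(ocrd_file.fileGrp, output_file_grp)
    # Case 2: there is a pageId that is not part of the ID.
    if ocrd_file.pageId and ocrd_file.pageId not in ocrd_file.ID:
        return output_file_grp + sep + ocrd_file.pageId
    # Case 3: fall back to the output fileGrp plus the old ID.
    return output_file_grp + sep + ocrd_file.ID


print(file_id_sketch(FakeFile("OCR-D-IMG_0001", "OCR-D-IMG", "PHYS_0001"), "OCR-D-BIN"))
# OCR-D-BIN_0001
```

As the notes say, nothing here checks uniqueness; a caller would still have to make sure the resulting ID is not already present in the METS.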
- ocrd_utils.str.make_xml_id(idstr: str) → str
Turn idstr into a valid xml:id literal by replacing : with _, removing everything that is not alphanumeric, . or -, and prepending id_ if idstr starts with a number.
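One reading of this rule, as a stand-alone sketch. The name xml_id_sketch is hypothetical, and two points are assumptions: that . and - are kept rather than removed, and that underscores survive the clean-up (otherwise the replacement for : would be undone).

```python
import re


def xml_id_sketch(idstr):
    """Hypothetical illustration of sanitising a string into an xml:id."""
    ret = idstr.replace(":", "_")
    # Assumption: keep ASCII letters, digits, "_", "." and "-"; drop the rest.
    ret = re.sub(r"[^A-Za-z0-9_.-]", "", ret)
    # An xml:id may not start with a digit, so prefix such values.
    if re.match(r"[0-9]", ret):
        ret = "id_" + ret
    return ret


print(xml_id_sketch("12:ab c"))  # id_12_abc
```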
- ocrd_utils.str.nth_url_segment(url, n=-1)
Return the n-th /-delimited segment of a URL-like string (the last segment by default).
- Parameters:
url (string)
n (integer) – index of segment, default: -1
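Segment selection can be sketched with the standard library's urllib.parse. The name url_segment_sketch is hypothetical; splitting only the path component and returning an empty string for an out-of-range index are assumptions of this example.

```python
from urllib.parse import urlparse


def url_segment_sketch(url, n=-1):
    """Hypothetical illustration: pick the n-th "/"-delimited path segment."""
    segments = urlparse(url).path.split("/")
    try:
        return segments[n]
    except IndexError:
        # Assumption: an out-of-range index yields an empty string.
        return ""


print(url_segment_sketch("https://example.org/a/b/c.xml"))      # c.xml
print(url_segment_sketch("https://example.org/a/b/c.xml", -2))  # b
```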
- ocrd_utils.str.parse_json_file_with_comments(val)
Parse a file of JSON interspersed with #-prefixed full-line comments.
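A minimal sketch of this kind of parsing: blank out every line whose first non-blank character is #, then hand the rest to the standard json module. The name load_commented_json_sketch is hypothetical and the regular expression is one possible choice, not necessarily the one the library uses.

```python
import json
import re


def load_commented_json_sketch(path):
    """Hypothetical illustration: parse JSON that has full-line # comments."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    # Only full-line comments are removed; a "#" inside a value is untouched.
    stripped = re.sub(r"^[ \t]*#.*$", "", text, flags=re.MULTILINE)
    return json.loads(stripped)
```

Trailing comments after a JSON value on the same line are deliberately not handled, matching the "full-line" wording above.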
- ocrd_utils.str.parse_json_string_or_file(*values, resolve_preset_file=None)
Parse a string as either the path to a JSON object or a literal JSON object.
Empty strings are equivalent to ‘{}’.
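The path-or-literal decision can be sketched for a single value as follows. The name json_string_or_file_sketch is hypothetical, and this simplified version ignores multiple values and the resolve_preset_file argument; testing for an existing file first is an assumption about the order of the checks.

```python
import json
import os


def json_string_or_file_sketch(value):
    """Hypothetical illustration: accept a JSON file path or literal JSON."""
    if not value.strip():
        # Empty strings are treated like "{}".
        return {}
    if os.path.isfile(value):
        # Assumption: an existing file path takes precedence over literal JSON.
        with open(value, encoding="utf-8") as handle:
            return json.load(handle)
    return json.loads(value)


print(json_string_or_file_sketch(""))            # {}
print(json_string_or_file_sketch('{"dpi": 300}'))  # {'dpi': 300}
```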