ocrd_utils.str module
Utility functions for strings, paths and URLs.
- ocrd_utils.str.assert_file_grp_cardinality(grps, n, msg=None)
Assert that a string of comma-separated fileGrps contains exactly n entries.
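The check described above can be illustrated with a minimal sketch. The function name `assert_cardinality_sketch` and the wording of the error message are assumptions made for illustration; this is not the library's implementation.

```python
def assert_cardinality_sketch(grps, n, msg=None):
    """Check that a comma-separated fileGrp string has exactly n entries."""
    entries = grps.split(",")
    if len(entries) != n:
        # Message wording is an assumption, not the library's actual text
        detail = f"Expected exactly {n} fileGrp(s), got {len(entries)}: {grps!r}"
        if msg:
            detail += f" ({msg})"
        raise AssertionError(detail)


# Two comma-separated entries, two expected: passes silently
assert_cardinality_sketch("OCR-D-IMG,OCR-D-SEG", 2)
```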
- ocrd_utils.str.concat_padded(base, *args)
Concatenate a string and a zero-padded 4-digit number
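As a rough illustration of zero-padded concatenation, the sketch below pads integer arguments to four digits. The underscore separator, the pass-through of non-integer arguments, and the name `concat_padded_sketch` are assumptions, since the description above does not specify them.

```python
def concat_padded_sketch(base, *args):
    """Join base with each argument; integers are zero-padded to 4 digits."""
    parts = [base]
    for arg in args:
        if isinstance(arg, int):
            parts.append(f"{arg:04d}")  # 7 becomes "0007"
        else:
            parts.append(str(arg))      # assumption: other values pass through
    return "_".join(parts)              # assumption: "_" as separator


print(concat_padded_sketch("OCR-D-IMG", 7))  # OCR-D-IMG_0007
```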
- ocrd_utils.str.get_local_filename(url, start=None)
Return local filename, optionally relative to start.
- Parameters:
url (string) – filename or URL
start (string) – Base path to remove from filename. An exception is raised if start is not a prefix of url
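The prefix-removal behaviour can be sketched as follows. Handling of the `file://` scheme, the choice of `ValueError`, and the name `local_filename_sketch` are assumptions for illustration; the documentation above only states that an exception is raised when start is not a prefix of url.

```python
def local_filename_sketch(url, start=None):
    """Return a local path for url, optionally relative to start."""
    if url.startswith("file://"):
        url = url[len("file://"):]  # assumption: strip the file:// scheme
    if start:
        if not url.startswith(start):
            # assumption: ValueError; the docs only say "an exception"
            raise ValueError(f"{start!r} is not a prefix of {url!r}")
        url = url[len(start):].lstrip("/")
    return url


print(local_filename_sketch("file:///data/ws/img/p1.tif", start="/data/ws"))
# img/p1.tif
```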
- ocrd_utils.str.partition_list(lst, chunks, chunk_index=None)
Partition a list into roughly equally-sized chunks
- Parameters:
lst (list) – list to partition
chunks (int) – number of chunks to generate (not per chunk!)
- Keyword Arguments:
chunk_index (None|int) – If provided, return only a list consisting of this chunk
- Returns:
list(list())
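One possible way to produce roughly equal chunks is sketched below. How the remainder is distributed (the leading chunks receive one extra item each) and the name `partition_sketch` are assumptions; the documentation above does not describe the exact strategy.

```python
def partition_sketch(lst, chunks, chunk_index=None):
    """Split lst into `chunks` contiguous parts of nearly equal size."""
    size, extra = divmod(len(lst), chunks)
    parts, start = [], 0
    for i in range(chunks):
        # assumption: the first `extra` chunks get one additional item
        end = start + size + (1 if i < extra else 0)
        parts.append(lst[start:end])
        start = end
    if chunk_index is not None:
        return [parts[chunk_index]]  # still a list of lists
    return parts


print(partition_sketch([1, 2, 3, 4, 5], 2))                 # [[1, 2, 3], [4, 5]]
print(partition_sketch([1, 2, 3, 4, 5], 2, chunk_index=1))  # [[4, 5]]
```

Note that chunks is the number of parts to produce, not the size of each part, so a five-item list split into two chunks yields parts of three and two items.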
- ocrd_utils.str.make_file_id(ocrd_file, output_file_grp)
Derive a new file ID for an output file from an existing input file ocrd_file and the name of the output file’s fileGrp/@USE, output_file_grp. If ocrd_file’s ID contains the input file’s fileGrp name, then replace it by output_file_grp. Else if ocrd_file has a pageId but it is not contained in the ocrd_file.ID, then concatenate output_file_grp and ocrd_file.pageId. Otherwise concatenate output_file_grp with the ocrd_file.ID.
Note: make_file_id cannot guarantee that the new ID is unique within an actual ocrd_models.ocrd_mets.OcrdMets. The caller is responsible for ensuring uniqueness of files to be added. Ultimately, ID conflicts will lead to ocrd_models.ocrd_mets.OcrdMets.add_file() raising an exception. This can be avoided if all processors use make_file_id consistently for ID generation.
Note: make_file_id generates page-specific IDs. For IDs representing page segments or pc:AlternativeImage files, the output of make_file_id may need to be concatenated with a unique string for that sub-page element, such as “.IMG” or the segment ID.
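The three rules above can be sketched as a small decision chain. `FileStub` is a hypothetical stand-in for the real file object, carrying only the attributes the description mentions, and the underscore used when concatenating is an assumption. This is an illustration of the described rules, not the library's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStub:
    """Hypothetical stand-in with the attributes named in the description."""
    ID: str
    fileGrp: str
    pageId: Optional[str] = None


def make_file_id_sketch(ocrd_file, output_file_grp):
    # Rule 1: the ID contains the input fileGrp name, so swap it out
    if ocrd_file.fileGrp in ocrd_file.ID:
        return ocrd_file.ID.replace(ocrd_file.fileGrp, output_file_grp)
    # Rule 2: a pageId exists but is not part of the ID
    if ocrd_file.pageId and ocrd_file.pageId not in ocrd_file.ID:
        return f"{output_file_grp}_{ocrd_file.pageId}"  # "_" is an assumption
    # Rule 3: fall back to prefixing the existing ID
    return f"{output_file_grp}_{ocrd_file.ID}"


f = FileStub(ID="OCR-D-IMG_0001", fileGrp="OCR-D-IMG", pageId="PHYS_0001")
print(make_file_id_sketch(f, "OCR-D-BIN"))  # OCR-D-BIN_0001
```

As the notes above point out, none of these rules checks for uniqueness, so a caller would still need to verify the result against the IDs already present.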
- ocrd_utils.str.make_xml_id(idstr: str) → str
Turn idstr into a valid xml:id literal by replacing : with _, removing everything non-alphanumeric, . and -, and prepending id_ if idstr starts with a number.
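A minimal sketch of this sanitising step follows. It assumes that `.`, `-` and `_` are kept (they are legal inside XML names) and that only ASCII letters and digits count as alphanumeric; the description above is not explicit on either point, and the name `make_xml_id_sketch` is hypothetical.

```python
import re


def make_xml_id_sketch(idstr):
    """Approximate the described clean-up of a string into an xml:id."""
    ret = idstr.replace(":", "_")
    # assumption: keep ASCII letters, digits, "_", "." and "-"; drop the rest
    ret = re.sub(r"[^A-Za-z0-9_.-]", "", ret)
    # an xml:id must not begin with a digit
    if ret[:1].isdigit():
        ret = "id_" + ret
    return ret


print(make_xml_id_sketch("ns:page 1"))  # ns_page1
print(make_xml_id_sketch("0001.tif"))   # id_0001.tif
```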
- ocrd_utils.str.nth_url_segment(url, n=-1)
Return the n-th /-delimited segment of a URL-like string (the last segment by default)
- Parameters:
url (string)
n (integer) – index of segment, default: -1
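Segment lookup by index can be sketched as below. Stripping a trailing slash before splitting and returning an empty string for an out-of-range index are assumptions, as is the name `nth_segment_sketch`.

```python
def nth_segment_sketch(url, n=-1):
    """Return the n-th "/"-delimited segment of url (default: the last)."""
    segments = url.rstrip("/").split("/")  # assumption: ignore trailing "/"
    try:
        return segments[n]
    except IndexError:
        return ""  # assumption: empty string when n is out of range


print(nth_segment_sketch("http://example.org/a/b.xml"))      # b.xml
print(nth_segment_sketch("http://example.org/a/b.xml", -2))  # a
```

Because the split is purely textual, the scheme separator counts too: index 0 is `http:` and index 1 is the empty string between the two slashes.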
- ocrd_utils.str.parse_json_file_with_comments(val)
Parse a file of JSON interspersed with #-prefixed full-line comments
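Since standard JSON has no comment syntax, one way to support full-line comments is to drop them before parsing, as in this sketch. Allowing leading whitespace before the `#` and the name `parse_commented_json_sketch` are assumptions.

```python
import json


def parse_commented_json_sketch(path):
    """Load JSON from path, ignoring lines that start with '#'."""
    with open(path, encoding="utf-8") as f:
        # assumption: whitespace may precede the "#" on a comment line
        kept = [line for line in f if not line.lstrip().startswith("#")]
    return json.loads("".join(kept))
```

Only full-line comments are removed here; a `#` that follows JSON content on the same line is left in place and would cause a parse error unless it sits inside a string.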
- ocrd_utils.str.parse_json_string_or_file(*values, resolve_preset_file=None)
Parse a string as either the path to a file containing a JSON object or a literal JSON object.
Empty strings are equivalent to ‘{}’.
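The string-or-path dispatch might look like the sketch below, which handles a single value only. Deciding by a leading `{` is an assumption, and the variadic `*values` and the `resolve_preset_file` hook from the signature above are deliberately not modelled.

```python
import json


def parse_json_or_path_sketch(value):
    """Interpret value as a JSON object literal or as a path to a JSON file."""
    text = value.strip()
    if text == "":
        return {}  # empty strings are equivalent to '{}'
    if text.startswith("{"):
        # assumption: a leading brace marks a literal JSON object
        return json.loads(text)
    with open(value, encoding="utf-8") as f:
        return json.load(f)


print(parse_json_or_path_sketch(""))                  # {}
print(parse_json_or_path_sketch('{"level": "page"}'))  # {'level': 'page'}
```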