ocrd.workspace_bagger module

class ocrd.workspace_bagger.WorkspaceBagger(resolver, strict=False)[source]

Bases: object

Serialize a workspace to OCRD-ZIP and de-serialize an OCRD-ZIP back to a workspace.

bag(workspace, ocrd_identifier, dest=None, ocrd_mets='mets.xml', ocrd_base_version_checksum=None, processes=1, skip_zip=False, tag_files=None)[source]

Bag a workspace

See https://ocr-d.github.com/ocrd_zip#packing-a-workspace-as-ocrd-zip

  • workspace (ocrd.Workspace) – workspace to bag

  • ocrd_identifier (string) – Ocrd-Identifier in bag-info.txt

  • dest (string) – Path of the generated OCRD-ZIP.

  • ocrd_mets (string) – Ocrd-Mets in bag-info.txt

  • ocrd_base_version_checksum (string) – Ocrd-Base-Version-Checksum in bag-info.txt

  • processes (integer) – Number of parallel processes to use for checksumming

  • skip_zip (boolean) – Whether to leave directory unzipped

  • tag_files (list<string>) – Path names of additional tag files to be bagged at the root of the bag
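
The three ocrd_* parameters above each map to a tag in bag-info.txt. As a minimal sketch of that mapping, assuming the usual BagIt "Label: value" line format, the hypothetical helper render_bag_info below (not part of ocrd.workspace_bagger) builds such content. Whether the real bagger writes every tag in every case is not stated here.

```python
def render_bag_info(ocrd_identifier, ocrd_mets="mets.xml",
                    ocrd_base_version_checksum=None):
    """Return bag-info.txt content holding the OCR-D tags named above.

    Hypothetical helper for illustration; not part of ocrd.workspace_bagger.
    """
    tags = [
        ("Ocrd-Identifier", ocrd_identifier),
        ("Ocrd-Mets", ocrd_mets),
    ]
    # The checksum tag is optional, mirroring the None default of bag().
    if ocrd_base_version_checksum is not None:
        tags.append(("Ocrd-Base-Version-Checksum", ocrd_base_version_checksum))
    # BagIt tag files hold one "Label: value" pair per line.
    return "".join(f"{label}: {value}\n" for label, value in tags)


print(render_bag_info("example-id"), end="")
```

The defaults of the sketch follow the defaults in the bag() signature, so omitting ocrd_mets yields an Ocrd-Mets value of mets.xml.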

spill(src, dest)[source]

Spill a workspace, i.e. unpack it and turn it into a workspace.

See https://ocr-d.github.com/ocrd_zip#unpacking-ocrd-zip-to-a-workspace

  • src (string) – Path to OCRD-ZIP

  • dest (string) – Path to directory to unpack data folder to
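
The core of spilling, unpacking the data folder of a zipped bag into a directory, can be sketched with the standard library alone. The helper unpack_data_folder below is hypothetical and only illustrates the idea; it is not the code of WorkspaceBagger.spill and does not turn the result into a workspace object.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack_data_folder(src, dest):
    """Extract the payload under data/ of a zipped bag into dest.

    Hypothetical helper; a sketch, not the WorkspaceBagger.spill code.
    """
    dest_root = Path(dest).resolve()
    extracted = []
    with zipfile.ZipFile(src) as archive:
        for info in archive.infolist():
            # Skip directories and anything outside the payload folder.
            if info.is_dir() or not info.filename.startswith("data/"):
                continue
            relative = info.filename[len("data/"):]
            target = (dest_root / relative).resolve()
            # Refuse entries that would escape the destination directory.
            if dest_root not in target.parents:
                raise ValueError(f"unsafe path in archive: {info.filename}")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            extracted.append(relative)
    return sorted(extracted)


# Demo with a throwaway archive; file names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "example.ocrd.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("bagit.txt", "BagIt-Version: 1.0\n")
        archive.writestr("data/mets.xml", "<mets/>")
        archive.writestr("data/OCR-D-IMG/page1.txt", "demo")
    unpacked = unpack_data_folder(zip_path, Path(tmp) / "workspace")
    mets_text = (Path(tmp) / "workspace" / "mets.xml").read_text()
print(unpacked)
```

Tag files such as bagit.txt stay in the archive; only the payload ends up in the destination directory.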


Validate conformance with BagIt and OCR-D bagit profile.
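
One part of BagIt conformance is that every payload file matches the checksum recorded in the manifest. The sketch below shows only that single check, assuming a manifest-sha512.txt with one checksum and path per line; find_checksum_mismatches is a hypothetical helper, and the OCR-D profile checks are not covered.

```python
import hashlib
import tempfile
from pathlib import Path


def find_checksum_mismatches(bag_dir):
    """Return payload paths whose SHA-512 differs from manifest-sha512.txt.

    Hypothetical helper; covers one BagIt check, not the OCR-D profile.
    """
    bag_dir = Path(bag_dir)
    mismatches = []
    for line in (bag_dir / "manifest-sha512.txt").read_text().splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path>"; split on first whitespace.
        expected, relative = line.split(None, 1)
        relative = relative.strip()
        path = bag_dir / relative
        # A missing file counts as a mismatch in this sketch.
        if not path.is_file():
            mismatches.append(relative)
            continue
        actual = hashlib.sha512(path.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatches.append(relative)
    return mismatches


# Demo: a one-file bag is checked before and after the file is altered.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data").mkdir()
    (bag / "data" / "mets.xml").write_text("<mets/>")
    digest = hashlib.sha512(b"<mets/>").hexdigest()
    (bag / "manifest-sha512.txt").write_text(f"{digest}  data/mets.xml\n")
    clean = find_checksum_mismatches(bag)
    (bag / "data" / "mets.xml").write_text("<mets>changed</mets>")
    tampered = find_checksum_mismatches(bag)
print(clean, tampered)
```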

recreate_checksums(src, dest=None, overwrite=False)[source]

(Re)creates the files containing the checksums of a bag

This function uses bag.py to create new files for the bag: manifest-sha512.txt and tagmanifest-sha512.txt. The ‘Payload-Oxum’ entry in bag-info.txt is also set to the appropriate value.

  • src (string) – Path to the bag. May be a zipped or unzipped bag

  • dest (string) – Path to where the result should be stored. Not needed if overwrite is set

  • overwrite (bool) – Replace bag with newly created bag
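
To make the recomputed values concrete, the sketch below derives SHA-512 manifest lines and a Payload-Oxum for the files under data/. It assumes the BagIt definition of Payload-Oxum as total payload bytes, a dot, then the number of payload files. payload_summary is a hypothetical helper, not what recreate_checksums itself runs, and it does not write any files.

```python
import hashlib
import tempfile
from pathlib import Path


def payload_summary(bag_dir):
    """Compute manifest lines and Payload-Oxum for the files under data/.

    Hypothetical helper for illustration; nothing is written to disk.
    """
    bag_dir = Path(bag_dir)
    manifest_lines = []
    octets = 0
    files = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    for path in files:
        content = path.read_bytes()
        octets += len(content)
        # Manifest paths are relative to the bag root and use forward slashes.
        relative = path.relative_to(bag_dir).as_posix()
        checksum = hashlib.sha512(content).hexdigest()
        manifest_lines.append(f"{checksum}  {relative}")
    # Payload-Oxum is "<total bytes>.<number of payload files>".
    return manifest_lines, f"{octets}.{len(files)}"


# Demo: two payload files of 3 and 5 bytes.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data" / "sub").mkdir(parents=True)
    (bag / "data" / "a.txt").write_bytes(b"abc")
    (bag / "data" / "sub" / "b.txt").write_bytes(b"hello")
    manifest, oxum = payload_summary(bag)
print(oxum)  # 8.2
```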