ocrd_models.ocrd_mets module

API to METS

class ocrd_models.ocrd_mets.OcrdMets(**kwargs)

Bases: OcrdXmlDocument

API to a single METS file

static empty_mets(now: str | None = None, cache_flag: bool = False)

Create an empty METS file from bundled template.

property unique_identifier: str | None

Get the unique identifier by looking through the mods:identifier entries. See the OCR-D specs for details.

property agents: List[OcrdAgent]

List all ocrd_models.ocrd_agent.OcrdAgent objects.

add_agent(*args, **kwargs) → OcrdAgent

Add an ocrd_models.ocrd_agent.OcrdAgent to the list of agents in the metsHdr.

property file_groups: List[str]

List the @USE of all mets:fileGrp entries.
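As an illustration of what file_groups reports, the sketch below collects the @USE of every mets:fileGrp from a small, made-up METS snippet. It uses only the standard library's xml.etree.ElementTree and is not how OcrdMets is implemented; the helper list_file_groups is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard METS namespace; the document below is a made-up minimal example
NS = {"mets": "http://www.loc.gov/METS/"}

METS_XML = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-IMG"/>
    <mets:fileGrp USE="OCR-D-GT-SEG-PAGE"/>
  </mets:fileSec>
</mets:mets>
"""


def list_file_groups(xml_text: str) -> list[str]:
    """Hypothetical helper: return the @USE of all mets:fileGrp, in document order."""
    root = ET.fromstring(xml_text)
    return [grp.get("USE") for grp in root.iterfind(".//mets:fileGrp", NS)]


print(list_file_groups(METS_XML))  # ['OCR-D-IMG', 'OCR-D-GT-SEG-PAGE']
```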

find_all_files(*args, **kwargs) → List[OcrdFile]

Like find_files(), but return a list of all results. Equivalent to list(self.find_files(...)).

find_files(ID: str | None = None, fileGrp: str | None = None, pageId: str | None = None, mimetype: str | None = None, url: str | None = None, local_filename: str | None = None, local_only: bool = False, include_fileGrp: List[str] | None = None, exclude_fileGrp: List[str] | None = None) → Iterator[OcrdFile]

Search mets:file entries in this METS document and yield results.

The ID, pageId, fileGrp, url and mimetype parameters can each be either a literal string or, if the string starts with // (double slash), a regular expression. For a regex, the leading // is removed and candidates are matched against the regex with re.fullmatch. For a literal string, comparison is done with string equality.

The pageId parameter supports the numeric range operator "..". For example, to find all files in pages PHYS_0001 to PHYS_0003, PHYS_0001..PHYS_0003 is expanded to PHYS_0001,PHYS_0002,PHYS_0003.

Keyword Arguments:
  • ID (string) – @ID of the mets:file

  • fileGrp (string) – @USE of the mets:fileGrp to list files of

  • pageId (string) – @ID of the corresponding physical mets:structMap entry (physical page)

  • url (string) – @xlink:href remote/original URL of the mets:FLocat of the mets:file

  • local_filename (string) – @xlink:href local/cached filename of the mets:FLocat of the mets:file

  • mimetype (string) – @MIMETYPE of the mets:file

  • local_only (boolean) – Whether to restrict results to local files in the filesystem

  • include_fileGrp (list[str]) – List of allowed file groups

  • exclude_fileGrp (list[str]) – List of disallowed file groups

Yields:

ocrd_models.ocrd_file.OcrdFile instances
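The literal-versus-regex rule for the ID, pageId, fileGrp, url and mimetype selectors can be sketched as a standalone predicate. The helper matches_selector is hypothetical and only mirrors the documented behaviour (a leading // switches to re.fullmatch, otherwise plain string equality); it is not the library's code.

```python
import re


def matches_selector(selector: str, value: str) -> bool:
    """Hypothetical helper: compare value against a find_files-style selector.

    A selector starting with '//' is treated as a regex: the '//' is stripped
    and the rest must match the whole value (re.fullmatch). Any other selector
    is compared by string equality.
    """
    if selector.startswith("//"):
        return re.fullmatch(selector[2:], value) is not None
    return selector == value


ids = ["OCR-D-IMG_0001", "OCR-D-IMG_0002", "OCR-D-SEG_0001"]
print([i for i in ids if matches_selector("//OCR-D-IMG_.*", i)])
# ['OCR-D-IMG_0001', 'OCR-D-IMG_0002']
print([i for i in ids if matches_selector("OCR-D-SEG_0001", i)])
# ['OCR-D-SEG_0001']
```

Because re.fullmatch anchors at both ends, a selector such as //IMG does not match OCR-D-IMG_0001; a substring search needs an explicit pattern like //.*IMG.*.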

add_file_group(fileGrp: str) → _Element

Add a new mets:fileGrp.

Parameters:
  • fileGrp (string) – @USE of the new mets:fileGrp

rename_file_group(old: str, new: str) → None

Rename a mets:fileGrp by changing the @USE from old to new.

remove_file_group(USE: str, recursive: bool = False, force: bool = False) → None

Remove a mets:fileGrp (a single fixed @USE, or several groups via a regex @USE).

Parameters:
  • USE (string) – @USE of the mets:fileGrp to delete. Can be a regex if prefixed with //

  • recursive (boolean) – Whether to recursively delete each mets:file in the group

  • force (boolean) – Do not raise an exception if the mets:fileGrp does not exist

add_file(fileGrp: str, mimetype: str | None = None, url: str | None = None, ID: str | None = None, pageId: str | None = None, force: bool = False, local_filename: str | None = None, ignore: bool = False, **kwargs) → OcrdFile

Instantiate and add a new ocrd_models.ocrd_file.OcrdFile.

Parameters:
  • fileGrp (string) – @USE of the mets:fileGrp to add to

Keyword Arguments:
  • mimetype (string) – @MIMETYPE of the mets:file to use

  • url (string) – @xlink:href (URL or path) of the mets:file to use

  • ID (string) – @ID of the mets:file to use

  • pageId (string) – @ID in the physical mets:structMap to link to

  • force (boolean) – Whether to add the file even if a mets:file with the same @ID already exists.

  • ignore (boolean) – Do not look for existing files at all. This shifts the responsibility for preventing duplicate-ID errors to the caller.

  • local_filename (string) – local/cached filename of the mets:file
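The interplay of force and ignore can be modelled with a toy in-memory registry of @ID values. FileRegistry is hypothetical, and two details are assumptions rather than documented behaviour: that a duplicate @ID without force raises FileExistsError, and that force replaces the existing entry instead of adding a second one.

```python
from dataclasses import dataclass, field


@dataclass
class FileRegistry:
    """Hypothetical stand-in for the mets:file @IDs of a METS document."""

    ids: list[str] = field(default_factory=list)

    def add(self, ID: str, force: bool = False, ignore: bool = False) -> None:
        if not ignore and ID in self.ids:
            if not force:
                # Assumption: a duplicate @ID without force is an error
                raise FileExistsError(f"mets:file with @ID {ID!r} already exists")
            # Assumption: force replaces the existing entry
            self.ids.remove(ID)
        # With ignore=True no lookup happens at all, so duplicates can slip in
        self.ids.append(ID)


reg = FileRegistry()
reg.add("FILE_0001")
reg.add("FILE_0001", force=True)   # replaced, still one entry
reg.add("FILE_0001", ignore=True)  # unchecked, now duplicated
print(reg.ids)  # ['FILE_0001', 'FILE_0001']
```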

remove_file(*args, **kwargs) → List[OcrdFile] | OcrdFile

Delete each mets:file matching the query. Takes the same arguments as find_files().

remove_one_file(ID: str | OcrdFile, fileGrp: str | None = None) → OcrdFile

Delete an existing ocrd_models.ocrd_file.OcrdFile.

Parameters:
  • ID (string|OcrdFile) – @ID of the mets:file to delete. Can also be an ocrd_models.ocrd_file.OcrdFile, to avoid a search by ID.

  • fileGrp (string) – @USE of the mets:fileGrp containing the mets:file. Used only for optimization.

Returns:

The old ocrd_models.ocrd_file.OcrdFile reference.

property physical_pages: List[str]

List all page IDs (the @ID of each physical mets:structMap mets:div)

get_physical_pages(for_fileIds: List[str] | None = None, for_pageIds: str | None = None, return_divs: bool = False) → List[str | _Element]

List all page IDs (the @ID of each physical mets:structMap mets:div), optionally restricted to a subset of mets:file @IDs (for_fileIds) or to a selector expression (for_pageIds; comma-separated, range, and/or regex). If return_divs is set, return the mets:div elements themselves (_Element objects) instead of their @ID strings.
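A selector expression for for_pageIds (comma-separated, range, and/or regex) could be resolved roughly as follows. The helper expand_page_selector is hypothetical; it assumes that regex parts use the // prefix known from find_files, and that a range increments the trailing number of the first ID while preserving its zero padding. The library's own parser may differ in details such as validation against existing pages.

```python
import re


def expand_page_selector(selector: str, all_page_ids: list[str]) -> list[str]:
    """Hypothetical helper: resolve 'A,B..C,//regex' into a list of page IDs."""
    selected: list[str] = []
    for part in selector.split(","):
        if part.startswith("//"):
            # Regex part: keep every known page ID that fully matches
            pattern = re.compile(part[2:])
            selected.extend(p for p in all_page_ids if pattern.fullmatch(p))
        elif ".." in part:
            # Range part: split each end into non-numeric prefix + trailing digits
            start, end = part.split("..", 1)
            m_start = re.fullmatch(r"(.*?)(\d+)", start)
            m_end = re.fullmatch(r"(.*?)(\d+)", end)
            if not (m_start and m_end) or m_start.group(1) != m_end.group(1):
                raise ValueError(f"cannot expand range {part!r}")
            prefix, width = m_start.group(1), len(m_start.group(2))
            for n in range(int(m_start.group(2)), int(m_end.group(2)) + 1):
                selected.append(f"{prefix}{n:0{width}d}")  # keep zero padding
        else:
            # Literal part: taken as-is
            selected.append(part)
    return selected


pages = ["PHYS_0001", "PHYS_0002", "PHYS_0003", "PHYS_0004", "PHYS_0010"]
print(expand_page_selector("PHYS_0001..PHYS_0003", pages))
# ['PHYS_0001', 'PHYS_0002', 'PHYS_0003']
print(expand_page_selector("PHYS_0004,//PHYS_001.", pages))
# ['PHYS_0004', 'PHYS_0010']
```

In this sketch ranges are expanded without checking that the pages exist, and a regex part cannot itself contain a comma, since commas always separate parts.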

set_physical_page_for_file(pageId: str, ocrd_file: OcrdFile, order: str | None = None, orderlabel: str | None = None) → None

Set the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file, creating any missing structures if necessary.

Parameters:
  • pageId (string) – @ID of the physical mets:structMap entry to use

  • ocrd_file (OcrdFile) – existing ocrd_models.ocrd_file.OcrdFile object

Keyword Arguments:
  • order (string) – @ORDER to use

  • orderlabel (string) – @ORDERLABEL to use
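To show which structures set_physical_page_for_file may have to create, here is a stdlib ElementTree sketch that finds or creates the physical mets:structMap, its mets:div[@TYPE="physSequence"], and the page mets:div, and then links a file via mets:fptr. The helper link_file_to_page is hypothetical; the element layout follows the common METS/OCR-D convention for the physical structMap, and the order/orderlabel attributes are left out for brevity.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}


def link_file_to_page(root: ET.Element, page_id: str, file_id: str) -> ET.Element:
    """Hypothetical helper: ensure the page div exists, then add an fptr to it."""
    # 1. Physical structMap directly under mets:mets
    struct_map = root.find('mets:structMap[@TYPE="PHYSICAL"]', NS)
    if struct_map is None:
        struct_map = ET.SubElement(root, f"{{{METS}}}structMap", TYPE="PHYSICAL")
    # 2. The physSequence div that holds all page divs
    sequence = struct_map.find('mets:div[@TYPE="physSequence"]', NS)
    if sequence is None:
        sequence = ET.SubElement(struct_map, f"{{{METS}}}div", TYPE="physSequence")
    # 3. The page div itself, identified by @ID
    page = sequence.find(f'mets:div[@ID="{page_id}"]', NS)
    if page is None:
        page = ET.SubElement(sequence, f"{{{METS}}}div", TYPE="page", ID=page_id)
    # 4. The pointer from the page to the mets:file
    ET.SubElement(page, f"{{{METS}}}fptr", FILEID=file_id)
    return page


root = ET.fromstring(f'<mets:mets xmlns:mets="{METS}"/>')
link_file_to_page(root, "PHYS_0001", "OCR-D-IMG_0001")
link_file_to_page(root, "PHYS_0001", "OCR-D-SEG_0001")
fptrs = root.findall('.//mets:div[@ID="PHYS_0001"]/mets:fptr', NS)
print([f.get("FILEID") for f in fptrs])  # ['OCR-D-IMG_0001', 'OCR-D-SEG_0001']
```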

update_physical_page_attributes(page_id: str, **kwargs) → None

Update attributes of the physical page (mets:div) with @ID page_id, given as keyword arguments.

get_physical_page_for_file(ocrd_file: OcrdFile) → str | None

Get the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file.

remove_physical_page(ID: str) None[source]

Delete the physical page with @ID ID (the corresponding physical mets:structMap mets:div entry).

remove_physical_page_fptr(fileId: str) → List[str]

Delete all mets:fptr[@FILEID = fileId] entries pointing to mets:file[@ID = fileId] from all mets:div entries in the physical mets:structMap.

Returns:

List of the page IDs whose mets:div the mets:fptr entries were deleted from.
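The following stdlib sketch mirrors the documented contract of remove_physical_page_fptr: delete every mets:fptr with a given @FILEID from the physical mets:structMap and report the page IDs it was removed from. The helper remove_fptrs and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

METS_XML = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="PHYS_0001">
        <mets:fptr FILEID="IMG_0001"/>
        <mets:fptr FILEID="SEG_0001"/>
      </mets:div>
      <mets:div TYPE="page" ID="PHYS_0002">
        <mets:fptr FILEID="IMG_0002"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def remove_fptrs(root: ET.Element, file_id: str) -> list[str]:
    """Hypothetical helper: drop every fptr to file_id, return affected page IDs."""
    page_ids: list[str] = []
    # findall materializes the lists first, so removing children is safe
    for div in root.findall('mets:structMap[@TYPE="PHYSICAL"]//mets:div', NS):
        for fptr in div.findall(f'mets:fptr[@FILEID="{file_id}"]', NS):
            div.remove(fptr)
            page_ids.append(div.get("ID"))
    return page_ids


root = ET.fromstring(METS_XML)
print(remove_fptrs(root, "SEG_0001"))  # ['PHYS_0001']
```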

property physical_pages_labels: Dict[str, Tuple[str | None, str | None, str | None]]

Map all page IDs (the @ID of each physical mets:structMap mets:div) to their @ORDER, @ORDERLABEL and @LABEL attributes, if any.
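The mapping returned by physical_pages_labels can be reproduced on a toy document with ElementTree, which also shows that absent attributes come out as None. The helper page_labels and the sample METS are hypothetical, and the sketch assumes page divs sit directly under the physSequence div.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

NS = {"mets": "http://www.loc.gov/METS/"}

METS_XML = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="PHYS_0001" ORDER="1" ORDERLABEL="i"/>
      <mets:div TYPE="page" ID="PHYS_0002" ORDER="2" LABEL="Title page"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""

# (@ORDER, @ORDERLABEL, @LABEL); None where the attribute is absent
Labels = Tuple[Optional[str], Optional[str], Optional[str]]


def page_labels(root: ET.Element) -> Dict[str, Labels]:
    """Hypothetical helper: map each physical page @ID to its label attributes."""
    return {
        div.get("ID"): (div.get("ORDER"), div.get("ORDERLABEL"), div.get("LABEL"))
        for div in root.iterfind(
            'mets:structMap[@TYPE="PHYSICAL"]/mets:div/mets:div', NS
        )
    }


print(page_labels(ET.fromstring(METS_XML)))
# {'PHYS_0001': ('1', 'i', None), 'PHYS_0002': ('2', None, 'Title page')}
```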

merge(other_mets, force: bool = False, fileGrp_mapping: Dict[str, str] | None = None, fileId_mapping: Dict[str, str] | None = None, pageId_mapping: Dict[str, str] | None = None, after_add_cb: Callable[[OcrdFile], Any] | None = None, **kwargs) → None

Add all files from other_mets. Accepts the same keyword arguments as find_files() to select which files to add.

Keyword Arguments:
  • force (boolean) – Whether to call add_file with force, overwriting existing mets:file entries

  • fileGrp_mapping (dict) – Map a fileGrp in other_mets to a fileGrp in this METS

  • fileId_mapping (dict) – Map a file ID in other_mets to a file ID in this METS

  • pageId_mapping (dict) – Map a page ID in other_mets to a page ID in this METS

  • after_add_cb (function) – Callback invoked with each OcrdFile after it has been added to this METS
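How the three mapping dictionaries and after_add_cb of merge could combine is sketched below on plain records. FileRecord and remap_files are hypothetical stand-ins, and the sketch assumes that values without a mapping entry are kept unchanged; it does not touch any METS document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for the OcrdFile fields that merge() can remap."""

    ID: str
    fileGrp: str
    pageId: Optional[str]


def remap_files(
    files: Iterable[FileRecord],
    fileGrp_mapping: Optional[Dict[str, str]] = None,
    fileId_mapping: Optional[Dict[str, str]] = None,
    pageId_mapping: Optional[Dict[str, str]] = None,
    after_add_cb: Optional[Callable[[FileRecord], object]] = None,
) -> List[FileRecord]:
    fileGrp_mapping = fileGrp_mapping or {}
    fileId_mapping = fileId_mapping or {}
    pageId_mapping = pageId_mapping or {}
    merged: List[FileRecord] = []
    for f in files:
        # Assumption: values without a mapping entry are kept unchanged
        new = FileRecord(
            ID=fileId_mapping.get(f.ID, f.ID),
            fileGrp=fileGrp_mapping.get(f.fileGrp, f.fileGrp),
            pageId=pageId_mapping.get(f.pageId, f.pageId),
        )
        merged.append(new)
        if after_add_cb is not None:
            after_add_cb(new)  # called once per "added" file
    return merged


files = [
    FileRecord("IMG_0001", "OCR-D-IMG", "PHYS_0001"),
    FileRecord("IMG_0002", "OCR-D-IMG", "PHYS_0002"),
]
out = remap_files(files, fileGrp_mapping={"OCR-D-IMG": "MAX"},
                  pageId_mapping={"PHYS_0002": "PHYS_0005"})
print([(f.ID, f.fileGrp, f.pageId) for f in out])
# [('IMG_0001', 'MAX', 'PHYS_0001'), ('IMG_0002', 'MAX', 'PHYS_0005')]
```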