ocrd_models.ocrd_mets module
API to METS
- class ocrd_models.ocrd_mets.OcrdMets(**kwargs)
Bases: OcrdXmlDocument
API to a single METS file
- static empty_mets(now: str | None = None, cache_flag: bool = False)
Create an empty METS file from bundled template.
- property unique_identifier: str | None
Get the unique identifier by looking through mods:identifier. See specs for details.
- property agents: List[OcrdAgent]
List all ocrd_models.ocrd_agent.OcrdAgent entries.
- add_agent(**kwargs) → OcrdAgent
Add an ocrd_models.ocrd_agent.OcrdAgent to the list of agents in the metsHdr.
- property file_groups: List[str]
List the @USE of all mets:fileGrp entries.
- find_all_files(*args, **kwargs) → List[OcrdFile]
Like find_files() but return a list of all results. Equivalent to list(self.find_files(...)).
- find_files(ID: str | None = None, fileGrp: str | None = None, pageId: str | None = None, mimetype: str | None = None, url: str | None = None, local_filename: str | None = None, local_only: bool = False, include_fileGrp: List[str] | None = None, exclude_fileGrp: List[str] | None = None) → Iterator[OcrdFile]
Search mets:file entries in this METS document and yield results.
The ID, pageId, fileGrp, url and mimetype parameters can each be either a literal string, or a regular expression if the string starts with // (double slash).
If it is a regex, the leading // is removed and candidates are matched against the regex with re.fullmatch. If it is a literal string, comparison is done with string equality.
The pageId parameter also supports comma-separated lists, as well as the numeric range operator .. and the negation operator ~.
For example, to find all files in pages PHYS_0001 to PHYS_0003, both expressions PHYS_0001..PHYS_0003 and PHYS_0001,PHYS_0002,PHYS_0003 will be expanded to the same 3 pages. To find all files above that subrange, both expressions ~PHYS_0001..PHYS_0003 and ~PHYS_0001,~PHYS_0002,~PHYS_0003 will be expanded to PHYS_0004 and upwards.
- Keyword Arguments:
ID (string) – @ID of the mets:file
fileGrp (string) – @USE of the mets:fileGrp to list files of
pageId (string) – @ID of the corresponding physical mets:structMap entry (physical page)
url (string) – @xlink:href remote/original URL of mets:FLocat of mets:file
local_filename (string) – @xlink:href local/cached filename of mets:FLocat of mets:file
mimetype (string) – @MIMETYPE of mets:file
local_only (boolean) – Whether to restrict results to local files in the filesystem
include_fileGrp (list[str]) – List of allowed file groups
exclude_fileGrp (list[str]) – List of disallowed file groups
- Yields:
ocrd_models.ocrd_file.OcrdFile instantiations
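The literal-versus-regex rule described for find_files() can be illustrated without the library. The following is only a minimal sketch of that rule as stated above, not the library's implementation; the helper name matches() and the sample file IDs are hypothetical.

```python
import re


def matches(pattern, value):
    """Return True if `value` satisfies `pattern` (hypothetical helper)."""
    if pattern.startswith("//"):
        # Regex mode: strip the leading '//' and require a full match.
        return re.fullmatch(pattern[2:], value) is not None
    # Literal mode: plain string equality.
    return pattern == value


# Made-up mets:file IDs, for illustration only.
file_ids = ["OCR-D-IMG_0001", "OCR-D-IMG_0002", "OCR-D-SEG_0001"]

literal_hits = [i for i in file_ids if matches("OCR-D-IMG_0001", i)]
regex_hits = [i for i in file_ids if matches("//OCR-D-IMG_.*", i)]
# Because re.fullmatch is used, a regex covering only a prefix matches nothing.
partial_hits = [i for i in file_ids if matches("//OCR-D-IMG", i)]
```

Since the whole candidate string must match, a regex intended as a prefix filter needs a trailing `.*`.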
- add_file_group(fileGrp: str) → _Element
Add a new mets:fileGrp.
- Parameters:
fileGrp (string) – @USE of the new mets:fileGrp.
- rename_file_group(old: str, new: str) → None
Rename a mets:fileGrp by changing the @USE from old to new.
- remove_file_group(USE: str, recursive: bool = False, force: bool = False) → None
Remove a mets:fileGrp (single fixed @USE or multiple regex @USE).
- Parameters:
USE (string) – @USE of the mets:fileGrp to delete. Can be a regex if prefixed with //
recursive (boolean) – Whether to recursively delete each mets:file in the group
force (boolean) – Do not raise an exception if mets:fileGrp does not exist
- add_file(fileGrp: str, mimetype: str | None = None, url: str | None = None, ID: str | None = None, pageId: str | None = None, force: bool = False, local_filename: str | None = None, ignore: bool = False, **kwargs) → OcrdFile
Instantiate and add a new ocrd_models.ocrd_file.OcrdFile.
- Parameters:
fileGrp (string) – @USE of mets:fileGrp to add to
- Keyword Arguments:
mimetype (string) – @MIMETYPE of the mets:file to use
url (string) – @xlink:href (URL or path) of the mets:file to use
ID (string) – @ID of the mets:file to use
pageId (string) – @ID in the physical mets:structMap to link to
force (boolean) – Whether to add the file even if a mets:file with the same @ID already exists.
ignore (boolean) – Do not look for existing files at all. (Shifts responsibility for preventing errors from duplicate ID to the user.)
local_filename (string)
- remove_file(*args, **kwargs) → List[OcrdFile] | OcrdFile
Delete each ocrd:file matching the query. Same arguments as find_files().
- remove_one_file(ID: str | OcrdFile, fileGrp: str = None) → OcrdFile
Delete an existing ocrd_models.ocrd_file.OcrdFile.
- Parameters:
ID (string) – @ID of the mets:file to delete. (Can also be an ocrd_models.ocrd_file.OcrdFile to avoid search via ID.)
fileGrp (string) – @USE of the mets:fileGrp containing the mets:file. (Used only for optimization.)
- Returns:
The old ocrd_models.ocrd_file.OcrdFile reference.
- property physical_pages: List[str]
List all page IDs (the @ID of each physical mets:structMap mets:div).
- get_physical_pages(for_fileIds: List[str] | None = None, for_pageIds: str | None = None, return_divs: bool = False) → List[str | _Element]
List all page IDs (the @ID of each physical mets:structMap mets:div), optionally for a subset of mets:file @ID for_fileIds, or for a subset selector expression (comma-separated, range, and/or regex) for_pageIds. If return_divs is set, return the mets:div element objects instead of ID strings.
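To make the selector syntax concrete, here is a rough sketch of how a comma-separated, range, and negation expression could be expanded against a known page list. It is not the library's code: expand_range() and select_pages() are hypothetical helpers, regex selectors are left out, and the sketch assumes for simplicity that a leading ~ negates the whole expression.

```python
import re


def expand_range(start, end):
    """Expand bounds such as PHYS_0001 and PHYS_0003 into every ID between them."""
    m_start = re.fullmatch(r"(.*?)(\d+)", start)
    m_end = re.fullmatch(r"(.*?)(\d+)", end)
    if not m_start or not m_end or m_start.group(1) != m_end.group(1):
        raise ValueError(f"cannot expand range {start}..{end}")
    prefix = m_start.group(1)
    width = len(m_start.group(2))  # keep the zero padding of the start bound
    first, last = int(m_start.group(2)), int(m_end.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


def select_pages(selector, all_pages):
    """Return the pages from all_pages chosen by selector (simplified)."""
    negate = selector.startswith("~")  # assumption: negation applies to the whole selector
    wanted = set()
    for token in selector.split(","):
        token = token.lstrip("~")
        if ".." in token:
            start, end = token.split("..", 1)
            wanted.update(expand_range(start, end))
        else:
            wanted.add(token)
    if negate:
        return [p for p in all_pages if p not in wanted]
    return [p for p in all_pages if p in wanted]


# Five made-up physical page IDs.
pages = [f"PHYS_{n:04d}" for n in range(1, 6)]

as_range = select_pages("PHYS_0001..PHYS_0003", pages)
as_list = select_pages("PHYS_0001,PHYS_0002,PHYS_0003", pages)
negated_range = select_pages("~PHYS_0001..PHYS_0003", pages)
negated_list = select_pages("~PHYS_0001,~PHYS_0002,~PHYS_0003", pages)
```

In this sketch the range and list forms select the same three pages, and both negated forms select the remaining pages from PHYS_0004 upwards.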
- get_physical_page_patterns(page_attr_patterns: List[METS_DIV_ATTRIBUTE_PATTERN]) → List[_Element]
- set_physical_page_for_file(pageId: str, ocrd_file: OcrdFile, order: str | None = None, orderlabel: str | None = None) → None
Set the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file, creating all structures if necessary.
- Parameters:
pageId (string) – @ID of the physical mets:structMap entry to use
ocrd_file (object) – existing ocrd_models.ocrd_file.OcrdFile object
- Keyword Arguments:
order (string) – @ORDER to use
orderlabel (string) – @ORDERLABEL to use
- get_physical_page_for_file(ocrd_file: OcrdFile) → str | None
Get the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file.
- remove_physical_page(ID: str) → None
Delete page (physical mets:structMap mets:div entry @ID) ID.
- remove_physical_page_fptr(fileId: str) → List[str]
Delete all mets:fptr[@FILEID = fileId] to mets:file[@ID == fileId] for fileId from all mets:div entries in the physical mets:structMap.
- Returns:
List of pageIds that mets:fptrs were deleted from
- property physical_pages_labels: Dict[str, Tuple[str | None, str | None, str | None]]
Map all page IDs (the @ID of each physical mets:structMap mets:div) to their @ORDER, @ORDERLABEL and @LABEL attributes, if any.
- merge(other_mets, force: bool = False, fileGrp_mapping: Dict[str, str] | None = None, fileId_mapping: Dict[str, str] | None = None, pageId_mapping: Dict[str, str] | None = None, after_add_cb: Callable[[OcrdFile], Any] | None = None, **kwargs) → None
Add all files from other_mets. Accepts the same kwargs as find_files().
- Keyword Arguments:
force (boolean) – Whether to do add_file() with force (overwriting existing mets:file entries)
fileGrp_mapping (dict) – Map other_mets fileGrp to fileGrp in this METS
fileId_mapping (dict) – Map other_mets file ID to file ID in this METS
pageId_mapping (dict) – Map other_mets page ID to page ID in this METS
after_add_cb (function) – Callback invoked after a file is added to the METS
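The three mapping arguments of merge() can be pictured as simple renaming tables. The sketch below uses plain dictionaries in place of OcrdFile objects and assumes that any name missing from a mapping is carried over unchanged; that fallback is an assumption of this sketch, not something stated above. The helpers map_name() and plan_merge() are hypothetical.

```python
def map_name(name, mapping=None):
    """Translate name through mapping; unmapped names stay as they are (assumed)."""
    return (mapping or {}).get(name, name)


def plan_merge(other_files, fileGrp_mapping=None, fileId_mapping=None,
               pageId_mapping=None):
    """Compute the fileGrp, ID and pageId each file would receive after merging."""
    planned = []
    for f in other_files:
        planned.append({
            "fileGrp": map_name(f["fileGrp"], fileGrp_mapping),
            "ID": map_name(f["ID"], fileId_mapping),
            "pageId": map_name(f["pageId"], pageId_mapping),
        })
    return planned


# Made-up entries standing in for the files of other_mets.
other_files = [
    {"fileGrp": "OCR-D-IMG", "ID": "IMG_0001", "pageId": "PHYS_0001"},
    {"fileGrp": "OCR-D-SEG", "ID": "SEG_0001", "pageId": "PHYS_0001"},
]

planned = plan_merge(
    other_files,
    fileGrp_mapping={"OCR-D-IMG": "OCR-D-IMG-B"},
    pageId_mapping={"PHYS_0001": "PHYS_0101"},
)
```

Renaming groups, file IDs, or pages in this way is one means of avoiding @ID collisions between the two documents without resorting to force.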