ocrd_validators.workspace_validator module
Validating a workspace.
- class ocrd_validators.workspace_validator.WorkspaceValidator(resolver, mets_url, src_dir=None, skip=None, download=False, page_strictness='strict', page_coordinate_consistency='poly', include_fileGrp=None, exclude_fileGrp=None)
Bases: object
Validator for OcrdMets <../ocrd_models/ocrd_models.ocrd_mets.html>.
Construct a new WorkspaceValidator.
- Parameters:
resolver (Resolver)
mets_url (string)
src_dir (string)
skip (list)
download (boolean)
page_strictness ("strict"|"lax"|"fix"|"off") – how strict to check multi-level TextEquiv consistency of PAGE XML files
page_coordinate_consistency ("poly"|"baseline"|"both"|"off") – check whether each segment’s coords are fully contained within its parent’s; the value selects which coordinates are checked
include_fileGrp (list[str]) – filegrp whitelist
exclude_fileGrp (list[str]) – filegrp blacklist
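The option values listed above can be checked up front, before a validator is constructed. The following is a minimal sketch, not part of ocrd_validators: `validator_options` is a hypothetical helper that only mirrors the documented choices and returns keyword arguments.

```python
# Allowed values, copied from the parameter list above.
PAGE_STRICTNESS = ("strict", "lax", "fix", "off")
PAGE_COORDINATE_CONSISTENCY = ("poly", "baseline", "both", "off")


def validator_options(page_strictness="strict",
                      page_coordinate_consistency="poly",
                      include_fileGrp=None,
                      exclude_fileGrp=None):
    """Hypothetical helper: build keyword arguments for WorkspaceValidator.

    Raises ValueError for a value that the documentation does not list.
    """
    if page_strictness not in PAGE_STRICTNESS:
        raise ValueError(
            f"page_strictness must be one of {PAGE_STRICTNESS}")
    if page_coordinate_consistency not in PAGE_COORDINATE_CONSISTENCY:
        raise ValueError(
            "page_coordinate_consistency must be one of "
            f"{PAGE_COORDINATE_CONSISTENCY}")
    options = {
        "page_strictness": page_strictness,
        "page_coordinate_consistency": page_coordinate_consistency,
    }
    # Only pass the fileGrp filters when they were actually given.
    if include_fileGrp is not None:
        options["include_fileGrp"] = list(include_fileGrp)
    if exclude_fileGrp is not None:
        options["exclude_fileGrp"] = list(exclude_fileGrp)
    return options
```

With ocrd installed, the result could be passed on as `WorkspaceValidator(resolver, mets_url, **validator_options(page_strictness="lax"))`.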
- static check_file_grp(workspace, input_file_grp=None, output_file_grp=None, page_id=None, report=None)
Return a report on whether every fileGrp in input_file_grp is present in workspace.mets and no fileGrp in output_file_grp is. To be run before processing.
- Parameters:
workspace (Workspace)
input_file_grp (list|string)
output_file_grp (list|string)
page_id (list|string)
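The rule that check_file_grp reports on (input groups must exist, output groups must not) can be illustrated without a workspace. This is a sketch of the rule only, not the library's implementation: `as_list` and `file_grp_problems` are hypothetical names, the comma-separated string form is an assumption, and page_id is ignored.

```python
def as_list(value):
    """Normalize a list-or-string argument to a list.

    Assumption: a string holds comma-separated fileGrp names.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [item for item in value.split(",") if item]
    return list(value)


def file_grp_problems(existing_groups, input_file_grp=None,
                      output_file_grp=None):
    """Return one message per violated expectation (empty list if none)."""
    existing = set(existing_groups)
    problems = []
    # Every input fileGrp is expected to be present already.
    for grp in as_list(input_file_grp):
        if grp not in existing:
            problems.append(f"Input fileGrp '{grp}' not in METS")
    # No output fileGrp is expected to be present yet.
    for grp in as_list(output_file_grp):
        if grp in existing:
            problems.append(f"Output fileGrp '{grp}' already in METS")
    return problems
```

An empty result corresponds to the case where processing could start; the real method records its findings in a report object instead of returning a list.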
- static validate(*args, **kwargs)
Validate the workspace of a METS URL against the specs.
- Parameters:
resolver (ocrd.Resolver) – Resolver
mets_url (string) – URL of the METS file
src_dir (string, None) – Directory containing the METS file
skip (list) – Validation checks to omit. One or more of ‘mets_unique_identifier’, ‘mets_file_group_names’, ‘mets_files’, ‘pixel_density’, ‘dimension’, ‘url’, ‘multipage’, ‘page’, ‘page_xsd’, ‘mets_xsd’, ‘mets_fileid_page_pcgtsid’
download (boolean) – Whether to download remote file references temporarily during validation (like a processor would)
- Returns:
report (ValidationReport) – Report on the validity
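A call to validate might look like the sketch below. `build_skip` and `validate_workspace` are hypothetical wrappers, not part of ocrd_validators; the check names are copied from the skip parameter description above, and the ocrd imports are deferred so that the skip list can be built without ocrd installed.

```python
# Check names, copied from the `skip` parameter description above.
KNOWN_CHECKS = (
    "mets_unique_identifier", "mets_file_group_names", "mets_files",
    "pixel_density", "dimension", "url", "multipage", "page",
    "page_xsd", "mets_xsd", "mets_fileid_page_pcgtsid",
)


def build_skip(*names):
    """Hypothetical helper: reject check names the documentation does not list."""
    unknown = [name for name in names if name not in KNOWN_CHECKS]
    if unknown:
        raise ValueError(f"Unknown validation checks: {unknown}")
    return list(names)


def validate_workspace(mets_url, skip=(), download=False):
    """Hypothetical wrapper around WorkspaceValidator.validate.

    Requires the ocrd package; returns the ValidationReport.
    """
    # Imported lazily so build_skip() stays usable without ocrd.
    from ocrd import Resolver
    from ocrd_validators import WorkspaceValidator

    return WorkspaceValidator.validate(
        Resolver(),
        mets_url,
        skip=build_skip(*skip),
        download=download,
    )
```

For example, `validate_workspace("mets.xml", skip=["pixel_density"])` (with a placeholder path) would run every check except the pixel density one. The returned ValidationReport can then be inspected, for instance through its `is_valid` attribute.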