ocrd_validators.workspace_validator module

Validating a workspace.

class ocrd_validators.workspace_validator.WorkspaceValidator(resolver, mets_url, src_dir=None, skip=None, download=False, page_strictness='strict', page_coordinate_consistency='poly', include_fileGrp=None, exclude_fileGrp=None)[source]

Bases: object

Validator for OcrdMets.

Construct a new WorkspaceValidator.

Parameters:
  • resolver (Resolver)

  • mets_url (string)

  • src_dir (string)

  • skip (list)

  • download (boolean)

  • page_strictness ("strict"|"lax"|"fix"|"off") – how strictly to check multi-level TextEquiv consistency of PAGE XML files

  • page_coordinate_consistency ("poly"|"baseline"|"both"|"off") –

    check whether each segment’s coords are fully contained within its parent’s:

    • "poly": *Region/TextLine/Word/Glyph in Border/*Region/TextLine/Word

    • "baseline": Baseline in TextLine

    • "both": both poly and baseline checks

    • "off": no coordinate checks

  • include_fileGrp (list[str]) – filegrp whitelist

  • exclude_fileGrp (list[str]) – filegrp blacklist
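The option values listed above can be checked before they are handed to the validator. The following is a minimal sketch, not part of ocrd_validators: the helper name check_validator_options is hypothetical, and treating a fileGrp that appears in both lists as a problem is an assumption made for illustration.

```python
# Allowed values, copied from the parameter descriptions above.
PAGE_STRICTNESS = {"strict", "lax", "fix", "off"}
PAGE_COORDINATE_CONSISTENCY = {"poly", "baseline", "both", "off"}


def check_validator_options(page_strictness="strict",
                            page_coordinate_consistency="poly",
                            include_fileGrp=None,
                            exclude_fileGrp=None):
    """Hypothetical helper: return a list of problems found in the option values."""
    problems = []
    if page_strictness not in PAGE_STRICTNESS:
        problems.append(f"unknown page_strictness: {page_strictness!r}")
    if page_coordinate_consistency not in PAGE_COORDINATE_CONSISTENCY:
        problems.append(
            f"unknown page_coordinate_consistency: {page_coordinate_consistency!r}"
        )
    # Assumption: naming the same fileGrp in both filters is a user error.
    overlap = set(include_fileGrp or []) & set(exclude_fileGrp or [])
    for grp in sorted(overlap):
        problems.append(f"fileGrp both included and excluded: {grp}")
    return problems
```

An empty list means the values match the documented choices; the defaults mirror those in the constructor signature.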

static check_file_grp(workspace, input_file_grp=None, output_file_grp=None, page_id=None, report=None)[source]

Return a report on whether the group(s) in input_file_grp are in workspace.mets and the group(s) in output_file_grp are not. To be run before processing.

Parameters:
  • workspace (Workspace)

  • input_file_grp (list|string)

  • output_file_grp (list|string)

  • page_id (list|string)
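The rule described for check_file_grp (input groups must exist, output groups must not) can be illustrated without a workspace. This is a simplified stand-in, not the library's implementation: sketch_check_file_grp is a hypothetical name, it takes a plain collection of existing group names instead of a Workspace, it ignores page_id, and splitting string arguments on commas is an assumption.

```python
def sketch_check_file_grp(existing_groups, input_file_grp=None, output_file_grp=None):
    """Hypothetical stand-in: return error messages for misplaced file groups."""

    def as_list(value):
        # Accept None, a list, or (by assumption) a comma-separated string.
        if value is None:
            return []
        if isinstance(value, str):
            return [grp for grp in value.split(",") if grp]
        return list(value)

    existing = set(existing_groups)
    errors = []
    # Every input group must already be present.
    for grp in as_list(input_file_grp):
        if grp not in existing:
            errors.append(f"input fileGrp '{grp}' not found")
    # No output group may be present yet.
    for grp in as_list(output_file_grp):
        if grp in existing:
            errors.append(f"output fileGrp '{grp}' already exists")
    return errors
```

With existing groups {"OCR-D-IMG"}, reading from "OCR-D-IMG" and writing to "OCR-D-BIN" yields no errors, while writing to "OCR-D-IMG" would be reported.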

static validate(*args, **kwargs)[source]

Validates the workspace of a METS URL against the specs.

Parameters:
  • resolver (ocrd.Resolver) – Resolver

  • mets_url (string) – URL of the METS file

  • src_dir (string, None) – Directory containing the METS file

  • skip (list) – Validation checks to omit. One or more of ‘mets_unique_identifier’, ‘mets_file_group_names’, ‘mets_files’, ‘pixel_density’, ‘dimension’, ‘url’, ‘multipage’, ‘page’, ‘page_xsd’, ‘mets_xsd’, ‘mets_fileid_page_pcgtsid’

  • download (boolean) – Whether to download remote file references temporarily during validation (like a processor would)

Returns:

report (ValidationReport) – Report on the validity
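A call to validate might be wrapped as follows. This is a sketch under stated assumptions: unknown_checks and validate_mets are hypothetical helper names, the check names are copied from the skip description above, and the imports assume that Resolver is importable from ocrd and WorkspaceValidator from ocrd_validators. They are placed inside the function so the name check works even where OCR-D is not installed.

```python
# Check names as listed for the `skip` parameter above.
KNOWN_CHECKS = (
    "mets_unique_identifier", "mets_file_group_names", "mets_files",
    "pixel_density", "dimension", "url", "multipage", "page",
    "page_xsd", "mets_xsd", "mets_fileid_page_pcgtsid",
)


def unknown_checks(skip):
    """Hypothetical helper: return the entries of `skip` that are not documented checks."""
    return [name for name in (skip or []) if name not in KNOWN_CHECKS]


def validate_mets(mets_url, skip=None, download=False):
    """Hypothetical wrapper: validate the workspace behind `mets_url`, return the report."""
    unknown = unknown_checks(skip)
    if unknown:
        raise ValueError(f"unknown checks in skip: {unknown}")
    # Assumed import locations; imported lazily so the check above has no dependencies.
    from ocrd import Resolver
    from ocrd_validators import WorkspaceValidator
    return WorkspaceValidator.validate(
        Resolver(), mets_url, skip=skip, download=download
    )
```

The returned ValidationReport can then be inspected by the caller; for example, validate_mets("mets.xml", skip=["pixel_density"]) would run every check except the pixel density one, where "mets.xml" is a placeholder path.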