ocrd_validators.ocrd_tool_validator module

Validating ocrd-tool.json.

See the OCR-D specifications for the ocrd-tool.json format.

class ocrd_validators.ocrd_tool_validator.OcrdToolValidator(schema, validator_class=<class 'jsonschema.validators.Draft6Validator'>)

Bases: JsonValidator

JsonValidator validating against the ocrd-tool.json schema.

Construct a JsonValidator.

Parameters:
  • schema (dict) – the JSON schema to validate against

  • validator_class (Draft6Validator|DefaultValidatingDraft6Validator) – the jsonschema validator class used to check instances
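The name DefaultValidatingDraft6Validator suggests a validator that also writes each schema `default` into the instance while validating. The sketch below illustrates only that default-filling idea using the standard library. It is not the library's implementation (which presumably extends jsonschema's Draft6Validator) and it performs no validation at all. `fill_defaults`, `SCHEMA_EXCERPT` and the tool name `ocrd-example-tool` are hypothetical; the default values are copied from the schema in the `validate` signature.

```python
import copy
import re
from typing import Any


def fill_defaults(instance: Any, schema: dict) -> Any:
    """Return a deep copy of `instance` with missing properties set to their schema `default`.

    Hypothetical helper: it only follows `properties`, `patternProperties`
    and array `items`, and it does not validate anything.
    """
    result = copy.deepcopy(instance)
    _fill(result, schema)
    return result


def _fill(node: Any, schema: dict) -> None:
    if isinstance(node, dict):
        # Explicitly declared properties: insert the default if the key is absent.
        for key, subschema in schema.get('properties', {}).items():
            if key not in node and 'default' in subschema:
                node[key] = copy.deepcopy(subschema['default'])
            if key in node:
                _fill(node[key], subschema)
        # Pattern properties: JSON Schema regexes are unanchored, hence re.search.
        for pattern, subschema in schema.get('patternProperties', {}).items():
            for key, value in node.items():
                if re.search(pattern, key):
                    _fill(value, subschema)
    elif isinstance(node, list) and isinstance(schema.get('items'), dict):
        for item in node:
            _fill(item, schema['items'])


# Excerpt of the ocrd-tool.json schema: only the parts that carry defaults used here.
SCHEMA_EXCERPT = {
    'type': 'object',
    'properties': {
        'tools': {
            'type': 'object',
            'patternProperties': {
                'ocrd-.*': {
                    'type': 'object',
                    'properties': {
                        'resource_locations': {'default': ['data', 'cwd', 'system', 'module'], 'type': 'array'},
                        'parameters': {
                            'type': 'object',
                            'patternProperties': {
                                '.*': {
                                    'type': 'object',
                                    'properties': {
                                        'cacheable': {'default': False, 'type': 'boolean'},
                                        'content-type': {'default': 'application/octet-stream', 'type': 'string'},
                                    },
                                },
                            },
                        },
                    },
                },
            },
        },
    },
}

tool_json = {'tools': {'ocrd-example-tool': {'parameters': {'model': {'type': 'string', 'description': 'Model file'}}}}}
filled = fill_defaults(tool_json, SCHEMA_EXCERPT)
tool = filled['tools']['ocrd-example-tool']
print(tool['resource_locations'])  # ['data', 'cwd', 'system', 'module']
print(tool['parameters']['model'])
# {'type': 'string', 'description': 'Model file', 'cacheable': False, 'content-type': 'application/octet-stream'}
```

Working on a copy keeps the caller's object untouched; a validator that fills defaults in place, as the name DefaultValidatingDraft6Validator implies, would instead modify the object it is given.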

static validate(obj, schema={
    'additionalProperties': False,
    'description': 'Schema for tools by OCR-D MP',
    'properties': {
        'dockerhub': {'description': 'DockerHub image', 'type': 'string'},
        'git_url': {'description': 'Github/Gitlab URL', 'format': 'url', 'type': 'string'},
        'tools': {
            'additionalProperties': False,
            'patternProperties': {
                'ocrd-.*': {
                    'additionalProperties': False,
                    'properties': {
                        'categories': {
                            'description': 'Tools belong to this categories, representing modules within the OCR-D project structure',
                            'items': {
                                'enum': ['Image preprocessing', 'Layout analysis', 'Text recognition and optimization', 'Model training', 'Long-term preservation', 'Quality assurance'],
                                'type': 'string'},
                            'type': 'array'},
                        'description': {'description': 'Concise description what the tool does'},
                        'executable': {'description': 'The name of the CLI executable in $PATH', 'type': 'string'},
                        'input_file_grp': {'description': 'Input fileGrp@USE this tool expects by default', 'items': {'type': 'string'}, 'type': 'array'},
                        'output_file_grp': {'description': 'Output fileGrp@USE this tool produces by default', 'items': {'type': 'string'}, 'type': 'array'},
                        'parameters': {
                            'description': 'Object describing the parameters of a tool. Keys are parameter names, values sub-schemas.',
                            'patternProperties': {
                                '.*': {
                                    'additionalProperties': False,
                                    'properties': {
                                        'additionalProperties': {'description': 'Whether an object value may contain properties not explicitly defined', 'type': 'boolean'},
                                        'cacheable': {'default': False, 'description': "If parameter is reference to file: Whether the file should be cached, e.g. because it is large and won't change.", 'type': 'boolean'},
                                        'content-type': {'default': 'application/octet-stream', 'description': 'The media type of resources this processor expects for this parameter. Most processors use files for resources (e.g.  `*.traineddata` for `ocrd-tesserocr-recognize`) while others use directories of files (e.g. `default` for `ocrd-eynollah-segment`).  If a parameter requires directories, it must set `content-type` to `text/directory`.\n', 'type': 'string'},
                                        'default': {'description': 'Default value when not provided by the user'},
                                        'description': {'description': 'Concise description of syntax and semantics of this parameter'},
                                        'enum': {'description': 'List the allowed values if a fixed list.', 'type': 'array'},
                                        'exclusiveMaximum': {'description': 'Maximum value for number parameters, excluding the maximum', 'type': 'number'},
                                        'exclusiveMinimum': {'description': 'Minimum value for number parameters, excluding the minimum', 'type': 'number'},
                                        'format': {'description': 'Subtype, such as `float` for type `number` or `uri` for type `string`.'},
                                        'items': {'description': 'describe the items of an array further', 'type': 'object'},
                                        'maximum': {'description': 'Maximum value for number parameters, including the maximum', 'type': 'number'},
                                        'minimum': {'description': 'Minimum value for number parameters, including the minimum', 'type': 'number'},
                                        'multipleOf': {'description': 'For number values, those values must be multiple of this number', 'type': 'number'},
                                        'properties': {'description': 'Describe the properties of an object value', 'type': 'object'},
                                        'required': {'description': 'Whether this parameter is required', 'type': 'boolean'},
                                        'type': {'description': 'Data type of this parameter', 'enum': ['string', 'number', 'boolean', 'object', 'array'], 'type': 'string'}},
                                    'required': ['description', 'type'],
                                    'type': 'object'}},
                            'type': 'object'},
                        'resource_locations': {
                            'default': ['data', 'cwd', 'system', 'module'],
                            'description': 'The locations in the filesystem this processor supports for resource lookup',
                            'items': {'enum': ['data', 'cwd', 'system', 'module'], 'type': 'string'},
                            'type': 'array'},
                        'resources': {
                            'description': 'Resources for this processor',
                            'items': {
                                'additionalProperties': False,
                                'properties': {
                                    'description': {'description': 'A description of the resource', 'type': 'string'},
                                    'name': {'description': 'Name to store the resource as', 'type': 'string'},
                                    'parameter_usage': {'default': 'as-is', 'description': 'Defines how the parameter is to be used', 'enum': ['as-is', 'without-extension'], 'type': 'string'},
                                    'path_in_archive': {'default': '.', 'description': 'if type is archive, the resource is at this location in the archive', 'type': 'string'},
                                    'size': {'description': 'Size of the resource in bytes', 'type': 'number'},
                                    'type': {'default': 'file', 'description': 'Type of the URL', 'enum': ['file', 'directory', 'archive'], 'type': 'string'},
                                    'url': {'description': 'URLs of all components of this resource', 'type': 'string'},
                                    'version_range': {'default': '>= 0.0.1', 'description': 'Range of supported versions, syntax like in PEP 440', 'type': 'string'}},
                                'required': ['url', 'description', 'name', 'size'],
                                'type': 'object'},
                            'type': 'array'},
                        'steps': {
                            'description': 'This tool can be used at these steps in the OCR-D functional model',
                            'items': {
                                'enum': ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis'],
                                'type': 'string'},
                            'type': 'array'}},
                    'required': ['description', 'steps', 'executable', 'categories', 'input_file_grp'],
                    'type': 'object'}},
            'type': 'object'},
        'version': {'description': 'Version of the tool, expressed as MAJOR.MINOR.PATCH.', 'pattern': '^[0-9]+\\.[0-9]+\\.[0-9]+$', 'type': 'string'}},
    'required': ['version', 'git_url', 'tools'],
    'type': 'object'})

Validate obj against the ocrd-tool.json schema (by default, the schema shown in the signature).
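To get a feel for what the default schema demands, here is a hand-written, standard-library-only check of a few of its constraints: the top-level required keys, the MAJOR.MINOR.PATCH version pattern, the `ocrd-.*` tool-name pattern, the per-tool required keys, the category enum, and the per-parameter `description`/`type`. It is a sketch, not a substitute for the schema: it ignores `additionalProperties`, the `steps` enum, `resources` and more. `check_ocrd_tool`, the example tool `ocrd-example-binarize` and its `git_url` are hypothetical.

```python
import re
from typing import List

# Constraints copied from the default ocrd-tool.json schema (subset only).
VERSION_PATTERN = r'^[0-9]+\.[0-9]+\.[0-9]+$'
TOOL_NAME_PATTERN = r'ocrd-.*'  # unanchored, as JSON Schema patterns are
TOP_LEVEL_REQUIRED = ('version', 'git_url', 'tools')
TOOL_REQUIRED = ('description', 'steps', 'executable', 'categories', 'input_file_grp')
PARAMETER_REQUIRED = ('description', 'type')
PARAMETER_TYPES = ('string', 'number', 'boolean', 'object', 'array')
CATEGORIES = ('Image preprocessing', 'Layout analysis', 'Text recognition and optimization',
              'Model training', 'Long-term preservation', 'Quality assurance')


def check_ocrd_tool(obj: dict) -> List[str]:
    """Hypothetical subset check: collect error messages instead of raising."""
    errors = []
    for key in TOP_LEVEL_REQUIRED:
        if key not in obj:
            errors.append(f"missing required property {key!r}")
    version = obj.get('version')
    if isinstance(version, str) and not re.search(VERSION_PATTERN, version):
        errors.append(f"version {version!r} is not MAJOR.MINOR.PATCH")
    for name, tool in obj.get('tools', {}).items():
        # 'tools' has additionalProperties: False, so every key must match the pattern.
        if not re.search(TOOL_NAME_PATTERN, name):
            errors.append(f"tool name {name!r} does not match {TOOL_NAME_PATTERN!r}")
            continue
        for key in TOOL_REQUIRED:
            if key not in tool:
                errors.append(f"{name}: missing required property {key!r}")
        for category in tool.get('categories', []):
            if category not in CATEGORIES:
                errors.append(f"{name}: unknown category {category!r}")
        for pname, param in tool.get('parameters', {}).items():
            for key in PARAMETER_REQUIRED:
                if key not in param:
                    errors.append(f"{name}.{pname}: missing required property {key!r}")
            if 'type' in param and param['type'] not in PARAMETER_TYPES:
                errors.append(f"{name}.{pname}: unsupported type {param['type']!r}")
    return errors


tool_json = {
    'version': '0.1.0',
    'git_url': 'https://example.org/ocrd_example',  # hypothetical URL
    'tools': {
        'ocrd-example-binarize': {  # hypothetical processor
            'executable': 'ocrd-example-binarize',
            'description': 'Binarize page images',
            'categories': ['Image preprocessing'],
            'steps': ['preprocessing/optimization/binarization'],
            'input_file_grp': ['OCR-D-IMG'],
            'parameters': {
                'threshold': {'type': 'number', 'description': 'Global threshold', 'default': 0.5},
            },
        },
    },
}
print(check_ocrd_tool(tool_json))  # []
broken = dict(tool_json, version='1.0')
print(check_ocrd_tool(broken))  # ["version '1.0' is not MAJOR.MINOR.PATCH"]
```

With ocrd_validators installed, the real check is a call to `OcrdToolValidator.validate(obj)`. In the releases we are aware of it returns a report object (with `is_valid` and `errors`) rather than raising, but confirm this against your installed version.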