ocrd.cli.workspace module

OCR-D CLI: workspace management

ocrd workspace

Working with workspaces

ocrd workspace [OPTIONS] COMMAND [ARGS]...

Options

-d, --directory <WORKSPACE_DIR>

Changes the workspace folder location [default: METS_URL directory or .]

-M, --mets-basename <mets_basename>

METS file basename. Deprecated, use --mets/--directory

-m, --mets <METS_URL>

The path/URL of the METS file [default: WORKSPACE_DIR/mets.xml]

--backup

Backup mets.xml whenever it is saved.

Environment variables

WORKSPACE_DIR

Provide a default for -d
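The defaults listed above depend on each other: -d falls back to the directory of the METS file, and -m falls back to mets.xml inside the workspace directory. The following Python sketch shows one way such a resolution could work. The function name resolve_locations is hypothetical, and both the handling of remote URLs (falling back to .) and the exact order of the fallbacks are assumptions, not the actual implementation.

```python
import os
from pathlib import Path


def resolve_locations(directory=None, mets_url=None, environ=os.environ):
    """Return (workspace_dir, mets_url) after applying the documented defaults.

    Hypothetical helper: the real CLI may resolve these values differently.
    """
    # An explicit -d wins; otherwise WORKSPACE_DIR provides the default.
    directory = directory or environ.get("WORKSPACE_DIR")
    if directory is None:
        if mets_url and "://" not in mets_url:
            # Default: the directory that contains the METS file.
            directory = str(Path(mets_url).parent)
        else:
            # Assumption: with no local METS path, use the current directory.
            directory = "."
    if mets_url is None:
        # Default: WORKSPACE_DIR/mets.xml
        mets_url = str(Path(directory) / "mets.xml")
    return directory, mets_url


print(resolve_locations(environ={}))
print(resolve_locations(mets_url="/data/ws/mets.xml", environ={}))
```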

add

Add a file or http(s) URL FNAME to the METS in a workspace. If FNAME is neither an http(s) URL nor an existing workspace-local file, try to copy it to the workspace.

ocrd workspace add [OPTIONS] FNAME
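The three cases described for FNAME (http(s) URL, existing file inside the workspace, anything else) can be sketched in Python as follows. The helper classify_fname and its return values are hypothetical and only illustrate the decision, not the command's actual code.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_fname(fname: str, workspace_dir: str) -> str:
    """Return 'url', 'local' or 'copy' for FNAME (hypothetical helper)."""
    if urlparse(fname).scheme in ("http", "https"):
        return "url"  # remote resource, referenced as-is
    workspace = Path(workspace_dir).resolve()
    path = Path(fname).resolve()
    # 'local' means: the file exists and lives somewhere below the workspace.
    if path.is_file() and workspace in path.parents:
        return "local"
    # Everything else would have to be copied into the workspace first.
    return "copy"
```

For example, classify_fname("https://example.org/0001.tif", ".") returns "url", while an existing file outside the workspace directory is classified as "copy".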

Options

-G, --file-grp <FILE_GRP>

Required fileGrp USE

-i, --file-id <FILE_ID>

Required ID for the file

-m, --mimetype <TYPE>

Media type of the file. Guessed from extension if not provided

-g, --page-id <PAGE_ID>

ID of the physical page

-C, --check-file-exists

Whether to ensure FNAME exists

--ignore

Do not check whether file exists.

--force

If file with ID already exists, replace it. No effect if --ignore is set.

Arguments

FNAME

Required argument

backup

Backing up and restoring workspaces - dev edition

ocrd workspace backup [OPTIONS] COMMAND [ARGS]...

add

Create a new backup

ocrd workspace backup add [OPTIONS]

list

List backups

ocrd workspace backup list [OPTIONS]

restore

Restore backup BAK

ocrd workspace backup restore [OPTIONS] BAK

Options

-f, --choose-first

Restore first matching version if more than one

Arguments

BAK

Required argument

undo

Restore the last backup

ocrd workspace backup undo [OPTIONS]

bulk-add

Add files in bulk to an OCR-D workspace.

FILE_GLOB can either be a shell glob expression or a list of files.

--regex is applied to the absolute path of every file in FILE_GLOB and can define named groups that can be used in --page-id, --file-id, --mimetype, --url and --file-grp by referencing the named group 'grp' in the regex as '{{ grp }}'.

Example:

ocrd workspace bulk-add \
    --regex '^.*/(?P<fileGrp>[^/]+)/page_(?P<pageid>.*)\.(?P<ext>[^.]*)$' \
    --file-id 'FILE_{{ fileGrp }}_{{ pageid }}' \
    --page-id 'PHYS_{{ pageid }}' \
    --file-grp "{{ fileGrp }}" \
    --url '{{ fileGrp }}/FILE_{{ pageid }}.{{ ext }}' \
    path/to/files/*/*.*

ocrd workspace bulk-add [OPTIONS] FILE_GLOB...
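To see how the named groups of --regex end up in the other parameters, the substitution can be reproduced in a few lines of Python. This is a minimal sketch and not the command's implementation: the helper names expand and resolve and the sample path are hypothetical, and the dot before the extension is written as an escaped literal dot.

```python
import re

# Regex and templates taken from the example above.
REGEX = r'^.*/(?P<fileGrp>[^/]+)/page_(?P<pageid>.*)\.(?P<ext>[^.]*)$'
TEMPLATES = {
    "file_id": "FILE_{{ fileGrp }}_{{ pageid }}",
    "page_id": "PHYS_{{ pageid }}",
    "file_grp": "{{ fileGrp }}",
    "url": "{{ fileGrp }}/FILE_{{ pageid }}.{{ ext }}",
}


def expand(template: str, groups: dict) -> str:
    """Replace every '{{ name }}' placeholder with the named group 'name'."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: groups[m.group(1)], template)


def resolve(path: str) -> dict:
    """Match one file path and fill in all templates (hypothetical helper)."""
    match = re.match(REGEX, path)
    if match is None:
        raise ValueError(f"path does not match --regex: {path}")
    return {key: expand(tpl, match.groupdict()) for key, tpl in TEMPLATES.items()}


print(resolve("/data/path/to/files/OCR-D-IMG/page_0001.tif"))
```

For the sample path, the groups are fileGrp=OCR-D-IMG, pageid=0001 and ext=tif, so the file ID becomes FILE_OCR-D-IMG_0001 and the URL becomes OCR-D-IMG/FILE_0001.tif.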

Options

-r, --regex <regex>

Required Regular expression matching the FILE_GLOB filesystem paths to define named captures usable in the other parameters

-m, --mimetype <mimetype>

Media type of the file. If not provided, guess from filename

-g, --page-id <page_id>

physical page ID of the file

-i, --file-id <file_id>

Required ID of the file

-u, --url <url>

Required local filesystem path in the workspace directory (copied from source file if different)

-G, --file-grp <file_grp>

Required File group USE of the file

-n, --dry-run

Don’t actually do anything to the METS or filesystem, just preview

-I, --ignore

Disable checking for existing file entries (faster)

-f, --force

Replace existing file entries with the same ID (no effect when --ignore is set, too)

-s, --skip

Skip files not matching --regex (instead of failing)

Arguments

FILE_GLOB

Required argument(s)

clone

Create a workspace from METS_URL and return the directory

METS_URL can be a URL, an absolute path or a path relative to $PWD. If METS_URL is not provided, use --mets accordingly. METS_URL can also be an OAI-PMH GetRecord URL wrapping a METS file.

ocrd workspace clone [OPTIONS] METS_URL [WORKSPACE_DIR]

Options

-f, --clobber-mets

Overwrite existing METS file

-a, --download

Download all files and change location in METS file after cloning

Arguments

METS_URL

Required argument

WORKSPACE_DIR

Optional argument

find

Find files.

(If any FILTER starts with //, then its remainder will be interpreted as a regular expression.)

ocrd workspace find [OPTIONS]
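The // prefix rule for FILTER values can be illustrated with a short Python sketch. The function matches is hypothetical, and it assumes that the regular expression has to match the whole attribute value; the real command may behave differently or support further filter syntax.

```python
import re

REGEX_PREFIX = "//"


def matches(filter_value: str, candidate: str) -> bool:
    """Check one attribute value against one FILTER (hypothetical helper)."""
    if filter_value.startswith(REGEX_PREFIX):
        pattern = filter_value[len(REGEX_PREFIX):]
        # Assumption: the pattern must match the complete value.
        return re.fullmatch(pattern, candidate) is not None
    # Without the prefix, the filter is compared literally.
    return filter_value == candidate


groups = ["OCR-D-IMG", "OCR-D-IMG-BIN", "OCR-D-SEG"]
print([g for g in groups if matches("//OCR-D-IMG.*", g)])
```

With the literal filter OCR-D-IMG only the first group is selected, while //OCR-D-IMG.* selects the first two.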

Options

-i, --file-id <FILTER>

ID

-g, --page-id <FILTER>

Page ID

-m, --mimetype <FILTER>

Media type to look for

-G, --file-grp <FILTER>

fileGrp USE

-k, --output-field <output_field>

Output field. Repeat for multiple fields, will be joined with tab

Options

url | mimetype | pageId | ID | fileGrp | basename | basename_without_extension | local_filename

--download

Download found files to workspace and change location in METS file
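Since repeated --output-field values are joined with a tab, each result is a tab-separated line that is easy to split again. The Python sketch below mimics that output format with hypothetical file records; format_record is an illustrative helper, not part of the CLI.

```python
# Hypothetical file records, keyed by field names listed for --output-field.
records = [
    {"ID": "FILE_0001", "pageId": "PHYS_0001", "fileGrp": "OCR-D-IMG", "mimetype": "image/tiff"},
    {"ID": "FILE_0002", "pageId": "PHYS_0002", "fileGrp": "OCR-D-IMG", "mimetype": "image/tiff"},
]


def format_record(record: dict, output_fields: list) -> str:
    """Join the requested fields with a tab, in the order they were requested."""
    return "\t".join(record[field] for field in output_fields)


for record in records:
    print(format_record(record, ["ID", "pageId"]))
```

This corresponds to passing the option twice, as in -k ID -k pageId.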

get-id

Get METS id if any

ocrd workspace get-id [OPTIONS]

init

Create a workspace with an empty METS file in --directory.

ocrd workspace init [OPTIONS] [DIRECTORY]

Options

-f, --clobber-mets

Clobber mets.xml if it exists

Arguments

DIRECTORY

Optional argument

list-group

List fileGrp USE attributes

ocrd workspace list-group [OPTIONS]

list-page

List physical page IDs

ocrd workspace list-page [OPTIONS]

merge

Merges this workspace with the workspace that contains METS_PATH

The --file-id, --page-id, --mimetype and --file-grp options have the same semantics as in ocrd workspace find, see ocrd workspace find --help for an explanation.

ocrd workspace merge [OPTIONS] METS_PATH

Options

--copy-files

Copy files as well

--fileGrp-mapping <filegrp_mapping>

JSON object mapping src to dest fileGrp

-i, --file-id <FILTER>

ID

-g, --page-id <FILTER>

Page ID

-m, --mimetype <FILTER>

Media type to look for

-G, --file-grp <FILTER>

fileGrp USE

Arguments

METS_PATH

Required argument
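The value of --fileGrp-mapping is a JSON object whose keys are source fileGrp names and whose values are destination names. A minimal Python sketch of applying such a mapping follows. The group names and the helper dest_group are hypothetical, and the behaviour for groups missing from the mapping (keeping the original name) is an assumption.

```python
import json

# Hypothetical value as it could be passed to --fileGrp-mapping.
mapping_arg = '{"OCR-D-IMG": "OCR-D-IMG-OTHER", "OCR-D-SEG": "OCR-D-SEG-OTHER"}'
mapping = json.loads(mapping_arg)


def dest_group(src_group: str, mapping: dict) -> str:
    """Return the destination fileGrp for a source fileGrp."""
    # Assumption: groups that are not listed keep their name.
    return mapping.get(src_group, src_group)


for group in ["OCR-D-IMG", "OCR-D-SEG", "OCR-D-OCR"]:
    print(group, "->", dest_group(group, mapping))
```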

prune-files

Removes mets:files that point to non-existing local files

(If any FILTER starts with //, then its remainder will be interpreted as a regular expression.)

ocrd workspace prune-files [OPTIONS]
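Conceptually, pruning means finding the entries whose local file is missing. The Python sketch below shows that check on plain (ID, relative path) pairs; the helper stale_entries and the sample entries are hypothetical, and the sketch neither reads nor modifies a METS file.

```python
from pathlib import Path


def stale_entries(entries, workspace_dir):
    """Return the IDs of entries whose local file does not exist.

    `entries` is an iterable of (file_id, path_relative_to_workspace) pairs.
    """
    root = Path(workspace_dir)
    return [file_id for file_id, rel_path in entries if not (root / rel_path).is_file()]


entries = [
    ("FILE_0001", "OCR-D-IMG/FILE_0001.tif"),
    ("FILE_0002", "OCR-D-IMG/FILE_0002.tif"),
]
# IDs listed here are the ones a prune step would drop from the METS.
print(stale_entries(entries, "."))
```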

Options

-G, --file-grp <FILTER>

fileGrp USE

-m, --mimetype <FILTER>

Media type to look for

-g, --page-id <FILTER>

Page ID

-i, --file-id <FILTER>

ID

remove

Delete files (given by their ID attribute ID).

(If any ID starts with //, then its remainder will be interpreted as a regular expression.)

ocrd workspace remove [OPTIONS] [ID]...

Options

-k, --keep-file

Do not delete file from file system

-f, --force

Continue even if mets:file or file on file system does not exist

Arguments

ID

Optional argument(s)

remove-group

Delete fileGrps (given by their USE attribute GROUP).

(If any GROUP starts with //, then its remainder will be interpreted as a regular expression.)

ocrd workspace remove-group [OPTIONS] [GROUP]...

Options

-r, --recursive

Delete any files in the group before the group itself

-f, --force

Continue removing even if group or containing files not found in METS

-k, --keep-files

Do not delete files from file system

Arguments

GROUP

Optional argument(s)

rename-group

Rename fileGrp (USE attribute OLD to NEW).

ocrd workspace rename-group [OPTIONS] OLD NEW

Arguments

OLD

Required argument

NEW

Required argument

set-id

Set METS ID.

If one of the supported identifier mechanisms is used, will set this identifier.

Otherwise will create a new <mods:identifier type="purl">{{ ID }}</mods:identifier>.

ocrd workspace set-id [OPTIONS] ID
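The fallback element described above can be built with the Python standard library. This is only a sketch of the resulting XML: the helper make_purl_identifier and the sample ID are hypothetical, and where the element is placed inside the METS is not covered here.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Serialize the namespace with the conventional 'mods' prefix.
ET.register_namespace("mods", MODS_NS)


def make_purl_identifier(mets_id: str) -> ET.Element:
    """Build <mods:identifier type="purl">ID</mods:identifier>."""
    element = ET.Element(f"{{{MODS_NS}}}identifier", {"type": "purl"})
    element.text = mets_id
    return element


identifier = make_purl_identifier("http://example.org/purl/123")
print(ET.tostring(identifier, encoding="unicode"))
```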

Arguments

ID

Required argument

validate

Validate a workspace

METS_URL can be a URL, an absolute path or a path relative to $PWD. If not given, use --mets accordingly.

Check that the METS and its referenced file contents abide by the OCR-D specifications.

ocrd workspace validate [OPTIONS] [METS_URL]

Options

-a, --download

Download all files

-s, --skip <skip>

Tests to skip

Options

imagefilename | dimension | mets_unique_identifier | mets_file_group_names | mets_files | pixel_density | page | page_xsd | mets_xsd | url

--page-textequiv-consistency, --page-strictness <page_textequiv_consistency>

How strictly to check PAGE multi-level textequiv consistency

Options

strict | lax | fix | off

--page-coordinate-consistency <page_coordinate_consistency>

How strictly to check PAGE multi-level coordinate consistency

Options

poly | baseline | both | off

Arguments

METS_URL

Optional argument

class ocrd.cli.workspace.WorkspaceCtx(directory, mets_url, mets_basename, automatic_backup)

Bases: object