ocrd.resolver module

class ocrd.resolver.Resolver

Bases: object

Handle uploads, downloads, and repository access, and manage temporary directories.

download_to_directory(directory, url, basename=None, if_exists='skip', subdir=None, retries=None, timeout=None)

Download a URL url to a local file in directory.

If url looks like a file path, check whether that file exists. If it exists and is already within directory, return early. If it exists but is outside of directory, copy it. If url does not appear to be a file path, try downloading via HTTP, retrying retries times with timeout timeout between calls.
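The local-file branch described above can be sketched with the standard library alone. This is an illustration of the documented behavior, not the Resolver implementation: place_local_file is a hypothetical helper, and its handling of basename and subdir is an assumption modeled on the keyword arguments documented for this method.

```python
import shutil
from pathlib import Path


def place_local_file(directory, src, basename=None, subdir=None):
    """Hypothetical helper: make `src` available inside `directory`.

    Returns the file's path relative to `directory`.
    """
    directory = Path(directory).resolve()
    src = Path(src).resolve()
    if not src.is_file():
        raise FileNotFoundError(f"No such file: {src}")

    # Already inside the directory: nothing to copy, return early.
    if directory in src.parents:
        return str(src.relative_to(directory))

    # Outside the directory: copy it in (assumed layout: [subdir/]basename).
    name = basename or src.name
    rel = Path(subdir, name) if subdir else Path(name)
    dst = directory / rel
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return str(rel)
```

Calling the helper a second time with the copied file as src takes the early-return path and yields the same relative name.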

If basename is not given but subdir is, set basename to the last path segment of url.

If the target file already exists within directory, behavior depends on if_exists:
  • skip (default): do nothing and return early. Note that this

  • overwrite: overwrite the existing file

  • raise: raise a FileExistsError
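The three if_exists policies can be expressed as one small decision function. The sketch below is an assumption-laden illustration (should_write is a hypothetical name, not part of ocrd); it only decides whether the caller should go on to write the target file.

```python
from pathlib import Path


def should_write(target, if_exists="skip"):
    """Hypothetical helper: True if the caller should (re)write `target`."""
    if if_exists not in ("skip", "overwrite", "raise"):
        raise ValueError(f"Unknown if_exists policy: {if_exists!r}")
    target = Path(target)
    if not target.exists():
        return True   # nothing in the way, always write
    if if_exists == "skip":
        return False  # keep the existing file and return early
    if if_exists == "overwrite":
        return True   # replace the existing file
    raise FileExistsError(f"File already exists: {target}")
```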

Parameters:
  • directory (string) – Directory to download files to

  • url (string) – URL to download from

Keyword Arguments:
  • basename (string, None) – basename part of the filename on disk. Defaults to last path segment of url if unset.

  • if_exists (string, "skip") – What to do if target file already exists. One of skip (default), overwrite or raise

  • subdir (string, None) – Subdirectory to create within the directory. Think mets:fileGrp[@USE].

  • retries (int, None) – Number of retries to attempt on network failure.

  • timeout (tuple, None) – Timeout in seconds for establishing a connection and reading next chunk of data.
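To make the interplay of retries and timeout concrete, here is a generic retry loop. It is a sketch under stated assumptions, not how Resolver performs downloads: fetch_with_retries and its wait parameter are hypothetical, fetch stands for any callable that performs one download attempt, and OSError is used as a stand-in for a network failure.

```python
import time


def fetch_with_retries(fetch, retries=None, timeout=None, wait=0.0):
    """Hypothetical helper: call `fetch(timeout)` once, plus up to `retries` more times."""
    attempts = 1 + (retries or 0)
    last_error = None
    for attempt in range(attempts):
        try:
            # `timeout` is passed through unchanged, e.g. a (connect, read) tuple.
            return fetch(timeout)
        except OSError as err:  # ConnectionError and TimeoutError are OSError subclasses
            last_error = err
            if attempt + 1 < attempts:
                time.sleep(wait)  # optional pause before the next attempt
    raise last_error
```

With retries=None the callable is tried exactly once, and the last error is re-raised when every attempt fails.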

Returns:

Local filename string, relative to directory

workspace_from_url(mets_url, dst_dir=None, clobber_mets=False, mets_basename=None, download=False, src_baseurl=None, mets_server_url=None, **kwargs)

Create a workspace from a METS by URL (i.e. clone if mets_url is remote or dst_dir is given).

Parameters:

mets_url (string) – Source METS URL or filesystem path

Keyword Arguments:
  • dst_dir (string, None) – Target directory for the workspace. By default create a temporary directory under ocrd.constants.TMP_PREFIX. (The resulting path can be retrieved via ocrd.Workspace.directory.)

  • clobber_mets (boolean, False) – Whether to overwrite an existing mets.xml. By default, an existing mets.xml raises an exception.

  • download (boolean, False) – Whether to also download all the files referenced by the METS

  • src_baseurl (string, None) – Base URL for resolving relative file locations

  • mets_server_url (string, None) – URI of TCP or local path of UDS for METS server handling the OcrdMets of the workspace. By default the METS will be read from and written to the filesystem directly.

  • **kwargs – Passed on to OcrdMets.find_files if download == True

Download (clone) mets_url to mets.xml in dst_dir, unless the former is already local and the latter is none or already identical to its directory name.

Returns:

a new Workspace
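The cloning rule stated above (clone unless mets_url is already local and dst_dir is either unset or equal to the directory containing the METS) can be written as a predicate. This is an illustrative sketch only: needs_clone and is_remote are hypothetical helpers, and treating only http and https URLs as remote is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_remote(url):
    """Assumption: only http(s) URLs count as remote."""
    return urlparse(url).scheme in ("http", "https")


def needs_clone(mets_url, dst_dir=None):
    """Hypothetical predicate: must the METS be copied or downloaded to dst_dir?"""
    if is_remote(mets_url):
        return True   # remote METS always has to be fetched
    if dst_dir is None:
        return False  # local METS, no target given: use it in place
    # Local METS with a target: clone only if the target is a different directory.
    return Path(mets_url).resolve().parent != Path(dst_dir).resolve()
```

For example, a local METS at /data/ws/mets.xml with dst_dir="/data/ws" would be used in place, while dst_dir="/data/other" would trigger a clone.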

workspace_from_nothing(directory, mets_basename='mets.xml', clobber_mets=False)

Create an empty workspace.

Parameters:

directory (string) – Target directory for the workspace. If none, create a temporary directory under ocrd.constants.TMP_PREFIX. (The resulting path can be retrieved via ocrd.Workspace.directory.)

Keyword Arguments:

clobber_mets (boolean, False) – Whether to overwrite an existing mets.xml. By default, an existing mets.xml raises an exception.

Returns:

a new Workspace

resolve_mets_arguments(directory, mets_url, mets_basename='mets.xml', mets_server_url=None)

Resolve the --mets, --mets-basename, --directory and --mets-server-url arguments into a coherent set of arguments according to https://github.com/OCR-D/core/issues/517