ocrd.resolver module
- class ocrd.resolver.Resolver
Bases: object

Handle uploads, downloads, repository access, and manage temporary directories.
- download_to_directory(directory, url, basename=None, if_exists='skip', subdir=None, retries=None, timeout=None)
Download a URL ``url`` to a local file in ``directory``.

If ``url`` looks like a file path, check whether that exists. If it does exist and is within ``directory`` already, return early. If it does exist but is outside of ``directory``, copy it. If ``url`` does not appear to be a file path, try downloading via HTTP, retrying ``retries`` times with timeout ``timeout`` between calls.

If ``basename`` is not given but ``subdir`` is, set ``basename`` to the last path segment of ``url``.

If the target file already exists within ``directory``, behavior depends on ``if_exists``:
  - ``skip`` (default): do nothing and return early. Note that this
  - ``overwrite``: overwrite the existing file
  - ``raise``: raise a ``FileExistsError``
- Parameters:
  - directory (string) – Directory to download files to
  - url (string) – URL to download from
- Keyword Arguments:
  - basename (string, None) – Basename part of the filename on disk. Defaults to the last path segment of ``url`` if unset.
  - if_exists (string, "skip") – What to do if the target file already exists. One of ``skip`` (default), ``overwrite`` or ``raise``.
  - subdir (string, None) – Subdirectory to create within the directory. Think ``mets:fileGrp[@USE]``.
  - retries (int, None) – Number of retries to attempt on network failure.
  - timeout (tuple, None) – Timeout in seconds for establishing a connection and reading the next chunk of data.
- Returns:
  Local filename string, relative to ``directory``
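The decision order described above can be sketched in plain Python. This is a hypothetical re-creation, not ocrd's implementation: the name ``download_to_directory_sketch`` and the ``fetch`` callable (a stand-in for the HTTP request, with ``retries`` and ``timeout`` not modelled) are assumptions, and the fallback basename follows the keyword-argument description (last path segment of ``url``).

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def _is_within(path: Path, root: Path) -> bool:
    """Return True if ``path`` lies inside ``root`` (both already resolved)."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def download_to_directory_sketch(directory: str, url: str, basename: Optional[str] = None,
                                 if_exists: str = 'skip', subdir: Optional[str] = None,
                                 fetch: Optional[Callable[[str], bytes]] = None) -> str:
    """Hypothetical sketch of the documented behaviour, not ocrd's code.

    Returns the local filename relative to ``directory``.
    """
    if if_exists not in ('skip', 'overwrite', 'raise'):
        raise ValueError(f"if_exists must be 'skip', 'overwrite' or 'raise', not {if_exists!r}")
    root = Path(directory).resolve()
    # Assumption: plain paths and file:// URLs are treated as local files
    is_local = url.startswith('file://') or '://' not in url
    local = Path(url[len('file://'):] if url.startswith('file://') else url)
    if is_local and local.exists() and _is_within(local.resolve(), root):
        # Already inside the target directory: return early without copying
        return str(local.resolve().relative_to(root))
    # Fall back to the last path segment of the URL as basename
    if basename is None:
        basename = url.rstrip('/').rsplit('/', 1)[-1]
    rel = str(Path(subdir) / basename) if subdir else basename
    target = root / rel
    if target.exists():
        if if_exists == 'skip':
            return rel
        if if_exists == 'raise':
            raise FileExistsError(f"{target} already exists")
        # if_exists == 'overwrite': fall through and replace the file
    target.parent.mkdir(parents=True, exist_ok=True)
    if is_local:
        if not local.exists():
            raise FileNotFoundError(url)
        shutil.copyfile(local, target)  # outside of directory: copy it
    else:
        if fetch is None:
            raise ValueError('this sketch needs a fetch callable for remote URLs')
        target.write_bytes(fetch(url))  # stands in for the HTTP download
    return rel
```

With ocrd installed, the documented method itself would be called as ``Resolver().download_to_directory(directory, url, subdir='OCR-D-IMG')``, where ``OCR-D-IMG`` is just an example file group name.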
- workspace_from_url(mets_url, dst_dir=None, clobber_mets=False, mets_basename=None, download=False, src_baseurl=None, mets_server_url=None, **kwargs)
Create a workspace from a METS by URL (i.e. clone if ``mets_url`` is remote or ``dst_dir`` is given).

- Parameters:
  - mets_url (string) – Source METS URL or filesystem path
- Keyword Arguments:
  - dst_dir (string, None) – Target directory for the workspace. By default, create a temporary directory under ``ocrd.constants.TMP_PREFIX``. (The resulting path can be retrieved via ``ocrd.Workspace.directory``.)
  - clobber_mets (boolean, False) – Whether to overwrite an existing ``mets.xml``. By default, an existing ``mets.xml`` raises an exception.
  - download (boolean, False) – Whether to also download all the files referenced by the METS
  - src_baseurl (string, None) – Base URL for resolving relative file locations
  - mets_server_url (string, None) – TCP URI or local Unix domain socket (UDS) path of the METS server that handles the ``OcrdMets`` of the workspace. By default, the METS is read from and written to the filesystem directly.
  - **kwargs – Passed on to ``OcrdMets.find_files`` if ``download == True``
Download (clone) ``mets_url`` to ``mets.xml`` in ``dst_dir``, unless the former is already local and the latter is ``None`` or already identical to its directory name.

- Returns:
  a new ``Workspace``
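The clone condition in the paragraph above boils down to a small predicate. The sketch below is hypothetical (``needs_clone`` is not part of ocrd): it treats only scheme-less strings and ``file://`` URLs as local and assumes POSIX paths.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def needs_clone(mets_url: str, dst_dir: Optional[str] = None) -> bool:
    """Hypothetical predicate for the documented clone rule (POSIX paths assumed)."""
    parsed = urlparse(mets_url)
    if parsed.scheme not in ('', 'file'):
        # Remote METS (e.g. http/https): always download it
        return True
    local_path = parsed.path if parsed.scheme == 'file' else mets_url
    if dst_dir is None:
        # Local METS and no target directory: use it in place
        return False
    # Local METS: clone only if dst_dir differs from the METS file's directory
    return os.path.abspath(dst_dir) != os.path.dirname(os.path.abspath(local_path))
```

In practice the decision is taken inside ``Resolver().workspace_from_url(mets_url, dst_dir=..., download=True)``; the predicate only makes the "already local and identical directory" condition explicit.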
- workspace_from_nothing(directory, mets_basename='mets.xml', clobber_mets=False)
Create an empty workspace.
- Parameters:
  - directory (string) – Target directory for the workspace. If ``None``, create a temporary directory under ``ocrd.constants.TMP_PREFIX``. (The resulting path can be retrieved via ``ocrd.Workspace.directory``.)
- Keyword Arguments:
  - clobber_mets (boolean, False) – Whether to overwrite an existing ``mets.xml``. By default, an existing ``mets.xml`` raises an exception.
- Returns:
  a new ``Workspace``
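The directory and ``clobber_mets`` handling described above can be sketched as follows. The helper ``prepare_empty_workspace`` and the prefix ``'ocrd-sketch-'`` are hypothetical stand-ins (the real prefix is ``ocrd.constants.TMP_PREFIX``), and writing the empty METS document itself is left to ocrd.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical stand-in for ocrd.constants.TMP_PREFIX
SKETCH_TMP_PREFIX = 'ocrd-sketch-'


def prepare_empty_workspace(directory: Optional[str] = None, mets_basename: str = 'mets.xml',
                            clobber_mets: bool = False) -> Path:
    """Return the METS path for a new, empty workspace (hypothetical helper)."""
    if directory is None:
        # No directory given: create a fresh temporary one
        directory = tempfile.mkdtemp(prefix=SKETCH_TMP_PREFIX)
    workspace_dir = Path(directory)
    workspace_dir.mkdir(parents=True, exist_ok=True)
    mets_path = workspace_dir / mets_basename
    if mets_path.exists() and not clobber_mets:
        # Refuse to replace an existing METS unless clobber_mets is set
        raise FileExistsError(f"{mets_path} already exists; pass clobber_mets=True to overwrite")
    return mets_path
```

The documented call, ``Resolver().workspace_from_nothing(directory)``, instead returns a ``Workspace`` whose ``directory`` attribute holds the resulting path.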
- resolve_mets_arguments(directory, mets_url, mets_basename='mets.xml', mets_server_url=None)
Resolve the ``--mets``, ``--mets-basename``, ``--directory`` and ``--mets-server-url`` arguments into a coherent set of arguments according to https://github.com/OCR-D/core/issues/517.
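How the options might be reconciled can be illustrated with a hypothetical sketch. The rules below are assumptions chosen for illustration only (local paths only, a relative ``--mets`` is resolved against ``--directory``, the METS must sit directly in that directory, ``mets_server_url`` is passed through); the authoritative rules are those discussed in the linked issue.

```python
import os
from typing import NamedTuple, Optional


class MetsArgs(NamedTuple):
    directory: str
    mets_url: str
    mets_basename: str
    mets_server_url: Optional[str]


def resolve_mets_arguments_sketch(directory: Optional[str], mets_url: Optional[str],
                                  mets_basename: str = 'mets.xml',
                                  mets_server_url: Optional[str] = None) -> MetsArgs:
    """Hypothetical reconciliation rules (local POSIX paths only), not ocrd's code."""
    if mets_url and mets_url.startswith('file://'):
        mets_url = mets_url[len('file://'):]
    if mets_url is None:
        # Only a directory (or nothing): METS lives at <directory>/<mets_basename>
        directory = os.path.abspath(directory or os.getcwd())
        return MetsArgs(directory, os.path.join(directory, mets_basename), mets_basename, mets_server_url)
    if directory is None:
        # Only --mets: derive the workspace directory from the METS path
        mets_url = os.path.abspath(mets_url)
        return MetsArgs(os.path.dirname(mets_url), mets_url, os.path.basename(mets_url), mets_server_url)
    # Both given: a relative --mets is taken relative to --directory
    directory = os.path.abspath(directory)
    mets_url = os.path.abspath(os.path.join(directory, mets_url))
    if os.path.dirname(mets_url) != directory:
        # Assumption: the METS must sit directly in --directory
        raise ValueError(f"--mets {mets_url} is not inside --directory {directory}")
    return MetsArgs(directory, mets_url, os.path.basename(mets_url), mets_server_url)
```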