fix(python): verify TLS cert and model hash on download#342
Conversation
get_model() disabled TLS certificate and hostname verification (ssl.CERT_NONE) and never hash-checked freshly downloaded files — the existing md5/sha256 check only ran against already-cached files to decide whether to re-download. A network-position attacker could therefore replace a model in transit with an arbitrary payload that the application would load without any integrity check (CWE-295, CWE-494). - Drop check_hostname=False / verify_mode=ssl.CERT_NONE; use the default verifying SSL context from ssl.create_default_context(). - After a download completes, hash the file and compare it against the expected value, deleting the file and raising on mismatch. The ignore_verification flag is honored for parity with the cached-file path.
There was a problem hiding this comment.
PR Summary:
- Fixes two security vulnerabilities in
get_model(): disabled TLS verification (CWE-295) and missing integrity check on freshly downloaded files (CWE-494). - Drops
check_hostname=False/ssl.CERT_NONE— uses the default verifying SSL context instead. - Adds post-download SHA-256 hash verification, deleting the file and raising on mismatch, honoring the
ignore_verificationflag for parity with the cached-file path.
Review Summary:
The core security fixes are correct and well-motivated. Two issues were found in the new post-download verification block:
-
Misleading key name (high): The
_MODEL_LISTdict uses the key"md5"but all values are 64-character SHA-256 hex digests (confirmed by looking atget_file_hash_sha256usage on both the cached and freshly-downloaded paths). This PR is the right moment to rename or annotate the key — a future maintainer could "correct" the hash function tohashlib.md5and silently break the security guarantee being established here. -
Lost exception chain (medium): When the new hash-check
RuntimeError(lines 223–226) is raised, it is caught by the outerexcept Exceptionblock which re-raises a newRuntimeErrorwithoutraise ... from e. The precise "expected X, got Y" hash information ends up buried as a string inside the generic"Failed to download model: ..."message rather than appearing as a proper causal chain in the traceback, making debugging harder. Usingraise RuntimeError(...) from eandunlink(missing_ok=True)in the except handler would resolve this cleanly.
Suggestions
| print(f"Warning: Model verification skipped for '{name}' as requested.") | ||
| else: | ||
| downloaded_hash = get_file_hash_sha256(model_file) | ||
| if downloaded_hash != model_info["md5"]: |
There was a problem hiding this comment.
The field is named "md5" in _MODEL_LIST but it stores SHA-256 hashes (64-char hex strings), and the comparison on line 221 is against get_file_hash_sha256(). This is inconsistent with the cached-file path which also uses get_file_hash_sha256() for the same field (line 175–176) — so the logic is at least self-consistent, but the key name "md5" is a clear misnomer that is preserved here.
This PR is a good opportunity to rename the field (or add an alias key), or at minimum add a comment clarifying that "md5" actually holds a SHA-256 digest, so future maintainers don't "fix" it by switching to hashlib.md5.
| if downloaded_hash != model_info["md5"]: | |
| if downloaded_hash != model_info["md5"]: # key name is misleading; value is a SHA-256 hex digest |
actions
Feedback: Rate this comment to help me improve future code reviews:
- 👍 Good - Helpful and accurate
- 👎 Poor - Wrong, unclear, or unhelpful
- Skip if you don't have any strong opinions either way.
| raise RuntimeError( | ||
| f"Downloaded model hash mismatch for '{name}': " | ||
| f"expected {model_info['md5']}, got {downloaded_hash}" | ||
| ) |
There was a problem hiding this comment.
When hash verification fails, the RuntimeError raised on line 223 is caught by the outer except Exception block, which re-raises a new RuntimeError. The original exception is not chained (raise ... from e is missing), so the specific mismatch message (expected vs. actual hash) gets buried inside the string f"Failed to download model: {e}" rather than appearing as a proper cause in the traceback.
Additionally, downloading_flag IS cleaned up by the except handler (line 234–235 checks .exists() before unlinking), so that's fine.
Fix suggestion for the outer except:
except Exception as e:
model_file.unlink(missing_ok=True)
downloading_flag.unlink(missing_ok=True)
raise RuntimeError(f"Failed to download model: {e}") from eactions
Feedback: Rate this comment to help me improve future code reviews:
- 👍 Good - Helpful and accurate
- 👎 Poor - Wrong, unclear, or unhelpful
- Skip if you don't have any strong opinions either way.
get_model() disabled TLS certificate and hostname verification (ssl.CERT_NONE) and never hash-checked freshly downloaded files — the existing md5/sha256 check only ran against already-cached files to decide whether to re-download. A network-position attacker could therefore replace a model in transit with an arbitrary payload that the application would load without any integrity check (CWE-295, CWE-494).