Skip to content

fix(python): verify TLS cert and model hash on download#342

Open
AAtomical wants to merge 1 commit into
HyperInspire:masterfrom
AAtomical:fix/tls-and-hash-verify
Open

fix(python): verify TLS cert and model hash on download#342
AAtomical wants to merge 1 commit into
HyperInspire:masterfrom
AAtomical:fix/tls-and-hash-verify

Conversation

@AAtomical

Copy link
Copy Markdown

get_model() disabled TLS certificate and hostname verification (ssl.CERT_NONE) and never hash-checked freshly downloaded files — the existing md5/sha256 check only ran against already-cached files to decide whether to re-download. A network-position attacker could therefore replace a model in transit with an arbitrary payload that the application would load without any integrity check (CWE-295, CWE-494).

  • Drop check_hostname=False / verify_mode=ssl.CERT_NONE; use the default verifying SSL context from ssl.create_default_context().
  • After a download completes, hash the file and compare it against the expected value, deleting the file and raising on mismatch. The ignore_verification flag is honored for parity with the cached-file path.

get_model() disabled TLS certificate and hostname verification
(ssl.CERT_NONE) and never hash-checked freshly downloaded files —
the existing md5/sha256 check only ran against already-cached files
to decide whether to re-download. A network-position attacker could
therefore replace a model in transit with an arbitrary payload that
the application would load without any integrity check
(CWE-295, CWE-494).

- Drop check_hostname=False / verify_mode=ssl.CERT_NONE; use the
  default verifying SSL context from ssl.create_default_context().
- After a download completes, hash the file and compare it against
  the expected value, deleting the file and raising on mismatch.
  The ignore_verification flag is honored for parity with the
  cached-file path.

@devloai devloai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary:

  • Fixes two security vulnerabilities in get_model(): disabled TLS verification (CWE-295) and missing integrity check on freshly downloaded files (CWE-494).
  • Drops check_hostname=False / ssl.CERT_NONE — uses the default verifying SSL context instead.
  • Adds post-download SHA-256 hash verification, deleting the file and raising on mismatch, honoring the ignore_verification flag for parity with the cached-file path.

Review Summary:

The core security fixes are correct and well-motivated. Two issues were found in the new post-download verification block:

  1. Misleading key name (high): The _MODEL_LIST dict uses the key "md5" but all values are 64-character SHA-256 hex digests (confirmed by looking at get_file_hash_sha256 usage on both the cached and freshly-downloaded paths). This PR is the right moment to rename or annotate the key — a future maintainer could "correct" the hash function to hashlib.md5 and silently break the security guarantee being established here.

  2. Lost exception chain (medium): When the new hash-check RuntimeError (lines 223–226) is raised, it is caught by the outer except Exception block which re-raises a new RuntimeError without raise ... from e. The precise "expected X, got Y" hash information ends up buried as a string inside the generic "Failed to download model: ..." message rather than appearing as a proper causal chain in the traceback, making debugging harder. Using raise RuntimeError(...) from e and unlink(missing_ok=True) in the except handler would resolve this cleanly.

Suggestions

  • Rename the "md5" key in _MODEL_LIST to "sha256" throughout the file to match the actual hash algorithm used. Apply
  • Add hash verification to the ModelScope download path (_download_from_modelscope_with_cache) for consistent integrity guarantees across both download modes. Apply

print(f"Warning: Model verification skipped for '{name}' as requested.")
else:
downloaded_hash = get_file_hash_sha256(model_file)
if downloaded_hash != model_info["md5"]:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field is named "md5" in _MODEL_LIST but it stores SHA-256 hashes (64-char hex strings), and the comparison on line 221 is against get_file_hash_sha256(). This is inconsistent with the cached-file path which also uses get_file_hash_sha256() for the same field (line 175–176) — so the logic is at least self-consistent, but the key name "md5" is a clear misnomer that is preserved here.

This PR is a good opportunity to rename the field (or add an alias key), or at minimum add a comment clarifying that "md5" actually holds a SHA-256 digest, so future maintainers don't "fix" it by switching to hashlib.md5.

Suggested change
if downloaded_hash != model_info["md5"]:
if downloaded_hash != model_info["md5"]: # key name is misleading; value is a SHA-256 hex digest
actions

Feedback: Rate this comment to help me improve future code reviews:

  • 👍 Good - Helpful and accurate
  • 👎 Poor - Wrong, unclear, or unhelpful
  • Skip if you don't have any strong opinions either way.

raise RuntimeError(
f"Downloaded model hash mismatch for '{name}': "
f"expected {model_info['md5']}, got {downloaded_hash}"
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When hash verification fails, the RuntimeError raised on line 223 is caught by the outer except Exception block, which re-raises a new RuntimeError. The original exception is not chained (raise ... from e is missing), so the specific mismatch message (expected vs. actual hash) gets buried inside the string f"Failed to download model: {e}" rather than appearing as a proper cause in the traceback.

Additionally, downloading_flag IS cleaned up by the except handler (line 234–235 checks .exists() before unlinking), so that's fine.

Fix suggestion for the outer except:

        except Exception as e:
            model_file.unlink(missing_ok=True)
            downloading_flag.unlink(missing_ok=True)
            raise RuntimeError(f"Failed to download model: {e}") from e
actions

Feedback: Rate this comment to help me improve future code reviews:

  • 👍 Good - Helpful and accurate
  • 👎 Poor - Wrong, unclear, or unhelpful
  • Skip if you don't have any strong opinions either way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant