Add npm risky_new_dependency metadata rule#779

Open
christophetd wants to merge 3 commits into v3 from feat/npm-risky-new-dependency-rule

@christophetd christophetd commented Jun 17, 2026

Contributor

Adds an npm metadata rule, risky_new_dependency, that flags a package whose scanned version introduced a dependency that is itself high risk.

How it works

  • Diff vs the previous version (no extra fetch). The registry metadata GuardDog already downloads contains every version's dependencies plus a time map. The detector takes the version published just before the scanned one and computes newly-added dependency names (a version bump of an existing dep is not "new").
  • Each new dependency is scanned as a real package, sandboxed identically to the parent. The sub-scan runs as a subprocess guarddog npm scan <dep>, taking the same CLI path; the parent's sandbox decision is propagated via the GUARDDOG_SUBSCAN_SANDBOX env var. In-process scanning couldn't match the parent's sandbox (applied once, permanently).
  • Source-code rules only. The sub-scan excludes all metadata rules: a maliciously introduced dependency shows up in its code, and metadata rules (typosquatting, manifest-mismatch, …) were the dominant FP source. Excluding metadata also excludes this rule, so the check can't recurse.
  • Threshold. The parent is flagged when a new dep scores >= GUARDDOG_NEW_DEPENDENCY_RISK_THRESHOLD (default 5.0). It's a normal metadata rule, so it counts toward issues/exit codes and renders through all reporters.
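The diff and threshold steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function names, the `EXAMPLE_PACKUMENT` data, and the `scores` mapping (standing in for the sub-scan results) are all hypothetical. It only assumes the registry metadata exposes a `versions` map and a `time` map, as described above.

```python
from datetime import datetime
from typing import Optional

# Hypothetical, trimmed-down registry metadata for illustration only.
EXAMPLE_PACKUMENT = {
    "versions": {
        "1.0.0": {"dependencies": {"follow-redirects": "^1.15.0"}},
        "1.1.0": {"dependencies": {"follow-redirects": "^1.15.6",
                                   "plain-crypto-js": "^4.2.1"}},
    },
    "time": {
        "created": "2024-01-01T00:00:00.000Z",
        "modified": "2024-02-01T00:00:00.000Z",
        "1.0.0": "2024-01-01T00:00:00.000Z",
        "1.1.0": "2024-02-01T00:00:00.000Z",
    },
}


def previous_version(packument: dict, version: str) -> Optional[str]:
    """Return the version published just before `version`, by publish time."""
    times = {
        v: datetime.fromisoformat(t.replace("Z", "+00:00"))
        for v, t in packument["time"].items()
        if v in packument["versions"]  # skips the "created"/"modified" keys
    }
    earlier = [v for v in times if times[v] < times[version]]
    return max(earlier, key=times.get) if earlier else None


def new_dependency_names(packument: dict, version: str) -> set:
    """Names present in `version` but absent from the previous version.

    A changed version range on an existing name is not counted as new.
    """
    prev = previous_version(packument, version)
    if prev is None:
        return set()
    current = packument["versions"][version].get("dependencies", {})
    before = packument["versions"][prev].get("dependencies", {})
    return set(current) - set(before)


def risky_new_dependencies(scores: dict, threshold: float = 5.0) -> list:
    """Keep the new dependencies whose sub-scan score meets the threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)
```

In this sketch the parent would be flagged whenever `risky_new_dependencies` returns a non-empty list; how the scores are actually obtained from the sub-scan is left out on purpose.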

Evidence that it doesn't add noise

I scanned the top 1k packages on npm, plus a sample of 2k drawn from the packages ranked 1k to 20k by popularity (3k packages in total). In the current state:

  • 169/3000 introduced a new dependency in their latest version (relative to the second-to-last version)
  • Out of these 169 newly introduced dependencies, 1 would have met the 5/10 threshold (@chakra-ui/react, for the new @ark-ui/react@5.37.2 dependency)
image

Evidence that it would detect real threats

Axios: The payload was delivered through a new dependency plain-crypto-js, which GuardDog flags with score >= 5 when scanning with source code rules only:

$ poetry run guarddog npm scan https://github.com/DataDog/malicious-software-packages-dataset/raw/refs/heads/main/samples/npm/malicious_intent/plain-crypto-js/4.2.1/2026-03-31-plain-crypto-js-v4.2.1.zip --zip-password infected --sandbox
(...)
Assessment:  Suspicious  (5.3/10)

Mastra: Backdoored a bunch of packages by adding the easy-day-js dependency. This rule would have caught it too, since it flags easy-day-js with a score >= 5:

$ poetry run guarddog npm scan https://github.com/DataDog/malicious-software-packages-dataset/raw/refs/heads/main/samples/npm/malicious_intent/easy-day-js/1.11.22/2026-06-17-easy-day-js-v1.11.22.zip --zip-password infected --sandbox
(...)
Assessment:  High risk  (7.3/10)

Sample output

image

Open questions / discussion

  • So far we pick the previous version as the "last version published before the currently scanned one, based on publishing time". We might want to pick it based on semver ordering instead, but in the general case the two should be equivalent.
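To make the open question concrete, here is a small sketch of a case where the two orderings disagree: a patch release on an older major line published after a newer major. The function names and sample data are hypothetical, and `semver_key` is a deliberate simplification that ignores prerelease and build-metadata ordering.

```python
def previous_by_time(times: dict, version: str):
    """Previous version by publish time.

    Registry timestamps share one ISO 8601 format, so plain string
    comparison orders them chronologically.
    """
    earlier = [v for v in times if times[v] < times[version]]
    return max(earlier, key=times.get) if earlier else None


def semver_key(version: str) -> tuple:
    """Simplified sort key: numeric major.minor.patch only."""
    core = version.split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def previous_by_semver(times: dict, version: str):
    """Previous version by (simplified) semver ordering."""
    lower = [v for v in times if semver_key(v) < semver_key(version)]
    return max(lower, key=semver_key) if lower else None


# Hypothetical history: 1.0.1 is a backport published after 2.0.0.
BACKPORT_TIMES = {
    "1.0.0": "2024-01-01T00:00:00.000Z",
    "2.0.0": "2024-03-01T00:00:00.000Z",
    "1.0.1": "2024-06-01T00:00:00.000Z",
}
```

With this history, scanning `1.0.1` would be diffed against `2.0.0` under the time-based approach and against `1.0.0` under the semver-based one, so any dependency dropped in `2.0.0` but still present on the 1.x line would look newly added only in the first case.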

Flags dependencies newly introduced in a package version (relative to the
previous published version) that are themselves high risk. The previous
version is derived from the registry 'time' map; any dependency name present
in the scanned version but absent from the previous one is scanned as a
standalone package via a subprocess invocation of guarddog, so it is sandboxed
identically to the parent. The sub-scan runs source-code rules only (all
metadata rules excluded, which also excludes this rule, preventing recursion),
and the parent is flagged when a new dependency scores at or above the
configurable threshold (GUARDDOG_NEW_DEPENDENCY_RISK_THRESHOLD, default 7.0).

Also extracts shared npm version-resolution helpers into guarddog/utils/npm.py
(reused by the project scanner) and stops the human-readable reporter from
emitting an empty location line for metadata findings.
@christophetd christophetd requested a review from a team as a code owner June 17, 2026 15:28
@christophetd christophetd marked this pull request as draft June 17, 2026 15:29

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 282b3c84c7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread guarddog/analyzer/metadata/npm/risky_new_dependency.py
Comment thread guarddog/analyzer/metadata/npm/risky_new_dependency.py Outdated
With the sub-scan running source-code rules only, the 5.0-6.9 (suspicious)
band was empty of false positives across 3000 popular npm packages, so the
metadata noise that previously justified a 7.0 default is gone. Lower the
default to 5.0 to catch genuinely suspicious newly-added dependencies.
…endency

Addresses review feedback:
- The new-dependency diff and sub-scan now resolve npm aliases to the real
  package (e.g. "x": "npm:evil@1" scans evil, not x). Because the diff is
  built from resolved package names, an alias retargeted to a different package
  under the same key is also detected.
- optionalDependencies are now included alongside dependencies (npm installs
  them by default), closing an install-time coverage gap. devDependencies stay
  excluded (not installed for consumers).

Alias handling (NPM_ALIAS_PATTERN / resolve_npm_alias) is lifted into
guarddog/utils/npm.py and reused by the project scanner.
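For illustration, alias resolution along these lines might look like the sketch below. It is not the code from guarddog/utils/npm.py: `ALIAS_RE`, `resolve_alias`, and `resolved_dependencies` are hypothetical names, and the regex is an assumption about the `npm:<name>@<range>` alias syntax.

```python
import re
from typing import Tuple

# Assumed alias shape: "npm:<name>[@<range>]", where <name> may be scoped.
ALIAS_RE = re.compile(r"^npm:(?P<name>(?:@[^/@]+/)?[^/@]+)(?:@(?P<spec>.+))?$")


def resolve_alias(key: str, spec: str) -> Tuple[str, str]:
    """Map a dependency entry to the real package name and version range.

    Non-alias entries are returned unchanged. An alias without an explicit
    range yields an empty range string.
    """
    match = ALIAS_RE.match(spec)
    if match is None:
        return key, spec
    return match.group("name"), match.group("spec") or ""


def resolved_dependencies(manifest: dict) -> dict:
    """Collect install-time dependencies keyed by their resolved names."""
    resolved = {}
    for section in ("dependencies", "optionalDependencies"):
        for key, spec in manifest.get(section, {}).items():
            name, version_range = resolve_alias(key, spec)
            resolved[name] = version_range
    return resolved
```

Because the diff is taken over resolved names, a key `x` that moves from `npm:good@1` to `npm:evil@1` shows up as a newly added `evil`, even though the key itself did not change.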
@christophetd christophetd marked this pull request as ready for review June 17, 2026 16:13

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f499dfd32a

package so the diff and sub-scan target the aliased package, not the local
alias name (e.g. "x": "npm:evil@1" -> {"evil": "1"})."""
resolved: dict = {}
for section in ("dependencies", "optionalDependencies"):

P2: Include installed peer dependencies in the diff

When a package version newly adds a non-optional peerDependencies entry such as "evil": "^1.0.0", this loop ignores it, so new_dependencies stays empty and the risky package is never sub-scanned. npm v7+ installs peer dependencies by default (npm docs), so consumers that do not already have the peer will still install that newly introduced package while this detector misses the install-time path.
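One way the reviewer's suggestion could be addressed is sketched below. This is not what the PR does; the function name is hypothetical, and treating entries marked optional in `peerDependenciesMeta` as not installed is an assumption made for this example.

```python
def install_time_dependency_names(manifest: dict) -> set:
    """Names a consumer would install for this package version.

    Covers dependencies, optionalDependencies, and peer dependencies that
    are not marked optional in peerDependenciesMeta.
    """
    names = set()
    for section in ("dependencies", "optionalDependencies"):
        names.update(manifest.get(section, {}))

    peer_meta = manifest.get("peerDependenciesMeta", {})
    for name in manifest.get("peerDependencies", {}):
        # Assumption: optional peers are skipped, required peers are kept.
        if not peer_meta.get(name, {}).get("optional", False):
            names.add(name)
    return names
```

Diffing the output of this function between the previous and scanned versions would surface a newly added required peer such as "evil", at the cost of sub-scanning more packages per release.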

f"'{resolved_version or 'latest'}'"
)

command = [

Contributor Author

Need to double check that's solid enough

Contributor Author

I checked and it should work with poetry run, pip, uv tool run and uvx
