Skip to content

WEB-4808: supply discovery daemon credentials via persisted config (fix scheduled idle)#141

Draft
vigneshsubbiah16 wants to merge 2 commits into
stagingfrom
vis1/web-4808-discovery-daemon-creds
Draft

WEB-4808: supply discovery daemon credentials via persisted config (fix scheduled idle)#141
vigneshsubbiah16 wants to merge 2 commits into
stagingfrom
vis1/web-4808-discovery-daemon-creds

Conversation

@vigneshsubbiah16

Copy link
Copy Markdown
Collaborator

Problem (WEB-4808, High)

The recurring ai.getunbound.discovery LaunchDaemon runs unbound-discovery scan as root, every 43200s, with no --api-key, no --domain, no EnvironmentVariables. A root daemon sees no user shell env, so the entry point finds no credentials and idles fail-open (exit 0) on every scheduled run. Fleet discovery therefore happened only once, at onboard time, via the one-shot inline scan — AI tools installed after onboarding were never re-inventoried.

Linear: https://linear.app/unboundsec/issue/WEB-4808

Fix

Two coordinated parts. The world-readable plist and onboard.sh.tmpl are left untouched (the binary self-loads config instead of baking secrets into the plist).

1. Persist creds at setup timebinary/src/unbound_hook/setup_cmd.py
_run_discovery now calls a new _write_discovery_config(opts) that writes /opt/unbound/etc/discovery.json (api_key + domain) atomically, 0600 root:wheel (mkstemp + os.replace), into the pkg-provisioned config dir (postinstall creates it 0750 root:wheel). Best-effort: a persist failure is logged but never aborts onboarding.

2. Read config as a fallbackpackaging/unbound_discovery_entry.py
Credential resolution now falls back to that file when --api-key/UNBOUND_API_KEY and --domain are absent. File-sourced creds are fed through the existing channels (key via UNBOUND_API_KEY env to keep it out of ps; domain via --domain argv). Explicit flags/env still win.

It also strips a leading scan token the way mcp-scan is shifted off upstream's strict argparse. With no creds the lenient parse tolerated scan (which masked the bug); the moment creds resolve, an unstripped scan would SystemExit(2) — so credential delivery alone would have turned the idle into an exit-2 instead of a working scan. Stripping it keeps the plist secret-free and lets the credentialed sweep run.

Security

  • Credentials never land in the world-readable plist (ProgramArguments stays scan, no EnvironmentVariables).
  • Config file is 0600 root:wheel, written atomically so a reader never sees a partial/wrong-permission file.

Fail-open preserved (sacred)

If the config file is absent, unreadable, malformed, empty, or wrong-typed, the loader returns {} (it never raises) and the daemon idles-and-exits-0 exactly as today. The fix only adds a credential source; it can never crash, crash-loop, or block dev work.

Tests

  • packaging/test_discovery_entry.py (new, 12 cases, green locally): config file resolves creds for the daemon's scan invocation and the sweep runs; explicit flags/env win over the file; partial / missing / malformed / empty config all still idle exit 0; loader never raises; --version short-circuits before config.
  • packaging/test-discovery-daemon.sh: added variant 3 — the exact production shape (plist with scan token, no plist creds, creds only in the config file) — asserting the daemon scans + POSTs and emits neither "No configuration provided" nor "unrecognized arguments". Backs up/restores any pre-existing host config so it stays non-destructive.
12 passed in 0.08s   # packaging/test_discovery_entry.py

(The frozen-binary daemon variant 3 and the CLI suite run in CI against the built bundle; coding_discovery_tools is only fetched at build time, so the in-process pytest stubs the upstream sweep to assert the credentialed path is reached.)

Coordination

Draft — do not merge.

🤖 Generated with Claude Code

vigneshsubbiah16 and others added 2 commits June 13, 2026 10:56
The recurring unbound-discovery LaunchDaemon runs `unbound-discovery scan`
as root, with no --api-key/--domain, no EnvironmentVariables, and no shell
env. Every scheduled run found no credentials and idled fail-open (exit 0),
so fleet discovery happened only once at onboard time via the inline scan.

Fix (two coordinated parts, plist and onboard.sh left untouched):

1. setup_cmd.py: _run_discovery now persists the discovery key + backend
   domain to /opt/unbound/etc/discovery.json (0600 root:wheel, atomic
   write via mkstemp+os.replace) before the one-shot scan. The dir is the
   pkg-provisioned config dir (postinstall, 0750 root:wheel). Best-effort:
   a persist failure is logged, never aborts onboarding.

2. unbound_discovery_entry.py: credential resolution now falls back to that
   config file when --api-key/UNBOUND_API_KEY and --domain are absent, so
   the scheduled root daemon can find its creds. File-sourced creds are fed
   through the existing channels (key via UNBOUND_API_KEY env to keep it out
   of `ps`; domain via --domain argv). Explicit flags/env still win.

   Also strips a leading `scan` token the same way `mcp-scan` is shifted off
   upstream's strict argparse: with no creds the lenient parse tolerated it
   (masking the bug), but once creds resolve an unstripped `scan` would
   SystemExit(2). Stripping it keeps the world-readable plist secret-free
   AND lets the credentialed sweep actually run.

Fail-open preserved: absent/unreadable/malformed config returns {} and the
daemon idles-and-exits-0 exactly as before. The config loader never raises.

Tests:
- packaging/test_discovery_entry.py (new, 12 cases): config file resolves
  creds for the daemon's `scan` invocation; flags/env win; partial/missing/
  malformed config still idles exit 0; loader never raises; --version
  short-circuits.
- packaging/test-discovery-daemon.sh: variant 3 reproduces production
  exactly (plist with `scan` token, NO plist creds, creds only in the
  config file) and asserts the daemon scans + POSTs; backs up/restores any
  pre-existing host config so it stays non-destructive.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o CI

Convergent reviewer should-fix items on the root-on-fleet discovery
credential path (PR #141). Surgical, fail-open preserved.

1. (M1 / dir-creation mode) _write_discovery_config now creates the leaf
   /opt/unbound/etc dir as 0750 instead of relying on the umask-default
   0755 (world-traversable), and when running as root enforces 0750 +
   root:root ownership even on a pre-existing dir. This reinforces the
   directory-level symlink/TOCTOU barrier the 0600 credential file relies
   on. It no longer auto-creates ancestors above `etc`: a missing
   /opt/unbound (pkg-provisioned) returns a reason instead of building the
   tree world-readable. The atomic 0600 root:wheel file write
   (mkstemp+fchmod+fchown+os.replace) is unchanged, and the whole thing
   stays best-effort/fail-open — a persist failure is reported, never
   aborts onboarding.

2. The 12 hermetic fail-open tests (packaging/test_discovery_entry.py)
   were not run by CI — only the macOS release build runs pytest, which
   would let a fail-open regression merge. static-checks.yml (every PR)
   now runs `pytest packaging/`, gating the SACRED idle-and-exit-0 path.

3. New hermetic writer test (packaging/test_discovery_writer.py): asserts
   the persisted file is 0600 and round-trips api_key+domain, the etc dir
   is not world-traversable, and a missing-grandparent / non-writable
   parent returns a reason string rather than raising.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant