Skip to content

feat(instructions): honor scoped applyTo via CWD-implicit overlap (#231)#238

Merged
jrob5756 merged 6 commits into
microsoft:mainfrom
franklixuefei:xuefl/applyto-scope-cwd-implicit
Jun 26, 2026
Merged

feat(instructions): honor scoped applyTo via CWD-implicit overlap (#231)#238
jrob5756 merged 6 commits into
microsoft:mainfrom
franklixuefei:xuefl/applyto-scope-cwd-implicit

Conversation

@franklixuefei

Copy link
Copy Markdown
Member

Today _is_always_on_instructions_file only loads .github/instructions/ files with applyTo: "**", silently dropping every file with a scoped glob (**/*.cs, services/foo/**, etc.). This is the bug described in issue #231: real Azure repos have 0/9 instruction files loading despite their applyTo values being valid per GitHub's documented convention.

This change:

  • Adds ConventionDirectory.extract_scope: Callable[[Path], str | None], the convention-author-facing single-callable scope extractor. Returns None to opt out, ALWAYS_ON_SCOPE ("**") for always-include, or any glob string for scoped evaluation. The bidirectional overlap test lives in conductor core so future scoped conventions (Cursor's globs:, Cline's future scope field, etc.) only register their own extract_scope — no duplication of the overlap semantic.
  • Adds _scope_overlaps + _single_glob_overlaps (literal-prefix approximation) and the helpers _normalize_scope_path, _apply_convention_filters, _discovery_reason. Multi-glob applyTo values separated by ; or , (both observed in real-world data) are supported; brace expansion is documented as unsupported.
  • Replaces _is_always_on_instructions_file with _extract_apply_to. Files without applyTo (or with non-string applyTo) still opt out — matching today's behavior for those cases.
  • Computes cwd_rel per walk level (relative to the convention's owner directory, not the workspace root) so nested services/GW/.github/instructions/foo.md with applyTo: "src/**" resolves src relative to services/GW — the closest-wins semantic that makes nested conventions meaningful.
  • Adds DiscoveredInstruction + discover_workspace_instructions_detailed so callers can reason about why each file was included without re-parsing the filesystem.
  • Adds --print-loaded-instructions CLI flag (plus full plumbing through cli/run.py, cli/bg_runner.py) that dumps the resolved list with scope + reason to stderr — for debugging "why isn't my instruction loading" without instrumenting source code.

Honest framing for the design tradeoff (per the issue body and Brent Rusinow's review feedback): at repo-root CWD (the dominant case for programmatic launchers like skyship / octane / ATS pr-reviewer), the bidirectional overlap loads every scoped file. That is strictly better than today's silent skip (correctness fix), but it is not token-bloat narrowing for programmatic callers. Narrowing only kicks in when a human cds into a subdir. Correctness > token economy is the deliberate v1 tradeoff; programmatic narrowing can be revisited later if measured to matter.

Tests: 83 passed in tests/test_config/test_instructions.py covering the overlap helper (positive, disjoint, segment-boundary, multi-glob ;/,, leading-/, Windows backslash, leading-** always-match, empty-separator-only rejection, documented brace-expansion limitation), the new discover_workspace_instructions_detailed shape, the _walk_directory_convention 3-tuple yield, root-CWD-loads-scoped and subdir-narrows-scoped end-to-end behaviors, owner-relative cwd_rel for nested conventions, and bg_runner forwarding of the new flag.

End-to-end repro from the issue body verified: a fresh repo with scoped applyTo files plus .github/copilot-instructions.md loads all 4 files at root CWD (vs. 1 today) and narrows correctly when invoked from a disjoint subdir.

Closes #231

@franklixuefei

Copy link
Copy Markdown
Member Author

@jrob5756 kindly requesting your review on this, thanks Jason!

@codecov-commenter

codecov-commenter commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.56115% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@960723e). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/conductor/cli/run.py 93.54% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #238   +/-   ##
=======================================
  Coverage        ?   86.47%           
=======================================
  Files           ?       68           
  Lines           ?    12210           
  Branches        ?        0           
=======================================
  Hits            ?    10559           
  Misses          ?     1651           
  Partials        ?        0           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

franklixuefei pushed a commit to franklixuefei/conductor that referenced this pull request Jun 1, 2026
- R1-001: pin _discovery_reason invariant with an assert instead of the
  unreachable 'or scope is None' branch, so a future regression in
  _apply_convention_filters surfaces loudly instead of silently
  relabeling a buggy None scope as 'always-on'.

- R1-002: thread a single start_dir through both build_instructions_preamble
  and the --print-loaded-instructions debug dump, so the printed list
  cannot drift from what was actually loaded (the exact failure mode
  the flag exists to debug).

- R1-003: resolve git_root before comparing it against the (already-
  resolved) walk pointer in discover_workspace_instructions. On
  symlinked git roots the unresolved/resolved paths differ, causing
  the parent-walk to overshoot the git root. Adds a regression test
  (skipped on Windows where symlink creation needs elevation).
franklixuefei pushed a commit to franklixuefei/conductor that referenced this pull request Jun 1, 2026
Adds direct unit tests for the three uncovered code paths flagged by
codecov on PR microsoft#238:

* _print_loaded_instructions helper: empty-state + populated-state
  formatting (header, paths, reasons, scope/no-scope, applyTo='**'
  suppression, stdout-stays-clean invariant).
* _discovery_reason's convention.extract_scope is None branch:
  exercised via a custom ConventionDirectory with no extract_scope.
* _apply_convention_filters opt-out: extract_scope returning None
  correctly rejects files.
* discover_workspace_instructions_detailed's ValueError fallback when
  start_dir.relative_to(current) fails (symlinked-ancestor edge case):
  monkey-patched to force the path, asserting the over-include bias.
* _normalize_scope_path ./ and . branches (authors sometimes write
  ./services/GW/** to be explicit about repo-relative paths).

Patch coverage on src/conductor/config/instructions.py now 99% (only
pre-existing _find_git_root filesystem-root and legacy _unwrap_preamble
fallback remain). _print_loaded_instructions fully covered.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
franklixuefei pushed a commit to franklixuefei/conductor that referenced this pull request Jun 1, 2026
Pulls the 7-line --print-loaded-instructions branch out of
run_workflow_async into a small helper. The branch was previously
uncovered (codecov flagged 3 lines on PR microsoft#238) because it lives inside
a 500+ line async function with provider/MCP/dashboard setup that's
expensive to mock for a single conditional.

The new helper has the same semantics (no-op when disabled or when
auto-discovery didn't run; otherwise discover-and-print using the
caller-supplied start_dir) and is directly unit-tested for all three
branches. The call site in run_workflow_async becomes a single line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@franklixuefei

Copy link
Copy Markdown
Member Author

@jrob5756 kindly requesting your review on this, thanks Jason!

xuefl-msft and others added 5 commits June 18, 2026 14:33
…crosoft#231)

Today `_is_always_on_instructions_file` only loads `.github/instructions/`
files with `applyTo: "**"`, silently dropping every file with a scoped
glob (`**/*.cs`, `services/foo/**`, etc.). This is the bug described in
issue microsoft#231: real Azure repos have 0/9 instruction files loading despite
their `applyTo` values being valid per GitHub's documented convention.

This change:

* Adds `ConventionDirectory.extract_scope: Callable[[Path], str | None]`,
  the convention-author-facing single-callable scope extractor. Returns
  `None` to opt out, `ALWAYS_ON_SCOPE` (`"**"`) for always-include, or any
  glob string for scoped evaluation. The bidirectional overlap test lives
  in conductor core so future scoped conventions (Cursor's `globs:`,
  Cline's future scope field, etc.) only register their own
  `extract_scope` — no duplication of the overlap semantic.
* Adds `_scope_overlaps` + `_single_glob_overlaps` (literal-prefix
  approximation) and the helpers `_normalize_scope_path`,
  `_apply_convention_filters`, `_discovery_reason`. Multi-glob `applyTo`
  values separated by `;` or `,` (both observed in real-world data) are
  supported; brace expansion is documented as unsupported.
* Replaces `_is_always_on_instructions_file` with `_extract_apply_to`.
  Files without `applyTo` (or with non-string `applyTo`) still opt out —
  matching today's behavior for those cases.
* Computes `cwd_rel` per walk level (relative to the convention's owner
  directory, not the workspace root) so nested
  `services/GW/.github/instructions/foo.md` with `applyTo: "src/**"`
  resolves `src` relative to `services/GW` — the closest-wins semantic
  that makes nested conventions meaningful.
* Adds `DiscoveredInstruction` + `discover_workspace_instructions_detailed`
  so callers can reason about why each file was included without
  re-parsing the filesystem.
* Adds `--print-loaded-instructions` CLI flag (plus full plumbing through
  `cli/run.py`, `cli/bg_runner.py`) that dumps the resolved list with
  scope + reason to stderr — for debugging "why isn't my instruction
  loading" without instrumenting source code.

Honest framing for the design tradeoff (per the issue body and Brent
Rusinow's review feedback): at repo-root CWD (the dominant case for
programmatic launchers like skyship / octane / ATS pr-reviewer), the
bidirectional overlap loads every scoped file. That is strictly better
than today's silent skip (correctness fix), but it is *not* token-bloat
narrowing for programmatic callers. Narrowing only kicks in when a human
`cd`s into a subdir. Correctness > token economy is the deliberate v1
tradeoff; programmatic narrowing can be revisited later if measured to
matter.

Tests: 83 passed in `tests/test_config/test_instructions.py` covering
the overlap helper (positive, disjoint, segment-boundary, multi-glob
`;`/`,`, leading-`/`, Windows backslash, leading-`**` always-match,
empty-separator-only rejection, documented brace-expansion limitation),
the new `discover_workspace_instructions_detailed` shape, the
`_walk_directory_convention` 3-tuple yield, root-CWD-loads-scoped and
subdir-narrows-scoped end-to-end behaviors, owner-relative `cwd_rel` for
nested conventions, and bg_runner forwarding of the new flag.

End-to-end repro from the issue body verified: a fresh repo with scoped
`applyTo` files plus `.github/copilot-instructions.md` loads all 4 files
at root CWD (vs. 1 today) and narrows correctly when invoked from a
disjoint subdir.

Closes microsoft#231

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- R1-001: pin _discovery_reason invariant with an assert instead of the
  unreachable 'or scope is None' branch, so a future regression in
  _apply_convention_filters surfaces loudly instead of silently
  relabeling a buggy None scope as 'always-on'.

- R1-002: thread a single start_dir through both build_instructions_preamble
  and the --print-loaded-instructions debug dump, so the printed list
  cannot drift from what was actually loaded (the exact failure mode
  the flag exists to debug).

- R1-003: resolve git_root before comparing it against the (already-
  resolved) walk pointer in discover_workspace_instructions. On
  symlinked git roots the unresolved/resolved paths differ, causing
  the parent-walk to overshoot the git root. Adds a regression test
  (skipped on Windows where symlink creation needs elevation).
Adds direct unit tests for the three uncovered code paths flagged by
codecov on PR microsoft#238:

* _print_loaded_instructions helper: empty-state + populated-state
  formatting (header, paths, reasons, scope/no-scope, applyTo='**'
  suppression, stdout-stays-clean invariant).
* _discovery_reason's convention.extract_scope is None branch:
  exercised via a custom ConventionDirectory with no extract_scope.
* _apply_convention_filters opt-out: extract_scope returning None
  correctly rejects files.
* discover_workspace_instructions_detailed's ValueError fallback when
  start_dir.relative_to(current) fails (symlinked-ancestor edge case):
  monkey-patched to force the path, asserting the over-include bias.
* _normalize_scope_path ./ and . branches (authors sometimes write
  ./services/GW/** to be explicit about repo-relative paths).

Patch coverage on src/conductor/config/instructions.py now 99% (only
pre-existing _find_git_root filesystem-root and legacy _unwrap_preamble
fallback remain). _print_loaded_instructions fully covered.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pulls the 7-line --print-loaded-instructions branch out of
run_workflow_async into a small helper. The branch was previously
uncovered (codecov flagged 3 lines on PR microsoft#238) because it lives inside
a 500+ line async function with provider/MCP/dashboard setup that's
expensive to mock for a single conditional.

The new helper has the same semantics (no-op when disabled or when
auto-discovery didn't run; otherwise discover-and-print using the
caller-supplied start_dir) and is directly unit-tested for all three
branches. The call site in run_workflow_async becomes a single line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@franklixuefei franklixuefei force-pushed the xuefl/applyto-scope-cwd-implicit branch from a7c3a2c to 77d4ca2 Compare June 18, 2026 22:34
@franklixuefei

Copy link
Copy Markdown
Member Author

@jrob5756 kindly requesting your review on this, thanks Jason!

@jrob5756

jrob5756 commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

@jrob5756 kindly requesting your review on this, thanks Jason!

Sorry for the delay here, @franklixuefei . Got really busy on a couple unrelated projects. Will try to wrap this up this week.

@jrob5756 jrob5756 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a thorough the code changes. The core fix is right and well covered. I rebuilt discovery on a scratch repo and the overlap math holds up: root CWD loads every scoped file, subdirs narrow, and a nested .github/instructions resolves its applyTo against the right owner dir. Tests pass and ruff/ty are clean.

Nothing here blocks merge. The one theme worth a real look is silent drops: a malformed, list-valued, or empty applyTo quietly removes a file from discovery, and the logger.warning fallback is effectively invisible because conductor installs no logging handlers, so those records only reach Python's bare lastResort stderr and never the dashboard or the --web-bg logs. Everything else is type/comment/test polish. Details inline.

Comment thread src/conductor/config/instructions.py
Comment thread src/conductor/config/instructions.py Outdated
Comment thread src/conductor/config/instructions.py Outdated
Comment thread src/conductor/config/instructions.py
Comment thread src/conductor/cli/run.py Outdated
Comment thread src/conductor/cli/run.py Outdated
Comment thread src/conductor/cli/run.py Outdated
Comment thread src/conductor/cli/run.py Outdated
@franklixuefei

Copy link
Copy Markdown
Member Author

Took a thorough the code changes. The core fix is right and well covered. I rebuilt discovery on a scratch repo and the overlap math holds up: root CWD loads every scoped file, subdirs narrow, and a nested .github/instructions resolves its applyTo against the right owner dir. Tests pass and ruff/ty are clean.

Nothing here blocks merge. The one theme worth a real look is silent drops: a malformed, list-valued, or empty applyTo quietly removes a file from discovery, and the logger.warning fallback is effectively invisible because conductor installs no logging handlers, so those records only reach Python's bare lastResort stderr and never the dashboard or the --web-bg logs. Everything else is type/comment/test polish. Details inline.

@jrob5756 thanks for your review! Addressed most comment while leaving one comment open for your take. Thanks!

…Reject Enum sentinel, surface unusable applyTo, type/wording polish

- split applyTo 'absent' (silent opt-out) from 'present-but-unusable' (list/empty/
  separator-only) and warn on the latter so a real authoring slip isn't a silent drop
- reason: str -> DiscoveryReason(StrEnum) (FILE_CONVENTION/ALWAYS_ON/SCOPE_OVERLAP)
- _REJECT: object() -> one-member _Reject(Enum) for real is-narrowing (filter chain type-checks)
- log before the can't-happen relative_to permissive fallback (mirrors the _discovery_reason assert)
- run.py: list[Any] -> list[DiscoveredInstruction] via TYPE_CHECKING; 'loaded'->'discovered';
  drop stale repr comment; compare against ALWAYS_ON_SCOPE not the literal '**'
- tests: +3 for the absent/list/empty applyTo split; constructions use DiscoveryReason

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@franklixuefei franklixuefei force-pushed the xuefl/applyto-scope-cwd-implicit branch from 69e810b to 2e2b121 Compare June 24, 2026 23:21
@franklixuefei franklixuefei requested a review from jrob5756 June 24, 2026 23:21
@franklixuefei

Copy link
Copy Markdown
Member Author

@jrob5756 Just a gentle nudge on re-reviewing this after the comments were resolved (except for one of them needing your call). Wanted this to make it to the 0.1.20 release train :-P Thanks!

@jrob5756 jrob5756 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All review threads addressed and resolved. Accepted logger.warning as the channel for the unusable-applyTo warnings (it reaches stderr by default; dashboard/JSONL surfacing can be a fast-follow). Core fix is solid and well-tested, CI green. LGTM 🚀

@jrob5756 jrob5756 merged commit e3d73b4 into microsoft:main Jun 26, 2026
9 checks passed
jrob5756 added a commit that referenced this pull request Jun 27, 2026
Bump version 0.1.19 -> 0.1.20 and finalize the changelog.

Changelog (0.1.20):
- Added: Hermes provider (#235), cost budget enforcement (#212),
  external-workflow-friction knobs — output_mode / max_parse_recovery_attempts /
  gate-respond CLI / Windows paths (#234), templated reasoning.effort &
  context_tier (#263), scoped applyTo instruction loading (#238).
- Fixed: Copilot model attribution for auto-routed runs (#268),
  claude-agent-sdk default tool preset when tools: omitted (#269).

Re-locked uv.lock to record 0.1.20. Quality gates green locally
(ruff, ty, pytest excl. real_api/performance: 3673 passed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Honor scoped applyTo in .github/instructions/ via CWD-derived workspace scope

4 participants