feat: Conductor Expert — opt-in knowledge base for Conductor-aware agents (#180) by brrusino · Pull Request #215 · microsoft/conductor

brrusino · 2026-05-20T01:02:45Z

Summary

Implements Phase 1 of issue #180: bundles the existing plugin reference docs as package data and wires them into the instructions pipeline with an opt-in flag. This gives agents deep understanding of Conductor's YAML schema, execution model, authoring patterns, and CLI commands.

Changes

New: `src/conductor/expert/` package

loader.py — loads and caches the bundled reference docs (~70KB) via importlib.resources + lru_cache and wraps them in <conductor_knowledge> tags
knowledge/ — bundled markdown reference docs (yaml-schema.md, authoring.md, execution.md)

Schema (`config/schema.py`)

AgentDef.conductor_expert: bool | None — tri-state: None = inherit workflow default, True = force enable, False = force disable. Forbidden on script/workflow/human_gate agents.
RuntimeConfig.conductor_expert: bool — workflow-wide default (False)

Executor (`executor/agent.py`)

New _build_prompt_prefix() helper shared by execute() and render_prompt()
New _should_inject_expert() — resolves tri-state agent flag vs workflow default
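
The tri-state resolution described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names mirror the helpers listed above but drop the leading underscore, and the signatures and the `KNOWLEDGE` placeholder are assumptions.

```python
from __future__ import annotations


def should_inject_expert(agent_flag: bool | None, workflow_default: bool) -> bool:
    """Resolve the per-agent tri-state flag against the workflow default.

    None  -> inherit the workflow-wide default
    True  -> force enable for this agent
    False -> force disable for this agent
    """
    if agent_flag is None:
        return workflow_default
    return agent_flag


def build_prompt_prefix(agent_flag: bool | None, workflow_default: bool, knowledge: str) -> str:
    """Return the knowledge block to prepend, or an empty string when disabled."""
    if should_inject_expert(agent_flag, workflow_default):
        return knowledge + "\n\n"
    return ""


# Hypothetical stand-in for the bundled docs wrapped in their tag.
KNOWLEDGE = "<conductor_knowledge>...</conductor_knowledge>"

for flag in (None, True, False):
    enabled = should_inject_expert(flag, workflow_default=False)
    print(f"agent flag={flag!r} -> inject={enabled}")
```

Sharing one prefix builder between execute() and render_prompt() keeps the rendered preview and the real prompt from drifting apart.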

Engine (`engine/workflow.py`)

Passes conductor_expert_default to AgentExecutor in both single-provider and multi-provider paths

Tests & Docs

25 new tests in tests/test_expert/ covering loader, schema validation, and executor integration
Updated AGENTS.md with expert package documentation
Added examples/conductor-expert.yaml example workflow

Closes #180 (Phase 1)

brrusino · 2026-05-20T01:02:59Z

🤖 Multi-Agent PR Review #1

PR: #1 — feat: Conductor Expert — opt-in knowledge base for Conductor-aware agents (#180)
Merge readiness: ⚠️ Needs minor fixes
Review confidence: 82/100 🟡 · reviewer agreement & finding strength — not a PR-quality score
Models: claude-opus-4.7 (Security & Correctness), gpt-5.5 (Robustness & Completeness)
Deliberation: 1 round, consensus reached

📋 Summary

Clean, well-scoped Phase 1 of the Conductor Expert feature. The tri-state opt-in resolution, shared prompt-prefix builder, and schema validation rejecting the flag on non-provider-backed agents are all implemented correctly, and test coverage hits the meaningful seams (loader caching, tag wrapping, tri-state, ordering, validator rejections). No security or correctness blockers — the knowledge content is static package data with no user-controlled paths.

The one item worth addressing before merge is that the bundled knowledge docs the feature is supposed to teach agents from are now stale: they don't document the new runtime.conductor_expert field or the per-agent override, so expert-enabled agents could flag valid workflows or omit the flag when generating examples. Remaining items (packaging robustness for the bundled .md files, error handling in the loader) are weak-consensus polish — non-blocking.

🔍 Consensus Findings

Severity	File	Finding	Recommendation
🟡 warning	`src/conductor/expert/knowledge/yaml-schema.md:29-38, 150-154`	Bundled knowledge docs don't document the new `workflow.runtime.conductor_expert` field or per-agent tri-state override. Expert-enabled agents may flag valid workflows as having unknown fields or omit the flag when generating examples.	Update `yaml-schema.md` and `authoring.md` (and `execution.md` if it covers prompt construction) to cover the runtime default, per-agent tri-state semantics, prompt ordering, and the provider-backed-only restriction.
🔵 suggestion	`src/conductor/expert/loader.py:2508-2517`	`load_expert_knowledge()` calls `resource.read_text(...)` with no error handling. A missing/corrupted doc surfaces as a raw `FileNotFoundError`/`UnicodeDecodeError` from inside agent execution, and because `lru_cache` doesn't cache exceptions, every expert-enabled agent will repeatedly re-attempt the failing read.	Wrap reads in try/except for `FileNotFoundError`/`OSError`/`UnicodeDecodeError` and raise a project-specific error with reinstall guidance.
🔵 suggestion	`pyproject.toml:64-68`	`src/conductor/expert/knowledge/*.md` is loaded via `importlib.resources`. Hatchling defaults include the files today, but no test exercises the loader against an installed wheel — a future build-config change could silently drop the docs.	Declare an explicit Hatchling `artifacts` rule for the knowledge `.md` files, or add a wheel-install smoke test.
🔵 suggestion	`src/conductor/config/schema.py:159-178`	Both reviewers converged: strict rejection of `conductor_expert` (including `false`) on `script`/`human_gate`/`workflow` agent types is the intended design — surfaces author confusion early and matches `reasoning`/`retry` handling.	No change required.

💡 Suggested fix for R2-001 (stale knowledge docs)

     max_agent_iterations: integer   # Max tool-use roundtrips per agent (1-500, optional)
     max_session_seconds: float      # Wall-clock timeout per agent session in seconds (optional)
     default_reasoning_effort: string # Workflow-wide reasoning/thinking effort: low, medium, high, xhigh (optional)
+    conductor_expert: boolean       # Inject bundled Conductor knowledge into provider-backed agents (default: false)
     mcp_servers:                    # MCP server configurations

💡 Suggested fix for R1-002 (loader error handling)

-        text = resource.read_text(encoding="utf-8").strip()
+        try:
+            text = resource.read_text(encoding="utf-8").strip()
+        except (FileNotFoundError, OSError, UnicodeDecodeError) as e:
+            raise RuntimeError(
+                f"Conductor Expert knowledge file '{name}' is missing or unreadable. "
+                "This usually indicates a broken install; try reinstalling conductor."
+            ) from e
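
The caching behaviour behind this finding can be reproduced in isolation. The sketch below assumes a stand-in `read_resource()` that always fails and a hypothetical `ExpertKnowledgeError` class (the suggested fix above raises a plain RuntimeError). It shows that `functools.lru_cache` stores only successful return values, so a failing read is retried on every call.

```python
from functools import lru_cache

read_attempts = 0  # counts how often the underlying read is attempted


class ExpertKnowledgeError(RuntimeError):
    """Hypothetical project-specific error; the suggested fix uses RuntimeError."""


def read_resource(name: str) -> str:
    """Stand-in for resource.read_text(); always fails to simulate a broken install."""
    global read_attempts
    read_attempts += 1
    raise FileNotFoundError(name)


@lru_cache(maxsize=None)
def load_doc(name: str) -> str:
    try:
        return read_resource(name).strip()
    except (OSError, UnicodeDecodeError) as exc:  # FileNotFoundError is an OSError
        raise ExpertKnowledgeError(
            f"Knowledge file '{name}' is missing or unreadable; try reinstalling."
        ) from exc


failures = 0
for _ in range(3):
    try:
        load_doc("yaml-schema.md")
    except ExpertKnowledgeError:
        failures += 1

# Nothing was cached, so every call re-attempted the read.
print(read_attempts, failures, load_doc.cache_info().currsize)
```

Wrapping the read does not stop the retries; it only replaces the raw traceback with an actionable message.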

💡 Suggested fix for R1-001 (explicit packaging artifacts)

 [tool.hatch.build.targets.wheel]
 packages = ["src/conductor"]
+artifacts = ["src/conductor/expert/knowledge/*.md"]
 exclude = [
     "src/conductor/web/frontend",
 ]

🔎 Unique / Disputed Findings

None — all findings reached consensus in round 1.

📊 Model Comparison

Aspect	Reviewer 1 (claude-opus-4.7)	Reviewer 2 (gpt-5.5)
Merge Readiness	✅ Ready to merge	⚠️ Needs minor fixes
Critical findings	0	0
Warnings	0	1
Suggestions	3	0
Total findings	3	1

🏁 Verdict

Reviewers agree on the substance: no blockers, one warning worth addressing (stale bundled knowledge docs), and a few polish suggestions. Updating the bundled yaml-schema.md / authoring.md to document the new conductor_expert field is the only change that meaningfully affects the feature's value, since expert-enabled agents will be reading those docs.

Merge readiness: ⚠️ Needs minor fixes — update bundled knowledge docs to cover the new conductor_expert field before merge; loader error handling and explicit packaging artifacts are non-blocking polish.
Review confidence: 82/100 🟡 · reviewer agreement & finding strength — not a PR-quality score

brrusino

Inline finding

codecov-commenter · 2026-05-20T01:44:13Z

Codecov Report

❌ Patch coverage is 90.80460% with 16 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@085b7a5). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/conductor/skills/loader.py	86.04%	6 Missing ⚠️
src/conductor/skills/registry.py	86.84%	5 Missing ⚠️
src/conductor/executor/agent.py	89.47%	4 Missing ⚠️
src/conductor/config/schema.py	97.61%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #215   +/-   ##
=======================================
  Coverage        ?   88.38%           
=======================================
  Files           ?       66           
  Lines           ?    10722           
  Branches        ?        0           
=======================================
  Hits            ?     9477           
  Misses          ?     1245           
  Partials        ?        0

jrob5756

Approach discussion: should this be a general "skills" capability instead of a bespoke `conductor_expert`?

Before a line-by-line review, I want to raise the architectural framing, because I think there's a more extensible shape that also resolves a duplication problem in this PR. (Posting as a pending comment for discussion — not blocking yet.)

1. The Expert mechanism is sound, but the flagship example undersells it

Injecting Conductor knowledge into the instructions_preamble is the right call for the runtime gap #180 describes: agents running inside a workflow (e.g. the conductor watch #181 evaluator) are spawned by the provider with a system prompt and have no skill/plugin harness, so the existing Claude-Code skill genuinely can't reach them.

But the examples/conductor-expert.yaml reviewer is the one case where the existing skill is a real substitute — a human in an IDE with the conductor skill can already "review this workflow.yaml." The differentiator isn't the task, it's autonomous, in-workflow, no-human execution. A stronger example would be a self-contained generate→review→fix loop or a watch-style evaluator with no human in the loop.

2. The bundled `knowledge/` docs duplicate the skill docs — and have already diverged

src/conductor/expert/knowledge/{yaml-schema,authoring,execution}.md are copies of plugins/conductor/skills/conductor/references/*. In this very PR they're already out of sync:

doc	expert copy	skill copy	diverged?
execution.md	629	629	identical
authoring.md	910	1096	yes (skill edited here, bundle not)
yaml-schema.md	591	809	yes

This reintroduces the exact "stale / inconsistent" failure mode #180 set out to eliminate — on day one. Whatever we do, there should be a single source of truth, not two hand-maintained copies.

3. The more extensible framing: first-class "skills" in Conductor, with the conductor skill as one bundled opt-in skill

Rather than a bespoke conductor_expert boolean, generalize to a skills: capability (workflow- and agent-level, with per-agent enable/disable). The conductor skill then becomes one opt-in skill — which delivers the "Expert" outcome and eliminates the duplication (the skill dir is the single source). It also lets authors attach any skill (code-review, house style, domain rules), and composes with #135 / the Phase-3 MCP direction.

I verified the SDK reality, because it determines feasibility:

	GitHub Copilot SDK (primary provider)	Anthropic / Claude provider
Native skill support	Yes — `create_session(skill_directories=[...], disabled_skills=[...])`, confirmed in the installed `github-copilot-sdk` 0.3.0	Only via the code-execution tool + container + beta headers
Conductor's current path can use it?	Yes, trivially — slots into `session_kwargs` like `mcp_servers` already does	No — the Claude provider uses the raw Messages API with no container

So skill_directories is essentially free on Copilot (we could pass the existing plugins/conductor/skills/conductor/ dir natively). On Claude, native skills would require adopting Anthropic's server-side sandbox; the realistic path is to load the SKILL.md and inject it into the preamble — which is exactly the mechanism this PR already implements. So this PR's machinery becomes the Claude half of a general skills feature.
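
The split described here, native skills on Copilot and preamble injection on Claude, might look like the sketch below. `ProviderStub` and `plan_skill_delivery` are hypothetical names used only for illustration. The `skill_directories` key is the session argument mentioned above, and the `supports_native_skills` flag is an assumed capability switch, not something confirmed by either SDK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProviderStub:
    """Hypothetical stand-in for a provider; only the capability flag matters here."""

    name: str
    supports_native_skills: bool = False


def plan_skill_delivery(
    provider: ProviderStub,
    skill_dirs: list[str],
    skill_text: str,
    prompt: str,
) -> tuple[str, dict[str, list[str]]]:
    """Return (prompt to send, extra session kwargs) for one agent call."""
    if provider.supports_native_skills:
        # Native path: hand the directories to the SDK and leave the prompt alone.
        return prompt, {"skill_directories": list(skill_dirs)}
    # Fallback path: eager-inject the skill text ahead of the prompt.
    return f"{skill_text}\n\n{prompt}", {}


copilot = ProviderStub("copilot", supports_native_skills=True)
claude = ProviderStub("claude")
dirs = ["plugins/conductor/skills/conductor"]

copilot_plan = plan_skill_delivery(copilot, dirs, "SKILL TEXT", "Review this workflow.")
claude_plan = plan_skill_delivery(claude, dirs, "SKILL TEXT", "Review this workflow.")
print(copilot_plan)
print(claude_plan)
```

Both branches give the model access to the same skill, which is the observable contract; only the plumbing differs.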

4. Provider-parity is the real thing to design around

Conductor's hard parity rule assumes identical mechanisms; skills break that (native on Copilot, manual injection on Claude). That's reconcilable — parity is about observable contract, not internals (MCP is already wired per-provider). Same contract ("enabled agent's model sees the skill; opt-in/out behaves identically in YAML"), different plumbing. Three things to settle:

Claude token strategy: eager-inject the whole SKILL.md on opt-in (simple, matches what the Copilot CLI does) vs. progressive disclosure via a tool (matches Anthropic's intent + Phase 3). Eager is fine for Phase 1.
Executable skill resources: Copilot skills can bundle scripts; on Claude-without-container they can't run. Docs-only skills (like this one) are fine, but a general feature should declare/gate executable skills.
Trust & path resolution: loading author-specified dirs is a prompt-injection surface (consider an allowlist); skill paths should resolve relative to the workflow file, not cwd.
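
For the trust and path-resolution point, one possible shape is sketched below, assuming a hypothetical `resolve_skill_dir()` helper and an allowlist of root directories. Neither exists in the PR; user-defined skill directories are not part of it.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_skill_dir(workflow_file: Path, skill_path: str, allowed_roots: list[Path]) -> Path:
    """Resolve skill_path against the workflow file's directory, then check the allowlist."""
    candidate = (workflow_file.parent / skill_path).resolve()
    for root in allowed_roots:
        if candidate.is_relative_to(root.resolve()):  # Python 3.9+
            return candidate
    raise PermissionError(f"skill directory {candidate} is outside the allowed roots")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "flows").mkdir()
    (root / "skills" / "house-style").mkdir(parents=True)
    workflow = root / "flows" / "review.yaml"

    # Resolved relative to the workflow file, regardless of the current directory.
    resolved = resolve_skill_dir(workflow, "../skills/house-style", [root / "skills"])
    accepted = resolved == root / "skills" / "house-style"

    try:
        # Escapes the allowlisted root, so it should be refused.
        resolve_skill_dir(workflow, "../..", [root / "skills"])
        rejected = False
    except PermissionError:
        rejected = True

print(accepted, rejected)
```

Resolving before the allowlist check matters: a check on the unresolved string would let `..` segments slip past it.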

Suggestion

Keep the tri-state opt-in and the preamble-injection machinery you've already built (it's good and it's the Claude half), but reframe conductor_expert → a generalized skills capability, make the conductor skill a bundled opt-in skill (no copied docs), and use native skill_directories on the Copilot path. Happy to sketch the concrete schema/provider design if useful.

Replaces the bespoke 'conductor_expert' opt-in knowledge base (microsoft#180) with a generalized 'skills' capability per Jason's review feedback on PR microsoft#215. Eliminates the duplicated docs in src/conductor/expert/ that were already diverging from the canonical files under plugins/conductor/skills/conductor/, and pivots Copilot onto the SDK's native skill_directories so it benefits from progressive disclosure.

Schema changes:
- AgentDef.conductor_expert: bool | None -> skills: list[str] | None (tri-state via list presence: omit = inherit, [] = opt-out, [name] = set)
- RuntimeConfig.conductor_expert: bool -> skills: list[str] = []
- skills field rejected on script/workflow/human_gate/wait/set/terminate agent types (parity with the prior conductor_expert restrictions)
- Field validators reject unknown skill names at workflow-load time

New module conductor.skills:
- registry.py — resolves built-in skill names (currently just 'conductor') to on-disk directories. Probes both editable-install and wheel-install layouts.
- loader.py — reads SKILL.md + references/*.md for providers that lack native skill support, wraps in <skills><skill name='...'>...</skill></skills>. Cached per-directory.

Provider parity (same observable contract, different mechanism):
- AgentProvider.supports_native_skills (default False); execute() gains skill_directories kwarg.
- Copilot overrides supports_native_skills=True and forwards resolved directories on session_kwargs.skill_directories, so the SDK loads skills natively via SKILL.md frontmatter.
- Claude accepts (and ignores) skill_directories; the executor has already eager-injected skill content into the rendered prompt.

Executor wiring:
- AgentExecutor takes workflow_skills= (replaces conductor_expert_default)
- _resolve_skills_for_agent does the tri-state resolution
- _build_prompt_prefix branches on provider.supports_native_skills to decide whether to eager-inject
- execute() resolves skill dirs for native-skill providers and forwards them; TypeError fallback keeps legacy provider stubs working

Bundling:
- pyproject.toml uses hatchling force-include to ship plugins/conductor/skills/conductor/ alongside the conductor/ package in the wheel (registry probes both layouts)
- src/conductor/expert/ deleted entirely (the canonical docs already live under plugins/conductor/skills/conductor/)

Docs and example:
- AGENTS.md, plugins/conductor/skills/conductor/references/{yaml-schema, authoring}.md rewritten for the skills model
- examples/conductor-expert.yaml -> examples/skills-self-improving-workflow.yaml (generate -> review -> fix loop demonstrating the building blocks for the future 'conductor watch' work in microsoft#181)

Tests:
- tests/test_expert/ -> tests/test_skills/ with registry, loader, schema, and executor-integration coverage
- tests/test_providers/test_copilot_skills.py covers the end-to-end plumbing of skill_directories into create_session
- Integration assertions updated for the renamed RuntimeConfig field

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
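
The registry's "probe both layouts" behaviour from the commit above could be approximated as below. This is a sketch under assumptions: `find_skill_dir()`, the candidate-root list, and the directory names in the demo are hypothetical, and the real registry.py may locate its roots differently.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class SkillNotFoundError(LookupError):
    """Raised when no candidate layout contains the requested skill."""


def find_skill_dir(name: str, candidate_roots: list[Path]) -> Path:
    """Return the first <root>/<name> directory that contains a SKILL.md file."""
    for root in candidate_roots:
        skill_dir = root / name
        if (skill_dir / "SKILL.md").is_file():
            return skill_dir
    searched = ", ".join(str(root) for root in candidate_roots)
    raise SkillNotFoundError(f"built-in skill '{name}' not found under: {searched}")


with tempfile.TemporaryDirectory() as tmp:
    editable_root = Path(tmp) / "plugins"   # hypothetical source-checkout layout (absent here)
    wheel_root = Path(tmp) / "site-packages"  # hypothetical installed layout
    (wheel_root / "conductor").mkdir(parents=True)
    (wheel_root / "conductor" / "SKILL.md").write_text("# skill\n", encoding="utf-8")

    found = find_skill_dir("conductor", [editable_root, wheel_root])
    found_in_wheel = found == wheel_root / "conductor"

    try:
        find_skill_dir("unknown", [editable_root, wheel_root])
        unknown_rejected = False
    except SkillNotFoundError:
        unknown_rejected = True

print(found_in_wheel, unknown_rejected)
```

Raising a dedicated error when every layout misses gives schema validators something specific to catch at workflow-load time.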

brrusino · 2026-06-01T22:00:53Z

Agreed on all three — taking the full reframe in this PR rather than shipping the bespoke field and ripping it out later.

In scope

conductor_expert → skills: list[str] on AgentDef and RuntimeConfig. Tri-state via list presence: omitted = inherit, [] = explicit none, [conductor] = explicit set.
New built-in skill registry in src/conductor/skills/ resolves conductor to the existing plugins/conductor/skills/conductor/ dir, bundled via pyproject.toml artifacts. src/conductor/expert/knowledge/ gone — divergence solved.
Native skill_directories on the Copilot path, passed on create_session alongside mcp_servers.
Eager preamble injection stays as the Claude half.
Better example: replace examples/conductor-expert.yaml with examples/skills-self-improving-workflow.yaml — self-contained generate → review → fix loop, no human in the loop. Directly demonstrates the autonomous in-workflow value (and the enabler for the conductor watch work in #181) that the current reviewer example undersells.
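
The list-presence tri-state in the first item can be sketched as follows. The function name echoes the executor helper this change introduces, but the signature here is an assumption.

```python
from __future__ import annotations


def resolve_skills_for_agent(
    agent_skills: list[str] | None,
    workflow_skills: list[str],
) -> list[str]:
    """Resolve an agent's skills against the workflow-wide default.

    None (field omitted) -> inherit the workflow default
    []                   -> explicit opt-out, even if the workflow sets skills
    [name, ...]          -> explicit set, replacing the workflow default
    """
    if agent_skills is None:
        return list(workflow_skills)
    return list(agent_skills)


workflow_default = ["conductor"]
inherited = resolve_skills_for_agent(None, workflow_default)
opted_out = resolve_skills_for_agent([], workflow_default)
explicit = resolve_skills_for_agent(["conductor"], [])
print(inherited, opted_out, explicit)  # ['conductor'] [] ['conductor']
```

The check is `is None` rather than truthiness on purpose: `agent_skills or workflow_skills` would treat the empty list as "inherit" and silently break the opt-out.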

Provider-parity nuance

The Copilot SDK has two skill modes: session-level skill_directories (discoverable, progressive disclosure) and CustomAgentConfig.skills (eager preload). Our Copilot path uses the default agent, not CustomAgentConfig, so the lowest-risk move is skill_directories only.

Parity contract becomes "the agent has access to the named skill" — same as MCP today, mechanism differs by provider. I think this is the right Phase 1 contract. Happy to switch to CustomAgentConfig for stricter eager-load parity if you'd prefer, but it's a bigger refactor.

Deferred to follow-ups

User-defined skill directories (path-based registration + trust/allowlist)
Executable skill resources
Progressive disclosure via MCP (Phase 3 from #180)
CustomAgentConfig-based eager preload on Copilot for stricter parity

If anything here is off conceptually, happy to revisit.

Adds a generalized skills capability so provider-backed agents can opt into bundled, reusable knowledge or capabilities via a single named list. Phase 1 ships one built-in skill — `conductor` — sourced from plugins/conductor/skills/conductor/ (the same canonical content the Copilot CLI plugin consumes; no duplication).

New code
- src/conductor/skills/ — registry (built-in name → directory), loader (SKILL.md + references/*.md → eager preamble), public surface
- examples/skills-self-improving-workflow.yaml — end-to-end example
- tests/test_skills/ — 51 tests covering registry, loader, schema field validation, and executor integration on both provider variants

Schema
- AgentDef.skills: list[str] | None — tri-state per-agent opt-in (omitted = inherit; [] = explicit none; [name, …] = explicit set). Forbidden on script / human_gate / workflow / wait / set / terminate.
- RuntimeConfig.skills: list[str] — workflow-wide default for every provider-backed agent.
- field_validator on both sites resolves every name through the registry so unknown skills fail at `conductor validate` time.

Provider parity (the user-facing contract is identical — "the agent has access to the named skill" — but the mechanism differs):
- Copilot: supports_native_skills=True. Executor forwards resolved directories on execute(skill_directories=…); the provider passes them through to session_kwargs["skill_directories"] for native progressive disclosure via SKILL.md frontmatter.
- Claude: supports_native_skills=False (default). Executor reads SKILL.md + references/*.md and eager-injects them into the rendered prompt inside <skills><skill name="…">…</skill></skills>. The provider accepts the skill_directories kwarg for signature parity and immediately discards it (documented in the docstring).
- Claude Agent SDK: same eager-inject path as Claude — the upstream claude-agent-sdk surfaces no skill kwarg today. If/when it does, flip supports_native_skills to True.

ProviderCapabilities
- Adds a skills: bool field to the descriptor (defaults to False so unaudited providers fail validation loudly rather than silently).
- All three providers declare skills=True.
- declared_limitations() lists "no skills support" for providers that opt out.

Wheel packaging
- pyproject.toml force-includes plugins/conductor/skills/conductor under the conductor namespace so the bundled skill ships with the installed wheel, not just the source checkout. The registry probes both layouts (editable install + wheel install) before falling back to a deterministic SkillNotFoundError.

Closes microsoft#180 (Phase 1).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
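
The eager-inject path for providers without native skill support might be rendered roughly like this. It is a sketch, not the loader shipped in the commit above: `render_skill()` and `render_skills()` are hypothetical names, and the ordering of reference files (sorted by filename) is an assumption.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def render_skill(name: str, skill_dir: Path) -> str:
    """Concatenate SKILL.md and references/*.md into one <skill> element."""
    parts = [(skill_dir / "SKILL.md").read_text(encoding="utf-8").strip()]
    references = skill_dir / "references"
    if references.is_dir():
        # Sorted so the rendered prompt is stable across filesystems.
        for ref in sorted(references.glob("*.md")):
            parts.append(ref.read_text(encoding="utf-8").strip())
    body = "\n\n".join(parts)
    # Names come from the built-in registry, so no attribute escaping is done here.
    return f'<skill name="{name}">\n{body}\n</skill>'


def render_skills(skills: dict[str, Path]) -> str:
    """Wrap every resolved skill in a single <skills> block; empty input yields ''."""
    if not skills:
        return ""
    inner = "\n".join(render_skill(name, path) for name, path in skills.items())
    return f"<skills>\n{inner}\n</skills>"


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "conductor"
    (skill_dir / "references").mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text("# Conductor skill\n", encoding="utf-8")
    (skill_dir / "references" / "b.md").write_text("B ref\n", encoding="utf-8")
    (skill_dir / "references" / "a.md").write_text("A ref\n", encoding="utf-8")
    rendered = render_skills({"conductor": skill_dir})

print(rendered)
```

Returning an empty string for an agent with no skills lets the caller prepend the result unconditionally.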

brrusino · 2026-06-08T20:54:54Z

@jrob5756 anything else you'd want to see with this one?

brrusino commented May 20, 2026

Comment thread src/conductor/expert/knowledge/yaml-schema.md Outdated

jrob5756 reviewed Jun 1, 2026

brrusino requested a review from jrob5756 June 1, 2026 22:32

jrob5756 force-pushed the feature/conductor-expert branch from 1462b6b to 42a49ee Compare June 5, 2026 13:29

fix(types): satisfy ty checks for _MockProvider and dialog_evaluator

7d0b668

- Add skill_directories param to _MockProvider.execute for LSP compliance
- Assert agent.dialog is not None in _run_evaluator (guarded by caller)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

brrusino wants to merge 2 commits into microsoft:main from brrusino:feature/conductor-expert
