feat: Conductor Expert — opt-in knowledge base for Conductor-aware agents (#180)#215
feat: Conductor Expert — opt-in knowledge base for Conductor-aware agents (#180)#215brrusino wants to merge 2 commits into
Conversation
🤖 Multi-Agent PR Review #1PR: #1 — feat: Conductor Expert — opt-in knowledge base for Conductor-aware agents (#180) 📋 SummaryClean, well-scoped Phase 1 of the Conductor Expert feature. The tri-state opt-in resolution, shared prompt-prefix builder, and schema validation rejecting the flag on non-provider-backed agents are all implemented correctly, and test coverage hits the meaningful seams (loader caching, tag wrapping, tri-state, ordering, validator rejections). No security or correctness blockers — the knowledge content is static package data with no user-controlled paths. The one item worth addressing before merge is that the bundled knowledge docs the feature is supposed to teach agents from are now stale: they don't document the new 🔍 Consensus Findings
💡 Suggested fix for R2-001 (stale knowledge docs) max_agent_iterations: integer # Max tool-use roundtrips per agent (1-500, optional)
max_session_seconds: float # Wall-clock timeout per agent session in seconds (optional)
default_reasoning_effort: string # Workflow-wide reasoning/thinking effort: low, medium, high, xhigh (optional)
+ conductor_expert: boolean # Inject bundled Conductor knowledge into provider-backed agents (default: false)
mcp_servers: # MCP server configurations💡 Suggested fix for R1-002 (loader error handling)- text = resource.read_text(encoding="utf-8").strip()
+ try:
+ text = resource.read_text(encoding="utf-8").strip()
+ except (FileNotFoundError, OSError, UnicodeDecodeError) as e:
+ raise RuntimeError(
+ f"Conductor Expert knowledge file '{name}' is missing or unreadable. "
+ "This usually indicates a broken install; try reinstalling conductor."
+ ) from e💡 Suggested fix for R1-001 (explicit packaging artifacts) [tool.hatch.build.targets.wheel]
packages = ["src/conductor"]
+artifacts = ["src/conductor/expert/knowledge/*.md"]
exclude = [
"src/conductor/web/frontend",
]🔎 Unique / Disputed FindingsNone — all findings reached consensus in round 1. 📊 Model Comparison
🏁 VerdictReviewers agree on the substance: no blockers, one warning worth addressing (stale bundled knowledge docs), and a few polish suggestions. Updating the bundled Merge readiness: |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #215 +/- ##
=======================================
Coverage ? 88.38%
=======================================
Files ? 66
Lines ? 10722
Branches ? 0
=======================================
Hits ? 9477
Misses ? 1245
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
jrob5756
left a comment
There was a problem hiding this comment.
Approach discussion: should this be a general "skills" capability instead of a bespoke conductor_expert?
Before a line-by-line review, I want to raise the architectural framing, because I think there's a more extensible shape that also resolves a duplication problem in this PR. (Posting as a pending comment for discussion — not blocking yet.)
1. The Expert mechanism is sound, but the flagship example undersells it
Injecting Conductor knowledge into the instructions_preamble is the right call for the runtime gap #180 describes: agents running inside a workflow (e.g. the conductor watch #181 evaluator) are spawned by the provider with a system prompt and have no skill/plugin harness, so the existing Claude-Code skill genuinely can't reach them.
But the examples/conductor-expert.yaml reviewer is the one case where the existing skill is a real substitute — a human in an IDE with the conductor skill can already "review this workflow.yaml." The differentiator isn't the task, it's autonomous, in-workflow, no-human execution. A stronger example would be a self-contained generate→review→fix loop or a watch-style evaluator with no human in the loop.
2. The bundled knowledge/ docs duplicate the skill docs — and have already diverged
src/conductor/expert/knowledge/{yaml-schema,authoring,execution}.md are copies of plugins/conductor/skills/conductor/references/*. In this very PR they're already out of sync:
| doc | expert copy | skill copy | diverged? |
|---|---|---|---|
| execution.md | 629 | 629 | identical |
| authoring.md | 910 | 1096 | yes (skill edited here, bundle not) |
| yaml-schema.md | 591 | 809 | yes |
This reintroduces the exact "stale / inconsistent" failure mode #180 set out to eliminate — on day one. Whatever we do, there should be a single source of truth, not two hand-maintained copies.
3. The more extensible framing: first-class "skills" in Conductor, with the conductor skill as one bundled opt-in skill
Rather than a bespoke conductor_expert boolean, generalize to a skills: capability (workflow- and agent-level, with per-agent enable/disable). The conductor skill then becomes one opt-in skill — which delivers the "Expert" outcome and eliminates the duplication (the skill dir is the single source). It also lets authors attach any skill (code-review, house style, domain rules), and composes with #135 / the Phase-3 MCP direction.
I verified the SDK reality, because it determines feasibility:
| GitHub Copilot SDK (primary provider) | Anthropic / Claude provider | |
|---|---|---|
| Native skill support | Yes — create_session(skill_directories=[...], disabled_skills=[...]), confirmed in the installed github-copilot-sdk 0.3.0 |
Only via the code-execution tool + container + beta headers |
| Conductor's current path can use it? | Yes, trivially — slots into session_kwargs like mcp_servers already does |
No — the Claude provider uses the raw Messages API with no container |
So skill_directories is essentially free on Copilot (we could pass the existing plugins/conductor/skills/conductor/ dir natively). On Claude, native skills would require adopting Anthropic's server-side sandbox; the realistic path is to load the SKILL.md and inject it into the preamble — which is exactly the mechanism this PR already implements. So this PR's machinery becomes the Claude half of a general skills feature.
4. Provider-parity is the real thing to design around
Conductor's hard parity rule assumes identical mechanisms; skills break that (native on Copilot, manual injection on Claude). That's reconcilable — parity is about observable contract, not internals (MCP is already wired per-provider). Same contract ("enabled agent's model sees the skill; opt-in/out behaves identically in YAML"), different plumbing. Three things to settle:
- Claude token strategy: eager-inject the whole
SKILL.mdon opt-in (simple, matches what the Copilot CLI does) vs. progressive disclosure via a tool (matches Anthropic's intent + Phase 3). Eager is fine for Phase 1. - Executable skill resources: Copilot skills can bundle scripts; on Claude-without-container they can't run. Docs-only skills (like this one) are fine, but a general feature should declare/gate executable skills.
- Trust & path resolution: loading author-specified dirs is a prompt-injection surface (consider an allowlist); skill paths should resolve relative to the workflow file, not cwd.
Suggestion
Keep the tri-state opt-in and the preamble-injection machinery you've already built (it's good and it's the Claude half), but reframe conductor_expert → a generalized skills capability, make the conductor skill a bundled opt-in skill (no copied docs), and use native skill_directories on the Copilot path. Happy to sketch the concrete schema/provider design if useful.
Replaces the bespoke 'conductor_expert' opt-in knowledge base (microsoft#180) with a generalized 'skills' capability per Jason's review feedback on PR microsoft#215. Eliminates the duplicated docs in src/conductor/expert/ that were already diverging from the canonical files under plugins/conductor/skills/conductor/, and pivots Copilot onto the SDK's native skill_directories so it benefits from progressive disclosure. Schema changes: - AgentDef.conductor_expert: bool | None -> skills: list[str] | None (tri-state via list presence: omit = inherit, [] = opt-out, [name] = set) - RuntimeConfig.conductor_expert: bool -> skills: list[str] = [] - skills field rejected on script/workflow/human_gate/wait/set/terminate agent types (parity with the prior conductor_expert restrictions) - Field validators reject unknown skill names at workflow-load time New module conductor.skills: - registry.py — resolves built-in skill names (currently just 'conductor') to on-disk directories. Probes both editable-install and wheel-install layouts. - loader.py — reads SKILL.md + references/*.md for providers that lack native skill support, wraps in <skills><skill name='...'>...</skill> </skills>. Cached per-directory. Provider parity (same observable contract, different mechanism): - AgentProvider.supports_native_skills (default False); execute() gains skill_directories kwarg. - Copilot overrides supports_native_skills=True and forwards resolved directories on session_kwargs.skill_directories, so the SDK loads skills natively via SKILL.md frontmatter. - Claude accepts (and ignores) skill_directories; the executor has already eager-injected skill content into the rendered prompt. Executor wiring: - AgentExecutor takes workflow_skills= (replaces conductor_expert_default) - _resolve_skills_for_agent does the tri-state resolution - _build_prompt_prefix branches on provider.supports_native_skills to decide whether to eager-inject - execute() resolves skill dirs for native-skill providers and forwards them; TypeError fallback keeps legacy provider stubs working Bundling: - pyproject.toml uses hatchling force-include to ship plugins/conductor/skills/conductor/ alongside the conductor/ package in the wheel (registry probes both layouts) - src/conductor/expert/ deleted entirely (the canonical docs already live under plugins/conductor/skills/conductor/) Docs and example: - AGENTS.md, plugins/conductor/skills/conductor/references/{yaml-schema, authoring}.md rewritten for the skills model - examples/conductor-expert.yaml -> examples/skills-self-improving-workflow.yaml (generate -> review -> fix loop demonstrating the building blocks for the future 'conductor watch' work in microsoft#181) Tests: - tests/test_expert/ -> tests/test_skills/ with registry, loader, schema, and executor-integration coverage - tests/test_providers/test_copilot_skills.py covers the end-to-end plumbing of skill_directories into create_session - Integration assertions updated for the renamed RuntimeConfig field Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Agreed on all three — taking the full reframe in this PR rather than shipping the bespoke field and ripping it out later. In scope
Provider-parity nuanceThe Copilot SDK has two skill modes: session-level Parity contract becomes "the agent has access to the named skill" — same as MCP today, mechanism differs by provider. I think this is the right Phase 1 contract. Happy to switch to Deferred to follow-ups
If anything here is off conceptually, happy to revisit. |
Adds a generalized skills capability so provider-backed agents can opt into bundled, reusable knowledge or capabilities via a single named list. Phase 1 ships one built-in skill — `conductor` — sourced from plugins/conductor/skills/conductor/ (the same canonical content the Copilot CLI plugin consumes; no duplication). New code - src/conductor/skills/ — registry (built-in name → directory), loader (SKILL.md + references/*.md → eager preamble), public surface - examples/skills-self-improving-workflow.yaml — end-to-end example - tests/test_skills/ — 51 tests covering registry, loader, schema field validation, and executor integration on both provider variants Schema - AgentDef.skills: list[str] | None — tri-state per-agent opt-in (omitted = inherit; [] = explicit none; [name, …] = explicit set). Forbidden on script / human_gate / workflow / wait / set / terminate. - RuntimeConfig.skills: list[str] — workflow-wide default for every provider-backed agent. - field_validator on both sites resolves every name through the registry so unknown skills fail at `conductor validate` time. Provider parity (the user-facing contract is identical — "the agent has access to the named skill" — but the mechanism differs): - Copilot: supports_native_skills=True. Executor forwards resolved directories on execute(skill_directories=…); the provider passes them through to session_kwargs["skill_directories"] for native progressive disclosure via SKILL.md frontmatter. - Claude: supports_native_skills=False (default). Executor reads SKILL.md + references/*.md and eager-injects them into the rendered prompt inside <skills><skill name="…">…</skill></skills>. The provider accepts the skill_directories kwarg for signature parity and immediately discards it (documented in the docstring). - Claude Agent SDK: same eager-inject path as Claude — the upstream claude-agent-sdk surfaces no skill kwarg today. If/when it does, flip supports_native_skills to True. ProviderCapabilities - Adds a skills: bool field to the descriptor (defaults to False so unaudited providers fail validation loudly rather than silently). - All three providers declare skills=True. - declared_limitations() lists "no skills support" for providers that opt out. Wheel packaging - pyproject.toml force-includes plugins/conductor/skills/conductor under the conductor namespace so the bundled skill ships with the installed wheel, not just the source checkout. The registry probes both layouts (editable install + wheel install) before falling back to a deterministic SkillNotFoundError. Closes microsoft#180 (Phase 1). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1462b6b to
42a49ee
Compare
- Add skill_directories param to _MockProvider.execute for LSP compliance - Assert agent.dialog is not None in _run_evaluator (guarded by caller) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@jrob5756 anything else you'd want to see with this one? |
Summary
Implements Phase 1 of issue #180: bundles the existing plugin reference docs as package data and wires them into the instructions pipeline with an opt-in flag. This gives agents deep understanding of Conductor's YAML schema, execution model, authoring patterns, and CLI commands.
Changes
New:
src/conductor/expert/packageloader.py— loads and caches (~70KB) bundled reference docs viaimportlib.resources+lru_cache, wraps in<conductor_knowledge>tagsknowledge/— bundled markdown reference docs (yaml-schema.md, authoring.md, execution.md)Schema (
config/schema.py)AgentDef.conductor_expert: bool | None— tri-state: None = inherit workflow default, True = force enable, False = force disable. Forbidden on script/workflow/human_gate agents.RuntimeConfig.conductor_expert: bool— workflow-wide default (False)Executor (
executor/agent.py)_build_prompt_prefix()helper shared byexecute()andrender_prompt()_should_inject_expert()— resolves tri-state agent flag vs workflow defaultEngine (
engine/workflow.py)conductor_expert_defaulttoAgentExecutorin both single-provider and multi-provider pathsTests & Docs
tests/test_expert/covering loader, schema validation, and executor integrationAGENTS.mdwith expert package documentationexamples/conductor-expert.yamlexample workflowCloses #180 (Phase 1)