Skip to content

feat: Conductor Expert — opt-in knowledge base for Conductor-aware agents (#180)#215

Open
brrusino wants to merge 2 commits into
microsoft:mainfrom
brrusino:feature/conductor-expert
Open

feat: Conductor Expert — opt-in knowledge base for Conductor-aware agents (#180)#215
brrusino wants to merge 2 commits into
microsoft:mainfrom
brrusino:feature/conductor-expert

Conversation

@brrusino

Copy link
Copy Markdown
Contributor

Summary

Implements Phase 1 of issue #180: bundles the existing plugin reference docs as package data and wires them into the instructions pipeline with an opt-in flag. This gives agents deep understanding of Conductor's YAML schema, execution model, authoring patterns, and CLI commands.

Changes

New: src/conductor/expert/ package

  • loader.py — loads and caches (~70KB) bundled reference docs via importlib.resources + lru_cache, wraps in <conductor_knowledge> tags
  • knowledge/ — bundled markdown reference docs (yaml-schema.md, authoring.md, execution.md)

Schema (config/schema.py)

  • AgentDef.conductor_expert: bool | None — tri-state: None = inherit workflow default, True = force enable, False = force disable. Forbidden on script/workflow/human_gate agents.
  • RuntimeConfig.conductor_expert: bool — workflow-wide default (False)

Executor (executor/agent.py)

  • New _build_prompt_prefix() helper shared by execute() and render_prompt()
  • New _should_inject_expert() — resolves tri-state agent flag vs workflow default

Engine (engine/workflow.py)

  • Passes conductor_expert_default to AgentExecutor in both single-provider and multi-provider paths

Tests & Docs

  • 25 new tests in tests/test_expert/ covering loader, schema validation, and executor integration
  • Updated AGENTS.md with expert package documentation
  • Added examples/conductor-expert.yaml example workflow

Closes #180 (Phase 1)

@brrusino

Copy link
Copy Markdown
Contributor Author

🤖 Multi-Agent PR Review #1

PR: #1 — feat: Conductor Expert — opt-in knowledge base for Conductor-aware agents (#180)
Merge readiness: ⚠️ Needs minor fixes
Review confidence: 82/100 🟡 · reviewer agreement & finding strength — not a PR-quality score
Models: claude-opus-4.7 (Security & Correctness), gpt-5.5 (Robustness & Completeness)
Deliberation: 1 round, consensus reached


📋 Summary

Clean, well-scoped Phase 1 of the Conductor Expert feature. The tri-state opt-in resolution, shared prompt-prefix builder, and schema validation rejecting the flag on non-provider-backed agents are all implemented correctly, and test coverage hits the meaningful seams (loader caching, tag wrapping, tri-state, ordering, validator rejections). No security or correctness blockers — the knowledge content is static package data with no user-controlled paths.

The one item worth addressing before merge is that the bundled knowledge docs the feature is supposed to teach agents from are now stale: they don't document the new runtime.conductor_expert field or the per-agent override, so expert-enabled agents could flag valid workflows or omit the flag when generating examples. Remaining items (packaging robustness for the bundled .md files, error handling in the loader) are weak-consensus polish — non-blocking.

🔍 Consensus Findings

Severity File Finding Recommendation
🟡 warning src/conductor/expert/knowledge/yaml-schema.md:29-38, 150-154 Bundled knowledge docs don't document the new workflow.runtime.conductor_expert field or per-agent tri-state override. Expert-enabled agents may flag valid workflows as having unknown fields or omit the flag when generating examples. Update yaml-schema.md and authoring.md (and execution.md if it covers prompt construction) to cover the runtime default, per-agent tri-state semantics, prompt ordering, and the provider-backed-only restriction.
🔵 suggestion src/conductor/expert/loader.py:2508-2517 load_expert_knowledge() calls resource.read_text(...) with no error handling. A missing/corrupted doc surfaces as a raw FileNotFoundError/UnicodeDecodeError from inside agent execution, and because lru_cache doesn't cache exceptions, every expert-enabled agent will repeatedly re-attempt the failing read. Wrap reads in try/except for FileNotFoundError/OSError/UnicodeDecodeError and raise a project-specific error with reinstall guidance.
🔵 suggestion pyproject.toml:64-68 src/conductor/expert/knowledge/*.md is loaded via importlib.resources. Hatchling defaults include the files today, but no test exercises the loader against an installed wheel — a future build-config change could silently drop the docs. Declare an explicit Hatchling artifacts rule for the knowledge .md files, or add a wheel-install smoke test.
🔵 suggestion src/conductor/config/schema.py:159-178 Both reviewers converged: strict rejection of conductor_expert (including false) on script/human_gate/workflow agent types is the intended design — surfaces author confusion early and matches reasoning/retry handling. No change required.
💡 Suggested fix for R2-001 (stale knowledge docs)
     max_agent_iterations: integer   # Max tool-use roundtrips per agent (1-500, optional)
     max_session_seconds: float      # Wall-clock timeout per agent session in seconds (optional)
     default_reasoning_effort: string # Workflow-wide reasoning/thinking effort: low, medium, high, xhigh (optional)
+    conductor_expert: boolean       # Inject bundled Conductor knowledge into provider-backed agents (default: false)
     mcp_servers:                    # MCP server configurations
💡 Suggested fix for R1-002 (loader error handling)
-        text = resource.read_text(encoding="utf-8").strip()
+        try:
+            text = resource.read_text(encoding="utf-8").strip()
+        except (FileNotFoundError, OSError, UnicodeDecodeError) as e:
+            raise RuntimeError(
+                f"Conductor Expert knowledge file '{name}' is missing or unreadable. "
+                "This usually indicates a broken install; try reinstalling conductor."
+            ) from e
💡 Suggested fix for R1-001 (explicit packaging artifacts)
 [tool.hatch.build.targets.wheel]
 packages = ["src/conductor"]
+artifacts = ["src/conductor/expert/knowledge/*.md"]
 exclude = [
     "src/conductor/web/frontend",
 ]

🔎 Unique / Disputed Findings

None — all findings reached consensus in round 1.

📊 Model Comparison

Aspect Reviewer 1 (claude-opus-4.7) Reviewer 2 (gpt-5.5)
Merge Readiness ✅ Ready to merge ⚠️ Needs minor fixes
Critical findings 0 0
Warnings 0 1
Suggestions 3 0
Total findings 3 1

🏁 Verdict

Reviewers agree on the substance: no blockers, one warning worth addressing (stale bundled knowledge docs), and a few polish suggestions. Updating the bundled yaml-schema.md / authoring.md to document the new conductor_expert field is the only change that meaningfully affects the feature's value, since expert-enabled agents will be reading those docs.

Merge readiness: ⚠️ Needs minor fixes — update bundled knowledge docs to cover the new conductor_expert field before merge; loader error handling and explicit packaging artifacts are non-blocking polish.
Review confidence: 82/100 🟡 · reviewer agreement & finding strength — not a PR-quality score

@brrusino brrusino left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline finding

Comment thread src/conductor/expert/knowledge/yaml-schema.md Outdated
@codecov-commenter

codecov-commenter commented May 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 90.80460% with 16 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@085b7a5). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/conductor/skills/loader.py 86.04% 6 Missing ⚠️
src/conductor/skills/registry.py 86.84% 5 Missing ⚠️
src/conductor/executor/agent.py 89.47% 4 Missing ⚠️
src/conductor/config/schema.py 97.61% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #215   +/-   ##
=======================================
  Coverage        ?   88.38%           
=======================================
  Files           ?       66           
  Lines           ?    10722           
  Branches        ?        0           
=======================================
  Hits            ?     9477           
  Misses          ?     1245           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jrob5756 jrob5756 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach discussion: should this be a general "skills" capability instead of a bespoke conductor_expert?

Before a line-by-line review, I want to raise the architectural framing, because I think there's a more extensible shape that also resolves a duplication problem in this PR. (Posting as a pending comment for discussion — not blocking yet.)

1. The Expert mechanism is sound, but the flagship example undersells it

Injecting Conductor knowledge into the instructions_preamble is the right call for the runtime gap #180 describes: agents running inside a workflow (e.g. the conductor watch #181 evaluator) are spawned by the provider with a system prompt and have no skill/plugin harness, so the existing Claude-Code skill genuinely can't reach them.

But the examples/conductor-expert.yaml reviewer is the one case where the existing skill is a real substitute — a human in an IDE with the conductor skill can already "review this workflow.yaml." The differentiator isn't the task, it's autonomous, in-workflow, no-human execution. A stronger example would be a self-contained generate→review→fix loop or a watch-style evaluator with no human in the loop.

2. The bundled knowledge/ docs duplicate the skill docs — and have already diverged

src/conductor/expert/knowledge/{yaml-schema,authoring,execution}.md are copies of plugins/conductor/skills/conductor/references/*. In this very PR they're already out of sync:

doc expert copy skill copy diverged?
execution.md 629 629 identical
authoring.md 910 1096 yes (skill edited here, bundle not)
yaml-schema.md 591 809 yes

This reintroduces the exact "stale / inconsistent" failure mode #180 set out to eliminate — on day one. Whatever we do, there should be a single source of truth, not two hand-maintained copies.

3. The more extensible framing: first-class "skills" in Conductor, with the conductor skill as one bundled opt-in skill

Rather than a bespoke conductor_expert boolean, generalize to a skills: capability (workflow- and agent-level, with per-agent enable/disable). The conductor skill then becomes one opt-in skill — which delivers the "Expert" outcome and eliminates the duplication (the skill dir is the single source). It also lets authors attach any skill (code-review, house style, domain rules), and composes with #135 / the Phase-3 MCP direction.

I verified the SDK reality, because it determines feasibility:

GitHub Copilot SDK (primary provider) Anthropic / Claude provider
Native skill support Yescreate_session(skill_directories=[...], disabled_skills=[...]), confirmed in the installed github-copilot-sdk 0.3.0 Only via the code-execution tool + container + beta headers
Conductor's current path can use it? Yes, trivially — slots into session_kwargs like mcp_servers already does No — the Claude provider uses the raw Messages API with no container

So skill_directories is essentially free on Copilot (we could pass the existing plugins/conductor/skills/conductor/ dir natively). On Claude, native skills would require adopting Anthropic's server-side sandbox; the realistic path is to load the SKILL.md and inject it into the preamble — which is exactly the mechanism this PR already implements. So this PR's machinery becomes the Claude half of a general skills feature.

4. Provider-parity is the real thing to design around

Conductor's hard parity rule assumes identical mechanisms; skills break that (native on Copilot, manual injection on Claude). That's reconcilable — parity is about observable contract, not internals (MCP is already wired per-provider). Same contract ("enabled agent's model sees the skill; opt-in/out behaves identically in YAML"), different plumbing. Three things to settle:

  • Claude token strategy: eager-inject the whole SKILL.md on opt-in (simple, matches what the Copilot CLI does) vs. progressive disclosure via a tool (matches Anthropic's intent + Phase 3). Eager is fine for Phase 1.
  • Executable skill resources: Copilot skills can bundle scripts; on Claude-without-container they can't run. Docs-only skills (like this one) are fine, but a general feature should declare/gate executable skills.
  • Trust & path resolution: loading author-specified dirs is a prompt-injection surface (consider an allowlist); skill paths should resolve relative to the workflow file, not cwd.

Suggestion

Keep the tri-state opt-in and the preamble-injection machinery you've already built (it's good and it's the Claude half), but reframe conductor_expert → a generalized skills capability, make the conductor skill a bundled opt-in skill (no copied docs), and use native skill_directories on the Copilot path. Happy to sketch the concrete schema/provider design if useful.

brrusino added a commit to brrusino/conductor that referenced this pull request Jun 1, 2026
Replaces the bespoke 'conductor_expert' opt-in knowledge base (microsoft#180)
with a generalized 'skills' capability per Jason's review feedback on
PR microsoft#215. Eliminates the duplicated docs in src/conductor/expert/ that
were already diverging from the canonical files under
plugins/conductor/skills/conductor/, and pivots Copilot onto the SDK's
native skill_directories so it benefits from progressive disclosure.

Schema changes:
- AgentDef.conductor_expert: bool | None  ->  skills: list[str] | None
  (tri-state via list presence: omit = inherit, [] = opt-out, [name] = set)
- RuntimeConfig.conductor_expert: bool    ->  skills: list[str] = []
- skills field rejected on script/workflow/human_gate/wait/set/terminate
  agent types (parity with the prior conductor_expert restrictions)
- Field validators reject unknown skill names at workflow-load time

New module conductor.skills:
- registry.py — resolves built-in skill names (currently just 'conductor')
  to on-disk directories. Probes both editable-install and wheel-install
  layouts.
- loader.py — reads SKILL.md + references/*.md for providers that lack
  native skill support, wraps in <skills><skill name='...'>...</skill>
  </skills>. Cached per-directory.

Provider parity (same observable contract, different mechanism):
- AgentProvider.supports_native_skills (default False); execute() gains
  skill_directories kwarg.
- Copilot overrides supports_native_skills=True and forwards resolved
  directories on session_kwargs.skill_directories, so the SDK loads
  skills natively via SKILL.md frontmatter.
- Claude accepts (and ignores) skill_directories; the executor has
  already eager-injected skill content into the rendered prompt.

Executor wiring:
- AgentExecutor takes workflow_skills= (replaces conductor_expert_default)
- _resolve_skills_for_agent does the tri-state resolution
- _build_prompt_prefix branches on provider.supports_native_skills to
  decide whether to eager-inject
- execute() resolves skill dirs for native-skill providers and forwards
  them; TypeError fallback keeps legacy provider stubs working

Bundling:
- pyproject.toml uses hatchling force-include to ship
  plugins/conductor/skills/conductor/ alongside the conductor/ package
  in the wheel (registry probes both layouts)
- src/conductor/expert/ deleted entirely (the canonical docs already
  live under plugins/conductor/skills/conductor/)

Docs and example:
- AGENTS.md, plugins/conductor/skills/conductor/references/{yaml-schema,
  authoring}.md rewritten for the skills model
- examples/conductor-expert.yaml -> examples/skills-self-improving-workflow.yaml
  (generate -> review -> fix loop demonstrating the building blocks
  for the future 'conductor watch' work in microsoft#181)

Tests:
- tests/test_expert/ -> tests/test_skills/ with registry, loader,
  schema, and executor-integration coverage
- tests/test_providers/test_copilot_skills.py covers the end-to-end
  plumbing of skill_directories into create_session
- Integration assertions updated for the renamed RuntimeConfig field

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@brrusino

brrusino commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Agreed on all three — taking the full reframe in this PR rather than shipping the bespoke field and ripping it out later.

In scope

  1. conductor_expertskills: list[str] on AgentDef and RuntimeConfig. Tri-state via list presence: omitted = inherit, [] = explicit none, [conductor] = explicit set.
  2. New built-in skill registry in src/conductor/skills/ resolves conductor to the existing plugins/conductor/skills/conductor/ dir, bundled via pyproject.toml artifacts. src/conductor/expert/knowledge/ gone — divergence solved.
  3. Native skill_directories on the Copilot path, passed on create_session alongside mcp_servers.
  4. Eager preamble injection stays as the Claude half.
  5. Better example: replace examples/conductor-expert.yaml with examples/skills-self-improving-workflow.yaml — self-contained generate → review → fix loop, no human in the loop. Directly demonstrates the autonomous in-workflow value (and the Feature: conductor watch — convergence primitive for iterative fix-validate loops #181 enabler) the current reviewer example undersells.

Provider-parity nuance

The Copilot SDK has two skill modes: session-level skill_directories (discoverable, progressive disclosure) and CustomAgentConfig.skills (eager preload). Our Copilot path uses the default agent, not CustomAgentConfig, so the lowest-risk move is skill_directories only.

Parity contract becomes "the agent has access to the named skill" — same as MCP today, mechanism differs by provider. I think this is the right Phase 1 contract. Happy to switch to CustomAgentConfig for stricter eager-load parity if you'd prefer, but it's a bigger refactor.

Deferred to follow-ups

If anything here is off conceptually, happy to revisit.

@brrusino brrusino requested a review from jrob5756 June 1, 2026 22:32
Adds a generalized skills capability so provider-backed agents can opt
into bundled, reusable knowledge or capabilities via a single named
list. Phase 1 ships one built-in skill — `conductor` — sourced from
plugins/conductor/skills/conductor/ (the same canonical content the
Copilot CLI plugin consumes; no duplication).

New code

- src/conductor/skills/ — registry (built-in name → directory),
  loader (SKILL.md + references/*.md → eager preamble), public surface
- examples/skills-self-improving-workflow.yaml — end-to-end example
- tests/test_skills/ — 51 tests covering registry, loader, schema
  field validation, and executor integration on both provider variants

Schema

- AgentDef.skills: list[str] | None — tri-state per-agent opt-in
  (omitted = inherit; [] = explicit none; [name, …] = explicit set).
  Forbidden on script / human_gate / workflow / wait / set / terminate.
- RuntimeConfig.skills: list[str] — workflow-wide default for every
  provider-backed agent.
- field_validator on both sites resolves every name through the
  registry so unknown skills fail at `conductor validate` time.

Provider parity (the user-facing contract is identical — "the agent
has access to the named skill" — but the mechanism differs):

- Copilot: supports_native_skills=True. Executor forwards resolved
  directories on execute(skill_directories=…); the provider passes
  them through to session_kwargs["skill_directories"] for native
  progressive disclosure via SKILL.md frontmatter.
- Claude: supports_native_skills=False (default). Executor reads
  SKILL.md + references/*.md and eager-injects them into the rendered
  prompt inside <skills><skill name="…">…</skill></skills>. The
  provider accepts the skill_directories kwarg for signature parity
  and immediately discards it (documented in the docstring).
- Claude Agent SDK: same eager-inject path as Claude — the upstream
  claude-agent-sdk surfaces no skill kwarg today. If/when it does,
  flip supports_native_skills to True.

ProviderCapabilities

- Adds a skills: bool field to the descriptor (defaults to False so
  unaudited providers fail validation loudly rather than silently).
- All three providers declare skills=True.
- declared_limitations() lists "no skills support" for providers
  that opt out.

Wheel packaging

- pyproject.toml force-includes plugins/conductor/skills/conductor
  under the conductor namespace so the bundled skill ships with the
  installed wheel, not just the source checkout. The registry probes
  both layouts (editable install + wheel install) before falling back
  to a deterministic SkillNotFoundError.

Closes microsoft#180 (Phase 1).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jrob5756 jrob5756 force-pushed the feature/conductor-expert branch from 1462b6b to 42a49ee Compare June 5, 2026 13:29
- Add skill_directories param to _MockProvider.execute for LSP compliance
- Assert agent.dialog is not None in _run_evaluator (guarded by caller)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@brrusino

brrusino commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

@jrob5756 anything else you'd want to see with this one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Conductor Expert — reusable, version-accurate knowledge base for Conductor-aware agents

3 participants