Skip to content

feat(cpp): gaia-bash — native C++ bash coding agent with TUI, API server, MCP server#985

Open
kovtcharov-amd wants to merge 10 commits into
mainfrom
kalin/gaia-bash-agent
Open

feat(cpp): gaia-bash — native C++ bash coding agent with TUI, API server, MCP server#985
kovtcharov-amd wants to merge 10 commits into
mainfrom
kalin/gaia-bash-agent

Conversation

@kovtcharov-amd
Copy link
Copy Markdown
Collaborator

Why this matters

Before: the GAIA C++ framework had an agent loop, LLM client, and tool registry — but no production CLI agent, no interactive TUI, no file I/O tools, no session persistence, and no way for external tools (Claude Code, OpenCode) to use GAIA agents.

After: gaia-bash is a fully functional native binary bash coding agent with five interfaces — interactive TUI, single-query CLI, pipe mode, REST API server, and MCP stdio server — plus a reusable C++ framework that any future agent can build on.

Verified: builds on Windows MSVC 2022, 431/435 tests pass (4 pre-existing WiFi test failures), MCP protocol tested end-to-end (tools/list, tools/call, prompts/list).

Threads

  • C++ framework upgrades (M1): ProcessRunner, FileIOTools, GitTools, ReplRunner (2-thread with Ctrl-C cancel), TuiConsole (FTXUI + markdown renderer), SessionStore, tool argument validation — all reusable by future C++ agents
  • gaia-bash agent (M2): BashAgent with bash_execute + env_inspect tools, bash-expert system prompt, CLI with argument parsing, slash commands (/run, /env)
  • Integration layer: REST API server (OpenAI-compatible /v1/chat/completions, /v1/tools) and MCP stdio server (JSON-RPC tools/list, tools/call, prompts/list) for Claude Code / OpenCode integration
  • Eval framework: 25 scenarios across 5 categories (script writing, review, tool usage, error handling, POSIX compliance) with ground truth and Python adapter

Test plan

  • cmake -B build && cmake --build build on Windows MSVC 2022 — compiles clean
  • tests_mock.exe — 431/435 pass (4 pre-existing WiFi failures)
  • gaia-bash --help — prints usage
  • echo '{"method":"initialize"}' | gaia-bash --mcp — MCP handshake works
  • echo '{"method":"tools/list"}' | gaia-bash --mcp — returns 10 tools with JSON Schema
  • echo '{"method":"tools/call","params":{"name":"bash_execute","arguments":{"command":"echo hello"}}}' | gaia-bash --mcp — executes command, returns stdout
  • Linux/macOS build (needs CI)
  • Interactive TUI mode (needs Lemonade Server + model)
  • API server /v1/chat/completions (needs Lemonade Server)
  • Eval scenario execution (needs Lemonade Server)

Ovtcharov added 4 commits May 6, 2026 11:27
…ls, REPL, TUI, sessions

Before: the C++ framework had an agent loop, LLM client, and tool registry but
lacked file I/O tools, process execution, interactive REPL, session persistence,
and a reactive TUI. Example agents used ad-hoc popen wrappers and blocking
getline loops.

After: six new reusable framework components that any C++ agent can plug into:
- ProcessRunner: cross-platform command execution with timeout, output capping
- FileIOTools: file_read, file_write, file_edit, file_search with security policies
- GitTools: read-only git status/diff/log/show with shell injection prevention
- SessionStore: JSON-based conversation persistence with save/load/resume
- ReplRunner: two-thread REPL with slash commands, Ctrl-C cancel, session auto-save
- TuiConsole: FTXUI-based reactive console with markdown rendering and streaming

Also adds: tool argument schema validation in ToolRegistry, agent cancel support
(requestCancel/isCancelled), history() accessor, FTXUI FetchContent in CMake.
…framework

Before: the C++ framework had reusable components (M1) but no production agent
binary. No way for external tools to interact with GAIA C++ agents.

After: complete gaia-bash coding agent with five interfaces:
- Interactive TUI (default): FTXUI fullscreen with markdown, streaming, slash cmds
- Single query: gaia-bash "write a backup script"
- REST API server (--serve): OpenAI-compatible /v1/chat/completions, /v1/tools
- MCP stdio server (--mcp): JSON-RPC for Claude Code / OpenCode integration
- Pipe mode (--print): stdout-friendly for CI/scripting

Agent tools: bash_execute (with shell detection), env_inspect, plus framework
tools (file_read/write/edit/search, git_status/diff/log/show).

Eval framework: 25 scenarios across 5 categories (script writing, review,
tool usage, error handling, POSIX compliance) with ground truth validation
and a Python adapter for the gaia eval harness.
… linking

Three build fixes found during first real MSVC compilation:

1. NOMINMAX: Windows min/max macros collide with std::min — define NOMINMAX
   before windows.h include in process.cpp.

2. Threaded pipe reading: the original sequential approach (read pipes then
   wait for process, or wait then read) either deadlocked on timeout tests
   or lost output on large-output tests. Fix: read stdout/stderr in
   std::thread workers concurrently with WaitForSingleObject.

3. FTXUI linking for tests: test_tui_console.cpp includes FTXUI headers but
   tests_mock only linked gaia_core (which has FTXUI as PRIVATE). Added
   explicit ftxui::component/dom/screen link to tests_mock when
   GAIA_BUILD_TUI is ON.

Result: 431/435 tests pass on Windows MSVC 2022. The 4 failures are
pre-existing WiFiToolsTest issues unrelated to this work.
The --serve and --mcp flags were stubs printing "not yet implemented".
Now they create real ApiServer and McpServer instances wired to a BashAgent.

MCP mode auto-allows all tool confirmations since the external agent
(Claude Code, OpenCode) handles safety decisions. Verified end-to-end:

  echo '{"jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"bash_execute",
    "arguments":{"command":"echo hello"}}}' | gaia-bash --mcp
  → {"stdout":"hello\n","exit_code":0}
@github-actions github-actions Bot added documentation Documentation changes cpp labels May 8, 2026
@itomek itomek assigned itomek and unassigned itomek May 8, 2026
@itomek itomek marked this pull request as ready for review May 8, 2026 21:27
The bash agent's system prompt and 10 tool descriptions need 32K context.
Without this, the first LLM call hit "context size exceeded" and had to retry.

- Set contextSize = 32768 in all three config creation points (interactive,
  serve, MCP modes) in main.cpp
- Add "bash" AgentProfile to AGENT_PROFILES in lemonade_client.py so
  gaia init knows the right context size for the bash agent
Ovtcharov added 3 commits May 20, 2026 15:56
1. bash_tools.cpp: output truncation now reserves space for the
   truncation message so total never exceeds MAX_OUTPUT_BYTES (32KB).

2. bash_eval_adapter.py: fixed success=True on HTTP errors (exception
   handlers now set success=False). Added missing validations for
   expected_tools, tool_args_must_contain, expect_error,
   expect_nonzero_exit, and expect_timeout ground truth fields.

3. bash_ground_truth.json: fixed bash-write-dedup expected_tools to
   include both file_write and bash_execute (matching the scenario).
WiFi tool tests were asserting handler-level error strings but the framework's
parameter validation now runs first, producing a different message format.
Updated tests to use HasSubstr("missing required parameter") matching.

FTXUI shared library: force FTXUI to build static even when BUILD_SHARED_LIBS=ON
since FTXUI doesn't export DLL symbols, causing LNK1181 on Windows.

Install test: disable TUI for the find_package round-trip since FetchContent'd
FTXUI targets can't be re-exported in the install tree.
@github-actions github-actions Bot added the devops DevOps/infrastructure changes label May 20, 2026
…bUI integration

gaia-bash needed a structured output mode for driving a TUI or WebUI frontend.
--json-events emits JSONL events to stdout (thought, goal, tool_call, answer, etc.)
so a parent process can render them. --query pairs with it for single-shot use.

- JsonEventOutputHandler: OutputHandler subclass that serializes agent events as
  one-JSON-object-per-line to an ostream (default stdout)
- structuredEvents config flag: emits parsed events even during streaming so the
  frontend gets both live tokens AND structured agent activity
- GTest::gmock added to test link (used by HasSubstr matchers in WiFi tool tests)
The `--json-events` answer event was missing token usage data, so the
TUI/WebUI had no visibility into how many tokens each query consumed.
Now the answer event includes a `usage` object with `prompt_tokens`,
`completion_tokens`, and `total_tokens` — accumulated across all LLM
calls in a multi-step query — so the frontend can render token
consumption directly from the event stream.

## Test plan
- [ ] `tests_mock --gtest_filter="JsonEventHandlerTest.*"` — all 23
tests pass (2 new: `FinalAnswerWithUsage`,
`FinalAnswerZeroUsageOmitted`)
- [ ] `gaia-bash.exe --json-events --query "what is 2+2?"` — verify
`answer` event includes `usage` when Lemonade returns it
- [ ] `gaia-bash.exe --json-events --query "hello"` — verify `usage` key
is omitted when server returns zero tokens (graceful degradation)

Closes #1205

Co-authored-by: Ovtcharov <kovtchar@amd.com>
Copy link
Copy Markdown
Collaborator

@itomek itomek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at a structural/triage level given the size and language (no C++ toolchain here to build). This is a well-isolated new subsystem: 47 of 52 files live under cpp/, it ships a design doc (docs/plans/bash-agent.mdx) and a CI workflow, has gtest coverage, and touches the existing Python package in only one place (lemonade_client.py, +7 lines). No collisions with the Python agents. Approving; deep line-level C++ review and a build/test run would be a good gate to add in CI before this becomes load-bearing.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cpp devops DevOps/infrastructure changes documentation Documentation changes llm LLM backend changes performance Performance-critical changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants