IBM VIBE is a conversation-centric testing suite for evaluating AI agents through repeatable runs, inspectable transcripts, and clear execution history.
Use it to script realistic agent conversations, run them against agent configurations, and inspect the resulting sessions, jobs, token usage, similarity scores, and failures.
- Test conversations, not just prompts: model the multi-turn exchanges your users actually have.
- Make failures inspectable: review sessions and transcripts instead of treating an agent run as a black box.
- Compare agent versions: iterate on prompts, tools, and LLM settings with consistent evaluation inputs.
- Keep evaluation local and reproducible: run the maintained TypeScript stack with SQLite-backed storage.
- Node.js 18.17+ (20 is recommended; see .nvmrc)
- npm
Optional:
- Python 3.10+, only if you are working on the legacy CrewAI service
- Ollama or another local LLM, only if your selected agent path requires it
npm install
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env.local
cp agent-service-api/.env.example agent-service-api/.env
npm run dev

This starts the three services used by the current conversation-first workflow:
| Service | Default URL | Role |
|---|---|---|
| Frontend | http://localhost:3000 | UI for conversations, sessions, and analysis |
| Backend | http://localhost:5000 | System API, storage, job orchestration |
| Agent Service API | http://localhost:5003 | External API executor and backend job poller |
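After `npm run dev`, it can be handy to confirm that all three services are listening before opening the UI. The sketch below is one way to do that, assuming the default URLs from the table and Node.js 18+ (for the global `fetch`). The `checkServices` helper and the `Probe` type are hypothetical names introduced for illustration; they are not part of VIBE, and the README does not document any health endpoints, so the probe only checks that each port answers at all.

```typescript
// Default URLs from the table above; adjust if your .env files override them.
const services = [
  { name: "Frontend", url: "http://localhost:3000" },
  { name: "Backend", url: "http://localhost:5000" },
  { name: "Agent Service API", url: "http://localhost:5003" },
];

// A probe reports whether a URL is reachable. It is injectable so that the
// check can be exercised without real network access.
type Probe = (url: string) => Promise<boolean>;

// Assumption: any HTTP response (even a 404) means the service is up;
// only a network error or a 2-second timeout counts as down.
const httpProbe: Probe = async (url) => {
  try {
    await fetch(url, { signal: AbortSignal.timeout(2000) });
    return true;
  } catch {
    return false;
  }
};

// Probes all services in parallel and returns a name -> reachable map.
async function checkServices(
  probe: Probe = httpProbe,
): Promise<Record<string, boolean>> {
  const entries = await Promise.all(
    services.map(async (s): Promise<[string, boolean]> => [s.name, await probe(s.url)]),
  );
  return Object.fromEntries(entries);
}
```

Calling `checkServices().then(console.table)` from a scratch script prints one row per service. A `false` entry usually means the service failed to start or a port was overridden in one of the `.env` files.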
Open http://localhost:3000, then follow the first-run path:
- Add or confirm an LLM configuration.
- Create or choose an agent.
- Create a conversation script.
- Use Quick execute to enqueue a run.
- Inspect the job and resulting session transcript.
For more detail, see docs/quickstart.md and docs/product-tour.md.
VIBE's preferred workflow is conversation-first:
- Configure the LLM/API connection you want to evaluate with.
- Choose or create the agent version that should handle the evaluation.
- Script one or more conversations with realistic user and assistant messages.
- Execute a conversation against an agent, which creates a queued job.
- Inspect the session transcript, intermediate outputs, token usage, timing, and scoring signals.
- Iterate on the agent configuration or conversation script and rerun.
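The workflow above can be sketched as a small in-memory model: a scripted conversation is enqueued as a job, the job replays each user message against an agent, and the result is a session transcript that pairs the scripted reply with the actual one. Every type and function name here (`Conversation`, `Job`, `Session`, `enqueue`, `runJob`) is hypothetical and chosen only to mirror the concepts in the list; VIBE's real schema and APIs may differ.

```typescript
// Hypothetical shapes that mirror the README's concepts, not VIBE's schema.
type Role = "user" | "assistant";
interface ScriptedMessage { role: Role; content: string }
interface Conversation { id: string; messages: ScriptedMessage[] }

type JobStatus = "queued" | "running" | "completed" | "failed";
interface Job { id: string; conversationId: string; agentId: string; status: JobStatus }

interface Turn { user: string; expected: string | undefined; actual: string }
interface Session { jobId: string; turns: Turn[]; error?: string }

// An agent receives the current user message plus the history so far.
type Agent = (userMessage: string, history: readonly ScriptedMessage[]) => Promise<string>;

let jobCounter = 0;

// Executing a conversation against an agent creates a queued job.
function enqueue(conversation: Conversation, agentId: string): Job {
  jobCounter += 1;
  return { id: `job-${jobCounter}`, conversationId: conversation.id, agentId, status: "queued" };
}

// Replays each scripted user message and records the agent's actual reply
// next to the scripted assistant reply (if the script has one).
async function runJob(job: Job, conversation: Conversation, agent: Agent): Promise<Session> {
  const session: Session = { jobId: job.id, turns: [] };
  const history: ScriptedMessage[] = [];
  job.status = "running";
  try {
    for (let i = 0; i < conversation.messages.length; i++) {
      const msg = conversation.messages[i];
      if (!msg || msg.role !== "user") continue;
      const next = conversation.messages[i + 1];
      const expected = next?.role === "assistant" ? next.content : undefined;
      const actual = await agent(msg.content, history);
      history.push(msg, { role: "assistant", content: actual });
      session.turns.push({ user: msg.content, expected, actual });
    }
    job.status = "completed";
  } catch (err) {
    // Keep the partial transcript so the failure stays inspectable.
    job.status = "failed";
    session.error = err instanceof Error ? err.message : String(err);
  }
  return session;
}
```

Keeping the expected and actual replies side by side in each turn is what makes a rerun comparable: the script stays fixed while the agent configuration changes between runs.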
Legacy test and suite flows still exist for compatibility, but new work should prefer conversations, sessions, and jobs.
The repository is an npm workspace monorepo:
| Workspace | Technology | Purpose |
|---|---|---|
| frontend | Next.js, TypeScript, Carbon | Web UI for evaluation workflows |
| backend | Express, TypeScript, SQLite | API, persistence, job orchestration |
| agent-service-api | Express, TypeScript | Polls backend jobs and executes external API agents |
| packages/* | TypeScript | Shared contracts, config, and utilities |
| agent-service | Python, FastAPI, CrewAI | Legacy CrewAI path; currently not the maintained stack |
Key integration path:
Frontend -> Backend -> Job queue -> Agent Service API -> Backend -> Sessions/transcripts
The Python agent-service is not started by npm run dev. Prefer backend + agent-service-api unless you are explicitly working on CrewAI integration.
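To make the integration path concrete, here is a minimal sketch of the polling pattern it describes: an executor claims a queued job from the backend, runs it, and reports the result back. The `BackendClient` interface, its `claimNextJob` and `submitResult` methods, and the `pollOnce` and `pollLoop` helpers are assumptions made for illustration. They are not the actual contract between agent-service-api and the backend, which this README does not specify.

```typescript
interface QueuedJob { id: string; prompt: string }
interface JobResult { jobId: string; ok: boolean; output: string }

// Hypothetical contract between the executor and the backend.
interface BackendClient {
  claimNextJob(): Promise<QueuedJob | null>;
  submitResult(result: JobResult): Promise<void>;
}

// Runs a job against an external API agent and returns its output.
type Executor = (job: QueuedJob) => Promise<string>;

// Claims at most one job, executes it, and reports success or failure.
// Returns false when the queue was empty.
async function pollOnce(backend: BackendClient, execute: Executor): Promise<boolean> {
  const job = await backend.claimNextJob();
  if (job === null) return false;

  let result: JobResult;
  try {
    result = { jobId: job.id, ok: true, output: await execute(job) };
  } catch (err) {
    // A failed execution is still reported so the backend can record it.
    const message = err instanceof Error ? err.message : String(err);
    result = { jobId: job.id, ok: false, output: message };
  }
  await backend.submitResult(result);
  return true;
}

// Polls until shouldStop() returns true, sleeping only when the queue is empty.
async function pollLoop(
  backend: BackendClient,
  execute: Executor,
  options: { idleDelayMs: number; shouldStop: () => boolean },
): Promise<void> {
  while (!options.shouldStop()) {
    const didWork = await pollOnce(backend, execute);
    if (!didWork) {
      await new Promise<void>((resolve) => setTimeout(resolve, options.idleDelayMs));
    }
  }
}
```

In this pattern the executor never needs an inbound route from the backend: it pulls work and pushes results, which is one plausible reason to run it as a separate service.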
Common commands from the repository root:
npm run dev
npm run format
npm run format:check
npm run lint
npm run typecheck
npm run test:ts

Each workspace also exposes its own lint, typecheck, and test scripts if you want to run a single service in isolation.
For multi-instance local setups, use env.instance1.example as a template and create your own env.instance* files locally. Instance env files are intentionally gitignored.
- docs/quickstart.md - first local run from a clean checkout
- docs/product-tour.md - how the main product concepts fit together
- docs/README.md - full documentation index
- CONTRIBUTING.md - contributor workflow
- SECURITY.md - private vulnerability reporting
This README would read better on GitHub with a small set of screenshots stored under docs/assets/:
- Dashboard screenshot with first-run guidance visible
- Conversation editor screenshot
- Quick execute screenshot
- Session transcript screenshot
Add those images to this README once captured from a representative local instance.
Apache-2.0. See LICENSE.