Merged
2 changes: 2 additions & 0 deletions .agents/verification.md
@@ -6,6 +6,8 @@ This file expands [AGENTS.md](../AGENTS.md) for testing, manual UAT, CLI and bro

- GitHub Actions is the authoritative merge gate.
- The `CI` workflow runs build, typecheck, lint, tests, marketplace checks, docs link checks, and eval schema validation on pushes to `main`, pull requests to `main`, and manual dispatches.
- The CI build job publishes a short-lived, commit-addressed build artifact after `bun run build`. It is a reuse aid for workers and workflows only when the manifest's commit SHA, `bun.lock` hash, runner OS/architecture, Bun version source/value, and included output paths match the consuming checkout.
- The build artifact is intentionally limited to compiled outputs such as `packages/core/dist/**`, `packages/sdk/dist/**`, `apps/cli/dist/**`, `apps/dashboard/dist/**`, plus its manifest. It must not contain `node_modules`, Bun caches, `.turbo`, `.cache`, `.tsbuildinfo`, tracker state, evidence, or generated runtime artifacts.
- Run the same core checks locally when you need fast feedback:

8 changes: 8 additions & 0 deletions .agents/workflow.md
@@ -45,6 +45,14 @@ bd where
- If you discover you are on a stale base or have uncoordinated dirty files, stop and fix that before changing code.
- Whenever you `git checkout`, `gh pr checkout`, `git pull`, or otherwise switch to a ref that may have changed `package.json` or `bun.lock`, run `bun install` before building or testing.

## Research, Build, and Artifact Reuse

- Research-only workers should inspect files and repository history without running `bun install`, `bun run build`, tests, or evals unless the assigned research explicitly depends on that command. Record the justification when a research worker does run one of those commands.
- Do not copy mutable build outputs from `main` or another worktree as the default way to avoid local builds. Prefer the commit-addressed CI build artifact when one is available for the exact source state.
- A prebuilt CI artifact may be reused only when its manifest matches the consumer's commit SHA, `bun.lock` SHA-256, runner OS/architecture, expected Bun version source/value, and required output paths.
- Implementation workers must rebuild locally for every package whose source they changed. A prebuilt artifact is only a reuse input for untouched packages at the same commit, never a substitute for rebuilding touched source.
- CI build artifacts must contain only the compiled AgentV outputs and manifest. Do not include `node_modules`, Bun cache directories, `.turbo`, `.cache`, `.tsbuildinfo`, local evidence, tracker state, or generated runtime artifacts.
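
The reuse rule above can be sketched as a consumer-side check. This is a minimal illustration, not part of the change: the `BuildManifest` fields mirror the manifest written by the CI `Stage build artifact` step, while `CheckoutState`, `findManifestMismatches`, and the way checkout facts are gathered are hypothetical names and assumptions introduced here.

```typescript
// Wire format (snake_case), mirroring the manifest.json written by CI.
interface BuildManifest {
  commit_sha: string;
  bun_lock_sha256: string;
  runner_os: string;
  runner_arch: string;
  bun_version_source: string;
  bun_version_spec: string | null;
  bun_version: string;
  built_at: string;
  included_paths: string[];
}

// Hypothetical internal shape (camelCase) describing the consuming checkout.
// How these values are collected is left to the caller.
interface CheckoutState {
  commitSha: string;
  bunLockSha256: string;
  runnerOs: string;
  runnerArch: string;
  bunVersionSpec: string | null;
  requiredPaths: string[];
}

// Assumption: consumers expect the Bun version to come from package.json#packageManager,
// which is the source the CI step records.
const EXPECTED_BUN_VERSION_SOURCE = "package.json#packageManager";

// Returns the names of every manifest field that does not match the checkout.
// An empty array means the artifact may be reused for untouched packages.
function findManifestMismatches(
  manifest: BuildManifest,
  checkout: CheckoutState,
): string[] {
  const mismatches: string[] = [];

  if (manifest.commit_sha !== checkout.commitSha) mismatches.push("commit_sha");
  if (manifest.bun_lock_sha256 !== checkout.bunLockSha256) mismatches.push("bun_lock_sha256");
  if (manifest.runner_os !== checkout.runnerOs) mismatches.push("runner_os");
  if (manifest.runner_arch !== checkout.runnerArch) mismatches.push("runner_arch");
  if (manifest.bun_version_source !== EXPECTED_BUN_VERSION_SOURCE) {
    mismatches.push("bun_version_source");
  }
  if (manifest.bun_version_spec !== checkout.bunVersionSpec) {
    mismatches.push("bun_version_spec");
  }

  // Every output path the consumer needs must be listed in the manifest.
  for (const required of checkout.requiredPaths) {
    if (!manifest.included_paths.includes(required)) {
      mismatches.push(`included_paths: ${required}`);
    }
  }

  return mismatches;
}
```

A worker would treat any non-empty result as a reason to rebuild locally instead of reusing the artifact. The check says nothing about touched packages; those still need a local rebuild regardless of the result.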

## Planning and Execution

- Use plan mode or an explicit task list for non-trivial work, roughly five or more steps or anything with architectural decisions.
86 changes: 86 additions & 0 deletions .github/workflows/ci.yml
@@ -32,6 +32,92 @@ jobs:
      - name: Build packages
        run: bun run build

      - name: Stage build artifact
        env:
          COMMIT_SHA: ${{ github.sha }}
          RUNNER_ARCH_NAME: ${{ runner.arch }}
          RUNNER_OS_NAME: ${{ runner.os }}
        run: |
          set -euo pipefail

          artifact_dir="${RUNNER_TEMP}/agentv-build-artifact"
          rm -rf "$artifact_dir"
          mkdir -p "$artifact_dir"

          included_paths=(
            "packages/core/dist"
            "packages/sdk/dist"
            "apps/cli/dist"
            "apps/dashboard/dist"
          )

          for path in "${included_paths[@]}"; do
            if [ ! -d "$path" ]; then
              echo "::error::Expected build output missing: $path"
              exit 1
            fi

            mkdir -p "$artifact_dir/$(dirname "$path")"
            cp -R "$path" "$artifact_dir/$path"
          done

          forbidden_path=$(find "$artifact_dir" \
            \( -path "*/node_modules/*" \
              -o -path "*/.bun/*" \
              -o -path "*/.turbo/*" \
              -o -path "*/.cache/*" \
              -o -name "*.tsbuildinfo" \) \
            -print -quit)

          if [ -n "$forbidden_path" ]; then
            echo "::error::Build artifact contains a dependency, cache, or tsbuildinfo path: $forbidden_path"
            exit 1
          fi

          MANIFEST_PATH="$artifact_dir/manifest.json" \
          COMMIT_SHA="$COMMIT_SHA" \
          RUNNER_ARCH_NAME="$RUNNER_ARCH_NAME" \
          RUNNER_OS_NAME="$RUNNER_OS_NAME" \
          bun -e '
            import { createHash } from "node:crypto";
            import { Buffer } from "node:buffer";

            const lockfile = await Bun.file("bun.lock").arrayBuffer();
            const rootPackageJson = await Bun.file("package.json").json();
            const includedPaths = [
              "packages/core/dist/**",
              "packages/sdk/dist/**",
              "apps/cli/dist/**",
              "apps/dashboard/dist/**",
            ];
            const manifest = {
              commit_sha: process.env.COMMIT_SHA,
              bun_lock_sha256: createHash("sha256")
                .update(Buffer.from(lockfile))
                .digest("hex"),
              runner_os: process.env.RUNNER_OS_NAME,
              runner_arch: process.env.RUNNER_ARCH_NAME,
              bun_version_source: "package.json#packageManager",
              bun_version_spec: rootPackageJson.packageManager ?? null,
              bun_version: Bun.version,
              built_at: new Date().toISOString(),
              included_paths: includedPaths,
            };

            await Bun.write(
              process.env.MANIFEST_PATH,
              `${JSON.stringify(manifest, null, 2)}\n`,
            );
          '

      - name: Upload build artifact
        uses: actions/upload-artifact@v4
        with:
          name: agentv-build-${{ runner.os }}-${{ runner.arch }}-${{ github.sha }}
          path: ${{ runner.temp }}/agentv-build-artifact/
          if-no-files-found: error
          retention-days: 10

  typecheck:
    name: Typecheck
    runs-on: ubuntu-latest
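
A consumer that downloads the artifact may want to repeat the staging step's guard before trusting the contents. The sketch below approximates the `find` predicate and the upload step's naming scheme in TypeScript. `findForbiddenPath` and `buildArtifactName` are hypothetical helpers, not part of this change, and the sketch assumes the caller supplies `/`-separated paths relative to the artifact root.

```typescript
// Directory names the staging step rejects anywhere inside the artifact.
const FORBIDDEN_DIRECTORIES = ["node_modules", ".bun", ".turbo", ".cache"];

// Returns the first path that sits beneath a forbidden directory or is a
// *.tsbuildinfo file, or undefined when the listing is clean.
// Assumption: paths use "/" separators and are relative to the artifact root.
function findForbiddenPath(relativePaths: string[]): string | undefined {
  return relativePaths.find((path) => {
    const segments = path.split("/");
    const fileName = segments[segments.length - 1] ?? "";
    const parentDirectories = segments.slice(0, -1);
    return (
      parentDirectories.some((dir) => FORBIDDEN_DIRECTORIES.includes(dir)) ||
      fileName.endsWith(".tsbuildinfo")
    );
  });
}

// Mirrors the upload step's name template:
// agentv-build-<runner.os>-<runner.arch>-<github.sha>
function buildArtifactName(runnerOs: string, runnerArch: string, commitSha: string): string {
  return `agentv-build-${runnerOs}-${runnerArch}-${commitSha}`;
}
```

Because the artifact name embeds the full commit SHA, a consumer can request the artifact for its exact checkout rather than the latest build from `main`. The path guard is only a second line of defense; the manifest comparison still decides whether reuse is allowed.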
3 changes: 3 additions & 0 deletions AGENTS.md
@@ -46,6 +46,9 @@ Read the full rationale and examples in [.agents/product-boundary.md](.agents/pr
- For eval execution, experiments, repeat runs, providers, graders, or artifact-layout changes, dogfood with a live provider and a real LLM grader before marking ready. Mock graders, dry-run, and deterministic-only smoke tests are useful plumbing checks, but they are not live dogfood. Use canonical `.agentv/results/<experiment>/<timestamp>` output and publish private evidence. See [.agents/verification.md](.agents/verification.md).
- For browser or screenshot UAT, keep evidence out of the public repo and publish reviewable artifacts to an `agentv-private` evidence branch. See [.agents/verification.md](.agents/verification.md).
- When dogfood or review reveals a durable workflow lesson, capture it in this guide or the relevant `.agents/*.md` guide before merge; do not leave durable agent instructions only in PR comments, Bead comments, or private evidence. Use `docs/solutions/` for fuller reusable writeups.
- Research-only workers must not run `bun install`, `bun run build`, tests, or evals unless the assigned work explicitly needs that command and the worker records why.
- Prefer commit-addressed CI build artifacts over copying mutable main-tree build output. A prebuilt artifact is valid only when its manifest commit SHA, `bun.lock` hash, runner platform, Bun version expectation, and included output paths match the consuming checkout.
- Implementation workers must rebuild any package whose source they changed; never trust a prebuilt artifact for a touched package, and never publish `node_modules`, Bun caches, `.turbo`, `.cache`, or `.tsbuildinfo` as the build artifact.
- Wire formats are `snake_case`; internal TypeScript is `camelCase`. Translate only at the boundary.
- In AgentV, a `project` holds runs, traces, and experiments; a `benchmark` is a curated eval suite. Do not collapse those terms.
- `artifact_pointers` are an offload indirection for large detached payload bytes, such as transcript artifacts. Do not use them as the discovery path for ordinary per-case sidecars; expose those with explicit index/manifest path fields such as `metrics_path`.