From cece89b89f336f72ec7cc8be7de9b820bafc7c37 Mon Sep 17 00:00:00 2001
From: Simrandeep Singh
Date: Wed, 17 Jun 2026 12:04:22 -0700
Subject: [PATCH] Update the Verify skill authoring guide

Expands the skill how-to and preview YAML reference: where skill files live (.aviator/verify/skills/, one skill per file; verify_skill_files to point a preview at files elsewhere), referencing credentials via {{ secrets. }} placeholders, declaring what the preview can exercise, and an optional frontmatter description. Retitles the page 'Writing a Verify skill'.
---
 SUMMARY.md | 2 +-
 verify/how-to-guides/writing-a-skill-md.md | 168 +++++++++++++--------
 verify/reference/preview-yaml.md | 11 ++
 3 files changed, 114 insertions(+), 67 deletions(-)

diff --git a/SUMMARY.md b/SUMMARY.md
index 642f8ef..fcf9ebd 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -71,7 +71,7 @@
 * [How to guides](verify/how-to-guides/README.md)
   * [Connect a repository](verify/how-to-guides/connect-a-repository.md)
   * [Writing effective acceptance criteria](verify/how-to-guides/writing-effective-acceptance-criteria.md)
-  * [Writing a SKILL.md](verify/how-to-guides/writing-a-skill-md.md)
+  * [Writing a Verify skill](verify/how-to-guides/writing-a-skill-md.md)
   * [Creating a preview](verify/how-to-guides/creating-a-preview.md)
   * [Managing previews](verify/how-to-guides/managing-previews.md)
   * [Seed data for previews](verify/how-to-guides/seed-data-for-previews.md)
diff --git a/verify/how-to-guides/writing-a-skill-md.md b/verify/how-to-guides/writing-a-skill-md.md
index c3b2b5a..33ea8e7 100644
--- a/verify/how-to-guides/writing-a-skill-md.md
+++ b/verify/how-to-guides/writing-a-skill-md.md
@@ -1,120 +1,156 @@
-# Writing a SKILL.md
+# Writing a Verify skill
 
-A **SKILL.md** is a short context file the scenario runner reads before executing scenarios against your preview. It's how you tell the agent the things it can't infer from your code — test credentials, base URLs, fixture names, gotchas.
+A **Verify skill** is a short markdown file that tells Verify how to drive your running app — how to sign in, what's worth checking, and where things live. It's the app-specific knowledge the agent can't infer from your code alone.
 
-You write SKILL files per preview, not per repo. Different previews (staging, sandbox, prod-mirror) often need different context — and they should carry their own.
+Verify reads your skill at two points:
 
-> **Planned:** Per-preview skill directories are in development. The `skills_dir` field shown here isn't yet available in `aviator/verify.yaml`; field name and shape may change before release.
+* **Planning** — when Verify turns an acceptance criterion into test scenarios, it reads your skill to ground the plan in your real login flow, routes, and what the preview can actually show.
+* **Running** — when it drives a headless browser against your preview to capture evidence, it follows your skill's sign-in steps and navigation.
 
-### Where SKILL files live
+The test for what belongs here: the things a new engineer would have to ask before they could test your app sensibly are exactly what goes in a skill.
 
-Each preview definition in `aviator/verify.yaml` can declare a `skills_dir`. Every `.md` file under that directory is loaded as context for scenarios that run against that preview.
+### Where the skill lives
 
-[Figure: A preview's skills_dir resolves to a set of SKILL files loaded by the scenario runner. Caption: Each preview carries its own skill set. The runner loads all of them before running scenarios.]
+Each preview has one **entry-point** skill at `.aviator/verify/skills/.md`, named after the preview it describes. So the `default` preview reads `.aviator/verify/skills/default.md`.
+
+The entry point can **reference other files** in your repo. If your guidance grows, split it by concern and point at the pieces from the entry point — Verify reads those too, in place:
+
+```
+.aviator/verify/skills/
+├── default.md # entry point — points at the files below
+├── auth.md # how to sign in
+└── app.md # navigation + what's observable
+```
+
+[Figure: Verify reads your skill before planning and running scenarios. Caption: Verify reads your skill — and any files it references — before planning and running scenarios.]
+
+The skill is read from the **commit under verification**, so it versions with your code — update it in the same change that alters the behavior.
+
+To point a preview at a file somewhere other than the default location, set `verify_skill`:
 
 ```yaml
 preview:
   - name: default
     image: api-preview
     port: 8000
-    skills_dir: .aviator/skills
-    secrets:
-      - DB_PASSWORD
-      - STRIPE_KEY
+    verify_skill: docs/verify/main.md
+```
+
+`verify_skill` is a single repo-relative path; when set, it replaces the default `.md` lookup for that preview. See [Preview YAML](../reference/preview-yaml.md).
+
+### Credentials: never hard-code them
+
+When a flow needs to log in, **don't put real credentials in the skill.** Store them as account secrets (**Settings → Secrets** in the Aviator UI) and reference them by name with a `{{ secrets. }}` placeholder:
+
+```markdown
+# Signing in
+
+The app redirects to `/login` on first load. To sign in:
+
+1. Navigate to the preview URL.
+2. Fill the email field with `{{ secrets.app_admin_email }}`.
+3. Fill the password field with `{{ secrets.app_admin_password }}`.
+4. Click "Log in".
 ```
 
-A single `SKILL.md` at the root of `skills_dir` is the common case. For larger projects, split by domain — `AUTH.md`, `PAYMENTS.md`, `FIXTURES.md` — and the runner loads all of them.
+When Verify drives the browser it substitutes the real value at the moment it fills the field. The value never appears in the skill, the plan, the prompt, or the run transcript — only the placeholder does. You can reference any account secret this way, and placeholders also work embedded in a string (e.g. `Authorization: Bearer {{ secrets.api_token }}`).
+
+> This is separate from a preview's `secrets:` list, which injects secrets as **environment variables into the preview container** so your app can boot. The same secret store backs both — `{{ secrets.* }}` is specifically for credentials Verify types into your UI. See [Preview YAML → Secrets](../reference/preview-yaml.md).
 
 ### What to include
 
-Aim for the smallest set of facts that lets the agent run a scenario without trial and error. The categories that matter most:
+Aim for the smallest set of facts that lets Verify run a scenario without trial and error:
 
-| Category | What to write |
-| ---------------------- | ---------------------------------------------------------------------------------------------- |
-| **Base URL and ports** | "The API is served at `http://localhost:8000`. WebSocket endpoint at `/ws`." |
-| **Auth setup** | How to obtain a token in this preview — test users, login endpoint, hard-coded sandbox tokens. |
-| **Test users / orgs** | Named accounts that exist in the seeded data. What plan they're on, what they can access. |
-| **Fixtures** | Where seed data lives, named IDs the agent can reference, how to reset state between scenarios. |
-| **Side effects** | What's mocked vs. real. "Stripe runs in test mode — no real charges. Email sends are dropped." |
-| **Gotchas** | Things that bit a previous run. "First request after boot takes ~3s due to JIT warmup." |
+| Category | What to write |
+| ------------------------ | ---------------------------------------------------------------------------------------------- |
+| **Sign-in** | The login flow, step by step, with `{{ secrets.* }}` placeholders for credentials. |
+| **What's observable** | What the running preview can and can't show (see below). |
+| **Navigation** | Key routes and how to reach important screens. "The article list is at `/unread/list`." |
+| **Test data / fixtures** | Named records in the seeded data, referenced by stable name, not ID. |
+| **Side effects** | What's real vs. mocked. "Stripe runs in test mode — no real charges. Email sends are dropped." |
+| **Gotchas** | Things that bit a previous run. "First request after boot takes ~3s due to JIT warmup." |
 
-The test is: would a new engineer joining the team need to ask someone these things before they could write a sensible test? If yes, put it in SKILL.md.
+### Tell Verify what's observable
+
+This is the most valuable thing a skill adds. Verify confirms a criterion by **driving the running app and watching what it does** — rendered UI, DOM, computed styles, console output, API responses. It can see that the app *initiated* something, but not a result that a background job, queue, or external system has to produce.
+
+So call out what your preview can and can't exercise:
+
+```markdown
+## What's observable here
+This is the full web UI driven through a browser, so rendered state, DOM,
+computed styles, and console output are all fair game. Background workers and
+outbound email are NOT exercised in the preview — don't try to verify anything
+that depends on them.
+```
+
+This keeps Verify from burning a run trying to confirm something the preview structurally can't show — it verifies the responsible code path instead.
 
 ### What to leave out
 
-The agent already has access to the codebase. Don't repeat what it can read:
+Verify can already read your code. Don't repeat what it can see:
 
 * **Architecture descriptions.** "We use Express with Postgres" — the agent can see this. Don't restate it.
 * **Endpoint catalogs.** It will discover endpoints from the router. You don't need to list them.
 * **Code conventions.** That's invariants, not skills. ([Invariants](../concepts/invariants.md))
 * **Implementation history.** "We used to use library X, switched to Y in 2024." Irrelevant for running scenarios.
-* **The task at hand.** That's the intent, submitted via MCP per change.
+* **The change under test.** That's the intent, submitted via MCP per change.
 
-A bloated SKILL.md hurts more than a thin one. Every irrelevant line dilutes the context and slows the agent down.
+A bloated skill hurts more than a thin one. Every irrelevant line dilutes the context and slows the agent down.
 
-### Examples
+### Example
 
-**Minimal SKILL.md for a typical API service:**
-
-```markdown
-# API preview
+A `default.md` entry point that references two more files in the same directory:
 
-The API runs at http://localhost:8000.
+`default.md`:
 
-## Auth
-- Get a token: POST /auth/login with `{"email": "test@aviator.dev", "password": "test"}`
-- Pass it as `Authorization: Bearer ` on every other request.
+```markdown
+# App verify guidance
 
-## Test data
-- Org `acme` (slug) exists with the `pro` plan.
-- User `test@aviator.dev` is an admin of `acme`.
-- User `member@aviator.dev` is a non-admin member of `acme`.
+Read both of these files (same directory) before planning or driving:
 
-## Reset
-- Hit `POST /__test__/reset` to drop user-created records and re-seed.
-- Reset is allowed because this preview's `STRIPE_KEY` is a test key.
+- `auth.md` — how to sign in.
+- `app.md` — getting around, and what the preview can exercise.
 ```
 
-That's enough for most scenarios. Notice what's missing: no architecture, no endpoint list, no schema, no history.
+`auth.md`:
 
-**Split skills for a multi-domain service:**
+```markdown
+# Signing in
 
-```
-.aviator/skills/
-├── SKILL.md # base URL, auth, reset
-├── PAYMENTS.md # Stripe test keys, idempotency, currency handling
-└── FIXTURES.md # detailed seed data, named records
-```
+The app gates everything behind a login form at `/login`.
 
-`PAYMENTS.md` carries the payments-specific facts so the auth and fixtures skills stay focused. Files don't need to be named after the runner — split by what's coherent to read in one sitting.
+1. Navigate to the preview URL.
+2. Fill the email field with `{{ secrets.app_admin_email }}`.
+3. Fill the password field with `{{ secrets.app_admin_password }}`.
+4. Click "Log in" — you land on the dashboard.
+```
 
-**Skill for a preview that talks to a third party:**
+`app.md`:
 
 ```markdown
-# Stripe sandbox
-
-Stripe runs in test mode. Every webhook is signed with a fixed test secret
-(in env as STRIPE_WEBHOOK_SECRET).
+# Driving the app
 
-## Useful test cards
-- 4242 4242 4242 4242 — succeeds
-- 4000 0000 0000 0002 — declined (generic)
-- 4000 0027 6000 3184 — requires 3DS
+- The main view is a list of saved items, rendered as cards, at `/unread/list`.
+- Settings live at `/config`; per-item actions (archive, star, delete) are on each card.
 
-## Replaying webhooks
-- POST /__test__/stripe/replay/ re-fires a stored event.
-- Stored events live under tests/fixtures/stripe-events/.
+## What's observable here
+Rendered UI, DOM, computed styles, and console output are all available as
+evidence. Background jobs and outbound email are not exercised in the preview.
 ```
 
+A single self-contained `default.md` works just as well — split into referenced files only when one file gets unwieldy.
+
 ### Tips
 
-* **Keep each file short.** If a SKILL file passes ~150 lines, split it.
-* **Lead with the facts that change behavior.** Auth and test data first; gotchas last.
+* **Keep it short.** If a file passes ~150 lines, split it into referenced files.
+* **Lead with the facts that change behavior.** Sign-in and observability first; gotchas last.
 * **Use stable identifiers.** Reference fixtures by name (`org "acme"`), not by ID. IDs change when seed data is regenerated.
-* **Update the skill when you change the seed.** Stale skills produce confidently wrong scenarios.
-* **Don't worry about formatting.** The agent reads them as plain text. Standard markdown is enough — no special syntax required.
+* **Update the skill when you change the seed or the login flow.** A stale skill produces confidently wrong scenarios.
+* **Never put a literal secret in a skill.** Use `{{ secrets. }}` placeholders.
 
 ### See also
 
 * [Concepts: Invariants](../concepts/invariants.md) — for rules that apply across changes
 * [How Verify works](../how-it-works.md) — where skills fit in the verification pipeline
+* [Preview YAML](../reference/preview-yaml.md) — `verify_skill` and `secrets`
diff --git a/verify/reference/preview-yaml.md b/verify/reference/preview-yaml.md
index 0bbab31..914ec62 100644
--- a/verify/reference/preview-yaml.md
+++ b/verify/reference/preview-yaml.md
@@ -29,6 +29,7 @@ If a single preview is declared with no `name`, it's treated as `default`. Alway
 | `setup` | string | no | Path (in the repo) to a setup script. Defaults to `.aviator/scripts/preview-setup.sh`. Runs after the container starts. |
 | `teardown` | string | no | Optional path (in the repo) to a teardown script. Runs before the container is destroyed. |
 | `secrets` | list of strings | no | Account secret keys. Each is injected into the container as an environment variable of the same name. |
+| `verify_skill` | string | no | Repo-relative path to this preview's [Verify skill](../how-to-guides/writing-a-skill-md.md) entry point. Overrides the default `.aviator/verify/skills/.md` lookup. |
 
 ### Image
 
@@ -48,6 +49,16 @@ secrets:
 
 Secrets are managed in the Aviator UI under **Settings → Secrets**. Scoped per account, granted to repos explicitly. The preview container never sees the unresolved name — only the value.
 
+### Verify skill
+
+By default, Verify reads this preview's app-driving guidance from `.aviator/verify/skills/.md` (the entry-point file may reference other files in the repo). To point the preview at a file elsewhere, set a single path:
+
+```yaml
+verify_skill: docs/verify/main.md
+```
+
+The path is repo-relative; when set, it replaces the default `.md` lookup. See [Writing a Verify skill](../how-to-guides/writing-a-skill-md.md).
+
 ### Setup script
 
 `setup` runs inside the container after start. Use it for things that aren't baked into the image: