ci: trigger dogfood eval pipeline on run-evals PR label by calvarjorge · Pull Request #428 · databricks/appkit

calvarjorge · 2026-06-09T09:26:56Z

What

Adds a GitHub Actions workflow that launches the dogfood eval pipeline (job 398185277057549) for a PR when the run-evals label is present, and re-launches it on every new commit while the label stays on.

How it works

Trigger: pull_request with types: [labeled, synchronize].
- labeled + label is run-evals → run.
- synchronize (new commit) + PR already has run-evals → run.
Commit: passes github.event.pull_request.head.sha (the real PR head commit, never the synthetic merge commit) as the appkit_ref job param, so the pipeline can pull the code. Also sets prompt_preset=custom-pr and tags=appkit_pr:<number>.
Latest wins: concurrency with cancel-in-progress: true (grouped per PR) guarantees the sticky "⏳ Eval running" comment always reflects the most recently triggered commit, even if an earlier run's job finishes first.
Comment: a sticky comment (.github/scripts/upsert-eval-comment.cjs) links the evals-monitor PR page (/prs/appkit/<number>) and the triggered job run (run_id comes from the run-now response, so no extra API call is needed).
Auth: OAuth M2M as the apps-mcp-evals-runner service principal. Credentials are scoped to the trigger step only, so the PR-authored comment script never sees them.
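The trigger, parameter, and comment rules above can be sketched as plain functions over the `pull_request` webhook payload, for example for use from an `actions/github-script` step. This is a minimal sketch, not the code in this PR: the function names, the hidden `<!-- appkit-eval-comment -->` marker, the `monitorBaseUrl` and `runUrl` arguments, the comment layout, and the use of the Jobs API `job_parameters` field are all assumptions. The payload fields used (`action`, `label.name`, `pull_request.labels`, `pull_request.head.sha`, `pull_request.number`) follow GitHub's `pull_request` event shape.

```javascript
// Hypothetical helpers sketching the workflow's decision logic; not the PR's actual code.
const EVAL_LABEL = "run-evals";
const JOB_ID = 398185277057549; // dogfood eval pipeline job (from the PR description)
// Hypothetical hidden marker used to find the sticky comment again on later runs.
const COMMENT_MARKER = "<!-- appkit-eval-comment -->";

// Mirrors the trigger rules: run on `labeled` only when the added label is run-evals,
// and on `synchronize` only when the PR already carries run-evals.
function shouldRunEvals(eventName, payload) {
  if (eventName !== "pull_request") return false;
  if (payload.action === "labeled") {
    return payload.label?.name === EVAL_LABEL;
  }
  if (payload.action === "synchronize") {
    const labels = payload.pull_request?.labels ?? [];
    return labels.some((l) => l.name === EVAL_LABEL);
  }
  return false;
}

// Builds a run-now request body. Assumes the job declares job-level
// parameters named appkit_ref, prompt_preset, and tags.
function buildRunNowRequest(payload) {
  const pr = payload.pull_request;
  return {
    job_id: JOB_ID,
    job_parameters: {
      appkit_ref: pr.head.sha, // real PR head commit, not the synthetic merge commit
      prompt_preset: "custom-pr",
      tags: `appkit_pr:${pr.number}`,
    },
  };
}

// Renders the sticky comment. monitorBaseUrl and runUrl are supplied by the caller;
// only the /prs/appkit/<number> path comes from the PR description.
function buildEvalComment({ monitorBaseUrl, prNumber, headSha, runUrl }) {
  return [
    COMMENT_MARKER,
    `⏳ Eval running for ${headSha.slice(0, 7)}`,
    `- evals-monitor: ${monitorBaseUrl}/prs/appkit/${prNumber}`,
    `- job run: ${runUrl}`,
  ].join("\n");
}

// Decides whether to update the existing sticky comment or create a new one,
// given the PR's comments as returned by GitHub's "list issue comments" endpoint.
function planCommentUpsert(comments) {
  const existing = comments.find(
    (c) => typeof c.body === "string" && c.body.includes(COMMENT_MARKER),
  );
  return existing ? { action: "update", commentId: existing.id } : { action: "create" };
}
```

In a `github-script` step, the result of `planCommentUpsert` would select between `github.rest.issues.updateComment` and `github.rest.issues.createComment`; keeping the marker in the body is what lets each new commit overwrite the same comment instead of adding another one.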

Security notes

Uses pull_request (not pull_request_target): repo secrets are withheld from fork PRs, so an external contributor cannot exfiltrate the credentials, even by editing the workflow. Applying the run-evals label also requires triage or write access.
Action SHAs are pinned.

Required setup before this works

Create the run-evals label in the repo.
Generate an OAuth secret on the apps-mcp-evals-runner SP and add two repo secrets:
- EVALS_DATABRICKS_CLIENT_ID_DOGFOOD
- EVALS_DATABRICKS_CLIENT_SECRET_DOGFOOD
Grant the SP CAN MANAGE RUN on job 398185277057549.
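The two secrets are used for an OAuth M2M (client credentials) exchange against the workspace's `/oidc/v1/token` endpoint. The sketch below only builds that request; it assumes the secrets are mapped to environment variables with the same names as the repo secrets, that the dogfood workspace host is passed in separately (it is not given in this PR), and that the `all-apis` scope is sufficient for run-now. In practice the Databricks CLI or SDK performs this exchange itself when `DATABRICKS_HOST`, `DATABRICKS_CLIENT_ID`, and `DATABRICKS_CLIENT_SECRET` are set.

```javascript
// Sketch of a Databricks OAuth M2M (client credentials) token request.
// Builds the request only; sending it (e.g. with fetch) is left to the caller.
function buildM2MTokenRequest(workspaceHost, clientId, clientSecret) {
  if (!clientId || !clientSecret) {
    throw new Error("missing OAuth client ID or secret");
  }
  const host = workspaceHost.replace(/\/+$/, ""); // drop trailing slashes
  const basic = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  return {
    url: `${host}/oidc/v1/token`,
    method: "POST",
    headers: {
      Authorization: `Basic ${basic}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({ grant_type: "client_credentials", scope: "all-apis" }).toString(),
  };
}

// Hypothetical wiring: the secrets are exposed as env vars on the trigger step only.
function tokenRequestFromEnv(env, workspaceHost) {
  return buildM2MTokenRequest(
    workspaceHost,
    env.EVALS_DATABRICKS_CLIENT_ID_DOGFOOD,
    env.EVALS_DATABRICKS_CLIENT_SECRET_DOGFOOD,
  );
}
```

Sending the request (for example `fetch(req.url, req)`) returns JSON with an `access_token`, which is then used as a Bearer token on the run-now call.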

Testing

Because this is a pull_request-triggered workflow, it runs the PR branch's version of the workflow, so it can be exercised on this PR once the label and secrets exist.

This pull request and its description were written by Isaac.

Add a GitHub Actions workflow that launches the dogfood eval pipeline (job 398185277057549) when the `run-evals` label is added to a PR, and re-launches it on each new commit while the label stays on. The real PR head commit is passed as `appkit_ref` so the pipeline can pull the code; `prompt_preset=custom-pr` and `tags=appkit_pr:<number>` are also set. Authenticates as the apps-mcp-evals-runner service principal via OAuth M2M, and posts a sticky "Eval running" comment linking the evals-monitor PR page and the triggered job run. Comment logic lives in .github/scripts/upsert-eval-comment.cjs.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>

Probe dogfood reachability + workspace OIDC discovery and run a forced oauth-m2m authenticated call with debug logging, to pin down the "cannot configure default credentials" failure. To be reverted once auth works.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>

Allow manual runs to probe staging connectivity from an arbitrary runner group (runner_group/runner_labels inputs), since dogfood.staging blocks the default databricks-protected-runner-group at the network edge. A bare dispatch runs only the diagnostic; pass pr_number to also trigger the job and post the comment.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>

The databricks-protected-runner-group's egress to internal Databricks hosts is gated by the GitHub OIDC identity. Without `id-token: write` the egress proxy returns 403 "RBAC: access denied" for every request (incl. anonymous curl to dogfood.staging), which is what broke OAuth M2M. All other Databricks workflows in this repo set this permission. Also revert the temporary manual-dispatch/configurable-runner testing scaffolding; back to label/synchronize on databricks-protected-runner-group.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <jorge.calvar@databricks.com>

calvarjorge · 2026-06-09T14:08:20Z

Closing for now — GitHub Actions runners can't reach dogfood.staging (network perimeter), so we can't trigger the eval job directly from CI. Keeping the branch jorge_calvar/eval_trigger in case we revisit this with a staging-capable runner or inverted trigger. Replacing with a lightweight approach: post a link to the evals-monitor PR page where the eval can be started.

calvarjorge added the run-evals label (Run the dogfood eval pipeline on this PR) Jun 9, 2026

calvarjorge added 3 commits June 9, 2026 11:35

calvarjorge closed this Jun 9, 2026

calvarjorge mentioned this pull request Jun 9, 2026

ci: post evals-monitor link comment on new PRs #430 (Draft)


calvarjorge wants to merge 4 commits into main from jorge_calvar/eval_trigger
