One who reads and announces markings.
Go web service for classifying PDF documents' security markings using Azure AI Foundry GPT vision models. See _project/README.md for full architecture and roadmap.
- Go 1.26+
- Bun
- ImageMagick 7.0+ with Ghostscript (for PDF rendering)
- Docker and Docker Compose
- Air (for Go hot reload in development)
- mise (optional, for task runner shortcuts)
Development runs the Go server on the host with infrastructure (PostgreSQL, Azurite) in Docker.
Start infrastructure:
docker compose up -d
Run database migrations:
go run ./cmd/migrate -up
Start the dev server (two terminals):
# Terminal 1 — watch and rebuild the web client
cd app && bun run watch
# Terminal 2 — hot reload the Go server
air
# Terminal 2 (alternative): hot reload the Go server with Azure Entra auth
HERALD_ENV=auth air
The web client is available at http://localhost:8080/app.
Run the full stack entirely in Docker (app + PostgreSQL + Azurite):
docker compose -f docker-compose.yml -f compose/app.yml up --build
This builds the Herald Docker image and starts all services with health-conditioned dependencies. The app loads config.docker.json via the HERALD_ENV=docker overlay to resolve container hostnames.
To stop:
docker compose -f docker-compose.yml -f compose/app.yml down
The web client is available at http://localhost:8080/app.
Herald uses mise as a task runner. All tasks can also be run directly with the underlying commands.
| Task | Command | Description |
|---|---|---|
| mise run dev | go run ./cmd/server | Run the server in development mode |
| mise run build | go build -o bin/server ./cmd/server | Build the server binary |
| mise run test | go test ./tests/... | Run all tests |
| mise run vet | go vet ./... | Run go vet |
| mise run migrate:up | go run ./cmd/migrate -up | Run all up migrations |
| mise run migrate:down | go run ./cmd/migrate -down | Run all down migrations |
| mise run migrate:version | go run ./cmd/migrate -version | Print current migration version |
| mise run web:fmt | cd app && bunx prettier --write client/ | Format web client source files |
| mise run web:build | cd app && bun run build | Build the web client |
| mise run web:watch | cd app && bun run watch | Watch and rebuild the web client |
Config loading follows a layered overlay pattern:
- config.json: base configuration
- config.<HERALD_ENV>.json: environment overlay (e.g., config.docker.json)
- secrets.json: gitignored secrets
- HERALD_* environment variables: final overrides
All environment variables use the HERALD_ prefix (e.g., HERALD_SERVER_PORT, HERALD_DB_HOST).
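The layering above can be sketched in Go as a deep merge followed by environment overrides. This is a minimal illustration, not Herald's actual loader: the function names, the JSON keys (server.port, db.host), and the rule that HERALD_<SECTION>_<KEY> maps onto a two-level key are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// mergeInto deep-merges src into dst: nested objects merge key by key,
// any other value in src replaces the value in dst.
func mergeInto(dst, src map[string]any) {
	for k, v := range src {
		srcMap, srcIsMap := v.(map[string]any)
		dstMap, dstIsMap := dst[k].(map[string]any)
		if srcIsMap && dstIsMap {
			mergeInto(dstMap, srcMap)
			continue
		}
		dst[k] = v
	}
}

// applyEnv maps HERALD_<SECTION>_<KEY> onto cfg[section][key].
// The two-level naming rule is an assumption made for illustration.
func applyEnv(cfg map[string]any, environ []string) {
	for _, kv := range environ {
		name, value, ok := strings.Cut(kv, "=")
		if !ok || !strings.HasPrefix(name, "HERALD_") || strings.HasPrefix(name, "HERALD_CLI_") {
			continue
		}
		rest := strings.ToLower(strings.TrimPrefix(name, "HERALD_"))
		section, key, ok := strings.Cut(rest, "_")
		if !ok {
			continue // e.g. HERALD_ENV selects the overlay and is not a setting
		}
		mergeInto(cfg, map[string]any{section: map[string]any{key: value}})
	}
}

// loadLayers merges JSON layers in order (later layers win), then applies
// environment overrides last.
func loadLayers(layers []string, environ []string) (map[string]any, error) {
	cfg := map[string]any{}
	for _, raw := range layers {
		var layer map[string]any
		if err := json.Unmarshal([]byte(raw), &layer); err != nil {
			return nil, err
		}
		mergeInto(cfg, layer)
	}
	applyEnv(cfg, environ)
	return cfg, nil
}

func main() {
	cfg, err := loadLayers([]string{
		`{"server": {"port": 8080}, "db": {"host": "localhost", "name": "herald"}}`, // stands in for config.json
		`{"db": {"host": "postgres"}}`,                                             // stands in for config.docker.json
	}, []string{"HERALD_SERVER_PORT=9090"})
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg["server"], cfg["db"])
}
```

In this sketch an overlay only needs to list the keys it changes; everything else is inherited from the layers beneath it.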
Azure Entra authentication is opt-in. To enable it locally, create a config.auth.json overlay and run with HERALD_ENV=auth.
App registration setup:
- Register an app in Azure Entra ID (portal → App registrations → New registration)
  - Name: herald
  - Supported account types: Single tenant
  - Redirect URI: SPA platform → http://localhost:8080/app/
- Expose an API (left sidebar)
  - Set Application ID URI: api://<client-id> (default)
  - Add a scope (e.g., access) that admins and users can consent to
- API permissions (left sidebar)
  - Add your app's scope as a delegated permission (e.g., api://<client-id>/access)
  - Grant admin consent
- Note the Directory (tenant) ID and Application (client) ID from the Overview page
Create config.auth.json:
{
  "auth": {
    "auth_mode": "azure",
    "tenant_id": "<tenant-id>",
    "client_id": "<client-id>",
    "scope": "<scope-name>"
  }
}
The scope field is the bare scope name (e.g., access). The client composes the full api://<client-id>/<scope> format at runtime. When omitted, defaults to access_as_user.
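The composition rule can be written out as a small function. The sketch below is in Go for consistency with the server code; the web client's real implementation is not shown in this README, and the name fullScope is hypothetical.

```go
package main

import "fmt"

// defaultScope is the scope name used when the config omits "scope".
const defaultScope = "access_as_user"

// fullScope builds api://<client-id>/<scope> from the bare scope name,
// falling back to the default name when none is configured.
func fullScope(clientID, scope string) string {
	if scope == "" {
		scope = defaultScope
	}
	return fmt.Sprintf("api://%s/%s", clientID, scope)
}

func main() {
	// The all-zero GUID is a placeholder, not a real client ID.
	fmt.Println(fullScope("00000000-0000-0000-0000-000000000000", "access"))
	fmt.Println(fullScope("00000000-0000-0000-0000-000000000000", ""))
}
```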
cmd/herald is a standalone, scriptable client for bulk operations against the Herald API — uploading documents, triggering classification, and pulling results back out for ingest into an external program-of-record. It is a stateless, Azure-CLI-style primitive: each command makes exactly one API call and emits the response as JSON to stdout. Orchestration — batching, concurrency, retries — is deliberately left to your shell scripts.
Build:
go build -o bin/herald ./cmd/herald
The release version is injected at build time (the herald-release workflow does this automatically from the tag; the CLI is versioned independently of the server, starting at v0.1.0):
go build -ldflags "-X github.com/JaimeStill/herald/internal/cli.version=v0.1.0" -o bin/herald ./cmd/herald
Configuration is resolved by merging layers from lowest to highest precedence:
- built-in defaults (API http://localhost:8080, output json, timeout 10m)
- profile: ~/.herald/settings.json, then ~/.herald/secrets.json (%USERPROFILE%\.herald on Windows)
- working directory: settings.json, the settings.<HERALD_CLI_ENV>.json overlay, then secrets.json
- HERALD_CLI_* environment variables
- command-line flags
The profile is the base and the working directory overrides it (local wins), so an operator can configure ~/.herald once and run herald from anywhere, while a project directory can still override per run. All variables use the HERALD_CLI_ prefix so they never collide with the server's HERALD_*, and these files are distinct from the server's config.json/secrets.json. Run herald settings show to print the fully resolved configuration from where you stand (the client secret is redacted unless --show-secrets is passed).
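The redaction behavior of herald settings show can be pictured as masking one field on a copy of the resolved settings before printing. This is a sketch under assumptions: the struct shapes, the "api" JSON key, and the mask string are invented for illustration, and only the auth.client_secret key comes from this README.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AuthSettings mirrors the "auth" block; field set is illustrative.
type AuthSettings struct {
	AuthMode     string `json:"auth_mode,omitempty"`
	TenantID     string `json:"tenant_id,omitempty"`
	ClientID     string `json:"client_id,omitempty"`
	ClientSecret string `json:"client_secret,omitempty"`
}

// Settings is a hypothetical subset of the resolved CLI configuration.
type Settings struct {
	API  string       `json:"api"`
	Auth AuthSettings `json:"auth"`
}

// redact returns a copy of s with the client secret masked unless
// showSecrets is true. s is passed by value, so the caller's settings
// are never modified.
func redact(s Settings, showSecrets bool) Settings {
	if !showSecrets && s.Auth.ClientSecret != "" {
		s.Auth.ClientSecret = "********"
	}
	return s
}

func main() {
	resolved := Settings{
		API:  "http://localhost:8080",
		Auth: AuthSettings{AuthMode: "azure", ClientSecret: "example-secret"},
	}
	out, err := json.MarshalIndent(redact(resolved, false), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Masking a copy rather than the live settings keeps the real secret usable for the request that follows.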
| Global flag | Env | Description |
|---|---|---|
| --api <url> | HERALD_CLI_API | Herald API base URL |
| --scope <scope> | HERALD_CLI_SCOPE | full Entra token scope (default api://<client-id>/.default) |
| --timeout <dur> | HERALD_CLI_TIMEOUT | per-request timeout (e.g. 10m) |
| --output json\|jsonl | HERALD_CLI_OUTPUT | output format |
Auth settings use the HERALD_CLI_AUTH_* namespace (_MODE, _TENANT_ID, _CLIENT_ID, _CLIENT_SECRET, _MANAGED_IDENTITY, _AUTHORITY).
When auth_mode is azure, the CLI acquires an Entra bearer token and sends it on every request. The token's audience must be the Herald API app (api://<client-id>); the server validates the audience and signature only. The --scope / HERALD_CLI_SCOPE value is the full outbound token scope — derived as api://<client-id>/.default from the auth client ID when unset, and operator-overridable to handle environment quirks (e.g. the IL6 trailing-slash or .default vs. delegated-scope differences).
The credential is selected by pkg/auth: if tenant ID, client ID, and client secret are all present, a service-principal ClientSecretCredential is used; otherwise the DefaultAzureCredential chain (az login, managed identity, environment).
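The two decisions described above, which scope to request and which credential to use, reduce to a pair of small rules. The sketch below models them with plain strings so it runs without the Azure SDK; the type and function names are hypothetical and are not the pkg/auth API.

```go
package main

import "fmt"

// Auth holds the values the selection rule inspects.
type Auth struct {
	TenantID, ClientID, ClientSecret string
}

// resolveScope returns the operator override when set, otherwise the
// derived api://<client-id>/.default scope.
func resolveScope(override, clientID string) string {
	if override != "" {
		return override
	}
	return "api://" + clientID + "/.default"
}

// credentialKind reports which credential the rule would select:
// a client-secret credential only when all three values are present,
// otherwise the default credential chain.
func credentialKind(a Auth) string {
	if a.TenantID != "" && a.ClientID != "" && a.ClientSecret != "" {
		return "client-secret"
	}
	return "default-chain"
}

func main() {
	a := Auth{TenantID: "<tenant-id>", ClientID: "<client-id>"}
	fmt.Println(credentialKind(a), resolveScope("", a.ClientID))

	a.ClientSecret = "<secret>"
	fmt.Println(credentialKind(a), resolveScope("api://<api-client-id>/.default", a.ClientID))
}
```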
A settings.auth.json overlay supplies the azure mode, tenant, and client ID:
{
  "auth": {
    "auth_mode": "azure",
    "tenant_id": "<tenant-id>",
    "client_id": "<client-id>"
  }
}
Option A: interactive (az login, delegated token). Uses your signed-in user identity. The universal Azure CLI client (04b07795-8ddb-461a-bbee-02f9e1bf7b46) must be authorized on the Herald API app registration:
- App registration → Expose an API → add a delegated scope (e.g. access_as_user).
- Under Authorized client applications, add the Azure CLI client ID with that scope, and grant admin consent.
az login --tenant <tenant-id>
HERALD_CLI_ENV=auth ./bin/herald documents list
Option B: service principal (client secret, app-only token). This is the credential the production operator uses. Put the azure auth block in ~/.herald/settings.json, then store the secret with herald settings secret so it never lands in shell history. The command accepts the secret on stdin, which pairs well with Azure Key Vault:
az keyvault secret show --vault-name <vault> --name <secret> --query value -o tsv | herald settings secret
# writes ~/.herald/secrets.json (0600) → { "auth": { "client_secret": "..." } }
herald documents list
The secret can also be passed as a positional argument (herald settings secret <value>), which does expose it to shell history, or written to a gitignored secrets.json by hand: { "auth": { "client_secret": "<value>" } }.
If the secret belongs to the same app registration as the API, the derived api://<client-id>/.default scope is already correct. If it belongs to a separate client app, set client_id to that client and override the scope to the API's audience: HERALD_CLI_SCOPE=api://<api-client-id>/.default. App-only .default token issuance may require an Application-type app role on the API granted to the client SP.
herald <command> [flags] [args]
| Command | Description |
|---|---|
| documents upload --file <pdf> --external-id <n> --platform <p> | Upload one file with its external-system linkage; emits the created Document |
| documents list [filters] | One page of documents; emits the raw paginated result (--status, --platform, --external-id, --classification, --page, --page-size, …) |
| documents get <id> | Fetch a single document |
| classify <document-id> | Trigger classification, consume the SSE stream to completion, emit the resulting Classification |
| classifications list [filters] | One page of classifications (--classification, --confidence, --document-id, --page, …) |
| classifications get <id> | Fetch a single classification |
| classifications by-document <document-id> | Fetch a document's classification |
| settings show [--show-secrets] | Print the fully resolved settings from all sources; the client secret is redacted unless --show-secrets |
| settings secret [<secret>\|-] | Write the Entra client secret to ~/.herald/secrets.json; reads stdin when the argument is omitted or - |
| version | Print the CLI version |
The external_id + external_platform pair set on upload round-trips on every document response, serving as the join key back to the program-of-record — no separate correlation file is needed.
Because each command is a single API call, batching and concurrency live in your scripts. Example bulk upload → classify → retrieve, capping concurrency with xargs -P:
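# Upload a batch (4 at a time), capturing each created document id.
# -d '\n' (GNU xargs) keeps the quotes in each JSON line intact, and the line
# is passed to bash as $1 instead of being spliced into the script text.
jq -c '.[]' batch.json | xargs -d '\n' -P4 -I{} bash -c '
  item=$1
  ./bin/herald documents upload \
    --file "$(jq -r .file <<<"$item")" \
    --external-id "$(jq -r .external_id <<<"$item")" \
    --platform "$(jq -r .external_platform <<<"$item")"
' _ {} | jq -r .id > ids.txt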
# Classify each (2 at a time — one request fans out across pages server-side)
xargs -P2 -a ids.txt -I{} ./bin/herald classify {} > classifications.jsonl
# Pull results for ingest
./bin/herald documents list --status review --output jsonl