feat: universe platform QOL#2
Draft
raisedadead wants to merge 5 commits into
Draft
Conversation
`/healthz` stays an unconditional liveness probe so k8s can tell
the process is alive. `/readyz` is the new readiness probe: it
runs `Health.Ping(ctx)` against the Valkey backend (3s timeout)
then `R2.HasPrefix(ctx, _artemis_readyz_probe_no_match)` against
the bucket (another 3s) and returns either `200 {"ready":true}` or
`503` with a `valkey_unreachable` / `r2_unreachable` code tag in
the body. Valkey probe runs first so an outage there short-circuits
before paying the R2 round-trip — verified by a regression test
that asserts the R2 mock was never called when the Health probe
errors.
A new `RegistryHealth` interface scopes the contract to a single
`Ping(ctx) error` method. `*valkey.Store` already satisfies it so
the only wiring change in cmd/artemis is one extra struct field.
Five handler tests cover the matrix (both up, Valkey down, R2
down, ordering).
Chart-side `readinessProbe` wiring lands in the infra repo in the
next commit; CHANGELOG records the feature under Unreleased.
New /metrics endpoint exposes four prometheus counters:
artemis_registry_refresh_failures_total
run() refresh path in registry/valkey/Reader; an optional
OnRefreshError hook decouples the registry package from
metrics — cmd/artemis sets it to inc the counter.
artemis_alias_drift_total
SitePromote + SiteRollback CAS body-pin mismatches (409).
artemis_promote_legacy_bare_total
Empty-body POST /api/site/{site}/promote — surfaces clients
that have not adopted CAS. Confirms when the v0.3.0 flip to
require body-pin is safe.
artemis_upstream_error_total{op}
Every writeUpstreamError invocation, labelled by op tag
(r2.put.alias, valkey.register, …). Cross-dependency
reliability surface.
`handler.SetMetrics` installs a package-level handle so
writeUpstreamError doesn't need a per-call ref; SitePromote /
SiteRollback use h.Metrics directly via Handlers struct.
AccessLog gains a `/healthz` + `/readyz` + `/metrics` skip-map so
k8s probes (and prometheus scrapes) stop flooding the structured
log. server.New() now takes a prometheus.Gatherer; pass nil to
disable the endpoint in tests.
Verified: `go test ./...` = 258/258 green across 10 packages
(3 new tests cover counter registration, writeUpstreamError
increment, AccessLog skip set). `go vet` clean. `go build` clean.
Two defensive fixes surfaced by the pre-push reviewer:
- `handler.SetMetrics` previously wrote `pkgMetrics` without any
synchronization. The contract is "call once at startup before
serving traffic," but nothing enforced it; a second call from a
rebuild path or a misconfigured test could race against the
request-side read inside `writeUpstreamError`. Wrap the
initialization in `sync.Once`. First caller wins; subsequent
calls are no-ops. sync.Once's happens-before guarantee covers the
read side without an explicit mutex.
- `Reader.invokeOnRefreshError` wraps the consumer-supplied
callback in a panic-recovering shim. A panicking callback in the
refresh path would have killed the `run()` goroutine and frozen
the registry snapshot indefinitely; recover() keeps the
stale-read mode intact and emits a structured log entry naming
the original refresh error so the operator notices the broken
callback.
- Fixed the misleading comment block on `Handlers.Metrics` —
AccessLog does NOT reach for `pkgMetrics`, only
`writeUpstreamError` does.
Verified: `go test ./...` 9/9 packages pass; `go vet ./...` clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.