Expose Prometheus resource-count metrics to power the "FHIR Resources over time" UI chart #185

Description

@smunini

Summary

The UI needs to render a "FHIR Resources over time" chart (see mockup below): a time-series/area chart of cumulative resource counts, broken down by resource type (Patient, Observation, Encounter, Condition, MedicationRequest, DiagnosticReport, Procedure, AllergyIntolerance, …), with a per-type series selector and current totals in the legend.

To serve this, we need to expose resource-count data that can be aggregated over time and by resource type. The natural home for this is Prometheus metrics scraped on an interval, which gives us the time dimension for free.

Current state

There is currently no Prometheus / metrics instrumentation anywhere in the workspace:

  • No prometheus / metrics / opentelemetry dependency in any Cargo.toml (0 occurrences in Cargo.lock).
  • No /metrics endpoint. The only observability surface today is health/probe endpoints in helios-rest (/health, /_liveness, /_readiness in crates/rest/src/handlers/health.rs).
  • Resource counting exists only as on-demand query logic, never recorded as a metric:
    • count(...) on the storage trait (crates/persistence/src/core/storage.rs:368)
    • count_resources(tenant, resource_type) and list_resource_types(...) (crates/persistence/src/search/reindex.rs:109, used only for reindex progress)
    • FHIR _summary=count (crates/rest/src/handlers/search.rs:519)
  • ROADMAP.md:236 currently marks "No Prometheus/OpenTelemetry integration … Not planned" — this issue would supersede that line, so we should update the roadmap as part of the work.

Note: despite the phrasing "extend our Prometheus metrics," there are no existing metrics to extend — this is foundational work to stand up the metrics facility, with the resource-count metric as its first consumer.

Proposed work

  1. Add a Prometheus metrics facility to the server:

    • Add a metrics registry (e.g. prometheus or metrics + metrics-exporter-prometheus).
    • Expose a GET /metrics endpoint in helios-rest (text exposition format), wired into the hfs binary. Decide auth posture (likely auth-exempt like /health, or gated — see open questions).
  2. Emit a resource-count metric by type (and tenant):

    • A gauge such as hfs_fhir_resources{resource_type="Observation", tenant="..."} reflecting the current count per resource type per tenant. (Prometheus naming conventions reserve the _total suffix for counters, so the gauge name omits it.)
    • Reuse the existing count_resources / list_resource_types persistence logic to populate it, refreshed on a scrape/interval basis (avoid expensive per-scrape full counts on large stores — consider a cached/periodically-refreshed value).
  3. Serve the chart's time series:

    • With Prometheus scraping the gauge over time, the "resources over time" chart is a range query against Prometheus.
    • Confirm whether the UI queries Prometheus directly or whether HFS should expose a small aggregation/query endpoint for the UI to consume (see open questions).
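Steps 1 and 2 could be sketched roughly as follows. This is a minimal, std-only illustration, not the proposed implementation: a real facility would more likely use the prometheus or metrics crates mentioned above. ResourceCountCache, get_or_refresh, escape_label and render_metrics are hypothetical names, the gauge is called hfs_fhir_resources on the assumption that the _total suffix is left to counters, and the load closure stands in for whatever wraps count_resources / list_resource_types.

```rust
use std::collections::BTreeMap;
use std::fmt::Write as _;
use std::time::{Duration, Instant};

/// Key for one gauge sample: (tenant, resource_type).
type SeriesKey = (String, String);

/// Cached resource counts, reloaded at most once per `ttl` so that a scrape
/// never triggers a full count more often than that.
struct ResourceCountCache {
    ttl: Duration,
    refreshed_at: Option<Instant>,
    counts: BTreeMap<SeriesKey, u64>,
}

impl ResourceCountCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, refreshed_at: None, counts: BTreeMap::new() }
    }

    /// Returns the cached counts, calling `load` only when the cache has
    /// never been filled or is at least `ttl` old. `load` is a stand-in
    /// (assumption) for the persistence-layer counting logic.
    fn get_or_refresh<F>(&mut self, now: Instant, load: F) -> &BTreeMap<SeriesKey, u64>
    where
        F: FnOnce() -> BTreeMap<SeriesKey, u64>,
    {
        let stale = match self.refreshed_at {
            None => true,
            Some(t) => now.duration_since(t) >= self.ttl,
        };
        if stale {
            self.counts = load();
            self.refreshed_at = Some(now);
        }
        &self.counts
    }
}

/// Escapes a label value for the text exposition format
/// (backslash, double quote and newline).
fn escape_label(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Renders the counts as a gauge in the Prometheus text exposition format,
/// i.e. the body a GET /metrics handler would return.
fn render_metrics(counts: &BTreeMap<SeriesKey, u64>) -> String {
    let mut out = String::new();
    out.push_str("# HELP hfs_fhir_resources Current number of stored FHIR resources.\n");
    out.push_str("# TYPE hfs_fhir_resources gauge\n");
    for ((tenant, resource_type), count) in counts {
        // Writing to a String cannot fail, so the result is ignored.
        let _ = writeln!(
            out,
            "hfs_fhir_resources{{resource_type=\"{}\",tenant=\"{}\"}} {}",
            escape_label(resource_type),
            escape_label(tenant),
            count
        );
    }
    out
}
```

A /metrics handler would call get_or_refresh with the current Instant and pass the result to render_metrics. Taking the time as a parameter keeps the staleness check testable without sleeping.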

Chart reference (UI mockup)

Cumulative area chart, x-axis = date, y-axis = resource count (0–50k in the mock), one series per selected resource type, multi-select of resource types, legend shows current totals (e.g. Patient 1,204 · Observation 38,910 · Encounter 9,3xx).
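Because the gauge already holds the running total, plotting its scraped values over time yields the cumulative curve directly. As a hedged sketch of how the chart's series might be requested, the helper below builds a URL for Prometheus's /api/v1/query_range HTTP API. The gauge name hfs_fhir_resources, the base URL, and the function names are assumptions for illustration. The regex matcher is built by joining type names with "|", which is only safe because FHIR resource type names are plain alphanumerics.

```rust
/// Percent-encodes everything except RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Builds a PromQL expression with one series per selected resource type,
/// summed across tenants. The metric name is an assumption.
fn resource_count_promql(resource_types: &[&str]) -> String {
    let matcher = resource_types.join("|");
    format!(
        "sum by (resource_type) (hfs_fhir_resources{{resource_type=~\"{}\"}})",
        matcher
    )
}

/// Builds a range-query URL; start and end are Unix timestamps in seconds,
/// step is the resolution in seconds.
fn range_query_url(
    base: &str,
    promql: &str,
    start_unix: u64,
    end_unix: u64,
    step_secs: u64,
) -> String {
    format!(
        "{}/api/v1/query_range?query={}&start={}&end={}&step={}",
        base.trim_end_matches('/'),
        percent_encode(promql),
        start_unix,
        end_unix,
        step_secs
    )
}
```

The legend's current totals would then be the last sample of each returned series, or the result of an instant query using the same expression.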

Open questions

  • Should /metrics be auth-exempt (like /health) or protected? Multi-tenant metrics may leak tenant existence/volume.
  • Per-tenant label cardinality — acceptable, or aggregate at the exposition layer?
  • Does the UI query Prometheus directly (needs a Prometheus deployment + query proxy), or should HFS provide a purpose-built time-series endpoint for the chart? This affects whether we also need historical storage inside HFS vs. relying on Prometheus retention.
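For the cardinality and tenant-leak questions, one option is to drop the tenant label before exposition. The sketch below sums per-tenant counts into a single total per resource type. The function name and the (tenant, resource_type) key shape are assumptions for illustration, not existing HFS APIs.

```rust
use std::collections::BTreeMap;

/// Sums per-(tenant, resource_type) counts into one total per resource type,
/// so that the exposed series carry no tenant label.
fn aggregate_by_resource_type(
    per_tenant: &BTreeMap<(String, String), u64>,
) -> BTreeMap<String, u64> {
    let mut totals = BTreeMap::new();
    for ((_tenant, resource_type), count) in per_tenant {
        *totals.entry(resource_type.clone()).or_insert(0u64) += count;
    }
    totals
}
```

This bounds the number of series by the number of resource types and hides tenant existence and volume, at the cost of losing per-tenant breakdowns in the chart.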

Labels

enhancement
