Skip to content

fix(sof-export): clean up partial results on cancel, plus a TTL reaper (#144)#180

Open
mauripunzueta wants to merge 4 commits into
mainfrom
fix/144-sof-export-cleanup-cancelled
Open

fix(sof-export): clean up partial results on cancel, plus a TTL reaper (#144)#180
mauripunzueta wants to merge 4 commits into
mainfrom
fix/144-sof-export-cleanup-cancelled

Conversation

@mauripunzueta

Copy link
Copy Markdown
Contributor

Summary

Closes #144 and addresses the adjacent storage-hygiene gaps it surfaced in the SQL-on-FHIR async export controller ($viewdefinition-export / $sqlquery-export).

Previously, cancelling a job transitioned it to Cancelled but its already-written output shards were never deleted and stayed downloadable; and finished jobs were never reclaimed at all — the in-process jobs map and the sink output grew unbounded for the process lifetime, despite the completion manifest advertising a 24h Expires.

What changed

1. Cancellation cleanup (#144 as scoped)

  • New delete_job(&job_id) on the ExportSink trait, implemented for all three sinks:
    • FilesystemSinkremove_dir_all on the job dir (missing dir is not an error)
    • InMemorySink — drops entries under the {job_id}/ key prefix
    • S3Sink — paginated list_objects_v2 + delete_object under the job key prefix (idempotent)
  • Called from the cancellation path, and again from the background task if it finishes a job that was cancelled mid-run (covers the write-after-cancel race).
  • read_shard is gated on terminal state so a cancelled job's files 404 even while deletion is still draining.

2. TTL reaper for finished jobs

  • A background reaper deletes a terminal job's output and drops its bookkeeping once it ages past the TTL, bounding both storage and the in-memory map. Aligned with the manifest's advertised 24h Expires.
  • JobStatus::Cancelled now carries cancelled_at, and a terminal_at() helper lets all three terminal states age uniformly.
  • New config: HFS_EXPORT_OUTPUT_TTL (default 86400) and HFS_EXPORT_CLEANUP_INTERVAL (default 300). The interval is clamped to ≥ 1s (tokio::time::interval panics on zero).

3. Failed-job partial cleanup

  • A failed job's result URL returns 500 with no manifest, so its shards are unreachable but were never deleted. They're now removed on failure, and read_shard gates Failed alongside Cancelled.

Deliberately not changed

DELETE on a completed job remains a no-op that preserves the Completed state (the status URL keeps redirecting to the result manifest), per test_export_cancel_after_completion_preserves_completed_state. Completed output is reclaimed by the TTL reaper above, not by DELETE.

Testing

  • New unit tests: cancel_deletes_partial_output_and_download_404s, reap_expired_reclaims_terminal_jobs_only (age threshold, running-immunity, output deletion + bookkeeping removal).
  • Full helios-rest lib suite (282) and the sof_export integration suite (44) pass, including the completion-preservation test.
  • cargo fmt clean; cargo clippy --features s3 --all-targets -D warnings clean.

🤖 Generated with Claude Code

…lled

Per the SQL-on-FHIR operations-common spec (HL7/sql-on-fhir#365), a server
SHOULD clean up partial results when an export is cancelled via DELETE on the
status URL. Previously, cancelling a $viewdefinition-export / $sqlquery-export
job transitioned it to Cancelled but already-written output shards were never
deleted and remained downloadable via GET /export/{job_id}/{filename}.

- Add `delete_job(&self, job_id)` to the `ExportSink` trait and implement it
  for all three sinks: FilesystemSink (remove_dir_all), InMemorySink (drop
  matching keys), S3Sink (paginated list + delete under the job key prefix).
- Call it from the cancellation path, and again from the background task when
  it finishes a job that was cancelled mid-run (covers the write-after-cancel
  race).
- Gate `read_shard` on Cancelled status so a cancelled job's files 404 even
  while deletion is still draining.

Closes #144
…ials

Follow-up to the cancellation cleanup. Two adjacent storage-hygiene gaps in the
SoF async export controller, found while reviewing #144:

1. No retention at all. The completion manifest advertises a 24h `Expires`, but
   nothing ever reclaimed finished jobs — the in-memory `jobs` map and the sink
   output grew unbounded for the process lifetime. Add a background reaper
   (HFS_EXPORT_OUTPUT_TTL, default 24h; HFS_EXPORT_CLEANUP_INTERVAL, default
   300s) that deletes a terminal job's output and drops its bookkeeping once it
   ages past the TTL. `JobStatus::Cancelled` now carries `cancelled_at` and a
   `terminal_at()` helper lets all three terminal states age uniformly.

2. Failed jobs left orphaned partial shards. A failed job's result URL returns
   500 with no manifest, so its shards are unreachable but were never deleted.
   The background task now deletes them on failure, and `read_shard` gates
   `Failed` (alongside `Cancelled`) so a racing poll never serves stale output.

DELETE-on-completed is intentionally left as a no-op that preserves the
Completed state (per test_export_cancel_after_completion_preserves_completed_state);
completed output is reclaimed by the TTL reaper above, not by DELETE.

Tests: reap_expired_reclaims_terminal_jobs_only. Full helios-rest lib suite (282)
and the sof_export integration suite (44) pass; clippy clean with --features s3.
Document the output lifecycle at the module level and flesh out the
per-sink delete_job contracts (idempotency, key-prefix semantics, and why
the S3 path bridges to the blocking pool). Comments only — no behavior change.
@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

…2026-0185

Two ambient CI breakages on origin/main, unrelated to this PR's changes:

- clippy 1.91 newly flags collapsible_else_if in sof/emit.rs; collapse the
  nested else { if .. } into else if.
- cargo audit fails on RUSTSEC-2026-0185 (quinn-proto remote memory
  exhaustion), a transitive reqwest QUIC dep. We never accept inbound QUIC,
  so the reassembly path is unreachable; ignore it with justification.
@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 91.50943% with 18 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/rest/src/export/sink.rs 41.66% 7 Missing ⚠️
crates/rest/src/export/in_memory.rs 96.12% 6 Missing ⚠️
crates/persistence/src/sof/emit.rs 87.50% 3 Missing ⚠️
crates/rest/src/export/controller.rs 71.42% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@smunini

smunini commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@claude and @mauripunzueta how does this change work in a clustered multi hfs environment?

@claude

claude Bot commented Jul 1, 2026

Copy link
Copy Markdown

Claude finished @smunini's task in 1m 15s —— View job


Todo List

  • Read trigger comment and gather context
  • Investigate how export job state, sink storage, and the TTL reaper behave across multiple hfs instances
  • Answer the clustered/multi-instance question

How this works in a clustered multi-hfs environment

Short answer: it doesn't, cleanly — this export controller ($viewdefinition-export / $sqlquery-export) is single-instance by design, and this PR doesn't change that. It's a pre-existing property of the feature, not a regression introduced here.

Why:

  • InMemoryController (crates/rest/src/export/in_memory.rs:55-64) tracks all job state in a DashMap<JobId, JobStatus> (plus a job_tenants map) that lives entirely in one process's heap. It's constructed once per hfs process in build_app() (crates/rest/src/lib.rs:517-597) — there is no shared/external job store.
  • Each job's work runs in a tokio::spawn task on the instance that received the kick-off POST (ExportJobController::submit, controller.rs:202).
  • The TTL reaper added in this PR (spawn_cleanup in in_memory.rs:117-131) only walks that process's jobs/job_tenants maps — it has no visibility into jobs submitted to sibling instances.
  • Filesystem output (FilesystemSink, default sink) writes to local disk (export_dir, default ./exports) unless that path happens to be a shared volume across pods — not the case in a typical stateless container deployment.

Practical consequence behind a load balancer without sticky sessions:

  1. POST .../$viewdefinition-export lands on instance A, which starts the job and returns Content-Location: /export/{job-id}/status.
  2. A subsequent GET on that status URL may land on instance B. B's jobs map has never heard of {job-id}, so get_status returns None → the handler responds 404, even though the job is running fine on A.
  3. Same problem for DELETE (cancel) and the download route — they only succeed if routed back to the instance that owns the job.
  4. If using S3Sink, the output bytes are centrally visible to all instances (good), but the job bookkeeping (status/progress/manifest, and this PR's cancel/reap deletion logic) is still local to instance A, so B still 404s on status/cancel even though it could technically read/delete the S3 objects if it had the job's metadata.

Mitigations today (not implemented in this PR):

  • Session affinity / sticky routing at the load balancer, keyed on job ID or client IP, so all requests for a given export land on the instance that started it.
  • Run the SOF export feature on a single instance (or scale it out as an independent deployment) while horizontally scaling the rest of the FHIR API.

Contrast with the codebase's other export system: $export (FHIR Bulk Data Access, crates/persistence/src/core/bulk_export_worker.rs) is built for clustering — job state lives in BulkExportJobStore, backed by the same SQLite/Postgres database as FHIR resources, with claim/lease semantics (HFS_BULK_EXPORT_LEASE_DURATION, HFS_BULK_EXPORT_HEARTBEAT_INTERVAL) so any instance can pick up work and any instance can answer a status/cancel/download request. See .claude/skills/bulk-data-export/SKILL.md's "Multi-instance Recipe" for the documented pattern (Postgres + S3/MinIO). The SOF export controller in this PR doesn't share that architecture — it would need an equivalent shared-store rework (DB- or Redis-backed job state, plus routing cancel/reap through it) to be safe in a clustered deployment. That's a reasonable follow-up issue if multi-instance SOF export is a real deployment target, but it's out of scope for #144/this PR, which is only fixing cleanup hygiene within the existing single-instance model.

No code changes made — this was a question, not an implementation request.

@smunini

smunini commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@aacruzgon and @mauripunzueta , this is another case where we need a cluster-capable job store. Relates to #169 and #170 and #150

We need a discussion document that describes the areas of hfs that need to be modified, via an environment variable config setting to use a unified cluster-capable job store and document it's use for those who want to run hfs in a cluster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SoF export: clean up partial results when a job is cancelled

2 participants