feat(otel): xtest tracing plugin + otdf-local Jaeger wiring (DSPX-3635)#549
dmihalcik-virtru wants to merge 1 commit
Add the client half of DSPX-3635 end-to-end tracing.

- otdf-local: `up --tracing` starts the Jaeger docker-compose profile and injects OTLP trace config into the generated platform and KAS configs; `env` exports OTEL_EXPORTER_OTLP_ENDPOINT when tracing is active.
- xtest: new fixtures/tracing.py wraps each test in a pytest.test span, propagates TRACEPARENT to the SDK CLI subprocess, and prints a Jaeger trace URL on failure. Enabled by --tracing or OTEL_EXPORTER_OTLP_ENDPOINT; a strict no-op otherwise.

Signed-off-by: Dave Mihalcik <dmihalcik@virtru.com>
Code Review
This pull request introduces opt-in end-to-end OpenTelemetry tracing support for the local test environment and integration tests, enabling trace propagation and Jaeger UI integration. The feedback highlights a critical issue where the Jaeger container can be left running after stopping services because the tracing profile is omitted during 'docker compose down'. Additionally, suggestions are provided to improve type safety in the tracing fixture by using 'Any' instead of 'object' to eliminate type ignores, and to expand test failure detection to cover the setup and teardown phases rather than just the main test execution.
    def _compose_cmd(self, *args: str) -> list[str]:
        """Build a `docker compose` command, opting into the `tracing` profile
        (which includes the Jaeger all-in-one container) when tracing is enabled.
        """
        cmd = ["docker", "compose", "-f", str(self._compose_file)]
        if self.settings.tracing:
            cmd += ["--profile", "tracing"]
        return cmd + list(args)
When running otdf-local down, the CLI runs in a new process where self.settings.tracing is False (since --tracing is only an option on the up command). Consequently, _compose_cmd will not include the tracing profile, and the Jaeger container will be left running (orphaned) after down. We should ensure the tracing profile is included when stopping services (i.e., when "down" is in the arguments).
Suggested change:

    def _compose_cmd(self, *args: str) -> list[str]:
        """Build a `docker compose` command, opting into the `tracing` profile
        (which includes the Jaeger all-in-one container) when tracing is enabled.
        """
        cmd = ["docker", "compose", "-f", str(self._compose_file)]
        if self.settings.tracing or "down" in args:
            cmd += ["--profile", "tracing"]
        return cmd + list(args)
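The effect of the suggested condition can be checked in isolation. The following is a minimal sketch, not the otdf-local implementation: `Settings` and the free function `compose_cmd` are hypothetical stand-ins for the real settings object and the `_compose_cmd` method, and the compose file name is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Settings:
    """Hypothetical stand-in for the otdf-local settings object."""

    tracing: bool = False


def compose_cmd(settings: Settings, compose_file: Path, *args: str) -> list[str]:
    """Build a `docker compose` command line.

    The `tracing` profile is added when tracing is enabled, and also for
    `down`, so a Jaeger container started by an earlier `up --tracing`
    is stopped even though this process never saw the flag.
    """
    cmd = ["docker", "compose", "-f", str(compose_file)]
    if settings.tracing or "down" in args:
        cmd += ["--profile", "tracing"]
    return cmd + list(args)


# A fresh process has tracing=False, as described in the comment above.
compose_file = Path("docker-compose.yaml")  # assumed file name
up_plain = compose_cmd(Settings(), compose_file, "up", "-d")
down_plain = compose_cmd(Settings(), compose_file, "down")
up_traced = compose_cmd(Settings(tracing=True), compose_file, "up", "-d")
```

With this condition `down` always names the profile, while a plain `up` still leaves Jaeger out.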
    import logging
    import os
    from collections.abc import Iterator
    from dataclasses import dataclass
Import Any from typing to allow proper typing of the TracingSession dataclass fields without requiring module-level imports of OpenTelemetry packages.
Suggested change:

    import logging
    import os
    from collections.abc import Iterator
    from dataclasses import dataclass
    from typing import Any
    tracer: object  # opentelemetry.trace.Tracer
    provider: object  # opentelemetry.sdk.trace.TracerProvider
Typing tracer and provider as object requires using # type: ignore[attr-defined] when calling methods on them. Typing them as Any avoids type-checking errors and allows removing the type ignores.
Suggested change:

    tracer: Any  # opentelemetry.trace.Tracer
    provider: Any  # opentelemetry.sdk.trace.TracerProvider
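As a rough illustration of why `Any` removes the need for the ignores, the sketch below types both fields as `Any` and calls a method on one of them. `FakeTracer` and its `start_span` method are hypothetical, used only so the example runs without OpenTelemetry installed; the real fields hold OpenTelemetry objects.

```python
from dataclasses import dataclass
from typing import Any


class FakeTracer:
    """Hypothetical stand-in so the sketch runs without OpenTelemetry."""

    def start_span(self, name: str) -> str:
        return f"span:{name}"


@dataclass
class TracingSession:
    # `Any` defers attribute checks to runtime, so type checkers accept
    # method calls on these fields without a `# type: ignore` comment.
    tracer: Any  # opentelemetry.trace.Tracer in the real fixture
    provider: Any  # opentelemetry.sdk.trace.TracerProvider in the real fixture


session = TracingSession(tracer=FakeTracer(), provider=None)
span_name = session.tracer.start_span("pytest.test")
```

The trade-off is that `Any` turns off checking for these fields altogether. A stricter option, if desired, would be to import the OpenTelemetry types only under `typing.TYPE_CHECKING` and use string annotations, which keeps the runtime import lazy.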
    from opentelemetry.propagate import inject

    tracer = _tracing.tracer
    with tracer.start_as_current_span("pytest.test") as span:  # type: ignore[attr-defined]
    rep = getattr(request.node, "rep_call", None)
    if rep is not None and rep.failed:
        span.set_status(trace.Status(trace.StatusCode.ERROR))
        print(f"\nTrace: {trace_url}")
Currently, only rep_call is checked to determine if the test failed. However, integration tests can also fail during the setup or teardown phases (e.g., if a fixture fails to initialize or clean up). We should check all three phases (setup, call, teardown) to ensure the span status is correctly set to ERROR and the trace URL is printed for any failure.
Suggested change:

    failed = False
    for phase in ("setup", "call", "teardown"):
        rep = getattr(request.node, f"rep_{phase}", None)
        if rep is not None and rep.failed:
            failed = True
            break
    if failed:
        span.set_status(trace.Status(trace.StatusCode.ERROR))
        print(f"\nTrace: {trace_url}")
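The phase loop can be factored into a small helper and exercised without pytest. This is a sketch under one assumption: that a `pytest_runtest_makereport` hookwrapper in `conftest.py` stores each phase report on the item as `rep_setup`, `rep_call`, and `rep_teardown`, which the `rep_call` lookup in the fixture already implies. The helper name `any_phase_failed` and the `SimpleNamespace` nodes are hypothetical.

```python
from types import SimpleNamespace

PHASES = ("setup", "call", "teardown")


def any_phase_failed(node: object) -> bool:
    """Return True if any recorded phase report on `node` failed.

    Phases with no report yet (attribute missing) are skipped, so the
    helper is safe to call before all three phases have completed.
    """
    for phase in PHASES:
        rep = getattr(node, f"rep_{phase}", None)
        if rep is not None and rep.failed:
            return True
    return False


# Stand-in nodes for illustration; real ones are pytest items.
setup_broke = SimpleNamespace(rep_setup=SimpleNamespace(failed=True))
call_broke = SimpleNamespace(
    rep_setup=SimpleNamespace(failed=False),
    rep_call=SimpleNamespace(failed=True),
)
all_passed = SimpleNamespace(
    rep_setup=SimpleNamespace(failed=False),
    rep_call=SimpleNamespace(failed=False),
)
```

One caveat worth checking against the actual fixture: code after a fixture's `yield` runs during the teardown phase, so the teardown report is typically not available there yet. In that position the helper would effectively cover setup and call only.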



DSPX-3635 — End-to-end OpenTelemetry tracing (test harness)
Part of the DSPX-3635 effort to thread a single distributed trace from the xtest
pytest harness → SDK CLI → platform/KAS, so a failing test links directly to one
Jaeger trace showing the full request chain.
Companion PR: opentdf/platform#3722 (Go CLI propagation + platform log correlation)
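For readers unfamiliar with the hand-off between the harness and the CLI, the sketch below shows the shape of the data involved, using only the standard library. It is illustrative rather than the fixture's code: the fixture imports `opentelemetry.propagate.inject`, whereas here the W3C `traceparent` value is formatted by hand. The function names, the `localhost` UI host, and the example IDs (taken from the W3C Trace Context specification) are assumptions; port 16686 is the Jaeger UI port named in this PR.

```python
from __future__ import annotations

import os


def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Render a W3C Trace Context `traceparent` value (version 00)."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"


def child_env(traceparent: str, base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy an environment and add TRACEPARENT for a subprocess to inherit."""
    env = dict(os.environ if base is None else base)
    env["TRACEPARENT"] = traceparent
    return env


def jaeger_trace_url(trace_id: int, ui_base: str = "http://localhost:16686") -> str:
    """Build a Jaeger UI link for one trace (host is an assumption)."""
    return f"{ui_base}/trace/{trace_id:032x}"


# Example IDs from the W3C Trace Context specification.
trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736
span_id = 0x00F067AA0BA902B7

traceparent = format_traceparent(trace_id, span_id)
env = child_env(traceparent, base={"PATH": "/usr/bin"})
trace_url = jaeger_trace_url(trace_id)
```

Passing `env` to `subprocess.run(..., env=env)` would let a CLI that reads `TRACEPARENT` continue the same trace, which is what allows a single Jaeger link to cover the whole chain.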
What's here
- otdf-local `up --tracing` starts the existing Jaeger `tracing` docker-compose profile (UI :16686, OTLP :4317); no new container.
- Injects `server.trace` OTLP config into the generated platform and each KAS config (`settings.trace_config_updates()`), reusing the existing `copy_yaml_with_updates` seam.
- otdf-local `env` exports `OTEL_EXPORTER_OTLP_ENDPOINT` when the running config has tracing enabled.
- xtest `fixtures/tracing.py` wraps each test in a `pytest.test` span (attrs: `test.name`/`sdk`/`container`), injects `TRACEPARENT` into the env the SDK CLI subprocess inherits, and prints `Trace: <jaeger-url>` on failure.
- Enabled by `--tracing` or `OTEL_EXPORTER_OTLP_ENDPOINT`; a strict no-op otherwise (no OTel imports/cost).

Testing
- `ruff check` / `ruff format` / `pyright` clean on `otdf-local` and `xtest`.
- `--tracing` registered on both `pytest` and `otdf-local up`.

How to run the loop
Expect a `Trace:` URL per test showing `pytest.test → cli.encrypt/decrypt → platform Rewrap → KAS Rewrap`, with matching `trace_id` in the platform/KAS logs.

Out of scope (follow-ups)