feat(otel): end-to-end tracing — Go CLI propagation + platform log correlation (DSPX-3635)#3722
dmihalcik-virtru wants to merge 2 commits into
Thread end-to-end OpenTelemetry tracing through the Go CLI and add trace/log correlation on the platform, per DSPX-3635.

- otdfctl: new pkg/tracing (no-op unless OTEL_EXPORTER_OTLP_ENDPOINT is set), a root span per command that continues TRACEPARENT from the environment, and an otelconnect client interceptor via sdk.WithExtraClientOptions so platform/KAS calls carry the trace. Context is threaded into CreateTDFContext (encrypt) and Reader.Init (decrypt) so the KAS Rewrap joins the trace. A cli.OnExit hook flushes spans on os.Exit paths.
- service/logger: ContextHandler stamps trace_id/span_id from the active span onto every log record.

Signed-off-by: Dave Mihalcik <dmihalcik@virtru.com>
Summary of Changes

This pull request implements end-to-end OpenTelemetry tracing for the Go CLI, so a distributed trace can be threaded from the test harness through the CLI and into the platform and KAS services. Stamping trace and span IDs onto log records and propagating context correctly improves observability and makes distributed requests easier to debug.
Code Review
This pull request introduces end-to-end OpenTelemetry tracing support across the Go CLI (otdfctl), platform, and KAS, allowing trace propagation from client commands to downstream services and correlating logs with active traces. Feedback on the changes suggests filling or removing empty placeholder sections in the design specification document (spec/DSPX-3635.md) and logging a warning or debug message if OpenTelemetry initialization fails instead of silently ignoring the error.
spec/DSPX-3635.md (excerpt):

## Problem / Motivation
_Why does this work need to happen? What is the user/business pain?_

## Proposed Solution
_What will you build, at a functional level? Sketch the approach._

## Inputs / Outputs / Contracts
_Function signatures, data shapes, API contracts, CLI flags._

## Edge Cases & Constraints
_Boundary conditions, error states, performance limits, security considerations._

## Out of Scope
_What this work item explicitly does not cover._

## Acceptance Criteria
- [ ] _Clear, testable condition_
- [ ] _…_
The specification file contains several empty placeholder sections (e.g., Problem / Motivation, Proposed Solution, Inputs / Outputs / Contracts, etc.) with template instructions. Since the summary section at the top already contains a lot of details, please either fill in these sections with the relevant details or remove them to keep the document clean and professional.
exporter, err := otlptracegrpc.New(ctx, exporterOptions(endpoint)...)
if err != nil {
	// Tracing is best-effort: never block the CLI because a collector is
	// unreachable or misconfigured.
	return func() {}, false
}
When OTEL_EXPORTER_OTLP_ENDPOINT is set but otlptracegrpc.New fails to initialize (e.g., due to an invalid endpoint format), the error is silently ignored. Since slog is already initialized before tracing.Init is called, it would be highly beneficial to log a debug or warning message here to aid in troubleshooting misconfigured tracing environments.
DSPX-3635 — End-to-end OpenTelemetry tracing (platform + Go CLI)
Part of the DSPX-3635 effort to thread a single distributed trace from the xtest
pytest harness → SDK CLI → platform/KAS, and to correlate logs with traces.
Companion PR: opentdf/tests#549 (pytest plugin + otdf-local Jaeger wiring)

What's here
- pkg/tracing: OTLP/gRPC tracer init that is a strict no-op unless OTEL_EXPORTER_OTLP_ENDPOINT is set (no startup cost or failure for normal users).
- A root span per command in cmd/root.go, continuing an inbound TRACEPARENT from the environment.
- An otelconnect client interceptor via sdk.WithExtraClientOptions, so platform/KAS calls carry the trace.
- Context threading: encrypt uses CreateTDFContext, and decrypt calls Reader.Init(ctx) before io.Copy (the SDK otherwise unwraps with context.Background()).
- A cli.OnExit hook flushes spans on os.Exit paths. Defers don't run there, and failing decrypts are exactly when the trace is wanted.
- service/logger: ContextHandler now stamps trace_id/span_id from the active span onto every log record, giving log↔trace correlation.

Testing
- go build ./... clean (otdfctl + service); go test passing for logger and otdfctl/pkg/{tracing,handlers,cli}; new unit tests added.
- golangci-lint clean on changed packages.

Out of scope (follow-ups)
java-sdk/cmdline) and JS Node CLI (web-sdk/cli) propagation.