Improve native trace exporter#5817
Conversation
|
|
Thank you for updating Change log entry section 👏 Visited at: 2026-06-04 08:12:23 UTC |
d5fa393 to
bd141e3
Compare
b49856e to
2b00aa3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf8ee694ff
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
d8f4277 to
8bf4660
Compare
bf8ee69 to
444d0e5
Compare
Introduce the `Datadog::Tracing::Transport::Native::TracerSpan` class backed by libdatadog's `ddog_TracerSpan` FFI. `TracerSpan._native_from_span(span)` reads instance variables directly from a `Datadog::Tracing::Span` (`@name`, `@service`, `@meta`, etc.) and constructs a Rust-heap-allocated span via the C FFI, including: - Scalar fields (name, service, resource, type, IDs, timestamps) - 128-bit trace ID split via `trace_id_t` struct into (low, high) 64-bit halves - String tags (meta) and numeric tags (metrics) via safe hash iteration that stashes errors instead of raising inside `rb_hash_foreach` This is the first part of the trace exporter C extension; `TraceExporter` and `Response` classes will follow.
Introduce the `Datadog::Tracing::Transport::Native::TraceExporter` class backed by libdatadog's `ddog_TraceExporter` FFI. `TraceExporter._native_new(url, tracer_version, language, ...)` builds a Rust `TraceExporter` through the config/build pattern: - Validates argument types upfront (before allocating Rust resources) - Creates a `ddog_TraceExporterConfig`, applies each setting via `SET_CONFIG` macro with cleanup-on-error - Builds the exporter and wraps it in Ruby `TypedData` The `TraceExporter` is freed via `ddog_trace_exporter_free` on GC. `Response` and `_native_send_traces` will follow.
Introduce `Datadog::Tracing::Transport::Native::Response` with the full set of predicate methods expected by the `Writer`: - `ok?`, `internal_error?`, `server_error?`, `client_error?`, `not_found?`, `unsupported?` - `trace_count` and `payload` (agent response body for sampling rate feedback) Two C helpers build responses from the native side: - `create_ok_response`: success with optional payload - `create_error_response`: classifies `ddog_TraceExporterErrorCode` into client/server/internal error categories These will be used by `_native_send_traces` in the next step.
Implement `TraceExporter#_native_send_traces(traces)` which takes an `Array<Array<Span>>`, converts each span to Rust via the C FFI, groups them into trace chunks, and sends them to the Datadog Agent. Key safety patterns: - `rb_ensure` guarantees Rust-allocated trace chunks are freed if a Ruby exception fires during span conversion - `rb_thread_call_without_gvl2` releases the GVL during the blocking network send so other Ruby threads can run - The "gvl2" variant is used (not plain `gvl`) to prevent automatic interrupt raising before Rust response/error objects are inspected and freed - A retry loop handles the case where an interrupt causes `gvl2` to return before the send function actually runs The agent's response body (containing `rate_by_service` for sampling feedback) is extracted and surfaced via `Response#payload`. This completes the C extension for the native trace transport.
Classes created with `rb_define_class_under` are attached to their parent module's constant table, making them reachable by GC. Marking them as GC roots with `rb_global_variable` is unnecessary.
Delegate to the existing helper instead of duplicating the `RSTRING_PTR`/`RSTRING_LEN` construction and type check inline.
Use a typed function pointer and a `static inline` function instead of a preprocessor macro. Same codegen, easier to debug and type-check.
Adopt the same pattern used by the profiling extension: a `process_pending_interruptions` callback paired with a `check_if_pending_exception` wrapper that returns the pending state as an integer instead of requiring manual `rb_protect` at the call site.
Replace the `ddog_TraceExporterResponse **response_out` double pointer with a `ddog_TraceExporterResponse *response` field and pass its address to the FFI call. This removes one level of indirection and keeps all send-related state within the args struct.
Use a descriptive variable name for the cast data pointer.
Move error handling into `send_chunks_without_gvl`: extract the error code enum and free the `ddog_TraceExporterError` before returning. After reacquiring the GVL only the plain `error_code` and `failed` flag remain, eliminating the need to free Rust-allocated error objects on every code path.
Accept keyword arguments via `rb_scan_args(argc, argv, "0:", ...)` instead of 9 positional VALUE parameters. This makes call sites self-documenting and prevents silent argument ordering mistakes. The Ruby call becomes: TraceExporter._native_new(url:, tracer_version:, language:, ...)
Define Response in Ruby with keyword-argument `initialize` and `attr_reader` accessors. The C extension loads the class at init time via `rb_require` and calls `Response.new(ok:, ...)` through `rb_funcallv_kw` instead of manually setting instance variables. This gives Ruby's JIT better visibility into the object shape and reduces the amount of C code.
Count entries skipped due to unexpected types during hash iteration and log a warning after the loop completes. Previously these entries were silently dropped, making it difficult to diagnose missing tags.
Replace two `rb_funcall` calls (`to_i` + `nsec`) with a single `rb_time_timespec` call that extracts the underlying `struct timespec` directly from the Ruby Time object. Zero method dispatch overhead.
Replace two `rb_funcall` calls (`&` and `>>`) with direct extraction via `rb_big_pack`, which copies the Bignum magnitude into a word buffer without method dispatch or temporary object allocation. A Fixnum fast path handles the common case of 64-bit trace IDs with a simple `FIX2LONG` cast.
Construct a `ddog_TracerSpanFields` struct with designated initializers and pass it by reference to `ddog_tracer_span_new`. This matches the updated libdatadog FFI signature and is self-documenting at the call site.
Hoist `span_count` before `ddog_tracer_trace_chunks_begin_chunk` and pass it as the capacity argument to pre-allocate the inner span vector.
The error path is unreachable in practice (the handle is always valid), but the return value must still be freed to avoid leaking the Rust-allocated error object.
444d0e5 to
dc44d9b
Compare
8bf4660 to
44ba8ad
Compare
create_ok_response/create_error_response build the Response object from C with keyword arguments via rb_funcallv_kw + RB_PASS_KEYWORDS, both of which were introduced in Ruby 2.7. On Ruby 2.5/2.6 the extension failed to compile (implicit declaration of rb_funcallv_kw, RB_PASS_KEYWORDS undeclared), fatal under -Werror-implicit-function-declaration. Probe for the function in extconf.rb and, when absent, pass the kwargs hash as a trailing positional argument, which pre-2.7 Ruby treats as keywords.
dc44d9b to
0d9c573
Compare
0d9c573 to
20a4b5d
Compare
BenchmarksBenchmark execution time: 2026-06-17 17:43:53 Comparing candidate commit 128a133 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 48 metrics, 1 unstable metrics.
|
The native trace exporter / transport specs need the libdatadog_api extension, so they can't run in spec:main (no compile). Put them in a dedicated NATIVE_TRANSPORT_SPECS group and a new spec:native_transport task (enhance([:compile])): excluded from spec:main, listed in spec:all, and added to the Matrixfile. Keep them OUT of core_with_libdatadog_api: that pool also runs the fork-heavy ProcessDiscovery specs, and a live native exporter (SharedRuntime threads) present when ProcessDiscovery forks deadlocks the child until fork safety lands (#5835). A separate task runs them in their own process.
Switch from `ddog_TraceExporterError *` to `ddog_MaybeError` for all tracer construction functions (`span_new`, `set_meta`, `set_metric`, `trace_chunks_new`, `begin_chunk`, `push_span`). Use the existing `read_ddogerr_string_and_drop` helper via a new `check_maybe_error` inline function. `send_trace_chunks` retains `ddog_TraceExporterError` because callers need the error code for HTTP response classification.
Replace the separate `*const uint8_t` return + `out_len` out-parameter with a single `ddog_ByteSlice` struct, matching the updated libdatadog FFI signature.
The tracer construction functions return `ddog_TraceExporterError *` again, matching the libdatadog FFI. Restore `check_exporter_error` usage at the call sites and drop the `ddog_MaybeError`-based handling. This reverts the consumer-side `ddog_MaybeError` adaptation: the structured `ExporterError` type with error codes is retained upstream instead.
20a4b5d to
128a133
Compare
What does this PR do?
Brings the native trace exporter C extension (introduced in #5690) up to the
current libdatadog
mainFFI. Net effect: the response body is now extracted viaddog_trace_exporter_response_get_bodyreturning addog_ByteSlice(instead ofthe older raw pointer + out-length pair).
The branch history also records an exploration that was tried and reverted:
construction functions were briefly switched to
ddog_MaybeErrorand thenrestored to
ddog_TraceExporterError, because upstream keptExporterError(its
codeis needed to classify HTTP responses). The commits are kept ratherthan squashed to preserve that review trail.
Motivation:
#5690 is an intentional "historical intermediate point" that compiles against an
earlier libdatadog FFI. This PR is the focused diff that adapts it to the merged
mainAPI, kept as a separate PR (rather than folded into #5690) to preserve thereview history and cross-linking.
Part of the stack: #5690 -> #5817 -> #5699 -> #5835. A separate change
(#5859) bumps the
libdatadoggem to 35.0.0, which is the released version thatactually carries this
ByteSliceAPI.Change log entry
None.
Additional Notes:
AI was used to accelerate implementation; all code was reviewed and understood.
How to test the change?
Covered by the native transport specs introduced in the stack
(
spec/datadog/tracing/transport/native/). Building the C extension against alibdatadog that exposes the
ddog_ByteSliceget_bodyAPI exercises the change;CI runs the suite once the gem is bumped to 35.0.0 (#5859).