APMSP-3555 DI: add timeout to regexp evaluation in EL matches operator#5908
APMSP-3555 DI: add timeout to regexp evaluation in EL matches operator#5908p-datadog wants to merge 11 commits into
Conversation
The matches operator in the Dynamic Instrumentation expression language compiles a string into a Regexp and applies it. Pathological patterns (catastrophic backtracking) can otherwise run for many seconds on short inputs and stall the thread executing the probe. This change caps the regexp evaluation time at 500ms per matches call: - On Ruby 3.2+, Regexp.new is called with the timeout: keyword, which interrupts the matcher at every backtrack step. The engine raises Regexp::TimeoutError when the budget is exhausted. - On Ruby < 3.2, the matcher is wrapped in Timeout.timeout(0.5). The Onigmo engine on those versions polls for pending interrupts during matching, so Timeout.timeout's Thread#raise does bound the runtime, though with ~100ms granularity (verified locally on Ruby 2.6 through 3.1). The new spec at spec/datadog/di/el/evaluator_spec.rb parameterizes the timeout value over two cases (0.2s and 1.0s) and asserts that the observed wall-clock elapsed time tracks the configured value. Running both cases together demonstrates that the timeout is actually controlling behavior rather than the regexp finishing naturally or some unrelated bound being hit. Verified locally on Ruby 2.5 (DI-skipped), 2.6, 2.7, 3.0, 3.1, 3.2, 3.3, and 3.4.
|
Thank you for updating Change log entry section 👏 Visited at: 2026-06-23 21:03:13 UTC |
Typing analysisNote: Ignored files are excluded from the next sections.
|
BenchmarksBenchmark execution time: 2026-06-25 16:48:39 Comparing candidate commit e15c7e9 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 48 metrics, 1 unstable metrics.
|
The previous spec used the bare literals 0.05, 0.6, and 1.5 for the lower-bound clock-skew margin and the per-case upper bounds. Replace them with two named locals: - clock_skew_margin_seconds = 0.05 - overshoot_budget_seconds = 0.5 The upper bound for each case is now derived as timeout_seconds + overshoot_budget_seconds instead of being passed as a positional argument to the shared example, so both bounds in both cases come from the same named constants applied uniformly to each configured timeout. Also named the haystack_length local (was '* 30' inline) and added a short comment on the pattern explaining why it triggers catastrophic backtracking on every supported Ruby.
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: e15c7e9 | Docs | Datadog PR Page | Give us feedback! |
|
@codex review |
There was a problem hiding this comment.
Pull request overview
This PR adds a runtime cap to Dynamic Instrumentation’s Expression Language matches operator to prevent pathological regular expressions (catastrophic backtracking) from stalling probe execution threads. It implements a 500ms-per-call timeout using Ruby 3.2+’s in-engine regexp timeout, with a Timeout.timeout fallback on older Rubies, and adds coverage to ensure the cap is effective.
Changes:
- Add
MATCHES_TIMEOUT_SECONDS = 0.5and enforce it inEvaluator#matches(Regexp engine timeout on Ruby 3.2+,Timeout.timeoutfallback otherwise). - Add a new spec that verifies normal matching behavior and that a pathological pattern is interrupted near the configured timeout.
- Update the RBS signature to include the new timeout constant.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
lib/datadog/di/el/evaluator.rb |
Enforces a per-call timeout for EL matches depending on Ruby version. |
spec/datadog/di/el/evaluator_spec.rb |
Adds regression tests for both correct matching and timeout behavior on pathological regexps. |
sig/datadog/di/el/evaluator.rbs |
Declares the new MATCHES_TIMEOUT_SECONDS constant in the Evaluator signature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f234821e14
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Three review-driven changes to Evaluator#matches:
1. Switch from `haystack =~ re` to `re.match?(haystack)` so the
thread-local match data ($~, $1, ...) is not mutated as a side
effect of DI expression evaluation. The customer thread runs through
DI evaluation; previously, any subsequent customer code reading $~
on that thread would see DI-controlled match data.
2. Rewrite the method-body comment to be honest about the Ruby < 3.2
path: Onigmo's interrupt-poll mechanism reliably bounds many
pathological patterns but not all (see Ruby bug #18144). The old
wording ("~100ms granularity") framed this as a precision issue
when in practice some patterns are not bounded at all.
3. Add a YARD docstring to #matches covering @param, @return, and the
two @raise variants (Regexp::TimeoutError on Ruby 3.2+,
Timeout::Error on older Rubies).
No behavior change for well-formed patterns. Tests continue to pass:
- spec/datadog/di/el/evaluator_spec.rb: 4 examples, 0 failures
- spec/datadog/di/el/integration_spec.rb: 143 examples, 0 failures
The `matches` operator recompiled its Regexp on every probe firing
(`Regexp.new(needle, timeout:)` / `Regexp.compile(needle)` ran per call),
on the application thread between enter_probe/leave_probe. When the needle
is a string literal -- the common case, e.g. `matches(ref('var'),
"hello[a-z]")` -- the pattern is known at expression-compile time and need
not be recompiled per call.
The Compiler now precompiles literal needles once during compilation
(Compiler#precompile_regexp), stores them on the Compiler (`#regexps`),
threads them through Expression into the Evaluator instance, and emits
`matches_compiled(<haystack>, <index>)` instead of `matches(...)`. The
per-match timeout (MATCHES_TIMEOUT_SECONDS) is baked into the compiled
Regexp on Ruby 3.2+ exactly as before; on Ruby < 3.2 the Timeout.timeout
wrapping stays at match time. Both the runtime and precompiled paths share
Evaluator.compile_regexp (build) and the private #apply_match (apply with
timeout), so the version-split logic lives in one place.
Dynamically-computed needles (non-literal second argument) keep compiling
at evaluation time via #matches. Invalid literal patterns fall back to the
runtime path so they still raise at evaluation time as before, rather than
moving the RegexpError to expression-compile time.
Tests: extended evaluator_spec to cover #matches_compiled correctness and
timeout, the Compiler precompilation/dynamic/invalid-fallback branches, and
parametrized the timeout examples over both paths. integration_spec now
passes compiler.regexps to Expression so its literal `matches` case
exercises the precompiled path end to end.
Verified functionally on Ruby 3.2.3 via a standalone harness exercising the
real EL classes: compile -> Expression -> evaluate for literal needles,
matches/matches_compiled correctness (incl. nil haystack -> false), and
timeout firing (Regexp::TimeoutError, ~0.2s) on both paths. rspec, Steep,
and the Ruby < 3.2 Timeout.timeout path were not run locally (bundle install
fails with Bundler::PermissionError writing the gem dir); the di:di_with_ext
CI matrix (2.6-4.0) validates those.
The matches-precompilation change accumulated precompiled Regexps in a Compiler instance variable (@regexps) populated as a side effect of the recursive compile walk, then pulled back out by the caller via an attr_reader and threaded into Expression. That made a previously stateless AST->String transformer stateful, smuggled a second result out through an accessor instead of a return value, and let regexps accumulate across calls if a Compiler instance were reused (no guard enforced one-compile-per-instance). Make Compiler#compile return [code, regexps] and thread the regexp accumulator as an explicit local parameter through compile_partial and precompile_regexp. The Compiler holds no per-compilation state and is safe to reuse. Callers destructure the pair; Expression and Evaluator are unchanged. Verified: spec/datadog/di/el/{evaluator,integration}_spec.rb (154 examples, 0 failures) and the touched probe_notification_builder evaluate_template example pass; steep check clean on compiler.rb, expression.rb, probe_builder.rb. (The probe_notification_builder exception_message failures are the uncompiled libdatadog_api C extension, unrelated to this change.)
Address review feedback on the matches-timeout change:
- Type `matches_compiled`/`apply_match` haystack as String in the RBS,
matching the sibling `matches` signature, instead of untyped. Verified
with `steep check`.
- Remove `matches` from TWO_ARG_METHODS: it is now intercepted by the
dedicated `when 'matches'` branch (which precedes the generic two-arg
branch), so the membership was dead. Add a comment noting the special
casing.
- Fix the stale `compiled:` documentation in the string.yml `matches`
integration case: a literal needle now compiles to
`matches_compiled(ref('var'), 0)`, not `matches(...)`.
spec/datadog/di/el (integration + evaluator) passes, 154 examples.
Replace post-construction setter injection (@evaluator.regexps =) with constructor injection. The Evaluator now takes regexps in initialize and exposes it read-only; the public attr_writer and the lazy `@regexps ||= []` default are gone, so regexps is set once and cannot be mutated afterward. Steep types Class#new as () -> untyped and cannot see the inherited initializer through the dynamically created Evaluator subclass in Expression, so the cls.new(regexps) call carries a documented steep:ignore. The RBS for initialize is also corrected: it previously claimed a Context parameter the implementation never had. spec/datadog/di/el, probe_notification_builder, probe_builder pass (182 examples); steep check clean on the three el files.
|
@codex review |
| # frozen_string_literal: true | ||
|
|
||
| require 'timeout' if RUBY_VERSION < '3.2' | ||
|
|
| def self.compile_regexp(needle) | ||
| if RUBY_VERSION >= '3.2' | ||
| Regexp.new(needle, timeout: MATCHES_TIMEOUT_SECONDS) | ||
| else | ||
| Regexp.compile(needle) | ||
| end | ||
| end |
| def apply_match(re, haystack) | ||
| if RUBY_VERSION >= '3.2' | ||
| re.match?(haystack) | ||
| else | ||
| Timeout.timeout(MATCHES_TIMEOUT_SECONDS) do | ||
| re.match?(haystack) | ||
| end | ||
| end | ||
| end |
| # @param dsl_expr [String] human-readable DSL form, kept for debugging. | ||
| # @param compiled_expr [String] Ruby source produced by Compiler#compile. | ||
| # @param regexps [Array<Regexp>] regexps precompiled from literal | ||
| # `matches` needles (Compiler#regexps), looked up by the compiled | ||
| # expression via Evaluator#matches_compiled. |
| let(:expected_timeout_error) do | ||
| if RUBY_VERSION >= '3.2' | ||
| Regexp::TimeoutError | ||
| else | ||
| Timeout::Error | ||
| end | ||
| end |
|
Codex Review: Didn't find any major issues. 🚀 Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Replace bare RUBY_VERSION string comparisons in the EL matches timeout code with Datadog::RubyVersion.is?, the codebase's canonical version check. Datadog::RubyVersion.is? compares via Gem::Version semantics, avoiding the lexical-comparison hazard of bare RUBY_VERSION (e.g. "3.2.10" < "3.2.3" as strings) and matching the existing usage in di/base.rb and di/component.rb. Adds require_relative '../../ruby_version' to evaluator.rb because the load-time `require 'timeout'` guard now calls the helper before the rest of the library is loaded; ruby_version.rb only depends on rubygems, so it is safe to require on the DI path. Verified: steep check clean (full-context run including the inline-checked ruby_version.rb); spec/datadog/di/el 154 examples, 0 failures.
What does this PR do?
Caps the regexp evaluation time of the
matchesoperator in the Dynamic Instrumentation expression language (lib/datadog/di/el/evaluator.rb) at 500ms per call.Regexp.new(needle, timeout: 0.5). The regexp engine itself enforces the cap at every backtrack step and raisesRegexp::TimeoutErrorwhen the budget is exhausted.Regexp.compile(needle)is preserved and the=~is wrapped inTimeout.timeout(0.5). Onigmo on those versions polls for pending interrupts during matching, soTimeout.timeout'sThread#raisedoes bound the runtime (~100ms granularity).The timeout is a class-level constant
Datadog::DI::EL::Evaluator::MATCHES_TIMEOUT_SECONDS, kept private (@api private) like the rest of the evaluator.Motivation:
matchescompiles a string into a Regexp and applies it. Pathological patterns (catastrophic backtracking) can otherwise run for many seconds on short haystacks and stall the thread executing the probe before the probe's post-execution circuit breaker (check_and_disable_if_exceeded,lib/datadog/di/instrumenter.rb:598) gets a chance to fire.Capping at the regexp engine bounds the operator's wall-clock cost independently of pattern shape or haystack content.
Change log entry
Yes. DI: cap regexp evaluation time in the expression language
matchesoperator at 500ms.Additional Notes:
The fix is a bit ugly but a better fix requires rewriting the entire compilation and evaluation logic which is not in scope for this PR.
Regexp.new(..., timeout:)is documented at https://docs.ruby-lang.org/en/3.2/Regexp.html#class-Regexp-label-Timeout. It interrupts the matcher itself, unlikeTimeout.timeout, which uses a separate sleeper thread and delivers the interrupt viaThread#raise. For thematchesoperator both are safe because the matcher is a side-effect-free pure computation, and the EL is restricted to side-effect-free operators by design.How to test the change?
New spec at
spec/datadog/di/el/evaluator_spec.rbcovers:^([a-z]+)*\1$against('a' * 30) + '1'— over 5 seconds baseline on every supported Ruby) is interrupted after approximately the configured timeout.For case (2) the timeout is parameterized over two values (0.2s and 1.0s) via
stub_constonMATCHES_TIMEOUT_SECONDS. Each case asserts that the elapsed wall-clock time tracks the configured timeout. Running both cases together demonstrates that the timeout is actually controlling behavior rather than the regexp finishing naturally or some unrelated bound being hit.The spec was verified locally on Ruby 2.5 (skipped — DI requires 2.6+), 2.6, 2.7, 3.0, 3.1, 3.2, 3.3, and 3.4. The existing
spec/datadog/di/el/integration_spec.rb(143 examples, including thematchescases inintegration_cases/string.yml) continues to pass.