Skip per-tick sampling of profiler-internal threads #5955
Before vs. after: cpu_sampling_overhead and serialization_overhead drop, and all regular samples are skipped for the 2 profiler-internal threads.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b3d4ae0f42
// situation we stop immediately and never even start the sampling trigger loop.
if (state->stop_thread == rb_thread_current()) return Qnil;

thread_context_collector_mark_current_thread_as_profiler_internal();
Avoid charging sampler sleep as overhead
When no_signals_workaround_enabled is true, run_sampling_trigger_loop calls grab_gvl_and_sample() from this same worker thread, which makes the CpuAndWallTimeWorker thread the current_thread in thread_context_collector_sample. Because this line now marks that thread as internal, the new skip path returns before updating its previous-sample timestamps. The record_sampling_overhead(state, current_thread_context) call that immediately follows then consumes the whole interval since the last tick as profiler overhead. No-signal profiles will therefore attribute the worker's sleep/wait time to overhead instead of recording one internal-thread sample per period.
Alas, this was a nice and simple change but it won't be so simple anymore 😓
CpuAndWallTimeWorker and IdleSamplingHelper threads are now sampled only once per profiling period (via the on-serialize flush) instead of every 10ms tick, reducing profiling overhead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benchmarks
Benchmark execution time: 2026-06-25 15:49:45
Comparing candidate commit e2c83f2 in PR branch
Found 0 performance improvements and 0 performance regressions! Performance is the same for 48 metrics, 1 unstable metrics.
In no-signals workaround mode, the worker thread itself runs sampling via grab_gvl_and_sample. Since the worker is now skipped in the sample loop (timestamps not updated), record_sampling_overhead would attribute the entire sleep interval as overhead. Fix by temporarily setting timestamps to sample-start, then restoring them so on_serialize can still accumulate the full reporting period.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
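The save/override/restore idea described in this commit message can be sketched as below. This is only an illustration under stated assumptions: `worker_ctx`, `record_overhead`, `tick_without_fix`, and `tick_with_fix` are hypothetical names, a single wall-clock timestamp stands in for whatever timestamps the real collector tracks, and `record_overhead` is a simplified stand-in for `record_sampling_overhead`, not its actual implementation.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical, simplified per-thread state for the worker thread.
typedef struct {
  int64_t wall_time_at_previous_sample_ns; // last time this thread was sampled
  int64_t overhead_recorded_ns;            // time charged as profiler overhead
} worker_ctx;

// Simplified stand-in for overhead accounting: charges everything since the
// previous-sample timestamp as overhead.
static void record_overhead(worker_ctx *ctx, int64_t now_ns) {
  assert(now_ns >= ctx->wall_time_at_previous_sample_ns);
  ctx->overhead_recorded_ns += now_ns - ctx->wall_time_at_previous_sample_ns;
}

// Problem case: the skipped worker never had its timestamp updated, so the
// whole sleep interval since the last update is charged as overhead.
static void tick_without_fix(worker_ctx *ctx, int64_t sample_end_ns) {
  record_overhead(ctx, sample_end_ns);
}

// Fix sketch: point the timestamp at sample-start just for the overhead
// calculation, then restore it so the once-per-period flush still sees the
// full reporting period.
static void tick_with_fix(worker_ctx *ctx, int64_t sample_start_ns, int64_t sample_end_ns) {
  int64_t saved = ctx->wall_time_at_previous_sample_ns;
  ctx->wall_time_at_previous_sample_ns = sample_start_ns;
  record_overhead(ctx, sample_end_ns);
  ctx->wall_time_at_previous_sample_ns = saved;
}
```

In this sketch, only the time spent between sample-start and sample-end is charged as overhead with the fix, and the saved timestamp is left untouched for the later flush.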
…iler-internal threads Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mples on_serialize now force-samples profiler-internal threads, which could satisfy the samples.any? check before any signal-triggered sample ran. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CpuAndWallTimeWorker and IdleSamplingHelper threads are now sampled only once per profiling period (via the on-serialize flush) instead of every 10ms tick, reducing profiling overhead.
What does this PR do?
^
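As a rough illustration of the behavior described above, the following sketch skips profiler-internal threads on every tick and samples them once when the period is flushed. All names here (`thread_ctx`, `record_sample`, `sample_on_tick`, `flush_on_serialize`) are hypothetical and heavily simplified; they are assumptions for the example and do not reflect the actual collector code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical, simplified per-thread state.
typedef struct {
  bool is_profiler_internal;               // set for profiler-internal threads
  int64_t wall_time_at_previous_sample_ns; // timestamp of the last recorded sample
  int64_t samples_recorded;                // number of samples emitted
  int64_t wall_time_recorded_ns;           // total wall time attributed so far
} thread_ctx;

// Records one sample covering the time elapsed since the previous sample.
static void record_sample(thread_ctx *ctx, int64_t now_ns) {
  assert(now_ns >= ctx->wall_time_at_previous_sample_ns);
  ctx->wall_time_recorded_ns += now_ns - ctx->wall_time_at_previous_sample_ns;
  ctx->wall_time_at_previous_sample_ns = now_ns;
  ctx->samples_recorded++;
}

// Runs on every sampling tick: internal threads are skipped, and their
// timestamp is deliberately left alone so elapsed time keeps accumulating.
static void sample_on_tick(thread_ctx *ctx, int64_t now_ns) {
  if (ctx->is_profiler_internal) return;
  record_sample(ctx, now_ns);
}

// Runs once per profiling period: internal threads get a single sample that
// covers the whole period.
static void flush_on_serialize(thread_ctx *ctx, int64_t now_ns) {
  if (ctx->is_profiler_internal) record_sample(ctx, now_ns);
}
```

With this shape, an internal thread still accounts for the full period of wall time, but in one sample rather than one per tick.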
Motivation:
Reduce profiler overhead
Change log entry
Yes, Changed: Profiling: Reduce profiler overhead by sampling profiler-internal threads only once per minute.
Additional Notes:
How to test the change?