Skip to content

Type stats follow renaming#5805

Open
marcotc wants to merge 3 commits into
masterfrom
typing-stats-deterministic-comparison
Open

Type stats follow renaming#5805
marcotc wants to merge 3 commits into
masterfrom
typing-stats-deterministic-comparison

Conversation

@marcotc

@marcotc marcotc commented May 22, 2026

Copy link
Copy Markdown
Member

What does this PR do?

Makes typing stats comparison follow renames and line moves for RBS declarations and steep:ignore comments.

Motivation:

Without this change, type reports currently show file renames as "introduced + cleared".

Example:

Untyped methods (+3 -3)
Partially typed methods (+5 -5)

Change log entry

None.

Additional Notes:

Tests were added for .github/scripts/typing_stats_compare.rb, for both old and new behavior.
To ensure test run, I added it to the same Raketask as our custom Rubocop cops (spec:custom_cop), but renamed the task to spec:tooling.
These tests do not run on spec:main.

How to test the change?

bundle exec rake spec:tooling

@marcotc marcotc self-assigned this May 22, 2026
@marcotc marcotc added the AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos label May 22, 2026
@marcotc marcotc force-pushed the typing-stats-deterministic-comparison branch 2 times, most recently from 3fd76e5 to 4d05ff2 Compare May 22, 2026 05:33
@marcotc marcotc force-pushed the typing-stats-deterministic-comparison branch from 4d05ff2 to 258e456 Compare May 22, 2026 05:33
@marcotc marcotc marked this pull request as ready for review May 22, 2026 05:35
@marcotc marcotc requested a review from a team as a code owner May 22, 2026 05:35

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 258e456f4c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/scripts/typing_stats_compute.rb Outdated
path: path.to_s,
line: ignore.line
}
ignored_source = source_lines[ignore.line - 1]&.strip

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Distinguish block steep ignores by their ignored code

When the suppression is a block form (# steep:ignore:start), ignore.line points at the start comment itself, so ignored_source is also just that comment. Since the new comparison key intentionally ignores line numbers, two block suppressions with the same start comment in the same file become indistinguishable; removing one block and adding an unrelated block elsewhere in that file will cancel out and the typing-stats comment will hide the newly introduced suppression.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, now that we're not comparing with line numbers anymore but the content of the ignored lines, we should probably use everything that is between the steep:ignore:start and steep:ignore:end (which would also correctly report any change between start and end). But maybe this should be for another PR

@datadog-official

datadog-official Bot commented May 22, 2026

Copy link
Copy Markdown

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 97.07% (-0.03%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 6331a4e | Docs | Datadog PR Page | Give us feedback!

@vpellan vpellan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, thank you! I never really had time to go back to this tooling, but this should prevent a lot of noise due to renaming files or adding/removing lines. I still requested a change because as-is, this PR might not work as expected (see my comment in AST traversal)

Comment thread .github/scripts/typing_stats_compute.rb Outdated
path: path.to_s,
line: ignore.line
}
ignored_source = source_lines[ignore.line - 1]&.strip

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, now that we're not comparing with line numbers anymore but the content of the ignored lines, we should probably use everything that is between the steep:ignore:start and steep:ignore:end (which would also correctly report any change between start and end). But maybe this should be for another PR

Comment thread .github/scripts/typing_stats_compute.rb Outdated
::RBS::AST::Declarations::Class,
::RBS::AST::Declarations::Interface
ast_traversal(declaration.members, result)
ast_traversal(declaration.members, result, context + [declaration.name.to_s])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you rename a file, you'll usually also rename the corresponding Module, including in the .rbs file, which means the context will be different. I don't see any operation on context in typing_stats_compare.rb so I assume it will report them as two different declaration and not fix the issue you're trying to solve here. We could probably normalize the file name, and when comparing the context, check every module name against their corresponding renamed one (although there's most likely some edge cases but it would work most of the time)

I'd also add a test for that (the current tests do not rename modules)

@pr-commenter

pr-commenter Bot commented May 27, 2026

Copy link
Copy Markdown

Benchmarks

Benchmark execution time: 2026-05-27 21:51:05

Comparing candidate commit ba9e851 in PR branch typing-stats-deterministic-comparison with baseline commit f87530c in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 45 metrics, 1 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI Generated Largely based on code generated by an AI or LLM. This label is the same across all dd-trace-* repos

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants