Skip to content

Optimize type checking range lookup with binary search#7

Open
DebanKsahu wants to merge 7 commits into
LukasNiessen:mainfrom
DebanKsahu:perf/_in_type_checking
Open

Optimize type checking range lookup with binary search#7
DebanKsahu wants to merge 7 commits into
LukasNiessen:mainfrom
DebanKsahu:perf/_in_type_checking

Conversation

@DebanKsahu

Copy link
Copy Markdown
Contributor

Pull Request

Key changes include:

Functional Improvements:

  • Improved the logic for checking if an AST node is inside a TYPE_CHECKING block by sorting the ranges and using bisect_right for efficient range lookup in _in_type_checking, enhancing both correctness and performance.
  • Ensured that the list of ranges for TYPE_CHECKING blocks is always sorted before use, which is necessary for the new binary search approach.

Code Style and Readability:

  • All other changes are due to ruff format and ruff check --fix

Change Type

  • Bug fix
  • Feature
  • Documentation
  • Refactoring
  • Improvement

@DebanKsahu DebanKsahu requested a review from LukasNiessen as a code owner May 28, 2026 05:37
@lukasniessen-bain

Copy link
Copy Markdown
Collaborator

Thanks for the contribution. I don’t think we should merge the binary-search optimization for _in_type_checking() as part of this PR.

The current implementation uses any(start <= lineno <= end for start, end in ranges), which is very straightforward and easy to reason about. In practice, I would expect the number of TYPE_CHECKING blocks per file to be very small, often zero or one, so the performance benefit from replacing this with sorting plus bisect_right() is likely to be negligible.

Even if the new implementation is correct, it adds complexity to code that is currently simple. It also makes the behavior harder to reason about in edge cases such as overlapping or nested ranges, because the binary-search version only checks one candidate range instead of naturally checking all ranges. From a KISS perspective, I don’t think the likely performance gain justifies the added complexity and maintenance cost here.

The normalization change looks reasonable to me, though: precomputing the normalized Python file set avoids rebuilding it repeatedly and keeps the code simple.

Could you please update the PR so it only includes the normalized file set change, and leaves the _in_type_checking() implementation as-is?

@DebanKsahu

Copy link
Copy Markdown
Contributor Author

Sure.

…ith binary search

Now `_find_type_checking_ranges()` will return a range array sorted by start line, and the lineno for any import node will be matched using binary search with the help of the `bisect` library for faster lookup.

Currently, this assumes that there are no overlapping ranges in the ranges array. Overlapping ranges can occur only when there are nested `if TYPE_CHECKING:` blocks.
Use `isinstance()` to ensure `node.lineno` is an integer before checking TYPE_CHECKING ranges.

This avoids returning `Any` from `_in_type_checking()` and fixes the mypy `no-any-return` error.
Replaced the inline normalization of file paths with a precomputed set
to improve performance and reduce redundant computations during graph
extraction.
@DebanKsahu DebanKsahu force-pushed the perf/_in_type_checking branch from 1ae06b8 to bab0ec2 Compare June 15, 2026 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants