Implement KmerTable in Rust #918

Open
padix-key wants to merge 2 commits into biotite-dev:main from padix-key:rust-kmertable

@padix-key

Member

Pursuing #688, this PR replaces KmerTable and BucketKmerTable with a cleaner and faster Rust implementation.

An important change in the implementation is the layout of the table itself. Previously, it was implemented as a table of pointers, where each pointer referred to a separately allocated array. Now all k-mers are part of the same data array, and the main table contains the offsets into this data array. This avoids millions of allocations and removes the need to store the length of each array, reducing the memory requirements substantially.
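To illustrate the layout change, here is a minimal sketch of an offset-based table. It is not the code from this PR: `FlatKmerTable`, `from_kmers`, `get`, and the `(u32, u32)` entry type are hypothetical names chosen for the example, and it assumes a single sequence whose k-mer codes are already computed.

```rust
/// Hypothetical flat k-mer table: one shared data array plus an offset table.
/// Illustrative only; the names and entry type are assumptions, not the PR's API.
struct FlatKmerTable {
    /// `offsets[k]..offsets[k + 1]` is the range of `data` that belongs to k-mer code `k`.
    offsets: Vec<usize>,
    /// (reference ID, sequence position) entries for all k-mers, grouped by k-mer code.
    data: Vec<(u32, u32)>,
}

impl FlatKmerTable {
    /// Build the table from the k-mer codes of one sequence.
    /// `n_kmers` is the number of distinct k-mer codes; every code must be `< n_kmers`.
    fn from_kmers(n_kmers: usize, ref_id: u32, kmers: &[usize]) -> Self {
        // Pass 1: count how often each k-mer code occurs.
        let mut offsets = vec![0usize; n_kmers + 1];
        for &kmer in kmers {
            offsets[kmer + 1] += 1;
        }
        // A prefix sum turns the counts into start offsets.
        for i in 0..n_kmers {
            offsets[i + 1] += offsets[i];
        }
        // Pass 2: write each entry into the next free slot of its k-mer.
        let mut cursor = offsets.clone();
        let mut data = vec![(0u32, 0u32); kmers.len()];
        for (pos, &kmer) in kmers.iter().enumerate() {
            data[cursor[kmer]] = (ref_id, pos as u32);
            cursor[kmer] += 1;
        }
        FlatKmerTable { offsets, data }
    }

    /// All entries for a k-mer code. The length follows from two adjacent
    /// offsets, so it does not need to be stored separately.
    fn get(&self, kmer: usize) -> &[(u32, u32)] {
        &self.data[self.offsets[kmer]..self.offsets[kmer + 1]]
    }
}
```

In this sketch the whole table consists of two allocations regardless of how many k-mers are indexed, whereas a pointer-based layout would need one allocation per non-empty k-mer.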

Furthermore, this PR fixes a bug where ignore masks were incorrectly applied to k-mers from spaced k-mer alphabets.

@codspeed-hq

codspeed-hq Bot commented Jun 27, 2026


Merging this PR will improve performance by 58.36%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

⚡ 27 improved benchmarks
❌ 2 (👁 2) regressed benchmarks
✅ 74 untouched benchmarks
⏩ 4 skipped benchmarks (see footnote 1)

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- |
| `test_pickle_and_unpickle[KmerTable-None]` | 37.7 ms | 9.4 ms | ×4 |
| `test_pickle_and_unpickle[KmerTable-11*11*1*1***111]` | 37.5 ms | 9.6 ms | ×3.9 |
| `benchmark_indexing_from_kmers[None-KmerTable]` | 11.3 ms | 5.2 ms | ×2.2 |
| `benchmark_indexing_from_kmers[11*11*1*1***111-KmerTable]` | 11.4 ms | 5.3 ms | ×2.2 |
| `benchmark_indexing_from_kmers[None-BucketKmerTable(100000)]` | 9.1 ms | 4.3 ms | ×2.1 |
| `benchmark_indexing_from_kmers[11*11*1*1***111-BucketKmerTable(100000)]` | 9.1 ms | 4.3 ms | ×2.1 |
| `benchmark_indexing_from_kmer_selection[11*11*1*1***111-KmerTable]` | 10.4 ms | 5.2 ms | ×2 |
| `benchmark_indexing_from_kmer_selection[None-KmerTable]` | 10.4 ms | 5.2 ms | +99.93% |
| `benchmark_indexing_from_kmer_selection[11*11*1*1***111-BucketKmerTable(100000)]` | 8.2 ms | 4.4 ms | +87.46% |
| `benchmark_indexing_from_kmer_selection[None-BucketKmerTable(100000)]` | 8.1 ms | 4.4 ms | +86.58% |
| `test_pickle_and_unpickle[BucketKmerTable(10000)-11*11*1*1***111]` | 7.3 ms | 4.1 ms | +75.6% |
| `benchmark_match_kmer_selection[KmerTable-11*11*1*1***111]` | 433.9 µs | 247.2 µs | +75.54% |
| `test_pickle_and_unpickle[BucketKmerTable(10000)-None]` | 7.3 ms | 4.2 ms | +75.5% |
| `benchmark_match_kmer_selection[KmerTable-None]` | 434.4 µs | 247.8 µs | +75.32% |
| `benchmark_match[KmerTable-None]` | 456.9 µs | 303.1 µs | +50.73% |
| `benchmark_match_kmer_selection[BucketKmerTable(10000)-11*11*1*1***111]` | 514.5 µs | 354.4 µs | +45.19% |
| `benchmark_match_kmer_selection[BucketKmerTable(10000)-None]` | 515.2 µs | 355.2 µs | +45.01% |
| `benchmark_match[KmerTable-11*11*1*1***111]` | 504.7 µs | 368 µs | +37.12% |
| `benchmark_match[BucketKmerTable(10000)-None]` | 540.1 µs | 424.7 µs | +27.15% |
| `benchmark_indexing_from_sequences[None-KmerTable]` | 10.8 ms | 8.5 ms | +27.12% |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing padix-key:rust-kmertable (e2219a2) with main (881eba6)

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.
