fix: improve FM index query performance by jackye1995 · Pull Request #7507 · lance-format/lance

jackye1995 · 2026-06-28T15:49:46Z

Summary:

Improve FM contains queries so normal reads demand-load needed wavelet blocks and page nearby blocks instead of prewarming whole partitions.
Add chunked explicit FM prewarm, parallel partition loading/search, and resumable partition builds.
Add FM contains benchmark and FM index management tooling.
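
The parallel partition search mentioned in the summary can be pictured with a minimal sketch. It assumes a hypothetical `Partition` type and uses plain `str::contains` in place of a real FM-index backward search; none of these names come from the Lance code base.

```rust
use std::thread;

/// Hypothetical stand-in for one FM partition: row ids paired with their text.
struct Partition {
    rows: Vec<(u64, String)>,
}

/// Search a single partition.
/// Assumption: plain substring matching replaces the real FM-index search.
fn search_partition(partition: &Partition, pattern: &str) -> Vec<u64> {
    partition
        .rows
        .iter()
        .filter(|(_, text)| text.contains(pattern))
        .map(|(row_id, _)| *row_id)
        .collect()
}

/// Search every partition on its own scoped thread, then merge the hits
/// and keep the `k` smallest row ids.
fn parallel_contains(partitions: &[Partition], pattern: &str, k: usize) -> Vec<u64> {
    let mut hits: Vec<u64> = thread::scope(|scope| {
        // Spawn all searches first so they run concurrently.
        let handles: Vec<_> = partitions
            .iter()
            .map(|p| scope.spawn(move || search_partition(p, pattern)))
            .collect();
        // Join in order and flatten the per-partition results.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("search thread panicked"))
            .collect()
    });
    hits.sort_unstable();
    hits.truncate(k);
    hits
}
```

One thread per partition is only reasonable for a toy input; with 1,000 partitions a bounded worker pool or async task set would be the more likely choice.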

Optimizations applied:

Parallel FM search across partitions/segments.
Removed query-time full-partition prewarm; explicit prewarm still fully warms the index.
Chunked contiguous wavelet-row prewarm with LANCE_FMINDEX_PREWARM_CHUNK_BYTES and LANCE_FMINDEX_PREWARM_CHUNK_CONCURRENCY.
Cold-read demand paging with LANCE_FMINDEX_DEMAND_PAGE_BYTES to reduce object-store RPS.
Parallelized FM metadata/partition loading.
Added resumable FM partition creation and explicit --index-uuid recovery support.
Final benchmark layout: 1 logical FM segment, LANCE_FMINDEX_PARTITION_ROWS=100000, large partition-byte cap, 1,000 partitions.
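
To make the demand-paging and chunked-prewarm items concrete, here is a rough sketch of the range arithmetic involved. The environment variable names are taken from the list above, but how Lance actually interprets them, the helper names, and any default values are assumptions made for illustration.

```rust
use std::ops::Range;

/// Read a byte-size knob such as LANCE_FMINDEX_DEMAND_PAGE_BYTES from the
/// environment. Assumption: unset, unparsable, or zero values fall back to
/// `default`; the real parsing rules may differ.
fn env_bytes(name: &str, default: u64) -> u64 {
    std::env::var(name)
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .filter(|v| *v > 0)
        .unwrap_or(default)
}

/// Demand paging: widen a requested byte range to page boundaries so that
/// nearby blocks arrive in the same object-store request. The end is clamped
/// to the file length.
fn page_aligned_range(requested: Range<u64>, page_bytes: u64, file_len: u64) -> Range<u64> {
    assert!(page_bytes > 0, "page size must be positive");
    let start = requested.start / page_bytes * page_bytes;
    let end = (requested.end + page_bytes - 1) / page_bytes * page_bytes;
    start..end.min(file_len)
}

/// Chunked prewarm: split one contiguous range into pieces of at most
/// `chunk_bytes`, which a caller could then fetch with bounded concurrency.
fn prewarm_chunks(total: Range<u64>, chunk_bytes: u64) -> Vec<Range<u64>> {
    assert!(chunk_bytes > 0, "chunk size must be positive");
    let mut chunks = Vec::new();
    let mut start = total.start;
    while start < total.end {
        let end = (start + chunk_bytes).min(total.end);
        chunks.push(start..end);
        start = end;
    }
    chunks
}
```

A larger page trades extra bytes per read for fewer requests, which is the object-store RPS reduction the list refers to.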

Performance:
Dataset: 100M-row az://datasets/mmlb/mmlb_100m_fts_en_fm_20260626.lance.
Query workload: 4 sampled 5-term patterns from summary_in_image, contains(full_content, pattern), k=100, _rowid only (projection=[], row_id=true).

Run	Index layout	Prewarm	Query result
Baseline	6 segments / 10k partitions	6,525s / 108.8m	1t: 1.58 qps, mean 633ms, p95 768ms; 8t: 12.48 qps, mean 320ms, p95 320ms
Demand-load + chunked prewarm	6 segments / 10k partitions	2,182s / 36.4m, 3.0x faster	1t: 5.38 qps, mean 185ms, p95 307ms; 8t: 12.12 qps, mean 326ms, p95 330ms
Single segment	1 segment / 100k-row partitions	356s / 5.94m, 18.3x faster than baseline	1t: 22.38 qps, mean 44.6ms, p95 54.4ms; 8t: 84.40 qps, mean 42.8ms, p95 47.3ms

Index size:

Final FM index UUID 78600545-625f-40e2-8790-204f35097ec0: 1,000 partition files, 920,064,767,792 bytes / 856.9 GiB / 0.837 TiB.
Previous 6-segment FM layout was 920,202,650,467 bytes / 857.0 GiB, so the final relayout is effectively size-neutral.
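
The size figures can be double-checked with a few lines of arithmetic. The helper names below are illustrative only; the byte counts are the ones quoted above.

```rust
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

/// Convert a byte count to binary gibibytes.
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / GIB
}

/// Signed relative change from `before` to `after` (for example -0.01 is -1%).
fn relative_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64
}
```

With the two reported sizes, the single-segment layout is about 138 MB smaller, a change of roughly 0.015%, which supports the size-neutral claim.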

codecov · 2026-06-28T16:38:56Z

Codecov Report

❌ Patch coverage is 56.38629% with 560 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance/src/bin/fm_contains_bench.rs	0.00%	379 Missing ⚠️
rust/lance/src/bin/fm_index_tool.rs	0.00%	98 Missing ⚠️
rust/lance-index/src/scalar/fmindex.rs	89.71%	45 Missing and 38 partials ⚠️


Xuanwo · 2026-06-29T08:18:53Z

 ) -> Result<()> {
    let texts = std::mem::replace(partition, Vec::with_capacity(max_rows.min(PARTITION_SIZE)));
    *partition_bytes = 0;
+    if let Some(file) = completed_files.remove(&partition_id) {


Reusing an existing partition solely by partition_id can publish stale data. If a retry uses the same UUID after the input or partition sizing changed, this drops the freshly built texts and returns the old file; the loader later scans all part_*_fm.lance files in the directory, so stale partitions can still be included in exact contains results.
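
One possible way to guard against this, sketched under assumptions: record a fingerprint of the partition's input next to each completed file and reuse the file only when the fingerprint still matches. `PartitionFingerprint`, `CompletedFile`, and `reusable_file` are hypothetical names, and this is not necessarily how the follow-up commit hardened the resume path.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical summary of the input that produced a partition file.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PartitionFingerprint {
    rows: usize,
    bytes: usize,
    content_hash: u64,
}

/// Fingerprint the texts of one partition.
/// Note: `DefaultHasher` output is not guaranteed stable across Rust releases,
/// so a fingerprint persisted to storage would need a fixed hash function.
fn fingerprint(texts: &[String]) -> PartitionFingerprint {
    let mut hasher = DefaultHasher::new();
    texts.hash(&mut hasher);
    PartitionFingerprint {
        rows: texts.len(),
        bytes: texts.iter().map(|t| t.len()).sum(),
        content_hash: hasher.finish(),
    }
}

/// Hypothetical record of a partition file written by an earlier attempt.
struct CompletedFile {
    path: String,
    fingerprint: PartitionFingerprint,
}

/// Return the old file's path only if it was built from the same input.
/// A mismatch yields `None`, so the caller rebuilds from the fresh texts.
fn reusable_file(
    completed: &mut HashMap<u64, CompletedFile>,
    partition_id: u64,
    texts: &[String],
) -> Option<String> {
    let file = completed.remove(&partition_id)?;
    if file.fingerprint == fingerprint(texts) {
        Some(file.path)
    } else {
        None
    }
}
```

This covers only the reuse decision. Because the loader scans every part_*_fm.lance file in the directory, leftover files from a different partition sizing would still need to be deleted or excluded before the index is published.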

Xuanwo · 2026-06-29T08:18:53Z

+
+    #[arg(
+        long,
+        default_value = "az://datasets/mmlb/mmlb_100m_fts_en_fm_20260626.lance"


This write-capable tool defaults to a real Azure dataset. With ambient credentials, running fm_index_tool drop or create without --uri can delete or replace the shared index because those actions call drop_index / .replace(true).
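
A minimal sketch of one way to remove the risk: let read-only actions fall back to a default, but refuse destructive actions unless the URI is given explicitly. The `Action` enum (including the `Stats` variant) and `resolve_uri` are hypothetical and use only the standard library, not the tool's actual argument parser.

```rust
/// Hypothetical subset of tool actions. `Create` and `Drop` correspond to the
/// create and drop actions named in the review; `Stats` is an invented
/// read-only action for contrast.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Stats,
    Create,
    Drop,
}

impl Action {
    /// Actions that can replace or delete an index.
    fn is_destructive(self) -> bool {
        matches!(self, Action::Create | Action::Drop)
    }
}

/// Pick the dataset URI for an action.
/// An explicit URI always wins; a default is allowed only for read-only actions.
fn resolve_uri(
    action: Action,
    explicit_uri: Option<&str>,
    read_only_default: &str,
) -> Result<String, String> {
    match (explicit_uri, action.is_destructive()) {
        (Some(uri), _) => Ok(uri.to_string()),
        (None, false) => Ok(read_only_default.to_string()),
        (None, true) => Err(format!("{action:?} requires an explicit --uri")),
    }
}
```

The simpler alternative is to drop the default value entirely and make the URI a required argument for every action.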

Xuanwo

Thank you!

jackye1995 added 9 commits June 28, 2026 08:48

perf(index): parallelize FMIndex query path

108c46b

perf: add direct FM contains benchmark

3c1e57a

fix: parse FM benchmark thread list

9dd9f3e

fix: demand load fm index queries

4a7e2f7

fix: prewarm fm index partitions in chunks

e2c3107

fix: tune fm index prewarm chunks

159f6d8

fix: reduce fm index cold read rps

1690b21

chore: add fm index management tool

d2ddc94

fix: resume fm index partition builds

18e2ee3

github-actions Bot added the A-index (Vector index, linalg, tokenizer) and bug (Something isn't working) labels Jun 28, 2026

fix: gate fm index tool behind cli feature

0a43448

Xuanwo reviewed Jun 29, 2026


fix: harden fm index resume and rank boundary

e9f4ede

Xuanwo approved these changes Jun 29, 2026


jackye1995 merged commit 6d02a57 into lance-format:main Jun 29, 2026
30 checks passed

jackye1995 merged 11 commits into lance-format:main from jackye1995:jack/fix-fmindex-query-performance
