
feat(fts)!: make v2 the default index format #7512

Open

BubbleCal wants to merge 3 commits into main from yang/fts-v2-default-index-param


BubbleCal (Contributor) commented Jun 29, 2026

Feature

What is the new feature?

This makes FTS v2 the default format for newly created inverted / full-text indexes. It also replaces the previous environment-variable switch with an explicit format_version parameter at index creation.

Why do we need this feature?

FTS v2 is the latest format, but format selection should be part of the index creation API instead of process-wide environment state. Existing v1 indexes must remain queryable and must continue to be maintained as v1 after append, incremental indexing, and optimize.

How does it work?

  • Defaults new FTS index creation to v2.
  • Adds explicit format_version handling in Rust params and exposes it through Python and Java index creation APIs.
  • Preserves the stored FTS format when deriving params from existing indexes.
  • Ensures mem-wal / maintained-index paths use the resolved index format instead of the default.
  • Restores the FTS format from IndexMetadata.index_version when rebuilding the mem-wal FTS config. index_version 0 (legacy) and 1 map to v1, and 2 maps to v2.
  • Updates compatibility tests so old wheels can still use LANCE_FTS_FORMAT_VERSION while new code uses explicit format_version.
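The format-resolution rules above can be sketched as a small pure function. This is a minimal Python sketch under stated assumptions, not the Rust implementation in the PR. `FtsFormat`, `FtsParams`, `fts_format_from_index_version`, and `params_for_existing_index` are hypothetical names. Two rules come from the description: the index_version mapping (0 and 1 to v1, 2 to v2), and the requirement that maintenance paths use the stored format rather than the default.

```python
from dataclasses import dataclass
from enum import IntEnum


class FtsFormat(IntEnum):
    """Hypothetical enum for the two FTS index layouts."""
    V1 = 1
    V2 = 2


def fts_format_from_index_version(index_version: int) -> FtsFormat:
    """Map a stored IndexMetadata.index_version to an FTS layout.

    Follows the mapping in the PR description: legacy 0 and 1 mean v1,
    and 2 means v2. Unknown values raise instead of silently defaulting.
    """
    if index_version in (0, 1):
        return FtsFormat.V1
    if index_version == 2:
        return FtsFormat.V2
    raise ValueError(f"unknown FTS index_version: {index_version}")


@dataclass(frozen=True)
class FtsParams:
    """Hypothetical stand-in for the inverted-index params."""
    # Only brand-new indexes get this default.
    format_version: FtsFormat = FtsFormat.V2


def params_for_existing_index(stored_index_version: int) -> FtsParams:
    """Params for append, optimize, or a mem-wal flush of an existing index.

    The layout comes from the stored metadata, never from the default,
    so a v1 index is rewritten as v1.
    """
    return FtsParams(format_version=fts_format_from_index_version(stored_index_version))


assert params_for_existing_index(0).format_version is FtsFormat.V1
assert params_for_existing_index(1).format_version is FtsFormat.V1
assert params_for_existing_index(2).format_version is FtsFormat.V2
assert FtsParams().format_version is FtsFormat.V2
```

Raising on an unknown version is a design choice in this sketch. Falling back to the default would reintroduce the silent v1-to-v2 flip that the PR is guarding against.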

Breaking Change

BREAKING CHANGE: Newly created FTS / inverted indexes now default to v2 instead of v1. Workflows that require the v1 layout, including compatibility with older Lance readers, must pass format_version=1. LANCE_FTS_FORMAT_VERSION no longer controls the format of indexes created by this version of Lance.
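The new creation-time contract can be illustrated with a hypothetical resolver. It assumes the supported versions are exactly 1 and 2 and that the default is v2, as stated above. `resolve_new_index_format`, `DEFAULT_FTS_FORMAT_VERSION`, and `SUPPORTED_FTS_FORMAT_VERSIONS` are illustrative names, not Lance APIs. The sketch only shows that the explicit parameter, not the environment variable, decides the layout.

```python
import os
from typing import Optional

DEFAULT_FTS_FORMAT_VERSION = 2
SUPPORTED_FTS_FORMAT_VERSIONS = (1, 2)


def resolve_new_index_format(format_version: Optional[int] = None) -> int:
    """Choose the layout for a newly created FTS index (hypothetical helper).

    An explicit format_version wins; otherwise the default is v2.
    LANCE_FTS_FORMAT_VERSION is deliberately not read here.
    """
    if format_version is None:
        return DEFAULT_FTS_FORMAT_VERSION
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(format_version, bool) or format_version not in SUPPORTED_FTS_FORMAT_VERSIONS:
        raise ValueError(
            f"format_version must be one of {SUPPORTED_FTS_FORMAT_VERSIONS}, got {format_version!r}"
        )
    return format_version


# Setting the old environment variable has no effect on new indexes...
os.environ["LANCE_FTS_FORMAT_VERSION"] = "1"
assert resolve_new_index_format() == 2
# ...only the explicit parameter selects v1.
assert resolve_new_index_format(format_version=1) == 1
```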

Compatibility

Existing v1 FTS indexes remain readable. The regression coverage includes explicit v1 append + optimize_indices(OptimizeOptions::append()) and verifies the resulting FTS index metadata remains v1.
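The compatibility property is best expressed as an invariant on index metadata rather than on query results. The toy model below is hypothetical: `ToyFtsIndex` and `append_and_optimize` are not Lance types, and the real regression test uses `optimize_indices(OptimizeOptions::append())`. It shows that a correct path and a path that falls back to the default answer queries identically. A regression check therefore has to assert the stored format version.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

DEFAULT_FORMAT = 2  # default for brand-new indexes after this change


@dataclass(frozen=True)
class ToyFtsIndex:
    """Hypothetical in-memory model holding only what the check needs."""
    format_version: int
    docs: Tuple[str, ...]

    def search(self, term: str) -> List[int]:
        # Toy query: the result does not depend on the layout at all,
        # which is why a query-only test cannot catch a v1 -> v2 flip.
        return [i for i, d in enumerate(self.docs) if term in d.split()]


def append_and_optimize(index: ToyFtsIndex, new_docs: List[str],
                        preserve_format: bool = True) -> ToyFtsIndex:
    """Model of append + optimize. The correct path keeps the stored format;
    the buggy path (preserve_format=False) falls back to the default."""
    fmt = index.format_version if preserve_format else DEFAULT_FORMAT
    return replace(index, format_version=fmt, docs=index.docs + tuple(new_docs))


v1 = ToyFtsIndex(format_version=1, docs=("hello world",))
good = append_and_optimize(v1, ["hello lance"])
bad = append_and_optimize(v1, ["hello lance"], preserve_format=False)

# Both answer queries identically...
assert good.search("hello") == bad.search("hello") == [0, 1]
# ...so the regression check must assert on the stored format version.
assert good.format_version == 1
assert bad.format_version == 2
```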

Verification

  • GitHub Actions: 30/30 checks passed on commit e6a2ca759.
  • git diff --check passed locally.
  • The required local Rust / Python / Java checks could not be run because cargo, uv, and a Java runtime are not installed in this environment; CI covered the PR checks.

github-actions bot added the A-python, A-index, A-java, and enhancement labels Jun 29, 2026
codecov bot commented Jun 29, 2026

BubbleCal marked this pull request as ready for review Jun 29, 2026 14:12
BubbleCal (Contributor, Author) commented:

@claude review

BubbleCal changed the title "feat(fts): make v2 the default index format" to "feat(fts)!: make v2 the default index format" Jun 29, 2026

Xuanwo (Collaborator) left a comment

The mem-wal maintained-index path should have end-to-end v1 coverage before this lands. The key compatibility contract is that an existing v1 FTS index remains v1 after maintenance; if this path accidentally flushes the maintained index as v2, FTS queries may still pass while older readers lose compatibility.


Labels: A-index, A-java, A-python, breaking-change, enhancement


2 participants