Skip to content

Core: Allow InMemoryCatalog instances to share a JVM-wide store#16841

Draft
RussellSpitzer wants to merge 3 commits into
apache:mainfrom
RussellSpitzer:core-inmemory-shared-state
Draft

Core: Allow InMemoryCatalog instances to share a JVM-wide store#16841
RussellSpitzer wants to merge 3 commits into
apache:mainfrom
RussellSpitzer:core-inmemory-shared-state

Conversation

@RussellSpitzer

@RussellSpitzer RussellSpitzer commented Jun 17, 2026

Copy link
Copy Markdown
Member

Why

Some test harnesses (and other host runtimes) build multiple InMemoryCatalog objects that should logically refer to the same catalog — for example, a runtime that constructs a separate validation catalog alongside its primary one, or that clones a session and rebuilds its catalog manager. With the existing per-instance state, those objects don't see each other's writes.

This PR adds a way for two instances to share a single in-memory backing store within the JVM.

What

  • New catalog property in-memory-catalog.shared-store-id (SHARED_STORE_ID). When two InMemoryCatalog instances are initialized with the same value they share namespaces, tables, views, and the synchronization monitor used for writes. When unset, each instance keeps a private store, exactly as before. Empty and whitespace-only values are rejected so that config layers that map a missing property to "" cannot silently flip every catalog into shared mode.
  • Per-instance maps now live in a private Store holder; the catalog holds a volatile Store store reference for safe publication when initialize() selects a fresh or shared store.
  • Mutating operations synchronize on the Store instance instead of this, so all catalogs pointing at the same shared store serialize through the same monitor. The class Javadoc calls this out: the catalog instance's intrinsic lock is no longer the synchronization root.
  • initialize() is safe to call again on the same instance: it closes the previous CloseableGroup (and the FileIO / metrics reporter it holds) and reselects the store from the new properties. Re-initialization concurrent with in-flight catalog operations is not supported and is documented on the Javadoc.
  • close() clears the maps only when the instance owns a private store; for shared stores it leaves state intact so other instances continue to observe it.
  • New public static clearSharedStore(String) lets harnesses release a shared store between runs. Calling it on an active id leaves any live instances pointing at the orphaned store; new instances initialized with the same id get a fresh one.
  • A single childNamespacesOf(Store, Namespace) stream helper backs both dropNamespace and listNamespaces(Namespace) so the two cannot drift on prefix-collision edge cases (a.b vs a.bb.c in particular).

Tests

TestInMemoryCatalog adds:

  • Bidirectional sharing of namespaces and tables (forward and reverse, separate tests).
  • Bidirectional sharing of views.
  • Sharing of renameTable, setProperties, and removeProperties (one test each).
  • dropNamespace empty-check honored across instances sharing a store.
  • dropNamespaceRejectsSiblingPrefixCollisions covering a.b vs a.bb.c.
  • closeOnPrivateInstanceClearsState (asserts on the closed instance directly) and closeOnSharedInstancePreservesStore.
  • clearSharedStoreRemovesStateForFutureInstances, clearSharedStoreLeavesLiveInstancesPointingAtTheirStore, and clearSharedStoreIsNoOpForUnknownId.
  • emptyStringSharedStoreIdIsRejected and whitespaceSharedStoreIdIsRejected for the new validation contract.
  • reinitializeWithoutSharedStoreIdSwitchesToPrivateStore and reinitializeAcrossSharedStoreIdsRetargetsBackingStore for the in-place re-init transitions.
  • concurrentInitializeWithSameSharedStoreIdConvergesOnOneStoreCountDownLatch race so threads block until released, exercising computeIfAbsent.
  • concurrentCreateOfSameTableThroughSharedStoreSerializes — multiple instances race to create the same table identifier; exactly one wins, the rest see AlreadyExistsException. Captures any unexpected runtime exception in an AtomicInteger and asserts it stays zero, so a CommitFailedException (or anything else) surfaces with a clear diagnostic instead of being swallowed by Future.get().

The inherited CatalogTests suite continues to pass.

Compatibility

  • API: only additions. New public constant SHARED_STORE_ID and new public static method clearSharedStore(String). revapi is clean.
  • Behavior: instances that don't set SHARED_STORE_ID behave exactly as before — the shared-store path is fully opt-in.

Validation

  • ./gradlew :iceberg-core:test passes.
  • ./gradlew :iceberg-core:revapi clean.
  • ./gradlew :iceberg-core:checkstyleMain :iceberg-core:checkstyleTest clean.
  • ./gradlew :iceberg-core:spotlessCheck clean.

Follow-up

A separate Spark v4.1 test PR (#16839) will adopt this property to share a backing store between Spark's catalog and the validation catalog. That PR is currently held; it will be rebased onto this once it merges.

Add a new catalog property `in-memory-catalog.instance-id`. When two
InMemoryCatalog instances are initialized with the same value, they
share namespaces, tables, views, and the synchronization monitor used
by writes. When unset, each instance keeps a private store as before.

This is useful for harnesses that build multiple catalog objects which
should logically refer to the same catalog (for example, a host
runtime that builds a separate validation catalog alongside its
primary one).

Internal changes:
- Move the per-instance maps into a private `Storage` holder; hold a
  `volatile Storage` field so re-initialization is safe to publish.
- Lock on the `Storage` instance instead of `this`, so all instances
  pointing at the same shared store serialize through the same monitor.
- Make `initialize()` idempotent: any subsequent call resets the
  storage based on whether `INSTANCE_ID` is present.
- `close()` now only clears the in-memory maps when the instance owns
  a private store; for shared stores it leaves state intact so other
  instances continue to observe it. The new behavior is documented on
  `close()`.
- A package-private `clearInstanceState(String)` (annotated
  `@VisibleForTesting`) drops a shared store explicitly.

Tests cover bidirectional sharing of namespaces, tables, and views;
isolation across distinct instance ids; isolation between two
catalogs without an instance id; that `close()` clears private state
but preserves shared state; that `clearInstanceState` drops the store
and is a no-op for unknown ids; that `initialize()` is idempotent
when the property is removed; and that concurrent initializers
racing on the same id converge on a single shared store.
- Rename `INSTANCE_ID` -> `SHARED_STORE_ID` and the property string to
  `in-memory-catalog.shared-store-id`. The value identifies a store
  multiple instances opt into, not "this instance"; renaming aligns the
  constant with the implementation (`Store`/`SHARED_STORES`/`store`).
- Rename `clearInstanceState` -> `clearSharedStore` and make it
  `public` (still `@VisibleForTesting`). The previous package-private
  visibility was unreachable from consumers in other packages, which
  made the combination of public property + package-private cleanup
  inconsistent.
- Move the ownership flag off the inner `Store` holder onto the
  catalog as `private boolean ownsStore`, so `Store` does not encode
  consumer-facing lifecycle semantics.
- Close any prior `closeableGroup` in `initialize()` before swapping
  state, fixing a leak when the catalog is re-initialized in place.
- Make the volatile-snapshot pattern internally consistent:
  `listTables`, `listViews`, and `listNamespaces(Namespace)` capture a
  single `Store currentStore = store` and use it for both the
  existence check and the listing. `dropNamespace` no longer recurses
  back through the public list APIs (which would re-read the volatile
  field); it counts entries directly on the captured snapshot.
- `close()` is null-safe on `closeableGroup` and uses the new
  `ownsStore` field to decide whether to clear maps.
- Restore the `// List only the child-namespaces roots.` comment that
  was deleted by accident.

Tests:
- Rename `InMemoryCatalog.INSTANCE_ID` references and the
  `clearInstanceState` calls accordingly.
- Drop the `final` modifier on a local variable.
- Split `closeClearsPrivateStateButPreservesSharedState` into two
  separate tests (`closeOnPrivateInstanceClearsState`,
  `closeOnSharedInstancePreservesStore`).
- Split the bidirectional sharing test into two
  (`sharedStoreSharesNamespacesAndTablesFromFirstToSecond`,
  `sharedStoreSharesWritesFromSecondBackToFirst`).
- Rename the misleading `initializeIsIdempotentWhenInstanceIdRemoved`
  test to `reinitializeWithoutSharedStoreIdSwitchesToPrivateStore` —
  what it actually exercises is a storage-mode swap.
- Replace `concurrentInitializeWithSameInstanceIdSharesStorage` with
  a `CountDownLatch`-driven race that all threads block on before
  calling `initialize()`, so the test actually exercises
  `computeIfAbsent`. Cleanup uses the project's `boolean threw`
  pattern.
- Add `concurrentCreateOfSameTableThroughSharedStoreSerializes` —
  multiple instances race to `create` the same table identifier and
  exactly one wins with `AlreadyExistsException` for the rest. This
  pins down the "instances pointing at the same store serialize
  through the same monitor" claim.
- Add coverage for `renameTable`, `setProperties`, `removeProperties`,
  and `dropNamespace` (with the empty-check enforced) under shared
  storage.
- Add `clearSharedStoreLeavesLiveInstancesPointingAtTheirStore` to
  document the live-instance/orphan semantics.
- Add `emptySharedStoreIdValuePartitionsItsOwnStore` to lock down the
  empty-string id handling.
Fixes the build-checks (17) checkstyle failure introduced in round 2 and
applies follow-on review feedback.

InMemoryCatalog
- Drop @VisibleForTesting on `clearSharedStore`. The annotation+`public`
  combination tripped the project's RegexpMultiline rule (@VisibleForTesting
  members must be package-private). The method's intended use is from
  consumers in other packages (the Spark follow-up at apache#16839), so it is
  legitimate public API. Updated the Javadoc to match.
- Reject empty/whitespace-only `SHARED_STORE_ID` values in `initialize()`
  with `Preconditions.checkArgument`. Config layers that map a missing
  property to `""` would otherwise flip every catalog into shared mode
  silently.
- Document that `initialize()` is repeatable in place (closes the
  previous CloseableGroup before reselecting the store) and that there
  is no thread-safety guarantee for re-init concurrent with in-flight
  operations.
- Document on the class Javadoc that mutating operations synchronize on
  the backing store, not the catalog instance, so the catalog's
  intrinsic lock is not the synchronization root.
- Move `closePreviousCloseableGroup()` (renamed from `closePreviousResources`)
  to run before store selection in `initialize()`. The "close old, open
  new" idiom now reads in order.
- Replace `countChildNamespaces` with a shared `childNamespacesOf` stream
  helper used by both `dropNamespace` and `listNamespaces(Namespace)` so
  the two cannot drift apart on prefix-collision edge cases. Switches
  from string-prefix matching (`DOT.join(...).startsWith(prefix)`) to
  array-prefix matching, which is robust against e.g. `a.b` vs `a.bb.c`.
- Add `Store.clear()` so `close()` reads as a single intent rather than
  three separate `.clear()` calls.

TestInMemoryCatalog
- Replace `emptySharedStoreIdValuePartitionsItsOwnStore` with
  `emptyStringSharedStoreIdIsRejected` and `whitespaceSharedStoreIdIsRejected`,
  matching the new `initialize()` contract.
- Fix `closeOnPrivateInstanceClearsState` to actually exercise the
  close-clears path: it now asserts on the *closed* instance directly,
  not on a freshly reopened one.
- Split `sharedStoreSharesRenameTableAndNamespacePropertyEdits` into
  three independent tests (`sharedStoreSharesRenameTable`,
  `sharedStoreSharesSetProperties`, `sharedStoreSharesRemoveProperties`).
- Add `dropNamespaceRejectsSiblingPrefixCollisions` covering `a.b` vs
  `a.bb.c` for the new `childNamespacesOf` helper.
- Add `reinitializeAcrossSharedStoreIdsRetargetsBackingStore` covering
  the shared-A → shared-B `initialize()` transition (the previous test
  only covered shared → private).
- `concurrentCreateOfSameTableThroughSharedStoreSerializes` now records
  unexpected exceptions in an `AtomicInteger` and asserts it stays at
  zero, so a CommitFailedException or similar surfaces with a clear
  diagnostic instead of being swallowed by `Future.get()`.
- Extract `@AfterEach`-driven `newTrackedCatalog` / `reservedSharedStoreId`
  helpers and use them throughout the new tests, removing repeated
  `try { ... } finally { close(); clearSharedStore(); }` boilerplate.
- Capitalize `ID` consistently in test descriptions.
- Import `java.util.concurrent.TimeoutException` and
  `java.util.Collections` instead of fully-qualifying inline.
- Harmonize lambda catch-variable names in
  `concurrentCreateOfSameTableThroughSharedStoreSerializes`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant