bundle: read deployment state from DMS (experimental.record_deployment_history)#5355

Open
shreyas-goenka wants to merge 29 commits into main from shreyas-goenka/cli-state-read-from-dms

@shreyas-goenka shreyas-goenka commented May 28, 2026

Contributor

Summary

Reads the direct engine's resource state from the deployment metadata service
(DMS) when experimental.record_deployment_history is enabled.

This is a no-op unless the experimental flag is on. With the flag off, state is
read entirely from the local resources.json, exactly as before.

How it works

State has two parts: identity (lineage + serial, always from
resources.json) and resources (the deployed set). The lineage is the DMS
deployment id, so a later deploy must reuse it rather than mint a new one.

The DMS read is folded directly into StateDB.Open — no separate reader
abstraction:

func (db *DeploymentState) Open(ctx, path, withRecovery, withWrite, dmsClient sdkbundle.BundleInterface) error
  • dmsClient == nil → file-only behavior (WAL recovery + migration), unchanged.
  • dmsClient != nil and the file has a lineage → DMS is the source of truth:
    Open keeps the file's identity but overlays the resource set read from DMS
    (ListResourcesAll, keys re-prefixed with resources.).
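
As a rough illustration of the two cases above, the following Go sketch models the overlay with hypothetical stand-ins: `state`, `resourceLister`, `overlay`, and `fakeDMS` are illustrative names, not identifiers from this PR, and the successful-version gate is left out for brevity. It only shows the shape of the idea: identity stays with the file, and the resource set is replaced by the remote one with keys re-prefixed with `resources.`.

```go
package main

import "fmt"

// state is a hypothetical stand-in for the local database: identity plus resources.
type state struct {
	Lineage   string
	Serial    int
	Resources map[string]string // resource key -> resource ID
}

// resourceLister is a hypothetical, narrowed stand-in for the DMS client.
type resourceLister interface {
	ListResources(deploymentID string) (map[string]string, error)
}

// overlay keeps the file's identity. When a client is supplied and the file has
// a lineage, it replaces the resource set with the one read from the client,
// re-prefixing every key with "resources.".
func overlay(file state, client resourceLister) (state, error) {
	if client == nil || file.Lineage == "" {
		return file, nil // file-only behavior
	}
	remote, err := client.ListResources(file.Lineage)
	if err != nil {
		return state{}, fmt.Errorf("reading DMS resources: %w", err)
	}
	out := state{
		Lineage:   file.Lineage,
		Serial:    file.Serial,
		Resources: make(map[string]string, len(remote)),
	}
	for key, id := range remote {
		out.Resources["resources."+key] = id
	}
	return out, nil
}

// fakeDMS is an in-memory client used only for this example.
type fakeDMS map[string]string

func (f fakeDMS) ListResources(string) (map[string]string, error) { return f, nil }

func main() {
	file := state{
		Lineage:   "abc",
		Serial:    3,
		Resources: map[string]string{"resources.jobs.old": "1"},
	}
	got, err := overlay(file, fakeDMS{"jobs.new": "2"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got.Lineage, got.Serial, got.Resources)
}
```

In this sketch an empty remote set yields an empty resource map rather than a fallback to the file, which matches the intent that an authoritative DMS resource set is trusted even when empty.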

DMS is trusted only once a version has completed successfully
(deploymentHasSuccessfulVersion). If the flag was just enabled on an existing
direct deployment, or the initial DMS deploy failed, there is no successful
version yet, so Open falls back to the local file and existing resources are
neither re-created nor lost. The version check pages through ListVersions
(newest-first) and stops at the first success, so it does not fetch the whole
version history; denormalizing this onto the deployment to avoid the listing
entirely is a planned API-side follow-up.
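
The early exit can be sketched as follows. This is a minimal model under stated assumptions, not the code in `dms.go`: `version`, `pager`, and `hasSuccessfulVersion` are hypothetical names, and the pager simply hands out pre-built pages in newest-first order while counting how many were fetched.

```go
package main

import "fmt"

// version is a hypothetical record; the real SDK type is not shown here.
type version struct {
	ID        int
	Succeeded bool
}

// pager is a hypothetical page-at-a-time source that returns newest versions first.
type pager struct {
	pages   [][]version
	fetched int // number of pages handed out so far
}

// next returns the next page, or false once the pages are exhausted.
func (p *pager) next() ([]version, bool) {
	if p.fetched >= len(p.pages) {
		return nil, false
	}
	page := p.pages[p.fetched]
	p.fetched++
	return page, true
}

// hasSuccessfulVersion scans page by page and returns as soon as it sees a
// successful version, so later pages are never requested.
func hasSuccessfulVersion(p *pager) bool {
	for {
		page, ok := p.next()
		if !ok {
			return false
		}
		for _, v := range page {
			if v.Succeeded {
				return true
			}
		}
	}
}

func main() {
	p := &pager{pages: [][]version{
		{{ID: 9, Succeeded: false}, {ID: 8, Succeeded: true}},
		{{ID: 7, Succeeded: true}},
	}}
	fmt.Println(hasSuccessfulVersion(p), p.fetched)
}
```

Counting fetched pages is one way a test could assert that the scan stops early instead of consuming the full list.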

cmd/bundle/utils/process.go passes the client (b.WorkspaceClient(ctx).Bundle)
only when the flag is set; reads open the state write-disabled, so no lineage is
minted on read.
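
One detail worth keeping in mind when a nil client means "file-only": in Go, an interface value is only equal to `nil` if no concrete type has been stored in it. The sketch below uses hypothetical names (`bundleAPI`, `realClient`, `clientFor`, `describe`) and is not the code in `process.go`; it shows one way to hand over the client conditionally so that a `== nil` check on the receiving side behaves as expected.

```go
package main

import "fmt"

// bundleAPI is a hypothetical stand-in for the client interface.
type bundleAPI interface {
	Name() string
}

// realClient is a hypothetical concrete client.
type realClient struct{}

func (*realClient) Name() string { return "dms" }

// clientFor returns a nil interface when the flag is off. Returning the
// literal nil (rather than a nil *realClient) keeps the interface itself nil.
func clientFor(flagEnabled bool) bundleAPI {
	if !flagEnabled {
		return nil
	}
	return &realClient{}
}

// describe mirrors the nil check on the receiving side.
func describe(c bundleAPI) string {
	if c == nil {
		return "file-only"
	}
	return "overlay from " + c.Name()
}

func main() {
	fmt.Println(describe(clientFor(false)))
	fmt.Println(describe(clientFor(true)))
}
```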

Review notes

This revision addresses review feedback to drop the StateReader interface and
its readLocalDatabase, which duplicated StateDB.Open. The version-completion
gate and resource fetch now live in bundle/direct/dstate/dms.go.

Tests

  • Unit tests in bundle/direct/dstate/dms_test.go cover the Open selection
    (DMS-owned vs file fallback vs nil client), the version-completion gate, and
    the resource mapping.
  • Acceptance test acceptance/bundle/dms/read seeds a completed DMS version +
    resources and asserts the plan derives create/update/delete from DMS state.
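
The selection cases listed above can be written down as a small decision table. The following is an illustrative sketch only: `source`, `selectSource`, and the case names are hypothetical and do not reproduce `dms_test.go`; the three booleans stand for "a client was passed", "the file has a lineage", and "DMS has a successfully completed version".

```go
package main

import "fmt"

// source names where the resource set is read from.
type source string

const (
	fromFile source = "file"
	fromDMS  source = "dms"
)

// selectSource returns fromDMS only when all three conditions hold;
// every other combination falls back to the local file.
func selectSource(hasClient, hasLineage, hasSuccessfulVersion bool) source {
	if hasClient && hasLineage && hasSuccessfulVersion {
		return fromDMS
	}
	return fromFile
}

func main() {
	cases := []struct {
		name                     string
		client, lineage, success bool
		want                     source
	}{
		{"nil client", false, true, true, fromFile},
		{"first deploy, no lineage", true, false, false, fromFile},
		{"flag just enabled", true, true, false, fromFile},
		{"DMS-owned", true, true, true, fromDMS},
	}
	for _, c := range cases {
		got := selectSource(c.client, c.lineage, c.success)
		fmt.Printf("%-26s want=%s got=%s\n", c.name, c.want, got)
	}
}
```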

shreyas-goenka added a commit that referenced this pull request May 28, 2026
The deadcode lint check flagged this as unreachable on PR #5355's CI.
ManagedState returned (string, bool) but no caller existed — every consumer
uses IsManagedState (the bool wrapper). Drop it; reintroduce only if a
caller needs the raw value.

Co-authored-by: Isaac

eng-dev-ecosystem-bot commented May 28, 2026

Collaborator

Commit: f50dfc0

Run: 27268974849

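| Env | 🟨 KNOWN | 🔄 flaky | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|---|
| 🟨 aws linux | 7 | | | 15 | 261 | 929 | 10:10 |
| 💚 aws windows | | | 7 | 15 | 263 | 927 | 12:14 |
| 💚 aws-ucws linux | | | 7 | 15 | 357 | 843 | 17:30 |
| 💚 aws-ucws windows | | | 7 | 15 | 359 | 841 | 12:17 |
| 💚 azure linux | | | 1 | 17 | 264 | 927 | 9:49 |
| 💚 azure windows | | | 1 | 17 | 266 | 925 | 11:33 |
| 🔄 azure-ucws linux | | 1 | 1 | 17 | 361 | 839 | 15:14 |
| 💚 azure-ucws windows | | | 1 | 17 | 364 | 837 | 17:56 |
| 💚 gcp linux | | | 1 | 17 | 260 | 930 | 10:18 |
| 💚 gcp windows | | | 1 | 17 | 262 | 928 | 14:59 |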
23 interesting tests: 15 SKIP, 7 KNOWN, 1 flaky
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🟨​ TestAccept 🟨​K 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/invariant/no_drift 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 🟨​K 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 💚​R 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 💚​R 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 🟨​K 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 🟨​K 💚​R 💚​R 💚​R
🟨​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 🟨​K 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/replace_existing 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_projects/update_display_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/vector_search_indexes/grants/select 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/ssh/connection 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestFetchRepositoryInfoAPI_FromRepo ✅​p ✅​p ✅​p ✅​p ✅​p ✅​p 🔄​f ✅​p ✅​p ✅​p
Top 44 slowest tests (at least 2 minutes):
duration env testname
6:34 aws windows TestAccept
6:22 gcp windows TestAccept
5:58 azure-ucws windows TestAccept
5:43 azure windows TestAccept
5:38 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
5:09 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
5:07 aws-ucws windows TestAccept
4:54 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:38 aws-ucws linux TestAccept/bundle/deploy/files/no-snapshot-sync/DATABRICKS_BUNDLE_ENGINE=direct
4:35 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:29 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:59 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:53 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:46 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:45 aws-ucws linux TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=terraform
3:36 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:30 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:28 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:26 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:15 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:12 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:11 aws-ucws linux TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=terraform
3:02 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:59 azure-ucws windows TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=direct
2:56 aws-ucws linux TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform
2:54 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:47 azure linux TestAccept
2:47 azure-ucws linux TestAccept
2:45 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:44 aws-ucws linux TestAccept
2:44 gcp linux TestAccept
2:41 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:40 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:34 azure-ucws windows TestAccept/bundle/generate/auto-bind/DATABRICKS_BUNDLE_ENGINE=terraform
2:34 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:29 azure-ucws windows TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=terraform
2:29 gcp windows TestAccept/bundle/deployment/bind/job/generate-and-bind/DATABRICKS_BUNDLE_ENGINE=terraform
2:25 azure-ucws windows TestAccept/bundle/destroy/jobs-and-pipeline/DATABRICKS_BUNDLE_ENGINE=direct
2:24 azure-ucws windows TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=terraform
2:21 azure-ucws windows TestAccept/bundle/resources/model_serving_endpoints/basic/DATABRICKS_BUNDLE_ENGINE=direct
2:15 aws-ucws linux TestAccept/bundle/resources/volumes/recreate/DATABRICKS_BUNDLE_ENGINE=direct
2:14 azure-ucws windows TestAccept/bundle/deploy/files/no-snapshot-sync/DATABRICKS_BUNDLE_ENGINE=direct
2:04 azure-ucws linux TestAccept/bundle/deployment/bind/job/generate-and-bind/DATABRICKS_BUNDLE_ENGINE=terraform
2:03 aws-ucws linux TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct

@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/cli-state-read-from-dms branch from 0e706b8 to 71a41d9 Compare May 31, 2026 23:44
@shreyas-goenka shreyas-goenka changed the title [bundle] Read deployment state from Deployment Metadata Service (step 4) bundle/statemgmt: add StateReader abstraction for file and DMS state May 31, 2026
@shreyas-goenka shreyas-goenka changed the base branch from shreyas-goenka/bundle-dms-lock-impl to main May 31, 2026 23:45
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/cli-state-read-from-dms branch from 71a41d9 to 3a3613e Compare June 1, 2026 15:28
@shreyas-goenka shreyas-goenka changed the title bundle/statemgmt: add StateReader abstraction for file and DMS state bundle: read deployment state from DMS (gated by experimental.record_deployment_history) Jun 1, 2026
@shreyas-goenka shreyas-goenka changed the base branch from main to shreyas-goenka/bundle-dms-implementation June 1, 2026 15:28
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch from 1e8ba45 to 76bf017 Compare June 1, 2026 15:31
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/cli-state-read-from-dms branch from 3a3613e to 825f83a Compare June 1, 2026 15:39
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/bundle-dms-implementation branch 5 times, most recently from 2faa22b to 063dad9 Compare June 1, 2026 16:46
@shreyas-goenka shreyas-goenka force-pushed the shreyas-goenka/cli-state-read-from-dms branch from 5833370 to a9c126f Compare June 1, 2026 16:50
Introduce a StateReader interface with two implementations that populate the
direct engine's resource-state DB:

- file-based, reading the local resources.json (delegates to
  DeploymentState.Open, preserving WAL recovery + migration)
- DMS-based, reading from the deployment metadata service via the SDK
  Bundle.ListResourcesAll

This is a self-contained, drop-in abstraction with unit tests; it is not yet
wired into the deploy path. Integration (selecting the reader by managed-state
and sourcing the deployment ID) follows once the DMS lock and op-reporting PRs
land.

Co-authored-by: Isaac
Wire the StateReader into the direct-engine read path. NewStateReader selects
the DMS reader when experimental.record_deployment_history is enabled and a
prior deployment exists (lineage recorded in resources.json), otherwise the
local file reader. process.go replaces the direct StateDB.Open call with the
selector, so plan/deploy/summary read resource state from the deployment
metadata service under the flag.

The deployment ID is the state lineage (matching the lock package), read from
the local resources.json; with no lineage yet (first deploy) there is nothing
in DMS, so the local file reader is used.

Co-authored-by: Isaac
When record_deployment_history is enabled on an existing direct deployment, the
lineage is already in resources.json but DMS has no resources for it. Reading an
empty set would make the plan re-create every resource. The DMS reader now keeps
resources.json's resources until DMS has its own (recorded by the next deploy).
…sfully

When record_deployment_history is on and resources.json has a lineage, DMS is
treated as authoritative only if the deployment has a version that completed
successfully. Otherwise — DMS not initialized for this deployment yet, or the
initial DMS deploy failed — defer to the local file so existing resources are
neither re-created nor lost. Replaces the earlier empty-resource-set heuristic.
…ersion

Add CreateVersion/CompleteVersion/ListVersions handlers so tests can mark a
deployment's version completed, and have the dms/read test record a successful
version so DMS is the authoritative source for the read.
Comment thread bundle/statemgmt/statereader.go Outdated
//
// Deployment state has two parts:
//
// - Identity: the lineage (the deployment id) and serial. These always live in

Contributor Author

If we integrate with the workspace asset platform, the identity will no longer come from resources.json. It will instead come directly from the workspace.state_path configured for the bundle, which would point to a deployment entity.

Comment thread bundle/statemgmt/statereader.go Outdated
// did not complete successfully, DMS state is absent or partial and callers
// should fall back to the local file.
func deploymentHasSuccessfulVersion(ctx context.Context, client sdkbundle.BundleInterface, deploymentID string) (bool, error) {
versions, err := client.ListVersionsAll(ctx, sdkbundle.ListVersionsRequest{

Contributor

Why do we do this client-side and not server-side? Will this work when I have 10k versions deployed?

Contributor Author

The list API returns versions in reverse order of version ID (latest version first).

But your point is completely fair. We should denormalize this onto the deployment so we can avoid this listing.

Do you think that is a blocker, or can we follow up once we have the API-side changes? It'll be a relatively fast follow-up (needs a bit of design work).

Contributor

if this API is paginated, can we process it page by page?

Comment thread bundle/statemgmt/statereader.go Outdated

// readLocalDatabase parses the local resources.json file. A missing file yields
// an empty database (no lineage), which callers read as "nothing deployed yet".
func readLocalDatabase(path string) (dstate.Database, error) {

Contributor

this looks like it reimplements StateDB.Open, why?

Address review: drop the StateReader interface and its readLocalDatabase
duplicate of StateDB.Open. Open now takes a deployment-metadata-service
client (nil = file-only) and overlays DMS resource state on top of the
local identity (lineage/serial) when DMS owns the deployment. The
version-completion gate and resource fetch move into the dstate package.

Co-authored-by: Isaac
Address review: deploymentHasSuccessfulVersion no longer materializes the
full version history. Versions are listed newest-first, so iterating page
by page and stopping at the first successful version typically reads just
one page even for deployments with thousands of versions. The resource
fetch builds its map from the paginated iterator as well.

Co-authored-by: Isaac
// overlayDMSState replaces the file-derived resource state with the state
// recorded in the deployment metadata service, when DMS owns this deployment.
// The caller holds db.mu and has already populated db.Data from the file.
func (db *DeploymentState) overlayDMSState(ctx context.Context, client sdkbundle.BundleInterface) error {

Contributor Author

We can add a serial-based comparison and override here as a follow-up.

Git Bash rewrites leading-slash arguments like /api/2.0/... to
C:/Program Files/Git/api/..., so the raw DMS api calls hit no stub on
Windows. Prefix them with MSYS_NO_PATHCONV=1, matching acceptance/cmd/api.

Co-authored-by: Isaac
Self-review follow-ups: restore the note that an authoritative DMS
resource set is trusted even when empty (lost when the reader moved into
dstate), and add a unit test asserting the version scan stops at the
first success rather than consuming the full paginated list.

Co-authored-by: Isaac
Collapse the four TestOpenWithDMS subtests into a table asserting the
resulting key->ID map, and fold the standalone early-exit test into the
version-gate table as a wantNexts column, dropping the iterator injection
hook from the fake client.

Co-authored-by: Isaac
3 participants