fix: Azure/GCP infrastructure fixes (Postgres groundwork) #88
…at instead of fatal drift
Greptile Summary

This PR fixes several Azure and GCP infrastructure issues that were blocking the upcoming Postgres work. All changes are targeted bug fixes with accompanying regression tests.
Confidence Score: 5/5

All four fixes are self-contained and well-tested; no change widens an attack surface or modifies shared state in a way that could affect unrelated resources. Each fix addresses a clearly described regression with a targeted change and a companion regression test. The GCP subnet name extraction, the Azure build heartbeat resolution, the Azure worker CNAME split, and the Terraform VNet wiring all follow existing patterns in the codebase, and the snapshot diffs confirm the expected Terraform output. No files require special attention.
| Filename | Overview |
|---|---|
| crates/alien-infra/src/build/azure.rs | Heartbeat now lazily resolves resource_prefix, managed_environment_id, and managed_identity_id for imported builds instead of raising a non-retryable RESOURCE_DRIFT error; regression test verifies binding params are non-None after the first tick. |
| crates/alien-infra/src/network/gcp_import.rs | Extracts subnetwork_name from the subnet self-link via rsplit('/'); empty-string guard handles trailing slashes; mirrors the Azure importer's pattern. |
| crates/alien-infra/src/worker/azure.rs | Adds container_app_url to store the raw Container App ingress host separately from the potentially-overridden public url; build_outputs now targets container_app_url with fallback to url, and the field is set at create, update, and heartbeat. |
| crates/alien-infra/src/worker/azure_import.rs | Initialises container_app_url: None in the importer; the heartbeat rebuilds it on the first tick. |
| crates/alien-terraform/src/emitters/azure/container_apps_environment.rs | Adds VNet integration (infrastructure_subnet_id, internal_load_balancer_enabled=false) when a Network resource is present; infrastructure_subnet_id correctly handles both create/use-default (managed resource) and ByoVnetAzure (data source) modes. |
| crates/alien-terraform/src/emitters/azure/network.rs | Adds Microsoft.App/environments delegation block to the private subnet in create_topology (UseDefault and Create modes); BYO mode expects the pre-existing subnet to already be delegated. |
| crates/alien-infra/tests/importers.rs | New gcp_network_import_derives_subnetwork_name test verifies the importer reconstructs subnetworkName from the self-link URL. |
| crates/alien-terraform/tests/generator/azure_full_stack_tests.rs | Adds whitespace-normalised literal assertions for VNet wiring (infrastructure_subnet_id, internal_load_balancer_enabled, delegation) on top of the existing snapshot. |
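
As a rough illustration of the `gcp_import.rs` row above, the self-link parsing can be sketched as below. The function name `subnetwork_name_from_self_link` is hypothetical, and this sketch assumes that a trailing slash should yield no name at all rather than the previous path segment; the importer's actual guard may behave differently.

```rust
/// Derive a subnetwork name from a GCP subnet self-link.
/// Hypothetical helper: a sketch of the importer's approach, not its real API.
fn subnetwork_name_from_self_link(self_link: &str) -> Option<String> {
    // `rsplit('/')` yields path segments from the right, so `next()` is the last one.
    // For a `&str` it always yields at least one (possibly empty) segment.
    let last = self_link.rsplit('/').next()?;
    // A trailing slash (or an empty input) leaves an empty final segment;
    // this sketch treats that as "no name" instead of guessing.
    if last.is_empty() {
        None
    } else {
        Some(last.to_string())
    }
}
```

In the flowchart's terms, the input would be `subnet_self_links[0]` and the result feeds `subnetwork_name` on the controller.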
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph GCP["GCP Network Import Fix"]
        GI["GcpNetworkImporter"] -->|"subnet_self_links[0]"| SL["rsplit('/').next()"]
        SL -->|"subnetwork_name"| GNC["GcpNetworkController\n(subnetwork_name set)"]
        GNC --> VPC["get_vpc_access()\nDirect VPC Egress ✓"]
    end
    subgraph AzureBuild["Azure Build Heartbeat Fix"]
        IB["Imported Build\nReady state\n(resource_prefix=None)"] --> HB["ready() heartbeat"]
        HB -->|"is_none()"| RP["resource_prefix resolved\nfrom ctx"]
        HB -->|"is_none()"| ME["managed_environment_id\nresolved"]
        HB -->|"is_none()"| MI["managed_identity_id\nresolved"]
        RP --> BP["get_binding_params()\nreturns Some(…) ✓"]
        ME --> BP
        MI --> BP
    end
    subgraph AzureWorker["Azure Worker DNS Fix"]
        CAP["Container App\ncreate/update"] -->|"extract_url"| CAU["container_app_url\n(ingress host)"]
        PU["public_urls override"] --> URL["url\n(public FQDN)"]
        CAU --> BO["build_outputs()"]
        URL -->|"fallback only"| BO
        BO --> LB["LoadBalancerEndpoint\ndns_name = container_app_url ✓"]
    end
    subgraph TF["Azure Terraform VNet Wiring"]
        NS["azurerm_subnet\n(private)"] -->|"delegation"| DEL["Microsoft.App/environments"]
        CAE["azurerm_container_app_environment"] -->|"infrastructure_subnet_id"| NS
    end
```
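
The "Azure Worker DNS Fix" path in the chart (ingress host first, `url` only as a fallback) can be sketched as follows. `WorkerState` and this `LoadBalancerEndpoint` are simplified, hypothetical stand-ins that reuse the field names from the summary table; they are not the crate's real types.

```rust
/// Hypothetical slice of the Azure worker's state; only the two URL fields matter here.
struct WorkerState {
    /// Public URL, which a `public_urls` override may replace with a custom FQDN.
    url: Option<String>,
    /// Raw Container App ingress host, recorded at create, update, and heartbeat.
    container_app_url: Option<String>,
}

/// Hypothetical stand-in for the endpoint output; `dns_name` is the CNAME target.
#[derive(Debug, PartialEq)]
struct LoadBalancerEndpoint {
    dns_name: String,
}

impl WorkerState {
    /// Target the ingress host; use `url` only while the ingress host is unknown
    /// (for example, an imported worker before its first heartbeat).
    fn build_outputs(&self) -> Option<LoadBalancerEndpoint> {
        let dns_name = self
            .container_app_url
            .clone()
            .or_else(|| self.url.clone())?;
        Some(LoadBalancerEndpoint { dns_name })
    }
}
```

Keeping the ingress host in its own field means a custom public FQDN in `url` can never become the CNAME target of itself.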
Reviews (2): Last reviewed commit: "fix(azure-build): resolve resource_prefi..."
…uilds

Heartbeat resolved `managed_environment_id` and `managed_identity_id` for imported (Frozen) builds but left `resource_prefix` None, so `get_binding_params` returned None and the build could not submit jobs until an update ran. Resolve it alongside the others; the test now asserts binding params become non-None.
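
A minimal sketch of the lazy resolution described in this commit, assuming a flat, hypothetical `Ctx` in place of the controller's real dependency lookups. Each unset field is filled on the heartbeat tick, and `get_binding_params` only returns `Some` once all three are present, which is the property the regression test checks.

```rust
/// Hypothetical stand-in for the values the heartbeat resolves from its dependencies.
struct Ctx {
    resource_prefix: String,
    managed_environment_id: String,
    managed_identity_id: String,
}

/// Simplified build state; an imported (Frozen) build may start with fields unset.
#[derive(Default)]
struct BuildState {
    resource_prefix: Option<String>,
    managed_environment_id: Option<String>,
    managed_identity_id: Option<String>,
}

#[derive(Debug, PartialEq)]
struct BindingParams {
    resource_prefix: String,
    managed_environment_id: String,
    managed_identity_id: String,
}

impl BuildState {
    /// Heartbeat tick: fill in missing fields instead of reporting drift.
    /// Values that are already set are left untouched.
    fn heartbeat(&mut self, ctx: &Ctx) {
        if self.resource_prefix.is_none() {
            self.resource_prefix = Some(ctx.resource_prefix.clone());
        }
        if self.managed_environment_id.is_none() {
            self.managed_environment_id = Some(ctx.managed_environment_id.clone());
        }
        if self.managed_identity_id.is_none() {
            self.managed_identity_id = Some(ctx.managed_identity_id.clone());
        }
    }

    /// Returns `None` until every field is known, so a missing `resource_prefix`
    /// alone is enough to block job submission.
    fn get_binding_params(&self) -> Option<BindingParams> {
        Some(BindingParams {
            resource_prefix: self.resource_prefix.clone()?,
            managed_environment_id: self.managed_environment_id.clone()?,
            managed_identity_id: self.managed_identity_id.clone()?,
        })
    }
}
```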
The Postgres resource runtime, stacked on the infra fixes (#88): the resource model and bindings, the Local (developer) controller, the cloud client SDKs for the managed Postgres backends, and the TypeScript SDK surface. The part worth careful review is how the generated DB password is handled.

## How the local DB password flows

The password is generated once and has to reach a linked worker for a direct connection, without ever landing in control-plane state. So it runs through two separate channels:

1. **It is stripped from the synced binding params and never written to serialized controller state**, so it cannot reach control-plane storage or status responses.
2. It is handed to the worker at runtime through the worker's environment, resolved per request and never persisted.

Step 1 is the property that matters, and review caught a real gap there: the password was reaching the synced channel. This is fixed by stripping it in `get_binding_params`; the `#[serde(skip)]` on the field alone was not enough.

## What's in the layer

- Resource model + bindings (`alien-core`): the Postgres resource, its binding shapes, the heartbeat data.
- The Local controller (`alien-local` + `alien-infra/src/postgres/local.rs`): runs an embedded Postgres for local development.
- Cloud client SDKs (`alien-aws-clients` / `gcp` / `azure`): thin wrappers over the managed cloud Postgres APIs (Aurora, Cloud SQL, Flexible Server).
- SDK surface (`packages/core`, `packages/sdk`): the generated schemas and the TypeScript binding.

## How I tested

- `cargo test` across the touched crates (`alien-core`, `alien-bindings`, `alien-local`, `alien-infra`), including the binding round-trip and encoding-parity tests.
- The local embedded-Postgres integration test (`alien-local`).
- Exercised end to end in the full Postgres cloud e2e (the setup layer, #90, stacks on this).
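
The strip in step 1 of the password flow can be sketched as below. `PostgresBinding` and `LocalPostgresController` are hypothetical simplifications, and serde is left out so the sketch has no dependencies. One plausible reason `#[serde(skip)]` alone falls short is that it only affects serde (de)serialization: a value cloned into another struct in memory still carries the field, so the synced params clear it explicitly.

```rust
/// Hypothetical, simplified Postgres binding. The real field reportedly carries
/// `#[serde(skip)]`; serde is omitted here to keep the sketch dependency-free.
#[derive(Clone, Debug, PartialEq)]
struct PostgresBinding {
    host: String,
    port: u16,
    database: String,
    user: String,
    password: Option<String>,
}

/// Hypothetical controller holding the full binding, secret included.
struct LocalPostgresController {
    binding: PostgresBinding,
}

impl LocalPostgresController {
    /// Params for the synced channel: copy everything, then clear the secret.
    /// `clone()` copies every field, including one serde would skip, so the
    /// strip has to happen here explicitly.
    fn get_binding_params(&self) -> PostgresBinding {
        let mut params = self.binding.clone();
        params.password = None;
        params
    }
}
```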
Security walk for the password (this PR touches the secret):

- Synced and persisted state never carry the password; the round-trip test asserts it is absent from the serialized binding params.
- The runtime worker-env delivery is per request and not persisted.
- Errors from the secret path are redacted (the request body is scrubbed before it can reach an error chain).
- The one gap that existed (password on the synced channel) is the one this PR fixes. Nothing else turned up.
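
The redaction bullet can be illustrated with a small scrubbing step that runs before an error value is built. `SecretPathError`, `scrub`, and `secret_path_error` are hypothetical names, and this sketch assumes the secret value is known where the error is constructed; the PR does not say how its scrubbing is implemented.

```rust
use std::fmt;

/// Hypothetical error for the secret path; it only ever stores a scrubbed body.
#[derive(Debug)]
struct SecretPathError {
    status: u16,
    body: String,
}

impl fmt::Display for SecretPathError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "request failed with status {}: {}", self.status, self.body)
    }
}

impl std::error::Error for SecretPathError {}

/// Replace every occurrence of the secret. An empty secret is skipped because
/// `str::replace` with an empty pattern would insert the marker between every char.
fn scrub(body: &str, secret: &str) -> String {
    if secret.is_empty() {
        return body.to_string();
    }
    body.replace(secret, "[REDACTED]")
}

/// Build the error from an already-scrubbed body, so neither `Display` nor
/// `Debug` output further up the error chain can expose the secret.
fn secret_path_error(status: u16, request_body: &str, secret: &str) -> SecretPathError {
    SecretPathError { status, body: scrub(request_body, secret) }
}
```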
This PR fixes four pre-existing Azure/GCP infrastructure bugs that surfaced while getting the Postgres cloud e2e green. They are independent of the Postgres feature but sit underneath it, so the runtime (#89) and setup (#90) stack on this PR. Splitting them out lets them land and be reviewed on their own.
What was broken, and what I did
Four small, self-contained fixes:
- … `subnetwork_name`, so VPC egress to a private PSC Cloud SQL had nothing to resolve against. It now parses the name out of the subnet self-link on import.
- … `resource_prefix` unset; the controller treated those as drift and failed. The heartbeat now resolves all three from their dependencies, so an imported build can submit jobs without waiting for an update.

Files touched
- `crates/alien-infra/src/network/gcp_import.rs`: subnet name on import
- `crates/alien-infra/src/build/azure.rs`: heartbeat resolves env, identity, prefix
- `crates/alien-infra/src/worker/azure.rs` (+ `azure_import.rs`): CNAME targets the ingress host
- `crates/alien-terraform/src/emitters/azure/*`: Container Apps environment VNet integration
- `crates/alien-infra/tests/importers.rs` + the Azure generator/snapshot tests: coverage

How I tested
`cargo test -p alien-infra` and the `alien-terraform` Azure generator + snapshot tests pass.

Base of the stacked Postgres work; #89 (runtime) and #90 (setup) build on it. Supersedes the infrastructure portion of the original combined PR.
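
The Azure generator tests check the VNet wiring with whitespace-normalised literal assertions on top of the snapshot. A hypothetical sketch of that idea follows: `NetworkMode`, `vnet_attributes`, and `normalize_ws` are illustrative names rather than the emitter's API, and the Terraform addresses simply follow the managed-resource vs. data-source split described for `container_apps_environment.rs`. `infrastructure_subnet_id` and `internal_load_balancer_enabled` are real `azurerm_container_app_environment` arguments.

```rust
/// Hypothetical network modes, loosely mirroring UseDefault/Create vs ByoVnetAzure.
enum NetworkMode {
    /// Subnet is a managed `azurerm_subnet` resource in the same configuration.
    Managed { subnet: String },
    /// Subnet already exists and is read through a `data "azurerm_subnet"` block.
    Byo { subnet: String },
}

/// Terraform expression for the private subnet's id in each mode.
fn infrastructure_subnet_ref(mode: &NetworkMode) -> String {
    match mode {
        NetworkMode::Managed { subnet } => format!("azurerm_subnet.{subnet}.id"),
        NetworkMode::Byo { subnet } => format!("data.azurerm_subnet.{subnet}.id"),
    }
}

/// Emit the VNet-integration attributes; nothing is emitted without a Network resource.
fn vnet_attributes(network: Option<&NetworkMode>) -> String {
    match network {
        None => String::new(),
        Some(mode) => format!(
            "  infrastructure_subnet_id       = {}\n  internal_load_balancer_enabled = false\n",
            infrastructure_subnet_ref(mode)
        ),
    }
}

/// Collapse every run of whitespace to a single space so assertions ignore alignment.
fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Normalising whitespace keeps the literal assertions stable when the emitter re-aligns `=` signs, while the snapshot still catches any other change in the output.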