Enable live catalog with region-aware fallback engine#789
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
This PR replaces the C++ SDK’s embedded/static model catalog with a live Azure Foundry catalog client, adds automatic region detection from azureml-served-by-cluster, and introduces a region-fallback engine used by both catalog queries and model-registry SAS resolution so downloads target the region that served the catalog entry.
Changes:
- Removed the static snapshot catalog + snapshot generation tool, and always build/use the live Azure catalog client.
- Added
RegionFallback+ status-aware HTTP response plumbing to support regional retries and sticky-region behavior. - Stamped models with
detected_regionand used it to choose the model registry region when resolving download SAS URIs (with configurable fallback disable).
Reviewed changes
Copilot reviewed 32 out of 33 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk_v2/cpp/tools/catalog_snapshot/main.cc | Removed the snapshot generation tool entrypoint (static catalog pipeline retired). |
| sdk_v2/cpp/tools/catalog_snapshot/CMakeLists.txt | Removed CMake target for the snapshot tool. |
| sdk_v2/cpp/test/sdk_api/shared_test_env.h | Test env now always points at the live Azure catalog URL (no static path). |
| sdk_v2/cpp/test/internal_api/static_catalog_test.cc | Removed tests that exercised the embedded static catalog client. |
| sdk_v2/cpp/test/internal_api/region_fallback_test.cc | Added unit tests for region fallback chain building, retryability, sticky behavior, and exhaustion. |
| sdk_v2/cpp/test/internal_api/model_info_test.cc | Added JSON round-trip coverage for ModelInfo.detected_region. |
| sdk_v2/cpp/test/internal_api/download_test.cc | Updated registry client tests for status-aware HTTP + fallback; added download region-resolution tests. |
| sdk_v2/cpp/test/internal_api/catalog_cache_test.cc | Added cache save/load coverage for detected_region. |
| sdk_v2/cpp/test/internal_api/c_api_test.cc | Relaxed catalog population assertion to avoid network-dependent expectations. |
| sdk_v2/cpp/test/internal_api/azure_catalog_test.cc | Expanded live catalog client tests to cover region detection, URL rewriting, stamping, and fallback behavior. |
| sdk_v2/cpp/test/CMakeLists.txt | Updated unit test sources: added region_fallback_test, removed static_catalog_test, always includes azure_catalog_test. |
| sdk_v2/cpp/src/util/region_fallback.h | New public/internal utility interface for region-aware retry/fallback execution. |
| sdk_v2/cpp/src/util/region_fallback.cc | New implementation with proximal-region mapping, retryable classification, sticky region tracking, and diagnostics. |
| sdk_v2/cpp/src/model_info.h | Added detected_region field to model metadata. |
| sdk_v2/cpp/src/model_info.cc | Implemented JSON serialization/deserialization for detected_region (omitted when empty). |
| sdk_v2/cpp/src/manager.cc | Defaulted catalog region to auto and plumbed DisableRegionFallback into download + catalog components. |
| sdk_v2/cpp/src/http/http_client.h | Introduced HttpResponse and non-throwing Http*WithResponse APIs (status + headers + body). |
| sdk_v2/cpp/src/http/http_client.cc | Implemented response-aware GET/POST (including header capture + transport failure as status==0). |
| sdk_v2/cpp/src/download/model_registry_client.h | Updated registry client API to accept per-call region and use response-aware HTTP + fallback. |
| sdk_v2/cpp/src/download/model_registry_client.cc | Implemented region-fallback-driven registry resolution and parsing of response bodies. |
| sdk_v2/cpp/src/download/download_manager.h | Added config-region tracking and disable_region_fallback plumbing. |
| sdk_v2/cpp/src/download/download_manager.cc | Resolved registry region via explicit config → detected region → eastus; wired fallback-enabled registry client. |
| sdk_v2/cpp/src/catalog/static_catalog_client.cc | Removed embedded snapshot-backed catalog implementation. |
| sdk_v2/cpp/src/catalog/live_catalog_client_stub.cc | Removed stub that previously threw when live client sources were absent. |
| sdk_v2/cpp/src/catalog/catalog_snapshot_data.h | Removed snapshot data header (embedded bytes no longer part of build). |
| sdk_v2/cpp/src/catalog/catalog_client.h | Updated contract/docs: MakeCatalogClient now always constructs live Azure client and accepts region/fallback controls. |
| sdk_v2/cpp/src/catalog/catalog_client.cc | Retained cached-model + BYO synthesis helper; removed static-vs-live dispatch logic. |
| sdk_v2/cpp/src/catalog/azure_model_catalog.h | Default catalog URL is now always Azure; added region + fallback configuration fields. |
| sdk_v2/cpp/src/catalog/azure_model_catalog.cc | Plumbed region/fallback into MakeCatalogClient and persisted fetched model infos (incl. region) to cache. |
| sdk_v2/cpp/src/catalog/azure_catalog_client.h | New live Azure catalog client interface with region detection and fallback-aware fetching. |
| sdk_v2/cpp/src/catalog/azure_catalog_client.cc | New live Azure catalog implementation: request shaping, region detection, URL rewriting, filter-set atomicity, and fallback. |
| sdk_v2/cpp/CMakeLists.txt | Removed tools option + static snapshot wiring; always builds live catalog + region fallback sources. |
636f28d to
a827ec8
Compare
…o baijumeswani/catalog
| constexpr const char* kDefaultRegion = "eastus"; | ||
|
|
||
| // The catalog and registry gateways reject requests without this User-Agent (HTTP 400). | ||
| constexpr const char* kUserAgent = "AzureAiStudio"; |
There was a problem hiding this comment.
Is now a good time to try a different value here to differentiate traffic?
There was a problem hiding this comment.
Unfortunately, no. If we change this to anything other than AzureAiStudio, we receive HTTP 400 response. The server rejects our requests. This is probably by design since they don't requests from random User-Agents.
Will work with the catalog team and update this in a separate commit.
| constexpr const char* kUserAgent = "AzureAiStudio"; | ||
|
|
||
| /// Strip leading/trailing whitespace. | ||
| std::string Trim(const std::string& s) { |
There was a problem hiding this comment.
Should we put all string utils in a single file under src/util? Could move to_lower from src/utils.h and maybe MakeString from src/util/make_string.h into that as well.
| const std::string url = regional && pinned_region | ||
| ? BuildRegionalUrl(url_prefix_, url_suffix_, *pinned_region) | ||
| : BuildRequestUrl(base_url_, regional, region_, url_prefix_, url_suffix_); | ||
| FL_THROW(FOUNDRY_LOCAL_ERROR_INTERNAL, |
There was a problem hiding this comment.
Should we have a different status for this? I would typically use 'internal' for some implementation logic failure (i.e. our code is broken or did something unexpected), and I wonder if this failure is more likely to be network related.
| ILogger& logger) | ||
| ILogger& logger, bool disable_region_fallback) | ||
| : cache_directory_(std::move(cache_directory)), | ||
| config_region_(NormalizeConfiguredRegion(catalog_region)), |
There was a problem hiding this comment.
nit: If we're not std::move'ing catalog_region now should we take it by reference or maybe as a std::string_view given we're calling to_lower on it?
There was a problem hiding this comment.
Should we have 'real' tests in test/sdk_api for any of the new changes?
Replaces the static, file-based model catalog with a live Azure Foundry catalog client, and makes both catalog queries and model downloads region-aware. The client now detects the caller's Azure region from cluster headers, queries that region's catalog endpoint, and transparently retries against nearby regions when an endpoint is unhealthy. Downloads resolve their SAS URI from the same region that served the model's catalog entry.
This ports three behaviors from the C# core into the C++ SDK:
azureml-served-by-clusterheaders (instead of always usingeastus).How it works
catalog_region:""/"auto"→ detect from headers; any other value → explicit override.DisableRegionFallback: disables cross-region retries (single-region behavior).