feat(input-mapping): load workspace tables via input-mapping-load endpoint (AJDA-2841)#511
feat(input-mapping): load workspace tables via input-mapping-load endpoint (AJDA-2841)#511odinuv wants to merge 12 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the input-mapping table loading flow to use a two-phase “queue + completion” API and switches workspace table loads to the new workspaces/{id}/input-mapping-load endpoint so the server resolves per-table load type (CLONE/VIEW/COPY) from the input-mapping Configuration\Table payload.
Changes:
- Switch workspace loads from the legacy
/loadpayload builders to passing the parsed (snake_case) IM table configuration intoqueueInputMappingLoad(). - Introduce
TableLoadQueueInterfaceplus concrete queue types (WorkspaceLoadQueue,TableExportQueue) and refactor strategies/Reader toprepareAndExecuteTableLoads()+waitForTableLoadCompletion(). - Normalize legacy
use_viewintoload_typein configuration parsing and update tests accordingly; bumpkeboola/storage-api-clientto^18.9.0.
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| libs/input-mapping/src/Table/Strategy/AbstractWorkspaceStrategy.php | Switch workspace job enqueueing to queueInputMappingLoad() and refactor completion to queue-based materialization. |
| libs/input-mapping/src/Table/Strategy/WorkspaceLoadQueue.php | Implement TableLoadQueueInterface; stamp queue with strategy identity + destination. |
| libs/input-mapping/src/Table/Strategy/TableLoadQueueInterface.php | New interface describing job ids, tables, and queue provenance. |
| libs/input-mapping/src/Table/Strategy/TableExportQueue.php | New queue type for file-based exports (Local/S3/ABS). |
| libs/input-mapping/src/Table/Strategy/AbstractStrategy.php | Introduce unified completion phase (waitForTableLoadCompletion) that awaits jobs and builds Result. |
| libs/input-mapping/src/Table/Strategy/Local.php | Convert Local table export to queue-based async export + completion download + manifest write. |
| libs/input-mapping/src/Table/Strategy/S3.php | Convert S3 export to queue-based async export + completion manifest write with federated token. |
| libs/input-mapping/src/Table/Strategy/ABS.php | Convert ABS export to queue-based async export + completion manifest write with federated token. |
| libs/input-mapping/src/Table/StrategyInterface.php | Replace downloadTable/handleExports with prepareAndExecuteTableLoads/waitForTableLoadCompletion. |
| libs/input-mapping/src/Reader.php | Refactor table download to start+finish via queue and add strategy-mismatch guard on completion. |
| libs/input-mapping/src/Table/Options/InputTableOptions.php | Add workspace-load configuration passthrough and shared adaptive changed_since resolution helper. |
| libs/input-mapping/src/Configuration/Table.php | Normalize legacy use_view into load_type (uppercase) and omit use_view from parsed output. |
| libs/input-mapping/composer.json | Bump keboola/storage-api-client to ^18.9.0 (needed for new workspace load endpoint / exporter API). |
| libs/input-mapping/azure-pipelines.tests.yml | Modifies pipeline test commands (currently masking failures). |
| libs/input-mapping/tests/Table/Strategy/WorkspaceLoadQueueTest.php | New unit tests for WorkspaceLoadQueue behavior. |
| libs/input-mapping/tests/Table/Strategy/TableExportQueueTest.php | New unit tests for TableExportQueue behavior and job-id key casting. |
| libs/input-mapping/tests/Table/Strategy/LocalStrategyTest.php | Update Local strategy tests to assert queue-based export behavior. |
| libs/input-mapping/tests/Table/Strategy/S3StrategyTest.php | Update S3 strategy tests to assert queue-based export behavior. |
| libs/input-mapping/tests/Table/Strategy/BigQueryTest.php | Update BigQuery tests to assert prepareTableLoadsToWorkspace() instruction output. |
| libs/input-mapping/tests/Table/Strategy/AbstractWorkspaceStrategyTest.php | Update/replace unit coverage to match queue completion behavior + explicit VIEW handling. |
| libs/input-mapping/tests/Table/Strategy/AbstractFileStrategyTest.php | Update abstract file strategy test to new queue-based API. |
| libs/input-mapping/tests/Table/Options/InputTableOptionsTest.php | Update tests for load_type and legacy use_view normalization behavior. |
| libs/input-mapping/tests/Configuration/TableConfigurationTest.php | Update table config normalization expectations (drop use_view, add load_type cases). |
| libs/input-mapping/tests/Table/TableDefinitionResolverTest.php | Update expected table definition output to remove use_view. |
| libs/input-mapping/tests/ReaderTest.php | Update Reader tests for queue-based API + add strategy mismatch coverage. |
| libs/input-mapping/tests/Helper/RealDevStorageTableRewriteHelperTest.php | Update expected normalized table definitions (no use_view). |
| libs/input-mapping/tests/Helper/FakeDevStorageTableRewriteHelperTest.php | Update expected normalized table definitions (no use_view). |
| libs/input-mapping/tests/Functional/DownloadTablesWorkspaceSnowflakeTest.php | Update functional assertions for unified workspaceLoad job + metrics merging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (!$plan->preserve) { | ||
| $this->logger->info('Cleaning workspace and loading tables.'); | ||
| } | ||
| if ($cloneCount > 0) { | ||
| $this->logger->info(sprintf('Cloning %s tables to workspace.', $cloneCount)); | ||
| } | ||
| if ($copyCount > 0) { | ||
| $this->logger->info(sprintf('Copying %s tables to workspace.', $copyCount)); | ||
| } | ||
| $this->logger->info(sprintf('Loading %s tables to workspace.', count($allInputs))); |
…point (AJDA-2841) Delegate workspace table loading to the SAPI workspaces input-mapping-load endpoint, which accepts the keboola/input-mapping Configuration\Table payload (snake_case) unchanged and resolves the load type per item server-side (CLONE > VIEW > COPY). executeTableLoadsToWorkspace now hands each table its parsed configuration as-is instead of pre-deciding the load type and translating to the SAPI camelCase /load shape. The adaptive changed_since marker still cannot be resolved by the endpoint, so it is resolved client-side to a concrete unix timestamp (extracted into a shared resolveAdaptiveChangedSince helper reused by the legacy /load path).
…e endpoint (AJDA-2841) Remove the client-side load-type decider entirely (LoadTypeDecider, WorkspaceLoadPlan, WorkspaceTableLoadInstruction, WorkspaceLoadType) and the plan-based prepare/execute methods. AbstractWorkspaceStrategy::prepareAndExecuteTableLoads now forwards each table's parsed snake_case configuration (including any explicit load_type, with adaptive changed_since resolved client-side) to the workspaces input-mapping-load endpoint, which resolves CLONE/VIEW/COPY server-side. Tests updated accordingly: decider/plan unit tests removed, AbstractWorkspaceStrategyTest rewritten around the pass-through behaviour, functional log assertions dropped, job-param assertions now check the server-resolved loadType, and the adaptive case moved to DownloadTablesWorkspaceSnowflakeAdaptiveTest.
…DA-2841) The 'days not supported on workspace' validation lived in getStorageApiLoadOptions, which the workspace strategy no longer calls. Restore it in getStorageApiWorkspaceLoadConfiguration so workspace input mapping still fails fast on days instead of forwarding it to the endpoint (where it is only deprecated).
…y-pair login (AJDA-2841) The default workspace login type provisions a Snowflake TYPE=LEGACY_SERVICE user, which Snowflake no longer accepts on some accounts (workspace creation fails with 'invalid value LEGACY_SERVICE for property TYPE'). Request snowflake-service-keypair so connection creates a TYPE=SERVICE key-pair user instead. The tests never read the workspace credentials directly (loads go through the workspace id via the API), so dropping password credentials is safe.
…none (AJDA-2841) The default login type provisions a Snowflake TYPE=LEGACY_SERVICE user (rejected by Snowflake on some accounts) and service-keypair requires a publicKey. The tests never use the workspace credentials directly, so create the workspace with no direct credentials.
1c0fa56 to
7251d06
Compare
input-mapping tests were tolerated while the workspace input-mapping-load refactor stabilized. The suite is fixed, so empty the allow-list and let any test failure block the barrier again.
…-2841) Drop the source_search input-mapping feature: the source_search config node, the TableDefinitionResolver that resolved it to a concrete source id, the Reader wiring, and all related tests. source is now required. Already removed from all relevant configurations.
…lution (AJDA-2841) strtotime() returns false on failure but 0 for the unix epoch, which is a valid timestamp. The truthy check rejected it; compare strictly against false instead. (Copilot review.)
keboola-pr-reviewer-bot
left a comment
There was a problem hiding this comment.
Verdict: needs_human (risk 4/5) · profile ajda
PR exceeds size cap (3090 LOC across 23 files).
ondrajodas
left a comment
There was a problem hiding this comment.
ještě vyzkoušet pls na canary :)
| ->enumNode('load_type') | ||
| ->values(['COPY', 'CLONE', 'VIEW', 'AUTO']) | ||
| ->beforeNormalization() | ||
| ->ifString() |
There was a problem hiding this comment.
tak tohle je asi vždycky string ne?
There was a problem hiding this comment.
no beforenormalization to může bejt cokoliv, tady jde o to aby se to neposlalo do strtoupper
|
|
||
| // converting to unix timestamp https://keboolaglobal.slack.com/archives/C054VSPFVST/p1723555870048739?thread_ts=1723531121.814779&cid=C054VSPFVST | ||
| // strtotime() returns false on failure; 0 (the unix epoch) is a valid timestamp, so compare strictly. | ||
| $unixTimestamp = strtotime($lastImportDateString); |
There was a problem hiding this comment.
bylo by dobrý tu otestovat i skutečný datum... viděl jsem relativní nebo adaptive zápis
Tested on canary-orionValidated
The key check: per-table load-type resolution is now done server-side via BigQuery — project 503 workspace, load job 1075416All 13 inputs matched expectations, verified against the workspace objects:
Snowflake — project 219 workspace, load job 1075417All 10 inputs as expected — CLONE for plain tables and cloneable linked aliases, COPY for every filter/column/limit/
Full pipeline (not just the load endpoint)
Data App |
…eryTest (AJDA-2841) Review feedback: getStorageApiLoadOptions (the legacy camelCase /load options builder) is unused now that workspace loads forward the snake_case config to input-mapping-load. Remove it along with the now-orphaned getColumnTypes helper and ColumnType type alias, and fix the docblock that referenced it. Replace its unit tests with coverage of getStorageApiWorkspaceLoadConfiguration (snake_case passthrough, absolute + adaptive changed_since, days rejection). Delete BigQueryTest, which only retained a trivial getWorkspaceType assertion after the decider removal.
… test (AJDA-2841) getStorageApiWorkspaceLoadConfiguration returns array<string, mixed>; assertIsArray narrows column_types before the nested offset access so phpstan level max is satisfied.
…JDA-2841) BigQueryTest.php was removed; clear its references in phpunit.xml.dist (the BigQuery testsuite entry that caused 'Test file not found', and the CommonPart1 exclude). The BigQuery suite keeps the functional BigQuery tests.
ondrajodas
left a comment
There was a problem hiding this comment.
už snad jen tohle - https://github.com/keboola/platform-libraries/pull/511/changes#r3433663696
a za mě dobrý 👍
Link to issue
https://linear.app/keboola/issue/AJDA-2841/input-mapping-refactor-before-removal-of-the-load-decider
This is the follow-up the AJDA-2841 refactor unblocked: the clone/copy/view decision is delegated to Connection and the client-side decider is removed.
Description
Workspace table loading is switched from the SAPI
workspaces/{id}/load/load-cloneendpoints to the newworkspaces/{id}/input-mapping-loadendpoint, which accepts thekeboola/input-mappingConfiguration\Tablepayload (snake_case) unchanged and resolves the load type per item server-side (CLONE > VIEW > COPY, with feature gating bybigquery-default-im-viewto switch VIEW > CLONE > COPY). The job echoes the resolvedloadTypeper item back inoperationParams.Concretely:
AbstractWorkspaceStrategy::prepareAndExecuteTableLoadsnow forwards each table's parsed configuration as-is (incl. any explicitload_type) toqueueInputMappingLoad. The client-side decider and its supporting types are deleted:LoadTypeDecider,WorkspaceLoadPlan,WorkspaceTableLoadInstruction,WorkspaceLoadType, plus theprepareTableLoadsToWorkspace/executeTableLoadsToWorkspace/decideTableLoadMethodmethods. The old/loadand/load-clonecalls are no longer used by the library.changed_sincemarker to a concrete timestamp against the input state, and rejecting thedaysoption on workspace loads.load_typeis case-insensitive (normalized to upper-case in config parsing); the legacyuse_view: truecontinues to normalize toload_type: VIEW.source_searchremoved — thesource_searchconfig node and theTableDefinitionResolverthat resolved it to a concretesourceare deleted;sourceis now required. (Already removed from all relevant configurations in https://linear.app/keboola/issue/AJDA-2879/zbavit-se-source-search-ktery-se-nepouziva-a-cely-to-desne-komplikuje.)ALLOW_FAILallow-list now that the suite is green again.Justification
See the linked issue. Delegating the load-type decision to Connection removes a duplicated, drift-prone client-side decider and lets the library pass its own config through unchanged.
Plans for Customer Communication
None.
Impact Analysis
All workspace (Snowflake / BigQuery) input-mapping consumers switch endpoints; the load-type decision moves server-side, so decisions can differ from the old client-side decider where feature gating diverges. The SQL editor is unaffected — it calls
Reader::prepareAndExecuteTableLoads(unchanged signature). File/local (S3/ABS/Local) strategies are unaffected. Verified on canary BigQuery/Snowflake (see below).Deployment Plan
Release a tagged version.
Rollback Plan
Revert this PR.
Post-Release Support Plan
None.