Skip to content

feat(input-mapping): load workspace tables via input-mapping-load endpoint (AJDA-2841)#511

Open
odinuv wants to merge 12 commits into
mainfrom
odin/AJDA-2841-2
Open

feat(input-mapping): load workspace tables via input-mapping-load endpoint (AJDA-2841)#511
odinuv wants to merge 12 commits into
mainfrom
odin/AJDA-2841-2

Conversation

@odinuv

@odinuv odinuv commented Jun 8, 2026

Copy link
Copy Markdown
Member

Link to issue

https://linear.app/keboola/issue/AJDA-2841/input-mapping-refactor-before-removal-of-the-load-decider

This is the follow-up the AJDA-2841 refactor unblocked: the clone/copy/view decision is delegated to Connection and the client-side decider is removed.

Description

Workspace table loading is switched from the SAPI workspaces/{id}/load / load-clone endpoints to the new workspaces/{id}/input-mapping-load endpoint, which accepts the keboola/input-mapping Configuration\Table payload (snake_case) unchanged and resolves the load type per item server-side (CLONE > VIEW > COPY, with feature gating by bigquery-default-im-view to switch VIEW > CLONE > COPY). The job echoes the resolved loadType per item back in operationParams.

Concretely:

  • Load-type decision removed from the library. AbstractWorkspaceStrategy::prepareAndExecuteTableLoads now forwards each table's parsed configuration as-is (incl. any explicit load_type) to queueInputMappingLoad. The client-side decider and its supporting types are deleted: LoadTypeDecider, WorkspaceLoadPlan, WorkspaceTableLoadInstruction, WorkspaceLoadType, plus the prepareTableLoadsToWorkspace/executeTableLoadsToWorkspace/decideTableLoadMethod methods. The old /load and /load-clone calls are no longer used by the library.
  • Kept client-side (the bits the endpoint can't do): resolving the adaptive changed_since marker to a concrete timestamp against the input state, and rejecting the days option on workspace loads.
  • load_type is case-insensitive (normalized to upper-case in config parsing); the legacy use_view: true continues to normalize to load_type: VIEW.
  • source_search removed — the source_search config node and the TableDefinitionResolver that resolved it to a concrete source are deleted; source is now required. (Already removed from all relevant configurations in https://linear.app/keboola/issue/AJDA-2879/zbavit-se-source-search-ktery-se-nepouziva-a-cely-to-desne-komplikuje.)
  • CI: input-mapping is removed from the ALLOW_FAIL allow-list now that the suite is green again.

Justification

See the linked issue. Delegating the load-type decision to Connection removes a duplicated, drift-prone client-side decider and lets the library pass its own config through unchanged.

Plans for Customer Communication

None.

Impact Analysis

All workspace (Snowflake / BigQuery) input-mapping consumers switch endpoints; the load-type decision moves server-side, so decisions can differ from the old client-side decider where feature gating diverges. The SQL editor is unaffected — it calls Reader::prepareAndExecuteTableLoads (unchanged signature). File/local (S3/ABS/Local) strategies are unaffected. Verified on canary BigQuery/Snowflake (see below).

Deployment Plan

Release a tagged version.

Rollback Plan

Revert this PR.

Post-Release Support Plan

None.

@linear

linear Bot commented Jun 8, 2026

Copy link
Copy Markdown

AJDA-2841

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the input-mapping table loading flow to use a two-phase “queue + completion” API and switches workspace table loads to the new workspaces/{id}/input-mapping-load endpoint so the server resolves per-table load type (CLONE/VIEW/COPY) from the input-mapping Configuration\Table payload.

Changes:

  • Switch workspace loads from the legacy /load payload builders to passing the parsed (snake_case) IM table configuration into queueInputMappingLoad().
  • Introduce TableLoadQueueInterface plus concrete queue types (WorkspaceLoadQueue, TableExportQueue) and refactor strategies/Reader to prepareAndExecuteTableLoads() + waitForTableLoadCompletion().
  • Normalize legacy use_view into load_type in configuration parsing and update tests accordingly; bump keboola/storage-api-client to ^18.9.0.

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
libs/input-mapping/src/Table/Strategy/AbstractWorkspaceStrategy.php Switch workspace job enqueueing to queueInputMappingLoad() and refactor completion to queue-based materialization.
libs/input-mapping/src/Table/Strategy/WorkspaceLoadQueue.php Implement TableLoadQueueInterface; stamp queue with strategy identity + destination.
libs/input-mapping/src/Table/Strategy/TableLoadQueueInterface.php New interface describing job ids, tables, and queue provenance.
libs/input-mapping/src/Table/Strategy/TableExportQueue.php New queue type for file-based exports (Local/S3/ABS).
libs/input-mapping/src/Table/Strategy/AbstractStrategy.php Introduce unified completion phase (waitForTableLoadCompletion) that awaits jobs and builds Result.
libs/input-mapping/src/Table/Strategy/Local.php Convert Local table export to queue-based async export + completion download + manifest write.
libs/input-mapping/src/Table/Strategy/S3.php Convert S3 export to queue-based async export + completion manifest write with federated token.
libs/input-mapping/src/Table/Strategy/ABS.php Convert ABS export to queue-based async export + completion manifest write with federated token.
libs/input-mapping/src/Table/StrategyInterface.php Replace downloadTable/handleExports with prepareAndExecuteTableLoads/waitForTableLoadCompletion.
libs/input-mapping/src/Reader.php Refactor table download to start+finish via queue and add strategy-mismatch guard on completion.
libs/input-mapping/src/Table/Options/InputTableOptions.php Add workspace-load configuration passthrough and shared adaptive changed_since resolution helper.
libs/input-mapping/src/Configuration/Table.php Normalize legacy use_view into load_type (uppercase) and omit use_view from parsed output.
libs/input-mapping/composer.json Bump keboola/storage-api-client to ^18.9.0 (needed for new workspace load endpoint / exporter API).
libs/input-mapping/azure-pipelines.tests.yml Modifies pipeline test commands (currently masking failures).
libs/input-mapping/tests/Table/Strategy/WorkspaceLoadQueueTest.php New unit tests for WorkspaceLoadQueue behavior.
libs/input-mapping/tests/Table/Strategy/TableExportQueueTest.php New unit tests for TableExportQueue behavior and job-id key casting.
libs/input-mapping/tests/Table/Strategy/LocalStrategyTest.php Update Local strategy tests to assert queue-based export behavior.
libs/input-mapping/tests/Table/Strategy/S3StrategyTest.php Update S3 strategy tests to assert queue-based export behavior.
libs/input-mapping/tests/Table/Strategy/BigQueryTest.php Update BigQuery tests to assert prepareTableLoadsToWorkspace() instruction output.
libs/input-mapping/tests/Table/Strategy/AbstractWorkspaceStrategyTest.php Update/replace unit coverage to match queue completion behavior + explicit VIEW handling.
libs/input-mapping/tests/Table/Strategy/AbstractFileStrategyTest.php Update abstract file strategy test to new queue-based API.
libs/input-mapping/tests/Table/Options/InputTableOptionsTest.php Update tests for load_type and legacy use_view normalization behavior.
libs/input-mapping/tests/Configuration/TableConfigurationTest.php Update table config normalization expectations (drop use_view, add load_type cases).
libs/input-mapping/tests/Table/TableDefinitionResolverTest.php Update expected table definition output to remove use_view.
libs/input-mapping/tests/ReaderTest.php Update Reader tests for queue-based API + add strategy mismatch coverage.
libs/input-mapping/tests/Helper/RealDevStorageTableRewriteHelperTest.php Update expected normalized table definitions (no use_view).
libs/input-mapping/tests/Helper/FakeDevStorageTableRewriteHelperTest.php Update expected normalized table definitions (no use_view).
libs/input-mapping/tests/Functional/DownloadTablesWorkspaceSnowflakeTest.php Update functional assertions for unified workspaceLoad job + metrics merging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +124 to +127
if (!$plan->preserve) {
$this->logger->info('Cleaning workspace and loading tables.');
}
if ($cloneCount > 0) {
$this->logger->info(sprintf('Cloning %s tables to workspace.', $cloneCount));
}
if ($copyCount > 0) {
$this->logger->info(sprintf('Copying %s tables to workspace.', $copyCount));
}
$this->logger->info(sprintf('Loading %s tables to workspace.', count($allInputs)));
Comment thread libs/input-mapping/azure-pipelines.tests.yml Outdated
Comment thread libs/input-mapping/azure-pipelines.tests.yml Outdated
Comment thread libs/input-mapping/azure-pipelines.tests.yml Outdated
Comment thread libs/input-mapping/azure-pipelines.tests.yml Outdated
Comment thread libs/input-mapping/azure-pipelines.tests.yml Outdated
Comment thread libs/input-mapping/azure-pipelines.tests.yml Outdated
Comment thread libs/input-mapping/azure-pipelines.tests.yml Outdated
Comment thread libs/input-mapping/azure-pipelines.tests.yml Outdated
Comment thread libs/input-mapping/azure-pipelines.tests.yml Outdated
odinuv added 6 commits June 17, 2026 14:29
…point (AJDA-2841)

Delegate workspace table loading to the SAPI workspaces input-mapping-load endpoint, which accepts the keboola/input-mapping Configuration\Table payload (snake_case) unchanged and resolves the load type per item server-side (CLONE > VIEW > COPY).

executeTableLoadsToWorkspace now hands each table its parsed configuration as-is instead of pre-deciding the load type and translating to the SAPI camelCase /load shape. The adaptive changed_since marker still cannot be resolved by the endpoint, so it is resolved client-side to a concrete unix timestamp (extracted into a shared resolveAdaptiveChangedSince helper reused by the legacy /load path).
…e endpoint (AJDA-2841)

Remove the client-side load-type decider entirely (LoadTypeDecider, WorkspaceLoadPlan, WorkspaceTableLoadInstruction, WorkspaceLoadType) and the plan-based prepare/execute methods. AbstractWorkspaceStrategy::prepareAndExecuteTableLoads now forwards each table's parsed snake_case configuration (including any explicit load_type, with adaptive changed_since resolved client-side) to the workspaces input-mapping-load endpoint, which resolves CLONE/VIEW/COPY server-side.

Tests updated accordingly: decider/plan unit tests removed, AbstractWorkspaceStrategyTest rewritten around the pass-through behaviour, functional log assertions dropped, job-param assertions now check the server-resolved loadType, and the adaptive case moved to DownloadTablesWorkspaceSnowflakeAdaptiveTest.
…DA-2841)

The 'days not supported on workspace' validation lived in getStorageApiLoadOptions, which the workspace strategy no longer calls. Restore it in getStorageApiWorkspaceLoadConfiguration so workspace input mapping still fails fast on days instead of forwarding it to the endpoint (where it is only deprecated).
…y-pair login (AJDA-2841)

The default workspace login type provisions a Snowflake TYPE=LEGACY_SERVICE user, which Snowflake no longer accepts on some accounts (workspace creation fails with 'invalid value LEGACY_SERVICE for property TYPE'). Request snowflake-service-keypair so connection creates a TYPE=SERVICE key-pair user instead. The tests never read the workspace credentials directly (loads go through the workspace id via the API), so dropping password credentials is safe.
…none (AJDA-2841)

The default login type provisions a Snowflake TYPE=LEGACY_SERVICE user (rejected by Snowflake on some accounts) and service-keypair requires a publicKey. The tests never use the workspace credentials directly, so create the workspace with no direct credentials.
@ondrajodas ondrajodas changed the base branch from main to ondra/AJDA-2322-fix June 17, 2026 12:31
@odinuv odinuv requested a review from Copilot June 17, 2026 12:44
@odinuv odinuv requested a review from ondrajodas June 17, 2026 12:47
@odinuv

odinuv commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Comment thread libs/input-mapping/src/Table/Options/InputTableOptions.php
Comment thread libs/input-mapping/src/Table/Strategy/AbstractWorkspaceStrategy.php
odinuv added 3 commits June 17, 2026 17:12
input-mapping tests were tolerated while the workspace input-mapping-load refactor stabilized. The suite is fixed, so empty the allow-list and let any test failure block the barrier again.
…-2841)

Drop the source_search input-mapping feature: the source_search config node, the TableDefinitionResolver that resolved it to a concrete source id, the Reader wiring, and all related tests. source is now required. Already removed from all relevant configurations.
…lution (AJDA-2841)

strtotime() returns false on failure but 0 for the unix epoch, which is a valid timestamp. The truthy check rejected it; compare strictly against false instead. (Copilot review.)
Base automatically changed from ondra/AJDA-2322-fix to main June 17, 2026 17:27
@odinuv odinuv marked this pull request as ready for review June 17, 2026 19:18

@keboola-pr-reviewer-bot keboola-pr-reviewer-bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: needs_human (risk 4/5) · profile ajda

PR exceeds size cap (3090 LOC across 23 files).

@ondrajodas ondrajodas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ještě vyzkoušet pls na canary :)

Comment thread .github/workflows/ci.yml
->enumNode('load_type')
->values(['COPY', 'CLONE', 'VIEW', 'AUTO'])
->beforeNormalization()
->ifString()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tak tohle je asi vždycky string ne?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no beforenormalization to může bejt cokoliv, tady jde o to aby se to neposlalo do strtoupper

Comment thread libs/input-mapping/src/Table/Options/InputTableOptions.php Outdated
Comment thread libs/input-mapping/tests/Table/Strategy/BigQueryTest.php Outdated

// converting to unix timestamp https://keboolaglobal.slack.com/archives/C054VSPFVST/p1723555870048739?thread_ts=1723531121.814779&cid=C054VSPFVST
// strtotime() returns false on failure; 0 (the unix epoch) is a valid timestamp, so compare strictly.
$unixTimestamp = strtotime($lastImportDateString);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bylo by dobrý tu otestovat i skutečný datum... viděl jsem relativní nebo adaptive zápis

Comment thread libs/input-mapping/src/Table/Options/InputTableOptions.php Outdated
@odinuv

odinuv commented Jun 18, 2026

Copy link
Copy Markdown
Member Author

Tested on canary-orion

Validated keboola/input-mapping ref 6fa9f88b (= this PR @ 6a3123df) end-to-end across every consuming service and both workspace backends. Each consumer was pinned to that IM reference:

The key check: per-table load-type resolution is now done server-side via input-mapping-load. For each scenario I confirmed not just the resolved loadType in the job's operationParams, but the actual workspace object afterwards (via INFORMATION_SCHEMA) — a VIEW is really a view, a CLONE a clone, a COPY a base table.

BigQuery — project 503 workspace, load job 1075416

All 13 inputs matched expectations, verified against the workspace objects:

  • VIEW — linked bucket (alias); linked bucket with use_view:false (flag ignored); explicit load_type: view
  • CLONE — plain table; explicit load_type: CLONE (+dropTimestampColumn); cloneable table with use_view:false; in-project auto-sync alias
  • COPY (base table) — linked bucket with load_type: copy override; where eq/ne; column whitelist; limit; changed_since: -2 days (resolved to _timestamp ge …)

Snowflake — project 219 workspace, load job 1075417

All 10 inputs as expected — CLONE for plain tables and cloneable linked aliases, COPY for every filter/column/limit/changed_since/override case (confirmed via INFORMATION_SCHEMA; on Snowflake both clones and copies materialize as TABLE).

  • Forcing load_type: view on a non-viewable source correctly errored: Load type "VIEW" is not available … Possible load types: COPY, CLONE — the endpoint validates explicit load types instead of honoring them blindly (an improvement over the removed client-side decider).
  • A natural Snowflake VIEW wasn't exercised — it needs an external-schema / read-only-storage bucket, which the test project lacks. The VIEW path is covered live on BigQuery and at the rule level otherwise.

Full pipeline (not just the load endpoint)

  • Data Loader APIsandbox restore + load job (project 219) produced all destination tables and their .manifest files → manifest writing intact.
  • Job-runner — transformations on both backends (BigQuery job 477076, Snowflake job 477098) ran against the loaded workspace and read back the expected object types.

Data App

BigQuery: https://connection.canary-orion.keboola.dev/admin/projects/503/data-apps/01kvcszye1dnw222f2q23c5mb0/advanced-settings

Snowflake: https://connection.canary-orion.keboola.dev/admin/projects/219/data-apps/01kvd4s4mx8wzfzxce5msffan9/advanced-settings

odinuv added 2 commits June 18, 2026 10:22
…eryTest (AJDA-2841)

Review feedback: getStorageApiLoadOptions (the legacy camelCase /load options builder) is unused now that workspace loads forward the snake_case config to input-mapping-load. Remove it along with the now-orphaned getColumnTypes helper and ColumnType type alias, and fix the docblock that referenced it. Replace its unit tests with coverage of getStorageApiWorkspaceLoadConfiguration (snake_case passthrough, absolute + adaptive changed_since, days rejection). Delete BigQueryTest, which only retained a trivial getWorkspaceType assertion after the decider removal.
… test (AJDA-2841)

getStorageApiWorkspaceLoadConfiguration returns array<string, mixed>; assertIsArray narrows column_types before the nested offset access so phpstan level max is satisfied.
…JDA-2841)

BigQueryTest.php was removed; clear its references in phpunit.xml.dist (the BigQuery testsuite entry that caused 'Test file not found', and the CommonPart1 exclude). The BigQuery suite keeps the functional BigQuery tests.
@odinuv odinuv requested a review from ondrajodas June 18, 2026 10:10

@ondrajodas ondrajodas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants