Feedback from the Jasnost project #415

@padak

Consolidated feedback from the jasnost project (open-source, consent-based process discovery — https://github.com/padak/jasnost), which uses Keboola as its data plane and is dogfooding kbagent.

Context

jasnost's hosted processor (a Keboola Data App) consumes exactly two things from a project: workspace SQL (Query Service) and Storage Files. Both are fixed, typed operations — not AI-driven exploration.

Today the processor can't cleanly consume kbagent in-process: kbagent is a CLI + HTTP daemon, so a hosted Data App either runs it as a sidecar (extra process, boot-time install of a heavy dep, stateful config) or shells out. As a result we ended up:

  • hand-rolling a Query Service client (query_service.py) that mirrors keboola-agent-cli's own client, because there was no importable one, and
  • depending on kbcstorage for Files (which covers the Storage API only — not Query Service).

The four asks below would let jasnost consolidate on kbagent as a library, with no sidecar, and delete both query_service.py and the kbcstorage dependency. They are ordered by impact.

1. First-class importable Python client (not just CLI + serve) — highest impact

A stateless, in-process client:

from keboola_agent import Client  # name TBD
c = Client(url=KBC_URL, token=KBC_TOKEN)
rows  = c.query(workspace_id, "SELECT ...")          # -> list[dict]
fid   = c.files.upload(path_or_bytes, tags=[...], permanent=True)
metas = c.files.list(tag="...")
data  = c.files.read_bytes(file_id)                  # -> bytes

No daemon, no shell-out, no config-dir. This single change lets any in-process consumer (a Data App, a transformation, any service) use kbagent's Query Service + Storage without a sidecar.

Evidence of the gap: we copied kbagent's Query Service client logic into our own module rather than importing it. (If such a client already exists and we missed it, this becomes "document + surface it".)
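To make the ask concrete, here is a minimal sketch of how an in-process consumer would code against such a client. Everything in it is hypothetical: `KeboolaClient`, `FilesAPI`, `archive_report`, the in-memory fake and the `int` file id only illustrate the calls listed above and are not an existing kbagent API.

```python
from __future__ import annotations

import json
from typing import Any, Protocol, Sequence


class FilesAPI(Protocol):
    """Hypothetical Storage Files surface, mirroring the calls proposed above."""

    def upload(self, data: bytes, tags: Sequence[str] = (), permanent: bool = False) -> int: ...
    def list(self, tag: str | None = None) -> list[dict[str, Any]]: ...
    def read_bytes(self, file_id: int) -> bytes: ...


class KeboolaClient(Protocol):
    """Hypothetical stateless client: one query call plus a files namespace."""

    files: FilesAPI

    def query(self, workspace_id: str, sql: str) -> list[dict[str, Any]]: ...


def archive_report(client: KeboolaClient, workspace_id: str, sql: str) -> int:
    """Example consumer: run a fixed query and store the rows as a JSON file."""
    rows = client.query(workspace_id, sql)
    payload = json.dumps(rows).encode("utf-8")
    return client.files.upload(payload, tags=["report"], permanent=True)


class InMemoryFiles:
    """Test double that keeps uploaded blobs in a dict (no network, no daemon)."""

    def __init__(self) -> None:
        self._blobs: dict[int, bytes] = {}
        self._tags: dict[int, tuple[str, ...]] = {}

    def upload(self, data, tags=(), permanent=False):
        file_id = len(self._blobs) + 1
        self._blobs[file_id] = bytes(data)
        self._tags[file_id] = tuple(tags)
        return file_id

    def list(self, tag=None):
        return [
            {"id": fid, "tags": [*tags]}
            for fid, tags in self._tags.items()
            if tag is None or tag in tags
        ]

    def read_bytes(self, file_id):
        return self._blobs[file_id]


class FakeClient:
    """Test double returning canned rows instead of calling Query Service."""

    def __init__(self, rows):
        self.files = InMemoryFiles()
        self._rows = rows

    def query(self, workspace_id, sql):
        return [dict(row) for row in self._rows]
```

With an interface of this shape, the processor depends only on two typed operations, and tests can swap in the fake without a sidecar or a config directory.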

2. Stateless / env-only auth path

Allow the client (and serve) to operate purely from KBC_URL + KBC_TOKEN with no project add / config-dir — an implicit single-project, 12-factor mode. Removes the boot-time project add ceremony in ephemeral containers and multi-tenant setups.

Today this works but is stateful: kbagent project add --project <alias> --url $KBC_URL (token from KBC_TOKEN env), then kbagent serve — the daemon must be seeded with config at boot.
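A minimal sketch of what the env-only path could look like on the consumer side, assuming nothing beyond the two variables named above. `EnvConfig` and `config_from_env` are hypothetical names for illustration, not part of kbagent.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical single-project config resolved purely from the environment."""

    url: str
    token: str


def config_from_env(env: Mapping[str, str] = os.environ) -> EnvConfig:
    """Build the config from KBC_URL + KBC_TOKEN; fail fast if either is missing."""
    missing = [key for key in ("KBC_URL", "KBC_TOKEN") if not env.get(key)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    # Trimming the trailing slash is our own normalization choice (assumption).
    return EnvConfig(url=env["KBC_URL"].rstrip("/"), token=env["KBC_TOKEN"])
```

Nothing is written to disk, so an ephemeral container can start with only its environment set.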

3. Structured query results (not CSV text)

query(...) currently returns CSV text that the caller must parse, lowercasing the headers (Snowflake folds unquoted aliases to UPPERCASE) and coercing types along the way. We carry _parse_csv + _to_int / _coerce_bool shims purely to tame this. A structured return (JSON / typed records), or at least a documented, stable contract, would remove the fragile parsing layer.
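For illustration, the kind of shim layer this forces on a caller looks roughly like the sketch below. It is a simplified reconstruction, not our actual _parse_csv / _to_int / _coerce_bool code; the function names, the accepted boolean spellings and the sample CSV are assumptions.

```python
from __future__ import annotations

import csv
import io


def parse_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into dict rows, lowercasing the header names."""
    reader = csv.reader(io.StringIO(text))
    try:
        header = [name.lower() for name in next(reader)]
    except StopIteration:
        return []  # empty response: no header, no rows
    return [dict(zip(header, row)) for row in reader]


def to_int(value: str) -> int | None:
    """Coerce a CSV cell to int; treat an empty cell as NULL."""
    value = value.strip()
    return int(value) if value else None


def coerce_bool(value: str) -> bool:
    """Coerce a CSV cell to bool; the accepted spellings are an assumption."""
    return value.strip().lower() in {"true", "t", "1", "yes"}


def typed_rows(text: str) -> list[dict[str, object]]:
    """Example for a known two-column result: event_id (int), is_active (bool)."""
    return [
        {"event_id": to_int(row["event_id"]), "is_active": coerce_bool(row["is_active"])}
        for row in parse_csv(text)
    ]
```

Every consumer has to repeat this per-column knowledge today; a typed return would move it behind the client.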

4. Uniform file read + stable list shape

  • A files.read_bytes(file_id) -> bytes convenience (bytes without a temp-dir + signed-url dance).
  • A list-item shape that always carries (id, name, tags, created, url), so callers don't branch on "does this item have a signed url?".

Context: our read_file(meta) had to handle two environments differently. Locally, list items carry a signed url; when deployed, kbcstorage list items don't, so we download by id. A uniform shape collapses that to one path.
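A rough sketch of the branching a caller carries today, and of the item shape we are asking for. `FileItem`, `normalize`, `read_bytes` and the raw dictionary keys are hypothetical; the two transport callables are injected so the sketch makes no claim about how kbagent or kbcstorage actually download files.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


@dataclass(frozen=True)
class FileItem:
    """Hypothetical stable list-item shape: the same five fields on every item."""

    id: int
    name: str
    tags: tuple[str, ...]
    created: str
    url: Optional[str]  # None when the backend did not return a signed url


def normalize(raw: Mapping[str, Any]) -> FileItem:
    """Map a raw list item (assumed keys) onto the stable shape."""
    return FileItem(
        id=int(raw["id"]),
        name=str(raw.get("name", "")),
        tags=tuple(str(tag) for tag in raw.get("tags") or ()),
        created=str(raw.get("created", "")),
        url=raw.get("url") or None,
    )


def read_bytes(
    item: FileItem,
    fetch_url: Callable[[str], bytes],
    download_by_id: Callable[[int], bytes],
) -> bytes:
    """The branch callers write today: signed url if present, else download by id."""
    if item.url:
        return fetch_url(item.url)
    return download_by_id(item.id)
```

If the list shape always carried a url, or the client offered files.read_bytes(file_id) directly, the branch in `read_bytes` would disappear from caller code.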


Points 1 + 2 are the real unlock: with them, "kbagent inside the Data App, no sidecar" becomes the obvious choice for jasnost — and a clean pattern for any Python service on Keboola.
