The customer-facing CLI for the tracebloc declarative ingestion path. Wraps the same POST /internal/submit-ingestion-run protocol the tracebloc/ingestor Helm chart uses, so any cluster running the parent tracebloc/client chart can be targeted directly from a developer's workstation.
v0.1.0-alpha is published — the first release, a pre-release for early testers, cut from develop. Every phase of the v0.1 roadmap (#148–#153) is merged. The binary implements version, completion, ingest validate, cluster info, and the full dataset push flow — local schema validation, cluster discovery, data staging, submission, and Job watching, end to end.
dataset push covers 9 of 10 task categories: image_classification, object_detection, keypoint_detection, text_classification, masked_language_modeling, tabular_classification, tabular_regression, time_series_forecasting, and time_to_event_prediction. semantic_segmentation is pending mask-sidecar support upstream (data-ingestors#136); instance_segmentation is not yet implemented.
The release pipeline has shipped v0.1.0-alpha: cosign-signed binaries for linux/darwin/windows (amd64/arm64), SHA256SUMS, and the install scripts. Install it via Customer experience or build from source. (The install.tracebloc.io shortcut, the latest one-liner, and a Homebrew tap all wait for the first stable v* tag — see below.)
The Helm chart remains a sibling interface for the Kubernetes-native workflow: helm install tracebloc/ingestor --set-file ingestConfig=./ingest.yaml (see the chart's README).
The chart works well for one persona — a Kubernetes-fluent ML engineer at a customer who already has data staged on a shared cluster filesystem. For everyone else (data scientists with local data, ML engineers with cloud sources, platform engineers running GitOps, repeat-ingest users), the chart hands them at least three foreign mental models in a trench coat.
The CLI is a sibling interface to the chart, not a replacement. Both translate to the same protocol; the customer picks the one that matches their workflow.
Customer interfaces (pick one or many):
┌─────────────────────────────────────────────────┐
│ Web UI Studio for clicking through │ ← future
│ CLI `tracebloc dataset push ./data` │ ← this repo
│ Python SDK `IngestConfig(...).submit()` │ ← future
│ K8s CRD `kubectl apply` Ingestion CR │ ← future
│ Helm chart tracebloc/ingestor │ ← today
└─────────────────────────────────────────────────┘
│
│ All translate to the same protocol —
│ no special-casing in jobs-manager
▼
┌─────────────────────────────────────────────────┐
│ POST /internal/submit-ingestion-run │
│ Protocol: ingest.v1.json schema │
└─────────────────────────────────────────────────┘
The protocol — the v1 schema + the POST endpoint — is the stable point. Everything above is interchangeable.
v0.1.0-alphais a pre-release, so theinstall.tracebloc.ioshortcut isn't wired up yet and GitHub'slatestskips pre-releases. Install the alpha by explicit tag (below), or build from source.
# Install the alpha — pins the pre-release tag (Linux/macOS)
curl -fsSL https://github.com/tracebloc/cli/releases/download/v0.1.0-alpha/install.sh \
| sh -s -- --version v0.1.0-alpha
# Windows + manual/signed downloads: https://github.com/tracebloc/cli/releases/tag/v0.1.0-alpha
# Per dataset
tracebloc dataset push ./my-data \
--table cats_dogs_train \
--category image_classification \
--intent train \
--label-column labelWhat that runs under the curtain:
- Reads kubeconfig, discovers the parent
tracebloc/clientrelease in the cluster - Validates the implied
ingest.yamlagainst the v1 schema locally (instant feedback) - Mints an
ingestorServiceAccount token via TokenRequest - Stages the local files into the cluster's shared PVC via an ephemeral Pod
- POSTs the request to jobs-manager
- Watches the ingestor Job; streams logs; prints the summary
The customer never touches Helm, never edits YAML, never runs kubectl cp.
git clone https://github.com/tracebloc/cli.git
cd cli
go build -o tracebloc ./cmd/tracebloc
./tracebloc versionRequires Go 1.26 or newer (the k8s.io/* dependencies set the floor; see go.mod). The binary self-reports its build metadata; a go build without -ldflags reports dev / unknown / unknown for version/git-sha/build-date so support can tell a local hack apart from a release build.
# Release-style build with version metadata
go build -ldflags "\
-X main.version=$(git describe --tags --always) \
-X main.gitSHA=$(git rev-parse --short HEAD) \
-X main.buildDate=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
-o tracebloc ./cmd/traceblocAll v0.1 phases are merged:
| Phase | Ticket | What | Status |
|---|---|---|---|
| 0 | #148 | Repo bootstrap + Go module + CI + tracebloc version |
✅ |
| 1 | #149 | Embed ingest.v1.json + tracebloc ingest validate <path> (local-only) |
✅ |
| 2 | #150 | Cluster discovery + ingestor SA token via TokenRequest | ✅ |
| 3 | #151 | Stage data into the shared PVC via ephemeral Pod | ✅ |
| 4 | #152 | Submit to jobs-manager + watch ingestor Job + summary | ✅ |
| 5 | #153 | GitHub Releases + install.sh distribution (Homebrew tap deferred) | ✅ — v0.1.0-alpha published (pre-release) |
Beyond the original phases, dataset push was widened from image-classification-only to 9 of 10 modalities, and the test suite gained unit-coverage wins plus a kind-based integration harness for the real-I/O seams.
Next (v0.2): cloud-source ingestion (S3/GCS/HTTPS) for datasets above the 1 GiB local cap; semantic_segmentation (data-ingestors#136) and instance_segmentation; more dataset verbs (list, rm). Smaller follow-ups are tracked as open issues (#3–#7).
Epic: tracebloc/client#147.
- tracebloc/client — the parent Helm chart this CLI submits to, and where the
tracebloc/ingestorsubchart lives. - tracebloc/data-ingestors — the ingestor image + JSON schema this CLI validates against.
Apache 2.0 — see LICENSE.