diff --git a/.agents/skills/creating-documentation-graphics/SKILL.md b/.agents/skills/creating-documentation-graphics/SKILL.md new file mode 100644 index 0000000..270cdb5 --- /dev/null +++ b/.agents/skills/creating-documentation-graphics/SKILL.md @@ -0,0 +1,327 @@ +--- +name: creating-documentation-graphics +description: Creates clean technical graphics for Tilebox documentation. Use for architecture diagrams, flow diagrams, class relationships, trees, graphs, dependency diagrams, and Tilebox workflow diagrams for docs.tilebox.com. +--- + +# Creating Documentation Graphics + +Create clean, technical graphics for the Tilebox documentation website. Use this skill for diagrams and visual explanations that belong in `docs.tilebox.com`, including architecture diagrams, data flows, class relationships, trees, graph structures, dependency diagrams, and Tilebox workflow diagrams. + +This skill is derived from the Tilebox infographic style, but it is optimized for documentation: precise, legible, attractive, and easy to drop into Mintlify pages. + +## Core rule: start with painter PNGs + +When the user asks to create, edit, change, redraw, mock up, or iterate on a documentation graphic, use painter from the start and produce PNG image files. Do not draft the graphic as SVG, Mermaid, D2, HTML, hand-authored vector markup, or another intermediate format and then convert it. Do not only describe the change. Produce updated image files. + +If the user asks for a non-visual plan, outline, or copy only, answer normally. But once the task involves visuals or visual edits, use painter. + +The only exception is when the user explicitly asks for a Tilebox workflow DAG diagram, a D2 workflow diagram, or to update an existing workflow DAG under `assets/workflows/diagrams/`. In that case, use the existing D2 workflow generator and palette. This exception is only for task/DAG diagrams, not for normal workflow concept, architecture, release, deployment, or data-flow graphics. 
+ +## Core outcome + +Produce documentation graphics that: + +- explain a concrete technical relationship or process; +- fit naturally inside Mintlify pages; +- have matching light and dark variants by default; +- are generated as PNGs with painter by default; +- use a clean technical Tilebox look; +- avoid background texture and use a solid color matching the specified documentation background colors for the light/dark theme; +- avoid logos, marketing copy, and presentation-specific decoration. + +## Output modes + +Always start by creating the light-mode version, then ask the user for confirmation before generating the dark-mode version. +When asked to integrate the graphic into the documentation page, always generate both the light and dark versions. + +Use painter-generated PNG outputs by default: + +- `name-light.png` +- `name-dark.png` + +Create the light version first, then create the dark version as a color-only edit of the light version. Preserve layout and content exactly. + +Only use SVG outputs for the explicit Tilebox workflow DAG / D2 exception. In that case, follow the existing `name.svg` and `name.dark.svg` workflow convention. Do not choose SVGs because they seem easier to maintain; for documentation graphics, the expected artifact is the painter PNG pair. + +## Documentation integration pattern + +Use the docs repository light/dark image pattern: + +```mdx +Short descriptive alt text +Short descriptive alt text +``` + +For the explicit workflow DAG / D2 exception, use the existing SVG pattern: + +```mdx +Short descriptive alt text +Short descriptive alt text +``` + +Always include meaningful `alt` text. Keep it factual and concise. 
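The `name-light.png` / `name-dark.png` pairing described under "Output modes" could be checked mechanically before a graphic is embedded. The following is a minimal sketch, assuming all PNGs for a page sit in one folder; the function name `find_unpaired` and that folder layout are illustrative assumptions, not part of the existing docs tooling.

```python
from pathlib import Path


def find_unpaired(folder: Path) -> list[str]:
    """Return base names that have only one of the two theme variants.

    Hypothetical helper: assumes the `name-light.png` / `name-dark.png`
    naming convention and a single flat asset folder.
    """
    light = {p.name.removesuffix("-light.png") for p in folder.glob("*-light.png")}
    dark = {p.name.removesuffix("-dark.png") for p in folder.glob("*-dark.png")}
    # Symmetric difference: names present in exactly one of the two sets.
    return sorted(light ^ dark)
```

An empty result suggests every light graphic has a dark counterpart and vice versa; it says nothing about whether the two images actually share the same layout.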
+ +## What this skill creates + +Use this skill for: + +- architecture diagrams; +- service and module interaction diagrams; +- request, event, and data flow diagrams; +- class and object relationship overviews; +- task graphs, DAGs, trees, and dependency graphs; +- state machines and lifecycle diagrams; +- storage, dataset, and workflow mental models; +- small technical visual explainers embedded in guides or concept pages. + +Do not include: + +- Tilebox logos; +- logo-derived marks used as decoration; +- slide titles or slide-like framing; +- conclusion strips or takeaway banners; +- section numbers; +- footer treatments; +- large marketing headlines inside the image. + +Do not use heavy, full-canvas, or central background patterns. + +Documentation graphics never need the Tilebox logo. Do not place it in generated graphics. + +## Visual style + +Use the reference style in `references/best-practice-technical-sketch.jpeg` as the best-practice visual target for normal documentation graphics. It is a clean technical sketch style: thin dark outlines, white or near-background fills, simple line icons, generous whitespace, restrained red action accents, and subtle low-contrast background motifs. 
+ +Use the following Tilebox color language and shape language: + +- clean, technical, calm, and highly legible; +- polished painter-generated diagrams with a subtle hand-drawn or field-notes feel, similar to the best-practice reference image; +- restrained deep rose accents, default `#BE123C`, used mainly for action arrows, small icon highlights, checks, emphasis marks, and short underlines; +- faint background matching the docs background colors; +- generous whitespace and clear hierarchy; +- short labels close to the things they describe; +- simple arrows and connectors with unambiguous direction, drawn as clean sketch-like strokes with open arrowheads where appropriate; +- iconography: use minimal black line-art icons with tiny red highlights only when they clarify a node or concept; +- consistent spacing, alignment, and shape language; +- prefer white or near-background cards with thin dark borders over colored filled cards; +- use rounded rectangles for actors, systems, and grouped lists, and hexagons for loop steps or lifecycle states when that improves clarity; +- avoid heavy shadows; if needed, use only soft sketch shadows that do not make the graphic look like generic SaaS vector art. + +Avoid: + +- generic corporate vector art; +- noisy decoration; +- overusing accent colors or filling large cards with red, blue, or gray; +- overdoing iconography; +- playful or childish marker style; +- cursive or handwriting-style fonts; +- tiny labels; +- dense paragraphs inside diagrams; +- low-contrast strokes or fills; +- bulky app-icon style illustrations; +- gradients, glassmorphism, or glossy depth effects. + +## Typography + +Use Geist as the default font. + +Use `Geist Mono` where it helps identify technical terms, code-like labels, identifiers, field names, method names, task names, dataset names, state values, or small snippets. 
+ +Examples that should usually use `Geist Mono`: + +- `Task`, `Runner`, `Dataset`, `Collection` when shown as code-like entities; +- `job.id`, `collection_name`, `time_interval`; +- method names such as `submit()` or `query()`; +- task class names such as `DownloadTask`; +- state values such as `queued`, `running`, `computed`. + +Keep text in graphics short. Prefer one-to-three-word labels and compact identifiers. + +## Default color palettes + + +### Light mode + +Use these colors by default for non-workflow documentation graphics: + +- **Background:** transparent or `#fcf9fa` only. If using texture, keep the base background exactly `#fcf9fa` and add only subtle low-contrast patterning on top. +- **Primary accent:** `#BE123C` deep rose for active paths, arrows, highlights, emphasis strokes, selected nodes, and small icon fills. +- **Text and outlines:** deep navy / near-black navy. +- **Supporting fills and lines:** pale blue-gray or warm gray for quiet card fills, secondary connectors, shadows, and grouping regions. +- **Sketch style:** prefer white or near-background card interiors with thin deep navy outlines; avoid large colored fills. + +### Dark mode + +Use these colors by default for non-workflow documentation graphics: + +- **Background:** transparent or `#161416` only. If using texture, keep the base background exactly `#161416` and add only subtle low-contrast patterning on top. +- **Primary accent:** `#BE123C` deep rose for active paths, arrows, highlights, emphasis strokes, selected nodes, and small icon fills. +- **Text and outlines:** off-white / very light cool gray. +- **Supporting fills and lines:** muted blue-gray or warm gray for quiet card fills, secondary connectors, shadows, and grouping regions. +- **Sketch style:** preserve the same thin-outline reference style in dark mode: use dark near-background card interiors, off-white line icons and outlines, muted secondary strokes, and small deep-rose highlights rather than heavy filled panels. 
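The palettes above pair a single accent, `#BE123C`, with two base backgrounds, and the style rules ask to avoid low-contrast strokes. One way to estimate how a color reads on each background is the WCAG relative-luminance contrast ratio. This is a minimal sketch; the helper names are illustrative, and the skill itself does not prescribe any contrast threshold.

```python
def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an sRGB color given as '#rrggbb' (WCAG-style formula)."""
    digits = hex_color.lstrip("#")
    channels = [int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve to get linear-light channel values.
    linear = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)
```

For example, `contrast_ratio("#BE123C", "#fcf9fa")` estimates the accent on the light base, and the same call with `#161416` covers dark mode. The ratio only describes flat colors, so generated PNGs still need a visual check.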
+ +New docs graphics should use transparent, `#fcf9fa`, or `#161416` backgrounds as specified above. + +## Background + +Create documentation-friendly backgrounds that are compatible with the Mintlify page backgrounds. + +Good background treatments: + +- subtle paper grain or soft technical-notebook texture; +- pale blue-gray sketch shadows under main cards or nodes; +- tiny low-contrast connector dots or guide marks near margins. + +Rules: + +- Keep the center of the graphic clean enough for labels and connectors. +- Keep the border of the graphic in the solid background color so that it integrates seamlessly with the Mintlify background. +- Keep patterns low contrast and secondary to the diagram. +- Do not place texture behind dense labels or code-like terms. +- Do not make the graphic look like a presentation slide. +- Do not include any logo or logo watermark. + +## Tilebox workflow diagram palette + +Only when the user explicitly asks for a Tilebox workflow DAG diagram, D2 workflow diagram, task-state DAG, retry DAG, optional-subtask tree, or runner execution DAG, use the workflow-specific D2 palette from `assets/workflows/diagrams/generate.py` instead of painter and instead of the default palette. 
+ +Reference source: + +- `assets/workflows/diagrams/generate.py` + +Light workflow theme: + +- **Background:** `#FCF9FA` +- **Queued task:** fill `#FFF0F5`, stroke `#504448`, text `#000000` +- **Running task:** fill `#AFEEEE`, stroke `#0e5253`, text `#000000` +- **Computed task:** fill `#F0FFF0`, stroke `#3f4b40`, text `#000000` +- **Failed task:** fill `#FA8072`, stroke `#4a1511`, text `#000000` +- **Skipped task:** fill `#fcf3ae`, stroke `#877e3c`, text `#000000` +- **Subtask edge:** stroke `#170206` +- **Dependency edge:** dashed stroke `#9B1A47` +- **Diagram title:** text `#170206` + +Dark workflow theme: + +- **Background:** `#161416` +- **Queued task:** fill `#A37200`, stroke `#fcc76f`, text `#FFFFFF` +- **Running task:** fill `#3E7079`, stroke `#b1e5ef`, text `#FFFFFF` +- **Computed task:** fill `#265429`, stroke `#b7ebb8`, text `#FFFFFF` +- **Failed task:** fill `#A31800`, stroke `#f78d79`, text `#FFFFFF` +- **Skipped task:** fill `#c6b63c`, stroke `#ffed67`, text `#FFFFFF` +- **Subtask edge:** stroke `#F4F1F4` +- **Dependency edge:** dashed stroke `#F97F76` +- **Diagram title:** text `#F4F1F4` + +Workflow D2 conventions: + +- put source diagrams in `assets/workflows/diagrams/*.d2`; +- generate SVG outputs into `assets/workflows/diagrams/svg/`; +- run `python generate.py` from `assets/workflows/diagrams/` after editing workflow diagrams; +- commit both `.d2` source and generated `.svg` / `.dark.svg` outputs; +- use `Geist` font files through the existing generator; +- keep task labels short and use D2 classes such as `queued`, `running`, `computed`, `failed`, `skipped`, `optional`, `subtask-edge`, and `dependency-edge`. + +Only use this workflow-specific palette for explicit Tilebox workflow DAG / D2 diagrams. For all other documentation graphics, use painter and the default documentation palette. + +## Composition rules + +Default to an aspect ratio that fits the page content: + +- **Wide architecture or flow diagrams:** 16:9 PNG. 
+- **Tall trees or lifecycle diagrams:** portrait PNG. +- **Small inline concepts:** compact PNG with minimal padding. + +Rules: + +- Make the diagram understandable without surrounding prose, but do not duplicate the prose. +- Keep one primary idea per graphic. +- Put labels inside or adjacent to their objects. +- Use consistent connector direction and spacing. +- Prefer left-to-right flow for processes and top-to-bottom flow for trees. +- Use grouping boxes only when they reduce ambiguity. +- Leave enough padding for rendering in Mintlify cards and pages. +- Use decorative motifs only as subtle background texture or to clarify structure. + +## Choosing the implementation format + +Use this order: + +1. **Painter-generated PNGs** for documentation graphics by default. +2. **D2 through `assets/workflows/diagrams/generate.py`** only when the user explicitly asks for a Tilebox workflow DAG / D2 diagram or updates an existing workflow DAG source. + +Do not choose hand-authored SVGs or generic source-backed diagram tooling for new docs graphics. The desired default is painter-generated PNGs because they look better in the docs. The only built-in non-painter path in this skill is the explicit Tilebox workflow DAG / D2 exception. If a normal documentation diagram would benefit from iteration, iterate with painter prompts and saved PNG outputs instead of creating a source-backed SVG. + +## Creating a new documentation graphic + +1. Identify the page and asset folder. Use the nearest product folder under `assets/`. +2. Read nearby MDX pages and assets to match naming, embedding, and visual density. +3. Use painter immediately and produce PNGs. Only use D2 if the user explicitly asks for a Tilebox workflow DAG / D2 diagram. +4. Define the smallest diagram that explains the target concept. +5. Create both light and dark variants unless the user asks otherwise. +6. Add or update the MDX embedding only if the user asked to place the graphic in the docs. +7. 
Report the generated PNG paths and any caveat about text legibility or exact wording. + +## Editing an existing documentation graphic + +1. Use painter to edit existing PNG documentation graphics by default. +2. For explicit workflow DAG / D2 diagrams, find the `.d2` source first. Do not edit generated workflow SVG output if a D2 source file exists. +3. Preserve the existing naming and light/dark pairing. +4. Keep layout changes limited to what the user requested. +5. Verify both mode variants exist and are referenced correctly. + +## Painter prompt patterns + +Use painter as the default tool. + +Generate one image per painter call. For paired light/dark outputs, first generate the light PNG, then use that image as the input for a dark-mode color-only edit. + +### Light documentation graphic prompt base + +> Create a clean technical Tilebox documentation graphic for docs.tilebox.com as a PNG, matching the best-practice technical sketch style from `references/best-practice-technical-sketch.jpeg`: thin dark outlines, mostly white or near-background cards, minimal black line-art icons with tiny deep-rose highlights, restrained red action arrows, generous whitespace, and a calm hand-sketched field-notes feel without looking childish. Use Geist typography, with Geist Mono for code-like labels and technical identifiers. Use a transparent background or exactly `#fcf9fa` as the base background, deep navy text and outlines, restrained deep rose accents exactly `#BE123C`, and pale blue-gray or warm-gray secondary lines. Prefer rounded rectangles for actors/systems/grouped lists and hexagons for loop steps or lifecycle states when useful. Add only tasteful low-contrast background motifs, such as subtle paper grain, faint honeycomb/technical guide marks near the margins, or soft sketch shadows, while keeping the center clean and readable and the edges the solid background color. 
Create a precise architecture / flow / relationship diagram with short, highly legible labels and limited simple iconography only where it clarifies the content. Do not include any Tilebox logo, logo watermark, slide title, conclusion text, section number, footer, or full-canvas background pattern. Avoid clutter, colored filled cards, gradients, glossy depth effects, decorative corporate vector art, overdone icons, cursive, childish style, tiny text, and low contrast. + +### Dark documentation graphic prompt base + +Use this only when creating a dark-mode-only graphic with no light source image: + +> Create a clean technical Tilebox documentation graphic for docs.tilebox.com as a PNG, not a presentation slide, matching the best-practice technical sketch style from `references/best-practice-technical-sketch.jpeg` adapted to dark mode: thin off-white outlines, dark near-background cards, minimal light line-art icons with tiny deep-rose highlights, restrained red action arrows, generous whitespace, and a calm hand-sketched field-notes feel without looking childish. Use Geist typography, with Geist Mono for code-like labels and technical identifiers. Use a transparent background or exactly `#161416` as the base background, off-white text and outlines, restrained deep rose accents exactly `#BE123C`, and muted blue-gray or warm-gray secondary lines. Prefer rounded rectangles for actors/systems/grouped lists and hexagons for loop steps or lifecycle states when useful. Add only tasteful low-contrast background motifs, such as subtle paper grain, faint honeycomb/technical guide marks near the margins, or soft sketch shadows, while keeping the center clean and readable and the edges the solid background color. Create a precise architecture / flow / relationship diagram with short, highly legible labels and limited simple iconography only where it clarifies the content. 
Do not include any Tilebox logo, logo watermark, slide title, conclusion text, section number, footer, or full-canvas background pattern. Avoid clutter, colored filled cards, gradients, glossy depth effects, decorative corporate vector art, overdone icons, cursive, childish style, tiny text, and low contrast. + +### Dark-from-light color-only edit prompt base + +> Edit the provided light-mode Tilebox documentation graphic into dark mode. ONLY EDIT THE COLORS. DO NOT CHANGE ANY LAYOUT OR CONTENT. Preserve every label, icon, node, arrow, position, size, spacing, texture placement, and composition exactly. Change the base background to transparent or exactly `#161416`; change deep navy text and outlines to off-white / very light cool gray; keep the primary accent exactly `#BE123C`; adapt white or near-background cards to dark near-background cards; adapt pale blue-gray or warm-gray secondary lines, fills, shadows, and background texture to muted dark-mode equivalents. Preserve the thin technical sketch style, minimal line icons, and restrained red action accents. Do not add any Tilebox logo, logo watermark, slide title, conclusion text, section number, footer, or full-canvas background pattern. The dark mode graphic should be a 1-to-1 drop-in replacement for the light version: users switching themes should notice only a color change, with no layout shifts or other differences between the two versions. + +### Edit prompt base + +> Edit the provided Tilebox documentation graphic. Make only these changes: [specific changes]. Preserve the clean technical sketch documentation style from `references/best-practice-technical-sketch.jpeg`: thin outlines, mostly unfilled cards, minimal line icons, restrained deep-rose action accents, generous whitespace, Geist typography, Geist Mono for code-like labels where appropriate, no logos, no slide framing, no heavy or central background pattern, and all other content unchanged unless requested. 
Use the correct mode palette: [light or dark palette]. + +## Text handling + +Generated text in images can drift. Keep text short and inspect the output carefully. + +Prefer: + +- short labels; +- entity names; +- state names; +- method or field names; +- small numeric markers; +- simple arrows and connectors. + +Avoid: + +- paragraphs inside diagrams; +- marketing claims; +- page headings inside images; +- tiny legends; +- repeated labels that can be represented by grouping or color. + +## When to ask before proceeding + +Ask a short clarification if: + +- the target page or asset folder is unknown and there is no obvious location; +- the content of the diagram is underspecified; +- the user asks to overwrite existing image files and overwrite intent is unclear; +- exact aspect ratio or export format matters but was not specified. + +Do not ask whether to make light and dark versions unless the user’s instruction conflicts with the default. The default is both. diff --git a/.agents/skills/creating-documentation-graphics/references/best-practice-technical-sketch.jpeg b/.agents/skills/creating-documentation-graphics/references/best-practice-technical-sketch.jpeg new file mode 100644 index 0000000..0171ef7 Binary files /dev/null and b/.agents/skills/creating-documentation-graphics/references/best-practice-technical-sketch.jpeg differ diff --git a/.vscode/settings.json b/.vscode/settings.json index 201fca8..2d817f4 100644 --- a/.vscode/settings.json +++ b/.vscode/settings.json @@ -1,5 +1,10 @@ { "editor.formatOnSave": true, - "editor.rulers": [120], + "editor.rulers": [ + 120 + ], "files.insertFinalNewline": true, + "json.schemaDownload.trustedDomains": { + "https://mintlify.com/docs.json": true + }, } diff --git a/AGENTS.md b/AGENTS.md index 384525d..3da4334 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -101,18 +101,18 @@ Local setup and validation commands: ```bash # install tooling -npm i -g mintlify +npm i -g mint vale sync pre-commit install # run docs locally 
-mintlify dev +mint dev # lint prose vale . # check broken links -mintlify broken-links +mint broken-links # optional: run all hooks pre-commit run --all-files @@ -124,11 +124,13 @@ CI notes: 2. CI installs `mdx2vast` before running Vale. 3. CI runs `vale sync && vale .` and `mintlify broken-links`. -## Diagrams And Assets +## Documentation Graphics And Workflow DAG Assets -Workflow diagrams under `assets/workflows/diagrams/` are generated from `.d2` files via `generate.py`. +For normal documentation graphics, use the `creating-documentation-graphics` skill and generate PNGs with painter from the start. This applies to architecture diagrams, concept diagrams, release/deployment diagrams, data-flow diagrams, and other visual explainers embedded in docs pages. -When updating workflow diagrams: +Do not hand-author SVGs for normal documentation graphics just because they seem more maintainable. The intended output is a light/dark PNG pair, usually named like `name-light.png` and `name-dark.png`, referenced with the standard Mintlify light/dark image pattern. + +The D2/SVG workflow under `assets/workflows/diagrams/` is only for workflow DAG diagrams, such as task graphs, task-state DAGs, retry DAGs, optional-subtask trees, or existing diagrams with `.d2` sources. When updating those workflow DAG diagrams: 1. Edit the `.d2` source. 2. Regenerate SVG assets with `python generate.py` from `assets/workflows/diagrams/`. 
diff --git a/agentic-development/agent-skills.mdx b/agentic-development/agent-skills.mdx index 99a75cc..8799130 100644 --- a/agentic-development/agent-skills.mdx +++ b/agentic-development/agent-skills.mdx @@ -32,7 +32,8 @@ Tilebox skills cover common agent workflows around the CLI, datasets, workflows, | `managing-tilebox-datasets` | Creating schemas, updating metadata, and querying datasets | | `managing-tilebox-jobs` | Submitting jobs, monitoring status, reading logs, and inspecting spans | | `working-with-tilebox-automations` | Working with triggers, automations, and storage locations | -| `writing-tilebox-workflows` | Writing Tilebox workflow tasks and task runners | +| `writing-tilebox-workflows` | Writing workflow task classes, task graphs, runner definitions, caches, logs, and spans | +| `releasing-tilebox-workflows` | Creating `tilebox.workflow.toml`, building releases, publishing releases, deploying to clusters, and running release runners | ## How to use skills with agents @@ -43,3 +44,5 @@ Load the relevant Tilebox skills. Use the Tilebox CLI to submit a Sentinel-2 mos ``` Skills work best with the Tilebox CLI. The CLI gives the agent repeatable terminal commands, and skills tell it how to combine those commands safely. Use MCP as an alternative when the agent runs in a web or chat environment without practical terminal access. + +For Python workflow release work, ask the agent to use both `writing-tilebox-workflows` and `releasing-tilebox-workflows`. The typical loop is to edit tasks, build and publish a release, deploy it to a development cluster, run `tilebox runner start`, submit a test job, and inspect logs or spans before iterating. diff --git a/agentic-development/overview.mdx b/agentic-development/overview.mdx index e155bd7..58b1877 100644 --- a/agentic-development/overview.mdx +++ b/agentic-development/overview.mdx @@ -26,6 +26,9 @@ Tilebox supports agents through a CLI-based setup. 
The CLI gives agents a determ Install Tilebox skills that teach agents how to work with Tilebox tools. + + Use an agent to edit workflow code, publish releases, deploy them to a dev cluster, and inspect jobs. + ## Recommended setup diff --git a/agentic-development/tilebox-cli.mdx b/agentic-development/tilebox-cli.mdx index 15d4351..af4fde0 100644 --- a/agentic-development/tilebox-cli.mdx +++ b/agentic-development/tilebox-cli.mdx @@ -24,7 +24,9 @@ The CLI reads the `TILEBOX_API_KEY` environment variable. Create an API key in t export TILEBOX_API_KEY="YOUR_TILEBOX_API_KEY" ``` -Use API keys with the smallest permissions needed for the agent's task. + +Alternatively, pass the API key with the `--api-key` flag on each command, which also overrides the environment variable. + ## Discover commands with agent-context diff --git a/api-reference/go/datasets/Collect.mdx b/api-reference/go/datasets/Collect.mdx index 705f47e..fd2e448 100644 --- a/api-reference/go/datasets/Collect.mdx +++ b/api-reference/go/datasets/Collect.mdx @@ -10,12 +10,12 @@ func Collect[K any](seq iter.Seq2[K, error]) ([]K, error) Convert any sequence into a slice. -It return an error if any of the elements in the sequence has a non-nil error. +It returns an error if any element in the sequence has a non-nil error. ## Parameters - - The sequence of bytes to convert + + The sequence to convert. 
## Returns diff --git a/api-reference/go/datasets/Collections.Create.mdx b/api-reference/go/datasets/Collections.Create.mdx index 108f378..e316b5e 100644 --- a/api-reference/go/datasets/Collections.Create.mdx +++ b/api-reference/go/datasets/Collections.Create.mdx @@ -1,5 +1,5 @@ --- -title: Client.Collections.Create +title: Collections.Create sidebarTitle: Collections.Create icon: layer-group --- diff --git a/api-reference/go/datasets/Collections.Delete.mdx b/api-reference/go/datasets/Collections.Delete.mdx index 45bc274..bff1d2b 100644 --- a/api-reference/go/datasets/Collections.Delete.mdx +++ b/api-reference/go/datasets/Collections.Delete.mdx @@ -1,5 +1,5 @@ --- -title: Client.Collections.Delete +title: Collections.Delete sidebarTitle: Collections.Delete icon: layer-group --- diff --git a/api-reference/go/datasets/Collections.Get.mdx b/api-reference/go/datasets/Collections.Get.mdx index ced5b3d..2393865 100644 --- a/api-reference/go/datasets/Collections.Get.mdx +++ b/api-reference/go/datasets/Collections.Get.mdx @@ -1,5 +1,5 @@ --- -title: Client.Collections.Get +title: Collections.Get sidebarTitle: Collections.Get icon: layer-group --- @@ -12,7 +12,7 @@ func (collectionClient) Get( ) (*datasets.Collection, error) ``` -Get a dataset by its slug. +Get a collection by name from a dataset. ## Parameters @@ -25,7 +25,7 @@ Get a dataset by its slug. ## Returns -The created collection object. +A collection object. 
```go Go diff --git a/api-reference/go/datasets/Collections.GetOrCreate.mdx b/api-reference/go/datasets/Collections.GetOrCreate.mdx index bcec9ec..9039527 100644 --- a/api-reference/go/datasets/Collections.GetOrCreate.mdx +++ b/api-reference/go/datasets/Collections.GetOrCreate.mdx @@ -1,5 +1,5 @@ --- -title: Client.Collections.GetOrCreate +title: Collections.GetOrCreate sidebarTitle: Collections.GetOrCreate icon: layer-group --- diff --git a/api-reference/go/datasets/Collections.List.mdx b/api-reference/go/datasets/Collections.List.mdx index c1fc4d9..d23fd85 100644 --- a/api-reference/go/datasets/Collections.List.mdx +++ b/api-reference/go/datasets/Collections.List.mdx @@ -1,5 +1,5 @@ --- -title: Client.Collections.List +title: Collections.List sidebarTitle: Collections.List icon: layer-group --- diff --git a/api-reference/go/datasets/Create.mdx b/api-reference/go/datasets/Create.mdx new file mode 100644 index 0000000..b4f5df8 --- /dev/null +++ b/api-reference/go/datasets/Create.mdx @@ -0,0 +1,73 @@ +--- +title: Datasets.Create +sidebarTitle: Create +icon: laptop-code +--- + +```go +func (datasetClient) Create( + ctx context.Context, + kind datasets.DatasetKind, + codeName string, + name string, + fields []datasets.Field, + options ...datasets.DatasetOption, +) (*datasets.Dataset, error) +``` + +Create a dataset with the given code name, display name, schema kind, and custom fields. + +## Parameters + + + The dataset kind. + + + The stable code identifier for the dataset. + + + The display name of the dataset. + + + The custom fields in the dataset schema. + + + Options for dataset metadata. + + +## Options + + + Set a short dataset summary. + + + Set the dataset's markdown description. + + +## Dataset kinds + + + A dataset that contains timestamp, ID, and ingestion time fields. + + + A dataset that contains timestamp, ID, ingestion time, and geometry fields. + + +## Returns + +The created dataset object. 
+ + +```go Go +dataset, err := client.Datasets.Create(ctx, + datasets.KindSpatiotemporal, + "my_catalog", + "My catalog", + []datasets.Field{ + field.String("source").Description("Source system"), + field.Float64("cloud_cover"), + }, + datasets.WithSummary("Scenes prepared for analysis"), +) +``` + diff --git a/api-reference/go/datasets/CreateOrUpdate.mdx b/api-reference/go/datasets/CreateOrUpdate.mdx index 9d161db..d2075ae 100644 --- a/api-reference/go/datasets/CreateOrUpdate.mdx +++ b/api-reference/go/datasets/CreateOrUpdate.mdx @@ -1,6 +1,6 @@ --- -title: Client.Datasets.Create -sidebarTitle: Create +title: Datasets.CreateOrUpdate +sidebarTitle: CreateOrUpdate icon: laptop-code --- @@ -8,36 +8,51 @@ icon: laptop-code func (datasetClient) CreateOrUpdate( ctx context.Context, kind datasets.DatasetKind, - codeName string, + codeName string, name string, - fields []Field, + fields []datasets.Field, + options ...datasets.DatasetOption, ) (*datasets.Dataset, error) ``` -Create a dataset or update an existing one, if a dataset with the given `codeName` already exists. +Create a dataset or update an existing dataset if a dataset with the given `codeName` already exists. + +If the dataset already exists, Tilebox applies the same schema update rules as a direct update. New fields can be added to non-empty datasets. Breaking schema changes are only allowed for empty datasets. ## Parameters - - The kind of the dataset + + The dataset kind. + + + The stable code identifier for the dataset. + + + The display name of the dataset. - - The code name of the dataset + + The custom fields in the dataset schema. - - The name of the dataset + + Options for dataset metadata. + + +## Options + + + Set a short dataset summary. - - The fields of the dataset + + Set the dataset's markdown description. ## Dataset kinds - A dataset that contains a timestamp field + A dataset that contains timestamp, ID, and ingestion time fields. 
- A dataset that contains a timestamp field and a geometry field + A dataset that contains timestamp, ID, ingestion time, and geometry fields. ## Field types @@ -87,19 +102,20 @@ Create a dataset or update an existing one, if a dataset with the given `codeNam ## Returns -The created dataset object. +The created or updated dataset object. ```go Go dataset, err := client.Datasets.CreateOrUpdate(ctx, datasets.KindSpatiotemporal, "my_catalog", - "My personal catalog", - []Field{ + "My catalog", + []datasets.Field{ field.String("field1"), field.Int64("field2").Repeated(), field.Geometry("field3").Description("Field 3").ExampleValue("Value 3"), }, + datasets.WithSummary("Scenes prepared for analysis"), ) ``` diff --git a/api-reference/go/datasets/DatapointDecoder.Unmarshal.mdx b/api-reference/go/datasets/DatapointDecoder.Unmarshal.mdx new file mode 100644 index 0000000..6b39f45 --- /dev/null +++ b/api-reference/go/datasets/DatapointDecoder.Unmarshal.mdx @@ -0,0 +1,55 @@ +--- +title: DatapointDecoder.Unmarshal +sidebarTitle: DatapointDecoder.Unmarshal +icon: layer-group +--- + +```go +func (d datasets.DatapointDecoder) Unmarshal( + descriptor *datasets.DatapointDescriptor, + data []byte, +) (map[string]any, error) +``` + +Decode a raw protobuf datapoint into a JSON-like map with configurable protobuf decoding options. + +## Parameters + + + The descriptor returned by [`NewDatapointDescriptor`](/api-reference/go/datasets/NewDatapointDescriptor). + + + The raw protobuf datapoint bytes returned by a datapoint query. + + +## Decoder options + + + Allow messages that are missing required fields. + + + Ignore unknown fields in the raw protobuf message. + + + Override the resolver used to look up message and extension types. + + + Limit how deeply nested messages may be decoded. A zero value uses the protobuf default. + + +## Returns + +A map of datapoint fields, or an error if the raw datapoint cannot be decoded. 
+ + +```go Go +decoder := datasets.DatapointDecoder{ + DiscardUnknown: true, +} + +datapoint, err := decoder.Unmarshal(descriptor, data) +if err != nil { + return err +} +``` + diff --git a/api-reference/go/datasets/Datapoints.Delete.mdx b/api-reference/go/datasets/Datapoints.Delete.mdx index 119a0c2..80523de 100644 --- a/api-reference/go/datasets/Datapoints.Delete.mdx +++ b/api-reference/go/datasets/Datapoints.Delete.mdx @@ -1,5 +1,5 @@ --- -title: Client.Datapoints.Delete +title: Datapoints.Delete sidebarTitle: Datapoints.Delete icon: layer-group --- diff --git a/api-reference/go/datasets/Datapoints.DeleteIDs.mdx b/api-reference/go/datasets/Datapoints.DeleteIDs.mdx index 2bae11a..eea3414 100644 --- a/api-reference/go/datasets/Datapoints.DeleteIDs.mdx +++ b/api-reference/go/datasets/Datapoints.DeleteIDs.mdx @@ -1,5 +1,5 @@ --- -title: Client.Datapoints.DeleteIDs +title: Datapoints.DeleteIDs sidebarTitle: Datapoints.DeleteIDs icon: layer-group --- diff --git a/api-reference/go/datasets/Datapoints.GetInto.mdx b/api-reference/go/datasets/Datapoints.GetInto.mdx index be5f947..6ce654f 100644 --- a/api-reference/go/datasets/Datapoints.GetInto.mdx +++ b/api-reference/go/datasets/Datapoints.GetInto.mdx @@ -1,5 +1,5 @@ --- -title: Client.Datapoints.GetInto +title: Datapoints.GetInto sidebarTitle: Datapoints.GetInto icon: layer-group --- diff --git a/api-reference/go/datasets/Datapoints.Ingest.mdx b/api-reference/go/datasets/Datapoints.Ingest.mdx index 761cc6d..02e9e8a 100644 --- a/api-reference/go/datasets/Datapoints.Ingest.mdx +++ b/api-reference/go/datasets/Datapoints.Ingest.mdx @@ -1,5 +1,5 @@ --- -title: Client.Datapoints.Ingest +title: Datapoints.Ingest sidebarTitle: Datapoints.Ingest icon: layer-group --- @@ -32,8 +32,7 @@ Ingest data points into a collection. ## Returns -The list of datapoint ids that were ingested, including the IDs of existing data points in case of duplicates and -`allowExisting=true`. 
+An `IngestResponse` with `NumCreated`, `NumExisting`, and `DatapointIDs`. `DatapointIDs` includes the IDs of ingested datapoints and, when `allowExisting` is `true`, the IDs of datapoints that already existed. ```go Go @@ -50,7 +49,7 @@ datapoints := []*v1.Modis{ ingestResponse, err := client.Datapoints.Ingest(ctx, collectionID, - &datapoints + &datapoints, false, ) ``` @@ -59,5 +58,5 @@ ingestResponse, err := client.Datapoints.Ingest(ctx, ## Errors - If `allowExisting` is `False` and any of the datapoints attempting to ingest already exist. + If `allowExisting` is `false` and any datapoints already exist. diff --git a/api-reference/go/datasets/Datapoints.Query.mdx b/api-reference/go/datasets/Datapoints.Query.mdx index 8387836..397a89a 100644 --- a/api-reference/go/datasets/Datapoints.Query.mdx +++ b/api-reference/go/datasets/Datapoints.Query.mdx @@ -1,5 +1,5 @@ --- -title: Client.Datapoints.Query +title: Datapoints.Query sidebarTitle: Datapoints.Query icon: layer-group --- @@ -14,7 +14,7 @@ func (datapointClient) Query( Query datapoints from one or more collections of the same dataset. -The datapoints are lazily queried and returned as a sequence of bytes. +The datapoints are lazily queried across pages and returned as a sequence of bytes. The output sequence can be transformed into a typed `proto.Message` using [CollectAs](/api-reference/go/datasets/CollectAs) or [As](/api-reference/go/datasets/As) functions. ## Parameters @@ -49,6 +49,12 @@ The output sequence can be transformed into a typed `proto.Message` using [Colle Skip the data when querying datapoints. If set, only the required and auto-generated fields will be returned. + + Start the query after the cursor returned by a previous page. + + + Limit the total number of datapoints yielded by the sequence. 
+ ## Returns diff --git a/api-reference/go/datasets/Datapoints.QueryInto.mdx b/api-reference/go/datasets/Datapoints.QueryInto.mdx index f3bc41b..7ec85b7 100644 --- a/api-reference/go/datasets/Datapoints.QueryInto.mdx +++ b/api-reference/go/datasets/Datapoints.QueryInto.mdx @@ -1,5 +1,5 @@ --- -title: Client.Datapoints.QueryInto +title: Datapoints.QueryInto sidebarTitle: Datapoints.QueryInto icon: layer-group --- @@ -52,6 +52,12 @@ QueryInto is a convenience function for [Query](/api-reference/go/datasets/Datap Skip the data when querying datapoints. If set, only the required and auto-generated fields will be returned. + + Start the query after the cursor returned by a previous page. + + + Limit the total number of datapoints returned. + ## Returns diff --git a/api-reference/go/datasets/Datapoints.QueryPage.mdx b/api-reference/go/datasets/Datapoints.QueryPage.mdx new file mode 100644 index 0000000..f6af90f --- /dev/null +++ b/api-reference/go/datasets/Datapoints.QueryPage.mdx @@ -0,0 +1,83 @@ +--- +title: Datapoints.QueryPage +sidebarTitle: Datapoints.QueryPage +icon: layer-group +--- + +```go +func (datapointClient) QueryPage( + ctx context.Context, + datasetID uuid.UUID, + options ...datasets.QueryOption, +) (*datasets.DatapointPage, error) +``` + +Query a single page of datapoints from one or more collections of the same dataset. + +Use `QueryPage` when you need manual pagination. Use [`Datapoints.Query`](/api-reference/go/datasets/Datapoints.Query) for automatic lazy pagination. + +## Parameters + + + The ID of the dataset to query. + + + Options for querying datapoints. + + +## Options + + + Specify the time or datapoint ID interval to query. + + + Specify the geographical extent to query. + + + Specify a geographical extent with an explicit spatial filter mode and coordinate system. + + + Restrict the query to specific dataset collections by collection object. + + + Restrict the query to specific dataset collections by collection ID. 
+ + + Skip datapoint data and return only required and generated fields. + + + Start the query after the cursor returned by a previous page. + + + Limit the number of datapoints returned in this page. + + +## Returns + +A `DatapointPage` with raw protobuf datapoints in `Datapoints` and an optional `NextCursor` for the next page. + + +```go Go +page, err := client.Datapoints.QueryPage(ctx, + dataset.ID, + datasets.WithTemporalExtent(queryInterval), + datasets.WithCollectionIDs(collection.ID), + datasets.WithLimit(100), +) +if err != nil { + return err +} + +if page.NextCursor != nil { + nextPage, err := client.Datapoints.QueryPage(ctx, + dataset.ID, + datasets.WithTemporalExtent(queryInterval), + datasets.WithCollectionIDs(collection.ID), + datasets.WithCursor(page.NextCursor), + datasets.WithLimit(100), + ) + _ = nextPage + _ = err +} +``` + diff --git a/api-reference/go/datasets/Get.mdx b/api-reference/go/datasets/Get.mdx index 6a0aa04..61789ff 100644 --- a/api-reference/go/datasets/Get.mdx +++ b/api-reference/go/datasets/Get.mdx @@ -1,5 +1,5 @@ --- -title: Client.Datasets.Get +title: Datasets.Get sidebarTitle: Get icon: laptop-code --- diff --git a/api-reference/go/datasets/List.mdx b/api-reference/go/datasets/List.mdx index 67efb66..3816c21 100644 --- a/api-reference/go/datasets/List.mdx +++ b/api-reference/go/datasets/List.mdx @@ -1,5 +1,5 @@ --- -title: Client.Datasets.List +title: Datasets.List sidebarTitle: List icon: laptop-code --- diff --git a/api-reference/go/datasets/NewDatapointDescriptor.mdx b/api-reference/go/datasets/NewDatapointDescriptor.mdx new file mode 100644 index 0000000..3c94eb9 --- /dev/null +++ b/api-reference/go/datasets/NewDatapointDescriptor.mdx @@ -0,0 +1,37 @@ +--- +title: datasets.NewDatapointDescriptor +sidebarTitle: NewDatapointDescriptor +icon: layer-group +--- + +```go +func NewDatapointDescriptor(dataset *datasets.Dataset) (*datasets.DatapointDescriptor, error) +``` + +Create a reusable descriptor for decoding raw protobuf 
datapoints from a loaded dataset. + +Use this helper when you want to query datasets without generated Go protobuf types. + +## Parameters + + + A dataset returned by `client.Datasets.Get` or another dataset client method. + + +## Returns + +A `DatapointDescriptor` that can be passed to [`UnmarshalDatapoint`](/api-reference/go/datasets/UnmarshalDatapoint) or [`DatapointDecoder.Unmarshal`](/api-reference/go/datasets/DatapointDecoder.Unmarshal). + + +```go Go +dataset, err := client.Datasets.Get(ctx, "open_data.copernicus.sentinel1_sar") +if err != nil { + return err +} + +descriptor, err := datasets.NewDatapointDescriptor(dataset) +if err != nil { + return err +} +``` + diff --git a/api-reference/go/datasets/UnmarshalDatapoint.mdx b/api-reference/go/datasets/UnmarshalDatapoint.mdx new file mode 100644 index 0000000..86b0492 --- /dev/null +++ b/api-reference/go/datasets/UnmarshalDatapoint.mdx @@ -0,0 +1,51 @@ +--- +title: datasets.UnmarshalDatapoint +sidebarTitle: UnmarshalDatapoint +icon: layer-group +--- + +```go +func UnmarshalDatapoint( + descriptor *datasets.DatapointDescriptor, + data []byte, +) (map[string]any, error) +``` + +Decode a raw protobuf datapoint into a JSON-like map using a dataset descriptor. + +## Parameters + + + The descriptor returned by [`NewDatapointDescriptor`](/api-reference/go/datasets/NewDatapointDescriptor). + + + The raw protobuf datapoint bytes returned by a datapoint query. + + +## Returns + +A map of datapoint fields, or an error if the raw datapoint cannot be decoded. 
+ + +```go Go +descriptor, err := datasets.NewDatapointDescriptor(dataset) +if err != nil { + return err +} + +for data, err := range client.Datapoints.Query(ctx, + dataset.ID, + datasets.WithTemporalExtent(queryInterval), +) { + if err != nil { + return err + } + + datapoint, err := datasets.UnmarshalDatapoint(descriptor, data) + if err != nil { + return err + } + fmt.Println(datapoint["id"]) +} +``` + diff --git a/api-reference/go/datasets/Update.mdx b/api-reference/go/datasets/Update.mdx new file mode 100644 index 0000000..d6cdfb8 --- /dev/null +++ b/api-reference/go/datasets/Update.mdx @@ -0,0 +1,70 @@ +--- +title: Datasets.Update +sidebarTitle: Update +icon: laptop-code +--- + +```go +func (datasetClient) Update( + ctx context.Context, + id uuid.UUID, + kind datasets.DatasetKind, + codeName string, + name string, + fields []datasets.Field, + options ...datasets.DatasetOption, +) (*datasets.Dataset, error) +``` + +Update an existing dataset by ID with the given code name, display name, schema kind, custom fields, and metadata. + +## Parameters + + + The ID of the dataset to update. + + + The dataset kind. + + + The stable code identifier for the dataset. + + + The display name of the dataset. + + + The full custom field list for the dataset schema. + + + Options for dataset metadata. + + +## Options + + + Set a short dataset summary. + + + Set the dataset's markdown description. + + +## Returns + +The updated dataset object. 
+ + +```go Go +dataset, err := client.Datasets.Update(ctx, + datasetID, + datasets.KindSpatiotemporal, + "my_catalog", + "My catalog", + []datasets.Field{ + field.String("source").Description("Source system"), + field.Float64("cloud_cover"), + field.Timestamp("processed_at"), + }, + datasets.WithDescription("Catalog of scenes prepared for analysis."), +) +``` + diff --git a/api-reference/go/workflows/Automations.Get.mdx b/api-reference/go/workflows/Automations.Get.mdx new file mode 100644 index 0000000..385af22 --- /dev/null +++ b/api-reference/go/workflows/Automations.Get.mdx @@ -0,0 +1,30 @@ +--- +title: Automations.Get +sidebarTitle: Automations.Get +icon: diagram-project +--- + +```go +func (automationClient) Get( + ctx context.Context, + automationID uuid.UUID, +) (*workflows.Automation, error) +``` + +Get an automation prototype by ID. + +## Parameters + + + The ID of the automation. + + +## Returns + +An automation object. + + +```go Go +automation, err := client.Automations.Get(ctx, automationID) +``` + diff --git a/api-reference/go/workflows/Automations.GetStorageLocation.mdx b/api-reference/go/workflows/Automations.GetStorageLocation.mdx new file mode 100644 index 0000000..c44464e --- /dev/null +++ b/api-reference/go/workflows/Automations.GetStorageLocation.mdx @@ -0,0 +1,30 @@ +--- +title: Automations.GetStorageLocation +sidebarTitle: Automations.GetStorageLocation +icon: diagram-project +--- + +```go +func (automationClient) GetStorageLocation( + ctx context.Context, + storageLocationID uuid.UUID, +) (*workflows.StorageLocation, error) +``` + +Get a storage location used by automation storage event triggers. + +## Parameters + + + The ID of the storage location. + + +## Returns + +A storage location object. 
+ + +```go Go +location, err := client.Automations.GetStorageLocation(ctx, storageLocationID) +``` + diff --git a/api-reference/go/workflows/Automations.List.mdx b/api-reference/go/workflows/Automations.List.mdx new file mode 100644 index 0000000..6f97bfd --- /dev/null +++ b/api-reference/go/workflows/Automations.List.mdx @@ -0,0 +1,21 @@ +--- +title: Automations.List +sidebarTitle: Automations.List +icon: diagram-project +--- + +```go +func (automationClient) List(ctx context.Context) ([]*workflows.Automation, error) +``` + +List all automation prototypes. + +## Returns + +A list of automation objects. + + +```go Go +automations, err := client.Automations.List(ctx) +``` + diff --git a/api-reference/go/workflows/Automations.ListStorageLocations.mdx b/api-reference/go/workflows/Automations.ListStorageLocations.mdx new file mode 100644 index 0000000..176157e --- /dev/null +++ b/api-reference/go/workflows/Automations.ListStorageLocations.mdx @@ -0,0 +1,21 @@ +--- +title: Automations.ListStorageLocations +sidebarTitle: Automations.ListStorageLocations +icon: diagram-project +--- + +```go +func (automationClient) ListStorageLocations(ctx context.Context) ([]*workflows.StorageLocation, error) +``` + +List storage locations available for automation storage event triggers. + +## Returns + +A list of storage location objects. 
+ + +```go Go +locations, err := client.Automations.ListStorageLocations(ctx) +``` + diff --git a/api-reference/go/workflows/Clusters.Create.mdx b/api-reference/go/workflows/Clusters.Create.mdx index 288c6d9..ff38145 100644 --- a/api-reference/go/workflows/Clusters.Create.mdx +++ b/api-reference/go/workflows/Clusters.Create.mdx @@ -1,5 +1,5 @@ --- -title: Client.Clusters.Create +title: Clusters.Create sidebarTitle: Clusters.Create icon: circle-nodes --- diff --git a/api-reference/go/workflows/Clusters.Delete.mdx b/api-reference/go/workflows/Clusters.Delete.mdx index 203cbef..ae8bcb5 100644 --- a/api-reference/go/workflows/Clusters.Delete.mdx +++ b/api-reference/go/workflows/Clusters.Delete.mdx @@ -1,5 +1,5 @@ --- -title: Client.Clusters.Delete +title: Clusters.Delete sidebarTitle: Clusters.Delete icon: circle-nodes --- diff --git a/api-reference/go/workflows/Clusters.Get.mdx b/api-reference/go/workflows/Clusters.Get.mdx index 249e553..099c540 100644 --- a/api-reference/go/workflows/Clusters.Get.mdx +++ b/api-reference/go/workflows/Clusters.Get.mdx @@ -1,5 +1,5 @@ --- -title: Client.Clusters.Get +title: Clusters.Get sidebarTitle: Clusters.Get icon: circle-nodes --- diff --git a/api-reference/go/workflows/Clusters.List.mdx b/api-reference/go/workflows/Clusters.List.mdx index 6f0692e..d010df8 100644 --- a/api-reference/go/workflows/Clusters.List.mdx +++ b/api-reference/go/workflows/Clusters.List.mdx @@ -1,5 +1,5 @@ --- -title: Client.Clusters.List +title: Clusters.List sidebarTitle: Clusters.List icon: circle-nodes --- diff --git a/api-reference/go/workflows/Collect.mdx b/api-reference/go/workflows/Collect.mdx index 3a20832..cc78af6 100644 --- a/api-reference/go/workflows/Collect.mdx +++ b/api-reference/go/workflows/Collect.mdx @@ -10,12 +10,12 @@ func Collect[K any](seq iter.Seq2[K, error]) ([]K, error) Convert any sequence into a slice. -It return an error if any of the elements in the sequence has a non-nil error. 
+It returns an error if any element in the sequence has a non-nil error. ## Parameters - - The sequence of bytes to convert + + The sequence to convert. ## Returns @@ -25,7 +25,7 @@ A slice of `K` or an error if any. ```go Go jobs, err := workflows.Collect( - client.Jobs.List(ctx, interval), + client.Jobs.Query(ctx, job.WithTemporalExtent(interval)), ) ``` diff --git a/api-reference/go/workflows/ConfigureConsoleLogging.mdx b/api-reference/go/workflows/ConfigureConsoleLogging.mdx index f52378b..de01f97 100644 --- a/api-reference/go/workflows/ConfigureConsoleLogging.mdx +++ b/api-reference/go/workflows/ConfigureConsoleLogging.mdx @@ -38,11 +38,13 @@ func main() { client := workflows.NewClient() runner, err := client.NewTaskRunner(ctx) if err != nil { - slog.ErrorContext(ctx, "failed to create task runner", slog.Any("error", err)) + slog.ErrorContext(ctx, "failed to create runner", slog.Any("error", err)) return } - runner.Run(ctx) + if err := runner.RunForever(ctx); err != nil { + slog.ErrorContext(ctx, "runner failed", slog.Any("error", err)) + } } ``` diff --git a/api-reference/go/workflows/DefaultProgress.mdx b/api-reference/go/workflows/DefaultProgress.mdx new file mode 100644 index 0000000..5ea8980 --- /dev/null +++ b/api-reference/go/workflows/DefaultProgress.mdx @@ -0,0 +1,29 @@ +--- +title: workflows.DefaultProgress +sidebarTitle: DefaultProgress +icon: chart-gantt +--- + +```go +func DefaultProgress() workflows.ProgressTracker +``` + +Return the default progress tracker for the currently executing task. + +Use the returned tracker to update total and completed work units inside a task `Execute` method. + +## Returns + +A progress tracker for the default progress indicator. 
+ + +```go Go +progress := workflows.DefaultProgress() +if err := progress.Add(ctx, 10); err != nil { + return err +} +if err := progress.Done(ctx, 1); err != nil { + return err +} +``` + diff --git a/api-reference/go/workflows/Jobs.Cancel.mdx b/api-reference/go/workflows/Jobs.Cancel.mdx index 33142ef..a271d0b 100644 --- a/api-reference/go/workflows/Jobs.Cancel.mdx +++ b/api-reference/go/workflows/Jobs.Cancel.mdx @@ -1,5 +1,5 @@ --- -title: Client.Jobs.Cancel +title: Jobs.Cancel sidebarTitle: Jobs.Cancel icon: diagram-project --- @@ -8,7 +8,7 @@ icon: diagram-project func (*JobClient) Cancel(ctx context.Context, jobID uuid.UUID) error ``` -Cancel a job. When a job is canceled, no queued tasks will be picked up by task runners and executed even if task runners are idle. Tasks that are already being executed will finish their execution and not be interrupted. All sub-tasks spawned from such tasks after the cancellation will not be picked up by task runners. +Cancel a job. When a job is canceled, no queued tasks will be picked up by runners and executed even if runners are idle. Tasks that are already being executed will finish their execution and not be interrupted. All sub-tasks spawned from such tasks after the cancellation will not be picked up by runners. 
## Parameters diff --git a/api-reference/go/workflows/Jobs.Get.mdx b/api-reference/go/workflows/Jobs.Get.mdx index d712b10..4e30352 100644 --- a/api-reference/go/workflows/Jobs.Get.mdx +++ b/api-reference/go/workflows/Jobs.Get.mdx @@ -1,5 +1,5 @@ --- -title: Client.Jobs.Get +title: Jobs.Get sidebarTitle: Jobs.Get icon: diagram-project --- diff --git a/api-reference/go/workflows/Jobs.Query.mdx b/api-reference/go/workflows/Jobs.Query.mdx index 8cde4ef..2619ff2 100644 --- a/api-reference/go/workflows/Jobs.Query.mdx +++ b/api-reference/go/workflows/Jobs.Query.mdx @@ -1,5 +1,5 @@ --- -title: Client.Jobs.Query +title: Jobs.Query sidebarTitle: Jobs.Query icon: diagram-project --- @@ -11,10 +11,9 @@ func (*JobClient) Query( ) iter.Seq2[*workflows.Job, error] ``` -Query jobs in the specified interval. +Query jobs matching the provided options. -The jobs are lazily loaded and returned as a sequence of Jobs. -The jobs are returned sorted by creation time in reverse order. +The jobs are lazily loaded across pages and returned as a sequence of jobs. The output sequence can be transformed into a slice of Job using [Collect](/api-reference/go/workflows/Collect) function. ## Parameters @@ -25,23 +24,30 @@ The output sequence can be transformed into a slice of Job using [Collect](/api- ## Options - - Specify the time interval for which data should be queried. - Right now, a temporal extent is required for every query. + + Filter jobs by time or job ID interval. - - Specify the automation id for which data should be queried. - Only jobs that were created by the specified automation will be returned. + + Filter jobs by the automations that submitted them. - - Filter jobs by their state. Only jobs in any of the given states will be returned. + + Filter jobs by job state. - - Filter jobs by name. Only jobs with a matching name will be returned. + + Filter jobs by name. - + Filter jobs by the states of their tasks. 
Only jobs that have at least one task in any of the given states will be returned. Useful for finding jobs with [optional](/workflows/concepts/tasks#optional-tasks) task failures. + + Start the query after the cursor returned by a previous page. + + + Limit the total number of jobs yielded by the sequence. + + + Sort jobs by submission date. Use `job.Ascending` for oldest first or `job.Descending` for newest first. + ## Returns @@ -62,8 +68,9 @@ interval := query.NewTimeInterval( ) jobs, err := workflows.Collect( - client.Jobs.Query(ctx, + client.Jobs.Query(ctx, job.WithTemporalExtent(interval), + job.WithJobStates(job.Running, job.Started), ), ) ``` diff --git a/api-reference/go/workflows/Jobs.QueryLogs.mdx b/api-reference/go/workflows/Jobs.QueryLogs.mdx index 30f9f05..2899bf3 100644 --- a/api-reference/go/workflows/Jobs.QueryLogs.mdx +++ b/api-reference/go/workflows/Jobs.QueryLogs.mdx @@ -1,5 +1,5 @@ --- -title: Client.Jobs.QueryLogs +title: Jobs.QueryLogs sidebarTitle: Jobs.QueryLogs icon: rectangle-terminal --- @@ -8,7 +8,7 @@ icon: rectangle-terminal func (*JobClient) QueryLogs( ctx context.Context, jobID uuid.UUID, - options ...workflows.TelemetryQueryOption, + options ...job.TelemetryQueryOption, ) iter.Seq2[*workflows.LogRecord, error] ``` @@ -21,22 +21,25 @@ The logs are lazily loaded and returned as a sequence of log records. Use [Colle The ID of the job to query logs for. - + Options for querying logs. ## Options - - Sort logs by time. Use `workflows.Ascending` for oldest first or `workflows.Descending` for newest first. + + Start the query after the cursor returned by a previous page. - - Limit the number of log records returned. + + Limit the total number of log records yielded by the sequence. + + + Sort logs by time. Use `job.Ascending` for oldest first or `job.Descending` for newest first. ## Returns -A sequence of log records. Each record includes `Time`, `Level`, `Body`, and structured attributes. +A sequence of log records. 
Each record includes `Time`, `SeverityText`, `Body`, trace IDs, span IDs, and structured attributes. ```go Go @@ -47,6 +50,7 @@ import ( "github.com/google/uuid" "github.com/tilebox/tilebox-go/workflows/v1" + "github.com/tilebox/tilebox-go/workflows/v1/job" ) jobID := uuid.MustParse("019e07b1-916b-0630-f3ba-f1c33235d174") @@ -54,7 +58,7 @@ jobID := uuid.MustParse("019e07b1-916b-0630-f3ba-f1c33235d174") for record, err := range client.Jobs.QueryLogs( ctx, jobID, - workflows.WithSortDirection(workflows.Ascending), + job.WithSortDirection(job.Ascending), ) { if err != nil { slog.ErrorContext(ctx, "failed to query job logs", slog.Any("error", err)) @@ -63,7 +67,7 @@ for record, err := range client.Jobs.QueryLogs( fmt.Printf("%s %-5s %s\n", record.Time.Format(time.RFC3339), - record.Level, + record.SeverityText, record.Body, ) } diff --git a/api-reference/go/workflows/Jobs.QueryLogsPage.mdx b/api-reference/go/workflows/Jobs.QueryLogsPage.mdx new file mode 100644 index 0000000..97a2a3d --- /dev/null +++ b/api-reference/go/workflows/Jobs.QueryLogsPage.mdx @@ -0,0 +1,50 @@ +--- +title: Jobs.QueryLogsPage +sidebarTitle: Jobs.QueryLogsPage +icon: rectangle-terminal +--- + +```go +func (jobClient) QueryLogsPage( + ctx context.Context, + jobID uuid.UUID, + options ...job.TelemetryQueryOption, +) (*workflows.LogPage, error) +``` + +Query a single page of log records emitted while running a job. + +## Parameters + + + The ID of the job to query logs for. + + + Options for querying logs. + + +## Options + + + Start the query after the cursor returned by a previous page. + + + Limit the number of log records returned in this page. + + + Sort logs by time. Use `job.Ascending` for oldest first or `job.Descending` for newest first. + + +## Returns + +A `LogPage` with log records and an optional `NextCursor` for the next page. 
+ + +```go Go +page, err := client.Jobs.QueryLogsPage(ctx, + jobID, + job.WithLimit(100), + job.WithSortDirection(job.Ascending), +) +``` + diff --git a/api-reference/go/workflows/Jobs.QueryPage.mdx b/api-reference/go/workflows/Jobs.QueryPage.mdx new file mode 100644 index 0000000..6230da3 --- /dev/null +++ b/api-reference/go/workflows/Jobs.QueryPage.mdx @@ -0,0 +1,75 @@ +--- +title: Jobs.QueryPage +sidebarTitle: Jobs.QueryPage +icon: diagram-project +--- + +```go +func (jobClient) QueryPage( + ctx context.Context, + options ...job.QueryOption, +) (*workflows.JobPage, error) +``` + +Query a single page of jobs matching the provided options. + +Use `QueryPage` when you need manual pagination. Use [`Jobs.Query`](/api-reference/go/workflows/Jobs.Query) for automatic lazy pagination. + +## Parameters + + + Options for querying jobs. + + +## Options + + + Filter jobs by time or job ID interval. + + + Filter jobs by the automations that submitted them. + + + Filter jobs by job state. + + + Filter jobs by task state. Only jobs with at least one task in any of the given states are returned. + + + Filter jobs by name. + + + Start the query after the cursor returned by a previous page. + + + Limit the number of jobs returned in this page. + + + Sort jobs by submission date. Use `job.Ascending` for oldest first or `job.Descending` for newest first. + + +## Returns + +A `JobPage` with jobs and an optional `NextCursor` for the next page. 
+ + +```go Go +page, err := client.Jobs.QueryPage(ctx, + job.WithJobStates(job.Running, job.Started), + job.WithLimit(50), + job.WithSortDirection(job.Descending), +) +if err != nil { + return err +} + +if page.NextCursor != nil { + nextPage, err := client.Jobs.QueryPage(ctx, + job.WithCursor(page.NextCursor), + job.WithLimit(50), + ) + _ = nextPage + _ = err +} +``` + diff --git a/api-reference/go/workflows/Jobs.QuerySpans.mdx b/api-reference/go/workflows/Jobs.QuerySpans.mdx index f511e72..e978b14 100644 --- a/api-reference/go/workflows/Jobs.QuerySpans.mdx +++ b/api-reference/go/workflows/Jobs.QuerySpans.mdx @@ -1,5 +1,5 @@ --- -title: Client.Jobs.QuerySpans +title: Jobs.QuerySpans sidebarTitle: Jobs.QuerySpans icon: chart-gantt --- @@ -8,7 +8,7 @@ icon: chart-gantt func (*JobClient) QuerySpans( ctx context.Context, jobID uuid.UUID, - options ...workflows.TelemetryQueryOption, + options ...job.TelemetryQueryOption, ) iter.Seq2[*workflows.Span, error] ``` @@ -21,17 +21,20 @@ The spans are lazily loaded and returned as a sequence of spans. Use [Collect](/ The ID of the job to query spans for. - + Options for querying spans. ## Options - - Sort spans by start time. Use `workflows.Ascending` for oldest first or `workflows.Descending` for newest first. + + Start the query after the cursor returned by a previous page. - - Limit the number of spans returned. + + Limit the total number of spans yielded by the sequence. + + + Sort spans by start time. Use `job.Ascending` for oldest first or `job.Descending` for newest first. 
## Returns @@ -47,6 +50,7 @@ import ( "github.com/google/uuid" "github.com/tilebox/tilebox-go/workflows/v1" + "github.com/tilebox/tilebox-go/workflows/v1/job" ) jobID := uuid.MustParse("019e07b1-916b-0630-f3ba-f1c33235d174") @@ -54,7 +58,7 @@ jobID := uuid.MustParse("019e07b1-916b-0630-f3ba-f1c33235d174") for span, err := range client.Jobs.QuerySpans( ctx, jobID, - workflows.WithSortDirection(workflows.Ascending), + job.WithSortDirection(job.Ascending), ) { if err != nil { slog.ErrorContext(ctx, "failed to query job spans", slog.Any("error", err)) diff --git a/api-reference/go/workflows/Jobs.QuerySpansPage.mdx b/api-reference/go/workflows/Jobs.QuerySpansPage.mdx new file mode 100644 index 0000000..e0b33f3 --- /dev/null +++ b/api-reference/go/workflows/Jobs.QuerySpansPage.mdx @@ -0,0 +1,50 @@ +--- +title: Jobs.QuerySpansPage +sidebarTitle: Jobs.QuerySpansPage +icon: chart-gantt +--- + +```go +func (jobClient) QuerySpansPage( + ctx context.Context, + jobID uuid.UUID, + options ...job.TelemetryQueryOption, +) (*workflows.SpanPage, error) +``` + +Query a single page of spans emitted while running a job. + +## Parameters + + + The ID of the job to query spans for. + + + Options for querying spans. + + +## Options + + + Start the query after the cursor returned by a previous page. + + + Limit the number of spans returned in this page. + + + Sort spans by start time. Use `job.Ascending` for oldest first or `job.Descending` for newest first. + + +## Returns + +A `SpanPage` with spans and an optional `NextCursor` for the next page. 
+ + +```go Go +page, err := client.Jobs.QuerySpansPage(ctx, + jobID, + job.WithLimit(100), + job.WithSortDirection(job.Ascending), +) +``` + diff --git a/api-reference/go/workflows/Jobs.Retry.mdx b/api-reference/go/workflows/Jobs.Retry.mdx index 3fb773e..817a073 100644 --- a/api-reference/go/workflows/Jobs.Retry.mdx +++ b/api-reference/go/workflows/Jobs.Retry.mdx @@ -1,5 +1,5 @@ --- -title: Client.Jobs.Retry +title: Jobs.Retry sidebarTitle: Jobs.Retry icon: diagram-project --- @@ -11,7 +11,7 @@ func (*JobClient) Retry( ) (int64, error) ``` -Retry a job. All failed tasks will become queued again, and queued tasks will be picked up by task runners again. +Retry a job. All failed tasks will become queued again, and queued tasks will be picked up by runners again. ## Parameters diff --git a/api-reference/go/workflows/Jobs.Submit.mdx b/api-reference/go/workflows/Jobs.Submit.mdx index c587a72..75a87bc 100644 --- a/api-reference/go/workflows/Jobs.Submit.mdx +++ b/api-reference/go/workflows/Jobs.Submit.mdx @@ -1,5 +1,5 @@ --- -title: Client.Jobs.Submit +title: Jobs.Submit sidebarTitle: Jobs.Submit icon: diagram-project --- diff --git a/api-reference/go/workflows/NewPollingTaskRunner.mdx b/api-reference/go/workflows/NewPollingTaskRunner.mdx new file mode 100644 index 0000000..0a69b28 --- /dev/null +++ b/api-reference/go/workflows/NewPollingTaskRunner.mdx @@ -0,0 +1,45 @@ +--- +title: Client.NewPollingTaskRunner +sidebarTitle: NewPollingTaskRunner +icon: gear-code +--- + +```go +func (*Client) NewPollingTaskRunner( + ctx context.Context, + executor workflows.TaskExecutor, + options ...runner.Option, +) (*workflows.PollingTaskRunner, error) +``` + +Create a polling runner for a custom task executor. + +Use this lower-level runner when you need a custom execution backend, such as a dynamic runtime, but still want the Tilebox polling protocol to request, lease, and report tasks. + +## Parameters + + + The executor that reports task capabilities and executes leased tasks. 
+ + + Options for initializing the polling runner. + + +## Options + + + The cluster to poll for tasks. If not provided, the default cluster is used. + + + Set the logger to use for the polling runner. + + +## Returns + +The created polling runner. + + +```go Go +runner, err := client.NewPollingTaskRunner(ctx, executor) +``` + diff --git a/api-reference/go/workflows/NewTaskRunner.mdx b/api-reference/go/workflows/NewTaskRunner.mdx index 2e68399..abb3fdc 100644 --- a/api-reference/go/workflows/NewTaskRunner.mdx +++ b/api-reference/go/workflows/NewTaskRunner.mdx @@ -11,29 +11,29 @@ func (*Client) NewTaskRunner( ) (*workflows.TaskRunner, error) ``` -Initialize a task runner. +Initialize a runner. ## Parameters - Options for initializing the task runner + Options for initializing the runner ## Options - + The [cluster](/workflows/concepts/clusters#managing-clusters) to connect to. If not provided, the default cluster is used. - - Set the logger to use for the task runner + + Set the logger to use for the runner - - Disable OpenTelemetry metrics for the task runner + + Disable OpenTelemetry metrics for the runner ## Returns -The created task runner object. +The created runner object. ```go Go diff --git a/api-reference/go/workflows/PollingTaskRunner.HasActiveTask.mdx b/api-reference/go/workflows/PollingTaskRunner.HasActiveTask.mdx new file mode 100644 index 0000000..8efec0f --- /dev/null +++ b/api-reference/go/workflows/PollingTaskRunner.HasActiveTask.mdx @@ -0,0 +1,23 @@ +--- +title: PollingTaskRunner.HasActiveTask +sidebarTitle: PollingTaskRunner.HasActiveTask +icon: gear-code +--- + +```go +func (*PollingTaskRunner) HasActiveTask() bool +``` + +Report whether the polling runner currently has an active task. + +## Returns + +`true` if a task is currently active. 
+ + +```go Go +if runner.HasActiveTask() { + slog.Info("runner has an active task") +} +``` + diff --git a/api-reference/go/workflows/PollingTaskRunner.InterruptActiveTask.mdx b/api-reference/go/workflows/PollingTaskRunner.InterruptActiveTask.mdx new file mode 100644 index 0000000..dd2aa82 --- /dev/null +++ b/api-reference/go/workflows/PollingTaskRunner.InterruptActiveTask.mdx @@ -0,0 +1,31 @@ +--- +title: PollingTaskRunner.InterruptActiveTask +sidebarTitle: PollingTaskRunner.InterruptActiveTask +icon: gear-code +--- + +```go +func (*PollingTaskRunner) InterruptActiveTask(ctx context.Context) error +``` + +Interrupt the active task by canceling its lease extension and reporting it as failed. + +If there is no active task, the method returns nil. + +## Parameters + + + The context used to report the interrupted task. + + +## Returns + +An error if the active task could not be reported as failed. + + +```go Go +if err := runner.InterruptActiveTask(ctx); err != nil { + return err +} +``` + diff --git a/api-reference/go/workflows/PollingTaskRunner.IsRequestingTasks.mdx b/api-reference/go/workflows/PollingTaskRunner.IsRequestingTasks.mdx new file mode 100644 index 0000000..591d2e4 --- /dev/null +++ b/api-reference/go/workflows/PollingTaskRunner.IsRequestingTasks.mdx @@ -0,0 +1,23 @@ +--- +title: PollingTaskRunner.IsRequestingTasks +sidebarTitle: PollingTaskRunner.IsRequestingTasks +icon: gear-code +--- + +```go +func (*PollingTaskRunner) IsRequestingTasks() bool +``` + +Report whether the polling runner is currently requesting new tasks. + +## Returns + +`true` if the runner is requesting new tasks. 
+ + +```go Go +if runner.IsRequestingTasks() { + slog.Info("runner is requesting tasks") +} +``` + diff --git a/api-reference/go/workflows/PollingTaskRunner.RunAll.mdx b/api-reference/go/workflows/PollingTaskRunner.RunAll.mdx new file mode 100644 index 0000000..676f047 --- /dev/null +++ b/api-reference/go/workflows/PollingTaskRunner.RunAll.mdx @@ -0,0 +1,29 @@ +--- +title: PollingTaskRunner.RunAll +sidebarTitle: PollingTaskRunner.RunAll +icon: gear-code +--- + +```go +func (*PollingTaskRunner) RunAll(ctx context.Context) error +``` + +Run the polling runner until there are no more tasks available. + +## Parameters + + + The context controlling the polling loop lifetime. + + +## Returns + +An error if the polling loop fails. + + +```go Go +if err := runner.RunAll(ctx); err != nil { + return err +} +``` + diff --git a/api-reference/go/workflows/PollingTaskRunner.RunForever.mdx b/api-reference/go/workflows/PollingTaskRunner.RunForever.mdx new file mode 100644 index 0000000..17f8973 --- /dev/null +++ b/api-reference/go/workflows/PollingTaskRunner.RunForever.mdx @@ -0,0 +1,29 @@ +--- +title: PollingTaskRunner.RunForever +sidebarTitle: PollingTaskRunner.RunForever +icon: gear-code +--- + +```go +func (*PollingTaskRunner) RunForever(ctx context.Context) error +``` + +Run the polling runner continuously, polling for tasks when idle. + +## Parameters + + + The context controlling the polling loop lifetime. + + +## Returns + +An error if the polling loop fails. 
+ + +```go Go +if err := runner.RunForever(ctx); err != nil { + return err +} +``` + diff --git a/api-reference/go/workflows/PollingTaskRunner.StopRequestingNewTasks.mdx b/api-reference/go/workflows/PollingTaskRunner.StopRequestingNewTasks.mdx new file mode 100644 index 0000000..811db55 --- /dev/null +++ b/api-reference/go/workflows/PollingTaskRunner.StopRequestingNewTasks.mdx @@ -0,0 +1,23 @@ +--- +title: PollingTaskRunner.StopRequestingNewTasks +sidebarTitle: PollingTaskRunner.StopRequestingNewTasks +icon: gear-code +--- + +```go +func (*PollingTaskRunner) StopRequestingNewTasks() +``` + +Stop requesting new tasks from the Tilebox API. + +The runner can still finish and report an active task after this method is called. + +## Returns + +Nothing. + + +```go Go +runner.StopRequestingNewTasks() +``` + diff --git a/api-reference/go/workflows/Progress.mdx b/api-reference/go/workflows/Progress.mdx new file mode 100644 index 0000000..755c2ab --- /dev/null +++ b/api-reference/go/workflows/Progress.mdx @@ -0,0 +1,32 @@ +--- +title: workflows.Progress +sidebarTitle: Progress +icon: chart-gantt +--- + +```go +func Progress(label string) workflows.ProgressTracker +``` + +Return a named progress tracker for the currently executing task. + +Use named trackers when a task reports multiple progress indicators. + +## Parameters + + + The progress indicator label. + + +## Returns + +A progress tracker for the named progress indicator. 
+ + +```go Go +download := workflows.Progress("Download") +if err := download.Add(ctx, 100); err != nil { + return err +} +``` + diff --git a/api-reference/go/workflows/ProgressTracker.Add.mdx b/api-reference/go/workflows/ProgressTracker.Add.mdx new file mode 100644 index 0000000..22138db --- /dev/null +++ b/api-reference/go/workflows/ProgressTracker.Add.mdx @@ -0,0 +1,31 @@ +--- +title: ProgressTracker.Add +sidebarTitle: ProgressTracker.Add +icon: chart-gantt +--- + +```go +func (ProgressTracker) Add(ctx context.Context, n uint64) error +``` + +Add total work units to a progress indicator. + +## Parameters + + + The task execution context. + + + The number of total work units to add. + + +## Returns + +An error if the context is not a task execution context. + + +```go Go +progress := workflows.DefaultProgress() +err := progress.Add(ctx, 10) +``` + diff --git a/api-reference/go/workflows/ProgressTracker.Done.mdx b/api-reference/go/workflows/ProgressTracker.Done.mdx new file mode 100644 index 0000000..77864c5 --- /dev/null +++ b/api-reference/go/workflows/ProgressTracker.Done.mdx @@ -0,0 +1,31 @@ +--- +title: ProgressTracker.Done +sidebarTitle: ProgressTracker.Done +icon: chart-gantt +--- + +```go +func (ProgressTracker) Done(ctx context.Context, n uint64) error +``` + +Mark work units as completed for a progress indicator. + +## Parameters + + + The task execution context. + + + The number of completed work units to add. + + +## Returns + +An error if the context is not a task execution context. 
+ + +```go Go +progress := workflows.DefaultProgress() +err := progress.Done(ctx, 1) +``` + diff --git a/api-reference/go/workflows/SetTaskDisplay.mdx b/api-reference/go/workflows/SetTaskDisplay.mdx new file mode 100644 index 0000000..ec122ef --- /dev/null +++ b/api-reference/go/workflows/SetTaskDisplay.mdx @@ -0,0 +1,37 @@ +--- +title: workflows.SetTaskDisplay +sidebarTitle: SetTaskDisplay +icon: code +--- + +```go +func SetTaskDisplay(ctx context.Context, display string) error +``` + +Set the display label of the currently executing task. + +This function is intended to be used inside a task `Execute` method. + +## Parameters + + + The task execution context. + + + The display label to set for the current task. + + +## Returns + +An error if the context is not a task execution context. + + +```go Go +func (t *ProcessScene) Execute(ctx context.Context) error { + if err := workflows.SetTaskDisplay(ctx, "Process scene " + t.SceneID); err != nil { + return err + } + return nil +} +``` + diff --git a/api-reference/go/workflows/SubmitSubtask.mdx b/api-reference/go/workflows/SubmitSubtask.mdx index 2728d84..7f7d6ac 100644 --- a/api-reference/go/workflows/SubmitSubtask.mdx +++ b/api-reference/go/workflows/SubmitSubtask.mdx @@ -12,7 +12,7 @@ workflows.SubmitSubtask( ) (subtask.FutureTask, error) ``` -Submit a subtask to the task runner. +Submit a subtask to the runner. This function is intended to be used in tasks. @@ -30,7 +30,7 @@ This function is intended to be used in tasks. Set dependencies for the task - + Set the cluster slug of the cluster where the task will be executed. diff --git a/api-reference/go/workflows/SubmitSubtasks.mdx b/api-reference/go/workflows/SubmitSubtasks.mdx index 711017b..9b313a4 100644 --- a/api-reference/go/workflows/SubmitSubtasks.mdx +++ b/api-reference/go/workflows/SubmitSubtasks.mdx @@ -12,7 +12,7 @@ workflows.SubmitSubtasks( ) ([]subtask.FutureTask, error) ``` -Submit multiple subtasks to the task runner. 
Same as [SubmitSubtask](/api-reference/go/workflows/SubmitSubtask), but accepts a list of tasks. +Submit multiple subtasks to the runner. Same as [SubmitSubtask](/api-reference/go/workflows/SubmitSubtask), but accepts a list of tasks. This function is intended to be used in tasks. @@ -30,7 +30,7 @@ This function is intended to be used in tasks. Set dependencies for the tasks - + Set the cluster slug of the cluster where the tasks will be executed. diff --git a/api-reference/go/workflows/Task.mdx b/api-reference/go/workflows/Task.mdx index e127657..bf1ce2e 100644 --- a/api-reference/go/workflows/Task.mdx +++ b/api-reference/go/workflows/Task.mdx @@ -17,7 +17,7 @@ Task.Execute(ctx context.Context) error ``` The entry point for the execution of the task. -If not defined, the task can't be registered with a task runner but can still be submitted. +If not defined, the task can't be registered with a runner but can still be submitted. ```go Task.Identifier() TaskIdentifier @@ -25,7 +25,7 @@ Task.Identifier() TaskIdentifier Provides a user-defined task identifier. The identifier is used to uniquely identify the task and specify its version. -If not defined, the task runner will generate an identifier for it using reflection. +If not defined, the runner will generate an identifier for it using reflection. 
## JSON-serializable task diff --git a/api-reference/go/workflows/TaskExecutor.mdx b/api-reference/go/workflows/TaskExecutor.mdx new file mode 100644 index 0000000..b41b09b --- /dev/null +++ b/api-reference/go/workflows/TaskExecutor.mdx @@ -0,0 +1,25 @@ +--- +title: TaskExecutor +sidebarTitle: TaskExecutor +icon: gear-code +--- + +```go +type TaskExecutor interface { + TaskIdentifiers() []*workflowsv1.TaskIdentifier + ExecuteTask(ctx context.Context, task *workflowsv1.Task) (*workflowsv1.ExecuteTaskResponse, error) +} +``` + +`TaskExecutor` is the interface used by [`NewPollingTaskRunner`](/api-reference/go/workflows/NewPollingTaskRunner) for custom task execution backends. + +Use this interface when you need Tilebox to poll and lease tasks, but you want another runtime to execute them. + +## Methods + + + Return the task implementations the executor can currently run. This method should be cheap and non-blocking. + + + Execute a leased task and return the computed or failed task response. + diff --git a/api-reference/go/workflows/TaskRunner.GetRegisteredTask.mdx b/api-reference/go/workflows/TaskRunner.GetRegisteredTask.mdx index bf5c059..366d725 100644 --- a/api-reference/go/workflows/TaskRunner.GetRegisteredTask.mdx +++ b/api-reference/go/workflows/TaskRunner.GetRegisteredTask.mdx @@ -15,7 +15,7 @@ Get the task with the given identifier. ## Parameters - A display name for the cluster + The identifier of the registered task. ## Returns diff --git a/api-reference/go/workflows/TaskRunner.RegisterTasks.mdx b/api-reference/go/workflows/TaskRunner.RegisterTasks.mdx index 190248e..e32ef6d 100644 --- a/api-reference/go/workflows/TaskRunner.RegisterTasks.mdx +++ b/api-reference/go/workflows/TaskRunner.RegisterTasks.mdx @@ -7,7 +7,7 @@ icon: gear-code func (*TaskRunner) RegisterTasks(tasks ...workflows.ExecutableTask) error ``` -Register tasks that can be executed by this task runner. +Register tasks that can be executed by this runner. 
## Parameters diff --git a/api-reference/go/workflows/TaskRunner.Run.mdx b/api-reference/go/workflows/TaskRunner.Run.mdx deleted file mode 100644 index d9b2490..0000000 --- a/api-reference/go/workflows/TaskRunner.Run.mdx +++ /dev/null @@ -1,16 +0,0 @@ ---- -title: TaskRunner.Run -icon: gear-code ---- - -```go -func (*TaskRunner) Run(ctx context.Context) -``` - -Run the task runner forever, looking for new tasks to run and polling for new tasks when idle. - - -```go Go -runner.Run(ctx) -``` - diff --git a/api-reference/go/workflows/TaskRunner.RunAll.mdx b/api-reference/go/workflows/TaskRunner.RunAll.mdx new file mode 100644 index 0000000..d994fd0 --- /dev/null +++ b/api-reference/go/workflows/TaskRunner.RunAll.mdx @@ -0,0 +1,31 @@ +--- +title: TaskRunner.RunAll +sidebarTitle: TaskRunner.RunAll +icon: gear-code +--- + +```go +func (*TaskRunner) RunAll(ctx context.Context) error +``` + +Run the runner until there are no more tasks available. + +Use `RunAll` for finite worker processes and local draining workflows. Use [`RunForever`](/api-reference/go/workflows/TaskRunner.RunForever) for long-running runners. + +## Parameters + + + The context controlling the runner lifetime. + + +## Returns + +An error if the polling loop fails. + + +```go Go +if err := runner.RunAll(ctx); err != nil { + return err +} +``` + diff --git a/api-reference/go/workflows/TaskRunner.RunForever.mdx b/api-reference/go/workflows/TaskRunner.RunForever.mdx new file mode 100644 index 0000000..c0822c2 --- /dev/null +++ b/api-reference/go/workflows/TaskRunner.RunForever.mdx @@ -0,0 +1,31 @@ +--- +title: TaskRunner.RunForever +sidebarTitle: TaskRunner.RunForever +icon: gear-code +--- + +```go +func (*TaskRunner) RunForever(ctx context.Context) error +``` + +Run the runner continuously, polling for tasks when idle. + +`RunForever` stops when the context is canceled or when the process receives a supported shutdown signal. + +## Parameters + + + The context controlling the runner lifetime. 
+ + +## Returns + +An error if the polling loop fails. + + +```go Go +if err := runner.RunForever(ctx); err != nil { + return err +} +``` + diff --git a/api-reference/go/workflows/WithSpan.mdx b/api-reference/go/workflows/WithSpan.mdx index b7fcbbd..a9e1008 100644 --- a/api-reference/go/workflows/WithSpan.mdx +++ b/api-reference/go/workflows/WithSpan.mdx @@ -12,7 +12,7 @@ func WithSpan( ) error ``` -Wrap a function with a [tracing span](/workflows/observability/tracing) using the current task runner's tracer. +Wrap a function with a [tracing span](/workflows/observability/tracing) using the current runner's tracer. Use `WithSpan` inside a task `Execute` method. If the context is not a task execution context, the function runs without creating a span. diff --git a/api-reference/go/workflows/WithSpanResult.mdx b/api-reference/go/workflows/WithSpanResult.mdx index 5dc8a44..f840b98 100644 --- a/api-reference/go/workflows/WithSpanResult.mdx +++ b/api-reference/go/workflows/WithSpanResult.mdx @@ -12,7 +12,7 @@ func WithSpanResult[Result any]( ) (Result, error) ``` -Wrap a function with a [tracing span](/workflows/observability/tracing) using the current task runner's tracer and return the function result. +Wrap a function with a [tracing span](/workflows/observability/tracing) using the current runner's tracer and return the function result. Use `WithSpanResult` inside a task `Execute` method. If the context is not a task execution context, the function runs without creating a span. diff --git a/api-reference/go/workflows/WithTaskSpan.mdx b/api-reference/go/workflows/WithTaskSpan.mdx index b4f5361..148b66c 100644 --- a/api-reference/go/workflows/WithTaskSpan.mdx +++ b/api-reference/go/workflows/WithTaskSpan.mdx @@ -12,20 +12,25 @@ workflows.WithTaskSpan( ) error ``` -Wrap a function with a [tracing span](/workflows/observability/tracing). +Wrap a function with a [tracing span](/workflows/observability/tracing) using the current runner's tracer. 
+ +`WithTaskSpan` is an alias for [`WithSpan`](/api-reference/go/workflows/WithSpan). Use it inside a task `Execute` method. If the context is not a task execution context, the function runs without creating a span. ## Parameters - - The name of the span + + The task execution context. + + + The name of the span. - - The function to wrap + + The function to wrap. ## Returns -An error if any. +The error returned by `f`, if any. ```go Go diff --git a/api-reference/go/workflows/WithTaskSpanResult.mdx b/api-reference/go/workflows/WithTaskSpanResult.mdx index 5d4fd37..8a838fa 100644 --- a/api-reference/go/workflows/WithTaskSpanResult.mdx +++ b/api-reference/go/workflows/WithTaskSpanResult.mdx @@ -12,20 +12,25 @@ workflows.WithTaskSpanResult[Result any]( ) (Result, error) ``` -Wrap a function with a [tracing span](/workflows/observability/tracing). +Wrap a function with a [tracing span](/workflows/observability/tracing) using the current runner's tracer and return the function result. + +`WithTaskSpanResult` is an alias for [`WithSpanResult`](/api-reference/go/workflows/WithSpanResult). Use it inside a task `Execute` method. If the context is not a task execution context, the function runs without creating a span. ## Parameters - - The name of the span + + The task execution context. + + + The name of the span. - - The function to wrap + + The function to wrap. ## Returns -The result of the function and an error if any. +The result and error returned by `f`. ```go Go diff --git a/api-reference/go/workflows/Workflows.Create.mdx b/api-reference/go/workflows/Workflows.Create.mdx new file mode 100644 index 0000000..ef71668 --- /dev/null +++ b/api-reference/go/workflows/Workflows.Create.mdx @@ -0,0 +1,43 @@ +--- +title: Workflows.Create +sidebarTitle: Workflows.Create +icon: diagram-project +--- + +```go +func (workflowClient) Create( + ctx context.Context, + name string, + options ...workflows.WorkflowOption, +) (*workflows.Workflow, error) +``` + +Create a workflow. 
+ +## Parameters + + + The workflow display name. + + + Options for creating the workflow. + + +## Options + + + Set the workflow description. + + +## Returns + +The created workflow object. + + +```go Go +workflow, err := client.Workflows.Create(ctx, + "Scene processing", + workflows.WithDescription("Process new scenes into analysis-ready outputs."), +) +``` + diff --git a/api-reference/go/workflows/Workflows.DeployRelease.mdx b/api-reference/go/workflows/Workflows.DeployRelease.mdx new file mode 100644 index 0000000..13b342d --- /dev/null +++ b/api-reference/go/workflows/Workflows.DeployRelease.mdx @@ -0,0 +1,42 @@ +--- +title: Workflows.DeployRelease +sidebarTitle: Workflows.DeployRelease +icon: diagram-project +--- + +```go +func (workflowClient) DeployRelease( + ctx context.Context, + workflowSlug string, + releaseID uuid.UUID, + clusterSlugs []string, +) (*workflows.WorkflowReleaseDeployment, error) +``` + +Deploy a workflow release to one or more clusters. + +## Parameters + + + The workflow slug. + + + The ID of the workflow release to deploy. + + + The clusters to deploy the release to. + + +## Returns + +The deployment result, including the release and affected clusters. + + +```go Go +deployment, err := client.Workflows.DeployRelease(ctx, + workflow.Slug, + release.ID, + []string{"production"}, +) +``` + diff --git a/api-reference/go/workflows/Workflows.Get.mdx b/api-reference/go/workflows/Workflows.Get.mdx new file mode 100644 index 0000000..53f12c6 --- /dev/null +++ b/api-reference/go/workflows/Workflows.Get.mdx @@ -0,0 +1,30 @@ +--- +title: Workflows.Get +sidebarTitle: Workflows.Get +icon: diagram-project +--- + +```go +func (workflowClient) Get( + ctx context.Context, + slug string, +) (*workflows.Workflow, error) +``` + +Get a workflow by slug. + +## Parameters + + + The workflow slug. + + +## Returns + +A workflow object. 
+ + +```go Go +workflow, err := client.Workflows.Get(ctx, "scene-processing") +``` + diff --git a/api-reference/go/workflows/Workflows.List.mdx b/api-reference/go/workflows/Workflows.List.mdx new file mode 100644 index 0000000..8ddfb32 --- /dev/null +++ b/api-reference/go/workflows/Workflows.List.mdx @@ -0,0 +1,21 @@ +--- +title: Workflows.List +sidebarTitle: Workflows.List +icon: diagram-project +--- + +```go +func (workflowClient) List(ctx context.Context) ([]*workflows.Workflow, error) +``` + +List all workflows. + +## Returns + +A list of workflow objects. + + +```go Go +workflowsList, err := client.Workflows.List(ctx) +``` + diff --git a/api-reference/go/workflows/Workflows.PublishRelease.mdx b/api-reference/go/workflows/Workflows.PublishRelease.mdx new file mode 100644 index 0000000..43fb88e --- /dev/null +++ b/api-reference/go/workflows/Workflows.PublishRelease.mdx @@ -0,0 +1,46 @@ +--- +title: Workflows.PublishRelease +sidebarTitle: Workflows.PublishRelease +icon: diagram-project +--- + +```go +func (workflowClient) PublishRelease( + ctx context.Context, + workflowSlug string, + artifactID uuid.UUID, + content *workflows.ReleaseContent, +) (*workflows.WorkflowRelease, error) +``` + +Publish an immutable release for a workflow. + +## Parameters + + + The workflow slug. + + + The ID of the release artifact. + + + The files, task identifiers, runner object path, and optional command override included in the release. + + +## Returns + +The published workflow release. 
+ + +```go Go +release, err := client.Workflows.PublishRelease(ctx, + workflow.Slug, + artifactID, + &workflows.ReleaseContent{ + Fingerprint: fingerprint, + Tasks: []workflows.TaskIdentifier{workflows.NewTaskIdentifier("tilebox.com/tasks/ProcessScene", "v1.0")}, + RunnerObjectPath: "worker.runner", + }, +) +``` + diff --git a/api-reference/go/workflows/Workflows.UndeployRelease.mdx b/api-reference/go/workflows/Workflows.UndeployRelease.mdx new file mode 100644 index 0000000..623468d --- /dev/null +++ b/api-reference/go/workflows/Workflows.UndeployRelease.mdx @@ -0,0 +1,42 @@ +--- +title: Workflows.UndeployRelease +sidebarTitle: Workflows.UndeployRelease +icon: diagram-project +--- + +```go +func (workflowClient) UndeployRelease( + ctx context.Context, + workflowSlug string, + releaseID uuid.UUID, + clusterSlugs []string, +) (*workflows.WorkflowReleaseDeployment, error) +``` + +Remove a workflow release from one or more clusters. + +## Parameters + + + The workflow slug. + + + The ID of the workflow release to remove from clusters. + + + The clusters to remove the release from. + + +## Returns + +The result, including the release and affected clusters. + + +```go Go +deployment, err := client.Workflows.UndeployRelease(ctx, + workflow.Slug, + release.ID, + []string{"production"}, +) +``` + diff --git a/api-reference/python/tilebox.datasets/Client.create_or_update_dataset.mdx b/api-reference/python/tilebox.datasets/Client.create_or_update_dataset.mdx index 3b4d7c3..44f91ad 100644 --- a/api-reference/python/tilebox.datasets/Client.create_or_update_dataset.mdx +++ b/api-reference/python/tilebox.datasets/Client.create_or_update_dataset.mdx @@ -8,26 +8,27 @@ icon: laptop-code def Client.create_or_update_dataset( kind: DatasetKind, code_name: str, + fields: list[FieldDict] | None = None, + *, name: str | None = None, - custom_fields: list[FieldDict], -) -> Dataset +) -> DatasetClient ``` -Create a dataset. 
+Create a dataset, or update the existing dataset with the same code name. ## Parameters - The kind of the dataset + The kind of the dataset. - The code name of the dataset + The code name of the dataset. - - The name of the dataset + + The custom fields of the dataset. Defaults to an empty field list. - - The fields of the dataset + + The display name of the dataset. Defaults to the code name when creating a dataset, and to the existing name when updating a dataset. ## Dataset kinds @@ -91,17 +92,20 @@ Note that the type can also be a list of one of the types, indicating that the f ## Returns -The created dataset object. +A `DatasetClient` for the created or updated dataset. ```python Python from shapely import Geometry +from tilebox.datasets import Client +from tilebox.datasets.data.datasets import DatasetKind + +client = Client() dataset = client.create_or_update_dataset( - DatasetKind.SPATIOTEMPORAL, - "my_catalog", - "My personal catalog", - [ + kind=DatasetKind.SPATIOTEMPORAL, + code_name="my_catalog", + fields=[ { "name": "field1", "type": str, @@ -116,7 +120,8 @@ dataset = client.create_or_update_dataset( "description": "Field 3", "example_value": "Value 3", }, - ], + ], + name="My personal catalog", ) ``` diff --git a/api-reference/python/tilebox.datasets/Client.mdx b/api-reference/python/tilebox.datasets/Client.mdx index 4c1d50c..530c69b 100644 --- a/api-reference/python/tilebox.datasets/Client.mdx +++ b/api-reference/python/tilebox.datasets/Client.mdx @@ -4,21 +4,32 @@ icon: code --- ```python -class Client(url: str, token: str) +class Client( + *, + url: str = "https://api.tilebox.com", + token: str | None = None, + warn_if_unauthenticated: bool = True, +) ``` Create a Tilebox datasets client. ## Parameters - + Tilebox API Url. Defaults to `https://api.tilebox.com`. - - The API Key to authenticate with. If not set the `TILEBOX_API_KEY` environment variable will be used. + + The API key to authenticate with. 
If not set, the `TILEBOX_API_KEY` environment variable is used. + + + + Whether to log a warning when no API key is provided for the Tilebox API URL. Defaults to `True`. +If no API key is provided, the client uses anonymous open data access for public datasets. + ```python Python from tilebox.datasets import Client @@ -29,7 +40,7 @@ client = Client() # or provide connection details directly client = Client( url="https://api.tilebox.com", - token="YOUR_TILEBOX_API_KEY" + token="YOUR_TILEBOX_API_KEY", ) ``` diff --git a/api-reference/python/tilebox.datasets/Collection.delete.mdx b/api-reference/python/tilebox.datasets/Collection.delete.mdx index 9923782..a85072d 100644 --- a/api-reference/python/tilebox.datasets/Collection.delete.mdx +++ b/api-reference/python/tilebox.datasets/Collection.delete.mdx @@ -4,7 +4,11 @@ icon: layer-group --- ```python -def Collection.delete(datapoints: DatapointIDs) -> int +def Collection.delete( + datapoints: DatapointIDs, + *, + show_progress: bool | Callable[[float], None] = False, +) -> int ``` Delete data points from the collection. @@ -20,7 +24,7 @@ Data points are identified and deleted by their ids. Datapoint IDs to delete from the collection. - Supported `DatapointIDs` types are. + Supported `DatapointIDs` types are: - A `pandas.DataFrame` containing an `id` column. - A `pandas.Series` containing datapoint IDs. - An `xarray.Dataset` containing an "id" variable. @@ -30,6 +34,10 @@ Data points are identified and deleted by their ids. - A `Collection[str]` containing datapoint IDs as strings, e.g. `list[str]`. + + If `True`, display a progress bar while deleting many datapoints. You can also pass a callback to receive progress values between `0` and `1`. Defaults to `False`. + + ## Returns The number of data points that were deleted. 
diff --git a/api-reference/python/tilebox.datasets/Collection.info.mdx b/api-reference/python/tilebox.datasets/Collection.info.mdx index 7a79924..e0db610 100644 --- a/api-reference/python/tilebox.datasets/Collection.info.mdx +++ b/api-reference/python/tilebox.datasets/Collection.info.mdx @@ -4,11 +4,26 @@ icon: layer-group --- ```python -def Collection.info() -> CollectionInfo +def Collection.info( + availability: bool | None = None, + count: bool | None = None, +) -> CollectionInfo ``` Fetch metadata about the data points in this collection. +Collection availability and datapoint counts are always returned. The `availability` and `count` arguments are deprecated and no longer affect the response. + +## Parameters + + + Deprecated. Collection availability is always returned. + + + + Deprecated. Collection datapoint counts are always returned. + + ## Returns A collection info object. diff --git a/api-reference/python/tilebox.datasets/Collection.ingest.mdx b/api-reference/python/tilebox.datasets/Collection.ingest.mdx index 03b655c..b788261 100644 --- a/api-reference/python/tilebox.datasets/Collection.ingest.mdx +++ b/api-reference/python/tilebox.datasets/Collection.ingest.mdx @@ -6,14 +6,16 @@ icon: layer-group ```python def Collection.ingest( data: IngestionData, - allow_existing: bool = True + allow_existing: bool = True, + *, + show_progress: bool | Callable[[float], None] = False, ) -> list[UUID] ``` Ingest data into a collection. - You need to have write permission on the collection to be able to delete data points. + You need write permission on the collection to ingest data points. ## Parameters @@ -25,7 +27,7 @@ Ingest data into a collection. - A `pandas.DataFrame`, mapping the column names to dataset fields. - An `xarray.Dataset`, mapping variables and coordinates to dataset fields. - An `Iterable`, `dict` or `nd-array`: ingest any object that can be converted to a `pandas.DataFrame` using - it's constructor, equivalent to `ingest(pd.DataFrame(data))`. 
+ its constructor, equivalent to `ingest(pd.DataFrame(data))`. Datapoint fields are used to generate a deterministic unique `UUID` for each @@ -34,10 +36,13 @@ Ingest data into a collection. If `allow_existing` is `False`, `ingest` will raise an error if any of the generated datapoint IDs already exist. Defaults to `True`. + + If `True`, display a progress bar while ingesting many datapoints. You can also pass a callback to receive progress values between `0` and `1`. Defaults to `False`. + ## Returns -List of datapoint ids that were ingested, including the IDs of existing data points in case of duplicates and +List of datapoint IDs that were ingested, including the IDs of existing data points in case of duplicates and `allow_existing=True`. diff --git a/api-reference/python/tilebox.datasets/Collection.query.mdx b/api-reference/python/tilebox.datasets/Collection.query.mdx index 4535a5d..a0ec842 100644 --- a/api-reference/python/tilebox.datasets/Collection.query.mdx +++ b/api-reference/python/tilebox.datasets/Collection.query.mdx @@ -5,9 +5,11 @@ icon: layer-group ```python def Collection.query( + *, temporal_extent: TimeIntervalLike, + spatial_extent: SpatialFilterLike | None = None, skip_data: bool = False, - show_progress: bool = False + show_progress: bool | Callable[[float], None] = False, ) -> xarray.Dataset ``` @@ -17,26 +19,30 @@ If no data exists for the requested time or interval, an empty `xarray.Dataset` ## Parameters - The time or time interval for which to query data. This can be a single time scalar, a tuple of two time scalars, or an array of time scalars. + The time or time interval for which to query data. This can be a single time scalar, a tuple of two time scalars, or an array of time scalars. - Valid time scalars are: `datetime.datetime` objects, strings in ISO 8601 format, or Unix timestamps in seconds. + Valid time scalars are: `datetime.datetime` objects, strings in ISO 8601 format, or Unix timestamps in seconds. 
- Behavior for each input type: + Behavior for each input type: - - **TimeScalar**: If a single time scalar is provided, `query` returns all data points for that exact millisecond. + - **TimeScalar**: If a single time scalar is provided, `query` returns all data points for that exact millisecond. - - **TimeInterval**: If a time interval is provided, `query` returns all data points in that interval. Intervals can be a tuple of two `TimeScalars` or a `TimeInterval` object. Tuples are interpreted as a half-open interval `[start, end)`. With a `TimeInterval` object, the `start_exclusive` and `end_inclusive` parameters control whether the start and end time are inclusive or exclusive. + - **TimeInterval**: If a time interval is provided, `query` returns all data points in that interval. Intervals can be a tuple of two `TimeScalars` or a `TimeInterval` object. Tuples are interpreted as a half-open interval `[start, end)`. With a `TimeInterval` object, the `start_exclusive` and `end_inclusive` parameters control whether the start and end time are inclusive or exclusive. - - **Iterable[TimeScalar]**: If an array of time scalars is provided, `query` constructs a time interval from the first and last time scalar in the array. Here, both the `start` and `end` times are inclusive. + - **Iterable[TimeScalar]**: If an array of time scalars is provided, `query` constructs a time interval from the first and last time scalar in the array. Here, both the `start` and `end` times are inclusive. + + Optional spatial filter. Use this for spatial queries in spatio-temporal datasets. + + - If `True`, the response contains only the ID and the timestamp for each datapoint. Defaults to `False`. + If `True`, only required datapoint fields are returned, such as `time`, `id`, and `ingestion_time`. Defaults to `False`. - - If `True`, a progress bar is displayed when pagination is required. Defaults to `False`. + + If `True`, display a progress bar when pagination is required. 
You can also pass a callback to receive progress values between `0` and `1`. Defaults to `False`. ## Returns @@ -62,6 +68,12 @@ data = collection.query( show_progress=True, ) +# querying a spatio-temporal collection with a spatial filter +data = collection.query( + temporal_extent=interval, + spatial_extent=geometry, +) + # querying a time interval with TimeInterval interval = TimeInterval( start=datetime(2023, 5, 1), diff --git a/api-reference/python/tilebox.datasets/Dataset.collections.mdx b/api-reference/python/tilebox.datasets/Dataset.collections.mdx index e18e2a1..b113383 100644 --- a/api-reference/python/tilebox.datasets/Dataset.collections.mdx +++ b/api-reference/python/tilebox.datasets/Dataset.collections.mdx @@ -4,11 +4,26 @@ icon: database --- ```python -def Dataset.collections() -> dict[str, Collection] +def Dataset.collections( + availability: bool | None = None, + count: bool | None = None, +) -> dict[str, Collection] ``` List the available collections in a dataset. +Collection availability and datapoint counts are always returned. The `availability` and `count` arguments are deprecated and no longer affect the response. + +## Parameters + + + Deprecated. Collection availability is always returned. + + + + Deprecated. Collection datapoint counts are always returned. + + ## Returns A dictionary mapping collection names to collection objects. diff --git a/api-reference/python/tilebox.workflows/AutomationClient.all.mdx b/api-reference/python/tilebox.workflows/AutomationClient.all.mdx new file mode 100644 index 0000000..c0faa4d --- /dev/null +++ b/api-reference/python/tilebox.workflows/AutomationClient.all.mdx @@ -0,0 +1,20 @@ +--- +title: AutomationClient.all +icon: clock-3 +--- + +```python +def AutomationClient.all() -> list[AutomationPrototype] +``` + +List all registered automations. + +## Returns + +A list of automation prototypes. 
+ + +```python Python +automations = client.automations().all() +``` + diff --git a/api-reference/python/tilebox.workflows/AutomationClient.create_cron_automation.mdx b/api-reference/python/tilebox.workflows/AutomationClient.create_cron_automation.mdx new file mode 100644 index 0000000..c94ff13 --- /dev/null +++ b/api-reference/python/tilebox.workflows/AutomationClient.create_cron_automation.mdx @@ -0,0 +1,54 @@ +--- +title: AutomationClient.create_cron_automation +sidebarTitle: AutomationClient.create_cron... +icon: calendar-clock +--- + +```python +def AutomationClient.create_cron_automation( + name: str, + task: CronTask, + cron_schedules: str | list[str], + cluster: ClusterSlugLike | None = None, + max_retries: int = 0, +) -> AutomationPrototype +``` + +Create an automation that submits a task on one or more cron schedules. + +## Parameters + + + Name of the automation to create. + + + + Task to run when the automation is triggered. + + + + Cron schedule or schedules that trigger the automation. + + + + Cluster to run the task on. If not provided, the default cluster is used. + + + + Maximum number of retries for the task. Defaults to `0`. + + +## Returns + +The created automation prototype. + + +```python Python +automation = client.automations().create_cron_automation( + name="daily-index-refresh", + task=RefreshIndex(), + cron_schedules="0 2 * * *", + cluster="default", +) +``` + diff --git a/api-reference/python/tilebox.workflows/AutomationClient.create_storage_event_automation.mdx b/api-reference/python/tilebox.workflows/AutomationClient.create_storage_event_automation.mdx new file mode 100644 index 0000000..ca5901b --- /dev/null +++ b/api-reference/python/tilebox.workflows/AutomationClient.create_storage_event_automation.mdx @@ -0,0 +1,55 @@ +--- +title: AutomationClient.create_storage_event_automation +sidebarTitle: AutomationClient.create_storage... 
+icon: folder-sync +--- + +```python +def AutomationClient.create_storage_event_automation( + name: str, + task: StorageEventTask, + triggers: list[tuple[StorageLocation, str]] | tuple[StorageLocation, str], + cluster: ClusterSlugLike | None = None, + max_retries: int = 0, +) -> AutomationPrototype +``` + +Create an automation that submits a task when objects are added to storage. + +## Parameters + + + Name of the automation to create. + + + + Task to run when a matching storage event occurs. + + + + One trigger or a list of triggers. Each trigger contains a storage location and a glob pattern. + + + + Cluster to run the task on. If not provided, the default cluster is used. + + + + Maximum number of retries for the task. Defaults to `0`. + + +## Returns + +The created automation prototype. + + +```python Python +location = client.automations().storage_locations()[0] + +automation = client.automations().create_storage_event_automation( + name="ingest-new-scenes", + task=IngestScene(), + triggers=(location, "incoming/**/*.json"), +) +``` + diff --git a/api-reference/python/tilebox.workflows/AutomationClient.delete.mdx b/api-reference/python/tilebox.workflows/AutomationClient.delete.mdx new file mode 100644 index 0000000..de0c22d --- /dev/null +++ b/api-reference/python/tilebox.workflows/AutomationClient.delete.mdx @@ -0,0 +1,33 @@ +--- +title: AutomationClient.delete +icon: trash-2 +--- + +```python +def AutomationClient.delete( + automation_or_id: AutomationPrototype | UUID | str, + cancel_jobs: bool = False, +) -> None +``` + +Delete an automation by object or ID. + +## Parameters + + + Automation object, automation ID, or automation ID string to delete. + + + + Whether to cancel currently queued or running jobs for the automation. Defaults to `False`. 
+ + +## Returns + +`None` + + +```python Python +client.automations().delete(automation, cancel_jobs=True) +``` + diff --git a/api-reference/python/tilebox.workflows/AutomationClient.find.mdx b/api-reference/python/tilebox.workflows/AutomationClient.find.mdx new file mode 100644 index 0000000..799e603 --- /dev/null +++ b/api-reference/python/tilebox.workflows/AutomationClient.find.mdx @@ -0,0 +1,26 @@ +--- +title: AutomationClient.find +icon: clock-3 +--- + +```python +def AutomationClient.find(automation_id: UUID | str) -> AutomationPrototype +``` + +Find an automation by ID. + +## Parameters + + + ID of the automation to fetch. + + +## Returns + +The automation prototype for the given ID. + + +```python Python +automation = client.automations().find("0195c87a-49f6-5ffa-e3cb-92215d057ea6") +``` + diff --git a/api-reference/python/tilebox.workflows/AutomationClient.storage_locations.mdx b/api-reference/python/tilebox.workflows/AutomationClient.storage_locations.mdx new file mode 100644 index 0000000..3a40742 --- /dev/null +++ b/api-reference/python/tilebox.workflows/AutomationClient.storage_locations.mdx @@ -0,0 +1,20 @@ +--- +title: AutomationClient.storage_locations +icon: hard-drive +--- + +```python +def AutomationClient.storage_locations() -> list[StorageLocation] +``` + +List storage locations that can be used as storage event automation triggers. + +## Returns + +A list of storage locations available to the current account. 
+ + +```python Python +locations = client.automations().storage_locations() +``` + diff --git a/api-reference/python/tilebox.workflows/Client.configure_logging.mdx b/api-reference/python/tilebox.workflows/Client.configure_logging.mdx index 8f392fb..6cc60ba 100644 --- a/api-reference/python/tilebox.workflows/Client.configure_logging.mdx +++ b/api-reference/python/tilebox.workflows/Client.configure_logging.mdx @@ -5,7 +5,7 @@ icon: laptop-code ```python def Client.configure_logging( - level: int, + level: int | logging.Logger, runner_level: int | None = None, ) -> None ``` @@ -14,14 +14,18 @@ Configure the log level for logs exported by this workflow client. ## Parameters - + Logging level for task logs emitted with `context.logger`, for example `logging.INFO` or `logging.DEBUG`. - Logging level for internal task runner logs. If omitted, the value of `level` is used. + Logging level for internal runner logs. If omitted, the value of `level` is used. + + Passing a `logging.Logger` instance as `level` is deprecated. Configure the Tilebox root logger directly when you need to export logs to another destination. + + ## Returns `None` diff --git a/api-reference/python/tilebox.workflows/Client.mdx b/api-reference/python/tilebox.workflows/Client.mdx index ddf9971..0cf00ff 100644 --- a/api-reference/python/tilebox.workflows/Client.mdx +++ b/api-reference/python/tilebox.workflows/Client.mdx @@ -9,6 +9,7 @@ class Client( url: str = "https://api.tilebox.com", token: str | None = None, name: str | None = None, + client_id: UUID | None = None, ) ``` @@ -16,18 +17,22 @@ Create a Tilebox workflows client. ## Parameters - + Tilebox API Url. Defaults to `https://api.tilebox.com`. - - The API Key to authenticate with. If not set the `TILEBOX_API_KEY` environment variable will be used. + + The API key to authenticate with. If not set, the `TILEBOX_API_KEY` environment variable is used. - + Optional service name for workflow telemetry. If not set, the default service name is used. 
+ + Optional stable ID used to scope internal loggers. If not set, a random ID is generated. + + ## Sub clients The workflows client exposes sub clients for interacting with different parts of the Tilebox workflows API. @@ -53,18 +58,21 @@ A client for scheduling automations. ## Logging ```python -def Client.configure_logging(level: int, runner_level: int | None = None) -> None +def Client.configure_logging( + level: int | logging.Logger, + runner_level: int | None = None, +) -> None ``` Configure which task and runner logs this client exports. See [`Client.configure_logging`](/api-reference/python/tilebox.workflows/Client.configure_logging). -## Task runners +## Runners ```python def Client.runner(...) -> TaskRunner ``` -A client is also used to instantiate task runners. Check out the [`Client.runner` API reference](/api-reference/python/tilebox.workflows/Client.runner) for more information. +A client is also used to instantiate runners. Check out the [`Client.runner` API reference](/api-reference/python/tilebox.workflows/Client.runner) for more information. 
```python Python @@ -85,7 +93,7 @@ job_client = client.jobs() cluster_client = client.clusters() automation_client = client.automations() -# or instantiate a task runner +# or instantiate a runner runner = client.runner(tasks=[...]) ``` diff --git a/api-reference/python/tilebox.workflows/Client.runner.mdx b/api-reference/python/tilebox.workflows/Client.runner.mdx index b150a98..b2c567b 100644 --- a/api-reference/python/tilebox.workflows/Client.runner.mdx +++ b/api-reference/python/tilebox.workflows/Client.runner.mdx @@ -5,29 +5,41 @@ icon: laptop-code ```python def Client.runner( - cluster: ClusterSlugLike | None = None, - tasks: list[type[Task]], - cache: JobCache | None = None + cluster: ClusterSlugLike | None = None, + tasks: list[type[Task]] | None = None, + cache: JobCache | None = None, + context: type[RunnerContext] | None = None, + runner: Runner | None = None, ) -> TaskRunner ``` -Initialize a task runner. +Initialize a direct runner from task classes. + +For new Python workflow projects, define a reusable [`Runner`](/api-reference/python/tilebox.workflows/Runner) object and call [`Runner.connect_to`](/api-reference/python/tilebox.workflows/Runner.connect_to) when you want direct execution. The `Client.runner` method remains available for existing code and simple direct runner scripts. ## Parameters - The [cluster slug](/workflows/concepts/clusters#managing-clusters) for the cluster associated with this task runner. + The [cluster slug](/workflows/concepts/clusters#cluster-slug) for the cluster associated with this direct runner. If not provided, the default cluster is used. - - A list of task classes that this runner can execute. + + A list of task classes that this runner can execute. Pass either `tasks`, `cache`, and `context`, or pass a reusable `runner` object. An optional [job cache](/workflows/caches) for caching results from tasks and sharing data between tasks. + + Optional runner context class to instantiate for task execution. 
+ + + + A reusable runner definition. If provided, do not also pass `tasks`, `cache`, or `context`. + + ```python Python from tilebox.workflows import Client diff --git a/api-reference/python/tilebox.workflows/ExecutionContext.job_cache.mdx b/api-reference/python/tilebox.workflows/ExecutionContext.job_cache.mdx index aeb5c56..5b17218 100644 --- a/api-reference/python/tilebox.workflows/ExecutionContext.job_cache.mdx +++ b/api-reference/python/tilebox.workflows/ExecutionContext.job_cache.mdx @@ -1,5 +1,5 @@ --- -title: Context.job_cache +title: ExecutionContext.job_cache icon: folder-gear --- diff --git a/api-reference/python/tilebox.workflows/ExecutionContext.logger.mdx b/api-reference/python/tilebox.workflows/ExecutionContext.logger.mdx index 7edf762..732fbb7 100644 --- a/api-reference/python/tilebox.workflows/ExecutionContext.logger.mdx +++ b/api-reference/python/tilebox.workflows/ExecutionContext.logger.mdx @@ -27,11 +27,11 @@ Structured attributes are attached to the log record and become searchable telem ```python Python from tilebox.workflows import ExecutionContext, Task -class ProcessScene(Task): - scene_id: str +class DownloadInput(Task): + input_id: str def execute(self, context: ExecutionContext) -> None: - context.logger.info("Started", scene_id=self.scene_id) + context.logger.info("Started", input_id=self.input_id) log = context.logger.bind(component="download") log.debug("Fetching input") diff --git a/api-reference/python/tilebox.workflows/ExecutionContext.progress.mdx b/api-reference/python/tilebox.workflows/ExecutionContext.progress.mdx new file mode 100644 index 0000000..b1fac3d --- /dev/null +++ b/api-reference/python/tilebox.workflows/ExecutionContext.progress.mdx @@ -0,0 +1,34 @@ +--- +title: ExecutionContext.progress +icon: folder-gear +--- + +```python +def ExecutionContext.progress(label: str | None = None) -> ProgressUpdate +``` + +Create or return a progress indicator for the currently executing task. 
+ +Progress updates are attached to the task result and are visible in job execution state. Calling `progress` again with the same label returns the same progress indicator. + +## Parameters + + + Optional label for the progress indicator. Empty strings are treated as `None`. + + +## Returns + +A `ProgressUpdate` object for tracking total and completed work. + + +```python Python +def execute(self, context: ExecutionContext) -> None: + progress = context.progress("download-scenes") + progress.add(len(self.scenes)) + + for scene in self.scenes: + download(scene) + progress.done(1) +``` + diff --git a/api-reference/python/tilebox.workflows/ExecutionContext.runner_context.mdx b/api-reference/python/tilebox.workflows/ExecutionContext.runner_context.mdx new file mode 100644 index 0000000..e34129c --- /dev/null +++ b/api-reference/python/tilebox.workflows/ExecutionContext.runner_context.mdx @@ -0,0 +1,25 @@ +--- +title: ExecutionContext.runner_context +icon: folder-gear +--- + +```python +ExecutionContext.runner_context: RunnerContext +``` + +Access the runner context instance for the runner executing the task. + +Use a custom `RunnerContext` subclass when tasks need shared runtime configuration or helpers during execution. + +## Returns + +The runner context object associated with the runner. 
+ + +```python Python +def execute(self, context: ExecutionContext) -> None: + bucket = context.runner_context.gcs_client("my-project:my-bucket") + blob = bucket.blob("inputs/scene.json") + data = blob.download_as_bytes() +``` + diff --git a/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtask.mdx b/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtask.mdx index ee35f15..de41930 100644 --- a/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtask.mdx +++ b/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtask.mdx @@ -6,7 +6,7 @@ icon: folder-gear ```python def ExecutionContext.submit_subtask( task: Task, - depends_on: list[FutureTask] = None, + depends_on: FutureTask | list[FutureTask] | None = None, cluster: str | None = None, max_retries: int = 0, optional: bool = False @@ -21,8 +21,8 @@ Submit a [subtask](/workflows/concepts/tasks#subtasks-and-task-composition) from The task to submit as a subtask. - - An optional list of tasks already submitted within the same context that this subtask depends on. + + An optional task or list of tasks already submitted within the same context that this subtask depends on. 
@@ -62,4 +62,3 @@ optional_task = context.submit_subtask( ) ``` - diff --git a/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtasks.mdx b/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtasks.mdx index 3afd696..087a0dd 100644 --- a/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtasks.mdx +++ b/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtasks.mdx @@ -6,7 +6,7 @@ icon: folder-gear ```python def ExecutionContext.submit_subtasks( tasks: Sequence[Task], - depends_on: list[FutureTask] = None, + depends_on: FutureTask | list[FutureTask] | None = None, cluster: str | None = None, max_retries: int = 0, optional: bool = False @@ -21,8 +21,8 @@ Submit multiple [subtasks](/workflows/concepts/tasks#subtasks-and-task-compositi The tasks to submit as subtasks. - - An optional list of tasks already submitted within the same context that the subtasks depend on. + + An optional task or list of tasks already submitted within the same context that the subtasks depend on. 
diff --git a/api-reference/python/tilebox.workflows/ExecutionContext.tracer.mdx b/api-reference/python/tilebox.workflows/ExecutionContext.tracer.mdx index 2609a11..e81d2ba 100644 --- a/api-reference/python/tilebox.workflows/ExecutionContext.tracer.mdx +++ b/api-reference/python/tilebox.workflows/ExecutionContext.tracer.mdx @@ -22,11 +22,11 @@ Custom spans are nested under the current task span and are exported to Tilebox ```python Python from tilebox.workflows import ExecutionContext, Task -class ProcessScene(Task): - scene_id: str +class DownloadInput(Task): + input_id: str def execute(self, context: ExecutionContext) -> None: - with context.tracer.span("download-scene") as span: - span.set_attribute("scene_id", self.scene_id) + with context.tracer.span("download-input") as span: + span.set_attribute("input_id", self.input_id) ``` diff --git a/api-reference/python/tilebox.workflows/JobClient.cancel.mdx b/api-reference/python/tilebox.workflows/JobClient.cancel.mdx index 83a213a..7fe32ea 100644 --- a/api-reference/python/tilebox.workflows/JobClient.cancel.mdx +++ b/api-reference/python/tilebox.workflows/JobClient.cancel.mdx @@ -7,7 +7,7 @@ icon: diagram-project def JobClient.cancel(job_or_id: Job | str) ``` -Cancel a job. When a job is canceled, no queued tasks will be picked up by task runners and executed even if task runners are idle. Tasks that are already being executed will finish their execution and not be interrupted. All sub-tasks spawned from such tasks after the cancellation will not be picked up by task runners. +Cancel a job. When a job is canceled, no queued tasks will be picked up by runners and executed even if runners are idle. Tasks that are already being executed will finish their execution and not be interrupted. All sub-tasks spawned from such tasks after the cancellation will not be picked up by runners. 
## Parameters diff --git a/api-reference/python/tilebox.workflows/JobClient.display.mdx b/api-reference/python/tilebox.workflows/JobClient.display.mdx new file mode 100644 index 0000000..f2aea9e --- /dev/null +++ b/api-reference/python/tilebox.workflows/JobClient.display.mdx @@ -0,0 +1,45 @@ +--- +title: JobClient.display +icon: diagram-project +--- + +```python +def JobClient.display( + job_id: Job | UUID | str, + direction: str = "down", + layout: str = "dagre", + sketchy: bool = True, +) -> None +``` + +Display a job diagram in an interactive Python environment, such as a Jupyter notebook. + +Use [`JobClient.visualize`](/api-reference/python/tilebox.workflows/JobClient.visualize) when you need the SVG markup instead of displaying it directly. + +## Parameters + + + The job, job ID, or job ID string to display. + + + + Direction for the diagram to flow. Defaults to `down`. + + + + Layout engine for the diagram. Supported values are `dagre` and `elk`. Defaults to `dagre`. + + + + Whether to render the diagram in a sketchy, hand-drawn style. Defaults to `True`. + + +## Returns + +`None` + + +```python Python +job_client.display(job) +``` + diff --git a/api-reference/python/tilebox.workflows/JobClient.query.mdx b/api-reference/python/tilebox.workflows/JobClient.query.mdx index aafa2be..83de44e 100644 --- a/api-reference/python/tilebox.workflows/JobClient.query.mdx +++ b/api-reference/python/tilebox.workflows/JobClient.query.mdx @@ -6,7 +6,7 @@ icon: diagram-project ```python def JobClient.query( temporal_extent: TimeIntervalLike | IDIntervalLike, - automation_id: UUID | None = None, + automation_ids: UUID | list[UUID] | None = None, job_states: JobState | list[JobState] | None = None, name: str | None = None, task_states: TaskState | list[TaskState] | None = None, @@ -30,8 +30,8 @@ Query jobs in the specified interval. 
- tuple of two strings: [start, end) -> Construct an `IDInterval` with the given start and end id parsed from the strings - - The automation id to filter jobs by. If not provided, jobs from all automations are returned. + + One automation ID or a list of automation IDs to filter jobs by. If not provided, jobs from all automations are returned. A job state or list of job states to filter by. If specified, only jobs in any of the given states are returned. diff --git a/api-reference/python/tilebox.workflows/JobClient.retry.mdx b/api-reference/python/tilebox.workflows/JobClient.retry.mdx index 81017df..8a59a77 100644 --- a/api-reference/python/tilebox.workflows/JobClient.retry.mdx +++ b/api-reference/python/tilebox.workflows/JobClient.retry.mdx @@ -7,7 +7,7 @@ icon: diagram-project def JobClient.retry(job_or_id: Job | str) -> int ``` -Retry a job. All failed tasks will become queued again, and queued tasks will be picked up by task runners again. +Retry a job. All failed tasks will become queued again, and queued tasks will be picked up by runners again. ## Parameters diff --git a/api-reference/python/tilebox.workflows/JobClient.submit.mdx b/api-reference/python/tilebox.workflows/JobClient.submit.mdx index aba2ab9..1540237 100644 --- a/api-reference/python/tilebox.workflows/JobClient.submit.mdx +++ b/api-reference/python/tilebox.workflows/JobClient.submit.mdx @@ -6,8 +6,8 @@ icon: diagram-project ```python def JobClient.submit( job_name: str, - root_task_or_tasks: Task | Iterable[Task], - cluster: str | Cluster | Iterable[str | Cluster] | None = None, + root_task_or_tasks: Task | list[Task], + cluster: ClusterSlugLike | list[ClusterSlugLike] | None = None, max_retries: int = 0 ) -> Job ``` @@ -20,17 +20,16 @@ Submit a job. The name of the job. - - The root task for the job. This task is executed first and can submit subtasks to manage the entire workflow. A job can have optionally consist of multiple root tasks. + + The root task or root tasks for the job. 
Root tasks execute first and can submit subtasks to compose the workflow. - - The [cluster slug](/workflows/concepts/clusters#managing-clusters) for the cluster to run the root task on. In case of multiple root tasks, a list of cluster slugs can be provided. - If not provided, the default cluster is used. + + The [cluster slug](/workflows/concepts/clusters#managing-clusters) or cluster object for the root task. When submitting multiple root tasks, pass one cluster or a list with one cluster per task. If not provided, the default cluster is used. - The maximum number of [retries](/workflows/concepts/tasks#retry-handling) for the subtask in case it fails. Defaults to 0. + The maximum number of [retries](/workflows/concepts/tasks#retry-handling) for the root task or tasks. Defaults to 0. diff --git a/api-reference/python/tilebox.workflows/JobClient.visualize.mdx b/api-reference/python/tilebox.workflows/JobClient.visualize.mdx index 8234c4e..d8f1443 100644 --- a/api-reference/python/tilebox.workflows/JobClient.visualize.mdx +++ b/api-reference/python/tilebox.workflows/JobClient.visualize.mdx @@ -5,11 +5,11 @@ icon: diagram-project ```python def JobClient.visualize( - job_or_id: Job | str, + job: Job | UUID | str, direction: str = "down", layout: str = "dagre", sketchy: bool = True -) +) -> str ``` Create a visualization of a job as a diagram. @@ -20,8 +20,8 @@ Create a visualization of a job as a diagram. ## Parameters - - The job to visualize. + + The job, job ID, or job ID string to visualize. @@ -36,6 +36,10 @@ Create a visualization of a job as a diagram. Indicates whether to use a sketchy, hand-drawn style for the diagram. The default is `True`. +## Returns + +Rendered SVG markup for the job diagram. 
+ ```python Python svg = job_client.visualize(job) diff --git a/api-reference/python/tilebox.workflows/ProgressUpdate.add.mdx b/api-reference/python/tilebox.workflows/ProgressUpdate.add.mdx new file mode 100644 index 0000000..64e01b9 --- /dev/null +++ b/api-reference/python/tilebox.workflows/ProgressUpdate.add.mdx @@ -0,0 +1,29 @@ +--- +title: ProgressUpdate.add +icon: bars-progress +--- + +```python +def ProgressUpdate.add(count: int) -> None +``` + +Add work to the total amount tracked by a progress indicator. + +Call `add` before or during a task when the task discovers more work to complete. + +## Parameters + + + Amount of work to add to the progress indicator total. + + +## Returns + +`None` + + +```python Python +progress = context.progress("process-items") +progress.add(len(items)) +``` + diff --git a/api-reference/python/tilebox.workflows/ProgressUpdate.done.mdx b/api-reference/python/tilebox.workflows/ProgressUpdate.done.mdx new file mode 100644 index 0000000..f55f194 --- /dev/null +++ b/api-reference/python/tilebox.workflows/ProgressUpdate.done.mdx @@ -0,0 +1,33 @@ +--- +title: ProgressUpdate.done +icon: bars-progress +--- + +```python +def ProgressUpdate.done(count: int) -> None +``` + +Mark work as completed on a progress indicator. + +Call `done` as the task finishes units of work added with [`ProgressUpdate.add`](/api-reference/python/tilebox.workflows/ProgressUpdate.add). + +## Parameters + + + Amount of work to mark as completed. 
+ + +## Returns + +`None` + + +```python Python +progress = context.progress("process-items") +progress.add(len(items)) + +for item in items: + process(item) + progress.done(1) +``` + diff --git a/api-reference/python/tilebox.workflows/Runner.connect_to.mdx b/api-reference/python/tilebox.workflows/Runner.connect_to.mdx new file mode 100644 index 0000000..63de9c4 --- /dev/null +++ b/api-reference/python/tilebox.workflows/Runner.connect_to.mdx @@ -0,0 +1,38 @@ +--- +title: Runner.connect_to +icon: plug +--- + +```python +def Runner.connect_to( + client: Client, + cluster: ClusterSlugLike | None = None, +) -> TaskRunner +``` + +Create a direct runner from a reusable [`Runner`](/api-reference/python/tilebox.workflows/Runner) definition and a Tilebox workflows client. + +The returned [`TaskRunner`](/api-reference/python/tilebox.workflows/TaskRunner.run_forever) connects to the Tilebox API, advertises the task registrations from the `Runner` definition, and executes matching tasks from the selected cluster. + +## Parameters + + + Tilebox workflows client used by the direct runner. + + + + The [cluster slug](/workflows/concepts/clusters#cluster-slug) for the runner. If not provided, the default cluster is used. + + +## Example + +```python Python +from tilebox.workflows import Client + +from my_workflow.runner import runner + + +client = Client(name="my-workflow-direct") +task_runner = runner.connect_to(client, cluster="dev-cluster") +task_runner.run_forever() +``` diff --git a/api-reference/python/tilebox.workflows/Runner.mdx b/api-reference/python/tilebox.workflows/Runner.mdx new file mode 100644 index 0000000..d712d19 --- /dev/null +++ b/api-reference/python/tilebox.workflows/Runner.mdx @@ -0,0 +1,74 @@ +--- +title: Runner +icon: list-check +--- + +```python +class Runner( + *, + tasks: list[type[Task]] | None = None, + cache: JobCache | None = None, + context: type[RunnerContext] | None = None, +) +``` + +Define task registrations for a Python workflow project. 
+ +A `Runner` object is a reusable definition. Use it for direct execution by connecting it to a [`Client`](/api-reference/python/tilebox.workflows/Client), or reference it from `tilebox.workflow.toml` so release runners can load the same task registrations from a workflow release. + +## Parameters + + + A list of task classes this runner can execute. + + + + Optional [job cache](/workflows/caches) used by tasks executed by this runner. + + + + Optional runner context class to instantiate for tasks executed by this runner. + + +## Example + +```python Python +from tilebox.workflows import Runner +from tilebox.workflows.cache import LocalFileSystemCache + +from my_workflow.tasks import MyRootTask, MySubtask + + +runner = Runner( + tasks=[MyRootTask, MySubtask], + cache=LocalFileSystemCache(), +) +``` + +You can also register tasks after creating a runner: + +```python Python +runner = Runner(cache=LocalFileSystemCache()) +runner.register(MyRootTask) +runner.register(MySubtask) +``` + +Use the same object for direct execution: + +```python Python +from tilebox.workflows import Client + +from my_workflow.runner import runner + + +runner.connect_to(Client(), cluster="dev-cluster").run_forever() +``` + +Or reference it from `tilebox.workflow.toml` for release execution: + +```toml +[workflow] +slug = "my-workflow" +root = "." +runner = "my_workflow.runner:runner" +``` diff --git a/api-reference/python/tilebox.workflows/Runner.register.mdx b/api-reference/python/tilebox.workflows/Runner.register.mdx new file mode 100644 index 0000000..234c280 --- /dev/null +++ b/api-reference/python/tilebox.workflows/Runner.register.mdx @@ -0,0 +1,34 @@ +--- +title: Runner.register +icon: list-check +--- + +```python +def Runner.register(task: type[Task]) -> None +``` + +Register a task class with a reusable runner definition. 
+ +Use `register` when you want to build a runner incrementally instead of passing all task classes to the [`Runner`](/api-reference/python/tilebox.workflows/Runner) constructor. + +## Parameters + + + Task class that this runner can execute. + + +## Returns + +`None` + + +```python Python +from tilebox.workflows import Runner + +from my_workflow.tasks import MyRootTask, MySubtask + +runner = Runner() +runner.register(MyRootTask) +runner.register(MySubtask) +``` + diff --git a/api-reference/python/tilebox.workflows/TaskRunner.run_all.mdx b/api-reference/python/tilebox.workflows/TaskRunner.run_all.mdx index efe5160..5416cf4 100644 --- a/api-reference/python/tilebox.workflows/TaskRunner.run_all.mdx +++ b/api-reference/python/tilebox.workflows/TaskRunner.run_all.mdx @@ -7,7 +7,7 @@ icon: gear-code def TaskRunner.run_all() ``` -Run the task runner and execute all tasks, until there are no more tasks available. +Run the runner and execute all tasks, until there are no more tasks available. ```python Python diff --git a/api-reference/python/tilebox.workflows/TaskRunner.run_forever.mdx b/api-reference/python/tilebox.workflows/TaskRunner.run_forever.mdx index 72c23c7..0208cf8 100644 --- a/api-reference/python/tilebox.workflows/TaskRunner.run_forever.mdx +++ b/api-reference/python/tilebox.workflows/TaskRunner.run_forever.mdx @@ -7,7 +7,7 @@ icon: gear-code def TaskRunner.run_forever() ``` -Run the task runner forever. This will poll for new tasks and execute them as they come in. +Run the runner forever. This will poll for new tasks and execute them as they come in. If no tasks are available, it will sleep for a short time and then try again. 
diff --git a/assets/workflows/releases/workflow-release-deployments-dark.png b/assets/workflows/releases/workflow-release-deployments-dark.png new file mode 100644 index 0000000..fdaca66 Binary files /dev/null and b/assets/workflows/releases/workflow-release-deployments-dark.png differ diff --git a/assets/workflows/releases/workflow-release-deployments-light.png b/assets/workflows/releases/workflow-release-deployments-light.png new file mode 100644 index 0000000..4bfd110 Binary files /dev/null and b/assets/workflows/releases/workflow-release-deployments-light.png differ diff --git a/assets/workflows/runners/runner-architecture-dark.png b/assets/workflows/runners/runner-architecture-dark.png new file mode 100644 index 0000000..0e25815 Binary files /dev/null and b/assets/workflows/runners/runner-architecture-dark.png differ diff --git a/assets/workflows/runners/runner-architecture-light.png b/assets/workflows/runners/runner-architecture-light.png new file mode 100644 index 0000000..e181b30 Binary files /dev/null and b/assets/workflows/runners/runner-architecture-light.png differ diff --git a/assets/workflows/runners/runner-modes-dark.png b/assets/workflows/runners/runner-modes-dark.png new file mode 100644 index 0000000..4c80d4d Binary files /dev/null and b/assets/workflows/runners/runner-modes-dark.png differ diff --git a/assets/workflows/runners/runner-modes-light.png b/assets/workflows/runners/runner-modes-light.png new file mode 100644 index 0000000..dd72abb Binary files /dev/null and b/assets/workflows/runners/runner-modes-light.png differ diff --git a/assets/workflows/workflows-architecture-dark.png b/assets/workflows/workflows-architecture-dark.png new file mode 100644 index 0000000..f43a571 Binary files /dev/null and b/assets/workflows/workflows-architecture-dark.png differ diff --git a/assets/workflows/workflows-architecture-light.png b/assets/workflows/workflows-architecture-light.png new file mode 100644 index 0000000..9efb03c Binary files /dev/null and 
b/assets/workflows/workflows-architecture-light.png differ diff --git a/changelog.mdx b/changelog.mdx index b4fe9f1..be23e99 100644 --- a/changelog.mdx +++ b/changelog.mdx @@ -12,7 +12,7 @@ icon: rss ## Workflow Observability - Tilebox Workflows now includes built-in observability for jobs and task runners. Tilebox captures workflow logs, traces, task status, and runner context, then correlates them with jobs and tasks. + Tilebox Workflows now includes built-in observability for jobs and runners. Tilebox captures workflow logs, traces, task status, and runner context, then correlates them with jobs and tasks. The Console includes a built-in explorer for workflow observability, so you can inspect task logs, trace timing, failures, and runner behavior from the job view. @@ -102,7 +102,7 @@ icon: rss - Statically typed dataset types - CLI to generate dataset types - Workflows client - - Go task runners + - Go runners To get started, check out the [Go SDK documentation](/sdks/go/install). diff --git a/docs.json b/docs.json index 618d352..ede3642 100644 --- a/docs.json +++ b/docs.json @@ -93,10 +93,21 @@ "pages": [ "workflows/concepts/tasks", "workflows/concepts/jobs", - "workflows/concepts/task-runners", + "workflows/concepts/runners", + "workflows/concepts/workflow-releases", "workflows/concepts/clusters" ] }, + { + "group": "Build and Deploy", + "icon": "rocket", + "pages": [ + "workflows/build-and-deploy/project-structure", + "workflows/build-and-deploy/workflow-configuration", + "workflows/build-and-deploy/releases", + "workflows/build-and-deploy/cluster-deployments" + ] + }, "workflows/caches", "workflows/progress", { @@ -145,6 +156,7 @@ { "group": "Workflows", "pages": [ + "guides/workflows/agentic-workflow-iteration", "guides/workflows/multi-language" ] } @@ -210,19 +222,32 @@ "pages": [ "api-reference/python/tilebox.workflows/Client", "api-reference/python/tilebox.workflows/Task", + "api-reference/python/tilebox.workflows/Runner", + 
"api-reference/python/tilebox.workflows/Runner.register", + "api-reference/python/tilebox.workflows/Runner.connect_to", "api-reference/python/tilebox.workflows/Client.runner", "api-reference/python/tilebox.workflows/Client.configure_logging", "api-reference/python/tilebox.workflows/TaskRunner.run_all", "api-reference/python/tilebox.workflows/TaskRunner.run_forever", "api-reference/python/tilebox.workflows/ExecutionContext.logger", "api-reference/python/tilebox.workflows/ExecutionContext.tracer", + "api-reference/python/tilebox.workflows/ExecutionContext.progress", "api-reference/python/tilebox.workflows/ExecutionContext.submit_subtask", "api-reference/python/tilebox.workflows/ExecutionContext.submit_subtasks", + "api-reference/python/tilebox.workflows/ExecutionContext.runner_context", "api-reference/python/tilebox.workflows/ExecutionContext.job_cache", + "api-reference/python/tilebox.workflows/ProgressUpdate.add", + "api-reference/python/tilebox.workflows/ProgressUpdate.done", "api-reference/python/tilebox.workflows/ClusterClient.create", "api-reference/python/tilebox.workflows/ClusterClient.find", "api-reference/python/tilebox.workflows/ClusterClient.delete", "api-reference/python/tilebox.workflows/ClusterClient.all", + "api-reference/python/tilebox.workflows/AutomationClient.storage_locations", + "api-reference/python/tilebox.workflows/AutomationClient.all", + "api-reference/python/tilebox.workflows/AutomationClient.find", + "api-reference/python/tilebox.workflows/AutomationClient.create_cron_automation", + "api-reference/python/tilebox.workflows/AutomationClient.create_storage_event_automation", + "api-reference/python/tilebox.workflows/AutomationClient.delete", "api-reference/python/tilebox.workflows/JobCache.group", "api-reference/python/tilebox.workflows/JobCache.__iter__", "api-reference/python/tilebox.workflows/JobClient.submit", @@ -230,6 +255,7 @@ "api-reference/python/tilebox.workflows/JobClient.retry", 
"api-reference/python/tilebox.workflows/JobClient.cancel", "api-reference/python/tilebox.workflows/JobClient.visualize", + "api-reference/python/tilebox.workflows/JobClient.display", "api-reference/python/tilebox.workflows/JobClient.query_logs", "api-reference/python/tilebox.workflows/JobClient.query_spans", "api-reference/python/tilebox.workflows/JobClient.query" @@ -244,6 +270,8 @@ "group": "datasets", "pages": [ "api-reference/go/datasets/Create", + "api-reference/go/datasets/Update", + "api-reference/go/datasets/CreateOrUpdate", "api-reference/go/datasets/Get", "api-reference/go/datasets/List", "api-reference/go/datasets/Collections.Create", @@ -253,10 +281,14 @@ "api-reference/go/datasets/Collections.List", "api-reference/go/datasets/Datapoints.GetInto", "api-reference/go/datasets/Datapoints.Query", + "api-reference/go/datasets/Datapoints.QueryPage", "api-reference/go/datasets/Datapoints.QueryInto", "api-reference/go/datasets/Datapoints.Ingest", "api-reference/go/datasets/Datapoints.Delete", "api-reference/go/datasets/Datapoints.DeleteIDs", + "api-reference/go/datasets/NewDatapointDescriptor", + "api-reference/go/datasets/UnmarshalDatapoint", + "api-reference/go/datasets/DatapointDecoder.Unmarshal", "api-reference/go/datasets/CollectAs", "api-reference/go/datasets/Collect", "api-reference/go/datasets/As" @@ -269,26 +301,53 @@ "api-reference/go/workflows/GetCurrentCluster", "api-reference/go/workflows/SubmitSubtask", "api-reference/go/workflows/SubmitSubtasks", + "api-reference/go/workflows/SetTaskDisplay", + "api-reference/go/workflows/DefaultProgress", + "api-reference/go/workflows/Progress", + "api-reference/go/workflows/ProgressTracker.Add", + "api-reference/go/workflows/ProgressTracker.Done", "api-reference/go/workflows/ConfigureConsoleLogging", "api-reference/go/workflows/WithSpan", "api-reference/go/workflows/WithSpanResult", "api-reference/go/workflows/WithTaskSpan", "api-reference/go/workflows/WithTaskSpanResult", 
"api-reference/go/workflows/NewTaskRunner", + "api-reference/go/workflows/NewPollingTaskRunner", + "api-reference/go/workflows/TaskExecutor", "api-reference/go/workflows/TaskRunner.GetRegisteredTask", "api-reference/go/workflows/TaskRunner.RegisterTasks", - "api-reference/go/workflows/TaskRunner.Run", + "api-reference/go/workflows/TaskRunner.RunForever", + "api-reference/go/workflows/TaskRunner.RunAll", + "api-reference/go/workflows/PollingTaskRunner.RunForever", + "api-reference/go/workflows/PollingTaskRunner.RunAll", + "api-reference/go/workflows/PollingTaskRunner.StopRequestingNewTasks", + "api-reference/go/workflows/PollingTaskRunner.IsRequestingTasks", + "api-reference/go/workflows/PollingTaskRunner.HasActiveTask", + "api-reference/go/workflows/PollingTaskRunner.InterruptActiveTask", "api-reference/go/workflows/Clusters.Create", "api-reference/go/workflows/Clusters.Get", "api-reference/go/workflows/Clusters.Delete", "api-reference/go/workflows/Clusters.List", + "api-reference/go/workflows/Workflows.Create", + "api-reference/go/workflows/Workflows.List", + "api-reference/go/workflows/Workflows.Get", + "api-reference/go/workflows/Workflows.PublishRelease", + "api-reference/go/workflows/Workflows.DeployRelease", + "api-reference/go/workflows/Workflows.UndeployRelease", + "api-reference/go/workflows/Automations.List", + "api-reference/go/workflows/Automations.Get", + "api-reference/go/workflows/Automations.GetStorageLocation", + "api-reference/go/workflows/Automations.ListStorageLocations", "api-reference/go/workflows/Jobs.Submit", "api-reference/go/workflows/Jobs.Get", "api-reference/go/workflows/Jobs.Retry", "api-reference/go/workflows/Jobs.Cancel", + "api-reference/go/workflows/Jobs.Query", + "api-reference/go/workflows/Jobs.QueryPage", "api-reference/go/workflows/Jobs.QueryLogs", + "api-reference/go/workflows/Jobs.QueryLogsPage", "api-reference/go/workflows/Jobs.QuerySpans", - "api-reference/go/workflows/Jobs.Query", + 
"api-reference/go/workflows/Jobs.QuerySpansPage", "api-reference/go/workflows/Collect" ] } @@ -316,9 +375,9 @@ "icon": "display" }, { - "anchor": "Book a Demo", - "href": "https://book.vimcal.com/p/lauracosta/tilebox-demo", - "icon": "calendar" + "anchor": "Talk to us", + "href": "https://book.vimcal.com/p/meeshvia/30-minute-meeting", + "icon": "phone" }, { "anchor": "Discord", diff --git a/guides/workflows/agentic-workflow-iteration.mdx b/guides/workflows/agentic-workflow-iteration.mdx new file mode 100644 index 0000000..ca4027d --- /dev/null +++ b/guides/workflows/agentic-workflow-iteration.mdx @@ -0,0 +1,118 @@ +--- +title: Iterate on Workflow Releases with Agents +description: Use a coding agent with the Tilebox command-line tool to edit, publish, deploy, run, and debug Python workflow releases. +icon: robot +--- + +This guide shows the recommended loop for AI-assisted workflow development. The agent edits Python workflow code, publishes a release, deploys it to a development cluster, starts a release runner, submits a test job, and inspects logs or spans before iterating. + +Use this loop when you want the code under test to match the artifact that release runners execute. + +## Prerequisites + +Set up the Tilebox command-line tool and skills in the environment where your agent runs. + +```bash +curl -fsSL https://cli.tilebox.com/install.sh | sh +export TILEBOX_API_KEY="YOUR_TILEBOX_API_KEY" +npx skills add tilebox/skills +``` + +Ask the agent to inspect the command-line tool before changing resources. + +```text Prompt +Load the Tilebox workflow skills. Inspect `tilebox agent-context workflow --output-schema` and `tilebox agent-context runner start --output-schema`. Do not modify Tilebox resources yet. +``` + +## Create or update the workflow project + +Ask the agent to keep the workflow project centered on one reusable `Runner` definition. + +```text Prompt +Update this Python workflow project. 
Keep task classes in `src//tasks.py`, export `runner = Runner(tasks=[...])` from `src//runner.py`, and keep `tilebox.workflow.toml` pointing at that runner object. +``` + +The direct runner path and release runner path should use the same `Runner` object. Direct execution uses `runner.connect_to(Client(), cluster=...)`; release execution uses `tilebox.workflow.toml` and `tilebox runner start`. + +## Use a development cluster + +Create a development cluster if you do not already have one. + +```bash +tilebox cluster create "workflow-dev" --json +``` + +Add the cluster slug to `tilebox.workflow.toml`. + +```toml +[targets.dev] +clusters = ["workflow-dev-abc123"] +``` + +## Build and publish a release + +Ask the agent to build locally first when it needs detailed validation output. + +```bash +tilebox workflow build-release --debug --json +``` + +Then publish the release. + +```bash +RELEASE_ID=$(tilebox workflow publish-release --json | jq -r '.id') +``` + +## Deploy to the development cluster + +Deploy the exact release the agent just published. + +```bash +tilebox workflow deploy-release --release "$RELEASE_ID" --target dev --json +``` + +This updates cluster deployment state. It does not submit a job. + +## Start a release runner + +In another terminal, start a release runner for the development cluster. + +```bash +tilebox runner start --cluster workflow-dev-abc123 --debug +``` + +Keep this runner running while the agent iterates. It can run multiple deployed releases for the cluster and reacts when the agent deploys a new release. + +## Submit and inspect a test job + +Submit a root task to the same cluster. + +```bash +JOB_ID=$(tilebox job submit \ + --name my-workflow-test \ + --task tilebox.com/example/ProcessScene \ + --version v1.0 \ + --cluster workflow-dev-abc123 \ + --input '{"scene_id":"S2A_001"}' \ + --wait \ + --json | jq -r '.id') +``` + +Inspect logs and spans when the job fails or takes longer than expected. 
+ +```bash +tilebox job logs "$JOB_ID" --json +tilebox job spans "$JOB_ID" --json +``` + +## Iterate safely + +For compatible fixes, keep the task identifier name and major version stable. Publish and deploy the fix, then retry the failed job. + +```bash +RELEASE_ID=$(tilebox workflow publish-release --json | jq -r '.id') +tilebox workflow deploy-release --release "$RELEASE_ID" --target dev --json +tilebox job retry "$JOB_ID" --json +``` + +For breaking input or behavior changes, bump the task major version and submit a new test job. diff --git a/guides/workflows/multi-language.mdx b/guides/workflows/multi-language.mdx index 5d5cc79..397757e 100644 --- a/guides/workflows/multi-language.mdx +++ b/guides/workflows/multi-language.mdx @@ -168,7 +168,7 @@ func submitHandler(client *workflows.Client) http.HandlerFunc { ## Creating a Python runner -Write a Python script that starts a task runner and registers the `ScheduleImageCapture` task. +Write a Python script that starts a runner and registers the `ScheduleImageCapture` task. ```python Python from tilebox.workflows import Client diff --git a/introduction.mdx b/introduction.mdx index 373040e..13aed96 100644 --- a/introduction.mdx +++ b/introduction.mdx @@ -1,7 +1,7 @@ --- title: Tilebox sidebarTitle: Introduction -description: "Tilebox is a developer-friendly tool for space data management and workflow orchestration, purpose-built for ground segment and orbital applications alike." +description: "Tilebox helps teams turn physical world data into operational workflows." icon: house mode: wide 'og:title': "Tilebox Docs" @@ -9,9 +9,9 @@ mode: wide import { HeroCard } from '/snippets/components.mdx'; -Tilebox is a lightweight space data management and orchestration software - on ground and in orbit. It provides a framework that simplifies access, processing, and distribution of space data across different environments enabling efficient multi-language, multi-cluster workflows. 
Tilebox integrates seamlessly with your existing infrastructure, ensuring that you maintain complete control over your data and algorithms. +Use Tilebox to access datasets, connect your own data, build processing pipelines, launch jobs, and inspect results. You can work through the console, command line, SDKs, APIs, or coding agents. Each access point uses the same authentication, data access, and job history. -## Modules +## Concepts Tilebox consists of two primary modules: @@ -84,9 +84,9 @@ You can also start by looking through these guides: Gain a deeper understanding of how to create tasks using the Tilebox Workflow Orchestrator. - Find out how to deploy Task Runners to run workflows in a parallel, distributed manner. + href="/workflows/concepts/runners"> + Find out how to deploy runners to run workflows in a parallel, distributed manner. diff --git a/quickstart.mdx b/quickstart.mdx index 8c14dbc..56704f9 100644 --- a/quickstart.mdx +++ b/quickstart.mdx @@ -1,13 +1,13 @@ --- title: Quickstart -description: This guide helps you set up and get started using Tilebox. It covers how to install a Tilebox client for your preferred language and how to use it to query data from a dataset and run a workflow. +description: Start here to set up Tilebox and get started building. icon: rocket --- -Select your preferred language and follow the steps below to get started. +Tilebox works across the [Console](/console), [command line](/agentic-development/tilebox-cli), [SDKs](/sdks), and [coding agents](/agentic-development/onboard-your-agent). The first step is the same for each path: create an API key. + +Follow the steps below to get started. - - ## Start in a Notebook Explore the provided [Sample Notebooks](/sdks/python/sample-notebooks) to begin your journey with Tilebox. These notebooks offer a step-by-step guide to using the API and showcase many features supported by Tilebox Python clients. You can also use these notebooks as a foundation for your own projects. 
@@ -16,305 +16,107 @@ Explore the provided [Sample Notebooks](/sdks/python/sample-notebooks) to begin If you prefer to work locally, follow these steps to get started. - - - Install the Tilebox Python packages. - - - ```bash uv - uv add tilebox-datasets tilebox-workflows tilebox-storage - ``` - ```bash pip - pip install tilebox-datasets tilebox-workflows tilebox-storage - ``` - ```bash poetry - poetry add tilebox-datasets="*" tilebox-workflows="*" tilebox-storage="*" - ``` - ```bash pipenv - pipenv install tilebox-datasets tilebox-workflows tilebox-storage - ``` - - - - For new projects we recommend using [uv](https://docs.astral.sh/uv/). More information about installing the Tilebox Python SDKs can be found in the [Installation](/sdks/python/install) section. - - - - Create an API key by logging into the [Tilebox Console](https://console.tilebox.com), navigating to [Settings -> API Keys](https://console.tilebox.com/settings/api-keys), and clicking the "Create API Key" button. - - - Tilebox Console - Tilebox Console - - - Copy the API key and keep it somewhere safe. You will need it to authenticate your requests. - - - Use the datasets client to query data from a dataset. - - ```python Python - from tilebox.datasets import Client - - client = Client(token="YOUR_TILEBOX_API_KEY") - - # select a dataset - datasets = client.datasets() - dataset = datasets.open_data.copernicus.sentinel2_msi - - # and load data from a collection in a given time range - collection = dataset.collection("S2A_S2MSI1C") - data_january_2022 = collection.query(temporal_extent=("2022-01-01", "2022-02-01")) - ``` - - - Use the workflows client to create a task and submit it as a job. 
- - ```python Python - from tilebox.workflows import Client, Task - - # Replace with your actual token - client = Client(token="YOUR_TILEBOX_API_KEY") - - class HelloWorldTask(Task): - greeting: str = "Hello" - name: str = "World" - - def execute(self, context): - print(f"{self.greeting} {self.name}, from the main task!") - context.submit_subtask(HelloSubtask(name=self.name)) - - class HelloSubtask(Task): - name: str - - def execute(self, context): - print(f"Hello from the subtask, {self.name}!") - - # Initiate the job - jobs = client.jobs() - jobs.submit("parameterized-hello-world", HelloWorldTask(greeting="Greetings", name="Universe")) - - # Run the tasks - runner = client.runner(tasks=[HelloWorldTask, HelloSubtask]) - runner.run_all() - ``` - - - Review the following guides to learn more about the modules that make up Tilebox: - - - - Learn how to create a Timeseries dataset using the Tilebox Console. - - - Learn how to ingest an existing CSV dataset into a Timeseries dataset collection. - - - - - - - -## Start with Examples - -Explore the provided [Examples](/sdks/go/examples) to begin your journey with Tilebox. These examples offer a step-by-step guide to using the API and showcase many features supported by Tilebox Go clients. You can also use these examples as a foundation for your own projects. - -## Start on Your Device - -If you prefer to work locally, follow these steps to get started. - - - - Add the Tilebox library in your project. - - ```bash Shell - go get github.com/tilebox/tilebox-go - ``` - - Install [tilebox-generate](https://github.com/tilebox/tilebox-generate) command-line tool on your machine. - It's used to generate Go structs for Tilebox datasets. - - ```bash Shell - go install github.com/tilebox/tilebox-generate@latest - ``` - - - Create an API key by logging into the [Tilebox Console](https://console.tilebox.com), navigating to [Settings -> API Keys](https://console.tilebox.com/settings/api-keys), and clicking the "Create API Key" button. 
- - - Tilebox Console - Tilebox Console - - - Copy the API key and keep it somewhere safe. You will need it to authenticate your requests. - - - Run [tilebox-generate](https://github.com/tilebox/tilebox-generate) in the root directory of your Go project. - It generates the dataset type for Sentinel-2 MSI dataset. It will generate a `./protogen/tilebox/v1/sentinel2_msi.pb.go` file. - - ```bash Shell - tilebox-generate --dataset open_data.copernicus.sentinel2_msi --tilebox-api-key $TILEBOX_API_KEY - ``` - - - Use the datasets client to query data from a dataset. - - ```go Go - package main - - import ( - "context" - "log" - "log/slog" - "time" - - "github.com/paulmach/orb" - "github.com/paulmach/orb/encoding/wkt" - "github.com/tilebox/tilebox-go/datasets/v1" - "github.com/tilebox/tilebox-go/query" - ) - - func main() { - ctx := context.Background() - client := datasets.NewClient() - - // select a dataset - dataset, err := client.Datasets.Get(ctx, "open_data.copernicus.sentinel2_msi") - if err != nil { - log.Fatalf("Failed to get dataset: %v", err) - } - - // select a collection - collection, err := client.Collections.Get(ctx, dataset.ID, "S2A_S2MSI1C") - if err != nil { - log.Fatalf("Failed to get collection: %v", err) - } - - // load data from a collection in a given time range and spatial extent - colorado := orb.Polygon{ - {{-109.05, 41.00}, {-109.045, 37.0}, {-102.05, 37.0}, {-102.05, 41.00}, {-109.05, 41.00}}, - } - startDate := time.Date(2025, time.March, 1, 0, 0, 0, 0, time.UTC) - endDate := time.Date(2025, time.April, 1, 0, 0, 0, 0, time.UTC) - march2025 := query.NewTimeInterval(startDate, endDate) - - // You have to use tilebox-generate to generate the dataset type - var datapointsOverColorado []*v1.Sentinel2Msi - err = client.Datapoints.QueryInto(ctx, - dataset.ID, - &datapointsOverColorado, - datasets.WithCollectionIDs(collection.ID), - datasets.WithTemporalExtent(march2025), - datasets.WithSpatialExtent(colorado), - ) - if err != nil { - 
log.Fatalf("Failed to query datapoints: %v", err) - } - - slog.Info("Found datapoints over Colorado in March 2025", slog.Int("count", len(datapointsOverColorado))) - slog.Info("First datapoint over Colorado", - slog.String("id", datapointsOverColorado[0].GetId().AsUUID().String()), - slog.Time("event time", datapointsOverColorado[0].GetTime().AsTime()), - slog.Time("ingestion time", datapointsOverColorado[0].GetIngestionTime().AsTime()), - slog.String("geometry", wkt.MarshalString(datapointsOverColorado[0].GetGeometry().AsGeometry())), - slog.String("granule name", datapointsOverColorado[0].GetGranuleName()), - slog.String("processing level", datapointsOverColorado[0].GetProcessingLevel().String()), - slog.String("product type", datapointsOverColorado[0].GetProductType()), - // and so on... - ) - } - ``` - - - Use the workflows client to create a task and submit it as a job. - - ```go Go - package main - - import ( - "context" - "log/slog" - - "github.com/tilebox/tilebox-go/workflows/v1" - ) - - type HelloTask struct { - Greeting string - Name string - } - - func (t *HelloTask) Execute(ctx context.Context) error { - slog.InfoContext(ctx, "Hello from the main task!", slog.String("Greeting", t.Greeting), slog.String("Name", t.Name)) - - err := workflows.SubmitSubtasks(ctx, &HelloSubtask{Name: t.Name}) - if err != nil { - return err - } - - return nil - } - - type HelloSubtask struct { - Name string - } - - func (t *HelloSubtask) Execute(context.Context) error { - slog.Info("Hello from the subtask!", slog.String("Name", t.Name)) - return nil - } - - func main() { - ctx := context.Background() - - // Replace with your actual token - client := workflows.NewClient() - - job, err := client.Jobs.Submit(ctx, "hello-world", - []workflows.Task{ - &HelloTask{ - Greeting: "Greetings", - Name: "Tilebox", - }, - }, - ) - if err != nil { - slog.ErrorContext(ctx, "Failed to submit job", slog.Any("error", err)) - return - } - - slog.InfoContext(ctx, "Job submitted", 
slog.String("job_id", job.ID.String())) - - runner, err := client.NewTaskRunner(ctx) - if err != nil { - slog.Error("failed to create task runner", slog.Any("error", err)) - return - } - - err = runner.RegisterTasks( - &HelloTask{}, - &HelloSubtask{}, - ) - if err != nil { - slog.Error("failed to register task", slog.Any("error", err)) - return - } - - runner.Run(ctx) - } - ``` - - - Review the following guides to learn more about the modules that make up Tilebox: - - - - Learn how to create a Timeseries dataset using the Tilebox Console. - - - Learn how to ingest an existing CSV dataset into a Timeseries dataset collection. - - - - - - - + + + Create an API key by logging into the [Tilebox Console](https://console.tilebox.com), navigating to [Settings -> API Keys](https://console.tilebox.com/settings/api-keys), and clicking the "Create API Key" button. + + + Tilebox Console + Tilebox Console + + + Then add it to your environment: + + ```bash + export TILEBOX_API_KEY= + ``` + + + Tilebox can be used from the browser, the terminal, SDKs, or coding agents. Choose the access point that matches how you want to start. + + + + Use Tilebox from the browser. Create API keys, inspect datasets, and monitor jobs without installing anything. + + + + Work locally from your terminal. Use this path for setup, local development, and operational commands. + + + + Use Tilebox from your application code in Python or Go. + + + + Set up your coding agent and give it access to Tilebox-specific skills and resources for catalogs, workflows, jobs, and automations. + + + + + Use the datasets client to query data from a dataset.
+ + ```python Python + from tilebox.datasets import Client + + client = Client(token="YOUR_TILEBOX_API_KEY") + + # select a dataset + datasets = client.datasets() + dataset = datasets.open_data.copernicus.sentinel2_msi + + # and load data from a collection in a given time range + collection = dataset.collection("S2A_S2MSI1C") + data_january_2022 = collection.query(temporal_extent=("2022-01-01", "2022-02-01")) + ``` + + + Use the workflows client to create a task and submit it as a job. + + ```python Python + from tilebox.workflows import Client, Runner, Task + + # Replace with your actual token + client = Client(token="YOUR_TILEBOX_API_KEY") + + class HelloWorldTask(Task): + greeting: str = "Hello" + name: str = "World" + + def execute(self, context): + print(f"{self.greeting} {self.name}, from the main task!") + context.submit_subtask(HelloSubtask(name=self.name)) + + class HelloSubtask(Task): + name: str + + def execute(self, context): + print(f"Hello from the subtask, {self.name}!") + + # Initiate the job + jobs = client.jobs() + jobs.submit("parameterized-hello-world", HelloWorldTask(greeting="Greetings", name="Universe")) + + # Run the tasks + runner = Runner(tasks=[HelloWorldTask, HelloSubtask]) + runner.connect_to(client).run_all() + ``` + + + Review the following guides to learn more about the modules that make up Tilebox: + + + + Learn how to create a Timeseries dataset using the Tilebox Console. + + + Learn how to ingest an existing CSV dataset into a Timeseries dataset collection. + + + Package a Python workflow project, publish a release, and deploy it to a cluster. + + + Use a coding agent with the Tilebox CLI to build, deploy, run, and debug workflow releases. + + + + diff --git a/sdks/python/install.mdx b/sdks/python/install.mdx index 0c4eaa5..311cefe 100644 --- a/sdks/python/install.mdx +++ b/sdks/python/install.mdx @@ -13,7 +13,7 @@ Tilebox offers a Python SDK for accessing Tilebox services. 
The SDK includes sep Access Tilebox datasets from Python - Workflow client and task runner for Tilebox + Workflow client and runner for Tilebox diff --git a/workflows/build-and-deploy/cluster-deployments.mdx b/workflows/build-and-deploy/cluster-deployments.mdx new file mode 100644 index 0000000..ee25ceb --- /dev/null +++ b/workflows/build-and-deploy/cluster-deployments.mdx @@ -0,0 +1,76 @@ +--- +title: Cluster Deployments +description: Understand how workflow release deployments map releases to clusters and control what release runners can execute. +icon: rocket +--- + +Deploying a workflow release creates a cluster deployment. A cluster deployment maps one workflow release to one cluster, which controls what workflow code release runners on that cluster can load and execute. + +Creating, updating, or removing a deployment does not submit a job, start a runner, delete a release, or change past jobs. Existing release runners for that cluster observe the deployment change and update which workflow code they can execute. + +## Deployment model + +A workflow can have many releases, and each release can be deployed to one or more clusters. A cluster can also have releases from multiple workflows deployed at the same time. Release runners on that cluster load the deployed releases and execute compatible tasks when matching jobs are submitted. + +For a single workflow, a cluster has one active release deployment at a time. Updating the deployment changes which release new compatible task executions use on that cluster. + +## Default and explicit clusters + +Every team has a default cluster. If a deployment command does not specify a cluster or target, the default cluster is used. This is useful for quick tests when you do not need separate development or production clusters. + +Explicit clusters are useful when environments should run different workflow code. 
For example, a development cluster can run the latest release while a production cluster keeps running a release that has already been tested. + +Use explicit release IDs for production-like deployments when you need to know exactly which release is active. Use latest-release deployment only when that behavior is acceptable for the environment. + +## Targets + +Targets are named cluster groups defined in `tilebox.workflow.toml`. They let a deployment refer to a logical environment instead of repeating cluster slugs. + +```toml +[targets.dev] +clusters = ["dev-cluster"] + +[targets.production] +clusters = ["prod-a", "prod-b"] +``` + +A target can contain one cluster or multiple clusters. The same release is deployed to each cluster in the target. + +## Release runners and task execution + +A release runner runs in an environment you control and watches one cluster. It loads workflow code from releases deployed to that cluster, starts Python workflow runtime processes through the Tilebox command-line tool, and updates its task set when deployments change. + +Deployment alone is not enough for execution. A compatible release runner must be running for the cluster, and submitted tasks must match task identifiers and compatible versions from one of the deployed releases. + + + You still choose where the release runner process runs. Tilebox manages release selection through cluster deployments, while your environment provides the compute process that runs `tilebox runner start`. + + +## Job and deployment clusters must match + +When you submit a job, its root task is submitted to a cluster. For a release runner to execute that task, the job cluster, release deployment cluster, and release runner cluster must match. + +If a job stays queued, check the cluster first. A task submitted to `dev-cluster` is not claimed by a release runner on `staging-cluster`, even if the same workflow release is deployed somewhere else. 
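The matching rule above can be sketched as a small check. This is an illustrative model under stated assumptions, not Tilebox code: `ReleaseRunner`, `can_claim`, and the exact-version comparison are hypothetical, and real version compatibility may be looser than the exact match used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRunner:
    """Hypothetical model of a release runner watching one cluster."""

    cluster: str
    # (identifier, version) pairs from releases deployed to that cluster
    deployed_tasks: frozenset


def can_claim(runner, task_cluster, identifier, version):
    """A task is claimable only if the clusters match and the task is deployed."""
    same_cluster = runner.cluster == task_cluster
    # Simplification: exact version match stands in for "compatible version".
    is_deployed = (identifier, version) in runner.deployed_tasks
    return same_cluster and is_deployed


# Usage: a runner on dev-cluster with one deployed task.
dev_runner = ReleaseRunner(
    cluster="dev-cluster",
    deployed_tasks=frozenset({("tilebox.com/example/ProcessScene", "v1.0")}),
)
claimed_on_dev = can_claim(
    dev_runner, "dev-cluster", "tilebox.com/example/ProcessScene", "v1.0"
)
claimed_on_staging = can_claim(
    dev_runner, "staging-cluster", "tilebox.com/example/ProcessScene", "v1.0"
)
```

In this model a task submitted to `staging-cluster` is never claimed by the `dev-cluster` runner, which mirrors the queued-job symptom described above.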
+ +## Removing deployments + +Removing a release deployment removes the active release mapping from the selected cluster or target. It does not delete the workflow, release, artifact, or past jobs. + +Release runners update their task set after a deployment is removed. They stop accepting new tasks that require the removed release, but already completed jobs and release records remain available for inspection. + +## Related documentation + + + + Understand how releases are built, checked, published, deployed, and used by release runners. + + + Define deployment targets in `tilebox.workflow.toml`. + + + Learn how clusters determine where workflow tasks can run. + + + Learn how release runners load deployed releases and execute compatible tasks. + + diff --git a/workflows/build-and-deploy/project-structure.mdx b/workflows/build-and-deploy/project-structure.mdx new file mode 100644 index 0000000..1cc4ac3 --- /dev/null +++ b/workflows/build-and-deploy/project-structure.mdx @@ -0,0 +1,100 @@ +--- +title: Project Structure +description: Structure a Python workflow project so releases can be built reproducibly and release runners can discover and execute its tasks. +icon: folder-tree +--- + +A Python workflow project contains task classes, a `Runner` definition, and a `tilebox.workflow.toml` file. The Tilebox command-line tool uses these files to build a workflow release, discover the tasks it can execute, and make the release available to release runners after deployment. + +Keep the project small and importable from its root. Release builds import the configured runner object, check that the Python runtime starts, and package the selected files into an immutable artifact. + +## Project structure + +Use a layout where task code and the runner definition are importable from the project root. + + + + + + + + + + + + + + +## Define tasks + +Put task classes in a module that can be imported during release validation. 
+ +```python Python +# my_workflow/tasks.py +from tilebox.workflows import ExecutionContext, Task + + +class ProcessScene(Task): + scene_id: str + + @staticmethod + def identifier() -> tuple[str, str]: + return "tilebox.com/example/ProcessScene", "v1.0" + + def execute(self, context: ExecutionContext) -> None: + context.current_task.display = f"ProcessScene({self.scene_id})" + context.logger.info("Processing scene", scene_id=self.scene_id) +``` + +Use explicit identifiers for workflow code that will be published. A stable identifier lets existing jobs continue to run after refactors and compatible bug fixes. + +## Define the runner + +Create a module that exports a `Runner` object. This object defines the task registrations for the workflow, and release builds import it during validation. + +```python Python +# my_workflow/runner.py +from tilebox.workflows import Runner +from tilebox.workflows.cache import LocalFileSystemCache + +from my_workflow.tasks import ProcessScene + + +runner = Runner( + tasks=[ProcessScene], + cache=LocalFileSystemCache(), +) +``` + +## Configure the workflow release + +Point `tilebox.workflow.toml` at the exported `Runner` object and include the files required by the release runner. + +```toml +[workflow] +slug = "my-workflow" +root = "." +runner = "my_workflow.runner:runner" + +[build] +include = [ + "pyproject.toml", + "uv.lock", + "my_workflow/**", +] +exclude = [ + ".venv/**", + "**/__pycache__/**", + "**/*.pyc", + ".pytest_cache/**", +] +use_gitignore = true +``` + +The Tilebox command-line tool imports the runner object during `build-release` and `publish-release`, discovers its task identifiers, and records them in the workflow release. A release runner later loads the release artifact and invokes the Python runtime through the command-line tool. + +## Keep release artifacts small + +Include source code, lock files, and small configuration. 
Exclude local virtual environments, test caches, downloaded provider data, model checkpoints, generated outputs, and other large runtime artifacts. + +If a task needs a large model or reference file, fetch it lazily at runtime and cache it in a deterministic runner-local path such as `~/.cache/tilebox/...`. The workflow should still work when a release runner starts with an empty cache. diff --git a/workflows/build-and-deploy/releases.mdx b/workflows/build-and-deploy/releases.mdx new file mode 100644 index 0000000..47895a5 --- /dev/null +++ b/workflows/build-and-deploy/releases.mdx @@ -0,0 +1,72 @@ +--- +title: Release Lifecycle +description: Understand how workflow releases are built, checked, published, deployed, and used by release runners. +icon: box-open +--- + +Workflow releases make workflow code reproducible. A release captures the files selected from a Python workflow project, the runtime entrypoint, and the task identifiers discovered from the configured runner. Release runners later use that release to execute compatible tasks on clusters that have the release deployed. + +The release lifecycle separates code packaging from cluster rollout. Building and publishing create immutable release content. Deploying a release changes which workflow code release runners on a cluster can execute. + +## Lifecycle overview + +| Stage | What it does | What changes | +| --- | --- | --- | +| Workflow creation | Creates the long-lived workflow object and slug. | Adds a workflow record. | +| Local build | Resolves files, creates a deterministic artifact, and checks the configured Python runtime. | Writes local build output and cache entries. | +| Publishing | Uploads the artifact when needed and creates an immutable release record. | Adds or reuses a workflow release. | +| Deployment | Maps a release to one or more clusters. | Changes which release runners can execute the release. 
| +| Job execution | Submits tasks that match the release's task identifiers and compatible versions. | Creates or updates jobs and tasks. | + +## Workflow object + +A workflow is the stable object that releases belong to. It has a slug, such as `my-workflow`, which is referenced from `tilebox.workflow.toml`. The workflow object is not the executable code itself; it provides the long-lived name under which releases are published. + +Create the workflow object once, then keep publishing new releases to it as the code changes. This gives release runners and deployment commands a stable workflow identity while each release remains immutable. + +## Release artifact + +The release artifact is the package that release runners download and execute. Tilebox assembles it from the files selected by the `[build]` section in `tilebox.workflow.toml`, after include patterns, exclude patterns, and `.gitignore` handling are applied. + +Artifacts should contain source code, lock files, and small configuration. They should not contain downloaded source data, model checkpoints, generated outputs, local virtual environments, or task caches. If a workflow needs large runtime assets, the task code should fetch them at runtime and cache them in the runner environment. + +## Runtime checks and task discovery + +During a build, Tilebox starts the configured Python workflow runtime and imports the runner object or command from `tilebox.workflow.toml`. This checks that the artifact contains the files needed to start the workflow runtime. + +The same step discovers the task identifiers registered by the workflow. Those identifiers become part of the release metadata and determine which submitted tasks a release runner can execute after the release is deployed. + +## Publishing + +Publishing turns checked local release content into a workflow release in Tilebox. The release record points to the artifact, the discovered task registrations, and the workflow slug it belongs to. 
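As a rough illustration of what such a record ties together, the sketch below models a release record in plain Python. The `ReleaseRecord` class, its field names, the `make_release_record` helper, and the use of a SHA-256 digest to reference the artifact are assumptions made for this example, not the Tilebox data model.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRecord:
    """Hypothetical, immutable view of a published release."""

    workflow_slug: str
    artifact_digest: str
    # (identifier name, version) pairs discovered from the runner object
    task_identifiers: tuple[tuple[str, str], ...]


def make_release_record(
    workflow_slug: str,
    artifact: bytes,
    task_identifiers: list[tuple[str, str]],
) -> ReleaseRecord:
    """Build a record that references the artifact by its content digest."""
    digest = hashlib.sha256(artifact).hexdigest()
    # Sort identifiers so the record does not depend on discovery order.
    return ReleaseRecord(workflow_slug, digest, tuple(sorted(task_identifiers)))
```

Because the record in this sketch is derived only from the slug, the artifact bytes, and the task identifiers, building it twice from the same inputs yields equal records.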
+ +Publishing is idempotent for identical release content and artifact digests. If the same release already exists, Tilebox returns the existing release instead of creating another copy. This is useful in agent-assisted and automated workflows where the same build step may run more than once. + +## Deployment + +Publishing a release does not make release runners execute it. A release must be deployed to a cluster before release runners on that cluster load its workflow code and execute compatible tasks from it. + +A deployment maps one release to one cluster. Different clusters can run different releases of the same workflow, which makes it possible to test a release in a development cluster before promoting it to production-like clusters. + +## Compatibility and retries + +Release compatibility matters when retrying existing jobs. A failed job can resume from failed tasks after you deploy a compatible fixed release to the same cluster. The task identifier name, major version, and input schema must remain compatible with the failed task that is being retried. + +Use a new major task version and submit a new job when the task input schema or behavior is no longer compatible with tasks submitted before the change. This keeps old jobs reproducible while allowing breaking workflow changes to move forward safely. + +## Related documentation + + + + Structure a Python workflow project for workflow releases and release runners. + + + Configure `tilebox.workflow.toml`, build inputs, and deployment targets. + + + Map workflow releases to clusters and run them with release runners. + + + Learn how workflows, releases, artifacts, and cluster deployments fit together. 
+ + diff --git a/workflows/build-and-deploy/workflow-configuration.mdx b/workflows/build-and-deploy/workflow-configuration.mdx new file mode 100644 index 0000000..40c47af --- /dev/null +++ b/workflows/build-and-deploy/workflow-configuration.mdx @@ -0,0 +1,81 @@ +--- +title: Workflow Configuration +description: Understand the tilebox.workflow.toml fields that define workflow identity, release contents, runtime entrypoints, and deployment targets. +icon: file-code +--- + +`tilebox.workflow.toml` describes how a Python workflow project becomes a workflow release. It identifies the Tilebox workflow, defines which files belong in the release artifact, selects the runtime entrypoint, and optionally names cluster targets for deployments. + +The Tilebox command-line tool searches upward from the current directory for the nearest configuration file. This lets release commands run from the project root or from subdirectories inside the project. + +## Minimal configuration + +```toml +[workflow] +slug = "my-workflow" +root = "." +runner = "my_workflow.runner:runner" + +[build] +include = [ + "pyproject.toml", + "uv.lock", + "my_workflow/**", +] +exclude = [ + ".venv/**", + "**/__pycache__/**", + "**/*.pyc", + ".pytest_cache/**", +] +use_gitignore = true +``` + +## Workflow section + +The `[workflow]` section identifies the workflow and tells the Tilebox command-line tool how to start the Python workflow runtime during release checks and release execution. + +| Field | Required | Description | +| --- | --- | --- | +| `slug` | Yes | Stable workflow slug, such as `my-workflow`. Releases are published under this workflow. | +| `root` | No | Project root for build paths. Defaults to `"."`. | +| `runner` | One of `runner` or `command` | Python module/object path to a `Runner` object. The Tilebox command-line tool runs it with `uv run python -m tilebox.workflows.runner `. | +| `command` | One of `runner` or `command` | Custom worker command as an array of strings. 
Use this only when the standard runner object path does not fit your runtime. | + +Set exactly one of `runner` or `command`. + +## Build section + +The `[build]` section controls which files are included in the release artifact. + +| Field | Required | Description | +| --- | --- | --- | +| `include` | Yes | Glob patterns, relative to `[workflow].root`, for files to include. | +| `exclude` | No | Glob patterns to exclude from the included file set. | +| `use_gitignore` | No | Whether to apply `.gitignore`. Defaults to `true`. | + +Include lock files and source files so release runners resolve the same dependencies that were tested during release validation. Exclude local environments, generated files, caches, and large runtime assets. + +## Targets + +Targets name reusable cluster groups. They are optional, but useful when the same release should be deployed to a known set of development, staging, or production clusters. + +```toml +[targets.dev] +clusters = ["dev-cluster"] + +[targets.production] +clusters = ["prod-a", "prod-b"] +``` + +A target can contain one cluster or multiple clusters. Deployment commands can refer to the target name instead of listing the same cluster slugs every time. For production-like deployments, use explicit release IDs so the active release is traceable. + +## Validation rules + +The Tilebox command-line tool checks configuration before building or publishing a release: + +- `workflow.slug` is required. +- Exactly one of `workflow.runner` or `workflow.command` is required. +- `build.include` must contain at least one pattern. +- Unknown TOML keys fail configuration loading. +- Build paths must stay within the configured workflow root. diff --git a/workflows/caches.mdx b/workflows/caches.mdx index 787515a..3f5c148 100644 --- a/workflows/caches.mdx +++ b/workflows/caches.mdx @@ -8,7 +8,7 @@ icon: box-archive The cache API is currently experimental and may undergo changes in the future. 
Many more features and new [backends](#cache-backends) are on the roadmap. There might be breaking changes to the Cache API in the future. -Caches are configured at the [task runner](/workflows/concepts/task-runners) level. Because task runners can be deployed across multiple locations, caches must be accessible from all task runners contributing to a workflow. +Caches are configured at the [runner](/workflows/concepts/runners) level. Because runners can be deployed across multiple locations, caches must be accessible from all runners contributing to a workflow. Currently, the default cache implementation uses a Google Cloud Storage bucket, providing a scalable method to share data between tasks. For quick prototyping and local development, you can also use a local file system cache, which is included by default. @@ -16,7 +16,7 @@ If needed, you can create your own cache backend by implementing the `Cache` int ## Configuring a Cache -You can configure a cache while creating a task runner by passing a cache instance to the `cache` parameter. To use an in-memory cache, use `tilebox.workflows.cache.InMemoryCache`. This implementation is helpful for local development and quick testing. For alternatives, see the supported [cache backends](#cache-backends). +You can configure a cache while creating a runner by passing a cache instance to the `cache` parameter. To use an in-memory cache, use `tilebox.workflows.cache.InMemoryCache`. This implementation is helpful for local development and quick testing. For alternatives, see the supported [cache backends](#cache-backends). ```python Python @@ -97,7 +97,7 @@ runner = client.runner( ### Local File System Cache -A cache implementation backed by a local file system. It's suitable for quick prototyping and local development, assuming all task runners share the same machine or access the same file system. +A cache implementation backed by a local file system. 
It's suitable for quick prototyping and local development, assuming all runners share the same machine or access the same file system. ```python Python @@ -118,7 +118,7 @@ runner = client.runner( ### In-Memory Cache -A simple in-memory cache useful for quick prototyping and development. The data is not shared between task runners and is lost upon task runner restarts. Use this cache only for workflows executed on a single task runner. +A simple in-memory cache useful for quick prototyping and development. The data is not shared between runners and is lost upon runner restarts. Use this cache only for workflows executed on a single runner. ```python Python @@ -168,7 +168,7 @@ The following snippet illustrates storing and retrieving data from the cache. In this example, data stored under the key `"data"` can be any size that fits the cache backend constraints. Ensure the key remains unique within the job's scope to avoid conflicts. -To test the workflow, you can start a local task runner using the `InMemoryCache` backend. Then, submit a job to execute the `ProducerTask` and inspect the logs emitted by the `ConsumerTask`. +To test the workflow, you can start a local runner using the `InMemoryCache` backend. Then, submit a job to execute the `ProducerTask` and inspect the logs emitted by the `ConsumerTask`. ```python Python @@ -248,7 +248,7 @@ class PrintSum(Task): ``` -Submitting a job of the `CacheGroupDemo` and running it with a task runner can be done as follows: +Submitting a job of the `CacheGroupDemo` and running it with a runner can be done as follows: ```python Python diff --git a/workflows/concepts/clusters.mdx b/workflows/concepts/clusters.mdx index a6f445b..348fa91 100644 --- a/workflows/concepts/clusters.mdx +++ b/workflows/concepts/clusters.mdx @@ -1,39 +1,46 @@ --- title: Clusters +description: Use clusters to group runners, target job execution, and control where workflow releases are deployed. 
icon: circle-nodes --- - - Clusters are a logical grouping for [task runners](/workflows/concepts/task-runners). - Using clusters, you can scope certain tasks to a specific group of task runners. - Tasks, which are always submitted to a specific cluster, are only executed on task runners assigned to the same cluster. - +A cluster determines where workflow tasks can run. When you submit a [job](/workflows/concepts/jobs), its root [task](/workflows/concepts/tasks) is submitted to a cluster, and only [runners](/workflows/concepts/runners) assigned to that cluster can claim it. + +By default, subtasks are submitted to the same cluster as their parent task. Task code can submit individual subtasks to other clusters when a workflow needs to execute across different environments. + +[Workflow releases](/workflows/concepts/workflow-releases) can be deployed to individual clusters. This lets [release runners](/workflows/concepts/runners#release-runners) automatically load workflow code and execute compatible tasks from those releases. ## Use Cases -Use clusters to organize [task runners](/workflows/concepts/clusters) into logical groups, which can help with: +Use clusters to organize [runners](/workflows/concepts/runners) and workflow deployments into logical groups, which can help with: -- Targeting specific task runners for a particular job -- Reserving a group of task runners for specific purposes, such as running certain types of batch jobs +- Targeting specific runners for a particular job +- Reserving a group of runners for specific purposes, such as running certain types of batch jobs - Setting up different clusters for different environments (like development and production) +- Deploying different workflow releases to development, staging, or production clusters + +Even within the same cluster, runners may have different capabilities. A direct runner advertises the tasks registered in its own process. 
A release runner advertises tasks from the workflow releases currently deployed to that cluster. + +## Cluster deployments + +A workflow release deployment maps one workflow release to one cluster. Deploying a release does not submit a job. It only makes the release available to release runners on that cluster. -Even when using different clusters, task runners within the same cluster may still have different capabilities, such as different registered tasks. -If multiple task runners have the same set of registered tasks, you can assign them to different clusters to target specific task runners for a particular job. +A release runner can run multiple deployed releases for the same cluster. While it runs, it polls cluster deployment state and updates its task registrations when releases are deployed, updated, or removed. -### Adding Task Runners to a Cluster +### Adding runners to a cluster -You can add task runners to a cluster by specifying the [cluster's slug](#cluster-slug) when [registering a task runner](/workflows/concepts/task-runners). -Each task runner must always be assigned to a cluster. +You can add direct runners to a cluster by specifying the [cluster's slug](#cluster-slug) when starting the runner from your SDK code. You can add release runners to a cluster with `tilebox runner start --cluster `. +Each runner must always be assigned to a cluster. If no cluster is specified, Tilebox uses the default cluster. ## Default Cluster Each team has a default cluster that is automatically created for them. -This cluster is used when no cluster is specified when [registering a task runner](/workflows/concepts/task-runners) or [submitting a job](/workflows/concepts/jobs). +This cluster is used when no cluster is specified when [starting a runner](/workflows/concepts/runners), [deploying a release](/workflows/build-and-deploy/cluster-deployments), or [submitting a job](/workflows/concepts/jobs). 
This is useful when you are just getting started and don't need to create any custom clusters yet. ## Managing Clusters -Before registering a task runner or submitting a job, you must create a cluster. You can also list, fetch, and delete clusters as needed. The following sections explain how to do this. +Before starting a runner or submitting a job to a custom cluster, create the cluster. You can also list, fetch, and delete clusters as needed. The following sections explain how to do this. To manage clusters, first instantiate a cluster client using the `clusters` method in the workflows client. @@ -158,7 +165,7 @@ To delete a cluster, use the `delete` method and pass the cluster's slug: ## Jobs Across Different Clusters When [submitting a job](/workflows/concepts/jobs), you need to specify which cluster the job's root task should be executed on. -This allows you to direct the job to a specific set of task runners. +This allows you to direct the job to a specific set of runners. By default, all sub-tasks within a job are also submitted to the same cluster, but this can be overridden to submit sub-tasks to different clusters if needed. See the example below for a job that spans across multiple clusters. 
@@ -173,7 +180,7 @@ class MultiCluster(Task): other_cluster = context.submit_subtask( DummyTask(), - # this task runs only on a task runner in the "other-cluster" cluster + # this task runs only on a runner in the "other-cluster" cluster cluster="other-cluster-As3dcSb3D9SAdK", # dependencies can be specified across clusters depends_on=[same_cluster], @@ -214,7 +221,7 @@ func (t *MultiCluster) Execute(ctx context.Context) error { otherCluster, err := workflows.SubmitSubtask( ctx, &DummyTask{}, - // this task runs only on a task runner in the "other-cluster" cluster + // this task runs only on a runner in the "other-cluster" cluster subtask.WithClusterSlug("other-cluster-As3dcSb3D9SAdK"), // dependencies can be specified across clusters subtask.WithDependencies(sameCluster), @@ -246,6 +253,6 @@ func main() { ``` -This workflow requires at least two task runners to complete. One must be in the "testing" cluster, and the other must be in the "other-cluster" cluster. -If no task runners are available in the "other-cluster," the task submitted to that cluster will remain queued until a task runner is available. -It won't execute on a task runner in the "testing" cluster, even if the task runner has the `DummyTask` registered. +This workflow requires at least two runners to complete. One must be in the "testing" cluster, and the other must be in the "other-cluster" cluster. +If no runners are available in the "other-cluster," the task submitted to that cluster will remain queued until a runner is available. +It won't execute on a runner in the "testing" cluster, even if that runner has the `DummyTask` registered. diff --git a/workflows/concepts/jobs.mdx b/workflows/concepts/jobs.mdx index a7df186..8b59b53 100644 --- a/workflows/concepts/jobs.mdx +++ b/workflows/concepts/jobs.mdx @@ -1,17 +1,18 @@ --- title: Jobs +description: Submit workflow jobs, target clusters, inspect job state, visualize execution, cancel jobs, and retry failed work. 
icon: diagram-project --- - - A job is a specific execution of a workflow with designated input parameters. It consists of one or more tasks that can run in parallel or sequentially, based on their dependencies. Submitting a job involves creating a root task with specific input parameters, which may trigger the execution of other tasks within the same job. - +A job is one execution of a workflow, starting from a root [task](/workflows/concepts/tasks) with concrete input values. As the root task runs, it can submit subtasks, creating the task graph that belongs to the same job. + +When you submit a job, its root task is assigned to a [cluster](/workflows/concepts/clusters). Compatible [runners](/workflows/concepts/runners) execute tasks as they become eligible, and Tilebox updates job state from submission through completion, failure, cancellation, or retry. ## Submission To execute a [task](/workflows/concepts/tasks), it must be initialized with concrete inputs and submitted as a job. The task will then run within the context of the job, and if it generates sub-tasks, those will also execute as part of the same job. -After submitting a job, the root task is scheduled for execution, and any [eligible task runner](/workflows/concepts/task-runners#task-selection) can pick it up and execute it. +After submitting a job, the root task is scheduled for execution, and any [eligible runner](/workflows/concepts/runners#task-selection) can pick it up and execute it. First, instantiate a job client by calling the `jobs` method on the workflow client. @@ -54,7 +55,7 @@ if err != nil { ``` -Once a job is submitted, it's immediately scheduled for execution. The root task will be picked up and executed as soon as an [eligible task runner](/workflows/concepts/task-runners#task-selection) is available. +Once a job is submitted, it's immediately scheduled for execution. 
The root task will be picked up and executed as soon as an [eligible runner](/workflows/concepts/runners#task-selection) is available. ## Retry Handling @@ -197,7 +198,7 @@ Every Job is always in exactly one of the following states:
Job submitted Started
-
The job has started, some tasks are already `COMPUTED`, but others are still `QUEUED`, waiting for an [eligible task runner](/workflows/concepts/task-runners#task-selection) to pick them up. However no task is currently `RUNNING`.
+
The job has started: some tasks are already `COMPUTED`, while others are still `QUEUED`, waiting for an [eligible runner](/workflows/concepts/runners#task-selection) to pick them up. However, no task is currently `RUNNING`.
Job submitted Completed
@@ -264,7 +265,7 @@ The following diagram represents the job execution as a graph. Each task is show Color coding of task states -Below is another visualization of a job currently being executed by multiple task runners. +Below is another visualization of a job currently being executed by multiple runners. Job being executed by multiple runners @@ -274,7 +275,7 @@ Below is another visualization of a job currently being executed by multiple tas From the diagram, the following can be inferred: - The root task, `MyTask`, has been executed, is marked as `COMPUTED` and submitted three sub-tasks. -- At least three task runners are available, as three tasks currently are executed simultaneously. +- At least three runners are available, as three tasks currently are executed simultaneously. - The `SubTask` that is still executing has not generated any sub-tasks yet, as sub-tasks are queued for execution only after the parent task finishes and becomes computed. - The queued `DependentTask` requires the `LeafTask` to complete before it can be executed. @@ -355,7 +356,7 @@ job, err := client.Jobs.Submit(ctx, "custom-display-names", ## Cancellation -You can cancel a job at any time. When a job is canceled, no queued tasks will be picked up by task runners and executed even if task runners are idle. Tasks that are already being executed will finish their execution and not be interrupted. All sub-tasks spawned from such tasks after the cancellation will not be picked up by task runners. +You can cancel a job at any time. When a job is canceled, no queued tasks will be picked up by runners and executed even if runners are idle. Tasks that are already being executed will finish their execution and not be interrupted. All sub-tasks spawned from such tasks after the cancellation will not be picked up by runners. Use the `cancel` method on the job client to cancel a job. 
@@ -602,7 +603,7 @@ func (t *PrintMovieStats) Execute(ctx context.Context) error { ``` -With this fix, and after redeploying the task runners with the updated `PrintMovieStats` implementation, you can retry the job: +With this fix, and after redeploying the runners with the updated `PrintMovieStats` implementation, you can retry the job: ```python Python diff --git a/workflows/concepts/runners.mdx b/workflows/concepts/runners.mdx new file mode 100644 index 0000000..f91719f --- /dev/null +++ b/workflows/concepts/runners.mdx @@ -0,0 +1,357 @@ +--- +title: Runners +sidebarTitle: Runners +description: Learn how Tilebox tasks are executed, selected for execution, versioned, and compare release runners and direct runners. +icon: list-check +--- + +Runners are continuously running processes that listen for new tasks to execute. They claim queued tasks, execute them, and report task results back to Tilebox. You can start multiple runners in parallel to execute tasks concurrently or to provide different hardware and network access. + + + Runner architecture showing jobs submitted to Tilebox and a runner receiving assigned tasks, executing them, reporting results, and optionally submitting subtasks + Runner architecture showing jobs submitted to Tilebox and a runner receiving assigned tasks, executing them, reporting results, and optionally submitting subtasks + + +## Runner modes + +Tilebox supports two runner modes. A **release runner** is started with the Tilebox CLI, loads [workflow releases](/workflows/concepts/workflow-releases) deployed to its cluster, and reacts to updated cluster deployments while it runs. A **direct runner** is a standalone script, service, or binary that uses the Tilebox SDK to connect to the API and register tasks directly. +Release runners still run in an environment you control, but the workflow code they execute is selected through cluster deployments. This separates compute operations from workflow release rollout. 
Direct runners are scaled and rolled out by your own infrastructure. + + + Release runner and direct runner modes both claim tasks from the Tilebox API after discovering task identifiers from deployments or SDK code + Release runner and direct runner modes both claim tasks from the Tilebox API after discovering task identifiers from deployments or SDK code + + +The two modes differ in how the runner gets its task registrations and how you roll out code changes. + +| | Release runner | Direct runner | +| --- | --- | --- | +| Executable tasks | Loaded from workflow releases deployed to the runner's cluster | Registered directly in your script, service, or binary | +| Runtime | [Tilebox CLI](/agentic-development/tilebox-cli.mdx) invokes the Python workflow project runtime from the release artifact | Your Python script or Go binary, implemented with the Tilebox SDK | +| Start command | `tilebox runner start --cluster ` | `python runner.py`, `./my-runner-binary`, or your own deployment | +| Rollout model | You publish releases and deploy them to clusters; the runner automatically picks up deployment changes | You deploy, restart, scale, and roll back the runner process yourself | +| Best for | Reproducible releases, fast cluster deployments, and AI-assisted workflow iteration | Custom deployments, Go runners, and direct SDK control | + +### Release runners + +A release runner runs Python workflow releases deployed to a cluster. Start it with the Tilebox CLI: + +```bash +tilebox runner start --cluster dev-cluster +``` + +The release runner can run releases from multiple workflows at the same time, but only one release per workflow. It continuously polls the selected cluster for deployment updates, downloads missing release artifacts, validates and starts Python processes for each workflow release, and requests work for all the task identifiers from its deployed releases. When a new release is deployed or removed, the runner updates the task set it can execute. 
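The task set update described above can be sketched as a simple reconciliation step. This is a simplified model under stated assumptions, not the runner's actual implementation: `DeployedRelease`, `advertised_tasks`, and `task_set_changes` are hypothetical names, and deployment state is passed in as a plain dictionary keyed by workflow slug rather than polled from the Tilebox API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DeployedRelease:
    """Hypothetical record: the release deployed for one workflow."""

    workflow: str
    release_id: str
    task_identifiers: frozenset[str]


def advertised_tasks(deployments: dict[str, DeployedRelease]) -> set[str]:
    """Union of task identifiers across all deployed releases.

    Keying by workflow slug models "one release per workflow".
    """
    tasks: set[str] = set()
    for release in deployments.values():
        tasks |= release.task_identifiers
    return tasks


def task_set_changes(
    before: dict[str, DeployedRelease],
    after: dict[str, DeployedRelease],
) -> tuple[set[str], set[str]]:
    """Return (added, removed) task identifiers between two deployment states."""
    old, new = advertised_tasks(before), advertised_tasks(after)
    return new - old, old - new
```

In this model, replacing the release of a workflow adds the identifiers that only the new release registers and drops the ones that only the old release registered.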
+ + + Release runners currently only support Python workflow projects. The Tilebox CLI invokes the Python runner environment from the published release artifact using `uv`. + + +### Direct runners + +A direct runner connects to the Tilebox API from your own code. It is useful when you want full control over the process, deployment environment, dependencies, startup behavior, and scaling. You are responsible for deploying the script or binary, keeping it running, rolling out code changes, and rolling back when needed. + +Define a `Runner` instance once and connect it to a `Client` during startup. + + +```python Python +from tilebox.workflows import Client, Runner +from my_workflow.tasks import MyTask, OtherTask + +runner = Runner(tasks=[MyTask, OtherTask]) + +if __name__ == "__main__": + client = Client() + runner.connect_to(client, cluster="dev-cluster").run_forever() +``` +```go Go +package main + +import ( + "context" + "log/slog" + + "github.com/tilebox/tilebox-go/workflows/v1" + "github.com/tilebox/tilebox-go/workflows/v1/runner" + "github.com/my_org/myworkflow" +) + +func main() { + ctx := context.Background() + client := workflows.NewClient() + + workflowRunner, err := client.NewTaskRunner(ctx, runner.WithClusterSlug("dev-cluster")) + if err != nil { + slog.Error("failed to create runner", slog.Any("error", err)) + return + } + + if err := workflowRunner.RegisterTasks(&myworkflow.MyTask{}, &myworkflow.OtherTask{}); err != nil { + slog.Error("failed to register tasks", slog.Any("error", err)) + return + } + + workflowRunner.RunForever(ctx) +} +``` + + +## Task selection + +For a runner to pick up a submitted task, all of these conditions must match: + +1. The task was submitted to the same [cluster](/workflows/concepts/clusters) as the runner. +2. The runner advertises a [task identifier with the same name](/workflows/concepts/tasks#task-identifiers) and a [compatible version](/workflows/concepts/tasks#semantic-versioning). +3. 
The task is in the `QUEUED` [state](/workflows/concepts/tasks#task-states), its [dependencies](/workflows/concepts/tasks#dependencies) are met, and its [maximum retries](/workflows/concepts/tasks#retry-handling) aren't exhausted. + +Release runners advertise the task identifiers from workflow releases currently deployed to the cluster. Direct runners advertise the task identifiers they register in the running process. + + + If multiple tasks match those conditions, Tilebox picks one and assigns it to a runner. The remaining tasks stay queued until another matching runner is available. Running several runner processes in parallel can speed up job execution in such cases. + + +## Parallelism + +Start multiple runner processes to execute tasks in parallel. Each runner process claims and executes tasks independently. You can run several release runners, several direct runners, or a mix of both in the same cluster. This allows for high parallelism and can be used to scale the execution of tasks to handle large workloads. + +To test this, run multiple instances of the runner script in different terminal windows on your local machine, or use the built-in `parallel` subcommand of the [CLI](/agentic-development/tilebox-cli) to start multiple runners in parallel. + +```bash +# start multiple release runners in parallel +> tilebox parallel -n 5 -- tilebox runner start --cluster + +# or direct runner mode +> tilebox parallel -n 5 -- python your_direct_runner.py +``` + +## Scaling + +One key benefit of this runner architecture is the **ability to scale even while workflows are executing**. You can start new runners at any time, and they can immediately pick up queued tasks to execute. It's not necessary to have an entire processing cluster available at the start of a workflow, as additional runners can be started and stopped as needed.
+ +This is particularly beneficial in cloud environments, where runners can be automatically started and stopped based on current workload, measured by metrics such as CPU usage. Here's an example scenario: + +1. A single runner process is actively waiting for work in a cloud environment. +2. A large workload is submitted to the workflow orchestrator, resulting in the runner picking up the first task. +3. The first task creates new sub-tasks for processing, which the runner also picks up. +4. As the workload increases, the runner's CPU usage rises, triggering the cloud environment to automatically start up new runner instances. +5. Newly started runners begin executing queued tasks, distributing the workload among all available runners. +6. Once the workload decreases, the cloud environment automatically stops some runners. +7. The remaining work continues while runner instances are scaled back down, until everything is done. +8. Only a single runner remains idle until new tasks arrive. + +CPU usage-based auto scaling is just one method to scale runners. Other metrics, such as memory usage or network bandwidth, are also supported by many cloud environments. + +In a future release, configuration options for scaling runners based on custom metrics (for example the number of queued tasks) are planned. + +## Distributed Execution + +Runners can be distributed across different compute environments. For instance, some data stored on-premise may need pre-processing, while further processing occurs in the cloud. A job might involve tasks that filter relevant on-premise data and publish it to the cloud, and other tasks that read data from the cloud and process it. In such scenarios, one runner can run on-premise and another in a cloud environment, resulting in them effectively collaborating on the same job. + +Another advantage of distributed runners is executing workflows that require specific hardware for certain tasks. 
For example, one task might need a GPU, while another requires extensive memory. + +Here's an example of a distributed workflow: + + + ```python Python + from tilebox.workflows import Task, ExecutionContext + + class DistributedWorkflow(Task): + def execute(self, context: ExecutionContext) -> None: + download_task = context.submit_subtask(DownloadData()) + process_task = context.submit_subtask( + ProcessData(), + depends_on=[download_task], + ) + + class DownloadData(Task): + """ + Download a dataset and store it in a shared internal bucket. + Requires a good network connection for high download bandwidth. + """ + def execute(self, context: ExecutionContext) -> None: + pass + + class ProcessData(Task): + """ + Perform compute-intensive processing of a dataset. + The dataset must be available in an internal bucket. + Requires access to a GPU for optimal performance. + """ + def execute(self, context: ExecutionContext) -> None: + pass + ``` +```go Go +package distributed + +import ( + "context" + "fmt" + "github.com/tilebox/tilebox-go/workflows/v1" + "github.com/tilebox/tilebox-go/workflows/v1/subtask" +) + +type DistributedWorkflow struct{} + +func (t *DistributedWorkflow) Execute(ctx context.Context) error { + downloadTask, err := workflows.SubmitSubtask(ctx, &DownloadData{}) + if err != nil { + return fmt.Errorf("failed to submit download subtask: %w", err) + } + + _, err = workflows.SubmitSubtask(ctx, &ProcessData{}, subtask.WithDependencies(downloadTask)) + if err != nil { + return fmt.Errorf("failed to submit process subtask: %w", err) + } + return nil +} + +// DownloadData Download a dataset and store it in a shared internal bucket. +// Requires a good network connection for high download bandwidth. +type DownloadData struct{} + +func (t *DownloadData) Execute(ctx context.Context) error { + return nil +} + +// ProcessData Perform compute-intensive processing of a dataset. +// The dataset must be available in an internal bucket. 
+// Requires access to a GPU for optimal performance. +type ProcessData struct{} + +func (t *ProcessData) Execute(ctx context.Context) error { + return nil +} +``` + + +To achieve distributed execution for this workflow, no single runner capable of executing all three of the tasks is set up. +Instead, two runners, each capable of executing one of the tasks, are set up: one in a high-speed network environment and the other with GPU access. +When the distributed workflow runs, the first runner picks up the `DownloadData` task, while the second picks up the `ProcessData` task. +The `DistributedWorkflow` does not require specific hardware, so it can be registered with both runners and executed by either one. + + + + +```python Python +from tilebox.workflows import Client + +client = Client() +high_network_speed_runner = client.runner( + tasks=[DownloadData, DistributedWorkflow] +) +high_network_speed_runner.run_forever() +``` +```go Go +package main + +import ( + "context" + "log/slog" + + "github.com/tilebox/tilebox-go/workflows/v1" +) + +func main() { + ctx := context.Background() + client := workflows.NewClient() + + highNetworkSpeedRunner, err := client.NewTaskRunner(ctx) + if err != nil { + slog.Error("failed to create runner", slog.Any("error", err)) + return + } + + err = highNetworkSpeedRunner.RegisterTasks( + &DownloadData{}, + &DistributedWorkflow{}, + ) + if err != nil { + slog.Error("failed to register tasks", slog.Any("error", err)) + return + } + + highNetworkSpeedRunner.RunForever(ctx) +} +``` + + + + + +```python Python +from tilebox.workflows import Client + +client = Client() +gpu_runner = client.runner( + tasks=[ProcessData, DistributedWorkflow] +) +gpu_runner.run_forever() +``` +```go Go +package main + +import ( + "context" + "log/slog" + + "github.com/tilebox/tilebox-go/workflows/v1" +) + +func main() { + ctx := context.Background() + client := workflows.NewClient() + + gpuRunner, err := client.NewTaskRunner(ctx) + if err != nil { + 
slog.Error("failed to create runner", slog.Any("error", err)) + return + } + + err = gpuRunner.RegisterTasks( + &ProcessData{}, + &DistributedWorkflow{}, + ) + if err != nil { + slog.Error("failed to register tasks", slog.Any("error", err)) + return + } + + gpuRunner.RunForever(ctx) +} +``` + + + + +Now start both `download_runner.py` and `gpu_runner.py` in parallel on different machines, each with the hardware it requires. When `DistributedWorkflow` is submitted, it executes on one of the two runners, and its submitted sub-tasks are handled by the appropriate runner. + +In this case, since `ProcessData` depends on `DownloadData`, the GPU runner remains idle until the download completes, then picks up the processing task. + + + You can also differentiate between runners by specifying different [clusters](/workflows/concepts/clusters) and choosing specific clusters for sub-task submissions. For more details, see the [Clusters](/workflows/concepts/clusters) section. + + +## Task Failures + +If an unhandled exception occurs during task execution, the runner captures it and reports it back to the workflow orchestrator. The orchestrator then marks the task as failed, leading to [job cancellation](/workflows/concepts/jobs#cancellation). Cancellation prevents further tasks of the same job, which may no longer be relevant, from being executed. + +A task failure does not result in losing all previous work done by the job. If the failure is fixable, for example by fixing a bug in a task implementation, ensuring the task has the necessary resources, or simply retrying after a flaky network connection, it may be worth [retrying](/workflows/concepts/jobs#retries) the job. + +When retrying a job, all failed tasks are added back to the queue, allowing a runner to potentially execute them. If execution then succeeds, the job continues smoothly. Otherwise, the task will remain marked as failed and can be retried again if desired.
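A minimal sketch of the retry behavior described above, assuming a simplified in-memory model rather than the Tilebox API. `TaskState` and `retry_job` are hypothetical names, and real jobs have more states than the four shown here.

```python
from enum import Enum


class TaskState(Enum):
    """Simplified subset of task states, for illustration only."""

    QUEUED = "queued"
    RUNNING = "running"
    COMPUTED = "computed"
    FAILED = "failed"


def retry_job(task_states: dict[str, TaskState]) -> int:
    """Put every failed task back into the queue and return how many were requeued.

    Tasks that already finished keep their state, so earlier work is not repeated.
    """
    requeued = 0
    for name, state in task_states.items():
        if state is TaskState.FAILED:
            task_states[name] = TaskState.QUEUED
            requeued += 1
    return requeued
```

Only the failed tasks change state in this model, which mirrors the point that a retry resumes the job from the point of failure instead of starting over.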
+ +For a release runner, publish a compatible fixed release and deploy it to the same cluster before retrying. For a direct runner, deploy the fixed script or binary before retrying. Keep task identifiers and input schemas compatible when you want an existing failed job to resume from the point of failure. + +## Task idempotency + +Since a task may be retried, it's possible that a task is executed more than once. Depending on where in the execution of the task it failed, it may have already performed some side effects, such as writing to a database, or sending a message to a queue. Because of that it's crucial to ensure that tasks are [idempotent](https://en.wikipedia.org/wiki/Idempotence). Idempotent tasks can be executed multiple times without altering the outcome beyond the first successful execution. + +A special case of idempotency involves submitting sub-tasks. After a task calls `context.submit_subtask` and then fails and is retried, those submitted sub-tasks of an earlier failed execution are automatically removed, ensuring that they can be safely submitted again when the task is retried. + +## Runner Crashes + +Tilebox Workflows has an internal mechanism to handle unexpected runner crashes. When a runner picks up a task, it periodically sends a heartbeat to the workflow orchestrator. If the orchestrator does not receive this heartbeat for a defined duration, it marks the task as failed and automatically attempts to [retry](/workflows/concepts/jobs#retries) it up to 10 times. This allows another runner to pick up the task and continue executing the job. + +This mechanism ensures that scenarios such as power outages, hardware failures, or dropped network connections are handled effectively, preventing any task from remaining in a running state indefinitely. + +## Observability + +Tilebox captures logs, spans, task states, and runner context from both runner modes. 
Use [Workflow observability](/workflows/observability/introduction) to inspect job execution, task failures, and runner behavior. diff --git a/workflows/concepts/task-runners.mdx b/workflows/concepts/task-runners.mdx deleted file mode 100644 index fe9a6af..0000000 --- a/workflows/concepts/task-runners.mdx +++ /dev/null @@ -1,369 +0,0 @@ ---- -title: Task Runners -icon: list-check ---- - - - Task runners are the execution agents within the Tilebox Workflows ecosystem that execute tasks. They can be deployed in different computing environments, including on-premise servers and cloud-based auto-scaling clusters. Task runners execute tasks as scheduled by the workflow orchestrator, ensuring they have the necessary resources and environment for effective execution. - - -## Implementing a Task Runner - -A task runner is a continuously running process that listens for new tasks to execute. You can start multiple task runner processes to execute tasks concurrently. When a task runner receives a task, it executes it and reports the results back to the workflow orchestrator. The task runner also handles any errors that occur during task execution, reporting them to the orchestrator as well. - -To execute a task, at least one task runner must be running and available. If no task runners are available, tasks will remain queued until one becomes available. - -To create and start a task runner, follow these steps: - - - - Instantiate a client connected to the Tilebox Workflows API. - - - Select or create a [cluster](/workflows/concepts/clusters) and specify its slug when creating a task runner. - If no cluster is specified, the task runner will use the default cluster. - - - Register tasks by specifying the task classes that the task runner can execute as a list to the `runner` method. - - - Call the `run_forever` method of the task runner to listen for new tasks until the task runner process is shut down. 
- - - -Here is a simple example demonstrating these steps: - - -```python Python -from tilebox.workflows import Client -# your own workflow: -from my_workflow import MyTask, OtherTask - -def main(): - client = Client() # 1. connect to the Tilebox Workflows API - runner = client.runner( - cluster= "dev-cluster" # 2. select a cluster to join (optional, omit to use the default cluster) - tasks=[MyTask, OtherTask] # 3. register tasks - ) - runner.run_forever() # 4. listen for new tasks to execute - -if __name__ == "__main__": - main() -``` -```go Go -package main - -import ( - "context" - "log/slog" - - "github.com/tilebox/tilebox-go/workflows/v1" - "github.com/tilebox/tilebox-go/workflows/v1/runner" - // your own workflow: - "github.com/my_org/myworkflow" -) - -func main() { - ctx := context.Background() - - // 1. connect to the Tilebox Workflows API - client := workflows.NewClient() - - // 2. select a cluster to join (optional, omit to use the default cluster) - runner, err := client.NewTaskRunner(ctx, runner.WithClusterSlug("dev-cluster")) - if err != nil { - slog.Error("failed to create task runner", slog.Any("error", err)) - return - } - - // 3. register tasks - err = runner.RegisterTasks( - &myworkflow.MyTask{}, - &myworkflow.OtherTask{}, - ) - if err != nil { - slog.Error("failed to register task", slog.Any("error", err)) - return - } - - // 4. listen for new tasks to execute - runner.Run(ctx) -} -``` - - -To start the task runner locally, run it as a script: - - -```bash Python -> python task_runner.py -``` -```bash Go -> go run . -``` - - -## Task Selection - -For a task runner to pick up a submitted task, the following conditions must be met: - -1. The [cluster](/workflows/concepts/clusters) where the task was submitted must match the task runner's cluster. -2. The task runner must have a registered task that matches the [task identifier](/workflows/concepts/tasks#task-identifiers) of the submitted task. -3. 
The version of the task runner's registered task must be [compatible](/workflows/concepts/tasks#semantic-versioning) with the submitted task's version. - -If a task meets these conditions, the task runner executes it. Otherwise, the task runner remains idle until a matching task is available. - - - Often, multiple submitted tasks match the conditions for execution. In that case, the task runner selects one of the tasks to execute, and the remaining tasks stay in a queue until the selected task is completed or another task runner becomes available. - - -## Parallelism - -You can start multiple task runner instances in parallel to execute tasks concurrently. Each task runner listens for new tasks and executes them as they become available. This allows for high parallelism and can be used to scale the execution of tasks to handle large workloads. - -To test this, run multiple instances of the task runner script in different terminal windows on your local machine, or use a tool like [call-in-parallel](https://github.com/tilebox/call-in-parallel) to start the task runner script multiple times. - -For example, to start five task runners in parallel, use the following command: - -```bash -> call-in-parallel -n 5 -- python task_runner.py -``` - -## Deploying Task Runners - -Task runners are continuously running processes that can be deployed in different computing environments. The only requirement for deploying task runners is access to the Tilebox Workflows API. Once this is met, task runners can be deployed in many different environments, including: - -- On-premise servers -- Cloud-based virtual machines -- Cloud-based auto-scaling clusters - -## Scaling - -One key benefit of task runners is their **ability to scale even while workflows are executing**. You can start new task runners at any time, and they can immediately pick up queued tasks to execute. 
It's not necessary to have an entire processing cluster available at the start of a workflow, as task runners can be started and stopped as needed. - -This is particularly beneficial in cloud environments, where task runners can be automatically started and stopped based on current workload, measured by metrics such as CPU usage. Here's an example scenario: - -1. A single instance of a task runner is actively waiting for work in a cloud environment. -2. A large workload is submitted to the workflow orchestrator, prompting the task runner to pick up the first task. -3. The first task creates new sub-tasks for processing, which the task runner also picks up. -4. As the workload increases, the task runner's CPU usage rises, triggering the cloud environment to automatically start new task runner instances. -5. Newly started task runners begin executing queued tasks, distributing the workload among all available task runners. -6. Once the workload decreases, the cloud environment automatically stops some task runners. -7. The first task runner completes the remaining work until everything is done. -8. The first task runner remains idle until new tasks arrive. - -CPU usage-based auto scaling is just one method to scale task runners. Other metrics, such as memory usage or network bandwidth, are also supported by many cloud environments. In a future release, configuration options for scaling task runners based on custom metrics (for example the number of queued tasks) are planned. - -## Distributed Execution - -Task runners can be distributed across different compute environments. For instance, some data stored on-premise may need pre-processing, while further processing occurs in the cloud. A job might involve tasks that filter relevant on-premise data and publish it to the cloud, and other tasks that read data from the cloud and process it. 
In such scenarios, a task runners can run on-premise and another in a cloud environments, resulting in them effectively collaborating on the same job. - -Another advantage of distributed task runners is executing workflows that require specific hardware for certain tasks. For example, one task might need a GPU, while another requires extensive memory. - -Here's an example of a distributed workflow: - - - ```python Python - from tilebox.workflows import Task, ExecutionContext - - class DistributedWorkflow(Task): - def execute(self, context: ExecutionContext) -> None: - download_task = context.submit_subtask(DownloadData()) - process_task = context.submit_subtask( - ProcessData(), - depends_on=[download_task], - ) - - class DownloadData(Task): - """ - Download a dataset and store it in a shared internal bucket. - Requires a good network connection for high download bandwidth. - """ - def execute(self, context: ExecutionContext) -> None: - pass - - class ProcessData(Task): - """ - Perform compute-intensive processing of a dataset. - The dataset must be available in an internal bucket. - Requires access to a GPU for optimal performance. - """ - def execute(self, context: ExecutionContext) -> None: - pass - ``` -```go Go -package distributed - -import ( - "context" - "fmt" - "github.com/tilebox/tilebox-go/workflows/v1" - "github.com/tilebox/tilebox-go/workflows/v1/subtask" -) - -type DistributedWorkflow struct{} - -func (t *DistributedWorkflow) Execute(ctx context.Context) error { - downloadTask, err := workflows.SubmitSubtask(ctx, &DownloadData{}) - if err != nil { - return fmt.Errorf("failed to submit download subtask: %w", err) - } - - _, err = workflows.SubmitSubtask(ctx, &ProcessData{}, subtask.WithDependencies(downloadTask)) - if err != nil { - return fmt.Errorf("failed to submit process subtask: %w", err) - } - return nil -} - -// DownloadData Download a dataset and store it in a shared internal bucket. 
-// Requires a good network connection for high download bandwidth. -type DownloadData struct{} - -func (t *DownloadData) Execute(ctx context.Context) error { - return nil -} - -// ProcessData Perform compute-intensive processing of a dataset. -// The dataset must be available in an internal bucket. -// Requires access to a GPU for optimal performance. -type ProcessData struct{} - -func (t *ProcessData) Execute(ctx context.Context) error { - return nil -} -``` - - -To achieve distributed execution for this workflow, no single task runner capable of executing all three of the tasks is set up. -Instead, two task runners, each capable of executing one of the tasks are set up: one in a high-speed network environment and the other with GPU access. -When the distributed workflow runs, the first task runner picks up the `DownloadData` task, while the second picks up the `ProcessData` task. -The `DistributedWorkflow` does not require specific hardware, so it can be registered with both runners and executed by either one. 
- - - - -```python Python -from tilebox.workflows import Client - -client = Client() -high_network_speed_runner = client.runner( - tasks=[DownloadData, DistributedWorkflow] -) -high_network_speed_runner.run_forever() -``` -```go Go -package main - -import ( - "context" - "log/slog" - - "github.com/tilebox/tilebox-go/workflows/v1" -) - -func main() { - ctx := context.Background() - client := workflows.NewClient() - - highNetworkSpeedRunner, err := client.NewTaskRunner(ctx) - if err != nil { - slog.Error("failed to create task runner", slog.Any("error", err)) - return - } - - err = highNetworkSpeedRunner.RegisterTasks( - &DownloadData{}, - &DistributedWorkflow{}, - ) - if err != nil { - slog.Error("failed to register tasks", slog.Any("error", err)) - return - } - - highNetworkSpeedRunner.RunForever(ctx) -} -``` - - - - - -```python Python -from tilebox.workflows import Client - -client = Client() -gpu_runner = client.runner( - tasks=[ProcessData, DistributedWorkflow] -) -gpu_runner.run_forever() -``` -```go Go -package main - -import ( - "context" - "log/slog" - - "github.com/tilebox/tilebox-go/workflows/v1" -) - -func main() { - ctx := context.Background() - client := workflows.NewClient() - - gpuRunner, err := client.NewTaskRunner(ctx) - if err != nil { - slog.Error("failed to create task runner", slog.Any("error", err)) - return - } - - err = gpuRunner.RegisterTasks( - &ProcessData{}, - &DistributedWorkflow{}, - ) - if err != nil { - slog.Error("failed to register tasks", slog.Any("error", err)) - return - } - - gpuRunner.RunForever(ctx) -} -``` - - - - -Now, both `download_task_runner.py` and `gpu_task_runner.py` are started, in parallel, on different machines with the required hardware for each. When `DistributedWorkflow` is submitted, it executes on one of the two runners, and it's submitted sub-tasks are handled by the appropriate runner. 
- -In this case, since `ProcessData` depends on `DownloadData`, the GPU task runner remains idle until the download completion, then picks up the processing task. - - - You can also differentiate between task runners by specifying different [clusters](/workflows/concepts/clusters) and choosing specific clusters for sub-task submissions. For more details, see the [Clusters](/workflows/concepts/clusters) section. - - -## Task Failures - -If an unhandled exception occurs during task execution, the task runner captures it and reports it back to the workflow orchestrator. The orchestrator then marks the task as failed, leading to [job cancellation](/workflows/concepts/jobs#cancellation) to prevent further tasks of the same job-that may not be relevant anymore-from being executed. - -A task failure does not result in losing all previous work done by the job. If the failure is fixable—by fixing a bug in a task implementation, ensuring the task has necessary resources, or simply retrying it due to a flaky network connection—it may be worth [retrying](/workflows/concepts/jobs#retries) the job. - -When retrying a job, all failed tasks are added back to the queue, allowing a task runner to potentially execute them. If execution then succeeds, the job continues smoothly. Otherwise, the task will remain marked as failed and can be retried again if desired. - -If fixing a failure requires modifying the task implementation, it's important to deploy the updated version to the [task runners](/workflows/concepts/task-runners) before retrying the job. Otherwise, a task runner could pick up the original, faulty implementation again, leading to another failure. - -## Task idempotency - -Since a task may be retried, it's possible that a task is executed more than once. Depending on where in the execution of the task it failed, it may have already performed some side effects, such as writing to a database, or sending a message to a queue. 
Because of that it's crucial to ensure that tasks are [idempotent](https://en.wikipedia.org/wiki/Idempotence). Idempotent tasks can be executed multiple times without altering the outcome beyond the first successful execution. - -A special case of idempotency involves submitting sub-tasks. After a task calls `context.submit_subtask` and then fails and is retried, those submitted sub-tasks of an earlier failed execution are automatically removed, ensuring that they can be safely submitted again when the task is retried. - -## Runner Crashes - -Tilebox Workflows has an internal mechanism to handle unexpected task runner crashes. When a task runner picks up a task, it periodically sends a heartbeat to the workflow orchestrator. If the orchestrator does not receive this heartbeat for a defined duration, it marks the task as failed and automatically attempts to [retry](/workflows/concepts/jobs#retries) it up to 10 times. This allows another task runner to pick up the task and continue executing the job. - -This mechanism ensures that scenarios such as power outages, hardware failures, or dropped network connections are handled effectively, preventing any task from remaining in a running state indefinitely. - -## Observability - -Task runners are continuously running processes, making it essential to monitor their health and performance. Tilebox Workflows collects logs and traces from task runners automatically. To learn how to inspect and customize workflow observability, see [Observability](/workflows/observability/introduction). diff --git a/workflows/concepts/tasks.mdx b/workflows/concepts/tasks.mdx index 7c4eb6c..8fbc531 100644 --- a/workflows/concepts/tasks.mdx +++ b/workflows/concepts/tasks.mdx @@ -1,12 +1,13 @@ --- title: Understanding and Creating Tasks sidebarTitle: Tasks +description: Define workflow tasks, inputs, subtasks, dependencies, retries, and stable task identifiers. 
icon: laptop-code --- - - A Task is the smallest unit of work, designed to perform a specific operation. Each task represents a distinct operation or process that can be executed, such as processing data, performing calculations, or managing resources. Tasks can operate independently or as components of a more complex set of connected tasks known as a Workflow. Tasks are defined by their code, inputs, and dependencies on other tasks. To create tasks, you need to define the input parameters and specify the action to be performed during execution. - +A task is the unit of work Tilebox [runners](/workflows/concepts/runners) execute. A task class defines the code to run, the input fields that are serialized with each task submission, and optional relationships to other tasks through subtasks and dependencies. + +Tasks can run as the root task of a [job](/workflows/concepts/jobs) or as subtasks submitted by another task. This lets a workflow build a dynamic task graph while Tilebox schedules eligible tasks across runners in the selected [cluster](/workflows/concepts/clusters). ## Creating a Task @@ -39,7 +40,7 @@ For python, the key components of this task are: `MyFirstTask` is a subclass of the `Task` class, which serves as the base class for all defined tasks. It provides the essential structure for a task. Inheriting from `Task` automatically makes the class a `dataclass`, which is useful [for specifying inputs](#input-parameters). Additionally, by inheriting from `Task`, the task is automatically assigned an [identifier based on the class name](#task-identifiers). - The `execute` method is the entry point for executing the task. This is where the task's logic is defined. It's invoked by a [task runner](/workflows/concepts/task-runners) when the task runs and performs the task's operation. + The `execute` method is the entry point for executing the task. This is where the task's logic is defined. 
It's invoked by a [runner](/workflows/concepts/runners) when the task runs and performs the task's operation. The `context` argument is an `ExecutionContext` instance that provides access to an [API for submitting new tasks](/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtask) as part of the same job, [task logging](/api-reference/python/tilebox.workflows/ExecutionContext.logger), [custom tracing](/api-reference/python/tilebox.workflows/ExecutionContext.tracer), and features like [shared caching](/api-reference/python/tilebox.workflows/ExecutionContext.job_cache). @@ -53,13 +54,13 @@ For Go, the key components are: `MyFirstTask` is a struct that implements the `Task` interface. It represents the task to be executed. - The `Execute` method is the entry point for executing the task. This is where the task's logic is defined. It's invoked by a [task runner](/workflows/concepts/task-runners) when the task runs and performs the task's operation. + The `Execute` method is the entry point for executing the task. This is where the task's logic is defined. It's invoked by a [runner](/workflows/concepts/runners) when the task runs and performs the task's operation. The code samples on this page do not illustrate how to execute the task. That will be covered in the - [next section on task runners](/workflows/concepts/task-runners). The reason for that is that executing tasks is a separate concern from implementing tasks. + [next section on runners](/workflows/concepts/runners). The reason for that is that executing tasks is a separate concern from implementing tasks. ## Input Parameters @@ -67,7 +68,7 @@ For Go, the key components are: Tasks often require input parameters to operate. These inputs can range from simple values to complex data structures. By inheriting from the `Task` class, the task is treated as a Python `dataclass`, allowing input parameters to be defined as class attributes. 
- Tasks must be **serializable to JSON or to protobuf** because they may be distributed across a cluster of [task runners](/workflows/concepts/task-runners). + Tasks must be **serializable to JSON or to protobuf** because they may be distributed across a cluster of [runners](/workflows/concepts/runners). @@ -131,7 +132,7 @@ class ChildTask(Task): def execute(self, context: ExecutionContext) -> None: context.logger.info("Executing child task", index=self.index) -# after submitting this task, a task runner may pick it up and execute it +# after submitting this task, a runner may pick it up and execute it # which will result in 5 ChildTasks being submitted and executed as well task = ParentTask(5) ``` @@ -161,7 +162,7 @@ func (t *ChildTask) Execute(context.Context) error { return nil } -// after submitting this task, a task runner may pick it up and execute it +// after submitting this task, a runner may pick it up and execute it // which will result in 5 ChildTasks being submitted and executed as well task := &ParentTask{numSubtasks: 5} ``` @@ -172,7 +173,7 @@ In this example, a `ParentTask` submits `ChildTask` tasks as subtasks. The numbe Parent task do not have access to results of subtasks, instead, tasks can use [shared caching](/workflows/caches#storing-and-retrieving-data) to share data between tasks. - By submitting a task as a subtask, its execution is scheduled as part of the same job as the parent task. Compared to just directly invoking the subtask's `execute` method, this allows the subtask's execution to occur on a different machine or in parallel with other subtasks. To learn more about how tasks are executed, see the section on [task runners](/workflows/concepts/task-runners). + By submitting a task as a subtask, its execution is scheduled as part of the same job as the parent task. Compared to just directly invoking the subtask's `execute` method, this allows the subtask's execution to occur on a different machine or in parallel with other subtasks. 
To learn more about how tasks are executed, see the section on [runners](/workflows/concepts/runners). ### Larger subtasks example @@ -311,7 +312,7 @@ job = jobs.submit( DownloadRandomDogImages(5), ) -# now our deployed task runners will pick up the task and execute it +# now our deployed runners will pick up the task and execute it jobs.display(job) ``` @@ -331,7 +332,7 @@ if err != nil { return } -// now our deployed task runners will pick up the task and execute it +// now our deployed runners will pick up the task and execute it ``` @@ -348,7 +349,7 @@ if err != nil { /> -In total, six tasks are executed: the `DownloadRandomDogImages` task and five `DownloadImage` tasks. The `DownloadImage` tasks can execute in parallel, as they are independent. If more than one task runner is available, the Tilebox Workflow Orchestrator **automatically parallelizes** the execution of these tasks. +In total, six tasks are executed: the `DownloadRandomDogImages` task and five `DownloadImage` tasks. The `DownloadImage` tasks can execute in parallel, as they are independent. If more than one runner is available, the Tilebox Workflow Orchestrator **automatically parallelizes** the execution of these tasks. Check out [job_client.display](/workflows/concepts/jobs#visualization) to learn how this visualization was automatically generated from the task executions. @@ -358,7 +359,7 @@ In total, six tasks are executed: the `DownloadRandomDogImages` task and five `D Every task goes through a set of states during its lifetime. -- When submitted, either as a job or as a subtask, it starts in the `QUEUED` state and transitions to `RUNNING` when a task runner picks it up. +- When submitted, either as a job or as a subtask, it starts in the `QUEUED` state and transitions to `RUNNING` when a runner picks it up. - If the task executes successfully, it transitions to `COMPUTED`. 
- If the task fails, it transitions to `FAILED`, unless it's an [optional task](#optional-tasks), or nested within an [optional task](#nested-optional-tasks), in which case it transitions to `FAILED_OPTIONAL`. - As soon as all subtasks of a task are `COMPUTED` (or `FAILED_OPTIONAL`), the task is considered `COMPLETED`, allowing dependent tasks to be executed. @@ -367,8 +368,8 @@ The table below summarizes the different task states and their meanings. | Task State | Description | |------------|-------------| -| **Queued** | The task is queued and waiting for execution. Any [eligible](/workflows/concepts/task-runners#task-selection) task runner can pick it up and execute it, as soon as it's parent task is `COMPUTED` and all it's dependencies are `COMPLETED`. | -| **Running** | The task is currently being executed by a task runner. | +| **Queued** | The task is queued and waiting for execution. Any [eligible](/workflows/concepts/runners#task-selection) runner can pick it up and execute it, as soon as its parent task is `COMPUTED` and all its dependencies are `COMPLETED`. | +| **Running** | The task is currently being executed by a runner. | | **Computed** | The task has successfully been computed, but still has outstanding subtasks. | | **Completed** | The task has successfully been computed, and all it's subtasks are also computed, making it `COMPLETED`. This is the final state of a task. Only once a task has been `COMPLETED`, dependent tasks can be executed. | | **Failed** | The task has been executed but encountered an error. | @@ -429,7 +430,7 @@ class Sum(Task): # The reduce step ``` -Submitting a job of the `SumOfSquares` task and running it with a task runner can be done as follows: +Submitting a job of the `SumOfSquares` task and running it with a runner can be done as follows: ```python Python @@ -978,7 +979,7 @@ This workflow consists of four tasks: | PrintHeadlines | FetchNews | A task that logs the headlines of the news articles.
| | MostFrequentAuthors | FetchNews | A task that counts the number of articles each author has written and logs the result. | -An important aspect is that there is no dependency between the `PrintHeadlines` and `MostFrequentAuthors` tasks. This means they can execute in parallel, which the Tilebox Workflow Orchestrator will do, provided multiple task runners are available. +An important aspect is that there is no dependency between the `PrintHeadlines` and `MostFrequentAuthors` tasks. This means they can execute in parallel, which the Tilebox Workflow Orchestrator will do, provided multiple runners are available. In this example, the results from `FetchNews` are stored in a file. This is not the recommended method for passing data between tasks. When executing on a distributed cluster, the existence of a file written by a dependent task cannot be guaranteed. Instead, it's better to use a [shared cache](/workflows/caches). @@ -1239,7 +1240,7 @@ If instead `Step1B` was also marked as optional, `Step1C` and `Step2` would stil ## Task Identifiers -A task identifier is a unique string used by the Tilebox Workflow Orchestrator to identify the task. It's used by [task runners](/workflows/concepts/task-runners) to map submitted tasks to a task class and execute them. It also serves as the default name in execution visualizations. +A task identifier is a unique string used by the Tilebox Workflow Orchestrator to identify the task. It's used by [runners](/workflows/concepts/runners) to map submitted tasks to a task class and execute them. It also serves as the default name in execution visualizations. If unspecified, the identifier of a task defaults to the class name. For instance, the identifier of the `PrintHeadlines` task in the previous example is `"PrintHeadlines"`. This is good for prototyping, but not recommended for production, as changing the class name also changes the identifier, which can lead to issues during refactoring. 
It also prevents different tasks from sharing the same class name. @@ -1323,17 +1324,17 @@ func (t *MyTask) Execute(context.Context) error { ``` -When a task is submitted as part of a job, the version from which it's submitted is recorded and may differ from the version on the task runner executing the task. +When a task is submitted as part of a job, the version from which it's submitted is recorded and may differ from the version on the runner executing the task. -When task runners execute a task, they require a registered task with a matching identifier and compatible version number. A compatible version is where the major version number on the task runner matches that of the submitted task, and the minor version number on the task runner is equal to or greater than that of the submitted task. +When runners execute a task, they require a registered task with a matching identifier and compatible version number. A compatible version is where the major version number on the runner matches that of the submitted task, and the minor version number on the runner is equal to or greater than that of the submitted task. Examples of compatible version numbers include: - `MyTask` is submitted as part of a job. The version is `"v1.3"`. -- A task runner with version `"v1.3"` of `MyTask` would executes this task. -- A task runner with version `"v1.5"` of `MyTask` would also executes this task. -- A task runner with version `"v1.2"` of `MyTask` would not execute this task, as its minor version is lower than that of the submitted task. -- A task runner with version `"v2.5"` of `MyTask` would not execute this task, as its major version differs from that of the submitted task. +- A runner with version `"v1.3"` of `MyTask` would execute this task. +- A runner with version `"v1.5"` of `MyTask` would also execute this task. +- A runner with version `"v1.2"` of `MyTask` would not execute this task, as its minor version is lower than that of the submitted task. 
+- A runner with version `"v2.5"` of `MyTask` would not execute this task, as its major version differs from that of the submitted task. ## Conclusion diff --git a/workflows/concepts/workflow-releases.mdx b/workflows/concepts/workflow-releases.mdx new file mode 100644 index 0000000..a9976b3 --- /dev/null +++ b/workflows/concepts/workflow-releases.mdx @@ -0,0 +1,75 @@ +--- +title: Workflow Releases +description: Understand workflow releases, release artifacts, and how release runners execute deployed Python workflow projects. +icon: box-open +--- + +A workflow is a set of interrelated tasks. You can run those tasks directly without registering the workflow with Tilebox. Registering a workflow with the Tilebox API gives it a stable slug, which lets you publish immutable release artifacts to it and deploy a release to one or more clusters. + +That release path enables [release runners](/workflows/concepts/runners#release-runners). Release runners operate on a cluster, pick up all the releases deployed to that cluster, and execute tasks. This provides an easy way of deploying workflows to a compute cluster, including a quick and agent-accessible iteration loop: change code, publish a release, deploy it, run a job, and inspect the result. + +## Workflows and releases + +A workflow is the long-lived object referred to by slug. A release is one concrete version of that workflow. The release is immutable, so a later code change creates a new release instead of modifying the old one. You can deploy the same release to one or multiple clusters. Release runners on those clusters then pick up that release and run tasks registered by it. + + + Workflow slug with multiple releases, release artifacts, and cluster deployments + Workflow slug with multiple releases, release artifacts, and cluster deployments + + +Use this model when you want reproducible workflow execution. 
You can inspect which release is deployed to a cluster, promote a known release to another cluster, or retry a failed job after deploying a compatible fix. + +## Release artifacts + +The release artifact is built from the files selected by `tilebox.workflow.toml`. The build command resolves include patterns, applies exclude patterns and `.gitignore` when enabled, creates a deterministic `.tar.zst` archive, and validates the runtime by discovering registered tasks. + +The artifact should contain code and small configuration. Keep downloaded data, model checkpoints, generated caches, and local virtual environments out of the release. If a workflow needs large runtime assets, fetch them lazily from the task code into a runner-local cache. + +## Task registrations + +Task registrations are discovered from the configured Python runner object or command during release validation. The discovered task identifiers are stored in the release content and later advertised by release runners. + +For a reusable Python workflow project, define a `Runner` object: + +```python Python +# my_workflow/runner.py +from tilebox.workflows import ExecutionContext, Runner, Task + + +class FirstTask(Task): + def execute(self, context: ExecutionContext) -> None: + ... + + +class SecondTask(Task): + def execute(self, context: ExecutionContext) -> None: + ... + + +runner = Runner(tasks=[FirstTask, SecondTask]) +``` + +Then point `tilebox.workflow.toml` at that object: + +```toml +[workflow] +slug = "my-workflow" +root = "." +runner = "my_workflow.runner:runner" +``` + +## Cluster deployments + +A cluster deployment maps a workflow release to a cluster. A release runner can run multiple deployed releases for the same cluster and updates its task registrations when cluster deployments change. + +Deploying, updating, or removing a release deployment changes what the release runner can execute. It does not require rebuilding the runner process itself. 
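
The relationship between deployments and task registrations can be pictured with a small toy model. This is only a sketch for intuition, not the Tilebox API: `Release`, `ReleaseRunnerModel`, and their methods are hypothetical names, and the real release runner may track deployments differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Release:
    """Hypothetical stand-in for an immutable workflow release."""

    workflow_slug: str
    release_id: str
    task_identifiers: frozenset[str]


@dataclass
class ReleaseRunnerModel:
    """Toy model of a release runner bound to one cluster (not the Tilebox API)."""

    cluster: str
    deployed: set[Release] = field(default_factory=set)

    def deploy(self, release: Release) -> None:
        # Deploying a release makes its tasks executable on this cluster.
        self.deployed.add(release)

    def remove(self, release: Release) -> None:
        # Removing a deployment withdraws its tasks; the runner object stays the same.
        self.deployed.discard(release)

    def registered_tasks(self) -> set[str]:
        # Registrations are derived from whatever is currently deployed.
        tasks: set[str] = set()
        for release in self.deployed:
            tasks |= release.task_identifiers
        return tasks


runner = ReleaseRunnerModel(cluster="dev-cluster")
v1 = Release("my-workflow", "r1", frozenset({"FirstTask"}))
v2 = Release("my-workflow", "r2", frozenset({"FirstTask", "SecondTask"}))

runner.deploy(v1)
tasks_after_v1 = runner.registered_tasks()

# Roll forward: deploy the newer release, then remove the older one.
runner.deploy(v2)
runner.remove(v1)
tasks_after_v2 = runner.registered_tasks()
```

In this model the set of executable tasks is always recomputed from the current deployments, which mirrors the point above: changing deployments changes what can run without rebuilding the runner.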
+ +## Fixing failed jobs + +If a job fails because of a bug in task code, publish a compatible fixed release and deploy it to the same cluster before retrying the job. Keep the task identifier name, major version, and input schema compatible when you want the existing failed job to resume from failed tasks. + +```bash +tilebox workflow publish-release --json +tilebox workflow deploy-release --latest --cluster dev-cluster --json +tilebox job retry --json +``` diff --git a/workflows/introduction.mdx b/workflows/introduction.mdx index 96b1575..46b1e30 100644 --- a/workflows/introduction.mdx +++ b/workflows/introduction.mdx @@ -1,31 +1,42 @@ --- title: Tilebox Workflows sidebarTitle: Introduction -description: The Tilebox workflow orchestrator is a parallel processing engine. It simplifies the creation of dynamic tasks that can be executed across various computing environments, including on-premise and auto-scaling clusters in public clouds. +description: Run space data workflows across local, on-premises, and cloud environments with agent-friendly iteration loops. icon: network-wired mode: wide --- -This section provides guides showcasing how to use the Tilebox workflow orchestrator effectively. Here are some of the key learning areas: +Tilebox Workflows is a parallel processing engine for space data operations. It helps you turn processing steps such as fetching scenes, validating products, generating previews, running models, and publishing outputs into [tasks](/workflows/concepts/tasks) that can run across local machines, on-premises systems, and cloud environments. 
+ + + Tilebox Workflows architecture showing workflow code submitted as a job, Tilebox tracking queued tasks and job state, and a runner executing matching tasks + Tilebox Workflows architecture showing workflow code submitted as a job, Tilebox tracking queued tasks and job state, and a runner executing matching tasks + + +You submit a [job](/workflows/concepts/jobs) from workflow code, and Tilebox tracks the resulting task graph while [runners](/workflows/concepts/runners) execute eligible work on the selected [cluster](/workflows/concepts/clusters). During development, you can run tasks directly from your own script or service. For reproducible deployment, you can publish [workflow releases](/workflows/concepts/workflow-releases), deploy them to clusters, and let release runners execute the deployed code. + +This model also gives AI coding agents a practical iteration loop: edit workflow code, publish a release, deploy it to a development cluster, submit a test job, and inspect logs, traces, and job state before making the next change. + +## Get Started with Tilebox Workflows - - Create tasks using the Tilebox Workflow Orchestrator. + + Define task classes, inputs, subtasks, dependencies, retries, and identifiers. - - Learn how to submit jobs to the workflow orchestrator, which schedules tasks for execution. + + Submit a root task to a cluster and let runners execute the resulting task graph. - - Learn how to set up task runners to execute tasks in a distributed manner. + + Learn how to set up a runner in order to execute tasks. - - Understand how to gain insights into task executions using observability features like tracing and logging. + + Package workflow projects into immutable releases. - - Learn to configure shared data access for all tasks of a job using caches. + + Map releases to clusters and run them with `tilebox runner start`. - - Trigger jobs based on events or schedules, such as new data availability or CRON schedules. 
+ + Use logs, traces, and job state to debug workflows across runners. @@ -40,11 +51,14 @@ Before exploring Tilebox Workflows in depth, familiarize yourself with some comm A job is a specific execution of a workflow with designated input parameters. It consists of one or more tasks that can run in parallel or sequentially, based on their dependencies. Submitting a job involves creating a root task with specific input parameters, which may trigger the execution of other tasks within the same job. - - Task runners are the execution agents within the Tilebox Workflows ecosystem that execute tasks. They can be deployed in different computing environments, including on-premise servers and cloud-based auto-scaling clusters. Task runners execute tasks as scheduled by the workflow orchestrator, ensuring they have the necessary resources and environment for effective execution. + + Runners are processes that execute workflow tasks for a cluster. Direct runners register task classes from a standalone script, service, or binary that connects to the Tilebox API through the SDK. Release runners are started by the Tilebox CLI and load task registrations from Python workflow releases deployed to their cluster. + + + A workflow release is an immutable package of a workflow project. It includes source files, the command or runner object used to start the workflow runtime, and the task identifiers discovered during release validation. - Clusters are a logical grouping for task runners. Using clusters, you can scope certain tasks to a specific group of task runners. Tasks, which are always submitted to a specific cluster, are only executed on task runners assigned to the same cluster. + Clusters group runners and receive workflow release deployments. Jobs are submitted to a cluster, and only runners assigned to that cluster can claim the job's tasks. A release runner can run multiple releases that are deployed to its cluster. 
Caches are shared storage that enable data storage and retrieval across tasks within a single job. They store intermediate results and share data among tasks, enabling distributed computing and reducing redundant data processing. diff --git a/workflows/near-real-time/cron.mdx b/workflows/near-real-time/cron.mdx index cc5c8fc..a827c67 100644 --- a/workflows/near-real-time/cron.mdx +++ b/workflows/near-real-time/cron.mdx @@ -57,10 +57,10 @@ cron_automation = automations.create_cron_automation( A helpful tool to test your cron expressions is [crontab.guru](https://crontab.guru/). -## Starting a Cron Task Runner +## Starting a Cron runner -With the Cron automation registered, a job is submitted whenever the Cron expression matches. But unless a [task runner](/workflows/concepts/task-runners) is available to execute the Cron task the submitted jobs remain in a task queue. -Once an [eligible task runner](/workflows/concepts/task-runners#task-selection) becomes available, all jobs in the queue are executed. +With the Cron automation registered, a job is submitted whenever the Cron expression matches. But unless a [runner](/workflows/concepts/runners) is available to execute the Cron task the submitted jobs remain in a task queue. +Once an [eligible runner](/workflows/concepts/runners#task-selection) becomes available, all jobs in the queue are executed. 
```python Python from tilebox.workflows import Client @@ -70,7 +70,7 @@ runner = client.runner(tasks=[MyCronTask]) runner.run_all() ``` -If this task runner runs continuously, its logs may resemble the following: +If this runner runs continuously, its logs may resemble the following: ```plaintext Logs Cron task triggered message=World trigger_time=2023-09-25 16:12:00 diff --git a/workflows/near-real-time/storage-events.mdx b/workflows/near-real-time/storage-events.mdx index dbc182f..2cf2be2 100644 --- a/workflows/near-real-time/storage-events.mdx +++ b/workflows/near-real-time/storage-events.mdx @@ -98,7 +98,7 @@ local_object = local_folder.read("my-object.txt") The `read` method instantiates a client for the specific storage location. This requires that - the storage location is accessible by a task runner and may require credentials for cloud storage + the storage location is accessible by a runner and may require credentials for cloud storage or physical/network access to a locally mounted file system. @@ -142,10 +142,10 @@ Here are some examples of valid glob patterns: | `folder/**` | Any file directly or recursively part of a `folder` subdirectory | | `[a-z].txt`| Matches `a.txt`, `b.txt`, etc. | -## Start a Storage Event Task Runner +## Start a Storage Event runner -With the Storage Event automation registered, a job is submitted whenever a storage event occurs. But unless a [task runner](/workflows/concepts/task-runners) is available to execute the Storage Event task the submitted jobs remain in a task queue. -Once an [eligible task runner](/workflows/concepts/task-runners#task-selection) becomes available, all jobs in the queue are executed. +With the Storage Event automation registered, a job is submitted whenever a storage event occurs. But unless a [runner](/workflows/concepts/runners) is available to execute the Storage Event task the submitted jobs remain in a task queue. 
+Once an [eligible runner](/workflows/concepts/runners#task-selection) becomes available, all jobs in the queue are executed. ```python Python from tilebox.workflows import Client @@ -164,7 +164,7 @@ echo "Hello World" > my-object.txt gcloud storage cp my-object.txt gs://gcs-bucket-fab3fa2 ``` -Inspecting the task runner output reveals that the job was submitted and the task executed: +Inspecting the runner output reveals that the job was submitted and the task executed: ```plaintext Output 2024-09-25 16:51:45,621 INFO A new object was created: my-object.txt diff --git a/workflows/observability/integrations/axiom.mdx b/workflows/observability/integrations/axiom.mdx index eea1e70..5651f0c 100644 --- a/workflows/observability/integrations/axiom.mdx +++ b/workflows/observability/integrations/axiom.mdx @@ -86,7 +86,7 @@ func main() { client := workflows.NewClient() runner, err := client.NewTaskRunner(ctx) if err != nil { - slog.ErrorContext(ctx, "failed to create task runner", slog.Any("error", err)) + slog.ErrorContext(ctx, "failed to create runner", slog.Any("error", err)) return } diff --git a/workflows/observability/integrations/open-telemetry.mdx b/workflows/observability/integrations/open-telemetry.mdx index 4fc3ab5..323716e 100644 --- a/workflows/observability/integrations/open-telemetry.mdx +++ b/workflows/observability/integrations/open-telemetry.mdx @@ -84,7 +84,7 @@ func main() { client := workflows.NewClient() runner, err := client.NewTaskRunner(ctx) if err != nil { - slog.ErrorContext(ctx, "failed to create task runner", slog.Any("error", err)) + slog.ErrorContext(ctx, "failed to create runner", slog.Any("error", err)) return } diff --git a/workflows/observability/introduction.mdx b/workflows/observability/introduction.mdx index a863b24..fcee3ea 100644 --- a/workflows/observability/introduction.mdx +++ b/workflows/observability/introduction.mdx @@ -5,7 +5,7 @@ description: Inspect workflow logs, traces, task status, and runner behavior. 
icon: lightbulb --- -Tilebox Workflows gives each job a live observability record. As task runners execute work, Tilebox captures logs, traces, task status, and runner context. You can follow a job from the root task through its subtasks, inspect failures, and compare slow steps across distributed runners. +Tilebox Workflows gives each job a live observability record. As runners execute work, Tilebox captures logs, traces, task status, and runner context. You can follow a job from the root task through its subtasks, inspect failures, and compare slow steps across distributed runners. Job Execution Trace View @@ -45,7 +45,7 @@ Use the built-in view for day-to-day debugging and operations. Add structured lo A submitted job starts a trace. Each task run creates a span, and custom spans sit under the task that creates them. Log records emitted from task code attach to active spans, which connects messages to timing data. -Tilebox adds job, task, runner, and service metadata to telemetry records. This data helps you filter by job, inspect a single task run, or compare work across task runners. +Tilebox adds job, task, runner, and service metadata to telemetry records. This data helps you filter by job, inspect a single task run, or compare work across runners. ## Observability example diff --git a/workflows/observability/logging.mdx b/workflows/observability/logging.mdx index afac772..ff96818 100644 --- a/workflows/observability/logging.mdx +++ b/workflows/observability/logging.mdx @@ -4,7 +4,7 @@ description: Emit structured task logs and tune how workflow clients export log icon: rectangle-terminal --- -Tilebox collects workflow logs automatically. When a task runner is created from a `Client`, logs emitted through the task execution context are exported to Tilebox and correlated with the active job, task, and trace. +Tilebox collects workflow logs automatically. 
When a runner is created from a `Client`, logs emitted through the task execution context are exported to Tilebox and correlated with the active job, task, and trace. Job Logs View @@ -104,7 +104,7 @@ func main() { client := workflows.NewClient() runner, err := client.NewTaskRunner(ctx) if err != nil { - slog.ErrorContext(ctx, "failed to create task runner", slog.Any("error", err)) + slog.ErrorContext(ctx, "failed to create runner", slog.Any("error", err)) return } @@ -152,7 +152,7 @@ func main() { ``` -The Python `level` argument applies to logs emitted with `context.logger`. The optional `runner_level` argument applies to internal task runner logs. If `runner_level` is omitted, it uses the same value as `level`. In Go, `workflows.ConfigureConsoleLogging()` sets the local console log level, and `workflows.NewClient()` configures Tilebox workflow log export. +The Python `level` argument applies to logs emitted with `context.logger`. The optional `runner_level` argument applies to internal runner logs. If `runner_level` is omitted, it uses the same value as `level`. In Go, `workflows.ConfigureConsoleLogging()` sets the local console log level, and `workflows.NewClient()` configures Tilebox workflow log export. ## Query logs diff --git a/workflows/observability/tracing.mdx b/workflows/observability/tracing.mdx index 7429ae4..3b07e08 100644 --- a/workflows/observability/tracing.mdx +++ b/workflows/observability/tracing.mdx @@ -4,7 +4,7 @@ description: Use built-in workflow traces and custom spans to inspect job execut icon: chart-gantt --- -Tilebox traces workflow jobs automatically. Job submission creates a root trace, task runners continue that trace across machines, and every task execution creates a span. +Tilebox traces workflow jobs automatically. Job submission creates a root trace, runners continue that trace across machines, and every task execution creates a span. Job Execution Trace View