

Completed
@@ -264,7 +265,7 @@ The following diagram represents the job execution as a graph. Each task is show

-Below is another visualization of a job currently being executed by multiple task runners.
+Below is another visualization of a job currently being executed by multiple runners.

@@ -274,7 +275,7 @@ Below is another visualization of a job currently being executed by multiple tas
From the diagram, the following can be inferred:
- The root task, `MyTask`, has been executed, is marked as `COMPUTED` and submitted three sub-tasks.
-- At least three task runners are available, as three tasks currently are executed simultaneously.
+- At least three runners are available, as three tasks currently are executed simultaneously.
- The `SubTask` that is still executing has not generated any sub-tasks yet, as sub-tasks are queued for execution only after the parent task finishes and becomes computed.
- The queued `DependentTask` requires the `LeafTask` to complete before it can be executed.
@@ -355,7 +356,7 @@ job, err := client.Jobs.Submit(ctx, "custom-display-names",
## Cancellation
-You can cancel a job at any time. When a job is canceled, no queued tasks will be picked up by task runners and executed even if task runners are idle. Tasks that are already being executed will finish their execution and not be interrupted. All sub-tasks spawned from such tasks after the cancellation will not be picked up by task runners.
+You can cancel a job at any time. When a job is canceled, no queued tasks will be picked up by runners and executed even if runners are idle. Tasks that are already being executed will finish their execution and not be interrupted. All sub-tasks spawned from such tasks after the cancellation will not be picked up by runners.
Use the `cancel` method on the job client to cancel a job.
@@ -602,7 +603,7 @@ func (t *PrintMovieStats) Execute(ctx context.Context) error {
```
-With this fix, and after redeploying the task runners with the updated `PrintMovieStats` implementation, you can retry the job:
+With this fix, and after redeploying the runners with the updated `PrintMovieStats` implementation, you can retry the job:
```python Python
diff --git a/workflows/concepts/runners.mdx b/workflows/concepts/runners.mdx
new file mode 100644
index 0000000..f91719f
--- /dev/null
+++ b/workflows/concepts/runners.mdx
@@ -0,0 +1,357 @@
+---
+title: Runners
+sidebarTitle: Runners
+description: Learn how Tilebox tasks are executed, selected for execution, versioned, and compare release runners and direct runners.
+icon: list-check
+---
+
+Runners are continuously running processes that listen for new tasks to execute. They claim queued tasks, execute them, and report task results back to Tilebox. You can start multiple runners in parallel to execute tasks concurrently or to provide different hardware and network access.
+
+
+
+
+
+
+## Runner modes
+
+Tilebox supports two runner modes. A **release runner** is started with the Tilebox CLI, loads [workflow releases](/workflows/concepts/workflow-releases) deployed to its cluster, and reacts to updated cluster deployments while it runs. A **direct runner** is a standalone script, service, or binary that uses the Tilebox SDK to connect to the API and register tasks directly.
+Release runners still run in an environment you control, but the workflow code they execute is selected through cluster deployments. This separates compute operations from workflow release rollout. Direct runners are scaled and rolled out by your own infrastructure.
+
+
+
+
+
+
+The two modes differ in how the runner gets its task registrations and how you roll out code changes.
+
+| | Release runner | Direct runner |
+| --- | --- | --- |
+| Executable tasks | Loaded from workflow releases deployed to the runner's cluster | Registered directly in your script, service, or binary |
+| Runtime | [Tilebox CLI](/agentic-development/tilebox-cli.mdx) invokes the Python workflow project runtime from the release artifact | Your python script or Go binary, implemented with the Tilebox SDK |
+| Start command | `tilebox runner start --cluster ` | `python runner.py`, `./my-runner-binary`, or your own deployment |
+| rollout model | You publish releases and deploy them to clusters, the runner automatically picks up deployment changes | You deploy, restart, scale, and roll back the runner process yourself |
+| Best for | Reproducible releases, fast cluster deployments, and AI-assisted workflow iteration | Custom deployments, Go runners, and direct SDK control |
+
+### Release runners
+
+A release runner runs Python workflow releases deployed to a cluster. Start it with the Tilebox CLI:
+
+```bash
+tilebox runner start --cluster dev-cluster
+```
+
+The release runner can run releases from multiple workflows at the same time, however only one release per workflow. It continously polls the selected cluster for deployment updates, downloads missing release artifacts, validates and starts python processes for each workflow release, and requests work for all the task identifiers from it's deployed releases. When a new release is deployed or removed, the runner updates the task set it can execute.
+
+
+ Release runners currently only support Python workflow projects. The Tilebox CLI invokes the Python runner environment from the published release artifact using `uv`.
+
+
+### Direct runners
+
+A direct runner connects to the Tilebox API from your own code. It is useful when you want full control over the process, deployment environment, dependencies, startup behavior, and scaling. You are responsible for deploying the script or binary, keeping it running, rolling out code changes, and rolling back when needed.
+
+Define a `Runner` instance once and connect it to a `Client` during startup.
+
+
+```python Python
+from tilebox.workflows import Client, Runner
+from my_workflow.tasks import MyTask, OtherTask
+
+runner = Runner(tasks=[MyTask, OtherTask])
+
+if __name__ == "__main__":
+ client = Client()
+ runner.connect_to(client, cluster="dev-cluster").run_forever()
+```
+```go Go
+package main
+
+import (
+ "context"
+ "log/slog"
+
+ "github.com/tilebox/tilebox-go/workflows/v1"
+ "github.com/tilebox/tilebox-go/workflows/v1/runner"
+ "github.com/my_org/myworkflow"
+)
+
+func main() {
+ ctx := context.Background()
+ client := workflows.NewClient()
+
+ workflowRunner, err := client.NewTaskRunner(ctx, runner.WithClusterSlug("dev-cluster"))
+ if err != nil {
+ slog.Error("failed to create runner", slog.Any("error", err))
+ return
+ }
+
+ if err := workflowRunner.RegisterTasks(&myworkflow.MyTask{}, &myworkflow.OtherTask{}); err != nil {
+ slog.Error("failed to register tasks", slog.Any("error", err))
+ return
+ }
+
+ workflowRunner.RunForever(ctx)
+}
+```
+
+
+## Task selection
+
+For a runner to pick up a submitted task, all of these conditions must match:
+
+1. The task was submitted to the same [cluster](/workflows/concepts/clusters) as the runner.
+2. The runner advertises a [task identifier with the same name](/workflows/concepts/tasks#task-identifiers) and a [compatible version](/workflows/concepts/tasks#semantic-versioning).
+3. The task must be in `QUEUED` [state](/workflows/concepts/tasks#task-states), its [dependencies](/workflows/concepts/tasks#dependencies) are met and it's [maximum retries](/workflows/concepts/tasks#retry-handling) aren't exhausted.
+
+Release runners advertise the task identifiers from workflow releases currently deployed to the cluster. Direct runners advertise the task identifiers they register in the running process.
+
+
+ If multiple tasks match those conditions, Tilebox picks one and assigns it to a runner. The remaining tasks stay queued until another matching runner is available. Parallel runner processes can speed up the job execution in such cases.
+
+
+## Parallelism
+
+Start multiple runner processes to execute tasks in parallel. Each runner process claims and executes tasks independently. You can run several release runners, several direct runners, or a mix of both in the same cluster. This allows for high parallelism and can be used to scale the execution of tasks to handle large workloads.
+
+To test this, run multiple instances of the runner script in different terminal windows on your local machine, or use the [CLI](/agentic-development/tilebox-cli) built-in `parallel` subcommand to start multiple runners in parallel.
+
+```bash
+# start multiple release runners in parallel
+> tilebox parallel -n 5 -- tilebox runner start --cluster
+
+# or direct runner mode
+> tilebox parallel -n 5 -- python your_direct_runner.py
+```
+
+## Scaling
+
+One key benefit of this runner architecture is the **ability to scale even while workflows are executing**. You can start new runners at any time, and they can immediately pick up queued tasks to execute. It's not necessary to have an entire processing cluster available at the start of a workflow, as additional runners can be started and stopped as needed.
+
+This is particularly beneficial in cloud environments, where runners can be automatically started and stopped based on current workload, measured by metrics such as CPU usage. Here's an example scenario:
+
+1. A single runner process is actively waiting for work in a cloud environment.
+2. A large workload is submitted to the workflow orchestrator, resulting in the runner picking up the first task.
+3. The first task creates new sub-tasks for processing, which the runner also picks up.
+4. As the workload increases, the runner's CPU usage rises, triggering the cloud environment to automatically start up new runner instances.
+5. Newly started runners begin executing queued tasks, distributing the workload among all available runners.
+6. Once the workload decreases, the cloud environment automatically stops some runners.
+7. The remaining work continues while runner instances are scaled back down, until everything is done.
+8. Only a single runner remains idle until new tasks arrive.
+
+CPU usage-based auto scaling is just one method to scale runners. Other metrics, such as memory usage or network bandwidth, are also supported by many cloud environments.
+
+In a future release, configuration options for scaling runners based on custom metrics (for example the number of queued tasks) are planned.
+
+## Distributed Execution
+
+Runners can be distributed across different compute environments. For instance, some data stored on-premise may need pre-processing, while further processing occurs in the cloud. A job might involve tasks that filter relevant on-premise data and publish it to the cloud, and other tasks that read data from the cloud and process it. In such scenarios, one runner can run on-premise and another in a cloud environment, resulting in them effectively collaborating on the same job.
+
+Another advantage of distributed runners is executing workflows that require specific hardware for certain tasks. For example, one task might need a GPU, while another requires extensive memory.
+
+Here's an example of a distributed workflow:
+
+
+ ```python Python
+ from tilebox.workflows import Task, ExecutionContext
+
+ class DistributedWorkflow(Task):
+ def execute(self, context: ExecutionContext) -> None:
+ download_task = context.submit_subtask(DownloadData())
+ process_task = context.submit_subtask(
+ ProcessData(),
+ depends_on=[download_task],
+ )
+
+ class DownloadData(Task):
+ """
+ Download a dataset and store it in a shared internal bucket.
+ Requires a good network connection for high download bandwidth.
+ """
+ def execute(self, context: ExecutionContext) -> None:
+ pass
+
+ class ProcessData(Task):
+ """
+ Perform compute-intensive processing of a dataset.
+ The dataset must be available in an internal bucket.
+ Requires access to a GPU for optimal performance.
+ """
+ def execute(self, context: ExecutionContext) -> None:
+ pass
+ ```
+```go Go
+package distributed
+
+import (
+ "context"
+ "fmt"
+ "github.com/tilebox/tilebox-go/workflows/v1"
+ "github.com/tilebox/tilebox-go/workflows/v1/subtask"
+)
+
+type DistributedWorkflow struct{}
+
+func (t *DistributedWorkflow) Execute(ctx context.Context) error {
+ downloadTask, err := workflows.SubmitSubtask(ctx, &DownloadData{})
+ if err != nil {
+ return fmt.Errorf("failed to submit download subtask: %w", err)
+ }
+
+ _, err = workflows.SubmitSubtask(ctx, &ProcessData{}, subtask.WithDependencies(downloadTask))
+ if err != nil {
+ return fmt.Errorf("failed to submit process subtask: %w", err)
+ }
+ return nil
+}
+
+// DownloadData Download a dataset and store it in a shared internal bucket.
+// Requires a good network connection for high download bandwidth.
+type DownloadData struct{}
+
+func (t *DownloadData) Execute(ctx context.Context) error {
+ return nil
+}
+
+// ProcessData Perform compute-intensive processing of a dataset.
+// The dataset must be available in an internal bucket.
+// Requires access to a GPU for optimal performance.
+type ProcessData struct{}
+
+func (t *ProcessData) Execute(ctx context.Context) error {
+ return nil
+}
+```
+
+
+To achieve distributed execution for this workflow, no single runner capable of executing all three of the tasks is set up.
+Instead, two runners, each capable of executing one of the tasks, are set up: one in a high-speed network environment and the other with GPU access.
+When the distributed workflow runs, the first runner picks up the `DownloadData` task, while the second picks up the `ProcessData` task.
+The `DistributedWorkflow` does not require specific hardware, so it can be registered with both runners and executed by either one.
+
+
+
+
+```python Python
+from tilebox.workflows import Client
+
+client = Client()
+high_network_speed_runner = client.runner(
+ tasks=[DownloadData, DistributedWorkflow]
+)
+high_network_speed_runner.run_forever()
+```
+```go Go
+package main
+
+import (
+ "context"
+ "log/slog"
+
+ "github.com/tilebox/tilebox-go/workflows/v1"
+)
+
+func main() {
+ ctx := context.Background()
+ client := workflows.NewClient()
+
+ highNetworkSpeedRunner, err := client.NewTaskRunner(ctx)
+ if err != nil {
+ slog.Error("failed to create runner", slog.Any("error", err))
+ return
+ }
+
+ err = highNetworkSpeedRunner.RegisterTasks(
+ &DownloadData{},
+ &DistributedWorkflow{},
+ )
+ if err != nil {
+ slog.Error("failed to register tasks", slog.Any("error", err))
+ return
+ }
+
+ highNetworkSpeedRunner.RunForever(ctx)
+}
+```
+
+
+
+
+
+```python Python
+from tilebox.workflows import Client
+
+client = Client()
+gpu_runner = client.runner(
+ tasks=[ProcessData, DistributedWorkflow]
+)
+gpu_runner.run_forever()
+```
+```go Go
+package main
+
+import (
+ "context"
+ "log/slog"
+
+ "github.com/tilebox/tilebox-go/workflows/v1"
+)
+
+func main() {
+ ctx := context.Background()
+ client := workflows.NewClient()
+
+ gpuRunner, err := client.NewTaskRunner(ctx)
+ if err != nil {
+ slog.Error("failed to create runner", slog.Any("error", err))
+ return
+ }
+
+ err = gpuRunner.RegisterTasks(
+ &ProcessData{},
+ &DistributedWorkflow{},
+ )
+ if err != nil {
+ slog.Error("failed to register tasks", slog.Any("error", err))
+ return
+ }
+
+ gpuRunner.RunForever(ctx)
+}
+```
+
+
+
+
+Now, both `download_runner.py` and `gpu_runner.py` are started, in parallel, on different machines with the required hardware for each. When `DistributedWorkflow` is submitted, it executes on one of the two runners, and it's submitted sub-tasks are handled by the appropriate runner.
+
+In this case, since `ProcessData` depends on `DownloadData`, the GPU runner remains idle until the download completion, then picks up the processing task.
+
+
+ You can also differentiate between runners by specifying different [clusters](/workflows/concepts/clusters) and choosing specific clusters for sub-task submissions. For more details, see the [Clusters](/workflows/concepts/clusters) section.
+
+
+## Task Failures
+
+If an unhandled exception occurs during task execution, the runner captures it and reports it back to the workflow orchestrator. The orchestrator then marks the task as failed, leading to [job cancellation](/workflows/concepts/jobs#cancellation) to prevent further tasks of the same job-that may not be relevant anymore-from being executed.
+
+A task failure does not result in losing all previous work done by the job. If the failure is fixable—by fixing a bug in a task implementation, ensuring the task has necessary resources, or simply retrying it due to a flaky network connection—it may be worth [retrying](/workflows/concepts/jobs#retries) the job.
+
+When retrying a job, all failed tasks are added back to the queue, allowing a runner to potentially execute them. If execution then succeeds, the job continues smoothly. Otherwise, the task will remain marked as failed and can be retried again if desired.
+
+For a release runner, publish a compatible fixed release and deploy it to the same cluster before retrying. For a direct runner, deploy the fixed script or binary before retrying. Keep task identifiers and input schemas compatible when you want an existing failed job to resume from the point of failure.
+
+## Task idempotency
+
+Since a task may be retried, it's possible that a task is executed more than once. Depending on where in the execution of the task it failed, it may have already performed some side effects, such as writing to a database, or sending a message to a queue. Because of that it's crucial to ensure that tasks are [idempotent](https://en.wikipedia.org/wiki/Idempotence). Idempotent tasks can be executed multiple times without altering the outcome beyond the first successful execution.
+
+A special case of idempotency involves submitting sub-tasks. After a task calls `context.submit_subtask` and then fails and is retried, those submitted sub-tasks of an earlier failed execution are automatically removed, ensuring that they can be safely submitted again when the task is retried.
+
+## Runner Crashes
+
+Tilebox Workflows has an internal mechanism to handle unexpected runner crashes. When a runner picks up a task, it periodically sends a heartbeat to the workflow orchestrator. If the orchestrator does not receive this heartbeat for a defined duration, it marks the task as failed and automatically attempts to [retry](/workflows/concepts/jobs#retries) it up to 10 times. This allows another runner to pick up the task and continue executing the job.
+
+This mechanism ensures that scenarios such as power outages, hardware failures, or dropped network connections are handled effectively, preventing any task from remaining in a running state indefinitely.
+
+## Observability
+
+Tilebox captures logs, spans, task states, and runner context from both runner modes. Use [Workflow observability](/workflows/observability/introduction) to inspect job execution, task failures, and runner behavior.
diff --git a/workflows/concepts/task-runners.mdx b/workflows/concepts/task-runners.mdx
deleted file mode 100644
index fe9a6af..0000000
--- a/workflows/concepts/task-runners.mdx
+++ /dev/null
@@ -1,369 +0,0 @@
----
-title: Task Runners
-icon: list-check
----
-
-
- Task runners are the execution agents within the Tilebox Workflows ecosystem that execute tasks. They can be deployed in different computing environments, including on-premise servers and cloud-based auto-scaling clusters. Task runners execute tasks as scheduled by the workflow orchestrator, ensuring they have the necessary resources and environment for effective execution.
-
-
-## Implementing a Task Runner
-
-A task runner is a continuously running process that listens for new tasks to execute. You can start multiple task runner processes to execute tasks concurrently. When a task runner receives a task, it executes it and reports the results back to the workflow orchestrator. The task runner also handles any errors that occur during task execution, reporting them to the orchestrator as well.
-
-To execute a task, at least one task runner must be running and available. If no task runners are available, tasks will remain queued until one becomes available.
-
-To create and start a task runner, follow these steps:
-
-
-
- Instantiate a client connected to the Tilebox Workflows API.
-
-
- Select or create a [cluster](/workflows/concepts/clusters) and specify its slug when creating a task runner.
- If no cluster is specified, the task runner will use the default cluster.
-
-
- Register tasks by specifying the task classes that the task runner can execute as a list to the `runner` method.
-
-
- Call the `run_forever` method of the task runner to listen for new tasks until the task runner process is shut down.
-
-
-
-Here is a simple example demonstrating these steps:
-
-
-```python Python
-from tilebox.workflows import Client
-# your own workflow:
-from my_workflow import MyTask, OtherTask
-
-def main():
- client = Client() # 1. connect to the Tilebox Workflows API
- runner = client.runner(
- cluster= "dev-cluster" # 2. select a cluster to join (optional, omit to use the default cluster)
- tasks=[MyTask, OtherTask] # 3. register tasks
- )
- runner.run_forever() # 4. listen for new tasks to execute
-
-if __name__ == "__main__":
- main()
-```
-```go Go
-package main
-
-import (
- "context"
- "log/slog"
-
- "github.com/tilebox/tilebox-go/workflows/v1"
- "github.com/tilebox/tilebox-go/workflows/v1/runner"
- // your own workflow:
- "github.com/my_org/myworkflow"
-)
-
-func main() {
- ctx := context.Background()
-
- // 1. connect to the Tilebox Workflows API
- client := workflows.NewClient()
-
- // 2. select a cluster to join (optional, omit to use the default cluster)
- runner, err := client.NewTaskRunner(ctx, runner.WithClusterSlug("dev-cluster"))
- if err != nil {
- slog.Error("failed to create task runner", slog.Any("error", err))
- return
- }
-
- // 3. register tasks
- err = runner.RegisterTasks(
- &myworkflow.MyTask{},
- &myworkflow.OtherTask{},
- )
- if err != nil {
- slog.Error("failed to register task", slog.Any("error", err))
- return
- }
-
- // 4. listen for new tasks to execute
- runner.Run(ctx)
-}
-```
-
-
-To start the task runner locally, run it as a script:
-
-
-```bash Python
-> python task_runner.py
-```
-```bash Go
-> go run .
-```
-
-
-## Task Selection
-
-For a task runner to pick up a submitted task, the following conditions must be met:
-
-1. The [cluster](/workflows/concepts/clusters) where the task was submitted must match the task runner's cluster.
-2. The task runner must have a registered task that matches the [task identifier](/workflows/concepts/tasks#task-identifiers) of the submitted task.
-3. The version of the task runner's registered task must be [compatible](/workflows/concepts/tasks#semantic-versioning) with the submitted task's version.
-
-If a task meets these conditions, the task runner executes it. Otherwise, the task runner remains idle until a matching task is available.
-
-
- Often, multiple submitted tasks match the conditions for execution. In that case, the task runner selects one of the tasks to execute, and the remaining tasks stay in a queue until the selected task is completed or another task runner becomes available.
-
-
-## Parallelism
-
-You can start multiple task runner instances in parallel to execute tasks concurrently. Each task runner listens for new tasks and executes them as they become available. This allows for high parallelism and can be used to scale the execution of tasks to handle large workloads.
-
-To test this, run multiple instances of the task runner script in different terminal windows on your local machine, or use a tool like [call-in-parallel](https://github.com/tilebox/call-in-parallel) to start the task runner script multiple times.
-
-For example, to start five task runners in parallel, use the following command:
-
-```bash
-> call-in-parallel -n 5 -- python task_runner.py
-```
-
-## Deploying Task Runners
-
-Task runners are continuously running processes that can be deployed in different computing environments. The only requirement for deploying task runners is access to the Tilebox Workflows API. Once this is met, task runners can be deployed in many different environments, including:
-
-- On-premise servers
-- Cloud-based virtual machines
-- Cloud-based auto-scaling clusters
-
-## Scaling
-
-One key benefit of task runners is their **ability to scale even while workflows are executing**. You can start new task runners at any time, and they can immediately pick up queued tasks to execute. It's not necessary to have an entire processing cluster available at the start of a workflow, as task runners can be started and stopped as needed.
-
-This is particularly beneficial in cloud environments, where task runners can be automatically started and stopped based on current workload, measured by metrics such as CPU usage. Here's an example scenario:
-
-1. A single instance of a task runner is actively waiting for work in a cloud environment.
-2. A large workload is submitted to the workflow orchestrator, prompting the task runner to pick up the first task.
-3. The first task creates new sub-tasks for processing, which the task runner also picks up.
-4. As the workload increases, the task runner's CPU usage rises, triggering the cloud environment to automatically start new task runner instances.
-5. Newly started task runners begin executing queued tasks, distributing the workload among all available task runners.
-6. Once the workload decreases, the cloud environment automatically stops some task runners.
-7. The first task runner completes the remaining work until everything is done.
-8. The first task runner remains idle until new tasks arrive.
-
-CPU usage-based auto scaling is just one method to scale task runners. Other metrics, such as memory usage or network bandwidth, are also supported by many cloud environments. In a future release, configuration options for scaling task runners based on custom metrics (for example the number of queued tasks) are planned.
-
-## Distributed Execution
-
-Task runners can be distributed across different compute environments. For instance, some data stored on-premise may need pre-processing, while further processing occurs in the cloud. A job might involve tasks that filter relevant on-premise data and publish it to the cloud, and other tasks that read data from the cloud and process it. In such scenarios, a task runners can run on-premise and another in a cloud environments, resulting in them effectively collaborating on the same job.
-
-Another advantage of distributed task runners is executing workflows that require specific hardware for certain tasks. For example, one task might need a GPU, while another requires extensive memory.
-
-Here's an example of a distributed workflow:
-
-
- ```python Python
- from tilebox.workflows import Task, ExecutionContext
-
- class DistributedWorkflow(Task):
- def execute(self, context: ExecutionContext) -> None:
- download_task = context.submit_subtask(DownloadData())
- process_task = context.submit_subtask(
- ProcessData(),
- depends_on=[download_task],
- )
-
- class DownloadData(Task):
- """
- Download a dataset and store it in a shared internal bucket.
- Requires a good network connection for high download bandwidth.
- """
- def execute(self, context: ExecutionContext) -> None:
- pass
-
- class ProcessData(Task):
- """
- Perform compute-intensive processing of a dataset.
- The dataset must be available in an internal bucket.
- Requires access to a GPU for optimal performance.
- """
- def execute(self, context: ExecutionContext) -> None:
- pass
- ```
-```go Go
-package distributed
-
-import (
- "context"
- "fmt"
- "github.com/tilebox/tilebox-go/workflows/v1"
- "github.com/tilebox/tilebox-go/workflows/v1/subtask"
-)
-
-type DistributedWorkflow struct{}
-
-func (t *DistributedWorkflow) Execute(ctx context.Context) error {
- downloadTask, err := workflows.SubmitSubtask(ctx, &DownloadData{})
- if err != nil {
- return fmt.Errorf("failed to submit download subtask: %w", err)
- }
-
- _, err = workflows.SubmitSubtask(ctx, &ProcessData{}, subtask.WithDependencies(downloadTask))
- if err != nil {
- return fmt.Errorf("failed to submit process subtask: %w", err)
- }
- return nil
-}
-
-// DownloadData Download a dataset and store it in a shared internal bucket.
-// Requires a good network connection for high download bandwidth.
-type DownloadData struct{}
-
-func (t *DownloadData) Execute(ctx context.Context) error {
- return nil
-}
-
-// ProcessData Perform compute-intensive processing of a dataset.
-// The dataset must be available in an internal bucket.
-// Requires access to a GPU for optimal performance.
-type ProcessData struct{}
-
-func (t *ProcessData) Execute(ctx context.Context) error {
- return nil
-}
-```
-
-
-To achieve distributed execution for this workflow, no single task runner capable of executing all three of the tasks is set up.
-Instead, two task runners, each capable of executing one of the tasks are set up: one in a high-speed network environment and the other with GPU access.
-When the distributed workflow runs, the first task runner picks up the `DownloadData` task, while the second picks up the `ProcessData` task.
-The `DistributedWorkflow` does not require specific hardware, so it can be registered with both runners and executed by either one.
-
-
-
-
-```python Python
-from tilebox.workflows import Client
-
-client = Client()
-high_network_speed_runner = client.runner(
- tasks=[DownloadData, DistributedWorkflow]
-)
-high_network_speed_runner.run_forever()
-```
-```go Go
-package main
-
-import (
- "context"
- "log/slog"
-
- "github.com/tilebox/tilebox-go/workflows/v1"
-)
-
-func main() {
- ctx := context.Background()
- client := workflows.NewClient()
-
- highNetworkSpeedRunner, err := client.NewTaskRunner(ctx)
- if err != nil {
- slog.Error("failed to create task runner", slog.Any("error", err))
- return
- }
-
- err = highNetworkSpeedRunner.RegisterTasks(
- &DownloadData{},
- &DistributedWorkflow{},
- )
- if err != nil {
- slog.Error("failed to register tasks", slog.Any("error", err))
- return
- }
-
- highNetworkSpeedRunner.RunForever(ctx)
-}
-```
-
-
-
-
-
-```python Python
-from tilebox.workflows import Client
-
-client = Client()
-gpu_runner = client.runner(
- tasks=[ProcessData, DistributedWorkflow]
-)
-gpu_runner.run_forever()
-```
-```go Go
-package main
-
-import (
- "context"
- "log/slog"
-
- "github.com/tilebox/tilebox-go/workflows/v1"
-)
-
-func main() {
- ctx := context.Background()
- client := workflows.NewClient()
-
- gpuRunner, err := client.NewTaskRunner(ctx)
- if err != nil {
- slog.Error("failed to create task runner", slog.Any("error", err))
- return
- }
-
- err = gpuRunner.RegisterTasks(
- &ProcessData{},
- &DistributedWorkflow{},
- )
- if err != nil {
- slog.Error("failed to register tasks", slog.Any("error", err))
- return
- }
-
- gpuRunner.RunForever(ctx)
-}
-```
-
-
-
-
-Now, both `download_task_runner.py` and `gpu_task_runner.py` are started, in parallel, on different machines with the required hardware for each. When `DistributedWorkflow` is submitted, it executes on one of the two runners, and it's submitted sub-tasks are handled by the appropriate runner.
-
-In this case, since `ProcessData` depends on `DownloadData`, the GPU task runner remains idle until the download completion, then picks up the processing task.
-
-
- You can also differentiate between task runners by specifying different [clusters](/workflows/concepts/clusters) and choosing specific clusters for sub-task submissions. For more details, see the [Clusters](/workflows/concepts/clusters) section.
-
-
-## Task Failures
-
-If an unhandled exception occurs during task execution, the task runner captures it and reports it back to the workflow orchestrator. The orchestrator then marks the task as failed, leading to [job cancellation](/workflows/concepts/jobs#cancellation) to prevent further tasks of the same job-that may not be relevant anymore-from being executed.
-
-A task failure does not result in losing all previous work done by the job. If the failure is fixable—by fixing a bug in a task implementation, ensuring the task has necessary resources, or simply retrying it due to a flaky network connection—it may be worth [retrying](/workflows/concepts/jobs#retries) the job.
-
-When retrying a job, all failed tasks are added back to the queue, allowing a task runner to potentially execute them. If execution then succeeds, the job continues smoothly. Otherwise, the task will remain marked as failed and can be retried again if desired.
-
-If fixing a failure requires modifying the task implementation, it's important to deploy the updated version to the [task runners](/workflows/concepts/task-runners) before retrying the job. Otherwise, a task runner could pick up the original, faulty implementation again, leading to another failure.
-
-## Task idempotency
-
-Since a task may be retried, it's possible that a task is executed more than once. Depending on where in the execution of the task it failed, it may have already performed some side effects, such as writing to a database, or sending a message to a queue. Because of that it's crucial to ensure that tasks are [idempotent](https://en.wikipedia.org/wiki/Idempotence). Idempotent tasks can be executed multiple times without altering the outcome beyond the first successful execution.
-
-A special case of idempotency involves submitting sub-tasks. After a task calls `context.submit_subtask` and then fails and is retried, those submitted sub-tasks of an earlier failed execution are automatically removed, ensuring that they can be safely submitted again when the task is retried.
-
-## Runner Crashes
-
-Tilebox Workflows has an internal mechanism to handle unexpected task runner crashes. When a task runner picks up a task, it periodically sends a heartbeat to the workflow orchestrator. If the orchestrator does not receive this heartbeat for a defined duration, it marks the task as failed and automatically attempts to [retry](/workflows/concepts/jobs#retries) it up to 10 times. This allows another task runner to pick up the task and continue executing the job.
-
-This mechanism ensures that scenarios such as power outages, hardware failures, or dropped network connections are handled effectively, preventing any task from remaining in a running state indefinitely.
-
-## Observability
-
-Task runners are continuously running processes, making it essential to monitor their health and performance. Tilebox Workflows collects logs and traces from task runners automatically. To learn how to inspect and customize workflow observability, see [Observability](/workflows/observability/introduction).
diff --git a/workflows/concepts/tasks.mdx b/workflows/concepts/tasks.mdx
index 7c4eb6c..8fbc531 100644
--- a/workflows/concepts/tasks.mdx
+++ b/workflows/concepts/tasks.mdx
@@ -1,12 +1,13 @@
---
title: Understanding and Creating Tasks
sidebarTitle: Tasks
+description: Define workflow tasks, inputs, subtasks, dependencies, retries, and stable task identifiers.
icon: laptop-code
---
-
- A Task is the smallest unit of work, designed to perform a specific operation. Each task represents a distinct operation or process that can be executed, such as processing data, performing calculations, or managing resources. Tasks can operate independently or as components of a more complex set of connected tasks known as a Workflow. Tasks are defined by their code, inputs, and dependencies on other tasks. To create tasks, you need to define the input parameters and specify the action to be performed during execution.
-
+A task is the unit of work Tilebox [runners](/workflows/concepts/runners) execute. A task class defines the code to run, the input fields that are serialized with each task submission, and optional relationships to other tasks through subtasks and dependencies.
+
+Tasks can run as the root task of a [job](/workflows/concepts/jobs) or as subtasks submitted by another task. This lets a workflow build a dynamic task graph while Tilebox schedules eligible tasks across runners in the selected [cluster](/workflows/concepts/clusters).
## Creating a Task
@@ -39,7 +40,7 @@ For python, the key components of this task are:
`MyFirstTask` is a subclass of the `Task` class, which serves as the base class for all defined tasks. It provides the essential structure for a task. Inheriting from `Task` automatically makes the class a `dataclass`, which is useful [for specifying inputs](#input-parameters). Additionally, by inheriting from `Task`, the task is automatically assigned an [identifier based on the class name](#task-identifiers).
- The `execute` method is the entry point for executing the task. This is where the task's logic is defined. It's invoked by a [task runner](/workflows/concepts/task-runners) when the task runs and performs the task's operation.
+ The `execute` method is the entry point for executing the task. This is where the task's logic is defined. It's invoked by a [runner](/workflows/concepts/runners) when the task runs and performs the task's operation.
The `context` argument is an `ExecutionContext` instance that provides access to an [API for submitting new tasks](/api-reference/python/tilebox.workflows/ExecutionContext.submit_subtask) as part of the same job, [task logging](/api-reference/python/tilebox.workflows/ExecutionContext.logger), [custom tracing](/api-reference/python/tilebox.workflows/ExecutionContext.tracer), and features like [shared caching](/api-reference/python/tilebox.workflows/ExecutionContext.job_cache).
@@ -53,13 +54,13 @@ For Go, the key components are:
`MyFirstTask` is a struct that implements the `Task` interface. It represents the task to be executed.
- The `Execute` method is the entry point for executing the task. This is where the task's logic is defined. It's invoked by a [task runner](/workflows/concepts/task-runners) when the task runs and performs the task's operation.
+ The `Execute` method is the entry point for executing the task. This is where the task's logic is defined. It's invoked by a [runner](/workflows/concepts/runners) when the task runs and performs the task's operation.
The code samples on this page do not illustrate how to execute the task. That will be covered in the
- [next section on task runners](/workflows/concepts/task-runners). The reason for that is that executing tasks is a separate concern from implementing tasks.
+ [next section on runners](/workflows/concepts/runners). The reason for that is that executing tasks is a separate concern from implementing tasks.
## Input Parameters
@@ -67,7 +68,7 @@ For Go, the key components are:
Tasks often require input parameters to operate. These inputs can range from simple values to complex data structures. By inheriting from the `Task` class, the task is treated as a Python `dataclass`, allowing input parameters to be defined as class attributes.
- Tasks must be **serializable to JSON or to protobuf** because they may be distributed across a cluster of [task runners](/workflows/concepts/task-runners).
+ Tasks must be **serializable to JSON or to protobuf** because they may be distributed across a cluster of [runners](/workflows/concepts/runners).
@@ -131,7 +132,7 @@ class ChildTask(Task):
def execute(self, context: ExecutionContext) -> None:
context.logger.info("Executing child task", index=self.index)
-# after submitting this task, a task runner may pick it up and execute it
+# after submitting this task, a runner may pick it up and execute it
# which will result in 5 ChildTasks being submitted and executed as well
task = ParentTask(5)
```
@@ -161,7 +162,7 @@ func (t *ChildTask) Execute(context.Context) error {
return nil
}
-// after submitting this task, a task runner may pick it up and execute it
+// after submitting this task, a runner may pick it up and execute it
// which will result in 5 ChildTasks being submitted and executed as well
task := &ParentTask{numSubtasks: 5}
```
@@ -172,7 +173,7 @@ In this example, a `ParentTask` submits `ChildTask` tasks as subtasks. The numbe
Parent task do not have access to results of subtasks, instead, tasks can use [shared caching](/workflows/caches#storing-and-retrieving-data) to share data between tasks.
- By submitting a task as a subtask, its execution is scheduled as part of the same job as the parent task. Compared to just directly invoking the subtask's `execute` method, this allows the subtask's execution to occur on a different machine or in parallel with other subtasks. To learn more about how tasks are executed, see the section on [task runners](/workflows/concepts/task-runners).
+ By submitting a task as a subtask, its execution is scheduled as part of the same job as the parent task. Compared to just directly invoking the subtask's `execute` method, this allows the subtask's execution to occur on a different machine or in parallel with other subtasks. To learn more about how tasks are executed, see the section on [runners](/workflows/concepts/runners).
### Larger subtasks example
@@ -311,7 +312,7 @@ job = jobs.submit(
DownloadRandomDogImages(5),
)
-# now our deployed task runners will pick up the task and execute it
+# now our deployed runners will pick up the task and execute it
jobs.display(job)
```
@@ -331,7 +332,7 @@ if err != nil {
return
}
-// now our deployed task runners will pick up the task and execute it
+// now our deployed runners will pick up the task and execute it
```
@@ -348,7 +349,7 @@ if err != nil {
/>
-In total, six tasks are executed: the `DownloadRandomDogImages` task and five `DownloadImage` tasks. The `DownloadImage` tasks can execute in parallel, as they are independent. If more than one task runner is available, the Tilebox Workflow Orchestrator **automatically parallelizes** the execution of these tasks.
+In total, six tasks are executed: the `DownloadRandomDogImages` task and five `DownloadImage` tasks. The `DownloadImage` tasks can execute in parallel, as they are independent. If more than one runner is available, the Tilebox Workflow Orchestrator **automatically parallelizes** the execution of these tasks.
Check out [job_client.display](/workflows/concepts/jobs#visualization) to learn how this visualization was automatically generated from the task executions.
@@ -358,7 +359,7 @@ In total, six tasks are executed: the `DownloadRandomDogImages` task and five `D
Every task goes through a set of states during its lifetime.
-- When submitted, either as a job or as a subtask, it starts in the `QUEUED` state and transitions to `RUNNING` when a task runner picks it up.
+- When submitted, either as a job or as a subtask, it starts in the `QUEUED` state and transitions to `RUNNING` when a runner picks it up.
- If the task executes successfully, it transitions to `COMPUTED`.
- If the task fails, it transitions to `FAILED`, unless it's an [optional task](#optional-tasks), or nested within an [optional task](#nested-optional-tasks), in which case it transitions to `FAILED_OPTIONAL`.
- As soon as all subtasks of a task are `COMPUTED` (or `FAILED_OPTIONAL`), the task is considered `COMPLETED`, allowing dependent tasks to be executed.
@@ -367,8 +368,8 @@ The table below summarizes the different task states and their meanings.
| Task State | Description |
|------------|-------------|
-| **Queued** | The task is queued and waiting for execution. Any [eligible](/workflows/concepts/task-runners#task-selection) task runner can pick it up and execute it, as soon as it's parent task is `COMPUTED` and all it's dependencies are `COMPLETED`. |
-| **Running** | The task is currently being executed by a task runner. |
+| **Queued** | The task is queued and waiting for execution. Any [eligible](/workflows/concepts/runners#task-selection) runner can pick it up and execute it, as soon as it's parent task is `COMPUTED` and all it's dependencies are `COMPLETED`. |
+| **Running** | The task is currently being executed by a runner. |
| **Computed** | The task has successfully been computed, but still has outstanding subtasks. |
| **Completed** | The task has successfully been computed, and all it's subtasks are also computed, making it `COMPLETED`. This is the final state of a task. Only once a task has been `COMPLETED`, dependent tasks can be executed. |
| **Failed** | The task has been executed but encountered an error. |
@@ -429,7 +430,7 @@ class Sum(Task): # The reduce step
```
-Submitting a job of the `SumOfSquares` task and running it with a task runner can be done as follows:
+Submitting a job of the `SumOfSquares` task and running it with a runner can be done as follows:
```python Python
@@ -978,7 +979,7 @@ This workflow consists of four tasks:
| PrintHeadlines | FetchNews | A task that logs the headlines of the news articles. |
| MostFrequentAuthors | FetchNews | A task that counts the number of articles each author has written and logs the result. |
-An important aspect is that there is no dependency between the `PrintHeadlines` and `MostFrequentAuthors` tasks. This means they can execute in parallel, which the Tilebox Workflow Orchestrator will do, provided multiple task runners are available.
+An important aspect is that there is no dependency between the `PrintHeadlines` and `MostFrequentAuthors` tasks. This means they can execute in parallel, which the Tilebox Workflow Orchestrator will do, provided multiple runners are available.
In this example, the results from `FetchNews` are stored in a file. This is not the recommended method for passing data between tasks. When executing on a distributed cluster, the existence of a file written by a dependent task cannot be guaranteed. Instead, it's better to use a [shared cache](/workflows/caches).
@@ -1239,7 +1240,7 @@ If instead `Step1B` was also marked as optional, `Step1C` and `Step2` would stil
## Task Identifiers
-A task identifier is a unique string used by the Tilebox Workflow Orchestrator to identify the task. It's used by [task runners](/workflows/concepts/task-runners) to map submitted tasks to a task class and execute them. It also serves as the default name in execution visualizations.
+A task identifier is a unique string used by the Tilebox Workflow Orchestrator to identify the task. It's used by [runners](/workflows/concepts/runners) to map submitted tasks to a task class and execute them. It also serves as the default name in execution visualizations.
If unspecified, the identifier of a task defaults to the class name. For instance, the identifier of the `PrintHeadlines` task in the previous example is `"PrintHeadlines"`. This is good for prototyping, but not recommended for production, as changing the class name also changes the identifier, which can lead to issues during refactoring. It also prevents different tasks from sharing the same class name.
@@ -1323,17 +1324,17 @@ func (t *MyTask) Execute(context.Context) error {
```
-When a task is submitted as part of a job, the version from which it's submitted is recorded and may differ from the version on the task runner executing the task.
+When a task is submitted as part of a job, the version from which it's submitted is recorded and may differ from the version on the runner executing the task.
-When task runners execute a task, they require a registered task with a matching identifier and compatible version number. A compatible version is where the major version number on the task runner matches that of the submitted task, and the minor version number on the task runner is equal to or greater than that of the submitted task.
+When runners execute a task, they require a registered task with a matching identifier and compatible version number. A compatible version is where the major version number on the runner matches that of the submitted task, and the minor version number on the runner is equal to or greater than that of the submitted task.
Examples of compatible version numbers include:
- `MyTask` is submitted as part of a job. The version is `"v1.3"`.
-- A task runner with version `"v1.3"` of `MyTask` would executes this task.
-- A task runner with version `"v1.5"` of `MyTask` would also executes this task.
-- A task runner with version `"v1.2"` of `MyTask` would not execute this task, as its minor version is lower than that of the submitted task.
-- A task runner with version `"v2.5"` of `MyTask` would not execute this task, as its major version differs from that of the submitted task.
+- A runner with version `"v1.3"` of `MyTask` would execute this task.
+- A runner with version `"v1.5"` of `MyTask` would also execute this task.
+- A runner with version `"v1.2"` of `MyTask` would not execute this task, as its minor version is lower than that of the submitted task.
+- A runner with version `"v2.5"` of `MyTask` would not execute this task, as its major version differs from that of the submitted task.
## Conclusion
diff --git a/workflows/concepts/workflow-releases.mdx b/workflows/concepts/workflow-releases.mdx
new file mode 100644
index 0000000..a9976b3
--- /dev/null
+++ b/workflows/concepts/workflow-releases.mdx
@@ -0,0 +1,75 @@
+---
+title: Workflow Releases
+description: Understand workflow releases, release artifacts, and how release runners execute deployed Python workflow projects.
+icon: box-open
+---
+
+A workflow is a set of interrelated tasks. You can run those tasks directly without registering the workflow with Tilebox. Registering a workflow with the Tilebox API gives it a stable slug, which lets you publish immutable release artifacts to it and deploy a release to one or more clusters.
+
+That release path enables [release runners](/workflows/concepts/runners#release-runners). Release runners operate on a cluster, pick up all the releases deployed to that cluster, and execute tasks. This provides an easy way of deploying workflows to a compute cluster, including a quick and agent-accessible iteration loop: change code, publish a release, deploy it, run a job, and inspect the result.
+
+## Workflows and releases
+
+A workflow is the long-lived object referred to by slug. A release is one concrete version of that workflow. The release is immutable, so a later code change creates a new release instead of modifying the old one. You can deploy the same release to one or multiple clusters. Release runners on those clusters then pick up that release and run tasks registered by it.
+
+
+
+
+
+
+Use this model when you want reproducible workflow execution. You can inspect which release is deployed to a cluster, promote a known release to another cluster, or retry a failed job after deploying a compatible fix.
+
+## Release artifacts
+
+The release artifact is built from the files selected by `tilebox.workflow.toml`. The build command resolves include patterns, applies exclude patterns and `.gitignore` when enabled, creates a deterministic `.tar.zst` archive, and validates the runtime by discovering registered tasks.
+
+The artifact should contain code and small configuration. Keep downloaded data, model checkpoints, generated caches, and local virtual environments out of the release. If a workflow needs large runtime assets, fetch them lazily from the task code into a runner-local cache.
+
+## Task registrations
+
+Task registrations are discovered from the configured Python runner object or command during release validation. The discovered task identifiers are stored in the release content and later advertised by release runners.
+
+For a reusable Python workflow project, define a `Runner` object:
+
+```python Python
+# my_workflow/runner.py
+from tilebox.workflows import ExecutionContext, Runner, Task
+
+
+class FirstTask(Task):
+ def execute(self, context: ExecutionContext) -> None:
+ ...
+
+
+class SecondTask(Task):
+ def execute(self, context: ExecutionContext) -> None:
+ ...
+
+
+runner = Runner(tasks=[FirstTask, SecondTask])
+```
+
+Then point `tilebox.workflow.toml` at that object:
+
+```toml
+[workflow]
+slug = "my-workflow"
+root = "."
+runner = "my_workflow.runner:runner"
+```
+
+## Cluster deployments
+
+A cluster deployment maps a workflow release to a cluster. A release runner can run multiple deployed releases for the same cluster and updates its task registrations when cluster deployments change.
+
+Deploying, updating, or removing a release deployment changes what the release runner can execute. It does not require rebuilding the runner process itself.
+
+## Fixing failed jobs
+
+If a job fails because of a bug in task code, publish a compatible fixed release and deploy it to the same cluster before retrying the job. Keep the task identifier name, major version, and input schema compatible when you want the existing failed job to resume from failed tasks.
+
+```bash
+tilebox workflow publish-release --json
+tilebox workflow deploy-release --latest --cluster dev-cluster --json
+tilebox job retry --json
+```
diff --git a/workflows/introduction.mdx b/workflows/introduction.mdx
index 96b1575..46b1e30 100644
--- a/workflows/introduction.mdx
+++ b/workflows/introduction.mdx
@@ -1,31 +1,42 @@
---
title: Tilebox Workflows
sidebarTitle: Introduction
-description: The Tilebox workflow orchestrator is a parallel processing engine. It simplifies the creation of dynamic tasks that can be executed across various computing environments, including on-premise and auto-scaling clusters in public clouds.
+description: Run space data workflows across local, on-premises, and cloud environments with agent-friendly iteration loops.
icon: network-wired
mode: wide
---
-This section provides guides showcasing how to use the Tilebox workflow orchestrator effectively. Here are some of the key learning areas:
+Tilebox Workflows is a parallel processing engine for space data operations. It helps you turn processing steps such as fetching scenes, validating products, generating previews, running models, and publishing outputs into [tasks](/workflows/concepts/tasks) that can run across local machines, on-premises systems, and cloud environments.
+
+
+
+
+
+
+You submit a [job](/workflows/concepts/jobs) from workflow code, and Tilebox tracks the resulting task graph while [runners](/workflows/concepts/runners) execute eligible work on the selected [cluster](/workflows/concepts/clusters). During development, you can run tasks directly from your own script or service. For reproducible deployment, you can publish [workflow releases](/workflows/concepts/workflow-releases), deploy them to clusters, and let release runners execute the deployed code.
+
+This model also gives AI coding agents a practical iteration loop: edit workflow code, publish a release, deploy it to a development cluster, submit a test job, and inspect logs, traces, and job state before making the next change.
+
+## Get Started with Tilebox Workflows
-
- Create tasks using the Tilebox Workflow Orchestrator.
+
+ Define task classes, inputs, subtasks, dependencies, retries, and identifiers.
-
- Learn how to submit jobs to the workflow orchestrator, which schedules tasks for execution.
+
+ Submit a root task to a cluster and let runners execute the resulting task graph.
-
- Learn how to set up task runners to execute tasks in a distributed manner.
+
+ Learn how to set up a runner in order to execute tasks.
-
- Understand how to gain insights into task executions using observability features like tracing and logging.
+
+ Package workflow projects into immutable releases.
-
- Learn to configure shared data access for all tasks of a job using caches.
+
+ Map releases to clusters and run them with `tilebox runner start`.
-
- Trigger jobs based on events or schedules, such as new data availability or CRON schedules.
+
+ Use logs, traces, and job state to debug workflows across runners.
@@ -40,11 +51,14 @@ Before exploring Tilebox Workflows in depth, familiarize yourself with some comm
A job is a specific execution of a workflow with designated input parameters. It consists of one or more tasks that can run in parallel or sequentially, based on their dependencies. Submitting a job involves creating a root task with specific input parameters, which may trigger the execution of other tasks within the same job.
-
- Task runners are the execution agents within the Tilebox Workflows ecosystem that execute tasks. They can be deployed in different computing environments, including on-premise servers and cloud-based auto-scaling clusters. Task runners execute tasks as scheduled by the workflow orchestrator, ensuring they have the necessary resources and environment for effective execution.
+
+ Runners are processes that execute workflow tasks for a cluster. Direct runners register task classes from a standalone script, service, or binary that connects to the Tilebox API through the SDK. Release runners are started by the Tilebox CLI and load task registrations from Python workflow releases deployed to their cluster.
+
+
+ A workflow release is an immutable package of a workflow project. It includes source files, the command or runner object used to start the workflow runtime, and the task identifiers discovered during release validation.
- Clusters are a logical grouping for task runners. Using clusters, you can scope certain tasks to a specific group of task runners. Tasks, which are always submitted to a specific cluster, are only executed on task runners assigned to the same cluster.
+ Clusters group runners and receive workflow release deployments. Jobs are submitted to a cluster, and only runners assigned to that cluster can claim the job's tasks. A release runner can run multiple releases that are deployed to its cluster.
Caches are shared storage that enable data storage and retrieval across tasks within a single job. They store intermediate results and share data among tasks, enabling distributed computing and reducing redundant data processing.
diff --git a/workflows/near-real-time/cron.mdx b/workflows/near-real-time/cron.mdx
index cc5c8fc..a827c67 100644
--- a/workflows/near-real-time/cron.mdx
+++ b/workflows/near-real-time/cron.mdx
@@ -57,10 +57,10 @@ cron_automation = automations.create_cron_automation(
A helpful tool to test your cron expressions is [crontab.guru](https://crontab.guru/).
-## Starting a Cron Task Runner
+## Starting a Cron runner
-With the Cron automation registered, a job is submitted whenever the Cron expression matches. But unless a [task runner](/workflows/concepts/task-runners) is available to execute the Cron task the submitted jobs remain in a task queue.
-Once an [eligible task runner](/workflows/concepts/task-runners#task-selection) becomes available, all jobs in the queue are executed.
+With the Cron automation registered, a job is submitted whenever the Cron expression matches. But unless a [runner](/workflows/concepts/runners) is available to execute the Cron task the submitted jobs remain in a task queue.
+Once an [eligible runner](/workflows/concepts/runners#task-selection) becomes available, all jobs in the queue are executed.
```python Python
from tilebox.workflows import Client
@@ -70,7 +70,7 @@ runner = client.runner(tasks=[MyCronTask])
runner.run_all()
```
-If this task runner runs continuously, its logs may resemble the following:
+If this runner runs continuously, its logs may resemble the following:
```plaintext Logs
Cron task triggered message=World trigger_time=2023-09-25 16:12:00
diff --git a/workflows/near-real-time/storage-events.mdx b/workflows/near-real-time/storage-events.mdx
index dbc182f..2cf2be2 100644
--- a/workflows/near-real-time/storage-events.mdx
+++ b/workflows/near-real-time/storage-events.mdx
@@ -98,7 +98,7 @@ local_object = local_folder.read("my-object.txt")
The `read` method instantiates a client for the specific storage location. This requires that
- the storage location is accessible by a task runner and may require credentials for cloud storage
+ the storage location is accessible by a runner and may require credentials for cloud storage
or physical/network access to a locally mounted file system.
@@ -142,10 +142,10 @@ Here are some examples of valid glob patterns:
| `folder/**` | Any file directly or recursively part of a `folder` subdirectory |
| `[a-z].txt`| Matches `a.txt`, `b.txt`, etc. |
-## Start a Storage Event Task Runner
+## Start a Storage Event runner
-With the Storage Event automation registered, a job is submitted whenever a storage event occurs. But unless a [task runner](/workflows/concepts/task-runners) is available to execute the Storage Event task the submitted jobs remain in a task queue.
-Once an [eligible task runner](/workflows/concepts/task-runners#task-selection) becomes available, all jobs in the queue are executed.
+With the Storage Event automation registered, a job is submitted whenever a storage event occurs. But unless a [runner](/workflows/concepts/runners) is available to execute the Storage Event task the submitted jobs remain in a task queue.
+Once an [eligible runner](/workflows/concepts/runners#task-selection) becomes available, all jobs in the queue are executed.
```python Python
from tilebox.workflows import Client
@@ -164,7 +164,7 @@ echo "Hello World" > my-object.txt
gcloud storage cp my-object.txt gs://gcs-bucket-fab3fa2
```
-Inspecting the task runner output reveals that the job was submitted and the task executed:
+Inspecting the runner output reveals that the job was submitted and the task executed:
```plaintext Output
2024-09-25 16:51:45,621 INFO A new object was created: my-object.txt
diff --git a/workflows/observability/integrations/axiom.mdx b/workflows/observability/integrations/axiom.mdx
index eea1e70..5651f0c 100644
--- a/workflows/observability/integrations/axiom.mdx
+++ b/workflows/observability/integrations/axiom.mdx
@@ -86,7 +86,7 @@ func main() {
client := workflows.NewClient()
runner, err := client.NewTaskRunner(ctx)
if err != nil {
- slog.ErrorContext(ctx, "failed to create task runner", slog.Any("error", err))
+ slog.ErrorContext(ctx, "failed to create runner", slog.Any("error", err))
return
}
diff --git a/workflows/observability/integrations/open-telemetry.mdx b/workflows/observability/integrations/open-telemetry.mdx
index 4fc3ab5..323716e 100644
--- a/workflows/observability/integrations/open-telemetry.mdx
+++ b/workflows/observability/integrations/open-telemetry.mdx
@@ -84,7 +84,7 @@ func main() {
client := workflows.NewClient()
runner, err := client.NewTaskRunner(ctx)
if err != nil {
- slog.ErrorContext(ctx, "failed to create task runner", slog.Any("error", err))
+ slog.ErrorContext(ctx, "failed to create runner", slog.Any("error", err))
return
}
diff --git a/workflows/observability/introduction.mdx b/workflows/observability/introduction.mdx
index a863b24..fcee3ea 100644
--- a/workflows/observability/introduction.mdx
+++ b/workflows/observability/introduction.mdx
@@ -5,7 +5,7 @@ description: Inspect workflow logs, traces, task status, and runner behavior.
icon: lightbulb
---
-Tilebox Workflows gives each job a live observability record. As task runners execute work, Tilebox captures logs, traces, task status, and runner context. You can follow a job from the root task through its subtasks, inspect failures, and compare slow steps across distributed runners.
+Tilebox Workflows gives each job a live observability record. As runners execute work, Tilebox captures logs, traces, task status, and runner context. You can follow a job from the root task through its subtasks, inspect failures, and compare slow steps across distributed runners.

@@ -45,7 +45,7 @@ Use the built-in view for day-to-day debugging and operations. Add structured lo
A submitted job starts a trace. Each task run creates a span, and custom spans sit under the task that creates them. Log records emitted from task code attach to active spans, which connects messages to timing data.
-Tilebox adds job, task, runner, and service metadata to telemetry records. This data helps you filter by job, inspect a single task run, or compare work across task runners.
+Tilebox adds job, task, runner, and service metadata to telemetry records. This data helps you filter by job, inspect a single task run, or compare work across runners.
## Observability example
diff --git a/workflows/observability/logging.mdx b/workflows/observability/logging.mdx
index afac772..ff96818 100644
--- a/workflows/observability/logging.mdx
+++ b/workflows/observability/logging.mdx
@@ -4,7 +4,7 @@ description: Emit structured task logs and tune how workflow clients export log
icon: rectangle-terminal
---
-Tilebox collects workflow logs automatically. When a task runner is created from a `Client`, logs emitted through the task execution context are exported to Tilebox and correlated with the active job, task, and trace.
+Tilebox collects workflow logs automatically. When a runner is created from a `Client`, logs emitted through the task execution context are exported to Tilebox and correlated with the active job, task, and trace.

@@ -104,7 +104,7 @@ func main() {
client := workflows.NewClient()
runner, err := client.NewTaskRunner(ctx)
if err != nil {
- slog.ErrorContext(ctx, "failed to create task runner", slog.Any("error", err))
+ slog.ErrorContext(ctx, "failed to create runner", slog.Any("error", err))
return
}
@@ -152,7 +152,7 @@ func main() {
```
-The Python `level` argument applies to logs emitted with `context.logger`. The optional `runner_level` argument applies to internal task runner logs. If `runner_level` is omitted, it uses the same value as `level`. In Go, `workflows.ConfigureConsoleLogging()` sets the local console log level, and `workflows.NewClient()` configures Tilebox workflow log export.
+The Python `level` argument applies to logs emitted with `context.logger`. The optional `runner_level` argument applies to internal runner logs. If `runner_level` is omitted, it uses the same value as `level`. In Go, `workflows.ConfigureConsoleLogging()` sets the local console log level, and `workflows.NewClient()` configures Tilebox workflow log export.
## Query logs
diff --git a/workflows/observability/tracing.mdx b/workflows/observability/tracing.mdx
index 7429ae4..3b07e08 100644
--- a/workflows/observability/tracing.mdx
+++ b/workflows/observability/tracing.mdx
@@ -4,7 +4,7 @@ description: Use built-in workflow traces and custom spans to inspect job execut
icon: chart-gantt
---
-Tilebox traces workflow jobs automatically. Job submission creates a root trace, task runners continue that trace across machines, and every task execution creates a span.
+Tilebox traces workflow jobs automatically. Job submission creates a root trace, runners continue that trace across machines, and every task execution creates a span.