BrachioLab/kai

kai

CLI for submitting and managing jobs on a KAI Scheduler cluster.

Getting started

Step 1 — Install kai

sh -c "$(curl -fsSL https://raw.githubusercontent.com/BrachioLab/kai/main/install.sh)"

You will be prompted for the configs repository and your lab namespace. The installer checks that your account exists in the configs repository before proceeding. If it does not, ask your lab manager to run kai add-user --name <you> first.

The installer also sets up automatic update checks on every login.

Then start a new shell (or run source ~/.bashrc / source ~/.zshrc) so kai is on your PATH.

Step 2 — Get your kubeconfig from the lab manager

Your lab manager will send you kai-kubeconfig-<you>.yaml via a secure channel (Slack DM, encrypted email, etc.). Keep this secret — treat it like a password.
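One way to act on "treat it like a password" is to restrict the file's permissions as soon as you save it. The helper below is a minimal sketch, not part of kai: the name secure_kubeconfig is hypothetical, and it assumes the file sits on a POSIX filesystem where chmod applies.

```shell
# Hypothetical helper (not a kai command): make a kubeconfig file
# readable and writable by the current user only.
secure_kubeconfig() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  chmod 600 "$file"
}
```

For example, secure_kubeconfig kai-kubeconfig-<you>.yaml before moving on to Step 3.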

Step 3 — Set up kai

kai setup kai-kubeconfig-<you>.yaml

This installs your kubeconfig and fetches your CLI config automatically from the configs repo.

That's it — you're ready to submit jobs.


Submitting jobs

# Run a script on 1 GPU
kai submit --image pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime --gpu 1 -- python train.py

# Run on a specific node
kai submit --image pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime --gpu 2 --node carnaroli -- torchrun --nproc_per_node=2 train.py

# Interactive session (opens a shell inside the container)
kai submit --image pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime --gpu 1 --interactive

# Mount a local directory
kai submit --image pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime --gpu 1 -v /data/datasets:/data -- python train.py
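The submit commands above can be wrapped in a loop to launch several runs at once. This is a sketch under stated assumptions: submit_sweep and KAI_CMD are hypothetical names, and --lr is assumed to be an argument your own train.py accepts (it is not a kai flag). Only the kai submit flags shown above are used.

```shell
# Image taken from the examples above.
IMAGE="pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime"
# Set KAI_CMD=echo to print the commands instead of submitting (dry run).
KAI_CMD="${KAI_CMD:-kai}"

# Submit one 1-GPU job per learning rate passed as an argument.
# NOTE: --lr is a hypothetical train.py argument, not a kai option.
submit_sweep() {
  for lr in "$@"; do
    "$KAI_CMD" submit --image "$IMAGE" --gpu 1 -- python train.py --lr "$lr"
  done
}
```

Calling submit_sweep 0.1 0.01 would submit two jobs. Running it first with KAI_CMD=echo lets you check the exact command lines before anything reaches the cluster.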

Managing jobs

kai list                  # show all your jobs and their status
kai logs <job>            # print recent logs
kai logs <job> -f         # stream logs live
kai bash <job>            # open an interactive bash shell inside a running job
kai describe <job>        # detailed job info and events
kai delete <job>          # cancel and remove a job

Cluster info

kai gpus                  # GPU availability across all nodes
kai status                # all resources in your namespace
kai queue list            # available queues and their GPU quotas

Updates

kai checks for updates automatically on every login. You will be prompted before anything is applied. To check manually:

kai update                # check for a config update
kai self-update           # check for a kai binary update

To apply without being prompted (e.g. in a script):

kai update --force
kai self-update --force

Troubleshooting

kai: command not found — run source ~/.bashrc (or ~/.zshrc) to pick up the PATH change from the installer, or start a new terminal.

error: namespace not set — you haven't run kai setup yet, or the config file wasn't found at ~/.kai/config.yaml.

error: unable to connect to cluster — your kubeconfig may be missing or expired. Ask your lab manager for a new one and re-run kai setup.
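The first two checks above can be scripted. The function below is a rough sketch, not part of kai: the name kai_doctor is hypothetical, and it only inspects what this README states (kai on your PATH, config at ~/.kai/config.yaml). It does not contact the cluster, so it cannot detect an expired kubeconfig.

```shell
# Hypothetical helper (not a kai command): report local setup problems.
kai_doctor() {
  rc=0
  # Is the kai binary reachable through PATH?
  if command -v kai >/dev/null 2>&1; then
    echo "ok: kai found at $(command -v kai)"
  else
    echo "problem: kai is not on PATH; source your shell rc file or open a new terminal"
    rc=1
  fi
  # Does the CLI config exist where this README says it should?
  config="${HOME}/.kai/config.yaml"
  if [ -f "$config" ]; then
    echo "ok: config found at $config"
  else
    echo "problem: $config not found; run kai setup"
    rc=1
  fi
  return "$rc"
}
```

kai_doctor returns a non-zero status if either check fails, so it can also be used as a guard in scripts.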
