diff --git a/.claude/resources/build-and-images.md b/.claude/resources/build-and-images.md new file mode 100644 index 00000000..6078c261 --- /dev/null +++ b/.claude/resources/build-and-images.md @@ -0,0 +1,47 @@ +# Build system and images + +## rules.mk + +Every env `Makefile` includes the root `rules.mk` (builders include it via `../../rules.mk`). +Defaults to know: + +- `PLATFORMS ?= linux/amd64,linux/arm64` — multi-arch by default. +- `REPO ?= ghcr.io/fission`, `TAG ?= dev`. +- **`DOCKER_FLAGS ?= --push`** — a bare `make` attempts to push to ghcr.io. +- The generic rule is `%-img:` → `docker buildx build $($@-buildargs) --platform=$(PLATFORMS) -t $(REPO)/:$(TAG) $(DOCKER_FLAGS) -f $< .` +- Per-target build args are declared as `-img-buildargs := --build-arg KEY=value`. +- Target-specific platform overrides are supported, e.g. `tensorflow-serving-env-img: PLATFORMS=linux/amd64` (the upstream `tensorflow/serving` image is amd64-only). + +## Local builds + +```sh +cd / && make -img DOCKER_FLAGS=--load PLATFORMS=linux/arm64 +cd /builder/ && make -img DOCKER_FLAGS=--load PLATFORMS=linux/arm64 +``` + +Use a single platform with `--load`; buildx cannot `--load` a multi-arch manifest. +On Apple Silicon use `linux/arm64` for speed; CI builds run on amd64. +All base images used are multi-arch except `tensorflow/serving`. + +## Where build args live (three places, keep in sync) + +The same base-image pin is duplicated in: + +1. `/Makefile` (`-img-buildargs`) +2. `/builder/Makefile` (`-img-buildargs`) — easy to miss; it has bitten before +3. `skaffold.yaml` (the env's build profile `buildArgs`, used by CI) + +Dockerfile `ARG` defaults are a fourth copy in some envs (python). +When bumping a base image, grep for the old value across all of these plus READMEs and `envconfig.json`'s `runtimeVersion`. + +## Build contexts + +- The env image context is `/`; the builder image context is `/builder/`. +- A file needed by both images (e.g. 
`jvm/install-fission-java-core.sh`) must be physically duplicated into `builder/` — symlinks break because Docker contexts don't follow them out of tree. + Mark such copies with a keep-in-sync header comment. + +## skaffold.yaml + +- Per-env build profiles plus a helm deploy of the fission chart (`remoteChart` URL pins the fission version; chart release tags have no `v` prefix, e.g. `fission-all-1.25.0`). +- CI uses `SKAFFOLD_PROFILE= make skaffold-run`; skaffold with the kind profile loads built images into the kind cluster. +- Some skaffold image names differ from release image names (e.g. profile builds `jvm-jersey-env` while releases publish `jvm-jersey-env-25`). diff --git a/.claude/resources/ci.md b/.claude/resources/ci.md new file mode 100644 index 00000000..2043f472 --- /dev/null +++ b/.claude/resources/ci.md @@ -0,0 +1,42 @@ +# CI + +## Workflow structure + +`.github/workflows/environment.yaml` runs on PRs to master. +A `check` job runs `dorny/paths-filter` (filters in `.github/workflows/filters/filters.yaml`); each env job gates on its filter key, so only changed environments build and test. + +Two kinds of env jobs: + +- **Full e2e** (binary, go, jvm, nodejs, python, python-fastapi, dotnet8): setup-cluster → `SKAFFOLD_PROFILE= make skaffold-run` → `make -test-images` (kind-load) → `make router-port-forward` → `./test_utils/run_test.sh /tests/test_*_env.sh` → fission dump on failure. +- **Build-only** (perl, php7, ruby, tensorflow, jvm-jersey): setup-cluster + `make skaffold-run` only; no functional test. + Compensate with local container smoke tests (specialize + invoke) before pushing changes to these envs. + +Composite actions: `.github/actions/setup-cluster` (helm + `helm/kind-action` with `cluster_name: kind` + fission CLI + skaffold install + crds; version pins live in its input defaults) and `.github/actions/collect-fission-dump` (best-effort by design — must never mask the original failure). 
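As a rough illustration of how a job gate on the filter output behaves, the sketch below imitates the `contains()` expression function from GitHub Actions, which performs a case-insensitive substring test when its first argument is a string. The helper names `gha_contains` and `packages_output` are hypothetical and exist only for this example; the `packages` value is assumed to be a JSON array serialized to a string, as the gotchas below describe.

```python
import json


def gha_contains(search: str, item: str) -> bool:
    """Approximate GitHub Actions contains() for a string haystack:
    a case-insensitive substring test."""
    return item.lower() in search.lower()


def packages_output(changed_filters: list[str]) -> str:
    """Hypothetical stand-in for the check job's `packages` output:
    a JSON array serialized to a string."""
    return json.dumps(changed_filters)


if __name__ == "__main__":
    # Only the jvm-jersey filter fired in this simulated run.
    packages = packages_output(["jvm-jersey"])

    # A bare substring also matches the longer key, so the jvm job would run.
    print("bare 'jvm' gate fires:", gha_contains(packages, "jvm"))

    # Including the JSON quotes in the needle restricts the match to the exact key.
    print("quoted '\"jvm\"' gate fires:", gha_contains(packages, '"jvm"'))
```

Under these assumptions the bare gate fires for a `jvm-jersey`-only change while the quoted gate does not, which is why the quotes belong inside the match string.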
+ +## Gotchas (each of these caused a real failure) + +1. **E2e tests must pin the local image.** + Test scripts must `export _RUNTIME_IMAGE=` to the kind-loaded name (e.g. `jvm-env`, `go-env`). + The fallback defaults in `test_utils/utils.sh` point at years-stale Docker Hub images (`fission/jvm-env` etc.) and silently test the wrong image. +2. **Workflow-only PRs exercise nothing.** + A PR touching only `.github/` triggers no env jobs, so composite-action changes go unvalidated and can break master for every subsequent run. + Include a small genuine change under one env dir (e.g. a `perl/` README fix) to force one job through the changed path before merging. +3. **Exact-match the filter gates.** + `packages` is a JSON array string; use quoted matches like `contains(needs.check.outputs.packages, '"jvm"')`. + Bare substrings cross-trigger: `jvm` matches `jvm-jersey`, `python` matches `python-fastapi`. + Also: it's `needs.check.outputs.packages` — `needs.check.outputs` alone never matches (historical bug that kept the python job from ever running). +4. **Don't reintroduce action pins that ship without compiled dist.** + `engineerd/setup-kind@v0.6.2` failed with `File not found: dist/main/index.js`; `helm/kind-action` is the maintained replacement. + `hiberbee/github-action-skaffold` pins skaffold 2.3.1 which cannot parse `skaffold/v4beta13` — use `make skaffold-run` instead. + +## Test harness + +- `test_utils/run_test.sh [files...]` runs tests via GNU parallel and aggregates logs; a file containing the line `#test:disabled` is skipped. +- macOS prerequisites: `brew install coreutils findutils gnu-sed parallel` (see `test_utils/init_tools.sh`). +- Some envs have cluster-free `local_test.sh` (binary, nodejs, python, python-fastapi) — run these first, they catch dependency breakage in seconds. + +## Debugging CI + +- `gh run view --log-failed` for failing steps; e2e test output is embedded in the `run_test.sh` log dump. 
+- Function-level failures need the fission dump artifact (`-fission-dump`); a `test_fn` curl loop timing out (exit 124) usually means the function pod never became ready — check the env image actually used (gotcha 1). +- Local e2e reproduction works with kind + skaffold + fission CLI installed (`make verify-kind-cluster create-crds`, then the same steps as CI). diff --git a/.claude/resources/environment-notes.md b/.claude/resources/environment-notes.md new file mode 100644 index 00000000..edbbcf1d --- /dev/null +++ b/.claude/resources/environment-notes.md @@ -0,0 +1,67 @@ +# Per-environment notes + +State as of the June 2026 dependency-update series (PRs #436–#446, #450–#451). + +## jvm (Spring Boot) + +- Java 25 LTS (eclipse-temurin alpine), Spring Boot 3.5.x, Maven 3.9.x. +- `io.fission:fission-java-core` was only ever published as `0.0.2-SNAPSHOT` to oss.sonatype.org (OSSRH), decommissioned in 2025 — it resolves from **no** remote repository. + `install-fission-java-core.sh` builds it from a pinned commit of [fission/fission-java-libs](https://github.com/fission/fission-java-libs) and installs it locally as `0.0.1`, `0.0.2`, and `0.0.2-SNAPSHOT` (the SNAPSHOT keeps pre-existing user functions building). + The script exists twice (env context and `builder/` context) — keep both copies in sync. + The library's 2018-era pom lacks XML namespace declarations; the script patches the root element before running maven plugins. +- The CI test builds the example jar in a clean maven container, so the test script must run the install script there too. + +## jvm-jersey + +- Jersey 2.x (javax namespace) on Jetty 9.4.x, Java 25; depends on `io.fission:fission-jvm-jersey:0.0.1` which IS on Maven Central (unlike fission-java-core). +- Image names carry the Java version suffix (`jvm-jersey-env-25`, `jvm-jersey-builder-25`); renaming requires touching Makefile target names, envconfig, and the fission.io catalog. 
+ +## python / python-fastapi + +- `python:3.13-alpine`; Flask 3.x + bjoern/gevent, FastAPI + uvicorn. +- bjoern needs libev headers (alpine: in image; macOS local: `brew install libev` with `CFLAGS=-I/opt/homebrew/include LDFLAGS=-L/opt/homebrew/lib`). +- `flask_sockets.py` is vendored (upstream dead); Werkzeug ≥2.3 moved `parse_cookie` to `werkzeug.sansio.http`. +- Local tests: `USERFUNCVOL=/tmp RUNTIME_PORT= ./tests/local_test.sh`, then repeat with `WSGI_FRAMEWORK=GEVENT` — the gevent path exercises the fragile websocket stack. + +## nodejs + +- Three image flavours from one Dockerfile via `NODE_BASE_IMG`: `node-env` + `node-env-22` (alpine) and `node-env-debian`. +- ESM-first server with CJS support; `test/local_test.sh` covers both loaders. +- The Dockerfile copies only `package.json` (not the lockfile), so images resolve dependency floors at build time — lockfile refreshes need a version bump + rebuild to reach the published image. + +## go + +- Versioned image pair (`go-env-1.xx`, `go-builder-1.xx`) plus unversioned aliases; bump = rename targets/images in `go/Makefile`, `go/builder/Makefile`, `envconfig.json`, `skaffold.yaml`, the example spec, and the fission.io catalog. +- Plugin model: function `.so` must be built with the exact toolchain of the env server — env and builder share `GO_VERSION`. + +## binary + +- Alpine + a small Go server executing arbitrary binaries; `go mod init`/`tidy` at image build (stdlib only). + +## ruby + +- `ruby:3.4-alpine`; Rack pinned `~> 2.2` (Rack 3 removed `Rack::Handler`, which `server.rb` uses via thin). +- Regenerate `Gemfile.lock` inside the target container and `bundle lock --add-platform` for both gnu and musl, amd64 and arm64. +- Builder uses bundler deployment config (`bundle config set --local deployment true`), not the deprecated `--deployment` flag. + +## php7 (directory name kept; runs PHP 8.3) + +- `php:8.3-alpine`, react/http 1.x, Monolog 3, php-parser 5. 
+- Only compile extensions NOT bundled with the official image; rebuilding bundled exts (e.g. iconv) fails on musl. + `json` is core; `xmlrpc`/`mcrypt` were removed from PHP 8 and their PECL ports are unmaintained. +- The directory and image name stay `php7`/`php-env` — path filters and release derivation depend on them. + +## perl + +- Pinned `perl:5.42`; Dancer2 + Twiggy; v1 specialize only. + +## tensorflow-serving + +- Pinned `tensorflow/serving` tag; upstream publishes **amd64 only** — the Makefile target overrides `PLATFORMS=linux/amd64`. +- Go proxy built with modules initialized at build time (`pkg/errors`, `zap`). + +## dotnet / dotnet20 (frozen legacy) + +- .NET Core 1.1 / 2.0, both EOL years ago, on the removed `microsoft/dotnet` Docker Hub images. +- Intentionally untouched; their release matrix legs fail if a reconcile run picks them up — expected. +- `dotnet8/` is the supported .NET path. diff --git a/.claude/resources/release-process.md b/.claude/resources/release-process.md new file mode 100644 index 00000000..8b5ec52c --- /dev/null +++ b/.claude/resources/release-process.md @@ -0,0 +1,42 @@ +# Release process + +## Version-bump-driven releases + +1. Bump `version` in `/envconfig.json` (this is the image tag to publish). +2. Run `make update-env-json` — sorts every `envconfig.json` (jq) and regenerates the root `environments.json`. + Never hand-edit `environments.json`; commit the regenerated file with the bump. +3. On merge to master, `.github/workflows/release.yaml` (path filter `**/envconfig.json**`) runs `hack/release_check.py`, which emits a matrix of every `image:version` not yet on ghcr.io; the workflow's `docker-buildx-push` job then runs `TAG= make -img` (and `-img` in `builder/`) plus a `latest` push for each matrix entry. + +Image content changes without an envconfig bump never release — if a merged change should reach the published image (e.g. a lockfile refresh), follow up with a version-bump PR. 
+Conversely, examples/ and docs changes don't need a bump (they're not in the image). + +## release_check.py semantics + +- Checks GHCR via the v2 API with an anonymous bearer token. +- Token endpoint 401/403/404 ⇒ the package doesn't exist yet ⇒ release needed (GHCR refuses tokens for unknown packages — this is the first-release path for renamed/new images). +- tags/list 200 ⇒ skip if tag present; 404 ⇒ release; anything else raises (fail-closed so registry hiccups can't trigger mass re-pushes of `latest`). +- Outputs go to `$GITHUB_OUTPUT`; `release_needed` is lowercase `true`/`false` and release.yaml gates on `== 'true'` — keep these in sync. +- **Reconcile mode**: invoked with no package list (e.g. `gh workflow run release.yaml`), it scans every `*/envconfig.json` and releases anything unpublished. + Use this to backfill after a failed release run. + Expect the legacy `dotnet`/`dotnet20` matrix legs to fail (EOL bases); `fail-fast: false` keeps other legs going. +- Testable locally: `GITHUB_OUTPUT= python3 hack/release_check.py '[python,go]'` (needs `requests`). + +## Multi-PR trains + +Every env PR rewrites the generated `environments.json`, so PRs in a series conflict with each other on that file. +Merge serially; for each next PR: `git merge origin/master`, run `make update-env-json` to resolve the conflict canonically, `git add environments.json`, commit, push, wait for green, merge. +Take master's side for workflow-file conflicts when master's change is a superset of the branch's. + +## Version pin locations for the Fission version + +The Fission version string must be bumped together in four places (note the differing key names — grepping for `FISSION_VERSION` alone misses two): +`FISSION_VERSION` in `rules.mk` and in `environment.yaml`'s `env`, the `fission-cli-version` input default in `setup-cluster/action.yml`, and the hardcoded skaffold `remoteChart` URL (chart tags have no `v` prefix: `fission-all-1.25.0`). 
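One way to catch drift between the four pin locations is a small consistency check. The sketch below is illustrative only: it assumes the values have already been read out of each file, that the Makefile, workflow, and action pins may carry a leading `v`, and that the chart version can be recovered from a `fission-all-<version>` substring of the `remoteChart` URL. The function names, the sample URL, and the sample version strings are hypothetical.

```python
import re

# Matches the chart tag form mentioned above, e.g. "fission-all-1.25.0".
CHART_TAG = re.compile(r"fission-all-(\d+\.\d+\.\d+)")


def normalize(version: str) -> str:
    """Strip an optional leading 'v' so 'v1.25.0' and '1.25.0' compare equal."""
    return version[1:] if version.startswith("v") else version


def chart_version(remote_chart_url: str) -> str:
    """Extract the version from a chart reference containing 'fission-all-X.Y.Z'."""
    match = CHART_TAG.search(remote_chart_url)
    if match is None:
        raise ValueError(f"no fission-all chart tag in {remote_chart_url!r}")
    return match.group(1)


def find_mismatches(pins: dict[str, str]) -> dict[str, str]:
    """Return {location: normalized version} if the pins disagree, else {}."""
    normalized = {where: normalize(value) for where, value in pins.items()}
    return normalized if len(set(normalized.values())) > 1 else {}


if __name__ == "__main__":
    # Hypothetical values standing in for the four real locations.
    pins = {
        "rules.mk FISSION_VERSION": "v1.25.0",
        "environment.yaml env FISSION_VERSION": "v1.25.0",
        "setup-cluster fission-cli-version": "v1.24.0",
        "skaffold remoteChart": chart_version(
            "https://example.invalid/charts/fission-all-1.25.0.tgz"
        ),
    }
    for where, version in find_mismatches(pins).items():
        print(f"{where}: {version}")
```

Normalizing before comparing keeps the prefix difference between chart tags and the other pins from being reported as a mismatch.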
+ +## Downstream: fission.io website + +The site mirrors this repo's catalog. +After image renames, new environments, or removals, sync the site repo (it has an `updating-environments-and-examples` skill): + +- `static/data/environments.json` stores image/builder *names* only (not versions) — only name changes matter there. +- `tools/environments.py` regenerates it from this repo's manifest and is keyed by image name (display names are not unique — both jvm and jvm-jersey report "JVM Environment"). +- Docs pages may embed versioned image names in examples (grep for `go-env-1.`, old runtime versions); leave historical release-notes pages untouched. diff --git a/.claude/resources/runtime-architecture.md b/.claude/resources/runtime-architecture.md new file mode 100644 index 00000000..25002aa5 --- /dev/null +++ b/.claude/resources/runtime-architecture.md @@ -0,0 +1,41 @@ +# Runtime architecture + +## The environment contract + +Every environment is an HTTP server listening on **port 8888** that fission's fetcher/executor drives: + +- `POST /specialize` (v1): load user code from the fixed path `/userfunc/user`. +- `POST /v2/specialize`: JSON body `{"filepath": "...", "functionName": "..."}`; `filepath` may be a single file or a directory (built package). +- All subsequent requests on `/` (any method) are routed to the loaded user function. +- Most servers also expose `GET /healthz`. + +A container specializes exactly once; the pool manager replaces pods rather than re-specializing. +Unspecialized containers return an error on `/` ("Container not specialized" or similar) — that response is the expected pre-specialization behaviour, not a bug. + +## functionName semantics differ per language + +- **jvm**: fully-qualified class name implementing `io.fission.Function` (e.g. `io.fission.HelloWorld`). +- **ruby**: the *method* name defined by the loaded file(s), e.g. `handler` — NOT the filename. + Passing a filename makes `method(func)` raise and specialize returns 500. 
+- **php**: `module::function` (e.g. `hellopsr.php::handler`). + Without the `::` divider the env enters legacy echo mode: the file is `require`d and its buffered output is returned as the response body. +- **python**: `module.function` style handled by the server's module loader. +- **go**: entrypoint symbol in a Go plugin (`.so`) — see toolchain note below. + +## Builder contract + +Builder images run `/usr/local/bin/build` (from `build.sh`/`defaultBuildCmd`) with `SRC_PKG` and `DEPLOY_PKG` env vars, transforming a source package into a deploy package. +Examples: maven `package` (jvm), `bundle install` with deployment config (ruby), `composer install` (php), `pip install -r requirements.txt -t` (python), go plugin build. + +## Language-specific runtime notes + +- **go**: functions are Go plugins; the function build toolchain MUST exactly match the env server's toolchain. + Env and builder Dockerfiles share the `GO_VERSION` build arg — always bump them together. +- **jvm**: depends on `io.fission:fission-java-core`, which is not resolvable from any remote repository (see environment-notes.md); it is built from source by `install-fission-java-core.sh` in the env image, the builder image (pre-seeded `/root/.m2`), and the CI test container. +- **ruby**: `fission/specializer.rb` loads vendored gems from `vendor/bundle/ruby/*/gems/*/lib` and native extensions via a platform-wildcard glob (images are musl, amd64+arm64 — never hardcode a platform dir). + Stay on Rack 2.2.x: `server.rb` uses `Rack::Handler::Thin`, removed in Rack 3. +- **php**: react/http 1.x `HttpServer` with the auto-run global loop; uncaught handler Throwables become 500s and the process keeps serving (an `error` listener logs them). + `ob_start` must be balanced on every early-return path — the process is long-running, leaked buffer levels accumulate and can corrupt later echo-mode responses. 
+- **python**: serves via bjoern by default or gevent (`WSGI_FRAMEWORK=GEVENT`), with vendored `flask_sockets.py` for websockets (Werkzeug 3 moved `parse_cookie` to `werkzeug.sansio.http`). +- **perl**: Dancer2 + Twiggy; only `/specialize` (v1) and `/` routes. +- **tensorflow-serving**: a Go proxy (`server.go`) in front of `tensorflow_model_server`; built with go modules initialized at image build time. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..e127799f --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,45 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## What this repo is + +Language runtime environments for [Fission](https://fission.io) (Kubernetes serverless framework). +Each top-level directory (`go/`, `python/`, `nodejs/`, `jvm/`, `binary/`, `dotnet8/`, etc.) is one self-contained environment that produces Docker images published to `ghcr.io/fission`. + +Every environment follows the same layout: `server.*` (the runtime HTTP server), `Dockerfile` + `Makefile` (runtime image), optional `builder/` (builder image with `build.sh`), `envconfig.json` (metadata; the `version` field drives releases), `examples/`, and `tests/` or `test/`. + +## Quick commands + +```sh +# Local image build (a bare `make` tries to PUSH multi-arch to ghcr.io!) 
+cd python/ && make DOCKER_FLAGS=--load PLATFORMS=linux/arm64 + +# Cluster-free unit tests (binary, nodejs, python, python-fastapi) +cd nodejs/ && ./test/local_test.sh + +# E2e against a kind cluster (envs with e2e jobs) +SKAFFOLD_PROFILE=python make skaffold-run +make python-test-images router-port-forward +./test_utils/run_test.sh ./python/tests/test_python_env.sh + +# After any envconfig.json change (never hand-edit environments.json) +make update-env-json +``` + +## Detailed guides + +Read the relevant file before working in that area: + +- [.claude/resources/build-and-images.md](.claude/resources/build-and-images.md) — make/buildx system, where build args live (and drift between), multi-arch rules, local build recipes. +- [.claude/resources/runtime-architecture.md](.claude/resources/runtime-architecture.md) — the specialize protocol (v1/v2), per-language entrypoint semantics, builder contract. +- [.claude/resources/ci.md](.claude/resources/ci.md) — workflow structure, path-filter gotchas, how e2e tests pick images, debugging CI failures. +- [.claude/resources/release-process.md](.claude/resources/release-process.md) — version-bump-driven releases, the GHCR gate, reconcile mode, multi-PR trains. +- [.claude/resources/environment-notes.md](.claude/resources/environment-notes.md) — per-environment quirks and history (jvm's vendored dependency, EOL legacy dotnet, amd64-only tensorflow, etc.). + +## Hard rules + +- `environments.json` is generated — regenerate with `make update-env-json`, never edit by hand. +- Bumping `version` in any `envconfig.json` triggers an image release when merged to master. +- Build args are duplicated across each env `Makefile`, its `builder/Makefile`, and `skaffold.yaml` — update all three together. +- The fission.io website mirrors `environments.json`; after image renames or new environments, sync the site (see release-process.md).