
Phase 1: cluster write-federation - trust multiple writers (edge parse + master-trust script)#73

Open: ehsan6sha wants to merge 4 commits into main from phase-1-federation



@ehsan6sha

Member

Part of #72.

What this delivers (standalone, verifiable)

Followers can trust more than one cluster writer, so a 2nd cloud writer's pins are accepted network-wide. Additive + backward-compatible - nothing changes until a 2nd writer is actually added and trusted.

Edge (deploys via OTA - no updater script)

File: docker/fxsupport/linux/ipfs-cluster/ipfs-cluster-container-init.d.sh

  • Parse the new ipfs-cluster-trustedpeers array from the pool-info endpoint (companion: functionland/join-server#2), falling back to the existing single ipfs-cluster-peerid -> identical to today when the API returns only the legacy field.
  • consensus.crdt.trusted_peers is set to the full set; cluster.peer_addresses, the /x/fula-cluster bootstrap and the DNS fallback all use the PRIMARY (first) peer, so single-peer multiaddrs never receive a comma-list (the easy-to-miss wiring bug).
  • Rolls out via watchtower + fula.sh once merged.
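A minimal sketch of the parse-and-split behaviour described in the bullets above. The two JSON keys come from this PR; their position at the top level of the pool-info response, the function names, and the `TRUSTED_PEERS` array are assumptions made for illustration, not the contents of the init script.

```bash
#!/usr/bin/env bash
# Sketch only: assumes both keys sit at the top level of the pool-info JSON.

# Print the trusted writers as a comma-separated list. Prefer the new array,
# drop empty entries, and fall back to the legacy single peer id.
trusted_csv_from_pool_json() {
  jq -r '
    ((.["ipfs-cluster-trustedpeers"] // [])
      | map(select(type == "string" and . != ""))) as $peers
    | if ($peers | length) > 0
      then $peers | join(",")
      else (.["ipfs-cluster-peerid"] // "")
      end
  '
}

# The primary peer is the first entry; single-peer multiaddrs use only this.
primary_peer() {
  printf '%s\n' "${1%%,*}"
}

# Split the CSV into the global array TRUSTED_PEERS (hypothetical name).
split_peers() {
  IFS=',' read -r -a TRUSTED_PEERS <<< "$1"
}
```

With this shape, the full CSV would feed `consensus.crdt.trusted_peers`, while `primary_peer "$csv"` would feed anything that expects exactly one peer id.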

Server op (non-OTA; master is systemd-managed)

update-scripts/phase-1-master-trust.sh - appends a new writer peer id to CLUSTER_CRDT_TRUSTEDPEERS in the master systemd unit (both Environment= and ExecStart -e), backs up, daemon-reload + restart, verifies. Additive, idempotent, reversible, halts without NEW_WRITER_PEERID. Datastore/identity/pinset untouched.
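The append step can be sketched roughly as follows. This is not the script from the PR: the exact unit layout (one `Environment="..."` line and one `-e` argument), the alphanumeric peer id check, and the function name are assumptions, and the systemd reload, restart, and verification are left out.

```bash
#!/usr/bin/env bash
# Sketch only. Assumes the unit carries the variable in these two forms:
#   Environment="CLUSTER_CRDT_TRUSTEDPEERS=<id>[,<id>...]"
#   ... -e CLUSTER_CRDT_TRUSTEDPEERS=<id>[,<id>...] ...
# and that peer ids are plain alphanumeric strings.

append_trusted_peer() {
  local unit=$1 id=$2
  local var='CLUSTER_CRDT_TRUSTEDPEERS'

  # Halt rather than guess when the id is missing or malformed.
  if [[ ! $id =~ ^[A-Za-z0-9]+$ ]]; then
    echo "refusing: missing or invalid peer id" >&2
    return 2
  fi

  # Idempotency: nothing to do when the id is already in a value list.
  if grep -Eq "${var}=([A-Za-z0-9]+,)*${id}([^A-Za-z0-9]|\$)" "$unit"; then
    echo "already trusted: $id"
    return 0
  fi

  # -i.bak keeps the previous unit next to the edited one for rollback.
  # The /g flag plus line-by-line processing updates every occurrence.
  sed -E -i.bak "s/(${var}=[A-Za-z0-9,]+)/\1,${id}/g" "$unit"
  echo "appended: $id"
  # A real run would continue with daemon-reload, restart, and a check.
}
```

A second call with the same id returns early, which is the no-double-append behaviour the tests below refer to.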

Tests (pass locally under WSL bash + jq 1.7)

  • tests/test-cluster-federation-parse.sh - 7/7 (array to csv, fallback, filter empties, primary=first, split to array, split index 0, single to 1-elem array).
  • tests/test-phase-1-master-trust.sh - 6/6 (append to both lines, backup, idempotent no-double-append, halts without peer id, rejects bad id).
  • Both scripts bash -n clean; LF line endings (safe on Linux).

Data-safety

Additive trusted_peers only (pebble/pinset/identity/secret untouched); backups + documented rollback; edge change backward-compatible. Validate on one test device before fleet/OTA rollout.

Still pending in #72 (follow-ups, not in this PR)

  • New-writer setup script - needs the master ipfs.service unit as a template.
  • Master-offline connectivity - a 2nd bootstrap/tunnel so followers reach the new writer when the master is offline (needs per-writer kubo addresses from the pool endpoint + test-device validation). This PR makes followers trust the 2nd writer; full master-offline write-availability is the follow-up.

Generated with Claude Code.

ehsan6sha and others added 4 commits June 1, 2026 23:48
Edge (ipfs-cluster-container-init.d.sh): parse the new `ipfs-cluster-trustedpeers`
array from pools.fx.land/pools/{name}, fall back to the single `ipfs-cluster-peerid`
(backward-compatible), set consensus.crdt.trusted_peers to the full set, and keep the
bootstrap/tunnel/DNS pointed at the PRIMARY (first) peer so single-peer multiaddrs stay
valid. Deploys via OTA (watchtower + fula.sh) - no updater script.

Server op: update-scripts/phase-1-master-trust.sh appends a new writer's peer id to
CLUSTER_CRDT_TRUSTEDPEERS in the master systemd unit (Environment= + ExecStart -e);
additive, idempotent, backs up + restarts + verifies, halts without NEW_WRITER_PEERID.

Tests: tests/test-cluster-federation-parse.sh (jq parse + primary/split, 7/7) and
tests/test-phase-1-master-trust.sh (append to both lines, idempotency, halts, 6/6).

Part of #72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…riter.sh)

Provisions a 2nd ipfs-cluster WRITER on a plain Ubuntu/Debian cloud box (no Fula
/uniondrive layout): installs Docker/curl/jq if missing, paths under /opt/fula-writer,
default kubo datastore (writer stores ~nothing via the tag:group allocator), mirrors the
master cluster env (secret=sha256(clustername), allocator, repl, FOLLOWERMODE=false),
auto-reads the master cluster/kubo identity + bootstrap addr from the pool endpoint,
joins via direct public bootstrap, prints the new cluster + kubo peer ids for
phase-1-master-trust.sh + the pool-server. Dry-run + halts without PUBLIC_HOST.

Tests: tests/test-phase-1-setup-writer.sh (dry-run: input validation, ip4/dns4 announce,
secret derivation, zero side effects - 7/7).
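For illustration, the secret derivation and the ip4/dns4 announce choice might look like the sketch below. Hashing the name without a trailing newline, the default port 9096, and both function names are assumptions; the commit only states that the secret is sha256(clustername) and that ip4 and dns4 announces are covered by tests.

```bash
#!/usr/bin/env bash
# Sketch only: newline handling, port, and function names are assumptions.

# sha256 of the cluster name as 64 hex characters.
derive_cluster_secret() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
  else
    printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1
  fi
}

# Choose /ip4 for a dotted-quad PUBLIC_HOST and /dns4 for a hostname.
announce_multiaddr() {
  local host=$1 port=${2:-9096}
  if [[ $host =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]; then
    printf '/ip4/%s/tcp/%s\n' "$host" "$port"
  else
    printf '/dns4/%s/tcp/%s\n' "$host" "$port"
  fi
}
```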

Part of #72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…+ .env

Add update-scripts/lib/phase-common.sh - shared helpers for re-runnable phase scripts:
pc_load_env/pc_save_env (persist inputs; a CLI/env value wins over a saved one),
pc_prompt (interactive prompt showing the saved value as default, Enter keeps it;
non-interactive uses env/.env or halts - never guesses), pc_write_if_changed
(rewrite + restart only when the unit actually changed, with backup), detection helpers.
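The "a CLI/env value wins over a saved one" rule can be sketched like this. It is not the real `pc_load_env`: the function name and the plain `KEY=VALUE` file format are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: assumes the saved file holds plain KEY=VALUE lines.

# Load saved values, but never override a variable that is already set.
load_env_keep_existing() {
  local _file=$1 _key _val
  [ -f "$_file" ] || return 0
  while IFS='=' read -r _key _val || [ -n "$_key" ]; do
    # Skip blanks, comments, and anything that is not a valid name.
    [[ $_key =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
    if [ -z "${!_key+set}" ]; then
      printf -v "$_key" '%s' "$_val"
      export "${_key?}"
    fi
  done < "$_file"
}
```

Because already-set variables are skipped, a value passed on the command line or in the environment survives the load, and a saved value only fills the gaps.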

Refactor phase-1-setup-writer.sh + phase-1-master-trust.sh onto the lib: detect what is
already installed and skip/reuse it (Docker, kubo repo, cluster identity), rewrite systemd
units only when changed, prompt for params and remember them in ENV_FILE so a re-run just
updates what is needed.
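Rewriting a unit only when it changed could be sketched as below. This is not the real `pc_write_if_changed`; the name, arguments, and return codes are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: name, arguments, and return codes are hypothetical.

# Install the candidate file at the target path only when content differs.
# Returns 0 when the target was rewritten, 1 when it was left alone.
write_if_changed_sketch() {
  local target=$1 candidate=$2
  if [ -f "$target" ] && cmp -s "$target" "$candidate"; then
    return 1                      # identical: no rewrite, no restart
  fi
  # Keep the previous version for rollback before overwriting it.
  [ -f "$target" ] && cp -p "$target" "$target.bak"
  cp "$candidate" "$target"
  return 0
}
```

A caller would restart the service only on a zero return, for example `if write_if_changed_sketch "$unit" "$new"; then systemctl daemon-reload; fi`, so an unchanged re-run stays a no-op.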

Tests (all pass under WSL bash): test-phase-common 10/10, test-phase-1-setup-writer 9/9,
test-phase-1-master-trust 7/7 - incl re-run-reuses-saved-value, non-interactive halt, and a
fixed set -u unbound-variable bug in a combined local declaration.

Part of #72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n e2e suite

phase-1-setup-writer.sh fixes found by live e2e on a clean Ubuntu 24.04 box:
- bypass the kubo image auto-init entrypoint (--entrypoint ipfs) — plain
  `docker run ... init` double-inits and fails on a fresh repo
- route `ipfs config` through the running daemon on re-runs (repo lock),
  one-shot otherwise (kubo_cfg helper); read peer id lock-free via jq
tests/e2e/phase-1: isolated-cluster e2e (sim master w/ shifted ports +
trust preservation, REAL setup-writer + master-trust runs, updated+old
followers, drills D0-D4: failover write, mixed-fleet, reconvergence,
idempotent re-runs). Result on test box: 14/14 pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ehsan6sha

Member Author

Phase 1 e2e — real-daemon acceptance: 14/14 PASS

Ran on a clean Ubuntu 24.04 x86 box (isolated test cluster, never touching prod): simulated master (prod-shaped systemd unit, shifted ports) + this PR's real phase-1-setup-writer.sh and phase-1-master-trust.sh + an updated-config follower (trusts master+writer) + an old-config follower (trusts master only).

| Drill | Result |
| --- | --- |
| D0 | 4-peer topology converges |
| D1 | pin via master → reaches old AND new followers |
| D2 | master stopped → pin via writer succeeds; updated follower pins it |
| D2 | old follower ignores writer-issued pin but keeps serving existing pins (mixed-fleet/no-forced-upgrade) |
| D3 | master restarts → CRDT reconverges, learns writer-era pin, pinset never shrank |
| D4 | both scripts re-run as no-ops (with daemons running); peerset stable |

Fixes the e2e surfaced (in this PR)

  1. kubo image double-init — the official image entrypoint auto-inits an empty repo before the CMD, so docker run … init failed; one-shots now use --entrypoint ipfs.
  2. repo lock on re-runs: ipfs config one-shots fail while the daemon holds the repo lock; new kubo_cfg routes through the running daemon, and peer id reads use jq on the config file (lock-free).
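The two fixes above can be illustrated with a dry-run sketch that only prints the command it would run. This is not the `kubo_cfg` helper from the PR: using `docker exec` for the running-daemon path, the container name, the untagged `ipfs/kubo` image, and the function names are all assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: names and the docker exec route are assumptions, and the
# first function prints a command line instead of running Docker.

# Build the command line for an `ipfs config ...` call.
#   daemon_running=1 -> go through the running container (it holds the lock)
#   daemon_running=0 -> one-shot container with the entrypoint overridden,
#                       so the image's auto-init does not run first
kubo_cfg_cmd_sketch() {
  local daemon_running=$1 container=$2 repo=$3
  shift 3
  local -a cmd
  if [ "$daemon_running" = 1 ]; then
    cmd=(docker exec "$container" ipfs config "$@")
  else
    cmd=(docker run --rm --entrypoint ipfs
         -v "$repo:/data/ipfs" ipfs/kubo config "$@")
  fi
  printf '%s\n' "${cmd[*]}"
}

# Read the peer id straight from the repo's config file, which needs no lock.
read_peer_id_sketch() {
  jq -r '.Identity.PeerID' "$1/config"
}
```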

Finding for a follow-up (not this PR)

docker/fxsupport/linux/.env.cluster sets IPFS_CLUSTER_IPFSPROXY_LISTENMULTIADDRESS — that prefix is not a valid ipfs-cluster env mapping (correct: CLUSTER_IPFSPROXY_LISTENMULTIADDRESS). Harmless today only because the value equals the default.
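A hypothetical lint for this kind of prefix slip, not part of the PR, might list candidates for review. It assumes plain `NAME=value` lines and skips `IPFS_CLUSTER_PATH`, which ipfs-cluster reads under that name; every other hit still needs a human check against the ipfs-cluster configuration reference.

```bash
#!/usr/bin/env bash
# Sketch only: a hypothetical review aid, not an authoritative validator.

# Print "NAME -> suggested NAME" for IPFS_CLUSTER_-prefixed variables,
# where the suggestion simply drops the leading "IPFS_".
suggest_cluster_env_names() {
  local name
  grep -Eo '^IPFS_CLUSTER_[A-Za-z0-9_]+' "$1" \
    | grep -v '^IPFS_CLUSTER_PATH$' \
    | while read -r name; do
        printf '%s -> %s\n' "$name" "${name#IPFS_}"
      done
}
```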

Unit suites (test-phase-common, test-phase-1-setup-writer, test-phase-1-master-trust): all pass after the fixes.

🤖 Generated with Claude Code

