Cut real disk per kson checkout #394

Open
holodorum wants to merge 6 commits into kson-org:main from holodorum:cache-kson


@holodorum holodorum commented Jun 17, 2026

Collaborator

This fixes the three biggest disk hogs at the source. Numbers are real free-space deltas (df), not du — on APFS du overcounts reflinked files.

Changes

  • pnpm for the standalone Node/TS tooling: language-server-protocol and the lsp-clients workspace move to pnpm with node-linker=hoisted (flat, npm-identical layout; hardlinked from the shared global store). A second checkout's node_modules costs ~10M instead of ~374M.
  • JSON conformance suite — gated off non-test builds (consumers building -x test no longer fetch it) and shrunk via a sparse/treeless git fetch of only the two dirs we read: 268M → ~5M for test runs.
  • VS Code test download — shared via VSCODE_TEST_CACHE (OS-default cache dir) instead of a per-checkout .vscode-test/, so all checkouts/CI reuse one download (~150M once).

Supporting

  • CircleCI caches the pnpm store and the VS Code download.
  • Short --user-data-dir for the VS Code desktop test (fits macOS's 103-char socket limit in deep checkouts).
  • Fixes a latent bug: dist-iframe was missing from monaco's files, so the iframe assets never shipped.

Testing

./gradlew check / allTests green (JVM + JS); the built vsix is byte-identical to the npm baseline; gating verified. The Kotlin-compiled JS (yarn-bound) is untouched.

Both test entry points called downloadAndUnzipVSCode('stable') with no
cachePath, so @vscode/test-electron materialized a full ~100-150M VS Code
into a per-checkout .vscode-test/ for every checkout and CI run. Switch to
the options-object form with a cachePath resolved from VSCODE_TEST_CACHE,
falling back to an OS-appropriate user cache dir, so multiple checkouts and
CI reuse a single shared download.
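
As a rough illustration of the cachePath resolution described above, here is a minimal sketch. The function name `resolveVscodeTestCache` and its injected inputs are hypothetical and are not taken from vscodeTestCache.ts. The macOS and Linux defaults mirror the directories named in this PR; the Windows fallback is an assumption, since the PR does not state one.

```typescript
// Hypothetical sketch: pick one shared directory for the VS Code test download.
// Inputs are injected so the logic can be exercised without touching process.env.
interface CacheInputs {
  env: Record<string, string | undefined>;
  platform: string; // e.g. the value of process.platform
  homeDir: string;  // e.g. the value of os.homedir()
}

function resolveVscodeTestCache(inputs: CacheInputs): string {
  // An explicit VSCODE_TEST_CACHE always wins.
  const override = inputs.env["VSCODE_TEST_CACHE"];
  if (override !== undefined && override.trim() !== "") {
    return override;
  }
  if (inputs.platform === "darwin") {
    return `${inputs.homeDir}/Library/Caches/vscode-test`;
  }
  if (inputs.platform === "win32") {
    // Assumption: the PR does not say which Windows directory is used.
    const base = inputs.env["LOCALAPPDATA"] ?? `${inputs.homeDir}\\AppData\\Local`;
    return `${base}\\vscode-test`;
  }
  // Linux and other Unix-likes.
  return `${inputs.homeDir}/.cache/vscode-test`;
}

// The result would then be handed to the options-object form, roughly:
//   await downloadAndUnzipVSCode({ version: "stable", cachePath });
```

Keeping the resolver a pure function of its inputs makes the override and per-OS fallbacks easy to check in isolation.
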

The ~257M JSONTestSuite / JSON-Schema-Test-Suite repos were cloned on
essentially every build because CleanGitCheckout clones in its constructor
and GenerateJsonTestSuiteTask constructed the checkouts in its init {} (so
the clone happened at configuration time), while a universal withType<Task>
dependsOn made every task depend on the generator. Since the 84 generated
test files are git-tracked, the clone is only ever needed to regenerate
them. Defer constructing the checkouts and generator into the @TaskAction
so the clone happens only when the task runs, compute the @OutputDirectory
from the source root + package alone (no checkout), and wire the generator
solely onto the test-compile tasks (compileTestKotlinJvm/Js, hence
jvmTest/jsTest/allTests/check) instead of every task. Downstream consumers
building assemble / -x test no longer pay the clone; local and CI test runs
still get a fresh, verified suite.

Switch the standalone Node/TS projects under tooling/ (language-server-protocol
and the lsp-clients workspace + monaco demos) from npm to pnpm so node_modules is
hardlinked from pnpm's shared global store instead of being a full per-checkout
copy. A second checkout's lsp-clients install now costs ~10M of real disk (df
free-space delta) for a node_modules with ~374M apparent size, versus ~364M for a
cold install.

These projects are viable for pnpm precisely because they are NOT Kotlin-managed:
they carry only the `base` Gradle plugin and run installs through plain
PixiExecTask wrappers, so the package manager is a free choice. The Kotlin-compiled
JS (build/js, kotlin-js-store/yarn.lock) is untouched and out of scope.

node-linker=hoisted keeps node_modules flat, in the same layout npm produces,
so esbuild/vite/vitest/vsce/@vscode/test-electron all see what they expect
while still hardlinking from the store. Under hoisted, file: directory deps are
PACKED per the files/exports allowlist (not symlinked to source), which actually
improves the demos' "third-party consumer" fidelity over npm's symlink — a demo
now sees only what a real downstream `npm install` would ship. The demos are kept
OUT of the pnpm workspace (installed with --ignore-workspace) so that packed
consumption is preserved; listing them would relink @kson/monaco-editor to source.

Packing surfaced a latent gap: the iframe demo serves @kson/monaco-editor/dist-iframe
but dist-iframe was missing from monaco's files allowlist, so it never actually
shipped (npm's symlink hid this). Added dist-iframe to files so the iframe assets
ship for real downstream consumers too.

pnpm is added to the lsp-clients and language-server-protocol pixi envs (not
kson-lib's, which those builds never use) so `pixiw run pnpm` resolves
reproducibly. Per the JSON-nit preference, the only new non-JSON files are the
pnpm-mandated pnpm-workspace.yaml and pnpm-lock.yaml; build-script approvals live
in package.json via pnpm.onlyBuiltDependencies. Gradle task names are unchanged.

Also corrects a stale command in language-server-protocol/README.md (npm run
build -> pnpm run compile): the referenced `build` script never existed, so the
old line would have failed.

Adds a pnpm global-store cache (keyed on both pnpm-lock.yaml files) so CI
reuses the store across runs, and fixes the existing VS Code cache. The VS
Code cache is repointed to the electron download directories that
vscodeTestCache.ts actually uses (the OS defaults under ~/.cache/vscode-test
and ~/Library/Caches/vscode-test, plus the web variant), dropping the
now-empty .vscode-test path left behind when the download moved to a shared
OS cache dir. Its key is bumped to v2 because CircleCI caches are immutable
per key, so the changed paths would otherwise never be saved. config.yml is
generated from config.kson by transpileCircleCiConfigTask; both are committed
together.

The desktop test (runNodeTests) left VS Code's --user-data-dir at its
default in-repo .vscode-test/user-data, whose IPC socket path overflows
macOS's 103-char AF_UNIX limit in deeply nested checkouts (e.g. git
worktrees), failing the run with `listen EINVAL`. Point it at a short
os.tmpdir() path, mirroring runExtension.ts, so the test runs regardless
of checkout depth. CI is unaffected (paths are short there already).
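
To make the socket-length problem concrete, here is a small sketch under stated assumptions. The helper names, the socket file name `1.0-main.sock`, and the example directories are all hypothetical. The only figure taken from this PR is the 103-char limit.

```typescript
// Limit quoted in this PR for AF_UNIX socket paths on macOS.
const MAX_SOCKET_PATH_BYTES = 103;

// Count UTF-8 bytes without relying on Node or DOM globals.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff) { bytes += 4; i++; } // surrogate pair
    else bytes += 3;
  }
  return bytes;
}

// True when a socket created at this path would fit within the limit.
function socketPathFits(socketPath: string): boolean {
  return utf8ByteLength(socketPath) <= MAX_SOCKET_PATH_BYTES;
}

// Build a short user-data dir under the temp dir (hypothetical naming).
function shortUserDataDir(tmpDir: string, name: string): string {
  return `${tmpDir}/${name}`;
}

// Hypothetical deep worktree path versus a short temp path.
const deepDir =
  "/Users/dev/src/worktrees/kson-cache-experiments/tooling/lsp-clients/vscode/.vscode-test/user-data";
const shortDir = shortUserDataDir("/tmp", "kson-vsc");
const deepFits = socketPathFits(`${deepDir}/1.0-main.sock`);
const shortFits = socketPathFits(`${shortDir}/1.0-main.sock`);
```

In a real test runner the temp directory would come from `os.tmpdir()`, as the commit message describes; it is passed in here only to keep the sketch free of imports.
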

The full clone pulled ~268M per checkout (mostly JSONTestSuite's .git history
and unused parsers/ binaries) only to use ~5M of it: test_parsing from
JSONTestSuite and tests from JSON-Schema-Test-Suite. A sparse + treeless
(--filter=blob:none) + shallow (--depth 1) git fetch now pulls only those
subdirectories, taking each checkout from ~268M to ~5M (JSONTestSuite 251M->2.0M,
JSON-Schema-Test-Suite 5.8M->3.4M) and leaving the generated tests byte-identical.
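
One plausible shape for the sparse + treeless + shallow fetch is sketched below as a list of git invocations. This is not the command sequence CleanGitCheckout actually runs, which the PR does not show. The function names, the placeholder URL, and the placeholder SHA are hypothetical, and fetching a bare SHA at depth 1 assumes the server permits it.

```typescript
// Hypothetical description of what to fetch.
interface SparseFetchPlan {
  url: string;
  sha: string;
  sparsePaths: string[]; // e.g. ["test_parsing"] or ["tests"]
}

// Anchor a directory as /dir/ so a non-cone pattern matches only at the repo root.
function anchorSparsePath(p: string): string {
  const trimmed = p.replace(/^\/+|\/+$/g, "");
  return `/${trimmed}/`;
}

// Build the git argument lists; running them is left to the caller.
function sparseFetchCommands(plan: SparseFetchPlan): string[][] {
  return [
    ["git", "init"],
    ["git", "remote", "add", "origin", plan.url],
    // Restrict the working tree to the anchored directories (non-cone patterns).
    ["git", "sparse-checkout", "set", "--no-cone", ...plan.sparsePaths.map(anchorSparsePath)],
    // Shallow (one commit) and treeless-of-blobs: blobs are fetched only on demand.
    ["git", "fetch", "--depth", "1", "--filter=blob:none", "origin", plan.sha],
    ["git", "checkout", "--detach", "FETCH_HEAD"],
  ];
}

const exampleCommands = sparseFetchCommands({
  url: "https://example.invalid/JSONTestSuite.git", // placeholder URL
  sha: "0000000000000000000000000000000000000000", // placeholder SHA
  sparsePaths: ["test_parsing"],
});
```

Setting the sparse patterns before the checkout is what keeps the on-demand blob fetch limited to the listed directories.
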

Implemented as an optional sparse mode on CleanGitCheckout: passing sparsePaths
switches it from the JGit full clone to a system-git-CLI sparse fetch, while the
default (empty) keeps the existing JGit behavior untouched. The two suite classes
pass their subdir and keep the same constructor signatures and checkoutDir, so the
generator and its wiring are unchanged. The sparse paths are anchored (/path/)
because the sparse checkout runs in non-cone mode, where an unanchored pattern matches
at any depth; anchoring means a nested dir of the same name upstream can never
be pulled in. Like the JGit path, the sparse mode honors the clean-checkout
contract: it verifies the working tree with `git status` and throws
DirtyRepoException on a modified checkout rather than silently reusing or nuking
it; only a clean checkout that is missing, broken, or at the wrong SHA is
refetched.
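
The clean-checkout contract above can be summarized as a small decision function. This is a sketch only: the `CheckoutState` shape and `decideCheckoutAction` are hypothetical, and the ordering of the checks (unusable repo first, then dirty, then SHA) is an assumption about how the described behavior would be implemented.

```typescript
// Mirrors the exception named in the commit message; this class body is illustrative.
class DirtyRepoException extends Error {
  constructor(message: string) {
    super(message);
    this.name = "DirtyRepoException";
  }
}

// Hypothetical snapshot of an existing checkout directory.
interface CheckoutState {
  exists: boolean;
  isValidRepo: boolean; // false when the directory is broken or not a git repo
  isDirty: boolean;     // as reported by `git status`
  headSha: string | null;
}

type CheckoutAction = "reuse" | "refetch";

function decideCheckoutAction(state: CheckoutState, wantedSha: string): CheckoutAction {
  // A missing or broken checkout cannot hold local edits worth protecting.
  if (!state.exists || !state.isValidRepo) return "refetch";
  // Never silently reuse or delete a modified working tree.
  if (state.isDirty) {
    throw new DirtyRepoException("checkout has local modifications");
  }
  // Clean: reuse only when it is already at the wanted commit.
  return state.headSha === wantedSha ? "reuse" : "refetch";
}
```

Checking for a dirty tree before comparing SHAs means a modified checkout at the wrong commit still fails loudly instead of being overwritten.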