fix: fall back to git blobs API for github files >1MB #3

Open
openrijal wants to merge 2 commits into awecode:main from openrijal:fix/github-large-file-blob-fallback

@openrijal

Summary

  • getGithubJsonFile in server/utils/githubContents.ts currently throws "GitHub response is not a single file with content." whenever the GitHub Contents API returns metadata for a file larger than 1 MB. In that case GitHub responds with type: 'file', a valid sha, and encoding: 'none', but content is an empty string. The Contents API only inlines content for files ≤1 MB; for files between 1 MB and 100 MB the content has to be fetched separately, and this PR does that through the Git Blobs API.
  • This patch keeps the small-file fast path and, when content is missing or encoding === 'none', fetches the blob by sha via GET /repos/{owner}/{repo}/git/blobs/{sha}, which returns base64-encoded content for blobs up to 100 MB. Decoding/parsing remains unchanged.
  • Behavior for files ≤1 MB is identical (no extra request). Files >100 MB are still rejected by GitHub (403) and surfaced as a 4xx error.
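
For reviewers who want the branch logic at a glance, here is a minimal sketch of the fallback described above. It is not the code in server/utils/githubContents.ts. The names `ContentsResponse`, `BlobResponse`, `needsBlobFallback` and `readJsonWithBlobFallback` are hypothetical, and the blob request is injected as a function so the sketch runs without network access.

```typescript
// Hypothetical shapes: only the fields discussed in this PR are modeled.
interface ContentsResponse {
  type: string;
  sha: string;
  size: number;
  encoding?: string;
  content?: string;
}

interface BlobResponse {
  encoding: string;
  content: string;
}

// Injected so the sketch is testable offline. A real caller would issue
// GET /repos/{owner}/{repo}/git/blobs/{sha} here.
type BlobFetcher = (sha: string) => Promise<BlobResponse>;

// True when the Contents API returned metadata only (file larger than 1 MB).
function needsBlobFallback(body: ContentsResponse): boolean {
  return !body.content || body.encoding === "none";
}

async function readJsonWithBlobFallback(
  body: ContentsResponse,
  fetchBlob: BlobFetcher,
): Promise<unknown> {
  if (body.type !== "file") {
    throw new Error("GitHub response is not a single file.");
  }

  let base64Content: string | undefined;
  if (!needsBlobFallback(body)) {
    // Small-file fast path: content was inlined, no extra request is made.
    base64Content = body.content;
  } else {
    // Large-file path: fetch the same object by sha from the Blobs API.
    const blob = await fetchBlob(body.sha);
    if (blob.encoding !== "base64") {
      throw new Error(`Unexpected blob encoding: ${blob.encoding}`);
    }
    base64Content = blob.content;
  }

  if (!base64Content) {
    throw new Error("File is empty.");
  }
  // Node's base64 decoder ignores the line breaks GitHub inserts in content.
  const text = Buffer.from(base64Content, "base64").toString("utf8");
  return JSON.parse(text);
}
```

Because the decision lives in one predicate, the small-file path stays a single request and only oversized files pay for the second round trip.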

Why

A JSON-array resource backed by a single file (e.g. a posts/blogs registry) grows over time. Once it crosses 1 MB, the admin list/detail/update endpoints all fail with a confusing 500. This is reproducible against any repo with a >1 MB JSON file registered as a json-admin resource.

Test plan

  • Verified against a 5.6 MB blogs.json in a real repo: Contents API returns encoding: 'none', fallback hits Blobs API, decode yields valid parsed JSON (719 array rows).
  • Existing small-file path unchanged — first branch (base64Content truthy and encoding !== 'none') is taken without any extra API call.
  • CI / lint on this repo.

Notes

  • Writes (putGithubJsonFile) still use the Contents API PUT, which accepts files up to ~100 MB but is slower than the Git Data API for very large payloads. Out of scope for this PR; can be revisited if write-side timeouts appear.

The GitHub Contents API only returns the `content` field for files
of 1 MB or less. For files between 1 MB and 100 MB it responds with
`type: 'file'` and `encoding: 'none'` but an empty `content`, causing
the existing guard to throw
`GitHub response is not a single file with content.`

When `content` is missing, fetch the blob by sha via the Git Blobs API
(`GET /repos/{owner}/{repo}/git/blobs/{sha}`), which returns base64
content for blobs up to 100 MB, then decode as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread server/utils/githubContents.ts Outdated
@openrijal

Author

Pushed 34ddea3 on top of the original fix. Three orthogonal improvements, all opt-in:

1. maxBytes / warnAtBytes size guardrail

New per-resource storage.maxBytes throws a 413 with the actual byte count when a read or write exceeds it. warnAtBytes logs console.warn once per path. Enforced against the Contents API's body.size on reads and Buffer.byteLength(payload.content, 'base64') on writes.

Use it to put a hard ceiling on resources that should never grow unbounded (e.g. maxBytes: 50 * 1024 * 1024 to refuse anything that would push past GitHub's recommended 50 MB), or a soft warning at the 1 MB Blobs-fallback boundary.
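
A rough sketch of how such a guardrail could look, for illustration only. `enforceSizeLimits`, `base64ByteLength` and `httpError` are hypothetical names (the last one stands in for h3's createError), and warning at `>=` the threshold is an assumption, since the comment above does not say whether the boundary is inclusive.

```typescript
interface SizeLimits {
  maxBytes?: number;
  warnAtBytes?: number;
}

// Stand-in for createError so the sketch has no framework dependency.
function httpError(statusCode: number, message: string): Error & { statusCode: number } {
  return Object.assign(new Error(message), { statusCode });
}

// Module-scoped set so each path is warned about at most once per process.
const warnedPaths = new Set<string>();

function enforceSizeLimits(
  locator: string,
  bytes: number,
  limits: SizeLimits,
  warn: (message: string) => void = (m) => console.warn(m),
): void {
  if (limits.maxBytes !== undefined && bytes > limits.maxBytes) {
    // Hard ceiling: report the actual byte count in the 413.
    throw httpError(413, `${locator}: ${bytes} bytes exceeds maxBytes (${limits.maxBytes}).`);
  }
  // Assumption: the soft threshold is inclusive.
  if (
    limits.warnAtBytes !== undefined &&
    bytes >= limits.warnAtBytes &&
    !warnedPaths.has(locator)
  ) {
    warnedPaths.add(locator);
    warn(`${locator}: ${bytes} bytes reached warnAtBytes (${limits.warnAtBytes}).`);
  }
}

// On writes, measure the decoded size of the base64 payload, not the string length.
function base64ByteLength(content: string): number {
  return Buffer.byteLength(content, "base64");
}
```

On reads the `bytes` argument would come from the Contents API's `body.size`; on writes it would come from `base64ByteLength(payload.content)`.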

2. Opt-in ETag read cache (disabled by default)

An in-process LRU (cap 64) keyed by owner/repo@ref:path stores parsed JSON alongside the response ETag. When enabled, subsequent reads send If-None-Match, and if the file is unchanged GitHub returns 304 Not Modified, which does not count against the primary rate limit. Successful and conflicting writes always invalidate the cached entry. That invalidation runs unconditionally, whether or not caching is enabled, so leaving the gate off is safe.

Enable globally:

```ts
// nuxt.config.ts
export default defineNuxtConfig({
  runtimeConfig: { autoadmin: { github: { cacheReads: true } } },
})
```

Or per-resource via `storage.cacheReads: true`. Default is `false` at every layer. I chose opt-in because module-scoped state is undesirable in multi-tenant isolates and a stale read could hide a manual repo edit.
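
To make the cache behavior concrete, here is a minimal sketch of an ETag-keyed LRU with a conditional read. It is an illustration under assumptions, not the implementation in this commit: `EtagLruCache`, `cacheKey`, `cachedRead` and the injected `ConditionalFetch` type are hypothetical names, and the fetcher is expected to send `If-None-Match` when given an ETag.

```typescript
interface CacheEntry<T> {
  etag: string;
  value: T;
}

class EtagLruCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly capacity: number;

  constructor(capacity = 64) {
    this.capacity = capacity;
  }

  get(key: string): CacheEntry<T> | undefined {
    const entry = this.entries.get(key);
    if (entry) {
      // Map keeps insertion order, so re-inserting marks the key as most recent.
      this.entries.delete(key);
      this.entries.set(key, entry);
    }
    return entry;
  }

  set(key: string, entry: CacheEntry<T>): void {
    this.entries.delete(key);
    this.entries.set(key, entry);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  // Called after every write, successful or conflicting.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

function cacheKey(owner: string, repo: string, ref: string, path: string): string {
  return `${owner}/${repo}@${ref}:${path}`;
}

// Hypothetical fetcher contract: passes the ETag as If-None-Match when present.
type ConditionalFetch<T> = (
  ifNoneMatch?: string,
) => Promise<{ status: 200; etag: string; value: T } | { status: 304 }>;

async function cachedRead<T>(
  cache: EtagLruCache<T>,
  key: string,
  fetchFile: ConditionalFetch<T>,
): Promise<T> {
  const cached = cache.get(key);
  const res = await fetchFile(cached?.etag);
  if (res.status === 304 && cached) {
    return cached.value; // Unchanged upstream: reuse the parsed JSON.
  }
  if (res.status === 200) {
    cache.set(key, { etag: res.etag, value: res.value });
    return res.value;
  }
  throw new Error(`${key}: received 304 without a cached entry.`);
}
```

The 304 branch is the only one that reads from the cache, so a manual repo edit is picked up on the next read as long as GitHub reports a new ETag.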

3. Locator-prefixed error messages + narrowing fix

  • Every createError now embeds owner/repo:path[@ref] and, where relevant, the file size or short blob sha. Saves a round-trip to logs when something fails in production.
  • Replaced the post-fallback base64Content! non-null assertion (raised in review feedback) with an explicit narrowing check, and added a clear 422 "File ... is empty" error path so empty files don't surface as the misleading "not valid JSON".
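
The two bullets above could be sketched roughly as follows. This is a hedged illustration, not the committed code: `FileLocator`, `formatLocator`, `decodeJsonFile` and `httpError` are hypothetical names, `httpError` stands in for createError, and the exact message wording and the status code used for invalid JSON are assumptions.

```typescript
interface FileLocator {
  owner: string;
  repo: string;
  path: string;
  ref?: string;
}

// Renders owner/repo:path, with @ref appended only when a ref is set.
function formatLocator(loc: FileLocator): string {
  const base = `${loc.owner}/${loc.repo}:${loc.path}`;
  return loc.ref ? `${base}@${loc.ref}` : base;
}

// Stand-in for createError so the sketch has no framework dependency.
function httpError(statusCode: number, message: string): Error & { statusCode: number } {
  return Object.assign(new Error(message), { statusCode });
}

function decodeJsonFile(loc: FileLocator, base64Content: string | undefined): unknown {
  // Explicit narrowing instead of a non-null assertion: after this check the
  // compiler knows base64Content is a non-empty string.
  if (typeof base64Content !== "string" || base64Content.trim() === "") {
    throw httpError(422, `File ${formatLocator(loc)} is empty.`);
  }
  const text = Buffer.from(base64Content, "base64").toString("utf8");
  try {
    return JSON.parse(text);
  } catch {
    // Assumption: invalid JSON is also reported as a 422.
    throw httpError(422, `File ${formatLocator(loc)} is not valid JSON.`);
  }
}
```

Keeping the locator in one formatter means every error message identifies the file the same way, which is what makes the messages searchable in logs.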

Type changes (additive only)

  • JsonStorageConfig (github variant): adds optional maxBytes, warnAtBytes, cacheReads.
  • GithubJsonRepositoryOptions: same three optional fields plumbed through.
  • getGithubJsonFile / putGithubJsonFile: new optional trailing opts parameter.
  • runtimeConfig.autoadmin.github: adds optional cacheReads.

All existing callers continue to compile unchanged.

Follow-up

Docs PR #5 covers the storage-limit operational story but doesn't yet describe maxBytes/warnAtBytes/cacheReads. Happy to amend that PR with a config table once you're directionally OK with this commit.

Builds on the Blobs API fallback with three operational improvements:

1. `maxBytes` / `warnAtBytes` size guardrail. New per-resource
   `storage.maxBytes` throws a 413 with the actual byte count when a
   read or write exceeds it; `warnAtBytes` logs `console.warn` once per
   path. Enforced against the Contents API's `body.size` on reads and
   `Buffer.byteLength(payload.content, 'base64')` on writes.

2. Locator-prefixed error messages. Every `createError` now embeds
   `owner/repo:path[@ref]` and, where relevant, the file size or short
   blob sha. This matters in serverless logs where the original
   request context is otherwise lost.

3. Explicit narrowing for `base64Content`. Replaces the post-fallback
   non-null assertion with a typed check, and surfaces empty-file
   decode as a clear 422 instead of the previous misleading
   "not valid JSON".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
openrijal force-pushed the fix/github-large-file-blob-fallback branch from 34ddea3 to 8752d3e on May 20, 2026 22:54
@openrijal

Author

Reworked at 8752d3e (force-pushed). On reflection the ETag caching is a meaningful behavior change that deserves its own review, so I moved it to a separate stacked PR: #6.

This PR (#3) now contains only:

  • Blobs API fallback for files >1 MB (the original fix).
  • maxBytes / warnAtBytes size guardrail.
  • Locator-prefixed error messages.
  • Narrowing fix + explicit empty-file 422 path.

Cache moved to #6, stacked on top of this branch. Suggested merge order: this first, then #6.
