fix: fall back to git blobs API for github files >1MB by openrijal · Pull Request #3 · awecode/autoadmin

openrijal · 2026-05-20T14:38:37Z

Summary

`getGithubJsonFile` in `server/utils/githubContents.ts` currently throws `GitHub response is not a single file with content.` whenever the GitHub Contents API returns metadata for a file larger than 1 MB. In that case GitHub responds with `type: 'file'`, a valid `sha`, and `encoding: 'none'`, but `content` is an empty string. The Contents API only inlines content for files ≤1 MB; for files of 1–100 MB you must use the Git Blobs API.
This patch keeps the small-file fast path and, when `content` is missing or `encoding === 'none'`, fetches the blob by `sha` via `GET /repos/{owner}/{repo}/git/blobs/{sha}`, which returns base64 content for blobs up to 100 MB. Decoding and parsing remain unchanged.
Behavior for files ≤1 MB is identical (no extra request). Files >100 MB are still rejected by GitHub (403) and surfaced as a 4xx error.
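
For readers skimming the conversation, the read path described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `readJsonWithBlobFallback`, the injected `get` function, and the two response interfaces are hypothetical names, and `ref` handling is omitted.

```ts
// Hypothetical response shapes; only the fields discussed in the summary are modelled.
interface ContentsResponse {
  type: string
  sha: string
  encoding?: string
  content?: string
}

interface BlobResponse {
  content: string
  encoding: string
}

// Injected so the sketch does not depend on any particular HTTP client (assumption).
type GithubGet = (path: string) => Promise<unknown>

async function readJsonWithBlobFallback(
  get: GithubGet,
  owner: string,
  repo: string,
  path: string,
): Promise<unknown> {
  const body = (await get(`/repos/${owner}/${repo}/contents/${path}`)) as ContentsResponse
  if (Array.isArray(body) || body.type !== 'file' || !body.sha) {
    throw new Error('GitHub response is not a single file with content.')
  }

  // Fast path: files up to 1 MB arrive with inline base64 content.
  let base64Content = body.content
  if (!base64Content || body.encoding === 'none') {
    // Fallback: fetch the same file by sha through the Git Blobs API.
    const blob = (await get(`/repos/${owner}/${repo}/git/blobs/${body.sha}`)) as BlobResponse
    base64Content = blob.content
  }
  if (!base64Content) {
    throw new Error(`${owner}/${repo}:${path} has no content after the blob fallback`)
  }

  // Node's base64 decoder ignores the line breaks GitHub inserts into the payload.
  return JSON.parse(Buffer.from(base64Content, 'base64').toString('utf8'))
}
```

Injecting `get` keeps the second request visible in tests: a small file should trigger exactly one call, a large file exactly two.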

Why

A JSON-array resource backed by a single file (e.g. a posts/blogs registry) grows over time. Once it crosses 1 MB the admin list/detail/update endpoints all fail with a confusing 500. This is reproducible against any repo with a >1MB JSON file registered as a json-admin resource.

Test plan

Verified against a 5.6 MB blogs.json in a real repo: Contents API returns encoding: 'none', fallback hits Blobs API, decode yields valid parsed JSON (719 array rows).
Existing small-file path unchanged — first branch (base64Content truthy and encoding !== 'none') is taken without any extra API call.
CI / lint on this repo.

Notes

Writes (putGithubJsonFile) still use the Contents API PUT, which accepts files up to ~100 MB but is slower than the Git Data API for very large payloads. Out of scope for this PR; can be revisited if write-side timeouts appear.

The GitHub Contents API only returns the `content` field for files up to 1 MB. For files between 1 and 100 MB it responds with `type: 'file'` and `encoding: 'none'` but an empty `content`, causing the existing guard to throw `GitHub response is not a single file with content.`

When `content` is missing, fetch the blob by sha via the Git Blobs API (`GET /repos/{owner}/{repo}/git/blobs/{sha}`), which returns base64 content up to 100 MB, then decode as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

openrijal · 2026-05-20T22:48:16Z

Pushed 34ddea3 on top of the original fix. Three orthogonal improvements, all opt-in:

1. `maxBytes` / `warnAtBytes` size guardrail

New per-resource `storage.maxBytes` throws a 413 with the actual byte count when a read or write exceeds it. `warnAtBytes` logs a `console.warn` once per path. Both are enforced against the Contents API's `body.size` on reads and `Buffer.byteLength(payload.content, 'base64')` on writes.

Use `maxBytes` to put a hard ceiling on resources that should never grow unbounded (e.g. `maxBytes: 50 * 1024 * 1024` to refuse anything that would push past GitHub's recommended 50 MB), or `warnAtBytes` for a soft warning at the 1 MB Blobs-fallback boundary.
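
A minimal sketch of how such a guardrail could behave, for illustration only. `enforceSizeLimits`, `SizeLimitError`, and `base64PayloadBytes` are hypothetical names; the commit itself raises errors through `createError`, which is swapped here for a plain `Error` subclass so the snippet stays self-contained. Treating `warnAtBytes` as a strict "greater than" threshold is an assumption.

```ts
interface SizeLimits {
  maxBytes?: number
  warnAtBytes?: number
}

// Stand-in for an HTTP error carrying a 413 status (assumption: the real code uses createError).
class SizeLimitError extends Error {
  statusCode = 413
}

// Remembers which locators have already produced a warning, so each path warns once.
const warnedLocators = new Set<string>()

function enforceSizeLimits(
  locator: string,
  bytes: number,
  limits: SizeLimits,
  warn: (message: string) => void = console.warn,
): void {
  if (limits.maxBytes !== undefined && bytes > limits.maxBytes) {
    throw new SizeLimitError(`${locator}: ${bytes} bytes exceeds maxBytes (${limits.maxBytes})`)
  }
  if (limits.warnAtBytes !== undefined && bytes > limits.warnAtBytes && !warnedLocators.has(locator)) {
    warnedLocators.add(locator)
    warn(`${locator}: ${bytes} bytes exceeds warnAtBytes (${limits.warnAtBytes})`)
  }
}

// Write side: measure the decoded size of the base64 payload, as described above.
function base64PayloadBytes(base64Content: string): number {
  return Buffer.byteLength(base64Content, 'base64')
}
```

On reads the `bytes` argument would come from the Contents API's `body.size`; on writes it would come from `base64PayloadBytes(payload.content)`.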

2. Opt-in ETag read cache (disabled by default)

An in-process LRU (cap 64) keyed by `owner/repo@ref:path` stores parsed JSON alongside the response ETag. When enabled, subsequent reads send `If-None-Match`; if the file is unchanged, GitHub returns 304 without consuming rate-limit budget. Successful and conflicting writes always invalidate the cached entry. That invalidation runs unconditionally, so leaving the gate off is genuinely safe.

Enable globally:

```ts
// nuxt.config.ts
runtimeConfig: { autoadmin: { github: { cacheReads: true } } }
```

Or per-resource via `storage.cacheReads: true`. The default is `false` at every layer. Opt-in was chosen because module-scoped state is undesirable in multi-tenant isolates and a stale read could hide a manual repo edit.
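
The caching behavior described in this section can be sketched roughly like this. It is an assumption-laden illustration rather than the code in the commit: `EtagLruCache`, `cachedRead`, and the injected `ConditionalGet` function are hypothetical names, and the response shape is simplified to a status, an optional ETag, and already-parsed JSON.

```ts
interface CacheEntry {
  etag: string
  value: unknown
}

// Small LRU built on Map, which iterates keys in insertion order.
class EtagLruCache {
  private entries = new Map<string, CacheEntry>()
  private capacity: number

  constructor(capacity = 64) {
    this.capacity = capacity
  }

  get(key: string): CacheEntry | undefined {
    const entry = this.entries.get(key)
    if (entry === undefined) return undefined
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key)
    this.entries.set(key, entry)
    return entry
  }

  set(key: string, entry: CacheEntry): void {
    this.entries.delete(key)
    this.entries.set(key, entry)
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value
      if (oldest !== undefined) this.entries.delete(oldest)
    }
  }

  // Called after every successful or conflicting write.
  invalidate(key: string): void {
    this.entries.delete(key)
  }
}

interface ConditionalResponse {
  status: number
  etag?: string
  json?: unknown
}

// Injected HTTP function (assumption), so the sketch needs no network access.
type ConditionalGet = (url: string, headers: Record<string, string>) => Promise<ConditionalResponse>

async function cachedRead(
  cache: EtagLruCache,
  key: string, // e.g. "owner/repo@ref:path"
  url: string,
  get: ConditionalGet,
): Promise<unknown> {
  const hit = cache.get(key)
  const headers: Record<string, string> = hit ? { 'If-None-Match': hit.etag } : {}
  const res = await get(url, headers)
  // 304 means the cached copy is still current.
  if (res.status === 304 && hit) return hit.value
  if (res.etag) cache.set(key, { etag: res.etag, value: res.json })
  return res.json
}
```

Keeping `invalidate` separate from the read path mirrors the point above: writes can drop the entry whether or not caching is enabled.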

3. Locator-prefixed error messages + narrowing fix

Every `createError` now embeds `owner/repo:path[@ref]` and, where relevant, the file size or short blob sha. This saves a round-trip to the logs when something fails in production.
Replaced the post-fallback `base64Content!` non-null assertion (raised in review feedback) with an explicit narrowing check, and added a clear 422 `File ... is empty` path so empty files don't surface as the misleading "not valid JSON".
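
As a rough illustration of the two changes in this section, the sketch below builds the locator prefix and replaces a non-null assertion with an explicit check. `formatLocator`, `decodeJsonFile`, and `StorageError` are hypothetical names, and every status code other than the 422 for empty files is an assumption.

```ts
interface FileLocator {
  owner: string
  repo: string
  path: string
  ref?: string
}

// Produces "owner/repo:path" or "owner/repo:path@ref".
function formatLocator(l: FileLocator): string {
  return `${l.owner}/${l.repo}:${l.path}${l.ref ? `@${l.ref}` : ''}`
}

// Stand-in for createError (assumption), so the snippet is self-contained.
class StorageError extends Error {
  statusCode: number
  constructor(statusCode: number, message: string) {
    super(message)
    this.statusCode = statusCode
  }
}

function decodeJsonFile(locator: FileLocator, base64Content: string | undefined): unknown {
  const where = formatLocator(locator)
  // Explicit narrowing instead of `base64Content!`.
  if (typeof base64Content !== 'string') {
    throw new StorageError(502, `${where}: GitHub returned no content`)
  }
  const text = Buffer.from(base64Content, 'base64').toString('utf8')
  if (text.trim() === '') {
    throw new StorageError(422, `File ${where} is empty`)
  }
  try {
    return JSON.parse(text)
  } catch {
    throw new StorageError(422, `File ${where} is not valid JSON`)
  }
}
```

After the `typeof` check the compiler treats `base64Content` as `string`, so no assertion is needed, and the empty-file case is reported before `JSON.parse` can produce a less specific error.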

Type changes (additive only)

`JsonStorageConfig` (github variant): adds optional `maxBytes`, `warnAtBytes`, `cacheReads`.
`GithubJsonRepositoryOptions`: same three optional fields plumbed through.
`getGithubJsonFile` / `putGithubJsonFile`: new optional trailing `opts` parameter.
`runtimeConfig.autoadmin.github`: adds optional `cacheReads`.

All existing callers continue to compile unchanged.

Follow-up

Docs PR #5 covers the storage-limit operational story but doesn't yet describe maxBytes/warnAtBytes/cacheReads. Happy to amend that PR with a config table once you're directionally OK with this commit.

Builds on the Blobs API fallback with three operational improvements:

1. `maxBytes` / `warnAtBytes` size guardrail. New per-resource `storage.maxBytes` throws a 413 with the actual byte count when a read or write exceeds it; `warnAtBytes` logs `console.warn` once per path. Enforced against the Contents API's `body.size` on reads and `Buffer.byteLength(payload.content, 'base64')` on writes.
2. Locator-prefixed error messages. Every `createError` now embeds `owner/repo:path[@ref]` and, where relevant, the file size or short blob sha. This matters in serverless logs where the original request context is otherwise lost.
3. Explicit narrowing for `base64Content`. Replaces the post-fallback non-null assertion with a typed check, and surfaces empty-file decode as a clear 422 instead of the previous misleading "not valid JSON".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

openrijal · 2026-05-20T22:59:42Z

Reworked at 8752d3e (force-pushed). On reflection the ETag caching is a meaningful behavior change that deserves its own review, so I moved it to a separate stacked PR: #6.

This PR (#3) now contains only:

Blobs API fallback for files >1 MB (the original fix).
maxBytes / warnAtBytes size guardrail.
Locator-prefixed error messages.
Narrowing fix + explicit empty-file 422 path.

Cache moved to #6, stacked on top of this branch. Suggested merge order: this first, then #6.

This was referenced May 20, 2026

docs: operational gaps for JSON admin (storage limits, serverless deploy, error reference) #4

Open

docs: storage limits, Cloudflare deploy, and storage error reference #5

Open

openrijal force-pushed the fix/github-large-file-blob-fallback branch from 34ddea3 to 8752d3e Compare May 20, 2026 22:54

openrijal mentioned this pull request May 20, 2026

feat: opt-in ETag read cache for github storage #6

Open

openrijal wants to merge 2 commits into awecode:main from openrijal:fix/github-large-file-blob-fallback
