fix: fall back to git blobs API for github files >1MB #3
The GitHub Contents API only returns the `content` field for files
of 1 MB or less. For files between 1 and 100 MB it responds with `type: 'file'`
and `encoding: 'none'` but an empty `content`, causing the existing
guard to throw `GitHub response is not a single file with content.`

When `content` is empty or missing, fetch the blob by sha via the Git Blobs API
(`GET /repos/{owner}/{repo}/git/blobs/{sha}`), which returns base64
content up to 100 MB, then decode as before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
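
The fallback described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `FetchJson`, `fetchBase64Content`, and `decodeBase64Json` are hypothetical names, the fetcher is injected so the sketch runs against a stub, and authentication headers and `ref` handling are left out.

```typescript
// Only the fields this sketch reads from the two GitHub responses.
interface ContentsFile {
  type: string;
  sha: string;
  encoding?: string;
  content?: string;
}

interface GitBlob {
  encoding: string;
  content: string;
}

// Hypothetical injected fetcher: resolves to the parsed JSON body of a GET.
type FetchJson = (url: string) => Promise<unknown>;

async function fetchBase64Content(
  fetchJson: FetchJson,
  owner: string,
  repo: string,
  path: string,
): Promise<string> {
  const api = `https://api.github.com/repos/${owner}/${repo}`;
  const meta = (await fetchJson(`${api}/contents/${path}`)) as Partial<ContentsFile>;

  // A directory listing comes back as an array, so `type` is undefined there.
  if (meta.type !== 'file' || !meta.sha) {
    throw new Error(`${owner}/${repo}:${path} is not a single file`);
  }

  // Small file: the Contents API inlined the content, no extra call needed.
  if (meta.content && meta.encoding !== 'none') {
    return meta.content;
  }

  // Large file: content is empty, so fetch the blob by sha instead.
  const blob = (await fetchJson(`${api}/git/blobs/${meta.sha}`)) as Partial<GitBlob>;
  if (blob.encoding !== 'base64' || typeof blob.content !== 'string') {
    throw new Error(`${owner}/${repo}:${path} blob ${meta.sha} has no base64 content`);
  }
  return blob.content;
}

// Decoding is the same for both paths. Node's base64 decoder skips the
// line breaks GitHub inserts into the encoded content.
function decodeBase64Json(base64Content: string): unknown {
  return JSON.parse(Buffer.from(base64Content, 'base64').toString('utf8'));
}
```

Keeping the decode step separate from the fetch step means both the inline path and the blob path feed the same parser, which matches the "decode as before" intent of the change.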
Pushed 1.
Builds on the Blobs API fallback with three operational improvements:

1. `maxBytes` / `warnAtBytes` size guardrail. New per-resource `storage.maxBytes` throws a 413 with the actual byte count when a read or write exceeds it; `warnAtBytes` logs `console.warn` once per path. Enforced against the Contents API's `body.size` on reads and `Buffer.byteLength(payload.content, 'base64')` on writes.
2. Locator-prefixed error messages. Every `createError` now embeds `owner/repo:path[@ref]` and, where relevant, the file size or short blob sha. This matters in serverless logs where the original request context is otherwise lost.
3. Explicit narrowing for `base64Content`. Replaces the post-fallback non-null assertion with a typed check, and surfaces empty-file decode as a clear 422 instead of the previous misleading "not valid JSON".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
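
As a rough illustration of the size guardrail and locator-prefixed errors described above, here is a minimal sketch. It is not the code from this PR: `HttpError` stands in for whatever `createError` produces, and `StorageLimits`, `formatLocator`, and `enforceSizeLimit` are hypothetical names. The sketch assumes a strict "greater than" comparison for both thresholds and keys the warn-once set on the full locator.

```typescript
// Hypothetical stand-in for the error object `createError` would build.
class HttpError extends Error {
  readonly statusCode: number;
  constructor(statusCode: number, message: string) {
    super(message);
    this.statusCode = statusCode;
  }
}

// Hypothetical shape of the per-resource `storage` limits.
interface StorageLimits {
  maxBytes?: number;
  warnAtBytes?: number;
}

// Builds the `owner/repo:path[@ref]` prefix used in error messages.
function formatLocator(owner: string, repo: string, path: string, ref?: string): string {
  return `${owner}/${repo}:${path}${ref ? `@${ref}` : ''}`;
}

// Locators that have already produced a warning, so each warns only once.
const warnedLocators = new Set<string>();

function enforceSizeLimit(locator: string, bytes: number, limits: StorageLimits): void {
  if (limits.maxBytes !== undefined && bytes > limits.maxBytes) {
    // 413 carries the actual byte count so the log line is self-explanatory.
    throw new HttpError(413, `${locator}: ${bytes} bytes exceeds maxBytes (${limits.maxBytes})`);
  }
  if (
    limits.warnAtBytes !== undefined &&
    bytes > limits.warnAtBytes &&
    !warnedLocators.has(locator)
  ) {
    warnedLocators.add(locator);
    console.warn(`${locator}: ${bytes} bytes exceeds warnAtBytes (${limits.warnAtBytes})`);
  }
}

// On writes, the decoded size can be measured without decoding the payload.
function base64ByteLength(base64Content: string): number {
  return Buffer.byteLength(base64Content, 'base64');
}
```

On reads the `bytes` argument would come from the Contents API's `size` field; on writes it would come from `base64ByteLength(payload.content)`, so the same check covers both directions.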
34ddea3 to 8752d3e
Reworked at This PR (#3) now contains only:
Cache moved to #6, stacked on top of this branch. Suggested merge order: this first, then #6.
Summary
`getGithubJsonFile` in `server/utils/githubContents.ts` currently throws `GitHub response is not a single file with content.` whenever the GitHub Contents API returns metadata for a file larger than 1 MB. In that case GitHub responds with `type: 'file'`, a valid `sha`, and `encoding: 'none'`, but `content` is an empty string. The Contents API only inlines `content` for files ≤1 MB; for files 1–100 MB you must use the Git Blobs API.

When `content` is missing or `encoding === 'none'`, the function fetches the blob by sha via `GET /repos/{owner}/{repo}/git/blobs/{sha}`, which returns base64 content up to 100 MB. Decoding/parsing remains unchanged.

Why
A JSON-array resource backed by a single file (e.g. a posts/blogs registry) grows over time. Once it crosses 1 MB, the admin list/detail/update endpoints all fail with a confusing 500. This is reproducible against any repo with a >1 MB JSON file registered as a json-admin resource.

Test plan
- `blogs.json` in a real repo: Contents API returns `encoding: 'none'`, fallback hits Blobs API, decode yields valid parsed JSON (719 array rows).
- Small-file path (`base64Content` truthy and `encoding !== 'none'`) is taken without any extra API call.

Notes
- Writes (`putGithubJsonFile`) still use the Contents API PUT, which accepts files up to ~100 MB but is slower than the Git Data API for very large payloads. Out of scope for this PR; can be revisited if write-side timeouts appear.