fix: preserve utf8 decoding across response chunks #5414

Open
pupuking723 wants to merge 2 commits into nodejs:main from pupuking723:fix/utf8-response-decoder-boundaries

@pupuking723

This relates to...

Fixes #5002

Rationale

BodyReadable#setEncoding() only stored the encoding on the readable state. That made streamed consumers decode each buffered chunk independently, so an incomplete UTF-8 sequence at a response chunk boundary produced replacement characters.

The internal body consumers still need raw Buffer chunks so body.text() and body.json() can aggregate and decode the full payload.
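A minimal sketch of the difference described above, using only Node.js core `Buffer` APIs. The payload and the position of the chunk boundary are made up for illustration; this is not undici code.

```javascript
// '€' (U+20AC) is three bytes in UTF-8: e2 82 ac.
const payload = Buffer.from('{"price":"5€"}', 'utf8')

// Hypothetical chunk boundary placed just before the last byte of '€'.
const split = payload.indexOf(0xac)
const chunks = [payload.subarray(0, split), payload.subarray(split)]

// Decoding each chunk on its own cannot complete the split sequence,
// so the decoder substitutes U+FFFD replacement characters.
const perChunk = chunks.map((chunk) => chunk.toString('utf8')).join('')

// Aggregating the raw bytes first and decoding once keeps the character intact.
const aggregated = Buffer.concat(chunks).toString('utf8')

console.log(perChunk.includes('\uFFFD')) // true
console.log(JSON.parse(aggregated).price) // 5€
```

This is why consumers that aggregate the whole payload want the chunks to stay as raw `Buffer`s until the end.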

Changes

Features

N/A

Bug Fixes

  • Keep raw Buffer chunks for internal body consumption.
  • Use a StringDecoder when BodyReadable is consumed through the Readable API after setEncoding().
  • Add a regression test for async iteration over a response body where a 3-byte UTF-8 character spans response chunks.
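The streaming side of the fix can be sketched roughly as follows, assuming a `StringDecoder` sits between the raw `Buffer` chunks and the consumer. `decodeChunks` and `main` are hypothetical names for illustration and are not taken from the patch; the input mirrors the regression scenario of a 3-byte character spanning two chunks.

```javascript
const { StringDecoder } = require('node:string_decoder')

// Hypothetical helper: decodes raw Buffer chunks for a streaming consumer.
// The decoder holds back an incomplete multi-byte sequence until the
// remaining bytes arrive in a later chunk.
async function * decodeChunks (chunks, encoding = 'utf8') {
  const decoder = new StringDecoder(encoding)
  for await (const chunk of chunks) {
    const text = decoder.write(chunk)
    if (text.length > 0) yield text
  }
  // Flush whatever is still buffered once the input ends.
  const rest = decoder.end()
  if (rest.length > 0) yield rest
}

async function main () {
  // 'a€b' is 61 e2 82 ac 62; the boundary falls after the first byte of '€'.
  const bytes = Buffer.from('a€b', 'utf8')
  const raw = [bytes.subarray(0, 2), bytes.subarray(2)]

  const pieces = []
  for await (const piece of decodeChunks(raw)) pieces.push(piece)
  return pieces
}

main().then((pieces) => console.log(pieces.join('')))
```

The raw chunks are left untouched, so an aggregating consumer could still concatenate them and decode once.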

Breaking Changes and Deprecations

N/A

Status

  • I have read and agreed to the Developer's Certificate of Origin
  • Tested
  • Benchmarked (optional)
  • Documented
  • Review ready
  • In review
  • Merge ready

Verification:

  • node --test --test-name-pattern "request multibyte (json|text) with setEncoding|async iteration and setEncoding" test/client-request.js
  • npx borp --timeout 180000 -p test/client-request.js
  • npm run lint
  • npm run test:typescript
  • git diff --check

Note: npm ci --ignore-scripts was used locally because npm install exited with "Exit handler never called!" after dependency extraction. The test/lint commands above were run against the installed dependency tree. Local Node.js is v22.17.0 while the package currently requires >=22.19.0.

Signed-off-by: 王胜 <2318857637@qq.com>
@metcoder95 metcoder95 requested a review from ronag June 12, 2026 08:59
@codecov-commenter

Codecov Report

❌ Patch coverage is 84.78261% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.24%. Comparing base (ac5394b) to head (85fc024).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lib/api/readable.js | 84.78% | 7 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5414      +/-   ##
==========================================
- Coverage   93.25%   93.24%   -0.02%     
==========================================
  Files         110      110              
  Lines       36825    36868      +43     
==========================================
+ Hits        34340    34376      +36     
- Misses       2485     2492       +7     

@mcollina mcollina left a comment

Member

I'm relatively certain that Readable implements this logic. I think the problem is in the override of setEncoding; we should also call super.setEncoding.
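A rough sketch of what that suggestion could look like, using a plain `Readable` subclass. `SketchBody`, `requestedEncoding`, and `readAll` are hypothetical stand-ins, not undici's `BodyReadable`, and the sketch does not address whether delegating to `super.setEncoding` is compatible with the raw `Buffer` requirement of `body.text()` and `body.json()` mentioned in the description.

```javascript
const { Readable } = require('node:stream')

// Hypothetical stand-in for a body stream; not undici's BodyReadable.
class SketchBody extends Readable {
  _read () {}

  setEncoding (encoding) {
    // Subclass-specific bookkeeping (an assumption made for this sketch).
    this.requestedEncoding = encoding
    // Delegate to Readable so it installs its own StringDecoder.
    return super.setEncoding(encoding)
  }
}

async function readAll (chunks) {
  const body = new SketchBody()
  body.setEncoding('utf8')
  for (const chunk of chunks) body.push(chunk)
  body.push(null) // signal end of stream

  let text = ''
  for await (const piece of body) text += piece
  return text
}

// '€' split across two pushes: [61 e2] then [82 ac 62].
const bytes = Buffer.from('a€b', 'utf8')
readAll([bytes.subarray(0, 2), bytes.subarray(2)]).then((text) => console.log(text))
```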

Signed-off-by: 王胜 <2318857637@qq.com>

Successfully merging this pull request may close these issues.

setEncoding('utf8') on response body corrupts multi-byte UTF-8 characters at chunk boundaries
