http.BodyWriter: handle EOF in chunkedSendFile, simplify#24864
Conversation
| sendfile_limit.min(limit), | ||
| ) catch |err| switch (err) { | ||
| error.EndOfStream => { | ||
| chunked.* = .{ .chunk_len = 0 }; |
There was a problem hiding this comment.
shouldn't this be based on observing how many bytes were read from file_reader? something like const before = file_reader.logicalPos(); ...; const after = file_reader.logicalPos(); const n = after - before;
and then we still would need to do the w.consume(n) logic
There was a problem hiding this comment.
Hmm, I guess that would potentially give us more robust assertions but I'm not sure there would be any behavioral difference. My understanding of the sendFile interface is that EndOfStream is only returned by the writer when all bytes have been read from the reader.
Perhaps I've misunderstood that though? I'll defer to your judgement here, I'm still learning the new interfaces (which is why I thought it would be nice to fix some bugs to enhance my understanding).
There was a problem hiding this comment.
before you change it, let me think about it a bit more...
There was a problem hiding this comment.
My understanding of the sendFile interface is that EndOfStream is only returned by the writer when all bytes have been read from the reader.
That's correct:
pub const FileError = error{
...
/// Reached the end of the file being read.
EndOfStream,
...
};The question is whether chunk_len might be greater than the size of the file. We can see it is calculated like this:
const data_len = Writer.countSendFileLowerBound(w.end, file_reader, limit) orelse {
// If the file size is unknown, we cannot lower to a `sendFile` since we would
// have to flush the chunk header before knowing the chunk length.
return error.Unimplemented;
};...which is based on reported size from stat combined with any already-buffered data.
In theory, larger chunk sizes could be emitted if the stream implementation somehow knew there would be more data after this, but in practice, it does not. So I think your direct setting of chunk_len to zero is indeed behaviorally equivalent.
However, still 2 problems:
- this is nontrivial and so should be asserted that these values equal each other
- the consume() call still needs to happen to account for consumed buffered data
There was a problem hiding this comment.
Hmmmm, thinking about this even more... why does sendFileHeader have EndOfStream in the error set? I think that should be handled in that function and then return how many bytes were read, since it already supports short reads.
That's the difference between the "limit" functions and "exact" functions - "limit" functions can return short and do not EndOfStream. "exact" functions take an integer and return EndOfStream if that many bytes cannot be read.
There was a problem hiding this comment.
Hmm, the doc comment for the sendFile function in the vtable seems to suggest that error.EndOfStream was not the intended way to signal stream end originally:
/// Number of bytes returned may be zero, which does not indicate stream
/// end. A subsequent call may return nonzero, or signal end of stream via
/// `error.WriteFailed`. Caller may check `file_reader` state
/// (`File.Reader.atEnd`) to disambiguate between a zero-length read or
/// write, and whether the file reached the end.
I'm going to experiment with deleting EndOfStream from Writer.FileError.
0e0277c to
885e737
Compare
With these changes, the `zig std` command now works again and doesn't trigger assertion failures or mess up the chunked transfer encoding.
|
@andrewrk Let me know what you think of this version, it now fixes all known regressions with the I explored deleting EndOfStream from all error sets in Writer but backed out the change for this branch as the patch got too invasive and it did not seem like a good idea to lump in with a bugfix. Tests pass with |
andrewrk
left a comment
There was a problem hiding this comment.
I'm not convinced this code is in its final form, but this is clearly a step in the right direction, so let's merge it!
| chunked.chunk_len = chunk_len - n; | ||
| return w.consume(n); | ||
| }, | ||
| 1 => unreachable, |
There was a problem hiding this comment.
is this really unreachable? it looks like it could be reached from line 970 setting chunk_len to 1, or line 987 setting chunk_len to 1.
There was a problem hiding this comment.
Yes it is unreachable. It is in fact an assertion that we do not violate the protocol by e.g. writing only \n rather than \r\n or writing more data than specified in the chunk header. I do see a way to make this code a bit more clear though.
There was a problem hiding this comment.
I opened #24884 to further improve the clarity of this code and document the assertions better.
| .end, .none, .content_length => return out.flush(), | ||
| .chunked => |*chunked| switch (chunked.*) { | ||
| .offset => |offset| { | ||
| const chunk_len = out.end - offset - chunk_header_template.len; | ||
| if (chunk_len > 0) { | ||
| writeHex(out.buffer[offset..][0..chunk_len_digits], chunk_len); | ||
| chunked.* = .{ .chunk_len = 2 }; | ||
| } else { | ||
| out.end = offset; | ||
| chunked.* = .{ .chunk_len = 0 }; | ||
| } | ||
| try out.flush(); | ||
| }, | ||
| .chunk_len => return out.flush(), | ||
| }, |
There was a problem hiding this comment.
to give you some background for why this was done: the idea was that when flushing a chunked http transfer request, it should finish the chunk and send the end-of-chunk trailer and start a new chunk. Looking at it with fresh eyes, I'm not sure how valuable that is, although I can think of some reasons for it.
anyway I think the simplification that you brought is nice, so let's run with that.
There was a problem hiding this comment.
Hmm, this offset code in flush was actually unreachable before my commit.
I don't think it's possible to implement what you describe either. We can't finish the chunk early without writing the full amount of data specified in the chunk header, and we don't have access to data that the user of the API started but did not finish writing here in flush().
http.BodyWriter: handle EOF in chunkedSendFile, simplify
With these changes, the
zig stdcommand now works again and doesn't trigger assertion failures or mess up the chunked transfer encoding.Closes #24760
Closes #24859