Add unintrusive h264 DTS extractor #61#124
    let sps = match self.spsp.as_mut() {
        Some(sps) => sps,
        None => {
            let sps_rbsp = h264_reader::rbsp::decode_nal(sps).map_err(DecodeSps)?;
It's a bit inconvenient to parse the SPS again when we already parsed it internally in Depacketizer. I'm guessing you don't want to expose SeqParameterSet in the public API surface. Maybe we could find a way to access it from a private method on Stream or VideoFrame?
I'm struggling with this one. Maybe it will help to talk it through. I'm starting to understand mediamtx's algorithm for getting the difference in numbers of frames. Arguably it's good enough although a couple things seem sketchy to me:
fwiw, it doesn't support a few cases at all:
Then it comes up with a DTS sequence which matches the original order and mostly tries to space frames out, but in one case ends up piling them 1 ms apart. I think it could then exceed the bit rate limits in H.264 Table A-1 where a more evenly spaced DTS wouldn't. Stepping back a bit, I think the intent is basically to come up with decode timestamps without ever having to buffer frames, as opposed to, say, gstreamer's https://github.com/GStreamer/gstreamer/tree/main/subprojects/gst-plugins-bad/gst/codectimestamper which does delay frames. But...I'm wondering if that difference matters:
I was wondering if Retina did the right thing with backwards timestamps anyway. I guess in some cases Retina callers may want to come up with timestamps in a completely different way (see scottlamb/moonfire-nvr#322). In that case I don't know if this extractor helps as much as it should. Maybe having it at a higher level, after you plug in your own timestamp, would be better then. (But on the other hand, I don't know how easy it will be to come up with those timestamps without understanding the decode order, so again I'm not quite sure if the interface is what it'd need to be for that use case. Speaking of: MediaMTX has some of these things figured out. Do they handle these cameras with totally broken timestamps, and if so, how?)
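To make the trade-off concrete, here is a deliberately naive sketch of a zero-buffering DTS assigner. It is not MediaMTX's algorithm and not part of this PR; the `NaiveDts` type, its `reorder_delay` and `step` parameters, and the clamping rules are all assumptions made up for illustration. Timestamps are in arbitrary clock ticks (for example 90 kHz).

```rust
/// Hypothetical zero-buffering DTS assigner (illustration only,
/// NOT the algorithm used by MediaMTX or by this PR).
struct NaiveDts {
    prev_dts: Option<i64>,
    /// Assumed fixed offset subtracted from the first PTS.
    reorder_delay: i64,
    /// Assumed nominal spacing between consecutive frames.
    step: i64,
}

impl NaiveDts {
    fn new(reorder_delay: i64, step: i64) -> Self {
        Self { prev_dts: None, reorder_delay, step }
    }

    /// Takes the PTS of the next frame in decode (arrival) order
    /// and returns a DTS for it without looking at later frames.
    fn assign(&mut self, pts: i64) -> i64 {
        let dts = match self.prev_dts {
            // First frame: start early enough to leave room for reordering.
            None => pts - self.reorder_delay,
            Some(prev) => {
                // Prefer even spacing, but try not to decode after presentation.
                let spaced = (prev + self.step).min(pts);
                // Keep DTS strictly increasing, even if that squeezes
                // two frames to a single tick apart.
                spaced.max(prev + 1)
            }
        };
        self.prev_dts = Some(dts);
        dts
    }
}
```

With PTS values `[0, 9000, 3000, 6000]` in decode order, `reorder_delay = 3000` and `step = 3000`, this yields evenly spaced DTS values `[-3000, 0, 3000, 6000]`. With `reorder_delay = 0` the same input yields `[0, 3000, 3001, 6000]`: the third frame is squeezed one tick behind the second, which is the kind of pile-up described above.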
My streamer wraps every frame in its own mp4 fragment. How many frames does the gstreamer algorithm actually buffer? If the stream only has a frame rate of 3 fps then that could be several seconds of delay.
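As a rough illustration of that concern, the added latency is just the number of held-back frames divided by the frame rate. The helper name `buffering_delay` and the frame counts are hypothetical; how many frames gstreamer actually buffers is not established in this thread.

```rust
use std::time::Duration;

/// Extra latency caused by holding back `buffered_frames` frames
/// of a constant-rate stream running at `fps` frames per second.
/// The frame count is an assumption, not gstreamer's actual value.
fn buffering_delay(buffered_frames: u32, fps: f64) -> Duration {
    Duration::from_secs_f64(f64::from(buffered_frames) / fps)
}
```

For example, holding back 6 frames costs 2 seconds at 3 fps, but only 200 ms at 30 fps.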
I couldn't find any code that deals with identical RTP timestamps, but I did find some interesting stuff: https://github.com/bluenviron/gortsplib/blob/7dbc38520457792ce32f9a3c13a4388d36d471ea/client.go#L2423 https://mediamtx.org/docs/usage/route-absolute-timestamps bluenviron/mediamtx#1002 (comment)
Fair enough. I guess this is probably what Moonfire would do too. Even though the B-frame won't be displayed until after some sent-later frame, it's still better to start transferring it to the client before receiving that later frame from the camera, and likewise to feed it into the decoder as soon as possible. (when you're doing low-latency serving with
Browser streaming without dts extraction would be a huge deal. We probably wouldn't even need to buffer one frame to calculate the frame duration. Do you think decode timestamps need to be stored on disk or can they always be generated on demand? A timeline feature would need to fetch and decode GOPs in any arbitrary order. Does gstreamer's timestamper work on GOPs independently or does the output differ if it receives multiple GOPs? I'm curious if you'll run into any fun issues with WebCodecs if you switch. Do you plan to keep the old streamer around as a fallback for Firefox? Do you still want this PR? A zero latency dts extractor would still be useful for anyone that wants to convert RTSP streams to SRT or RTMP.
Same but it's all dumb flat files: https://codeberg.org/SentryShot/sentryshot/src/branch/master/src/recording
Would it be much harder to seek to the nearest start and end of the GOPs and then feed all the timestamps into gstreamer's dts extractor?
I'm thinking it might create a stutter between the GOPs.
Sure
Can't they do that already?

Rust implementation of MediaMTX's magic DTS extractor.
RTSP streams don't keep track of when video frames need to be decoded, and there doesn't seem to be any official specification for how to do it in real-time.
I didn't document how the algorithm works because I don't really know and it'd probably be better to document it upstream anyway.
I'd try to document the API, but it'd probably take longer to review my bad writing than to do it yourself, and I don't want to waste your time.