[Mp4-tech] [H.264] output timing, bumping process, missing HRD parameters

Thu May 4 14:53:07 EDT 2006

Martin,
You seem to have a good grasp of the basic issue.  Timing information
must be provided by encoders or systems, or fully-proper decoded output
display may be difficult.  Further response in-line.
Best Regards,
Gary Sullivan
+> -----Original Message-----
+> From: mp4-tech-bounces lists.mpegif.org 
+> [mailto:mp4-tech-bounces lists.mpegif.org] On Behalf Of 
+> Martin.Lange sci-worx.com
+> Sent: Thursday, May 04, 2006 12:57 AM
+> To: mp4-tech lists.mpegif.org
+> Subject: [Mp4-tech] [H.264] output timing, bumping process, missing HRD parameters
+> 
+> Hello everyone,
+> 
+> I have a question on H.264 regarding the relationship between output
+> timing and the bumping process in the HRD when no HRD parameters are
+> sent by the encoder:
+> 
+> To be able to display a picture in each time slot, I 
It is not necessarily the case that there should be a new picture to
display in every "time slot", unless fixed_frame_rate_flag is equal to
1.
+> obviously need an initial delay between decoding and displaying the first
+> picture. Now how do I know about this delay when there are no HRD
+> parameters present?
Timing information equivalent to the HRD picture timing data is supposed
to be conveyed to the decoder in some fashion.  Otherwise, a decoder
would, at best, be able to do something like the bumping process and
would have difficulty providing perfect timed-output behavior unless it
has some extra memory capacity.
I believe it is generally agreed that if such timing information is not
provided, decoders may not display some of the output pictures for the
reasons you describe.
+> Is the delay inferred to be equal to max_dec_frame_buffering, which is
+> inferred to be equal to MaxDpbSize when it is not present? If this is
+> the case, I don't see where this is pointed out in the standard.
Perhaps you're looking for this sentence from E.2.1: "When the
max_dec_frame_buffering syntax element is not present, the value of
max_dec_frame_buffering shall be inferred to be equal to MaxDpbSize."
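To make the inference concrete, here is a rough sketch of how a decoder might pick the value it works with. It assumes the Annex A formula of the current edition, MaxDpbSize = Min(1024 * MaxDPB / (PicWidthInMbs * FrameHeightInMbs * 384), 16), with MaxDPB in units of 1024 bytes. The three MaxDPB values are quoted from memory of Table A-1 and should be checked against the standard; the function names are made up for this illustration.

```python
# MaxDPB per level in units of 1024 bytes.
# ASSUMPTION: values quoted from memory of Table A-1; verify before relying on them.
MAX_DPB_KBYTES = {"3": 3037.5, "3.1": 6750.0, "4": 12288.0}


def max_dpb_size(level, pic_width_in_mbs, frame_height_in_mbs):
    """MaxDpbSize in frames for a given level and frame size in macroblocks."""
    # 384 bytes per macroblock: 256 luma + 128 chroma samples (4:2:0, 8 bit).
    bytes_per_frame = pic_width_in_mbs * frame_height_in_mbs * 384
    frames = int(1024 * MAX_DPB_KBYTES[level] // bytes_per_frame)
    return min(frames, 16)


def effective_max_dec_frame_buffering(signalled, level, width_mbs, height_mbs):
    """Use the signalled value if present, otherwise infer MaxDpbSize (E.2.1)."""
    if signalled is not None:
        return signalled
    return max_dpb_size(level, width_mbs, height_mbs)


# 1280x720 at level 3.1 is 80x45 macroblocks; no VUI value signalled.
print(effective_max_dec_frame_buffering(None, "3.1", 80, 45))
```

Note that this only gives the buffer size the decoder must assume. It says nothing about when pictures are to be output.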
+> I will give an example of the problem that can arise when the encoder
+> behaves in a certain way without telling the decoder about it and there
+> is no inferred default value for that delay:
+> 
+> Let's assume max_dec_frame_buffering = 6. The encoder could, however,
+> commit itself for some reason to an output delay of 2, i.e. a frame
+> can be displayed at most 2 decoded frames later. The picture
+> sequence with the following POCs would satisfy this constraint:
+> 0, 3, 2, 1, 5, 4, 6, ... (continuing in some manner that satisfies
+> the mentioned constraint). After the frame with POC 2 has been decoded
+> (which is also the 3rd frame in decoding order in this example), a
+> decoder that knows the delay is 2 could display frame 0 and then
+> display a frame after each decoded frame like this:
+> decoding: 0, 3, 2, 1, 5, 4, 6, ...
+> output:   -, -, 0, 1, 2, 3, 4, 5, 6, ...
+> 
+> Let's now assume the frame with POC 4 is a non-reference frame. After
+> frame 6 has been decoded, frame 4 will be displayed. Since it has then
+> been output and is not a reference frame, it is discarded from the DPB,
+> and a place is free for the frame with POC 6.
+> 
+> A decoder without this information must assume a delay of 6, since the
+> encoder could just as well base its calculations on this "worst case"
+> delay. So it will not be able to output frame 0 until frame 6 has been
+> decoded. But now the bumping process says, of course, that frame 4 must
+> be evicted from the DPB to be able to store frame 6. Thus, the decoder
+> would have to output frames 1 .. 3 in no time.
Time is not really part of the concept of the bumping process decoder
operation.  Neither is display.  For example, "output" does not
necessarily imply display.  Note, for example, that a decoder that only
claims output order conformance is not required to be able to keep up
with the full input rate of the incoming video bitstream.  The feeding
of bits into such a decoder follows a "pull" model, not a "push" model.
Note where the standard says the following: "For output order decoder
conformance, the HSS delivers the bitstream to the DUT "by demand" from
the DUT, meaning that the HSS delivers bits (in decoding order) only
when the DUT requires more bits to proceed with its processing. NOTE -
This means that for this test, the coded picture buffer of the DUT could
be as small as the size of the largest access unit."
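As a rough illustration of why time does not enter into it, here is a much simplified sketch of bumping-style output applied to the example sequence. It is not the normative process: it assumes every frame except POC 4 is a reference frame that stays marked as such, it ignores IDR pictures and memory management commands, and it leaves out the rule that lets a non-reference picture be output directly without being stored. All names are made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class StoredFrame:
    poc: int
    is_reference: bool
    already_output: bool = False


def decode_with_bumping(decode_order, dpb_size):
    """decode_order: list of (poc, is_reference) in decoding order.

    Returns one (decoded_poc, bumped_pocs) pair per decoded frame, where
    bumped_pocs are the frames output to make room before storing it.
    """
    dpb = []
    events = []
    for poc, is_reference in decode_order:
        bumped = []
        # "Bump" only when there is no empty frame buffer.
        while len(dpb) >= dpb_size:
            waiting = [f for f in dpb if not f.already_output]
            if not waiting:
                raise RuntimeError("no frame buffer can be freed")
            first = min(waiting, key=lambda f: f.poc)  # smallest POC goes first
            first.already_output = True
            bumped.append(first.poc)
            # A buffer is emptied once its frame is output and not a reference.
            dpb = [f for f in dpb if f.is_reference or not f.already_output]
        dpb.append(StoredFrame(poc, is_reference))
        events.append((poc, bumped))
    return events


# The example from the question: POC 4 is the only non-reference frame.
EXAMPLE = [(0, True), (3, True), (2, True), (1, True),
           (5, True), (4, False), (6, True)]

for poc, bumped in decode_with_bumping(EXAMPLE, dpb_size=6):
    print(f"decoded POC {poc}: output {bumped}")
```

Under these assumptions nothing is output until POC 6 arrives, and then frames 0 through 4 are all output in one step. The sketch has no clock at all; it only fixes the order in which pictures leave the buffer.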
+> 
+> What is the solution to this problem? Is it that the encoder, when it
+> doesn't send the appropriate information, must take the worst case delay
+> as the basis for its calculations of the DPB state, which would make the
+> above sequence non-conformant with the HRD? If yes, where do I find this
+> agreement in the standard?
The best solution is for the encoder to send something equivalent to
the DPB decoding and output timing information.  Without that, as you
have figured out, a decoder may not be able to provide proper output
behavior with proper output timing unless the decoder includes
sufficient extra memory to hold some extra pictures that are queued for
display.
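For comparison, here is a small sketch of what a decoder could do once it knows the output delay by some means. It assumes the delay is expressed simply as a number of decoded frames (2 in the example from the question), which is a simplification of real picture timing data, and the function name is made up for this illustration.

```python
import heapq


def timed_output(decode_order, output_delay):
    """Pair each decoded POC with the POC output in the same frame period.

    Output starts once `output_delay` frames have been decoded ahead of it;
    from then on the waiting frame with the smallest POC is output each period.
    """
    waiting = []   # min-heap of POCs decoded but not yet output
    schedule = []
    for index, poc in enumerate(decode_order):
        heapq.heappush(waiting, poc)
        if index >= output_delay:
            schedule.append((poc, heapq.heappop(waiting)))
        else:
            schedule.append((poc, None))  # still filling the initial delay
    return schedule


for decoded, shown in timed_output([0, 3, 2, 1, 5, 4, 6], output_delay=2):
    print(f"decoded {decoded}, output {'-' if shown is None else shown}")
```

For the example sequence this gives the output column -, -, 0, 1, 2, 3, 4 from the question: one picture per frame period, with no burst.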
Note that the standard says: "In order to check conformance of a
bitstream using the HRD, all sequence parameter sets and picture
parameter sets referred to in the VCL NAL units, and corresponding
buffering period and picture timing SEI messages shall be conveyed to
the HRD, in a timely manner, either in the bitstream (by non-VCL NAL
units), or by other means not specified in this Recommendation |
International Standard."
Without proper timing information, fully-proper decoder output behavior
may not be possible unless the decoder has some extra memory.
In my opinion it would be acceptable for a real-time decoder that has
received your example bitstream and has not received any timing
information to, for example, not display any of the output frames that
precede frame 4 -- or perhaps to display frame 0 and then frames 5, 6,
etc.  Adding some more memory to the decoder could decrease the number
of frames that are not displayed -- however there may be a limit to how
much effort and cost are warranted to address this scenario, which
shouldn't happen with a good system and encoder design anyway.
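The trade-off between extra memory and undisplayed frames can be sketched as follows. This is only one possible policy, assumed for illustration: the display shows one frame per frame period, can hold a configurable number of extra frames, and discards the oldest pending frames when a burst exceeds what it can hold. The burst of frames 0 to 4 arriving together in the seventh frame period is taken from the example in the question; the function name is made up.

```python
from collections import deque


def display_schedule(bursts, extra_frames):
    """bursts: per frame period, the list of POCs output by the decoder.

    Returns (shown, dropped): the POC displayed in each period (None if
    nothing is available) and the POCs that were never displayed.
    """
    pending = deque()
    shown, dropped = [], []
    for burst in bursts:
        pending.extend(burst)
        # Room for the frame shown now plus `extra_frames` queued behind it.
        while len(pending) > 1 + extra_frames:
            dropped.append(pending.popleft())
        shown.append(pending.popleft() if pending else None)
    return shown, dropped


# Nothing is output for six frame periods, then frames 0..4 arrive at once.
BURSTS = [[], [], [], [], [], [], [0, 1, 2, 3, 4]]

for extra in (0, 2):
    shown, dropped = display_schedule(BURSTS, extra)
    print(f"extra={extra}: shows {shown[-1]}, never displays {dropped}")
```

With no extra memory only frame 4 of the burst is displayed. With room for two extra frames, frames 2, 3 and 4 can still be shown, at the cost of the display running behind the decoder from then on.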
+> 
+> Best regards,
+> Martin Lange