H264

Concept#

  • SODB: String of data bits -> The most primitive encoded data, before any byte alignment.
  • RBSP: Raw byte sequence payload -> SODB followed by the RBSP trailing bits (a single "1" bit, then as many "0" bits as needed) so that the payload ends on a byte boundary.
  • EBSP: Extended byte sequence payload -> RBSP with emulation prevention bytes (0x03) inserted. The reason is: when NALUs are written in the Annex B byte stream format, a start code (StartCodePrefix) must precede each NALU. The 4-byte form, 0x00000001, is used for the first NALU of a frame and for parameter sets; otherwise the 3-byte form, 0x000001, is sufficient. To keep the NALU body from imitating a start code, the encoder inserts a 0x03 byte whenever two consecutive 0x00 bytes would be followed by a byte in the range 0x00 to 0x03. The decoder removes any 0x03 that follows two 0x00 bytes. This is known as the emulation prevention operation.
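The emulation prevention step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names `rbsp_to_ebsp` and `ebsp_to_rbsp` are made up for this note, and the special case of an RBSP that ends in a zero byte is not handled.

```python
def rbsp_to_ebsp(rbsp: bytes) -> bytes:
    """Insert 0x03 after two zero bytes when the next byte is 0x00..0x03."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes written so far
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def ebsp_to_rbsp(ebsp: bytes) -> bytes:
    """Drop every 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


if __name__ == "__main__":
    raw = bytes.fromhex("0000012500000000")
    escaped = rbsp_to_ebsp(raw)
    print(escaped.hex(" "))
    print(ebsp_to_rbsp(escaped) == raw)
```

After escaping, the body can no longer contain the byte pattern 00 00 01, so a parser scanning for start codes will not be misled by payload data.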

The functionality of H.264 is divided into two layers: Video Coding Layer (VCL) and Network Abstraction Layer (NAL)

VCL data is the compressed encoded video data sequence. The VCL data must be encapsulated into NAL units before it can be transmitted or stored.

The encoded video sequence of H.264 consists of a series of NAL units, each containing an RBSP, as shown in Table 1. Coded slices (including data partition slices and IDR slices) are VCL NAL units, while the rest (parameter sets, SEI, delimiters, end-of-sequence markers, and so on) are non-VCL NAL units. A typical RBSP unit sequence is shown in Figure 2.

2_1656856049435_0

Each unit is transmitted as an independent NAL unit. The information header of the unit (one byte) defines the type of the RBSP unit, while the rest of the NAL unit is RBSP data.

NAL Unit#

Each NAL unit is a variable-length byte string with a defined syntax: a one-byte header (indicating the data type) followed by an integer number of payload bytes. A NAL unit can carry an encoded slice, an A/B/C type data partition, or a sequence or picture parameter set.

The NALU header consists of one byte, and its syntax is as follows:

NAL units are transmitted in order according to the RTP sequence number. The header byte has three fields: F, the forbidden bit, is the most significant bit (1 bit); R (also written NRI), the importance indication, occupies the next 2 bits; and T, the payload data type, occupies the low 5 bits. Specifically:

3_1656856064741_0

  1. The NALU type field is 5 bits wide and can therefore encode 32 values. Values 1-12 are defined by the original H.264 specification, while values 24-31 are left unspecified by H.264 for use by other specifications. The RTP payload specification uses some of the latter to define packet aggregation and fragmentation, while the remaining values are reserved by H.264.
  2. The importance indication bit is used to mark the importance of a NAL unit during reconstruction; the higher the value, the more important it is. A value of 0 indicates that this NAL unit is not used for prediction and can be discarded by the decoder without error propagation; a value greater than 0 indicates that this NAL unit is to be used for drift-free reconstruction, and the higher the value, the greater the impact of losing this NAL unit.
  3. The forbidden bit has a default value of 0 in encoding. When the network detects a bit error in this unit, it can be set to 1 so that the receiver discards this unit, mainly to adapt to different types of network environments (such as a combination of wired and wireless environments).

Common frame header data for H.264:

00 00 00 01 67 (SPS)

00 00 00 01 68 (PPS)

00 00 00 01 65 (IDR frame)

00 00 00 01 61 (P frame)

The bytes 67, 68, 65, and 61 above (and others such as 41) are NALU header bytes. Each one packs three fields:

F: Forbidden bit, 0 indicates normal, 1 indicates error, generally 0

NRI: Importance level, 11 indicates very important.

TYPE: Indicates what type this NALU is.

See the table below, from which it can be seen that 7 corresponds to the sequence parameter set (SPS), 8 corresponds to the picture parameter set (PPS), 5 represents a slice of an IDR picture (an I frame), and 1 represents a slice of a non-IDR picture.

4_1656856114733_0

Thus, it can be seen that both 61 and 41 are actually P frames (type value is 1), but with different importance levels (their NRI values are binary 11 and binary 10 respectively).
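As a small illustration of the F / NRI / TYPE split, the header byte can be decoded with shifts and masks. The names `NalHeader` and `parse_nal_header` are hypothetical helpers for this note; the field names follow the H.264 syntax element names.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # F, 1 bit
    nal_ref_idc: int         # NRI, 2 bits
    nal_unit_type: int       # TYPE, 5 bits


def parse_nal_header(byte: int) -> NalHeader:
    """Split the one-byte NALU header into its three fields."""
    return NalHeader((byte >> 7) & 0x01, (byte >> 5) & 0x03, byte & 0x1F)


if __name__ == "__main__":
    for b in (0x67, 0x68, 0x65, 0x61, 0x41):
        print(f"0x{b:02X} ->", parse_nal_header(b))
```

Running it on 0x61 and 0x41 gives nal_unit_type 1 for both, with nal_ref_idc 3 and 2 respectively, matching the analysis above.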

H264 (Introduction to NAL and I Frame Judgment)#

h264_1656856123091_0

We will continue to analyze the data corresponding to the stream in the topmost figure layer by layer. The byte immediately after each 00 00 00 01 start code is the NALU header. Converting it to binary, the bits are interpreted from left to right as follows:

(1) The 1st bit is the forbidden bit; a value of 1 indicates a syntax error.

(2) The 2nd and 3rd bits are the reference level.

(3) The 4th to 8th bits are the NAL unit type.

For example, after 00000001, there are 67, 68, and 65.

The binary code for 0x67 is:

0110 0111

Bits 4-8 are 00111, converted to decimal 7, referring to the second figure: 7 corresponds to the sequence parameter set SPS.

The binary code for 0x68 is:

0110 1000

Bits 4-8 are 01000, converted to decimal 8, referring to the second figure: 8 corresponds to the picture parameter set PPS.

The binary code for 0x65 is:

0110 0101

Bits 4-8 are 00101, converted to decimal 5, referring to the second figure: 5 corresponds to the IDR picture slice (I frame).

Therefore, the algorithm to determine whether it is an I frame is:

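(NALU header & 0001 1111) == 5, i.e., (NALU header & 31) == 5; for example, 0x65 & 31 = 5. Strictly speaking, this identifies an IDR slice; I slices that belong to non-IDR pictures are carried with type 1 and are not caught by this check.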
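Putting the start code split and the type check together, a rough sketch might look like this. `split_annexb` and `is_idr` are hypothetical names, and the splitter assumes that a NAL unit never ends in a 0x00 byte, so trailing zeros can be treated as part of the next 4-byte start code.

```python
import re

START_CODE = re.compile(b"\x00\x00\x01")  # also matches the tail of 00 00 00 01


def split_annexb(stream: bytes) -> list[bytes]:
    """Return the NAL units (header byte + payload) without start codes."""
    nalus = []
    matches = list(START_CODE.finditer(stream))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(stream)
        # Strip the leading zero of a following 4-byte start code.
        nalu = stream[m.end():end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
    return nalus


def is_idr(nalu: bytes) -> bool:
    """True when the NAL unit type (low 5 bits of the header) is 5."""
    return (nalu[0] & 0x1F) == 5


if __name__ == "__main__":
    # Made-up payload bytes; only the header bytes 67/68/65/61 matter here.
    stream = bytes.fromhex("000000016742 0000000168CE 0000016588 00000001619A")
    for nalu in split_annexb(stream):
        print(f"type={nalu[0] & 0x1F} idr={is_idr(nalu)}")
```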

RTP Packaging and Sending H264 Detailed Explanation#

RFC 3984 (since obsoleted by RFC 6184) is the specification of the RTP payload format for H.264 video streams.

H264 Stream Structure

5_1656856166759_0

Single NALU#

This is the simplest case: the media data directly follows the 12-byte RTP header. Single NAL unit packets placed into the RTP stream must follow the decoding order of the NAL units. Single NAL unit mode is generally used for NALUs whose length is smaller than the MTU. A raw H.264 NAL unit typically consists of three parts: [Start Code] [NALU Header] [NALU Payload], where the Start Code marks the beginning of a NAL unit and must be "00 00 00 01" or "00 00 01". The NALU header is only one byte, and the rest is the content of the NAL unit.

When packaging, remove the "00 00 01" or "00 00 00 01" start code, and package the other data into the RTP packet.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |F|NRI|  type   |                                               |
  +-+-+-+-+-+-+-+-+                                               |
  |                                                               |
  |               Bytes 2..n of a Single NAL unit                 |
  |                                                               |
  |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                               :...OPTIONAL RTP padding        |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

For example, if an H.264 NALU is as follows:

[00 00 00 01 67 42 A0 1E 23 56 0E 2F …]

This is a sequence parameter set NAL unit. [00 00 00 01] is the four-byte start code, 67 is the NALU header, and 42 starts the NALU content.

Encapsulated into an RTP packet, it will be as follows:

[RTP Header] [67 42 A0 1E 23 56 0E 2F]

That is, just remove the 4-byte start code.
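A minimal sketch of this packaging step is shown below. The helper names are hypothetical, and the payload type 96, the sequence number, the timestamp, and the SSRC are placeholder values picked for illustration; in a real session they come from the session setup and the sender's clock.

```python
import struct


def strip_start_code(nalu: bytes) -> bytes:
    """Remove a leading 4-byte or 3-byte Annex B start code, if present."""
    for start_code in (b"\x00\x00\x00\x01", b"\x00\x00\x01"):
        if nalu.startswith(start_code):
            return nalu[len(start_code):]
    return nalu


def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int = 96, marker: bool = False) -> bytes:
    """Build a fixed 12-byte RTP header (V=2, no padding, extension, or CSRC)."""
    byte0 = 0x80
    byte1 = (0x80 if marker else 0x00) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def single_nalu_packet(annexb_nalu: bytes, seq: int,
                       timestamp: int, ssrc: int) -> bytes:
    """RTP header followed by the NALU header byte and payload."""
    return rtp_header(seq, timestamp, ssrc) + strip_start_code(annexb_nalu)


if __name__ == "__main__":
    sps = bytes.fromhex("000000016742A01E23560E2F")
    packet = single_nalu_packet(sps, seq=1, timestamp=90000, ssrc=0x1234)
    print(len(packet), packet[12:].hex(" "))
```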

Packet Aggregation#

When the length of the NALU is particularly small, several NAL units can be encapsulated in one RTP packet.

To cope with the large differences in MTU between wired and wireless networks, the H.264 RTP payload format defines a packet aggregation strategy:

  • STAP-A: Aggregated NALUs have the same timestamp, without DON (decoding order number);
  • STAP-B: Aggregated NALUs have the same timestamp, with DON;
  • MTAP16: Aggregated NALUs have different timestamps, with the timestamp difference recorded in 16 bits;
  • MTAP24: Aggregated NALUs have different timestamps, with the timestamp difference recorded in 24 bits;

During packet aggregation, the RTP timestamp is the minimum of all NALU timestamps.

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|             one or more aggregation units                     |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3.  RTP payload format for aggregation packets

STAP-A Example:

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 1 Data                           |
:                                                               :
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               | NALU 2 Size                   | NALU 2 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 2 Data                           |
:                                                               :
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 7.  An example of an RTP packet including an STAP-A
containing two single-time aggregation units
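The STAP-A layout above (one aggregation header byte, then a 16-bit size in network byte order before each NALU) can be sketched as follows. `build_stap_a` and `parse_stap_a` are hypothetical names; the header rule used here (F is the OR of all F bits, NRI is the maximum NRI among the aggregated NALUs) is my reading of the RTP payload specification, and the SPS/PPS bytes in the demo are made up.

```python
import struct

STAP_A = 24


def build_stap_a(nalus: list[bytes]) -> bytes:
    """Aggregate several small NALUs into one STAP-A payload."""
    f_bit = 0
    nri = 0
    for nalu in nalus:
        f_bit |= nalu[0] & 0x80
        nri = max(nri, nalu[0] & 0x60)
    payload = bytearray([f_bit | nri | STAP_A])
    for nalu in nalus:
        payload += struct.pack("!H", len(nalu)) + nalu  # size, then NALU
    return bytes(payload)


def parse_stap_a(payload: bytes) -> list[bytes]:
    """Recover the individual NALUs from a STAP-A payload."""
    if payload[0] & 0x1F != STAP_A:
        raise ValueError("not a STAP-A payload")
    nalus = []
    pos = 1
    while pos + 2 <= len(payload):
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        nalus.append(payload[pos:pos + size])
        pos += size
    return nalus


if __name__ == "__main__":
    sps = bytes.fromhex("6742A01E")
    pps = bytes.fromhex("68CE3880")
    stap = build_stap_a([sps, pps])
    print(stap.hex(" "))
    print(parse_stap_a(stap) == [sps, pps])
```

Sending SPS and PPS together in one STAP-A is a common use, since both are only a few bytes long.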

FU-A Fragmentation Format#

Larger H.264 NALUs are fragmented before being sent over RTP. When the length of a NALU exceeds the MTU, it must be split into Fragmentation Units (FUs), and each FU-A fragment follows the 12-byte RTP header of its packet.

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FU indicator  |   FU header   |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|                         FU payload                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 14.  RTP payload format for FU-A

The FU indicator has the following format:

+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |
+---------------+

The type field of the FU indicator Type=28 indicates FU-A. The NRI field value must be set according to the NRI field value of the fragmented NAL unit.

The format of the FU header is as follows:

+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|R|  Type   |
+---------------+

S: 1 bit, when set to 1 indicates this is the first fragment of the NALU. When the following FU payload is not the start of the fragmented NAL unit payload, the start bit is set to 0.

E: 1 bit, when set to 1 indicates this is the last fragment of the NALU, i.e., the last byte of the payload is also the last byte of the fragmented NAL unit. When the following FU payload is not the last fragment of the fragmented NAL unit, the end bit is set to 0.

R: 1 bit, reserved bit must be set to 0, the receiver must ignore this bit.

Type: 5 bits, the NAL unit type of the fragmented NAL unit, i.e., the low five bits of its original header. The payload type values are shown in the table below.

Summary of Unit Types and Payload Structures

  Type   Packet     Type name
  ---------------------------------------------------------
  0      undefined  -
  1-23   NAL unit   Single NAL unit packet per H.264
  24     STAP-A     Single-time aggregation packet
  25     STAP-B     Single-time aggregation packet
  26     MTAP16     Multi-time aggregation packet
  27     MTAP24     Multi-time aggregation packet
  28     FU-A       Fragmentation unit
  29     FU-B       Fragmentation unit
  30-31  undefined  -
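A receiver can use this table to decide how to handle each RTP payload by looking at the low five bits of its first byte. The function `payload_kind` below is a hypothetical helper that simply mirrors the table.

```python
PACKET_NAMES = {
    24: "STAP-A",
    25: "STAP-B",
    26: "MTAP16",
    27: "MTAP24",
    28: "FU-A",
    29: "FU-B",
}


def payload_kind(first_payload_byte: int) -> str:
    """Classify an RTP payload by the type field of its first byte."""
    packet_type = first_payload_byte & 0x1F
    if 1 <= packet_type <= 23:
        return "single NAL unit"
    return PACKET_NAMES.get(packet_type, "undefined")


if __name__ == "__main__":
    for first_byte in (0x67, 0x78, 0x7C, 0x00):
        print(f"0x{first_byte:02X} -> {payload_kind(first_byte)}")
```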

Unpacking and Packing

Packing: When the sender fragments an original NAL unit into FU-A packets, the first three bits of the original NAL header are copied into the first three bits of the FU indicator, and the last five bits of the original NAL header are copied into the last five bits of the FU header. The original header byte itself is not included in the FU payload.

Unpacking: When the receiver receives the FU-A fragmented data, it needs to combine all the fragmented packets to restore the original NAL packet. The relationship between the FU-A header and the restored NAL is as follows:

The eight bits of the restored NAL header are composed of the first three bits of the FU indicator and the last five bits of the FU header, i.e.:

nal_header = (fu_indicator & 0xe0) | (fu_header & 0x1f)
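The packing and unpacking rules can be sketched end to end as follows. The names `fragment_fu_a` and `reassemble_fu_a` are hypothetical, `max_payload` is an assumed per-packet payload budget (the MTU minus the IP/UDP/RTP headers), and the sketch assumes the fragments arrive complete and in order.

```python
FU_A = 28


def fragment_fu_a(nalu: bytes, max_payload: int) -> list[bytes]:
    """Split one NALU (header byte + payload) into FU-A payloads."""
    if len(nalu) <= max_payload:
        raise ValueError("NALU fits in a single NAL unit packet")
    if max_payload <= 2:
        raise ValueError("max_payload leaves no room for FU data")
    header, body = nalu[0], nalu[1:]
    fu_indicator = (header & 0xE0) | FU_A  # F and NRI from the original header
    chunk = max_payload - 2                # 2 bytes: FU indicator + FU header
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    fragments = []
    for i, piece in enumerate(pieces):
        fu_header = header & 0x1F          # original NAL unit type
        if i == 0:
            fu_header |= 0x80              # S bit on the first fragment
        if i == len(pieces) - 1:
            fu_header |= 0x40              # E bit on the last fragment
        fragments.append(bytes([fu_indicator, fu_header]) + piece)
    return fragments


def reassemble_fu_a(fragments: list[bytes]) -> bytes:
    """Rebuild the original NALU from ordered, complete FU-A payloads."""
    first, last = fragments[0], fragments[-1]
    if not first[1] & 0x80 or not last[1] & 0x40:
        raise ValueError("missing start or end fragment")
    nal_header = (first[0] & 0xE0) | (first[1] & 0x1F)
    return bytes([nal_header]) + b"".join(f[2:] for f in fragments)


if __name__ == "__main__":
    nalu = bytes([0x65]) + bytes(range(10))  # made-up IDR slice payload
    frags = fragment_fu_a(nalu, max_payload=6)
    for frag in frags:
        print(frag.hex(" "))
    print(reassemble_fu_a(frags) == nalu)
```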

libx264#

libx264 Learning Notes

FFmpeg Calls libx264
