Introduction to RTP Protocol#
Real-time Transport Protocol (RTP) is a network transport protocol developed by the IETF's Audio/Video Transport working group. It was first published as RFC 1889 in 1996 and was later superseded by RFC 3550, which specifies it as an Internet standard.
RTP defines a standard packet format for delivering audio and video over the Internet. It was initially designed as a multicast protocol but has since been used in many unicast applications. RTP is commonly used in streaming media systems (in conjunction with RTSP), video conferencing, and Push-to-Talk systems (in conjunction with H.323 or SIP), making it a technical foundation for the IP telephony industry. RTP is used together with the RTP Control Protocol (RTCP) and typically runs on top of the User Datagram Protocol (UDP).
RTP and RTCP#
- The data transmission protocol RTP carries the real-time data. The information it provides includes timestamps (for synchronization), sequence numbers (for packet loss and reordering detection), and the payload type (to identify the encoding format of the data).
- The control protocol RTCP is used for QoS feedback and synchronization of media streams. Compared to RTP, RTCP occupies very little bandwidth; RFC 3550 recommends limiting it to about 5% of the session bandwidth.
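To illustrate how a receiver might use the sequence number for loss and reordering detection, here is a minimal sketch. The `seq_tracker_t` type and `seq_tracker_update` function are hypothetical names invented for this example; they are not part of RTP or of any library. The sketch assumes the 16-bit sequence number wraps around from 65535 to 0 and treats any packet that is not ahead of the highest one seen as late.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver-side state; not defined by RTP itself. */
typedef struct {
    int      started;   /* 0 until the first packet has been seen */
    uint16_t last_seq;  /* highest sequence number seen so far */
} seq_tracker_t;

/*
 * Returns the number of packets missing between the previous highest
 * sequence number and `seq` (0 if none are missing), or -1 if `seq`
 * is a late packet (reordered or duplicated).
 */
int seq_tracker_update(seq_tracker_t *t, uint16_t seq)
{
    assert(t != NULL);
    if (!t->started) {
        t->started = 1;
        t->last_seq = seq;
        return 0;
    }
    /* Distance modulo 2^16, so 65535 -> 0 counts as a step of 1. */
    uint16_t delta = (uint16_t)(seq - t->last_seq);
    if (delta == 0 || delta >= 0x8000)
        return -1;            /* behind or equal to the newest packet */
    t->last_seq = seq;
    return delta - 1;         /* packets skipped in between */
}
```

A caller would invoke `seq_tracker_update` once per received packet and decide, based on the return value, whether to conceal the loss or request retransmission at the application level.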
Advantages of RTP#
When it comes to streaming media transmission, video surveillance, video conferencing, and voice over IP (VoIP), the RTP protocol is indispensable. But why use RTP for streaming transmission?
Reliable transport protocols like TCP ensure the correctness of every bit in the data stream through timeout and retransmission mechanisms, but this complicates both the implementation of the protocol and the transmission process. Additionally, when data loss occurs during transmission, the detection of data loss (timeout detection) and retransmission can force the data stream's transmission to pause and delay.
The RTP protocol is a transport protocol based on UDP. RTP itself does not guarantee reliable or in-order delivery, nor does it provide flow control or congestion control; instead, RTCP feedback gives the application the information it needs to implement such mechanisms. Thus, for lost packets, there is no delay caused by timeout detection, and the upper layer can selectively retransmit lost packets based on their importance.
Protocol Hierarchy of RTP#
Figure: Streaming media architecture, a typical protocol architecture in streaming media applications.
From the diagram, it can be seen that RTP is categorized at the transport layer, built on UDP. Like UDP, RTP has a fixed encapsulation format that supports its real-time transmission functionality. RTP provides timing information and stream synchronization for end-to-end real-time transmission but does not guarantee quality of service. Quality-of-service monitoring and feedback are provided by RTCP.
RTP Working Mechanism#
When an application establishes an RTP session, it determines a pair of destination transport addresses. They share one network address and use a pair of ports: one for RTP packets and one for RTCP packets, so that RTP and RTCP data can each be delivered correctly. RTP data is sent to the even UDP port, while the corresponding RTCP control data is sent to the adjacent odd UDP port (even UDP port + 1), forming a UDP port pair. The sending process of RTP is as follows, and the receiving process is the reverse.
- The RTP protocol receives the streaming media information stream (e.g., H.263) from the upper layer and encapsulates it into RTP packets; RTCP receives control information from the upper layer and encapsulates it into RTCP control packets.
- RTP sends RTP packets to the even port in the UDP port pair; RTCP sends RTCP control packets to the odd port in the UDP port pair.
RTP packets only contain RTP data, while control is provided by the RTCP protocol. RTP selects an unused even UDP port number between 1025 and 65535, while RTCP in the same session uses the next odd UDP port number. Port numbers 5004 and 5005 are used as the default port numbers for RTP and RTCP, respectively. The header format of RTP packets is shown in Figure 2, where the first 12 bytes are mandatory.
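The even/odd port convention described above can be sketched as two small helpers. The function names `rtp_port_is_valid` and `rtcp_port_for` are hypothetical and exist only for this illustration; the range check simply mirrors the 1025 to 65535 range stated in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if `port` can serve as an RTP port under the even/odd convention. */
int rtp_port_is_valid(uint16_t port)
{
    /* Even, inside the range given in the text, and leaving room for port + 1. */
    return port >= 1025 && port < 65535 && (port % 2) == 0;
}

/* Returns the RTCP port paired with an even RTP port. */
uint16_t rtcp_port_for(uint16_t rtp_port)
{
    assert(rtp_port_is_valid(rtp_port));
    return (uint16_t)(rtp_port + 1);
}
```

For the default pair, `rtcp_port_for(5004)` yields 5005.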
Application Layer#
From the developer's point of view, RTP can also be regarded as part of the application layer. On the sending end, developers must write program code to encapsulate data in RTP packets and then hand the RTP packets over to UDP. On the receiving end, after the RTP packets enter the application layer through the UDP interface, code written by the developer must extract the application data block from the RTP packets.
RTP Message#
First, let's look at the RTP header. The RTP message header format (see RFC3550 Page 12):
- Version (V): 2 bits, used to indicate the RTP version in use.
- Padding (P): 1 bit, if this bit is set, the end of the RTP packet contains additional padding bytes.
- Extension (X): 1 bit, if this bit is set, an extension header follows the fixed RTP header.
- CSRC Count (CC): 4 bits, contains the number of CSRCs that follow the fixed header.
- Marker (M): 1 bit, the interpretation of this bit is determined by the profile document. For example, under the RTP profile for audio and video conferences with minimal control, an audio stream sets the marker bit to 1 on the first packet sent after a period of silence and to 0 otherwise.
- Payload Type (PayloadType): 7 bits, identifies the type of RTP payload.
- Sequence Number (SN): 16 bits, the sequence number increases by 1 for each RTP packet sent. The receiving end can use this to detect packet loss and reconstruct the packet sequence.
- Timestamp (Timestamp): 32 bits, records the sampling time of the first byte of data in the packet.
- Synchronization Source Identifier (SSRC): 32 bits, the synchronization source refers to the source of the RTP packet stream. No two synchronization sources in the same RTP session may use the same SSRC value. This identifier is chosen randomly, and RFC 1889 gives an example generator based on MD5.
- Contributing Source List (CSRC List): 0 to 15 items, each 32 bits, used to indicate all RTP packet sources contributing to a new packet generated by an RTP mixer. The mixer inserts these contributing SSRC identifiers into the list. SSRC identifiers are listed so that the receiving end can correctly identify the identities of the parties involved in the conversation.
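The field layout listed above can be decoded byte by byte, which avoids depending on compiler-specific bit-field ordering. The following is a minimal sketch, not a complete RTP parser; `rtp_fields_t`, `read_be32`, and `rtp_parse_fixed_header` are hypothetical names chosen for this example. It assumes the buffer holds one RTP packet in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the fixed header, in host byte order. */
typedef struct {
    uint8_t  version, padding, extension, csrc_count, marker, payload_type;
    uint16_t seq;
    uint32_t timestamp, ssrc;
} rtp_fields_t;

/* Reads a 32-bit big-endian (network order) value. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or not RTP version 2. */
int rtp_parse_fixed_header(const uint8_t *buf, size_t len, rtp_fields_t *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 12)                       /* the first 12 bytes are mandatory */
        return -1;
    out->version      = buf[0] >> 6;            /* V:  2 bits */
    out->padding      = (buf[0] >> 5) & 0x01;   /* P:  1 bit  */
    out->extension    = (buf[0] >> 4) & 0x01;   /* X:  1 bit  */
    out->csrc_count   = buf[0] & 0x0F;          /* CC: 4 bits */
    out->marker       = buf[1] >> 7;            /* M:  1 bit  */
    out->payload_type = buf[1] & 0x7F;          /* PT: 7 bits */
    out->seq          = (uint16_t)((buf[2] << 8) | buf[3]);
    out->timestamp    = read_be32(buf + 4);
    out->ssrc         = read_be32(buf + 8);
    if (out->version != 2)
        return -1;
    /* The CSRC list (4 bytes per entry) must also fit in the buffer. */
    if (len < 12 + 4u * out->csrc_count)
        return -1;
    return 0;
}
```

Shifting and masking individual bytes keeps the code independent of host endianness, at the cost of a few more lines than overlaying a struct on the buffer.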
RTP Extension Header Structure#
If the extension bit in the RTP fixed header is set to 1, a variable-length header extension is appended to the RTP fixed header (if there is a CSRC list, the extension follows the CSRC list). The header extension contains a 16-bit length field indicating the number of 32-bit words in the extension, excluding the 4-byte extension header (thus zero is a valid value).
Only one header extension is allowed after the RTP fixed header. To allow multiple interoperable implementations to independently experiment with different header extensions, or to allow a specific implementation to use more than one type of header extension, the first 16 bits of the extension are left open for distinguishing identifiers or parameters. The format of these 16 bits is defined by the profile specification under which the implementation operates. The basic RTP specification does not define any header extensions itself.
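As an illustration of how the CSRC count and the extension length field determine where the payload begins, here is a minimal sketch. The function name `rtp_payload_offset` is hypothetical; the sketch ignores trailing padding and does not interpret the profile-defined first 16 bits of the extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the byte offset at which the RTP payload starts,
 * or -1 if the packet is truncated.
 */
long rtp_payload_offset(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 12)
        return -1;
    /* 12-byte fixed header plus 4 bytes per CSRC entry. */
    size_t offset = 12 + 4u * (buf[0] & 0x0F);
    if (len < offset)
        return -1;
    if (buf[0] & 0x10) {                 /* X bit set: extension present */
        if (len < offset + 4)
            return -1;
        /* Bytes 0-1: profile-defined; bytes 2-3: length in 32-bit words,
         * not counting the 4-byte extension header itself. */
        size_t ext_words = ((size_t)buf[offset + 2] << 8) | buf[offset + 3];
        offset += 4 + 4 * ext_words;
        if (len < offset)
            return -1;
    }
    return (long)offset;
}
```

With one CSRC entry and a one-word extension, the payload starts at 12 + 4 + 4 + 4 = 24 bytes; with a zero-length extension it starts at 20 bytes.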
RTP Profile Mechanism#
RTP provides great flexibility for specific applications, separating the transport protocol from the specific application environment and control strategies. The transport protocol itself only provides the mechanism for completing real-time transmission, allowing developers to choose suitable configuration environments and control strategies based on different application environments.
The control strategies mentioned here refer to the ability to implement specific RTCP control algorithms based on application needs, such as packet loss detection algorithms, packet retransmission strategies, and control schemes in some video conferencing applications (these strategies may be described in subsequent articles).
The suitable configuration environment mentioned above mainly refers to the relevant configuration of RTP and the definition of payload formats. To support a wide range of multimedia formats (such as H.264, MPEG-4, MJPEG, MPEG), the RTP protocol does not embed application-specific settings in the protocol itself but provides them through profile specifications and payload format specifications. For any specific application, RTP relies on a profile and the related payload format specifications.
RTCP#
RTCP provides quality-of-service monitoring and feedback, synchronization between media, and identification of members in multicast groups. During the RTP session, each participant periodically sends RTCP packets. The RTCP packets contain statistics such as the number of packets sent and the number of packets lost, allowing participants to dynamically adjust the transmission rate and even change the payload type. RTP and RTCP work together to optimize transmission efficiency with effective feedback and minimal overhead, making them particularly suitable for transmitting real-time data over the Internet. The common format of an RTCP packet is:
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +---------------+---------------+-------------------------------+
    |V=2|P|   IC    |      PT       |            Length             |
    +---------------+---------------+-------------------------------+
    |                                                               |
    |                  Format-specific information                  |
    |                                                               |
    |                                       +-----------------------+
    |                                       |   Padding if P = 1    |
    +---------------------------------------+-----------------------+
The meanings of the various fields are:
- Version, fixed to 2;
- Padding flag, 1 indicates padding is present;
- Item Count (IC), used to indicate the number of items when the content of the RTCP packet is a list of items; otherwise, it can have other meanings;
- Packet Type (PT), RFC3550 defines five standard packet types: sender report (SR), receiver report (RR), source description (SDES), goodbye (BYE), application-specific message (APP);
- Length, the length of the packet content after the common header, in 32-bit (four-byte) words; zero is a valid value.
RTCP packets are not transmitted individually; they need to be packaged together to form compound packets for transmission. Each compound packet is encapsulated by a lower-layer packet (usually a UDP/IP packet) for transmission. If the compound packet is to be encrypted, the prefix of the RTCP packet group is usually a 32-bit random number. The structure of the compound packet is shown in the figure below:
    +---------------------------------------------------------------+
    |                                                               |
    |                           IP header                           |
    |                                                               |
    +---------------------------------------------------------------+
    |                                                               |
    |                          UDP header                           |
    |                                                               |
    +---------------------------------------------------------------+
    |                 Random prefix (if encrypted)                  |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |V=2|P|   IC    |      PT       |            Length             |
    +---------------+---------------+-------------------------------+
    |                                                               | first
    |                                                               | RTCP
    |                  Format-specific information                  | packet
    |                                                               |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |V=2|P|   IC    |      PT       |            Length             |
    +---------------+---------------+-------------------------------+
    |                                                               | second
    |                                                               | RTCP
    |                  Format-specific information                  | packet
    |                                                               |
    +---------------------------------------------------------------+
RTCP is also transmitted using UDP, but RTCP encapsulates only some control information, so the packets are short, allowing multiple RTCP packets to be encapsulated in a single UDP packet. There are five types of RTCP packets.
Type | Abbreviation | Purpose |
---|---|---|
200 | SR (Sender Report) | Sender Report |
201 | RR (Receiver Report) | Receiver Report |
202 | SDES (Source Description Items) | Source Description |
203 | BYE | End Transmission |
204 | APP | Specific Application |
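Because every RTCP packet starts with the same common header, a receiver can step through a compound packet using only the length field. The sketch below shows one way to do that; `rtcp_list_types` and the `RTCP_*` enum names are hypothetical and chosen for this example. It assumes an unencrypted compound packet, that is, one without the random prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packet type values from the table above. */
enum { RTCP_SR = 200, RTCP_RR = 201, RTCP_SDES = 202, RTCP_BYE = 203, RTCP_APP = 204 };

/*
 * Walks a compound RTCP packet and stores up to `max` packet types in `types`.
 * Returns the number of RTCP packets found, or -1 if the data is malformed.
 */
int rtcp_list_types(const uint8_t *buf, size_t len, uint8_t *types, int max)
{
    assert(buf != NULL && types != NULL);
    int n = 0;
    size_t pos = 0;
    while (pos < len) {
        /* Each packet needs a 4-byte common header with version 2. */
        if (len - pos < 4 || (buf[pos] >> 6) != 2)
            return -1;
        /* Length field: size in 32-bit words, not counting the header word. */
        size_t words = ((size_t)buf[pos + 2] << 8) | buf[pos + 3];
        size_t size = 4 * (words + 1);
        if (size > len - pos)
            return -1;
        if (n < max)
            types[n] = buf[pos + 1];
        n++;
        pos += size;
    }
    return n;
}
```

A real receiver would dispatch on each packet type at this point instead of merely recording it.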
The encapsulation of the above five packet types is similar, and below we only describe the SR type; for other types, please refer to RFC3550.
The Sender Report (SR) packet is used by the sender to report its sending status to all receivers in a multicast manner. The main content of the SR packet includes: the SSRC of the corresponding RTP stream, the NTP timestamp and the matching RTP timestamp at the time the report is generated, the number of packets sent in the RTP stream, and the number of bytes sent in the RTP stream. The encapsulation of the SR packet is shown below:
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V=2|P|   RC    |   PT=SR=200   |            Length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     SSRC of packet sender                     |
    +---------------------------------------------------------------+
    |                         NTP timestamp                         |
    |                                                               |
    +---------------------------------------------------------------+
    |                         RTP timestamp                         |
    +---------------------------------------------------------------+
    |                     Sender's packet count                     |
    +---------------------------------------------------------------+
    |                     Sender's octet count                      |
    +---------------------------------------------------------------+
    |                   Receiver report block(s)                    |
    |                                                               |
- Version (V): same as the RTP header field.
- Padding (P): same as the RTP header field.
- Receiver Report Count (RC): 5 bits, the number of receiver report blocks in this SR packet, which can be zero.
- Packet Type (PT): 8 bits, SR packet is 200.
- Length Field (Length): 16 bits, which stores the total length of this SR packet in 32-bit units minus one.
- Synchronization Source (SSRC): the synchronization source identifier of the sender of the SR packet, the same as the SSRC in the corresponding RTP packet.
- NTP Timestamp: the absolute time value when the SR packet is sent. The role of NTP is to synchronize different RTP media streams.
- RTP Timestamp: corresponds to the NTP timestamp, with the same unit and random initial value as the RTP timestamp in the RTP data packet.
- Sender's Packet Count: the total number of RTP packets sent by the sender from the start of sending packets to the generation of this SR packet. This field is reset to zero when the SSRC changes.
- Sender's Octet Count: the total number of payload data bytes sent by the sender from the start of sending packets to the generation of this SR packet (excluding headers and padding). This field is reset to zero when the sender changes its SSRC.
- SSRC Identifier of Synchronization Source n: this report block contains statistics on the packets received from that source.
- Fraction Lost: indicates the packet loss rate of RTP packets from synchronization source n (SSRC_n) since the last SR or RR packet was sent.
- Cumulative Number of Packets Lost: the total number of RTP packets from SSRC_n that have been lost between the start of reception and the sending of this report.
- Highest Sequence Number Received: the highest sequence number of RTP packets received from SSRC_n.
- Interarrival Jitter: statistical variance estimate of the arrival times of RTP packets.
- Last SR Timestamp (Last SR, LSR): takes the middle 32 bits of the NTP timestamp from the most recent SR packet received from SSRC_n. If no SR packet has been received yet, this field is set to zero.
- Delay Since Last SR (DLSR): the delay between receiving the last SR packet from SSRC_n and sending this report, expressed in units of 1/65536 seconds.
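The LSR, DLSR, and interarrival jitter fields lend themselves to a short worked sketch. The helper names below (`ntp_middle32`, `rtcp_round_trip`, `jitter_state_t`, `jitter_update`) are hypothetical. The round-trip formula (arrival time minus LSR minus DLSR) and the jitter update step J += (|D| - J) / 16 follow the calculations described in RFC 3550; the arrival time is assumed to be supplied by the caller in the same units as the field it is compared with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Middle 32 bits of a 64-bit NTP timestamp, as carried in the LSR field. */
uint32_t ntp_middle32(uint32_t ntp_sec, uint32_t ntp_frac)
{
    return (ntp_sec << 16) | (ntp_frac >> 16);
}

/*
 * Round-trip time in units of 1/65536 s. `arrival` is the middle 32 bits of
 * the NTP time at which the report block was received by the original sender.
 */
uint32_t rtcp_round_trip(uint32_t arrival, uint32_t lsr, uint32_t dlsr)
{
    return arrival - lsr - dlsr;   /* unsigned arithmetic wraps modulo 2^32 */
}

/* Hypothetical per-source jitter state. */
typedef struct {
    int      started;   /* 0 until the first packet has been seen */
    uint32_t transit;   /* previous (arrival - RTP timestamp) value */
    double   jitter;    /* running estimate, in RTP timestamp units */
} jitter_state_t;

/* Both arguments are expressed in RTP timestamp units. */
void jitter_update(jitter_state_t *s, uint32_t arrival_ts, uint32_t rtp_ts)
{
    assert(s != NULL);
    uint32_t transit = arrival_ts - rtp_ts;
    if (s->started) {
        uint32_t d = transit - s->transit;
        if (d > 0x7FFFFFFFu)
            d = 0u - d;            /* absolute value of the wrapped difference */
        s->jitter += ((double)d - s->jitter) / 16.0;
    }
    s->transit = transit;
    s->started = 1;
}
```

Dividing the round-trip result by 65536 converts it to seconds; the jitter estimate is typically truncated to an integer when written into a report block.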
RTP Timestamp#
The timestamp reflects the sampling time of the first byte of data in the RTP packet, and the initial timestamp value at the start of a session is also randomly chosen. Even when no signal is being sent, the value of the timestamp must continuously increase over time. The receiving end can use the timestamp to accurately know when to restore which data block, thereby eliminating jitter during transmission. The timestamp can also be used to synchronize audio and video in video applications.
The RTP protocol does not specify the granularity of the timestamp; it depends on the type of payload. For example, if the sampling frequency is 90000 Hz, then the timestamp unit is 1/90000 of a second. If 30 frames are sent per second, then the timestamp increment is 90000/30 = 3000.
The timestamp increment is the difference between the sampling instants of two consecutive RTP packets, measured in timestamp units; for video, it corresponds to the interval between consecutive frames.
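The arithmetic above can be captured in two small helpers. The function names are hypothetical, and the audio case (an 8000 Hz clock with 20 ms of audio per packet) is an assumed example that does not come from the text. Integer division truncates, so non-integer frame rates would need more careful handling than this sketch provides.

```c
#include <assert.h>
#include <stdint.h>

/* Timestamp increment per video frame: clock rate divided by frame rate. */
uint32_t rtp_ts_increment_video(uint32_t clock_rate_hz, uint32_t frames_per_second)
{
    assert(frames_per_second > 0);
    return clock_rate_hz / frames_per_second;
}

/* Timestamp increment per audio packet carrying `packet_ms` milliseconds. */
uint32_t rtp_ts_increment_audio(uint32_t clock_rate_hz, uint32_t packet_ms)
{
    /* 64-bit intermediate avoids overflow for high clock rates. */
    return (uint32_t)((uint64_t)clock_rate_hz * packet_ms / 1000);
}
```

With a 90000 Hz clock and 30 frames per second, the first helper returns 3000, matching the calculation in the text.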
Code#
RTP header
    #include <stdint.h>

    typedef uint32_t u_int32;      /* 32-bit unsigned type used below */

    /*
     * RTP header
     * Note: bit-field ordering is compiler-dependent; the #else branch
     * assumes a little-endian target that allocates bit-fields LSB first.
     */
    typedef struct
    {
    #if 0 /* BIG_ENDIAN */
        unsigned int version:2;    /* protocol version */
        unsigned int p:1;          /* padding flag */
        unsigned int x:1;          /* header extension flag */
        unsigned int cc:4;         /* CSRC count */
        unsigned int m:1;          /* marker bit */
        unsigned int pt:7;         /* payload type */
        unsigned int seq:16;       /* sequence number */
    #else /* little-endian */
        unsigned int cc:4;         /* CSRC count */
        unsigned int x:1;          /* header extension flag */
        unsigned int p:1;          /* padding flag */
        unsigned int version:2;    /* protocol version */
        unsigned int pt:7;         /* payload type */
        unsigned int m:1;          /* marker bit */
        unsigned int seq:16;       /* sequence number (network byte order) */
    #endif
        u_int32 ts;                /* timestamp (network byte order) */
        u_int32 ssrc;              /* synchronization source */
        u_int32 csrc[1];           /* optional CSRC list */
    } rtp_hdr_t;
RTCP Common header
    /*
     * RTCP common header word
     */
    typedef struct {
    #if 0 /* BIG_ENDIAN */
        unsigned int version:2;    /* protocol version */
        unsigned int p:1;          /* padding flag */
        unsigned int count:5;      /* varies by packet type */
    #else /* little-endian */
        unsigned int count:5;      /* varies by packet type */
        unsigned int p:1;          /* padding flag */
        unsigned int version:2;    /* protocol version */
    #endif
        unsigned int pt:8;         /* RTCP packet type */
        unsigned short length;     /* pkt len in words, w/o this word */
    } rtcp_common_t;