RTSP Protocol#
Real Time Streaming Protocol (RTSP) is a network application protocol designed for use in entertainment and communication systems to control streaming media servers. It is used to create and control media sessions between endpoints. Clients of the media server issue VCR-style commands such as play, record, and pause to control media streams in real time, either from the server to the client (video on demand) or from the client to the server (voice recording). It is an application layer protocol in the TCP/IP protocol suite, submitted to the IETF as an RFC by Columbia University, Netscape, and RealNetworks. The corresponding document is RFC 2326, which can be looked up through the RFC Editor.
This protocol defines how a one-to-many application can efficiently deliver multimedia data over IP networks. RTSP is architecturally positioned above RTP and RTCP; its control messages typically travel over TCP, while the media data is usually carried by RTP. RTSP establishes control over the transmission of media streams, playing the role of a "network remote control" for multimedia services. Although RTSP control information and media data streams can sometimes be interleaved, RTSP itself is generally not used to forward media stream data. The media data is transmitted through protocols such as RTP/RTCP.
The protocol is used in a client/server model and is a text-based protocol for establishing and negotiating real-time streaming sessions between clients and servers.
Protocol System#
A basic RTSP operation process:
- First, the client connects to the streaming server and sends an RTSP describe command (DESCRIBE).
- The streaming server responds with an SDP description, which includes information such as the number of streams and media types.
- The client then analyzes the SDP description and sends an RTSP setup command (SETUP) for each stream in the session, informing the server of the port the client will use to receive media data.
- Once the streaming connection is established, the client sends a play command (PLAY), and the server begins sending the media stream (RTP packets) to the client over UDP. During playback, the client can also send commands to the server to control fast forward, rewind, and pause.
- Finally, the client can send a teardown command (TEARDOWN) to end the streaming session.
Client->Server: DESCRIBE
Server->Client: 200 OK (SDP)
Client->Server: SETUP
Server->Client: 200 OK
Client->Server: PLAY
Server->Client: (RTP packets)
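The SDP analysis step in the process above can be sketched in a few lines of Python. This is only an illustration, not a full SDP parser: the function name `list_media_streams`, the sample description, and the track names are assumptions made for this example. It reads the `m=` (media) lines and the `a=control:` attribute that follows each of them, which is typically where a client finds one entry per stream to SETUP.

```python
def list_media_streams(sdp_text):
    """Return one dict per media stream (m= line) found in an SDP description."""
    streams = []
    for raw_line in sdp_text.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            media, port, proto, *formats = line[2:].split()
            streams.append({"media": media, "port": int(port),
                            "proto": proto, "formats": formats,
                            "control": None})
        elif line.startswith("a=control:") and streams:
            # A control attribute after an m= line names the URL used in SETUP.
            streams[-1]["control"] = line[len("a=control:"):]
    return streams


# Hypothetical SDP body, as a server might return it in a DESCRIBE response.
SAMPLE_SDP = """v=0
o=- 0 0 IN IP4 127.0.0.1
s=Example
t=0 0
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=control:trackID=0
m=audio 0 RTP/AVP 97
a=rtpmap:97 MPEG4-GENERIC/44100/2
a=control:trackID=1
"""

for stream in list_media_streams(SAMPLE_SDP):
    print(stream["media"], stream["control"])
```

With this sample the client would learn that the session has two streams and would issue one SETUP per stream.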
Protocol Features#
- Extensibility: New methods and parameters can be easily added to RTSP.
- Easy to parse: RTSP can be parsed by standard HTTP or MIME parsers.
- Security: RTSP uses web security mechanisms.
- Transport independence: RTSP can use an unreliable datagram protocol (UDP), a reliable datagram protocol (RDP), or a reliable stream protocol such as TCP, because it implements application-level reliability.
- Multi-server support: Each stream can be placed on different servers, and the client automatically establishes several concurrent control connections with different servers, with media synchronization performed at the transport layer.
- Recording device control: The protocol can control recording and playback devices.
- Separation of stream control and conference initiation: Stream control is independent of inviting a media server to a conference. The only requirement is that the conference initiation protocol either provides, or can be used to create, a unique conference identifier. In particular, SIP or H.323 can be used to invite a server to join a conference.
- Suitable for professional applications: Through SMPTE timestamps, RTSP supports frame-level accuracy, allowing for remote digital editing.
- Presentation description neutrality: The protocol does not impose a particular presentation description or metafile format, and it can convey the type of format in use; however, the presentation description must include at least one RTSP URL.
- HTTP-friendly: Where sensible, RTSP reuses HTTP concepts so that the existing infrastructure can be reused. This infrastructure includes the Platform for Internet Content Selection (PICS). Since controlling continuous media usually requires server state, RTSP does not simply add methods to HTTP.
- Appropriate server control: If a user starts a stream, they must also be able to stop a stream.
- Transport negotiation: The client can negotiate the transport method before it actually needs to process a continuous media stream.
- Capability negotiation: If basic features are disabled, there must be a clean mechanism for the client to determine which methods are not implemented.
Message Structure#
RTSP URL#
An end user begins watching streaming media by entering a URL in the player. For mobile streaming on demand over RTSP, the URL takes the general form described here.
A URL starting with "rtsp" or "rtspu" specifies that the RTSP protocol is in use. The syntax of an RTSP URL is: rtsp_url = ("rtsp:" | "rtspu:") "//" host [":" port] /[abs_path]/content_name
- host: Can be a valid domain name or IP address.
- port: Port number; for the RTSP protocol, the default port number is 554. It can be omitted when the streaming media server is known to listen on port 554.
- abs_path: Identifies the media stream resource in the RTSP Server, see the next section on naming recording resources.
RTSP URLs are used to identify media stream resources on the RTSP Server, which can identify a single media stream resource or a collection of multiple media stream resources.
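As a rough illustration of the URL structure described above, the following Python sketch splits an RTSP URL into its parts using the standard library's `urllib.parse.urlsplit` and falls back to port 554 when none is given. The function name `parse_rtsp_url` and the example URLs are assumptions chosen for this example.

```python
from urllib.parse import urlsplit

DEFAULT_RTSP_PORT = 554  # default port stated in the text above


def parse_rtsp_url(url):
    """Split an RTSP URL into scheme, host, port, and path."""
    parts = urlsplit(url)
    if parts.scheme not in ("rtsp", "rtspu"):
        raise ValueError(f"not an RTSP URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {url!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # Fall back to the default when the URL omits the port.
        "port": parts.port or DEFAULT_RTSP_PORT,
        "path": parts.path,
    }


print(parse_rtsp_url("rtsp://media.example.com/vod/sample.mp4"))
print(parse_rtsp_url("rtsp://192.0.2.10:8554/live/cam1"))
```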
RTSP Messages#
RTSP is a text-based protocol, using CRLF as the line terminator.
RTSP has two types of messages: request messages and response messages. A request is usually sent from the client to the server and answered by a response in the opposite direction, although some requests can also be issued by the server. Since RTSP is text-oriented, each field in the message is an ASCII string, so field lengths are variable. An RTSP message consists of three parts: a start line, header lines, and an entity body. In a request message, the start line is the request line.
The most commonly used RTSP request methods are OPTIONS, DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE, GET_PARAMETER, and SET_PARAMETER.
A request message can be initiated by the client to the server or by the server to the client. The syntax structure of a request message is as follows:
Request = Request-Line
*(general-header | request-header | entity-header)
CRLF
[message-body]
Request Line
The syntax structure of the first line of the request message is as follows:
Request-Line = Method SP Request-URI SP RTSP-Version CRLF
The first word that appears in the message line is the signaling method being used. The currently known signaling methods are as follows:
Method = “DESCRIBE”
| “ANNOUNCE”
| “GET_PARAMETER”
| “OPTIONS”
| “PAUSE”
| “PLAY”
| “RECORD”
| “REDIRECT”
| “SETUP”
| “SET_PARAMETER”
| “TEARDOWN”
Example: DESCRIBE rtsp://example.com/media.mp4 RTSP/1.0
Request Header Fields
In addition to the content of the first line, there are some fields in the message header that provide additional information. Some of these are mandatory, and we will detail the meanings of several commonly used fields later.
Request-header = Accept
| Accept-Encoding
| Accept-Language
| Authorization
| From
| If-Modified-Since
| Range
| Referer
| User-Agent
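To make the request layout concrete, here is a minimal Python sketch that serializes a request line, header fields, and the terminating empty line. The helper name `build_request` and the `example-client` User-Agent value are assumptions for illustration; the CSeq header it always emits is the sequence-number field that every RTSP request carries.

```python
def build_request(method, url, cseq, headers=None, body=""):
    """Serialize an RTSP/1.0 request: request line, headers, blank line, body."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # The body length is counted in bytes, not characters.
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # Every line ends with CRLF; an empty line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "DESCRIBE",
    "rtsp://example.com/media.mp4",
    cseq=1,
    headers={"Accept": "application/sdp", "User-Agent": "example-client"},
)
print(request)
```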
Response Message
The syntax structure of the response message is as follows:
Response = Status-Line
*(general-header | response-header | entity-header)
CRLF
[message-body]
Status-Line
The first line of the response message is the status line, with each element separated by a space. There must be no CR or LF in the middle of this line, except for the final CRLF. Its syntax format is as follows:
Status-Line = RTSP-Version SP Status-Code SP Reason-Phrase CRLF
The status code (Status-Code) is a three-digit integer used to describe the result of the recipient's execution of the received request message.
The first digit of the Status-Code specifies the type of response message, which falls into five categories:
- 1XX: Informational – The request has been received and is continuing to be processed.
- 2XX: Success – The request has been successfully received, parsed, and accepted.
- 3XX: Redirection – More actions are needed to complete the request.
- 4XX: Client Error – The request message contains a syntax error or cannot be fulfilled.
- 5XX: Server Error – The server failed to fulfill an apparently valid request.
The status codes we often encounter when dealing with issues include:
Status-Code = "200": OK,
"400": Bad Request,
"404": Not Found,
"500": Internal Server Error
Response Header Fields
The fields in the response message contain additional information that cannot be placed in the Status-Line but needs to be sent to the requester.
Response-header = Location
| Proxy-Authenticate
| Public
| Retry-After
| Server
| Vary
| WWW-Authenticate
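The response layout can be illustrated with a small parsing sketch in Python. The function name `parse_response` and the sample reply are assumptions for this example; header names are lower-cased as a design choice so that lookups do not depend on how the server capitalizes them.

```python
STATUS_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}


def parse_response(raw):
    """Parse an RTSP response string into status info, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Status-Line = RTSP-Version SP Status-Code SP Reason-Phrase
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "code": int(code),
        "reason": reason,
        # The first digit of the status code selects the response class.
        "class": STATUS_CLASSES.get(code[0], "Unknown"),
        "headers": headers,
        "body": body,
    }


# Hypothetical server reply to an OPTIONS request.
RAW = (
    "RTSP/1.0 200 OK\r\n"
    "CSeq: 1\r\n"
    "Public: DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE\r\n"
    "\r\n"
)
response = parse_response(RAW)
print(response["code"], response["class"], response["headers"]["public"])
```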
Main Methods#
Method | Direction | Object | Requirement | Meaning |
---|---|---|---|---|
DESCRIBE | C->S | P S | Recommended | Retrieves the description of the presentation or media object identified by the URL; the DESCRIBE request-response pair constitutes the media initialization phase of RTSP. |
ANNOUNCE | C->S S->C | P S | Optional | From client to server, posts the description of the presentation or media object identified by the URL to the server; from server to client, updates the session description in real time. |
GET_PARAMETER | C->S S->C | P S | Optional | Retrieves the value of a parameter of the presentation or stream specified by the URL. |
OPTIONS | C->S S->C | P S | Required | Queries the methods the other party supports; OPTIONS requests can be issued at any time. |
PAUSE | C->S | P S | Recommended | Temporarily halts stream delivery. |
PLAY | C->S | P S | Required | Tells the server to start sending data using the mechanism specified in SETUP. |
RECORD | C->S | P S | Optional | Starts recording a range of media data according to the presentation description. |
REDIRECT | S->C | P S | Optional | Informs the client that it must connect to another server address. |
SETUP | C->S | S | Required | Specifies the transport mechanism to be used for the streamed media identified by the URL. |
SET_PARAMETER | C->S S->C | P S | Optional | Sets the value of a parameter for the presentation or stream specified by the URL. |
TEARDOWN | C->S | P S | Required | Stops stream delivery for the given URL and releases the associated resources. |
Note: P---presentation, C---client, S---server, S (in the Object column)---stream
The method indicates the operation to be performed on the resource identified by the Request-URI. The method name is case-sensitive, cannot start with the character "$", and must be a token.
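The naming rules for methods can be checked with a short Python sketch. The helper names are hypothetical, and the token character set used here is the HTTP-style definition (printable ASCII without separators such as spaces, quotes, and brackets), which this example assumes also applies to RTSP.

```python
import re

# Methods listed in the text above.
KNOWN_METHODS = {
    "DESCRIBE", "ANNOUNCE", "GET_PARAMETER", "OPTIONS", "PAUSE", "PLAY",
    "RECORD", "REDIRECT", "SETUP", "SET_PARAMETER", "TEARDOWN",
}

# Token characters: letters, digits, and a few punctuation marks
# (assumed HTTP-style definition; separators are excluded).
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def is_valid_method_name(name):
    """Check the syntactic rules: a token that does not start with '$'."""
    return bool(TOKEN_RE.fullmatch(name)) and not name.startswith("$")


def is_known_method(name):
    """Case-sensitive lookup: 'play' is not the same method as 'PLAY'."""
    return name in KNOWN_METHODS


for candidate in ("PLAY", "play", "$PLAY", "PLAY NOW"):
    print(candidate, is_valid_method_name(candidate), is_known_method(candidate))
```

Note that a lower-case name such as `play` is still a syntactically valid token; it is simply a different, unknown method, which is what case sensitivity implies.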
RTSP Header Parameters#
- Accept: Used to specify the types of media description information that the client can accept. For example:
Accept: application/rtsl, application/sdp;level=2
- Bandwidth: Used to describe the available bandwidth value for the client.
- CSeq: Specifies the sequence number of the RTSP request-response pair, which must be included in every request or response. Each request message containing a given sequence number will have a corresponding response message with the same sequence number.
- Range: Used to specify a time range, which can be expressed in SMPTE, NPT (Normal Play Time), or clock (absolute) time units.
- Session: The Session header field identifies an RTSP session. The Session ID is chosen by the server in the response to SETUP, and once the client obtains the Session ID, it must include the Session ID in subsequent request messages for operations on the Session.
- Transport: The Transport header field contains a list of transport options that the client can accept, including transport protocols, address ports, TTL, etc. The server also returns the specific options selected through this header field. For example:
Transport: RTP/AVP;multicast;ttl=127;mode="PLAY",RTP/AVP;unicast;client_port=1289-1290;mode="PLAY"
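A Transport header value such as the one in the example can be broken into its options with a few lines of Python. This is a simplified sketch: the function name `parse_transport` is an assumption, and the naive split on "," would mis-handle a quoted value that itself contains a comma.

```python
def parse_transport(value):
    """Parse a Transport header value into a list of option dicts."""
    options = []
    for spec in value.split(","):
        fields = [f.strip() for f in spec.strip().split(";") if f.strip()]
        if not fields:
            continue
        option = {"protocol": fields[0], "params": {}}
        for field in fields[1:]:
            key, sep, val = field.partition("=")
            # Flags such as "unicast" carry no value; store them as True.
            option["params"][key] = val.strip('"') if sep else True
        options.append(option)
    return options


header_value = ('RTP/AVP;multicast;ttl=127;mode="PLAY",'
                'RTP/AVP;unicast;client_port=1289-1290;mode="PLAY"')
for option in parse_transport(header_value):
    print(option)
```

A server-side implementation would pick one of the parsed options and echo it back in the Transport header of its SETUP response.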
Simple RTSP Message Interaction Process#
C represents the RTSP client, S represents the RTSP server.
Step 1: Query the methods available on the server
C->S OPTIONS request: C asks S which methods are available.
S->C OPTIONS response: S lists all available methods in the Public header field.
Step 2: Obtain the media description
C->S DESCRIBE request: C asks for the media description provided by S.
S->C DESCRIBE response: S returns the media description, generally as SDP.
Step 3: Establish the RTSP session
C->S SETUP request: C lists the acceptable transport options in the Transport header field and asks S to establish a session.
S->C SETUP response: S establishes the session, returns the selected transport options in the Transport header field, and returns the Session ID.
Step 4: Request the start of data transmission
C->S PLAY request: C asks S to start sending data.
S->C PLAY response: S acknowledges the request.
Step 5: Data transmission during playback
S->C streaming media data: the data is transmitted via the RTP protocol.
Step 6: Close the session and exit
C->S TEARDOWN request: C asks to close the session.
S->C TEARDOWN response: S acknowledges the request.
The above process is a standard, well-behaved RTSP flow, but real deployments do not necessarily follow it exactly. Steps 3 and 4 are essential. Step 1 can be omitted as long as the server and client agree on which methods are available. Step 2 can also be skipped if the media initialization description is obtained by other means (such as an HTTP request), in which case the DESCRIBE request in RTSP is not needed.
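The client side of this exchange can be sketched without any networking by generating the requests in order. The class name `RtspRequestSequencer`, the track URL suffix, and the session ID are all hypothetical; a real client would send each message over a socket and read the Session ID from the server's SETUP response rather than hard-coding it.

```python
class RtspRequestSequencer:
    """Build the requests of a basic session, tracking CSeq and Session."""

    def __init__(self, url):
        self.url = url
        self.cseq = 0
        self.session_id = None  # filled in from the server's SETUP response

    def request(self, method, url=None, **headers):
        self.cseq += 1  # each request gets the next sequence number
        lines = [f"{method} {url or self.url} RTSP/1.0", f"CSeq: {self.cseq}"]
        if self.session_id is not None:
            lines.append(f"Session: {self.session_id}")
        # Keyword names use "_" where the header name uses "-".
        lines += [f"{k.replace('_', '-')}: {v}" for k, v in headers.items()]
        return "\r\n".join(lines) + "\r\n\r\n"


seq = RtspRequestSequencer("rtsp://example.com/media.mp4")
messages = [
    seq.request("OPTIONS"),
    seq.request("DESCRIBE", Accept="application/sdp"),
    seq.request("SETUP", url=seq.url + "/trackID=0",
                Transport="RTP/AVP;unicast;client_port=1289-1290"),
]
seq.session_id = "12345678"  # hypothetical ID, as if read from the SETUP response
messages.append(seq.request("PLAY", Range="npt=0-"))
messages.append(seq.request("TEARDOWN"))
print("".join(messages))
```

Dropping the OPTIONS or DESCRIBE call from the list corresponds to the shortened flows mentioned above, where only SETUP and PLAY are strictly required.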