RTSP Protocol#
Real Time Streaming Protocol (RTSP) is a network application protocol designed for use in entertainment and communications systems to control streaming media servers. It is used to create and control media sessions between endpoints. A client of the media server issues VCR-style commands such as play, record, and pause to control media streams in real time, either from the server to the client (video on demand) or from the client to the server (voice recording). RTSP is an application-layer protocol in the TCP/IP protocol suite. It was submitted to the IETF by Columbia University, Netscape, and RealNetworks and published as RFC 2326, which is available from the RFC Editor.
This protocol defines how a one-to-many application can efficiently deliver multimedia data across IP networks. Architecturally, RTSP sits above RTP and RTCP and uses TCP or UDP as its transport. RTSP establishes and controls media streams, acting as a "network remote control" for multimedia servers. Although RTSP control messages and media data can sometimes be interleaved on the same connection, RTSP itself generally does not carry the media data; that is left to protocols such as RTP/RTCP.
The protocol is used in a client/server model and is a text-based protocol for establishing and negotiating real-time streaming sessions between clients and servers.
Protocol System#
A basic RTSP operation process:
- First, the client connects to the streaming server and sends an RTSP describe command (DESCRIBE).
- The streaming server responds with an SDP description, including information such as the number of streams and media types.
- The client then analyzes the SDP description and sends an RTSP setup command (SETUP) for each stream in the session, informing the server of the port the client will use to receive media data.
- Once the streaming connection is established, the client sends a play command (PLAY), and the server begins sending media streams (RTP packets) to the client over UDP. During playback, the client can also send commands to the server to control fast forward, rewind, and pause.
- Finally, the client can send a teardown command (TEARDOWN) to end the streaming session.
Client->Server: DESCRIBE
Server->Client: 200 OK (SDP)
Client->Server: SETUP
Server->Client: 200 OK
Client->Server: PLAY
Server->Client: (RTP packets)
Client->Server: TEARDOWN
Server->Client: 200 OK
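The SDP description returned by DESCRIBE tells the client how many streams to set up. The sketch below is a minimal illustration, not a full SDP parser: it assumes each stream is announced by an `m=` line of the form `m=<media> <port> <proto> <fmt> ...` and optionally followed by an `a=control:` attribute. The function name `list_sdp_streams` and the sample SDP text are made up for this example.

```python
def list_sdp_streams(sdp):
    """Return one dict per media stream (m= line) found in an SDP description."""
    streams = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            media, port, proto, *formats = line[2:].split()
            streams.append(
                {"media": media, "port": port, "proto": proto, "formats": formats}
            )
        elif line.startswith("a=control:") and streams:
            # Per-stream control URL, which the client would use in SETUP.
            streams[-1]["control"] = line[len("a=control:"):]
    return streams


# Hypothetical SDP body, as a server might return it in a DESCRIBE response.
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.1",
    "s=Example",
    "t=0 0",
    "m=video 0 RTP/AVP 96",
    "a=control:trackID=0",
    "m=audio 0 RTP/AVP 97",
    "a=control:trackID=1",
])

for stream in list_sdp_streams(EXAMPLE_SDP):
    print(stream["media"], stream["proto"], stream.get("control"))
```

A client following the flow above would then send one SETUP request per entry in the returned list.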
Protocol Features#
- Extensibility: New methods and parameters can be easily added to RTSP.
- Easy to parse: RTSP can be parsed by standard HTTP or MIME parsers.
- Security: RTSP uses web security mechanisms.
- Transport independence: RTSP can use an unreliable datagram protocol (UDP), a reliable datagram protocol (RDP), or a reliable stream protocol such as TCP, because it implements reliability at the application level.
- Multi-server support: Each stream can be hosted on different servers, and the client automatically establishes several concurrent control connections with different servers, with media synchronization performed at the transport layer.
- Recording device control: The protocol can control recording and playback devices.
- Separation of stream control and conference initiation: Stream control is independent of inviting a media server to a conference. The only requirement is that the conference initiation protocol either provides, or can be used to create, a unique conference identifier. In particular, SIP or H.323 can be used to invite a server to join a conference.
- Suitable for professional applications: Through SMPTE time codes, RTSP supports frame-level accuracy, allowing for remote digital editing.
- Presentation description neutrality: The protocol does not impose a particular presentation description or metafile format and can convey the type of format in use; however, the presentation description must include at least one RTSP URL.
- HTTP-friendly: Where sensible, RTSP reuses HTTP concepts so that existing infrastructure can be reused. This infrastructure includes the Platform for Internet Content Selection (PICS). However, because controlling continuous media usually requires server state, RTSP does not merely add methods to HTTP.
- Appropriate server control: If a user starts a stream, they must also be able to stop it.
- Transport negotiation: The client can negotiate the transport method before it actually needs to process a continuous media stream.
- Capability negotiation: If basic features are disabled, there must be a clean mechanism for the client to determine which methods are not implemented.
Message Structure#
RTSP URL#
A user begins watching streaming media by entering a URL in the player. For on-demand mobile streaming over RTSP, the general format of the URL is as follows:
A URL starting with "rtsp" or "rtspu" indicates that the RTSP protocol is in use. The syntax of an RTSP URL is: rtsp_url = ("rtsp:" | "rtspu:") "//" host [":" port] "/" [abs_path] "/" content_name
- host: Can be a valid domain name or IP address.
- port: Port number; the default port for RTSP is 554. It can be omitted when the streaming media server listens on port 554.
- abs_path: Identifies the media stream resource in the RTSP Server, see the next section on naming video resources.
RTSP URLs are used to identify media stream resources on the RTSP Server, which can identify a single media stream resource or a collection of multiple media stream resources.
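As a rough illustration of the URL structure above, the following sketch splits an RTSP URL with Python's standard `urllib.parse.urlsplit` and fills in the default port 554 when none is given. The helper name `parse_rtsp_url` and the example hosts are assumptions made for this example.

```python
from urllib.parse import urlsplit

DEFAULT_RTSP_PORT = 554


def parse_rtsp_url(url):
    """Split an rtsp:// or rtspu:// URL into scheme, host, port and path."""
    parts = urlsplit(url)
    if parts.scheme not in ("rtsp", "rtspu"):
        raise ValueError(f"not an RTSP URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in URL: {url!r}")
    # Fall back to the default RTSP port when the URL does not name one.
    port = parts.port if parts.port is not None else DEFAULT_RTSP_PORT
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
    }


print(parse_rtsp_url("rtsp://example.com/media.mp4"))
print(parse_rtsp_url("rtsp://192.0.2.10:8554/live/stream1"))
```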
RTSP Messages#
RTSP is a text-based protocol that uses CRLF as the line terminator.
RTSP has two types of messages: request messages and response messages. Requests are typically sent from the client to the server, and responses are returned in the opposite direction. Since RTSP is text-oriented, each field in a message is an ASCII character string, so field lengths are variable. An RTSP message consists of three parts: a start line, header lines, and a message body. In a request message, the start line is the request line.
The methods of RTSP request messages include: OPTIONS, DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE, GET_PARAMETER, and SET_PARAMETER.
A request message can be initiated by the client to the server or by the server to the client. The syntax structure of a request message is as follows:
Request = Request-Line
          *(general-header | request-header | entity-header)
          CRLF
          [message-body]
Request Line
The syntax structure of the first line of the request message is as follows:
Request-Line = Method SP Request-URI SP RTSP-Version CRLF
The first word that appears in the message line is the signaling method being used. The currently known signaling methods are as follows:
Method = "DESCRIBE"
       | "ANNOUNCE"
       | "GET_PARAMETER"
       | "OPTIONS"
       | "PAUSE"
       | "PLAY"
       | "RECORD"
       | "REDIRECT"
       | "SETUP"
       | "SET_PARAMETER"
       | "TEARDOWN"
Example: DESCRIBE rtsp://example.com/media.mp4 RTSP/1.0
Request Header Fields
In addition to the content of the first line, the message header contains some fields that provide additional information. Some of these are mandatory, and we will detail the meanings of several commonly used fields later.
Request-header = Accept
| Accept-Encoding
| Accept-Language
| Authorization
| From
| If-Modified-Since
| Range
| Referer
| User-Agent
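To make the request syntax concrete, here is a minimal sketch that serializes a request line, header fields, and an optional body with CRLF line endings. The function name `build_request` is hypothetical; the sketch always emits a CSeq header, since RTSP expects a sequence number on every request, and it adds Content-Length only when a body is present.

```python
def build_request(method, uri, cseq, headers=None, body=b""):
    """Serialize an RTSP/1.0 request into bytes ready to be sent."""
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # An empty line (CRLF CRLF) separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("utf-8") + body


request = build_request(
    "DESCRIBE",
    "rtsp://example.com/media.mp4",
    cseq=1,
    headers={"Accept": "application/sdp"},
)
print(request.decode("utf-8"))
```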
Response Message
The syntax structure of the response message is as follows:
Response = Status-Line
           *(general-header | response-header | entity-header)
           CRLF
           [message-body]
Status-Line
The first line of the response message is the status line, with each element separated by a space. Except for the final CRLF, there must be no CR or LF in the middle of this line. Its syntax format is as follows:
Status-Line = RTSP-Version SP Status-Code SP Reason-Phrase CRLF
The status code (Status-Code) is a three-digit integer used to describe the result of the recipient's execution of the received request message.
The first digit of the Status-Code specifies the type of response message, which can be one of five categories:
- 1XX: Informational – The request has been received and is being processed.
- 2XX: Success – The request has been successfully received, parsed, and accepted.
- 3XX: Redirection – More actions are needed to complete the request.
- 4XX: Client Error – The request message contains syntax errors or cannot be executed effectively.
- 5XX: Server Error – The server failed to fulfill an apparently valid request.
Common status codes we encounter when handling issues include:
Status-Code = "200": OK,
              "400": Bad Request,
              "404": Not Found,
              "500": Internal Server Error
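Because the first digit determines the category, a client can classify any status code without knowing every individual value. A small sketch, with the hypothetical helper name `status_class`:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Map a three-digit status code to its category using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 404, 500):
    print(code, status_class(code))
```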
Response Header Fields
The fields in the response message contain additional information that cannot be included in the Status-Line but needs to be sent to the requester.
Response-header = Location
                | Proxy-Authenticate
                | Public
                | Retry-After
                | Server
                | Vary
                | WWW-Authenticate
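The response syntax can be read back with a few string operations. The sketch below is an assumption-laden illustration rather than a complete parser: it expects the whole response in one byte string, ignores folded header lines, and lower-cases header names for lookup. The name `parse_response` is made up for this example.

```python
def parse_response(raw):
    """Split a raw RTSP response into status line fields, headers and body."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    # Status-Line = RTSP-Version SP Status-Code SP Reason-Phrase
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The body length, if any, is given by Content-Length.
    length = int(headers.get("content-length", 0))
    return {
        "version": version,
        "status": int(code),
        "reason": reason,
        "headers": headers,
        "body": rest[:length],
    }


reply = parse_response(
    b"RTSP/1.0 200 OK\r\n"
    b"CSeq: 1\r\n"
    b"Public: DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE\r\n"
    b"\r\n"
)
print(reply["status"], reply["reason"], reply["headers"]["public"])
```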
Main Methods#
Method | Direction | Object | Requirement | Meaning |
---|---|---|---|---|
DESCRIBE | C->S | P S | Recommended | Retrieves the description of the presentation or media object identified by the URL. The DESCRIBE request-response pair constitutes the media initialization phase of RTSP |
ANNOUNCE | C->S S->C | P S | Optional | Sent from client to server, ANNOUNCE posts the description of the presentation or media object identified by the URL to the server; sent from server to client, it updates the session description in real time |
GET_PARAMETER | C->S S->C | P S | Optional | Retrieves the value of a parameter of the presentation or stream specified by the URL |
OPTIONS | C->S S->C | P S | Required (S->C: optional) | Queries the supported methods; an OPTIONS request can be issued at any time |
PAUSE | C->S | P S | Recommended | Temporarily interrupts stream delivery |
PLAY | C->S | P S | Required | Tells the server to start sending data using the mechanism specified in SETUP |
RECORD | C->S | P S | Optional | Starts recording a range of media data according to the presentation description |
REDIRECT | S->C | P S | Optional | Notifies the client that it must connect to another server address |
SETUP | C->S | S | Required | Specifies the transport mechanism to be used for the stream identified by the URL |
SET_PARAMETER | C->S S->C | P S | Optional | Sets the value of a parameter for the presentation or stream specified by the URL |
TEARDOWN | C->S | P S | Required | Stops stream delivery for the given URL and releases the associated resources |
Note: In the Direction column, C is the client and S is the server. In the Object column, P is a presentation and S is a stream.
The method indicates the operation that the recipient must perform on the resource identified by the Request-URI. Method names are case-sensitive, must not start with the character "$", and must be a token.
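The naming rules for methods can be checked mechanically. The sketch below assumes the HTTP-style definition of a token (printable ASCII characters excluding separators such as spaces, quotes, and brackets); the helper names `is_valid_method` and `is_known_method` are hypothetical.

```python
# Separator characters that may not appear in a token (HTTP-style "tspecials").
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')

KNOWN_METHODS = {
    "DESCRIBE", "ANNOUNCE", "GET_PARAMETER", "OPTIONS", "PAUSE", "PLAY",
    "RECORD", "REDIRECT", "SETUP", "SET_PARAMETER", "TEARDOWN",
}


def is_valid_method(name):
    """Check that a method name is a token and does not start with '$'."""
    if not name or name.startswith("$"):
        return False
    # Only visible ASCII characters that are not separators are allowed.
    return all(32 < ord(ch) < 127 and ch not in TSPECIALS for ch in name)


def is_known_method(name):
    """Case-sensitive lookup among the methods listed in the table."""
    return name in KNOWN_METHODS


print(is_valid_method("PLAY"), is_known_method("PLAY"))
print(is_valid_method("$PLAY"), is_known_method("play"))
```

Keeping the two checks separate reflects the protocol's extensibility: a syntactically valid method may still be unknown to a given implementation.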
RTSP Header Parameters#
- Accept: Specifies the types of media description information that the client can accept. For example:
Accept: application/rtsl, application/sdp;level=2
- Bandwidth: Describes the bandwidth available to the client.
- CSeq: Specifies the sequence number of the RTSP request-response pair, which must be included in every request or response. Each request message containing a given sequence number will have a corresponding response message with the same sequence number.
- Range: Specifies a time range, which can be expressed in SMPTE time codes, NPT (normal play time), or clock (absolute) time.
- Session: The Session header field identifies an RTSP session. The Session ID is chosen by the server in the response to SETUP, and once the client obtains the Session ID, it must include the Session ID in subsequent request messages for Session operations.
- Transport: The Transport header field contains a list of transport options that the client can accept, including transport protocol, address port, TTL, etc. The server also returns the actual selected specific options through this header field. For example:
Transport: RTP/AVP;multicast;ttl=127;mode="PLAY",
           RTP/AVP;unicast;client_port=1289-1290;mode="PLAY"
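To illustrate how a Transport header breaks down, the sketch below splits the value into its comma-separated alternatives and each alternative into semicolon-separated parameters. It is deliberately naive: it assumes no commas appear inside quoted values, and the function name `parse_transport` is invented for this example.

```python
def parse_transport(value):
    """Split a Transport header value into a list of transport options."""
    options = []
    for spec in value.split(","):
        parts = [p.strip() for p in spec.strip().split(";") if p.strip()]
        if not parts:
            continue
        entry = {"protocol": parts[0], "params": {}}
        for param in parts[1:]:
            key, sep, val = param.partition("=")
            # Flags such as "unicast" carry no value; store them as True.
            entry["params"][key] = val.strip('"') if sep else True
        options.append(entry)
    return options


header = (
    'RTP/AVP;multicast;ttl=127;mode="PLAY",'
    'RTP/AVP;unicast;client_port=1289-1290;mode="PLAY"'
)
for option in parse_transport(header):
    print(option["protocol"], option["params"])
```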
Simple RTSP Message Interaction Process#
C represents the RTSP client, S represents the RTSP server
Step 1: Query available methods on the server
C->S OPTIONS request: C asks S which methods are available
S->C OPTIONS response: S replies with a Public header field listing all the methods it supports
Step 2: Obtain media description information
C->S DESCRIBE request: C asks for the media description information provided by S
S->C DESCRIBE response: S replies with the media description information, generally in SDP
Step 3: Establish RTSP session
C->S SETUP request: C lists acceptable transport options in the Transport header field and asks S to establish a session
S->C SETUP response: S establishes the session and returns the selected transport options in the Transport header field, along with the Session ID
Step 4: Request to start sending data
C->S PLAY request: C asks S to start sending data
S->C PLAY response: S acknowledges the request
Step 5: Data transmission during playback
S->C Streaming media data: S transmits the data via RTP
Step 6: Close session and exit
C->S TEARDOWN request: C asks to close the session
S->C TEARDOWN response: S acknowledges the request
The process above is the standard, well-behaved RTSP flow, but real deployments do not always follow it exactly. Steps 3 and 4 are essential. Step 1 can be omitted as long as the server and client already agree on which methods are available. Step 2 can also be skipped if the media initialization description is obtained by other means (such as an HTTP request), in which case the DESCRIBE request is not needed.
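The interaction above can be sketched without any networking by passing requests to a callable that returns the reply text. Everything here is illustrative: `run_session`, `fake_server`, the client port range, and the session ID are assumptions, and for brevity SETUP is sent to the presentation URL rather than to a per-stream control URL. The point is to show how CSeq increases on each request and how the Session ID from the SETUP response is echoed in later requests.

```python
def run_session(exchange, url, client_ports="5000-5001"):
    """Drive OPTIONS, DESCRIBE, SETUP, PLAY and TEARDOWN through `exchange`.

    `exchange` takes the request text and returns the response text.
    """
    cseq = 0
    session = None

    def request(method, extra=None):
        nonlocal cseq
        cseq += 1
        headers = {"CSeq": str(cseq)}
        if session:
            headers["Session"] = session  # required once SETUP has succeeded
        headers.update(extra or {})
        lines = [f"{method} {url} RTSP/1.0"]
        lines += [f"{name}: {value}" for name, value in headers.items()]
        reply = exchange("\r\n".join(lines) + "\r\n\r\n")
        status_line, *header_lines = reply.split("\r\n\r\n", 1)[0].split("\r\n")
        if status_line.split(" ", 2)[1] != "200":
            raise RuntimeError(f"{method} failed: {status_line}")
        parsed = {}
        for line in header_lines:
            name, _, value = line.partition(":")
            parsed[name.strip().lower()] = value.strip()
        if parsed.get("cseq") != str(cseq):
            raise RuntimeError("response CSeq does not match the request")
        return parsed

    request("OPTIONS")
    request("DESCRIBE", {"Accept": "application/sdp"})
    setup = request("SETUP", {"Transport": f"RTP/AVP;unicast;client_port={client_ports}"})
    # The Session header may carry parameters such as ";timeout=60".
    session = setup["session"].split(";")[0]
    request("PLAY")
    request("TEARDOWN")
    return session


def fake_server(request_text):
    """Stand-in for a real server: always answers 200 OK and echoes CSeq."""
    cseq_line = next(l for l in request_text.split("\r\n") if l.startswith("CSeq:"))
    extra = "Session: 12345678;timeout=60\r\n" if request_text.startswith("SETUP") else ""
    return f"RTSP/1.0 200 OK\r\n{cseq_line}\r\n{extra}\r\n"


print(run_session(fake_server, "rtsp://example.com/media.mp4"))
```

In a real client, `exchange` would write the request to a TCP connection and read the reply from it; keeping it as a parameter makes the control flow easy to test in isolation.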