Network Knowledge#
UDP#
Message-oriented
UDP is a message-oriented protocol (a message can be understood as a segment of data). This means that UDP is just a carrier of messages and will not perform any splitting or stitching operations on the messages.
- At the sender, the application layer passes data to the transport layer's UDP protocol, which only adds a UDP header indicating that it is the UDP protocol, and then passes it to the network layer.
- At the receiver, the network layer strips the IP header and passes the data to the transport layer; UDP removes only its own header before passing the data to the application layer, without any stitching operations.
Unreliability
- UDP is connectionless, meaning that communication does not require establishing and breaking connections.
- UDP is also unreliable. It transmits whatever data it is given without keeping a backup copy, and it does not care whether the other party actually receives it.
- UDP has no congestion control and will continuously send data at a constant speed. Even if network conditions are poor, it will not adjust the sending rate. The downside of this implementation is that it may lead to packet loss under poor network conditions, but the advantage is also clear: in certain scenarios that require high real-time performance (such as video conferencing), UDP is preferred over TCP.
Efficiency
Because UDP is not as complex as TCP, which must ensure that data is neither lost nor delivered out of order, its header overhead is only eight bytes, much less than TCP's minimum of twenty, making it very efficient for transmitting data packets.
The header contains the following data:
- Two 16-bit port numbers, namely the source port (optional field) and the destination port.
- The length of the entire data packet.
- The checksum of the entire data packet (an optional field in IPv4), used to detect errors in the header information and data; the sketch after this list decodes all four fields.
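To make the layout concrete, here is a minimal TypeScript sketch that decodes these four 16-bit fields from a raw 8-byte header. The field offsets follow the UDP layout described above; the sample bytes are invented for illustration:

```typescript
// Decode the fixed 8-byte UDP header: four 16-bit big-endian fields.
function parseUdpHeader(segment: Buffer) {
  return {
    sourcePort: segment.readUInt16BE(0), // optional; zero when unused
    destPort: segment.readUInt16BE(2),
    length: segment.readUInt16BE(4),     // header plus data, in bytes
    checksum: segment.readUInt16BE(6),   // optional in IPv4; zero when unused
  };
}

// A fabricated segment: source port 5000, destination port 53, length 8, checksum 0.
const sample = Buffer.from([0x13, 0x88, 0x00, 0x35, 0x00, 0x08, 0x00, 0x00]);
console.log(parseUdpHeader(sample));
```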
Transmission methods
UDP supports not only one-to-one transmission but also one-to-many, many-to-many, and many-to-one methods, meaning that UDP provides unicast, multicast, and broadcast functionalities.
TCP#
For the TCP header, the following fields are very important:
- Sequence number, this number ensures that the segments TCP transmits stay ordered, allowing the other party to reassemble them in order based on it.
- Acknowledgement number, this number indicates the sequence number of the next byte the receiving end expects to receive, which also confirms that all data before it has been received.
- Window Size, the size of the window, indicates how many more bytes of data can be received, used for flow control.
- Flags (control bits); the sketch after this list shows how they map to bits
- URG=1: This field indicates that the data portion of this packet contains urgent information, which is a high-priority data packet; at this time, the urgent pointer is valid. Urgent data is always located at the front of the current packet's data portion, and the urgent pointer indicates the end of the urgent data.
- ACK=1: This field indicates that the acknowledgment number field is valid. In addition, TCP also stipulates that all packets sent after the connection is established must set ACK to one.
- PSH=1: This field indicates that the receiving end should immediately push the data to the application layer, rather than waiting until the buffer is full to submit.
- RST=1: This field indicates that there is a serious problem with the current TCP connection, which may require re-establishing the TCP connection; it can also be used to reject illegal packets and connection requests.
- SYN=1: When SYN=1 and ACK=0, it indicates that the current packet is a connection request packet. When SYN=1 and ACK=1, it indicates that the current packet is a response packet agreeing to establish a connection.
- FIN=1: This field indicates that this packet is a request packet to release the connection.
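As a small illustration, these six control bits occupy the low bits of a single byte in the TCP header and can be tested with bit masks. The masks below follow the standard TCP layout; the sample values are invented:

```typescript
// The six control bits in the TCP header, as bit masks over the flags byte.
const FLAGS = { URG: 0x20, ACK: 0x10, PSH: 0x08, RST: 0x04, SYN: 0x02, FIN: 0x01 } as const;

function decodeTcpFlags(flagsByte: number): string[] {
  return Object.entries(FLAGS)
    .filter(([, mask]) => (flagsByte & mask) !== 0)
    .map(([name]) => name);
}

console.log(decodeTcpFlags(0x12)); // [ 'ACK', 'SYN' ]: a SYN+ACK packet
console.log(decodeTcpFlags(0x02)); // [ 'SYN' ]: a connection request
```

Decoding 0x12 yields SYN and ACK together, which is exactly the response packet that agrees to establish a connection.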
State machine
Although TCP is described as connection-oriented, the connection it establishes is virtual: nothing physically links the two ends, and in fact each end merely maintains a shared state.
The TCP state machine is very complex and closely related to the handshake when establishing and breaking connections. Next, we will describe the two types of handshakes in detail.
Before that, it is important to understand a key performance indicator: RTT (round-trip time), the time from when the sender transmits data until it receives the response from the other end.
Three-way handshake
In the TCP protocol, the actively initiating side is the client, and the passively connecting side is called the server. Regardless of whether it is the client or the server, once the TCP connection is established, both can send and receive data, so TCP is also a full-duplex protocol.
Initially, both ends are in the CLOSED state. Before communication begins, each side creates a TCB (Transmission Control Block). Once the server has created its TCB, it enters the LISTEN state and waits for a connection request from the client.
- First handshake
The client sends a connection request packet to the server containing the client's initial sequence number x. After sending the request, the client enters the SYN-SENT state.
- Second handshake
After the server receives the connection request packet, if it agrees to connect, it sends a response that acknowledges the client's sequence number and carries the server's own initial sequence number; after sending it, the server enters the SYN-RECEIVED state.
- Third handshake
When the client receives the response agreeing to the connection, it must also send a confirmation packet to the server. After the client sends this packet, it enters the ESTABLISHED state, and the server also enters the ESTABLISHED state after receiving this acknowledgment, at which point the connection is successfully established.
PS: The third handshake can include data through TCP Fast Open (TFO) technology. In fact, any protocol involving handshakes can use a similar TFO method, where the client and server store the same cookie, and the next handshake sends the cookie to reduce RTT.
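To make the client-side state transitions concrete, here is a toy TypeScript model. It tracks only the states named in this section and is a sketch, not a real TCP implementation:

```typescript
// Toy model of the client side of the three-way handshake.
type ClientState = 'CLOSED' | 'SYN-SENT' | 'ESTABLISHED';

class HandshakeClient {
  state: ClientState = 'CLOSED';

  // First handshake: send SYN carrying the initial sequence number x.
  sendSyn(): void {
    if (this.state === 'CLOSED') this.state = 'SYN-SENT';
  }

  // Third handshake: on the server's SYN+ACK, reply with an ACK.
  onSynAck(): void {
    if (this.state === 'SYN-SENT') this.state = 'ESTABLISHED';
  }
}

const client = new HandshakeClient();
client.sendSyn();  // CLOSED -> SYN-SENT
client.onSynAck(); // SYN-SENT -> ESTABLISHED
console.log(client.state); // "ESTABLISHED"
```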
Four-way wave
TCP is full-duplex, and when disconnecting, both ends need to send FIN and ACK.
- First wave
If client A considers its data transmission complete, it sends a connection release request (FIN) to server B and enters the FIN-WAIT-1 state.
- Second wave
After B receives the release request, it informs the application layer to release the TCP connection, then sends an ACK packet and enters the CLOSE_WAIT state. This means the connection from A to B has been released: B will no longer accept data from A. Since a TCP connection is bidirectional, however, B can still send data to A. On receiving this ACK, A enters the FIN-WAIT-2 state.
- Third wave
If B still has unsent data, it continues sending; once finished, it sends a connection release request (FIN) to A and enters the LAST-ACK state.
PS: With delayed acknowledgment (usually bounded by a time limit, otherwise the other party may mistakenly assume a retransmission is needed), the second and third waves can be combined by delaying the ACK packet.
- Fourth wave
After A receives the release request, it sends a confirmation response to B, at which point A enters the TIME-WAIT state. This state will last for 2MSL (Maximum Segment Lifetime, referring to the time a segment survives in the network; it will be discarded after timeout). If there is no retransmission request from B during this time, it will enter the CLOSED state. When B receives the acknowledgment, it also enters the CLOSED state.
Why does A enter the TIME-WAIT state and wait for 2MSL before entering the CLOSED state?
To ensure that B can receive A's acknowledgment. If A directly enters the CLOSED state after sending the acknowledgment, and if the acknowledgment does not arrive due to network issues, it will prevent B from closing normally.
ARQ Protocol#
The ARQ protocol is a timeout-retransmission mechanism: it ensures the correct delivery of data through acknowledgment and timeout mechanisms. It comes in two forms, Stop-and-Wait ARQ and Continuous ARQ.
Stop-and-Wait ARQ#
Normal transmission process
Whenever A sends a packet to B, it must stop sending and start a timer, waiting for a response from the other end. If it receives a response from the other end within the timer's duration, it cancels the timer and sends the next packet.
Packet loss or error
Packets may be lost in transit. In that case, once the timer expires, A resends the data, repeating until the other end responds; this is why every packet sent must first be backed up.
Even if a packet reaches the other end, it may arrive corrupted. The other end then discards it and waits for A to retransmit.
PS: The timer is generally set to a duration greater than the average RTT.
ACK timeout or loss
The acknowledgment sent by the other end may itself be lost or delayed. In that case, once the timer expires, A still retransmits the packet. When B receives a packet with a sequence number it has already seen, it discards the packet and resends the acknowledgment, until A sends the packet with the next sequence number.
In the timeout case the acknowledgment may also simply arrive late. A then checks whether that sequence number has already been acknowledged before; if it has, A only needs to discard the late acknowledgment.
The drawback of this protocol is low transmission efficiency: even in a good network environment, every packet sent must wait for the other end's ACK.
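The whole Stop-and-Wait cycle can be sketched in a few lines of TypeScript. The channel, timeout value, and packet shape below are all invented for illustration:

```typescript
// A toy Stop-and-Wait sender: one packet in flight, retransmit on timeout.
function stopAndWaitSend(
  packet: string,
  channel: (pkt: string, onAck: () => void) => void,
  timeoutMs = 300,
): void {
  const timer = setTimeout(() => {
    console.log(`timeout, retransmitting "${packet}"`);
    stopAndWaitSend(packet, channel, timeoutMs); // resend the backed-up copy
  }, timeoutMs);

  channel(packet, () => {
    clearTimeout(timer); // ACK arrived within the timer: cancel it and move on
    console.log(`"${packet}" acknowledged`);
  });
}

// A lossy channel that drops the first transmission, then delivers.
let attempts = 0;
stopAndWaitSend('pkt-0', (pkt, onAck) => {
  if (attempts++ === 0) return; // simulated packet loss
  setTimeout(onAck, 50);        // simulated RTT before the ACK comes back
});
```

Running this drops the first transmission, lets the timer fire, and succeeds on the retransmission, which is exactly the loss scenario described above.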
Continuous ARQ#
In Continuous ARQ, the sender has a sending window and can continuously send data within the window without receiving an acknowledgment, which reduces waiting time and improves efficiency compared to Stop-and-Wait ARQ.
Cumulative acknowledgment
In Continuous ARQ, the receiving end receives packets continuously. If it sent an acknowledgment for every single packet, as in Stop-and-Wait ARQ, it would waste resources; with cumulative acknowledgment, it can reply with one acknowledgment after receiving several packets. The acknowledgment number tells the sender that all data with earlier sequence numbers has been received, and that the packet with this sequence number should be sent next.
However, cumulative acknowledgment also has a drawback. The receiver may have received packet 5 but not packet 6, while packets 7 and onward have already arrived. In that case the acknowledgment can only say 6, which makes the sender needlessly resend data the receiver already holds. This situation can be resolved using SACK, which will be discussed later.
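A tiny sketch of how the acknowledgment number is derived under cumulative acknowledgment; the sequence numbers are illustrative:

```typescript
// Cumulative ACK: acknowledge the next sequence number the receiver expects.
function cumulativeAck(received: Set<number>): number {
  let next = 1;
  while (received.has(next)) next++;
  return next; // everything with seq < next has been received
}

// Packets 1-5 and 7-9 arrived but 6 is missing: the ACK can only say 6,
// even though 7-9 are already there (the gap SACK was designed to report).
console.log(cumulativeAck(new Set([1, 2, 3, 4, 5, 7, 8, 9]))); // 6
```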
Congestion control
Congestion control is different from flow control: flow control acts on the receiving end, ensuring the receiver can keep up with the incoming data, while congestion control acts on the network, preventing excessive data from congesting it and avoiding an overloaded network.
Congestion control includes four algorithms: Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery.
Slow Start Algorithm
As the name suggests, the Slow Start algorithm begins the transmission with a small sending window and grows it exponentially, to avoid injecting a large amount of data at once and congesting the network.
The steps of the Slow Start algorithm are as follows:
- Initially set the congestion window (Congestion Window) to 1 MSS (Maximum Segment Size).
- Every RTT, double the window size.
- Exponential growth cannot be unlimited, so there is a threshold limit; when the window size exceeds the threshold, the Congestion Avoidance algorithm will be activated.
Congestion Avoidance Algorithm
The Congestion Avoidance algorithm is simpler: every RTT it increases the window by 1 MSS. This linear growth avoids the congestion that continued exponential growth could cause and gradually adjusts the window toward its optimal value.
During transmission, if a timer times out, TCP will assume that the network is congested and will immediately take the following steps (all three phases are simulated in the sketch after this list):
- Set the threshold to half of the current congestion window.
- Set the congestion window to 1 MSS.
- Activate the Congestion Avoidance algorithm.
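Here is that toy simulation in units of MSS: the window doubles every RTT until it reaches the threshold, then grows by one per RTT, and a timeout halves the threshold and resets the window. The initial window and threshold are arbitrary illustration values:

```typescript
// Congestion window evolution in MSS units: slow start, then congestion avoidance.
let cwnd = 1;      // congestion window, in MSS
let ssthresh = 16; // slow-start threshold, in MSS

function onRttElapsed(): void {
  cwnd = cwnd < ssthresh ? cwnd * 2 : cwnd + 1; // exponential, then linear growth
}

function onTimeout(): void {
  ssthresh = Math.max(Math.floor(cwnd / 2), 2); // threshold = half the current window
  cwnd = 1;                                     // back to slow start
}

for (let rtt = 1; rtt <= 8; rtt++) {
  onRttElapsed();
  console.log(`RTT ${rtt}: cwnd=${cwnd}, ssthresh=${ssthresh}`);
}
onTimeout(); // simulate congestion: window resets, threshold halves
console.log(`after timeout: cwnd=${cwnd}, ssthresh=${ssthresh}`);
```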
Fast Retransmit
Fast Retransmit generally occurs together with Fast Recovery. When the receiving end notices that a packet has arrived out of order, it keeps replying with an ACK for the last correctly ordered sequence number (in the absence of SACK). Once the sender receives three such duplicate ACKs, it begins Fast Retransmit without waiting for the timer to expire. There are two concrete variants:
TCP Tahoe implementation is as follows:
- Set the threshold to half of the current congestion window.
- Set the congestion window to 1 MSS.
- Restart the Slow Start algorithm.
TCP Reno implementation is as follows (contrasted with Tahoe in the sketch after this list):
- Halve the congestion window.
- Set the threshold to the current congestion window.
- Enter the Fast Recovery phase (retransmit the packets needed by the other end; once a new ACK is received, exit this phase).
- Use the Congestion Avoidance algorithm.
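Continuing the same MSS-unit toy model, the two reactions to three duplicate ACKs can be contrasted as follows. This is a simplified sketch that, among other things, omits the three-segment window inflation real Reno implementations apply:

```typescript
// Reaction to three duplicate ACKs, in MSS units (toy model).
function tahoeOnTripleDupAck(cwnd: number) {
  const ssthresh = Math.floor(cwnd / 2); // threshold = half the current window
  return { cwnd: 1, ssthresh };          // restart slow start from 1 MSS
}

function renoOnTripleDupAck(cwnd: number) {
  const halved = Math.floor(cwnd / 2);       // halve the congestion window
  return { cwnd: halved, ssthresh: halved }; // and enter fast recovery from here
}

console.log(tahoeOnTripleDupAck(20)); // { cwnd: 1, ssthresh: 10 }
console.log(renoOnTripleDupAck(20));  // { cwnd: 10, ssthresh: 10 }
```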
TCP New Reno's improved Fast Recovery
The TCP New Reno algorithm addresses a shortcoming of TCP Reno: previously, Fast Recovery was exited as soon as any new ACK packet arrived.
In TCP New Reno, when Fast Recovery begins, the sender records the highest sequence number it has sent so far.
Suppose the sender has segments numbered 1 to 10, and the packets numbered 3 and 7 are lost, so the recorded maximum for this window is 10. The sender keeps receiving ACKs for sequence number 3, so it retransmits packet 3; the receiver gets it and replies with an ACK for 7. Because 7 is still below the recorded maximum of 10, TCP knows the other end is missing more packets and retransmits packet 7 as well. The receiver gets it and replies with an ACK for 11; since that passes the recorded maximum, the sender concludes the whole window has been received and exits the Fast Recovery phase.
HTTP#
The HTTP protocol is stateless: the server does not retain any state between requests.
Differences between Post and Get
First, let's introduce the concepts of side effects and idempotence.
A side effect refers to making changes to resources on the server: searching has no side effects, while registration does.
Idempotence means that sending M requests and sending N requests (M and N both greater than 1 and different) leaves the resources on the server in the same state. For example, registering accounts with 10 versus 11 requests is not idempotent (a different number of accounts results), while applying the same change to an article 10 versus 11 times is idempotent.
In spec-conforming usage, Get is used for operations that are idempotent and free of side effects, such as searching for keywords, while Post is used for operations that have side effects and are not idempotent, such as registration.
Technically speaking (see the sketch after this list):
- Get requests can be cached, while Post cannot.
- Post is relatively safer than Get, because Get parameters are embedded in the URL and saved in the browser's history while Post parameters are not; under packet capture, however, both are equally visible.
- Post can transmit more data than Get through the request body; Get does not have this capability.
- URLs have length limits, which can affect Get requests, but this length limit is specified by the browser, not by the RFC.
- Post supports more encoding types and does not restrict data types.
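As a quick illustration of these differences, here is a hedged sketch using the fetch API; the endpoint URLs and payload are made up:

```typescript
// A sketch using the fetch API (available in browsers and modern Node).
async function demo(): Promise<void> {
  // GET: parameters travel in the URL (cacheable, stored in history).
  await fetch('https://example.com/search?q=tcp');

  // POST: parameters travel in the request body with an explicit content type.
  await fetch('https://example.com/register', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ user: 'alice', password: 'secret' }),
  });
}
demo();
```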
Common status codes
2XX Success
- 200 OK, indicates that the request sent from the client has been correctly processed by the server.
- 204 No content, indicates that the request succeeded but the response carries no body.
- 205 Reset Content, indicates that the request succeeded and the response carries no body, but unlike 204 it asks the requester to reset the document view (for example, clear a form).
- 206 Partial Content, for range requests.
3XX Redirection
- 301 moved permanently, permanent redirection, indicates that the resource has been assigned a new URL.
- 302 found, temporary redirection, indicates that the resource has been temporarily assigned a new URL.
- 303 see other, indicates that the resource exists at another URL and should be retrieved using the GET method.
- 304 not modified, returned for a conditional request when the server allows access but the condition is not met, i.e. the resource has not changed, so the client can keep using its cached copy.
- 307 temporary redirect, temporary redirection, similar to 302, but expects the client to keep the request method unchanged when sending requests to the new address.
4XX Client errors
- 400 bad request, indicates that there is a syntax error in the request.
- 401 unauthorized, indicates that the sent request requires authentication information via HTTP.
- 403 forbidden, indicates that access to the requested resource is denied by the server.
- 404 not found, indicates that the requested resource was not found on the server.
5XX Server errors
- 500 internal server error, indicates that an error occurred on the server while processing the request.
- 501 Not Implemented, indicates that the server does not support a certain feature required by the current request.
- 503 service unavailable, indicates that the server is temporarily overloaded or undergoing maintenance and cannot process the request.
HTTP Headers
| General Fields | Function |
| --- | --- |
| Cache-Control | Controls caching behavior |
| Connection | The type of connection the browser prefers, such as keep-alive |
| Date | The time the message was created |
| Pragma | Message directives |
| Via | Information about the proxy servers the message passed through |
| Transfer-Encoding | The encoding method used for the transfer |
| Warning | Warns that there may be problems with the content |
| Upgrade | Asks the other party to switch to another protocol |
| Response Fields | Function |
| --- | --- |
| Accept-Ranges | Indicates whether range requests are supported |
| Age | How long the resource has been held in the proxy cache |
| ETag | Resource identifier |
| Location | Redirects the client to a certain URL |
| Proxy-Authenticate | The authentication scheme the proxy server requires of the client |
| Server | Server name |
| WWW-Authenticate | The authentication scheme required to access the resource |
| Entity Fields | Function |
| --- | --- |
| Allow | The request methods the resource supports |
| Content-Encoding | The encoding format of the entity body |
| Accept-Encoding | The list of acceptable encoding formats |
| Content-Language | The language used in the content |
| Content-Length | The length of the entity body |
| Content-Location | An alternate address for the returned data |
| Content-MD5 | Base64-encoded MD5 checksum of the content |
| Content-Range | The range of the content |
| Content-Type | The media type of the content |
| Expires | The expiration time of the content |
| Last-Modified | The last modification time of the content |
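To ground a few of these fields, here is a small sketch using Node's built-in http module; the port and header values are illustrative:

```typescript
import { createServer } from 'node:http';

// A toy server setting a few of the fields from the tables above.
createServer((req, res) => {
  res.setHeader('Cache-Control', 'max-age=3600');             // general: caching behavior
  res.setHeader('ETag', '"abc123"');                          // response: resource identifier
  res.setHeader('Content-Type', 'text/plain; charset=utf-8'); // entity: media type
  res.setHeader('Last-Modified', new Date().toUTCString());   // entity: last change time
  res.end('hello');
}).listen(8080);
```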
HTTPS#
HTTPS still transmits information through HTTP, but the information is encrypted using the TLS protocol.
TLS#
The TLS protocol sits above the transport layer and below the application layer. The first TLS handshake requires two RTTs; subsequent connections can be brought down to one RTT through Session Resumption.
TLS uses two types of encryption technologies: symmetric encryption and asymmetric encryption.
Symmetric encryption:
Symmetric encryption means that both sides have the same secret key, and both sides know how to encrypt and decrypt the ciphertext.
Asymmetric encryption:
There is a pair of keys, public and private: the public key may be known by anyone and can be used to encrypt data, but that data can only be decrypted with the private key, which is known only to the party that distributed the public key.
The TLS handshake proceeds as follows:
- The client sends a random value, the required protocol, and encryption method.
- The server receives the client's random value, generates its own random value, and sends its certificate based on the protocol and encryption method required by the client (if client certificate verification is needed, it must be specified).
- The client receives the server's certificate and verifies its validity. If verification passes, it generates yet another random value, encrypts it with the public key from the server's certificate, and sends it to the server. If the server asked for client certificate verification, the client attaches its own certificate here.
- The server receives the encrypted random value and uses its private key to decrypt it to obtain the third random value. At this point, both sides have three random values and can generate a key using these three random values according to the previously agreed encryption method. Subsequent communication can be encrypted and decrypted using this key.
From the above steps, it can be seen that during the TLS handshake phase, both sides use asymmetric encryption for communication. However, due to the performance overhead of asymmetric encryption compared to symmetric encryption, both sides use symmetric encryption for formal data transmission.
PS: The above describes the handshake in TLS 1.2. In TLS 1.3, establishing a connection for the first time requires only one RTT, and resumed connections require zero RTTs.
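In application code the handshake itself is handled by the platform; you typically just open a TLS socket and, if desired, inspect the negotiated parameters. A minimal Node sketch, with an illustrative host:

```typescript
import { connect } from 'node:tls';

// The handshake described above happens inside connect(); the callback
// fires once the session is established.
const socket = connect({ host: 'example.com', port: 443, servername: 'example.com' }, () => {
  console.log('protocol:', socket.getProtocol());  // e.g. 'TLSv1.3'
  console.log('cipher:', socket.getCipher().name); // the negotiated symmetric cipher
  socket.end();
});
```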
HTTP 2.0#
HTTP 2.0 significantly improves web performance compared to HTTP 1.X.
In HTTP 1.X, to improve performance we introduced sprite sheets, inlined small images, spread resources across multiple domain names, and so on. All of this is due to browsers limiting the number of concurrent requests to the same domain: when a page needs many resources and the limit is reached, head-of-line blocking forces the remaining requests to wait until earlier ones complete.
Binary transmission
The core point of all performance enhancements in HTTP 2.0 lies here. In previous HTTP versions, we transmitted data in text format. In HTTP 2.0, a new encoding mechanism is introduced, and all transmitted data is split and encoded in binary format.
Multiplexing
In HTTP 2.0, there are two very important concepts: frames and streams.
A frame represents the smallest unit of data, and each frame identifies which stream it belongs to. A stream is a data flow composed of multiple frames.
Multiplexing means that multiple streams can exist within a single TCP connection. In other words, multiple requests can be sent, and the other end can identify which request each frame belongs to. This technology can avoid the head-of-line blocking issue present in older versions of HTTP, greatly improving transmission performance.
Header compression
In HTTP 1.X, headers were transmitted as text, which could mean repeatedly sending hundreds to thousands of bytes per request once cookies were included.
In HTTP 2.0, the HPACK compression format is used to encode the transmitted headers, reducing their size. An index table is maintained on both ends to record previously seen headers, allowing for the transmission of just the key names of recorded headers during subsequent transmissions. The receiving end can then find the corresponding values using these key names.
Server Push
In HTTP 2.0, the server can proactively push other resources after a client request.
Consider the following situation: certain resources are guaranteed to be requested by the client. In this case, the server push technology can be used to proactively send necessary resources to the client, thereby reducing latency. Of course, in cases where browser compatibility is a concern, you can also use prefetch.
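A hedged sketch of server push using Node's http2 module; the path and content are invented, and a real deployment for browsers would use createSecureServer with TLS certificates rather than the plaintext server shown here:

```typescript
import { createServer } from 'node:http2';

// Toy HTTP/2 server (h2c, no TLS): push /style.css alongside the HTML response.
const server = createServer((req, res) => {
  const stream = res.stream;
  // Only push when the peer allows it (clients may disable push).
  if (req.url === '/' && stream.pushAllowed) {
    stream.pushStream({ ':path': '/style.css' }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ':status': 200, 'content-type': 'text/css' });
      pushStream.end('body { margin: 0 }');
    });
  }
  res.setHeader('content-type', 'text/html');
  res.end('<link rel="stylesheet" href="/style.css"><p>hello</p>');
});
server.listen(8443);
```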
DNS#
The role of DNS is to query the specific IP through the domain name.
Since IP addresses are strings of digits (and, in IPv6, letters) that are hard for humans to remember, domain names were created. You can think of a domain name as an alias for an IP address, and DNS as the directory that resolves the alias to the real name.
DNS queries are performed before the TCP handshake, and this query is done by the operating system itself. When you want to access www.google.com in your browser, the following operations occur:
- The operating system first checks the local cache.
- If not found, it queries the DNS server configured in the system.
- If it still cannot find it, it directly queries the DNS root server, which will find the server responsible for the com top-level domain.
- Then it queries that server for the google second-level domain.
- The third-level domain is what we configure ourselves: you can point the www subdomain at one IP and give other third-level domains IPs of their own.
The above describes an iterative DNS query; there is also the recursive query. The difference is that with the former the client makes each request itself, while with the latter the system-configured DNS server makes the requests and returns the final result to the client.
PS: DNS queries are based on UDP.
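From application code, the whole resolution chain hides behind a single lookup call; a minimal Node sketch:

```typescript
import { resolve4 } from 'node:dns/promises';

// Ask a DNS server directly for a domain's A records
// (resolve4 bypasses the OS-level cache that a normal lookup would use).
resolve4('www.google.com').then((addresses) => {
  console.log(addresses); // the returned IPs vary by region and time
});
```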
The process from inputting a URL to the page loading completion:
- First, a DNS query is made. If intelligent DNS resolution is performed at this step, it will provide the fastest IP address for access.
- Next is the TCP handshake. The application layer hands the data to the transport layer, where the TCP protocol specifies the port numbers for both ends; the network layer's IP protocol then determines the IP addresses and how to route the data through routers; the packet is encapsulated into a data-link-layer frame and finally transmitted at the physical layer.
- After the TCP handshake is completed, a TLS handshake occurs, and then formal data transmission begins.
- Before the data reaches the server, it may first pass through a load-balancing server, which distributes requests sensibly across multiple machines. Assume here that the server responds with an HTML file.
- The browser first checks the status code. For 2XX it continues parsing; for 4XX or 5XX it reports an error; for 3XX it follows the redirect, with a redirect counter guarding against excessive redirections (exceeding the limit is also treated as an error).
- The browser begins to parse the file. If it is in gzip format, it will decompress it first, and then determine how to decode the file based on its encoding format.
- Once the file is successfully decoded, the rendering flow officially begins. The browser first constructs the DOM tree from the HTML and, if there is CSS, the CSSOM tree. When it encounters a script tag, it checks for async or defer: an async script is downloaded in parallel with parsing and executed as soon as it arrives, while a defer script is downloaded in parallel and executed, in order, after HTML parsing completes. With neither attribute, rendering is blocked until the script finishes executing. Any referenced files are downloaded along the way; with HTTP 2.0, downloading many resources such as images is far more efficient.
- Once the initial HTML is fully loaded and parsed, the DOMContentLoaded event is triggered.
- After the CSSOM tree and DOM tree are constructed, the Render tree will be generated. This step determines the layout, style, and many other aspects of the page elements.
- While the Render tree is being generated, the browser begins calling on the GPU to paint, composite layers, and display the content on the screen; the sketch below shows how scripts can observe the loading milestones from the steps above.
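The two loading milestones mentioned above are observable from page scripts; a small browser-side sketch:

```typescript
// Fires once the initial HTML has been fully loaded and parsed.
document.addEventListener('DOMContentLoaded', () => {
  console.log('DOM ready; subresources may still be loading');
});

// Fires once all subresources (images, stylesheets, ...) have also loaded.
window.addEventListener('load', () => {
  console.log('page fully loaded');
});
```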