Network Knowledge#
UDP#
Message-oriented
UDP is a message-oriented protocol (messages can be understood as segments of data). This means that UDP is merely a carrier of messages and does not perform any splitting or stitching operations on them.
- At the sender end, the application layer passes data to the transport layer's UDP protocol, which only adds a UDP header indicating that it is the UDP protocol, and then passes it to the network layer.
- At the receiver end, the network layer passes the data up to the transport layer, and UDP only removes the UDP header before passing it to the application layer, without any stitching operations.
Unreliability
- UDP is connectionless, meaning that communication does not require establishing and breaking connections.
- UDP is also unreliable. The protocol transmits whatever data it receives and does not back up the data, nor does it care whether the other party receives it.
- UDP has no congestion control and will continuously send data at a constant rate. Even if network conditions are poor, it will not adjust the sending rate. The downside of this implementation is that it may lead to packet loss under poor network conditions, but the advantage is clear: in certain scenarios that require high real-time performance (such as video conferencing), UDP is preferred over TCP.
Efficiency
Because UDP does not need TCP's machinery for guaranteeing that data is neither lost nor reordered, its header overhead is only eight bytes, far less than TCP's minimum of twenty, which makes it very efficient for transmitting data packets.
The header contains the following fields (a parsing sketch follows the list):
- Two 16-bit port numbers, namely the source port (optional field) and the destination port.
- The length of the entire data packet.
- The checksum of the entire data packet (optional field in IPv4), which is used to detect errors in the header information and data.
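To make the layout concrete, here is a minimal sketch that decodes these four fields from the fixed 8-byte header; the offsets follow the list above, and the `Buffer` API is Node.js:

```ts
import { Buffer } from "node:buffer";

interface UdpHeader {
  sourcePort: number;      // 16 bits; may be 0 when unused
  destinationPort: number; // 16 bits
  length: number;          // header + payload, in bytes
  checksum: number;        // 16 bits; may be 0 in IPv4 when unused
}

// Decode the fixed 8-byte UDP header from a raw datagram buffer.
function parseUdpHeader(datagram: Buffer): UdpHeader {
  return {
    sourcePort: datagram.readUInt16BE(0),
    destinationPort: datagram.readUInt16BE(2),
    length: datagram.readUInt16BE(4),
    checksum: datagram.readUInt16BE(6),
  };
}

// Example: a datagram from port 5000 to port 53 carrying 4 payload bytes.
const header = Buffer.from([0x13, 0x88, 0x00, 0x35, 0x00, 0x0c, 0x00, 0x00]);
console.log(parseUdpHeader(header));
// { sourcePort: 5000, destinationPort: 53, length: 12, checksum: 0 }
```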
Transmission methods
UDP supports not only one-to-one transmission but also one-to-many, many-to-many, and many-to-one transmission methods, meaning that UDP provides unicast, multicast, and broadcast functionalities.
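For instance, with Node's dgram module the three modes differ only in the destination address; a hedged sketch, where the port and addresses are placeholders:

```ts
import dgram from "node:dgram";

const socket = dgram.createSocket("udp4");

socket.bind(() => {
  // Unicast: one-to-one, addressed to a single host.
  socket.send("hello", 41234, "192.0.2.1");

  // Broadcast: one-to-many on the local network; must be enabled explicitly.
  socket.setBroadcast(true);
  socket.send("hello everyone", 41234, "255.255.255.255");

  // Multicast: one-to-many across a group; receivers join the group
  // via socket.addMembership("224.0.0.114").
});
```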
TCP#
For the TCP header, the following fields are very important (a parsing sketch follows the list):
- Sequence number, which ensures that the packets transmitted by TCP are in order, allowing the other party to stitch packets together in sequence.
- Acknowledgement Number, which indicates the number of the next byte that the data receiving end expects to receive, and also indicates that the data of the previous sequence number has been received.
- Window Size, which indicates how many more bytes of data can be received and is used for flow control.
- Flags
- URG=1: This field indicates that the data portion of this packet contains urgent information, which is a high-priority data packet, and the urgent pointer is valid. Urgent data is always located at the front of the current packet's data portion, and the urgent pointer indicates the end of the urgent data.
- ACK=1: This field indicates that the acknowledgment number field is valid. Additionally, TCP specifies that all segments transmitted after the connection is established must set ACK to 1.
- PSH=1: This field indicates that the receiving end should immediately push the data to the application layer, rather than waiting until the buffer is full before submitting.
- RST=1: This field indicates that there is a serious problem with the current TCP connection, and it may need to re-establish the TCP connection. It can also be used to reject illegal segments and connection requests.
- SYN=1: When SYN=1 and ACK=0, it indicates that the current segment is a connection request segment. When SYN=1 and ACK=1, it indicates that the current segment is a response segment agreeing to establish a connection.
- FIN=1: This field indicates that this segment is a request to release the connection.
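As a rough illustration of where these fields and flag bits live, here is a sketch that extracts them from a raw 20-byte TCP header; the offsets follow the standard TCP header layout, and this is not something application code normally does:

```ts
import { Buffer } from "node:buffer";

// Extract the fields discussed above from a raw TCP header.
function parseTcpHeader(segment: Buffer) {
  const flags = segment.readUInt8(13); // flag bits live in byte 13
  return {
    sequenceNumber: segment.readUInt32BE(4),
    acknowledgementNumber: segment.readUInt32BE(8),
    windowSize: segment.readUInt16BE(14),
    URG: (flags & 0x20) !== 0,
    ACK: (flags & 0x10) !== 0,
    PSH: (flags & 0x08) !== 0,
    RST: (flags & 0x04) !== 0,
    SYN: (flags & 0x02) !== 0,
    FIN: (flags & 0x01) !== 0,
    urgentPointer: segment.readUInt16BE(18), // meaningful only when URG is set
  };
}
```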
State machine
Although TCP appears to physically connect the two ends, there is no dedicated circuit between them: a TCP "connection" is simply a state that both ends agree to maintain together.
The TCP state machine is very complex and closely tied to the handshakes used for establishing and releasing connections. Next, we will detail both of them.
Before that, it is important to understand a key performance metric: RTT (round-trip time), the time from when the sender transmits data until it receives a response from the other end.
Three-way handshake
In the TCP protocol, the side that actively initiates the request is the client, while the side that passively connects is called the server. Regardless of whether it is the client or the server, once the TCP connection is established, both can send and receive data, so TCP is also a full-duplex protocol.
Initially, both ends are in the CLOSED state. Before communication begins, both parties create a TCB (Transmission Control Block). After the server creates its TCB, it enters the LISTEN state and starts waiting for a connection request from the client.
- First handshake
The client sends a connection request segment to the server, carrying the client's initial sequence number x (SYN=1, seq=x). After sending the request, the client enters the SYN-SENT state.
- Second handshake
After the server receives the connection request segment, if it agrees to the connection, it sends a response that acknowledges x and carries the server's own initial sequence number. After sending it, the server enters the SYN-RECEIVED state.
- Third handshake
When the client receives the response agreeing to the connection, it must send a confirmation segment to the server. After sending this segment, the client enters the ESTABLISHED state, and the server also enters the ESTABLISHED state after receiving this acknowledgment, at which point the connection is successfully established.
PS: The third handshake can include data through the TCP Fast Open (TFO) technology. In fact, any protocol involving handshakes can use a similar TFO method, where the client and server store the same cookie, and the next handshake sends the cookie to reduce RTT.
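In application code the handshake itself is performed by the operating system; a minimal Node sketch (host and port are placeholders) only observes its completion:

```ts
import net from "node:net";

// The kernel sends SYN, receives SYN+ACK, and replies with ACK; the
// 'connect' callback fires once the socket reaches the ESTABLISHED state.
const socket = net.connect({ host: "example.com", port: 80 }, () => {
  console.log("three-way handshake complete, connection ESTABLISHED");
  socket.end(); // begins the teardown described in the next section
});
```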
Four-way handshake
TCP is full-duplex, and when disconnecting, both ends need to send FIN and ACK.
- First wave
If client A considers its data transmission complete, it sends a connection release request (FIN) to server B and enters the FIN-WAIT-1 state.
- Second wave
After B receives the connection release request, it informs the application layer to release the TCP connection, sends an ACK packet, and enters the CLOSE_WAIT state. This indicates that the connection from A to B has been released and B will no longer accept data from A. However, because the TCP connection is bidirectional, B can still send data to A. After A receives this ACK, it enters the FIN-WAIT-2 state.
- Third wave
If B still has unsent data, it continues sending. Once finished, B sends a connection release request (FIN) to A and enters the LAST-ACK state.
PS: Using delayed acknowledgment (usually with a time limit, otherwise the other party may mistakenly conclude a retransmission is needed), the second and third waves can be combined by delaying the sending of the ACK packet.
- Fourth wave
After A receives the release request, it sends a confirmation response to B, at which point A enters the TIME-WAIT state. This state will last for 2MSL (Maximum Segment Lifetime, which refers to the time a segment survives in the network before being discarded). If there are no retransmission requests from B during this time, it will enter the CLOSED state. When B receives the acknowledgment, it also enters the CLOSED state.
Why does A enter the TIME-WAIT state and wait for 2MSL before entering the CLOSED state?
To ensure that B receives A's acknowledgment. If A entered the CLOSED state immediately after sending the acknowledgment and that acknowledgment were lost in the network, B would retransmit its FIN with no one left to answer, and it could never close properly. While in TIME-WAIT, A can resend the acknowledgment in response to a retransmitted FIN.
ARQ Protocol#
The ARQ protocol is a timeout retransmission mechanism. It ensures the correct delivery of data through acknowledgment and timeout mechanisms, and the ARQ protocol includes Stop-and-Wait ARQ and Continuous ARQ.
Stop-and-Wait ARQ#
Normal transmission process
Whenever A sends a segment to B, it must stop sending and start a timer, waiting for a response from the other end. If it receives a response from the other end within the timer's duration, it cancels the timer and sends the next segment.
Packet loss or error
During transmission, packets may be lost. If the timer expires before a response arrives, the lost data is retransmitted until the other end acknowledges it; this is why the sender must keep a backup copy of every segment it sends.
Even if the packet is transmitted correctly to the other end, there may still be errors during transmission. In this case, the other end will discard the packet and wait for A to retransmit.
PS: The timer is generally set to a duration greater than the average RTT.
ACK timeout or loss
The acknowledgment sent by the other end may itself be lost or delayed. If the timer expires, A retransmits the packet; when B receives a packet with a sequence number it has already seen, it discards the duplicate and resends the acknowledgment, until A moves on to the next sequence number.
An acknowledgment may also simply arrive after the timer has already expired. In that case A checks whether the sequence number has already been acknowledged; if so, it simply discards the duplicate acknowledgment.
The downside of this protocol is low transmission efficiency: even in a good network environment, each packet must wait for the other end's ACK before the next can be sent.
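A toy simulation of the stop-and-wait logic; the lossy channel, loss rate, timeout, and RTT values below are all made up for illustration:

```ts
// Toy Stop-and-Wait sender over a simulated lossy channel.
const LOSS_RATE = 0.3;   // made-up probability that a packet or ACK is lost
const TIMEOUT_MS = 100;  // should exceed the average RTT
const RTT_MS = 20;       // simulated round-trip time

// Simulated channel: resolves with the ACK for `seq`, unless the packet or
// its ACK is "lost", in which case the retransmission timer fires instead.
function sendAndWait(seq: number): Promise<number> {
  return new Promise((resolve, reject) => {
    if (Math.random() < LOSS_RATE) {
      setTimeout(() => reject(new Error("timer expired")), TIMEOUT_MS);
    } else {
      setTimeout(() => resolve(seq), RTT_MS);
    }
  });
}

async function stopAndWaitSend(segments: string[]): Promise<void> {
  for (let seq = 0; seq < segments.length; seq++) {
    // Keep retransmitting the backed-up segment until its ACK arrives.
    while (true) {
      try {
        await sendAndWait(seq); // send one segment and start the timer
        console.log(`ACK ${seq} received, sending next segment`);
        break;                  // cancel the timer and move on
      } catch {
        console.log(`timeout, retransmitting segment ${seq} ("${segments[seq]}")`);
      }
    }
  }
}

stopAndWaitSend(["a", "b", "c"]);
```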
Continuous ARQ#
In Continuous ARQ, the sender has a sending window and can continuously send data within the window without waiting for an acknowledgment, which reduces waiting time compared to the Stop-and-Wait ARQ protocol and improves efficiency.
Cumulative acknowledgment
In Continuous ARQ, the receiving end receives packets continuously. Sending an acknowledgment for every single packet, as in Stop-and-Wait ARQ, would waste resources. With cumulative acknowledgment, the receiver can reply with a single acknowledgment after receiving several packets; the acknowledgment number tells the sender that all data before that sequence number has been received and that it should continue sending from that number.
However, cumulative acknowledgment also has a drawback. While receiving packets continuously, the receiver may get packet 5, miss packet 6, and still receive packets 7 and beyond. In that case the acknowledgment can only say 6, which forces the sender to retransmit data that has in fact already arrived. This situation can be resolved with SACK, which will be discussed later.
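A small sketch of how the receiver derives the cumulative acknowledgment number, matching the example above (pure illustration):

```ts
// Given the set of sequence numbers received so far, the cumulative ACK is
// the first missing number: everything before it has arrived.
function cumulativeAck(received: Set<number>): number {
  let ack = 1;
  while (received.has(ack)) ack++;
  return ack;
}

// Packets 1-5 arrived, 6 was lost, 7 and 8 arrived anyway.
const received = new Set([1, 2, 3, 4, 5, 7, 8]);
console.log(cumulativeAck(received)); // 6 — the sender must resend from 6
```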
Congestion control
Congestion control is different from flow control. Flow control acts on the receiving end, ensuring that the receiver can keep up with the incoming data. Congestion control acts on the network, preventing excessive data from congesting it and avoiding situations where the network load is too high.
Congestion control includes four algorithms: Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery.
Slow Start Algorithm
As the name suggests, the Slow Start algorithm starts the sending window small at the beginning of transmission and grows it exponentially, instead of injecting a large amount of data at once and congesting the network.
The steps of the Slow Start algorithm are as follows:
- Initially set the congestion window (cwnd) to 1 MSS (Maximum Segment Size).
- Every RTT, double the window size.
- Exponential growth cannot be unlimited, so there is a threshold limit. When the window size exceeds the threshold, the Congestion Avoidance algorithm will be activated.
Congestion Avoidance Algorithm
The Congestion Avoidance algorithm is simpler; it increases the window size by one every RTT, which can avoid network congestion caused by exponential growth and gradually adjust the size to the optimal value.
During transmission, if the retransmission timer expires, TCP assumes the network is congested and immediately performs the following steps (a window-evolution sketch follows this list):
- Set the threshold to half of the current congestion window.
- Set the congestion window to 1 MSS.
- Restart the Slow Start algorithm.
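A sketch of the window evolution described by the two algorithms, in units of MSS; the initial threshold of 16 is an arbitrary value chosen for illustration:

```ts
let cwnd = 1;      // slow start begins at 1 MSS
let ssthresh = 16; // threshold at which congestion avoidance takes over

// Called once per RTT while all ACKs arrive in time.
function onRttElapsed(): void {
  if (cwnd < ssthresh) {
    cwnd *= 2;     // slow start: exponential growth
  } else {
    cwnd += 1;     // congestion avoidance: linear growth
  }
}

// Called when the retransmission timer expires (network assumed congested).
function onTimeout(): void {
  ssthresh = Math.floor(cwnd / 2);
  cwnd = 1;        // back to 1 MSS; growth resumes with slow start
}

for (let i = 0; i < 6; i++) onRttElapsed();
console.log(cwnd);           // 2, 4, 8, 16 by slow start, then 17, 18
onTimeout();
console.log(cwnd, ssthresh); // 1 and 9 after a timeout
```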
Fast Retransmit
Fast Retransmit generally occurs together with Fast Recovery. When the receiving end detects an out-of-order packet, it keeps replying with the sequence number of the last correctly ordered packet (in the absence of SACK). If the sender receives three such duplicate ACKs, it initiates Fast Retransmit without waiting for the timer to expire. Two implementations exist (a sketch of the Reno reaction follows the lists):
TCP Tahoe implementation:
- Set the threshold to half of the current congestion window.
- Set the congestion window to 1 MSS.
- Restart the Slow Start algorithm.
TCP Reno implementation:
- Halve the congestion window.
- Set the threshold to the current congestion window.
- Enter the Fast Recovery phase (retransmit the packets needed by the other end; once a new ACK is received, exit this phase).
- Use the Congestion Avoidance algorithm.
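A self-contained sketch of the Reno reaction to duplicate ACKs; the sender state, window values, and the `retransmit` helper are all hypothetical:

```ts
let cwnd = 16;
let ssthresh = 16;
let lastAckNumber = -1;
let duplicateAcks = 0;

// Hypothetical helper standing in for an actual segment resend.
function retransmit(seq: number): void {
  console.log(`fast retransmit of segment ${seq}`);
}

// Called for every ACK the sender receives.
function onAck(ackNumber: number): void {
  if (ackNumber === lastAckNumber) {
    duplicateAcks++;
    if (duplicateAcks === 3) {
      // TCP Reno: halve the window, set the threshold to it, retransmit
      // the missing segment, and enter fast recovery.
      cwnd = Math.floor(cwnd / 2);
      ssthresh = cwnd;
      retransmit(ackNumber);
    }
  } else {
    // A new ACK ends fast recovery; congestion avoidance takes over.
    lastAckNumber = ackNumber;
    duplicateAcks = 0;
  }
}

// The receiver keeps asking for segment 7; the original ACK plus three
// duplicates trigger the fast retransmit, then ACK 11 ends recovery.
[7, 7, 7, 7, 11].forEach(onAck);
```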
TCP New Reno improved Fast Recovery
The TCP New Reno algorithm improves on a shortcoming of TCP Reno: previously, Fast Recovery was exited as soon as a single new ACK packet was received.
In TCP New Reno, the TCP sender first records the maximum sequence number of the segments corresponding to the three duplicate ACKs.
For example, suppose a sender has segments numbered 1 through 10 in flight and segments 3 and 7 are lost; the maximum recorded sequence number is 10. The sender keeps receiving duplicate ACKs for sequence number 3, so it retransmits segment 3; the receiver then sends an ACK for 7, which tells the sender that the other end is still missing packets, so it also retransmits segment 7. When the receiver successfully gets it and sends an ACK for 11, past the recorded maximum of 10, the sender concludes that the whole range has been received and exits the Fast Recovery phase.
HTTP#
The HTTP protocol is a stateless protocol: it does not preserve state between requests.
Differences between Post and Get
First, let's introduce the concepts of side effects and idempotence.
Side effects refer to changes made to resources on the server; searching is without side effects, while registration has side effects.
Idempotence means that issuing M requests and issuing N requests (M ≠ N, both greater than 1) leaves the resources on the server in the same state. For example, registering 10 accounts versus 11 accounts is not idempotent, while applying the same change to an article 10 times versus 11 times is idempotent.
In standard usage, Get is used for side-effect-free, idempotent operations, such as searching for a keyword, while Post is used for operations with side effects that are not idempotent, such as registration.
From a technical perspective (a usage sketch follows this list):
- Get requests can be cached, while Post cannot.
- Post is slightly safer than Get because Get requests are included in the URL and will be saved in the browser's history, while Post will not. However, in the case of packet capture, both are the same.
- Post can transmit more data than Get through the request body; Get does not have this capability.
- URLs have length limits, which can affect Get requests, but this length limit is defined by the browser, not by the RFC.
- Post supports more encoding types and does not impose restrictions on data types.
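In code, the conventional split looks like this (a fetch sketch for a module context; the URLs and payload are placeholders):

```ts
// Idempotent, side-effect-free query: parameters ride in the URL,
// and the response is cacheable.
await fetch("https://example.com/search?q=tcp");

// Side-effecting, non-idempotent operation: data rides in the request body.
await fetch("https://example.com/register", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ user: "alice", password: "secret" }),
});
```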
Common status codes
2XX Success
- 200 OK: Indicates that the request sent from the client has been correctly processed on the server.
- 204 No content: Indicates that the request was successful, but the response does not contain the body of the entity.
- 205 Reset Content: Indicates that the request was successful, but the response does not contain the body of the entity; however, unlike the 204 response, it requires the requester to reset the content.
- 206 Partial Content: Indicates that the server successfully processed a range request and is returning only part of the resource.
3XX Redirection
- 301 moved permanently: Permanent redirection, indicating that the resource has been assigned a new URL.
- 302 found: Temporary redirection, indicating that the resource has been temporarily assigned a new URL.
- 303 see other: Indicates that the resource exists at another URL and should be retrieved using the GET method.
- 304 not modified: Returned for a conditional request (e.g. with If-Modified-Since) when the resource has not changed, telling the client it can use its cached copy.
- 307 temporary redirect: Temporary redirection, similar to 302, but expects the client to maintain the request method unchanged when sending requests to the new address.
4XX Client errors
- 400 bad request: Indicates that there is a syntax error in the request.
- 401 unauthorized: Indicates that the sent request requires authentication information through HTTP.
- 403 forbidden: Indicates that access to the requested resource is denied by the server.
- 404 not found: Indicates that the requested resource was not found on the server.
5XX Server errors
- 500 internal server error: Indicates that an error occurred on the server while executing the request.
- 501 Not Implemented: Indicates that the server does not support a certain feature required by the current request.
- 503 service unavailable: Indicates that the server is temporarily overloaded or undergoing maintenance and cannot process the request.
HTTP Headers
| General Fields | Function |
| --- | --- |
| Cache-Control | Controls caching behavior |
| Connection | The connection type the browser prefers, such as keep-alive |
| Date | The time the message was created |
| Pragma | Message directives |
| Via | Information about intermediate proxy servers |
| Transfer-Encoding | The encoding method used for the transfer |
| Warning | Indicates that there may be errors in the content |
| Upgrade | Asks the other party to switch to another protocol |

| Response Fields | Function |
| --- | --- |
| Accept-Ranges | Indicates whether range requests are supported |
| Age | How long the resource has been in the proxy cache |
| ETag | Resource identifier |
| Location | Redirects the client to a certain URL |
| Proxy-Authenticate | Authentication challenge from the proxy server |
| Server | Server name |
| WWW-Authenticate | Authentication challenge from the server, indicating the required scheme |

| Entity Fields | Function |
| --- | --- |
| Allow | The request methods allowed for the resource |
| Content-Encoding | The encoding format of the content |
| Accept-Encoding | The list of encoding formats the client accepts (a request field) |
| Content-Language | The language used in the content |
| Content-Length | The length of the entity body |
| Content-Location | An alternative address for the returned data |
| Content-MD5 | Base64-encoded MD5 checksum of the content |
| Content-Range | The range of the content returned |
| Content-Type | The media type of the content |
| Expires | The expiration time of the content |
| Last-Modified | The last modification time of the content |
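A few of these fields as they appear to client code (a fetch sketch for a module context; the URL is a placeholder):

```ts
const response = await fetch("https://example.com/");

// General and response fields surface on the Headers object.
console.log(response.headers.get("Cache-Control"));
console.log(response.headers.get("ETag"));
console.log(response.headers.get("Content-Type"));
console.log(response.headers.get("Last-Modified"));
```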
HTTPS#
HTTPS still transmits information through HTTP, but the information is encrypted using the TLS protocol.
TLS#
The TLS protocol sits above the transport layer and below the application layer. The first TLS handshake requires two RTTs; subsequent connections can be reduced to one RTT through Session Resumption.
TLS uses two types of encryption technologies: symmetric encryption and asymmetric encryption.
Symmetric encryption:
Symmetric encryption means that both sides have the same key and both know how to encrypt and decrypt the ciphertext.
Asymmetric encryption:
There are public and private keys; the public key can be known by anyone and can be used to encrypt data, but the data must be decrypted using the private key, which is known only to the party distributing the public key.
The TLS handshake proceeds as follows:
- The client sends a random value, the required protocol, and encryption methods.
- The server receives the client's random value, generates its own random value, chooses the protocol version and cipher from what the client offered, and sends back its certificate (if client certificate verification is required, it states so here).
- The client receives the server's certificate and verifies its validity. If the verification passes, it generates another random value, encrypts this random value using the public key of the server's certificate, and sends it to the server. If the server requires client certificate verification, it will also attach the certificate.
- The server receives the encrypted random value and uses its private key to decrypt it to obtain the third random value. At this point, both sides have three random values and can generate a key using these three random values according to the previously agreed encryption method. Subsequent communication can be encrypted and decrypted using this key.
From the above steps, it can be seen that during the TLS handshake phase, both sides use asymmetric encryption for communication. However, due to the performance overhead of asymmetric encryption compared to symmetric encryption, both sides use symmetric encryption for formal data transmission.
PS: The above describes the handshake in the TLS 1.2 protocol. In TLS 1.3, establishing a connection for the first time requires only one RTT, and resumed connections can be established with zero RTT.
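From application code, the entire handshake is a single call; a hedged Node sketch, where the host is a placeholder:

```ts
import tls from "node:tls";

// tls.connect performs the handshake described above; the callback fires
// on 'secureConnect', once the symmetric session keys are in place.
const socket = tls.connect({ host: "example.com", port: 443 }, () => {
  console.log("TLS handshake complete");
  console.log("protocol:", socket.getProtocol());  // e.g. 'TLSv1.3'
  console.log("cipher:", socket.getCipher().name); // negotiated symmetric cipher
  socket.end();
});
```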
HTTP 2.0#
Compared to HTTP 1.X, HTTP 2.0 significantly improves web performance.
In HTTP 1.X, to improve performance, we would introduce techniques such as sprite images, inlining small images, and using multiple domain names. All of this is due to the browser's limitation on the number of requests under the same domain. When a page needs to request many resources, head-of-line blocking can cause remaining resource requests to wait until other resource requests are completed.
Binary transmission
The core point of all performance enhancements in HTTP 2.0 lies here. In previous versions of HTTP, we transmitted data in text format. In HTTP 2.0, a new encoding mechanism is introduced, and all transmitted data is split and encoded in binary format.
Multiplexing
In HTTP 2.0, there are two very important concepts: frames and streams.
A frame represents the smallest unit of data, and each frame identifies which stream it belongs to. A stream is a data flow composed of multiple frames.
Multiplexing allows multiple streams to exist within a single TCP connection. In other words, multiple requests can be sent, and the other end can identify which request each frame belongs to. This technology can avoid the head-of-line blocking issue present in older versions of HTTP, greatly improving transmission performance.
Header compression
In HTTP 1.X, we transmitted headers in text format, which could result in several hundred to thousands of bytes being repeatedly transmitted when headers contained cookies.
In HTTP 2.0, the HPACK compression format is used to encode the transmitted headers, reducing their size. An index table is maintained on both ends to record previously seen headers, allowing for the transmission of just the key names of recorded headers during subsequent transmissions. The receiving end can then find the corresponding values using these key names.
Server Push
In HTTP 2.0, the server can proactively push other resources to the client after a certain request.
One can imagine situations where certain resources are guaranteed to be requested by the client. In such cases, the server push technology can be used to proactively send necessary resources to the client, thereby reducing latency. Of course, you can also use prefetching if browser compatibility allows.
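With Node's http2 module, a push might look like the following sketch; the paths, inline content, and certificate files are placeholders, and a real server needs actual TLS certificates:

```ts
import http2 from "node:http2";
import fs from "node:fs";

const server = http2.createSecureServer({
  key: fs.readFileSync("server-key.pem"),  // placeholder certificate files
  cert: fs.readFileSync("server-cert.pem"),
});

server.on("stream", (stream, headers) => {
  if (headers[":path"] === "/") {
    // Proactively push a stylesheet the page is guaranteed to request.
    stream.pushStream({ ":path": "/style.css" }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ":status": 200, "content-type": "text/css" });
      pushStream.end("body { margin: 0; }");
    });
    stream.respond({ ":status": 200, "content-type": "text/html" });
    stream.end('<link rel="stylesheet" href="/style.css">');
  }
});

server.listen(8443);
```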
DNS#
The role of DNS is to query the specific IP through a domain name.
Since IP addresses are strings of numbers (and, in IPv6, letters as well) that are hard for humans to remember, domain names were created. You can think of a domain name as an alias for an IP address, and DNS is what you use to look up the real name behind the alias.
DNS queries are performed before the TCP handshake, and this query is done by the operating system itself. When you want to access www.google.com in your browser, the following operations occur:
- The operating system first checks the local cache.
- If not found, it queries the DNS server configured in the system.
- If it still doesn't find it, it directly queries the DNS root server, which will find the server responsible for the top-level domain com.
- Then it queries that server for the second-level domain google.
- The query for the third-level domain is actually what we configure; you can assign an IP to the www domain and also assign IPs to other third-level domains.
The above describes the iterative query of DNS; there is also a recursive query, where the difference is that the former is requested by the client, while the latter is requested by the system-configured DNS server, which returns the data to the client after obtaining the result.
PS: DNS queries are typically performed over UDP.
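In Node, the two resolution styles are visible in the dns module (a sketch for a module context):

```ts
import dns from "node:dns/promises";

// lookup() defers to the operating system's resolver (hosts file, local
// cache), which is what happens before the TCP handshake described above.
const { address } = await dns.lookup("www.google.com");

// resolve4() instead queries the configured DNS server over the network.
const records = await dns.resolve4("www.google.com");

console.log(address, records);
```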
The process from entering a URL to loading a page completely:
- First, perform a DNS query. If this step involves intelligent DNS resolution, it will provide the fastest IP address for access.
- Next is the TCP handshake, where the application layer sends data to the transport layer. The TCP protocol specifies the port numbers for both ends and then sends it to the network layer. The IP protocol in the network layer determines the IP address and indicates how to route the data through routers. The packet is then encapsulated into the data frame structure of the data link layer, and finally transmitted at the physical layer.
- After the TCP handshake is completed, a TLS handshake occurs, and then data transmission officially begins.
- Before the data enters the server, it may first pass through a load-balancing server, which distributes requests reasonably across multiple servers. At this point, let's assume the server responds with an HTML file.
- The browser first checks the status code. If it is 200, it continues parsing; if it is 4XX or 5XX, it reports an error; if it is 3XX, it follows the redirect. A redirection counter guards against excessive redirects, and exceeding the limit also results in an error.
- The browser begins to parse the file. If it is in gzip format, it will decompress it first, and then determine how to decode the file based on its encoding format.
- Once the file is successfully decoded, the rendering process officially begins. The browser first constructs the DOM tree from the HTML and, if there is CSS, the CSSOM tree. If it encounters a script tag, it checks for async or defer: async scripts are downloaded in parallel with parsing and executed as soon as they arrive, while defer scripts are downloaded in parallel but executed in order after HTML parsing completes. If neither is present, the script blocks the rendering process until its execution finishes. Any referenced files are downloaded as they are encountered; with HTTP 2.0 this greatly improves the efficiency of downloading many resources.
- Once the initial HTML is fully loaded and parsed, the DOMContentLoaded event is triggered.
- After the CSSOM tree and DOM tree are constructed, the Render tree will begin to be generated. This step determines the layout, styles, and many other aspects of the page elements.
- During the generation of the Render tree, the browser begins to call on the GPU for rendering, compositing layers, and displaying the content on the screen.
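Most of the phases in this walkthrough are observable from the page itself via the Navigation Timing API (a browser-side sketch):

```ts
// Run in the browser after the load event fires.
const [nav] = performance.getEntriesByType(
  "navigation"
) as PerformanceNavigationTiming[];

console.log("DNS query:", nav.domainLookupEnd - nav.domainLookupStart, "ms");
console.log("TCP handshake:", nav.connectEnd - nav.connectStart, "ms");
// secureConnectionStart is 0 for plain-HTTP pages.
if (nav.secureConnectionStart > 0) {
  console.log("TLS handshake:", nav.connectEnd - nav.secureConnectionStart, "ms");
}
console.log("DOMContentLoaded at:", nav.domContentLoadedEventEnd, "ms");
console.log("load at:", nav.loadEventEnd, "ms");
```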