Introduction
The article will introduce how to reverse proxy the WebSocket protocol.
We’ll start with the basics, understanding reverse proxying and the WebSocket protocol. Additionally, we’ll delve into learning how to implement and explore relevant details by reading the source code of Hertz’s open-source websocket reverseproxy extension.
- Hertz: ByteDance’s open-source high-performance Golang HTTP framework;
- reverseproxy: A WebSocket reverse proxy extension for the Hertz framework, inspired by fasthttp-reverse-proxy for WebSocket reverse proxying.
Basic knowledge
What is a reverse proxy?
A reverse proxy is a type of proxy server positioned between internal servers and an external network, used to handle requests to internal servers. When a client sends a request, it doesn’t directly access the target server but does so through the reverse proxy server. This proxy server is responsible for forwarding the request to one or more target servers and returning the obtained response to the client.
The typical functions of a reverse proxy are as follows:
- Hide Backend Servers: Clients communicate only with the reverse proxy, not directly with the backend servers, thereby concealing information about the backend servers.
- Load Balancing: The reverse proxy can distribute requests among multiple backend servers based on load, balancing the server load.
- Cache Static Resources: It can cache static content to reduce server load and enhance response speed.
- Security: It can act as a firewall, filtering malicious requests to improve security.
A schematic representation of a reverse proxy is as follows: NGINX is a commonly used reverse proxy server (often referred to as a gateway), where user requests are directed to the reverse proxy server, which then forwards them to a cluster of backend servers.
What is WebSocket?
WebSocket is a protocol that enables full-duplex communication over a single TCP connection. Unlike the traditional HTTP request-response model, WebSocket allows the establishment of persistent connections between servers and clients, enabling bidirectional communication for real-time data transmission. It permits servers to actively push data to clients without requiring client-initiated requests.
- Full-Duplex Communication: Allows bidirectional communication between servers and clients, enabling real-time sending and receiving of data.
- Low Latency: Established over a single TCP connection, reducing connection setup overhead and lowering communication latency.
- Reduced Data Transfer Overhead: Compared to traditional HTTP polling, WebSocket minimizes data transfer overhead, making data transmission more efficient.
Implementation Approach
After grasping the fundamental knowledge, we can now start considering the specific implementation approach.
Terminology Agreement
Let’s first establish a few terms that will be used to facilitate our understanding of the entire process:
- Client: User;
- Proxy Server: Reverse proxy server, for instance, NGINX mentioned earlier and the Hertz server used here;
- Backend Server: The actual server that receives and handles client requests, not directly communicating with the client;
Basic Concept
As we’re implementing a reverse proxy service based on the Hertz HTTP framework, the reverse proxy server here is Hertz itself. To enable websocket reverse proxying, we can achieve this by providing an http.Handler
. This handler can forward the request for establishing a websocket connection from the user to the backend server. Subsequently, a websocket connection is established between the proxy server and the backend server, and another between the client and the proxy server. The proxy server manages message transmission between these two connections.
The process of establishing a websocket reverse proxy from connection setup to completion is outlined below:
- 1. The client initiates a handshake request to establish a websocket connection with the proxy server.
- 2. The proxy server forwards the request to the backend server.
- 3. The proxy server establishes a websocket connection with the backend server.
- 4. The client establishes a websocket connection with the proxy server.
Once all connections are established, the websocket reverse proxy is set up. Below is an example of a client sending information through the websocket reverse proxy:
-
5. The client writes a message to the websocket connection established with the proxy server.
-
6. The proxy server reads the message from the websocket connection with the client.
-
7. The proxy server writes the read message to the websocket connection established with the backend server.
I’ve also provided a sequence diagram where the numbering corresponds to the steps outlined above. Readers can refer to the diagram to better understand the entire process of reverse proxying websocket connections.
Specific Implementation
With the basic concept of implementation in place, we can now delve into the specific implementation details. Here, we will comprehend the concrete implementation process by reviewing the code in the Hertz framework websocket reverse proxy extension.
The code for websocket reverse proxy in this extension resides in two main files:
- ws_reverse_proxy.go: Contains the primary business logic for websocket reverse proxying.
- ws_reverse_proxy_option.go: Provides customizable configuration options for users.
Readers can either clone this repository and review the code alongside or directly examine the source code analysis provided below.
Main Struct (WSReverseProxy)
The main structure of the websocket reverse proxy appears to be quite simple, consisting of only two fields:
- target: The target address for reverse proxying, which is the path to the backend server.
- options: Configuration options. WSReverseProxy offers three customizable configuration options to users in the form of Functional Options.
Constructor (NewWSReverseProxy)
The logic of the constructor method is as follows:
- It begins by checking whether the backend server path provided in the method parameter is empty. If it’s empty, it directly triggers a panic.
- Initialization of configuration options occurs next. If the user has configured any options, it uses those values; otherwise, it resorts to default values.
- Creation of the main structure takes place, followed by assigning the values.
- Finally, the populated main structure is returned to the user.
Core Method ServeHTTP
The ServeHTTP
method is the core function for implementing websocket reverse proxying. As mentioned earlier, it serves as the Handler. Users utilize an instance of the main structure returned by the constructor method to invoke ServeHTTP
and register the corresponding routes to facilitate the process of websocket reverse proxying.
Given that ServeHTTP
is quite extensive, I’ll break down the method into four parts according to the process of implementing websocket reverse proxying. We’ll analyze these four parts sequentially.
Part 1: Prepare Forwarding Header
In this section, the Handler processes the request header of the client’s handshake request using the prepareForwardHeader
method and returns a forwardHeader
to enable the proxy server to initiate a request to establish a websocket connection with the backend server.
Additionally, if the user has customized the Director
option, further processing of the forwardHeader
can be performed accordingly.
Next, let’s take a look at the prepareForwardHeader
method:
The method logic is as follows:
-
1. First, initialize a
http.Header
with a size of 4 (type Header map[string][]string
); -
2. Check the client’s request header for fields such as
Origin
,Sec-Websocket-Protocol
,Cookie
, andHost
(HTTP Header). If any are present, set them in theforwardHeader
;The
Sec-Websocket-Protocol
header is used to specify the subprotocol that the client and server will use when establishing a WebSocket connection. The WebSocket protocol allows defining one or more subprotocols during connection establishment, which can describe the types of data or message formats to be sent and received on that connection. -
3. Check if the client’s request header contains the
X-Forwarded-For
field. If present, append the current client’s IP to the existingX-Forwarded-For
. If not present, set theX-Forwarded-For
field inforwardHeader
to the current client’s IP;The
X-Forwarded-For
header is typically added by a proxy server when forwarding requests and is used to identify the original client’s IP address.In network communication, when a request passes through multiple proxy servers (like load balancers, reverse proxies, etc.), the actual IP address of the original client that initiated the request might be hidden. To trace the true origin of the request, proxy servers usually add the
X-Forwarded-For
field in the HTTP request header, which contains the original client’s IP address.An example might look like this:
-
4. Set the
X-Forwarded-Proto
field inforwardHeader
and return it;The
X-Forwarded-Proto
header is typically added by a proxy server when forwarding requests and indicates the protocol (HTTP or HTTPS) used in the original request.
Points to Note:
- Difference between
header.Add
andheader.Set
methods;
The Add
method checks if the key already exists. If it doesn’t, it sets the corresponding key and value. If it does exist, it appends the current value to the existing value array. On the other hand, the Set
method doesn’t perform any checks. It sets the key and value regardless of whether it already exists or not.
Combining the explanation with the following code might make it easier for you to understand:
- Why aren’t fields like
Connection
,Upgrade
,Sec-WebSocket-Key
, etc., set forforwardHeader
?
Readers familiar with the WebSocket protocol might know that these fields are all essential for a WebSocket handshake request. They indicate that the client initiating the handshake intends to upgrade to the WebSocket protocol. If you add the following code within the prepareForwardHeader
method, you’ll find that indeed these fields are present in the client’s request.
However, the reason these fields aren’t included in forwardHeader
is because, as mentioned earlier, the client doesn’t establish the WebSocket connection directly with the backend server but with the proxy server.
Now, some readers might wonder how the WebSocket connection is established with the backend server if forwardHeader
doesn’t contain these necessary fields. Please refer to the upcoming second part for further clarification.
Part 2: Proxy Server Establishes WebSocket Connection with Backend Server
In the first part, we’ve prepared the forwardHeader
. Now, the next step involves passing the backend server’s path (target) along with the forwardHeader
to the Dial
method. The Dial
method initiates a handshake request to the backend server, allowing the proxy server to establish a WebSocket connection with the backend server.
Within the Dial
method, the necessary fields mentioned in the first part are also added to the initiated request:
The connBackend
returned by the Dial
method is the instance of the WebSocket connection established between the proxy server and the backend server.
Of course, if the Dial
method encounters an error, indicating a failure to establish a WebSocket connection between the proxy server and the backend server, the corresponding response needs to be returned to the client. However, as this isn’t the primary logic, I won’t delve into further details here.
Part 3: Establishing WebSocket Connection between Client and Proxy Server
In the second part, we saw that the proxy server had successfully established a WebSocket connection with the backend server using the Dial
method. In this third part, we’ll use the Upgrade
method to enable the client to establish a WebSocket connection with the proxy server.
Upgrade
will finalize the established WebSocket connection with the client’s handshake and pass it into connClient
, signifying that connClient
is the instance of the WebSocket connection established between the client and the proxy server.
Of course, if the Upgrade
method encounters an error, indicating a failed connection establishment, it will log the error, and the entire ServeHTTP
method will consequently terminate.
Part 4: Connection Communication
In the third part, the WebSocket connection between the client and the proxy server was established, and we obtained the connection instance connClient
. Next is to enable the client to communicate with the backend server in a reverse proxy manner using the following two connections:
connClient
: WebSocket connection instance between the client and the proxy serverconnBackend
: WebSocket connection instance between the proxy server and the backend server
As seen, we first prepare two channels to receive errors during communication:
errClientC
: For receiving errors when the backend server sends messages to the clienterrBackendC
: For receiving errors when the client sends messages to the backend server
Then, we retrieve two goroutines from the coroutine pool to concurrently execute the replicateWSRespConn
and replicateWSReqConn
methods. The naming of these methods might seem peculiar because WebSocket is a full-duplex communication protocol, where requests and responses are relative.
However, here, we uniformly consider messages sent from the backend server to the client as responses and messages sent from the client to the backend server as requests, in terms of the client’s perspective.
Since this can be a bit convoluted, I’ve created a schematic diagram to aid in understanding.
Let’s continue examining these two concurrently executed communication methods. Since the logic of both methods is essentially similar, let’s take a look at the logic of the replicateWSRespConn
method:
The logic of the method is quite simple. It involves reading messages sent from the backend server to the proxy server through the ReadMessage
method of connBackend
and then writing the read messages into the connection between the client and the proxy server using the WriteMessage
method of connClient
.
Any errors encountered are passed to errClientC
for unified handling outside the method.
Returning to the outer method, errors are received from errClientC
and errBackendC
using a for-select
structure. If the error is not a CloseError
, it is logged.
With this, we have completed the understanding of the core logic within the ServeHTTP
method. The logic of the ServeHTTP
method aligns entirely with the websocket reverse proxy process outlined in the basic approach. Readers can compare the basic approach with the specific implementation to achieve a better understanding.
Example Usage (Echo Server)
Just now, we’ve finished reading through the core code of the Hertz framework’s WebSocket reverse proxy extension. Here, we’ll demonstrate the usage of this extension by creating an example of an echo server. This example aims to assist readers in better understanding how to utilize this extension.
We’ll break down this example into three parts, and the complete code will be provided at the end.
Firstly, let’s globally declare the addresses and paths for the proxy server and the backend server:
Proxy Server
Using the constructor method NewWSReverseProxy
that we learned in the specific implementation section, create a WebSocket reverse proxy instance with the backendURL
as the target parameter. Next, register the ServeHTTP
method under the /ws
route, and start the proxy server via a goroutine. It’s important to note that the address of the proxy server and the registered route should correspond to the proxyURL
.
Backend Server
Start the backend server via a goroutine and register the /backend
route. Similarly, ensure that the address of the backend server and the registered route correspond to the backendURL
. Then, handle incoming WebSocket connections within the handler.
Begin by using the Upgrade
method to upgrade the HTTP protocol to the WebSocket protocol. Subsequently, read messages sent by the client (in this context, the client refers to the proxy server) from the established WebSocket connection and write the messages back to the WebSocket connection. This completes the echo logic, which continuously repeats this process within a for
loop.
Client
Using the Dial
method, send a handshake request to the proxy server to upgrade to the WebSocket protocol. Once you obtain the returned connection instance, utilize the Scanln
method to read user-input messages and write them to the connection. Simultaneously, receive the returned messages from the connection and print them. Continue this process within a for
loop.
With this, we’ve completed an example of a WebSocket reverse proxy echo server.
Complete Code
The complete code for the echo server example is as follows:
Summary
That concludes all the content of this article. We started from the basics of WebSocket reverse proxy, determined the basic approach to implementation, and delved deeper into understanding how to reverse proxy WebSocket by reading the code of the WebSocket reverse proxy extension in the Hertz framework and using an echo server as an example.
Hopefully, this article has been helpful for readers in understanding WebSocket reverse proxy. If there are any mistakes or questions, please feel free to comment or message me :).