Hi, I’m the developer of Kourier, the fastest server on Earth, and in this post, I benchmark Kourier’s performance against Rust/Hyper and Go/http.
Kourier is open source, and all the assets used in the benchmarks are publicly available and container-based, so you can reproduce the results on your local machine quickly and easily. You can find all of them in the Kourier repository.
The results show that Kourier is a performance powerhouse, capable of processing 12.1 million HTTP requests per second on an AMD Ryzen 5 1600, an 8-year-old mid-range processor, using only half of its cores (wrk uses the other half). The results set a new standard for HTTP servers, leaving the highest-performing frameworks in the dust.
I use an approach based on TechEmpower’s plaintext benchmark. “Hello World” HTTP benchmarks are relevant because a lot of code is exercised between receiving a network data payload and calling a registered HTTP handler, and it is that code that I want to benchmark.
Although Kourier is truly a wonder of software engineering and delivers never-before-seen performance, many frameworks sacrifice HTTP conformance to be faster. In another post, I show that Kourier is much more HTTP syntax-compliant than Rust/Hyper and Go/http.
I also benchmarked, in another post, the memory consumption when the servers are put under high load, where I show that Kourier consumes 4.7x less memory than Rust/Hyper and 7.7x less memory than Go/http.
All Docker images used in this benchmark can be easily built using the build script contained in Kourier’s repository.
I start the servers with the following commands:
# Kourier. The server listens on port 3275
# and uses six threads to process incoming requests.
docker run --rm -d --network host kourier-bench:kourier -a 127.0.0.1 -p 3275 --worker-count=6 --request-timeout=20 --idle-timeout=60
# Rust (Hyper). The server listens on port 8080
# and uses six threads to process incoming requests.
docker run --rm -d --network host kourier-bench:rust-hyper
# Go (net/http). The server listens on port 7080
# and uses six threads to process incoming requests.
docker run --rm -d --network host kourier-bench:go-net-http
I use wrk to load the servers. As I do not want to benchmark the network, I run the benchmarks over localhost and split the available cores evenly between the servers and wrk. The following command makes wrk load the server with pipelined requests over 512 connections and six threads for 15 seconds:
# PORT is 3275 for Kourier, 8080 for Rust (Hyper), or 7080 for Go (net/http).
docker run --rm --network host -it kourier-bench:wrk -c 512 -d 15 -t 6 --latency http://localhost:PORT/hello -s /wrk/pipeline.lua -- 256
I set timers to exercise as much framework code as possible in the benchmark. On Kourier, I set request and idle timers; on Rust (Hyper), only request timers; on Go (net/http), request, idle, and write timers, as you can see in Kourier's repository (all benchmark code is available in the Src/Tests/Resources/Benchmarks folder).
Results
# Testing Kourier server at 127.0.0.1:3275
glauco@ldh:~$ sudo docker run --rm --network host -it kourier-bench:wrk -c 512 -d 15 -t 6 --latency http://localhost:3275/hello -s /wrk/pipeline.lua -- 256
Running 15s test @ http://localhost:3275/hello
  6 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.41ms    3.05ms  60.70ms   65.46%
    Req/Sec     2.04M   104.69k    2.44M    82.17%
  Latency Distribution
     50%    4.23ms
     75%    6.91ms
     90%    9.57ms
     99%    0.00us
  182933248 requests in 15.07s, 17.89GB read
Requests/sec: 12138068.83
Transfer/sec:      1.19GB
# Testing Rust (Hyper) server at 127.0.0.1:8080
glauco@ldh:~$ sudo docker run --rm --network host -it kourier-bench:wrk -c 512 -d 15 -t 6 --latency http://localhost:8080/hello -s /wrk/pipeline.lua -- 256
Running 15s test @ http://localhost:8080/hello
  6 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    24.34ms   19.22ms 103.72ms   65.81%
    Req/Sec   737.44k    45.53k    0.97M    76.00%
  Latency Distribution
     50%   19.89ms
     75%   37.51ms
     90%   55.08ms
     99%   80.51ms
  66036480 requests in 15.07s, 6.33GB read
Requests/sec: 4382914.88
Transfer/sec:    430.53MB
# Testing Go (net/http) server at 127.0.0.1:7080
glauco@ldh:~$ sudo docker run --rm --network host -it kourier-bench:wrk -c 512 -d 15 -t 6 --latency http://localhost:7080/hello -s /wrk/pipeline.lua -- 256
Running 15s test @ http://localhost:7080/hello
  6 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   491.73ms  413.30ms    2.00s    67.97%
    Req/Sec    38.89k    10.21k  111.96k    70.41%
  Latency Distribution
     50%  402.08ms
     75%  742.12ms
     90%    1.15s
     99%    1.81s
  3480465 requests in 15.06s, 418.22MB read
  Socket errors: connect 0, read 0, write 0, timeout 722
Requests/sec: 231080.00
Transfer/sec:     27.77MB
Conclusion
Kourier is the next level of network-based communication. It is in another league regarding performance, compliance, and memory consumption. Kourier leaves everything else in the dust, including enterprise network appliances.
Creating the fastest server ever requires much more than a stellar HTTP parser: from ring buffers to socket programming to a custom timer implementation, I implemented it all and open-sourced all of it alongside Kourier.
I developed Kourier with strict and demanding requirements, where all possible behaviours are comprehensively verified in specifications written in the Gherkin style. To this end, I created Spectator, a test framework that I also open-sourced with Kourier. You can check all files ending in spec.cpp in the Kourier repository to see how meticulously tested Kourier is. There is a stark difference in testing rigor between Kourier and other frameworks.
Kourier can empower the next generation of network appliances and solutions, letting the businesses that rely on them run at a fraction of their current infrastructure cost while being far more HTTP-compliant.
You can contact me if your business is not compatible with the requirements of the AGPL and you want to license Kourier under alternative terms.