What’s blocking?
Nginx has one master process and multiple worker processes, and each worker process is single-threaded.
It’s well known that NGINX uses an asynchronous, event‑driven approach to handling connections. This means that instead of creating another dedicated process or thread for each request (like servers with a traditional architecture), it handles multiple connections and requests in one worker process. To achieve this, NGINX works with sockets in a non‑blocking mode and uses efficient methods such as epoll and kqueue.
However, once a connection is accepted by one worker process, it cannot be moved to another process, even when that worker is busy. Unlike Go, there is no work stealing. Moreover, only socket I/O is asynchronous; CPU-intensive tasks and file I/O are still synchronous.
Every phase handler for an HTTP request runs in a single thread. OpenResty allows you to run Lua code in each phase. The rewrite, access, and content phases, as well as the timer (implemented as a fake connection), run Lua code within an individual Lua coroutine.
Within a coroutine, you can use the cosocket API to communicate with the outside world. The cosocket is 100% nonblocking out of the box. When you call a cosocket API and it needs to wait for more data, it yields the coroutine, registers the corresponding event in epoll, and lets the event loop handle other events.
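As a minimal sketch (the upstream address and the Redis-style protocol are assumptions for illustration), a cosocket call reads like ordinary sequential code, but each potentially-waiting call yields to the event loop under the hood:

```nginx
location /cosocket-demo {
    content_by_lua_block {
        local sock = ngx.socket.tcp()
        sock:settimeout(1000)  -- 1s; applies to connect/send/receive

        -- connect() yields the coroutine until the socket is ready
        local ok, err = sock:connect("127.0.0.1", 6379)
        if not ok then
            ngx.say("connect failed: ", err)
            return
        end

        sock:send("PING\r\n")
        -- receive() yields until a full line arrives or the timeout fires
        local line, err = sock:receive("*l")
        ngx.say(line or err)

        sock:setkeepalive(10000, 100)  -- return the socket to the connection pool
    }
}
```

While this coroutine waits inside connect() or receive(), the same worker keeps serving other requests.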
As the name suggests, coroutines use the CPU cooperatively. When you block for a long time within one coroutine, other coroutines have no chance to execute, because all coroutines reside in the same thread. The result is uneven request handling and long request delays.
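To see the effect, compare two hypothetical locations: a blocking sleep versus the nonblocking ngx.sleep:

```nginx
location /block {
    content_by_lua_block {
        -- os.execute blocks the whole worker: every other request
        -- on this worker stalls for the full 3 seconds
        os.execute("sleep 3")
        ngx.say("done")
    }
}

location /nonblock {
    content_by_lua_block {
        -- ngx.sleep yields the coroutine; the worker keeps
        -- serving other requests in the meantime
        ngx.sleep(3)
        ngx.say("done")
    }
}
```

With worker_processes 1, concurrent requests to /block are serialized one after another, while concurrent requests to /nonblock all finish in about 3 seconds.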
When I worked at Kugou Music, my team ran into blocking issues, and I wrote a simple tool to diagnose them. The tool uses LD_PRELOAD to hook lua_resume, checks the function execution time, and prints the URL and the entry Lua function when the execution takes too long.
In fact, this tool is not ideal: it cannot print backtraces and does not work for JIT-compiled Lua code.
Locate the blocking source
The best tool for locating blocking is SystemTap.
Blocking comes from two sources:
- CPU-intensive tasks
- blocking system calls
So we just measure the execution time of each coroutine resume; if it exceeds a threshold, we print the backtraces collected during that time.
You could find the example code here:
https://gist.github.com/kingluo/50b8589069f5f11a78d2136c1155d613
workload.lua
The workload simulates both CPU-intensive tasks (random string generation and MD5 calculation) and blocking system calls (a blocking socket accessing httpbin).
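The actual workload.lua lives in the gist above; based on the description, it might look roughly like the sketch below (the httpbin URL and the payload size are assumptions, and LuaSocket is used deliberately because, unlike cosockets, it blocks the worker):

```lua
-- workload.lua (sketch): CPU-intensive work plus a blocking syscall
local _M = {}

function _M.run()
    -- CPU-intensive part: random string generation + MD5
    local buf = {}
    for i = 1, 5 * 1024 * 1024 do
        buf[i] = string.char(math.random(97, 122))
    end
    local digest = ngx.md5(table.concat(buf))

    -- blocking part: LuaSocket (not a cosocket!) blocks the whole worker
    local sock = require("socket").tcp()
    sock:connect("httpbin.org", 80)
    sock:send("GET /get HTTP/1.1\r\nHost: httpbin.org\r\nConnection: close\r\n\r\n")
    sock:receive("*a")
    sock:close()

    ngx.say(digest)
end

return _M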
nginx.conf
error_log /dev/stderr info;
worker_processes 1;

events {}

http {
    server {
        listen 10000;

        location /run {
            content_by_lua_block {
                require("workload").run()
            }
        }
    }
}
blocking-bt.sxx
It tracks lua_resume (the entry point of each coroutine) and syscalls.
...
probe process("$^libluajit_path").function("lua_resume").return
{
if (tid() == target()) {
tracing = 0
elapsed = gettimeofday_ms() - start_time
if (elapsed >= $^arg_min_elapsed :default(200)) {
printf("--- lua_resume block %d ms\n", elapsed)
banner = 0
foreach (bt in bts- limit $^arg_limit :default(3)) {
cnt = @count(bts[bt])
if (cnt <= $^arg_min_bt :default(2)) {
break
}
if (banner == 0) {
printf(">>> Hot CPU backtraces\n")
banner = 1
}
printf("%s\t%d\n", bt, cnt)
}
banner = 0
foreach (lbt in syscall_bts- limit $^arg_syscall_limit :default(2)) {
total_us = @sum(syscall_bts[lbt])
if (total_us >= 1000) {
if (banner == 0) {
printf(">>> Time-consuming syscall backtraces\n")
banner = 1
}
printf("%s\t%d us\n", lbt, total_us)
}
}
printf("\n")
}
delete bts
delete syscall_bts
}
}
probe kernel.trace("sys_enter") {
if (tid() == target() && tracing) {
syscall_start_time = gettimeofday_us()
}
}
probe kernel.trace("sys_exit") {
if (tid() == target() && tracing) {
elapsed = gettimeofday_us() - syscall_start_time
if (elapsed >= $^arg_syscall_min_elapsed_us :default(400)) {
syscall_bts[lua_bt()] <<< elapsed
}
}
}
Test
We get the expected blocking sources: the blocking time is about 3 seconds. From the backtraces, we know the blocking comes from the MD5 calculation, the random string generation, and the connect()/receive() syscalls.
How to solve blocking?
Since the main thread is sensitive to blocking, we should obviously delegate blocking jobs to other threads.
Nginx provides a thread pool, originally used to serve static files more efficiently. I wrapped it in Lua so that you can use a nonblocking API to execute blocking work.
I contributed this work to the official OpenResty project as the ngx.run_worker_thread() API:
https://github.com/openresty/lua-nginx-module#ngxrun_worker_thread
Each thread in the thread pool is associated with its own LuaJIT VM. ngx.run_worker_thread() sends the task to a thread for execution and yields the coroutine; when the task is done, it resumes the coroutine.
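As an illustration (the pool name "testpool" and the module "md5task" are made-up names), offloading the MD5 work could look like this. First, declare the thread pool in the main context of nginx.conf:

```nginx
thread_pool testpool threads=16;
```

Then put the blocking/CPU-heavy function in a module that can be require()d inside the thread's own VM:

```lua
-- md5task.lua: runs inside the worker thread's separate LuaJIT VM
local _M = {}

function _M.digest(s)
    -- ngx.md5 is among the small set of ngx APIs usable in that VM
    return ngx.md5(s)
end

return _M
```

And call it from a request handler, e.g. inside content_by_lua_block:

```lua
local ok, digest = ngx.run_worker_thread("testpool", "md5task", "digest", payload)
if not ok then
    -- on failure, the second return value holds the error message
    ngx.log(ngx.ERR, "worker thread failed: ", digest)
    return
end
ngx.say(digest)
```

The caller's coroutine yields while the thread works, so the event loop stays responsive.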
This API solves blocking headaches for many companies.
However, it’s far from perfect:
- The Lua VM allocation is random; each execution gets a brand-new environment, so there is no way to cache resident resources, e.g. keepalive TCP connections. You need to initialize everything every time.
- The throughput is limited by the size of the thread pool, and the context-switch overhead cannot be ignored.
- Only a small subset of the ngx APIs (string, hashing, shared dict) can be used in the separate LuaJIT VM.
Could we do better, so that we both relieve the main thread and tap into the ecosystems of other mainstream programming languages?
The answer is the lua-resty-nonblocking-ffi library. It provides an efficient, generic, and nonblocking API for hybrid programming in OpenResty with mainstream languages (Go, Python, Java, etc.).
Features:
- simple but extensible interface, supporting any C-ABI-compliant language
- write once and for all: no more C/Lua glue code for coupling
- high performance, 3~4 times faster than the Unix domain socket approach
- ready-made shims for Python/Java
- any serialization format you like
This library has been used successfully by several commercial companies for over two years. My customers come from different domains, including AI, IoT, and edge computing.
I will open-source this library in the near future, so please stay tuned.
Conclusion
Blocking is an Nginx-specific issue. Be aware of it; otherwise you will not get the high performance OpenResty promises.
For the time being, use ngx.run_worker_thread() to offload your blocking tasks. In the future, lua-resty-nonblocking-ffi will be your best choice.