In Envoy, both lua-resty-ffi (the Envoy port) and the built-in golang filter let you use golang to develop asynchronous business logic with goroutines, but which one performs better?
Call flow
When the headers of an incoming request are parsed by Envoy, an HTTP filter’s decodeHeaders() is called. If decodeHeaders() returns StopIteration, the remaining filters in the chain are not invoked for the headers, but the request body is still received and processing moves on to the data phase.
When a body chunk (a part of the body, not necessarily in chunked encoding) comes in, Envoy calls the decodeData() of the filters one by one. If decodeData() returns StopIterationAndBuffer, the filter manager buffers this body chunk for the subsequent decodeData() call, i.e. the next time decodeData() gets called, it takes the accumulated body data. If decodeData() returns StopIterationNoBuffer, the filter manager does not buffer this body chunk, i.e. the filter itself is responsible for keeping the chunk.
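The difference between the two return codes can be pictured with a small toy model. This is only a sketch of the buffering rule described above, not Envoy's filter manager: toyManager, onChunk and bufferedSizes are hypothetical names, and the two constants merely stand in for Envoy's StopIterationAndBuffer and StopIterationNoBuffer statuses.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for the data-phase return codes.
type Status int

const (
	StopIterationAndBuffer Status = iota
	StopIterationNoBuffer
)

// toyManager is an illustrative model, not Envoy's filter manager.
type toyManager struct {
	buffered []byte // body bytes held on behalf of the filter
}

// onChunk applies the status returned by the filter to one incoming chunk.
func (m *toyManager) onChunk(chunk []byte, st Status) {
	if st == StopIterationAndBuffer {
		m.buffered = append(m.buffered, chunk...)
	}
	// With StopIterationNoBuffer the manager keeps nothing; the filter
	// must hold on to the chunk itself if it needs it later.
}

// bufferedSizes feeds the chunks through a filter that always returns st
// and records how many bytes the manager holds after each chunk.
func bufferedSizes(st Status, chunks ...string) []int {
	m := &toyManager{}
	sizes := make([]int, 0, len(chunks))
	for _, c := range chunks {
		m.onChunk([]byte(c), st)
		sizes = append(sizes, len(m.buffered))
	}
	return sizes
}

func main() {
	fmt.Println(bufferedSizes(StopIterationAndBuffer, "ab", "cd", "e")) // [2 4 5]
	fmt.Println(bufferedSizes(StopIterationNoBuffer, "ab", "cd", "e"))  // [0 0 0]
}
```

In the first case the manager accumulates the body; in the second it holds nothing, so any buffering has to happen inside the filter.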
Consider a simple scenario where Envoy echoes the request body back to the client. Let’s take a look at how golang filter and lua-resty-ffi-golang work respectively.
golang filter
The golang filter consists of two parts: the C++ filter and the golang filter instance (written in Go). There is a subtle point here. When decodeHeaders() creates a goroutine to perform an asynchronous job and returns StopIteration, the decodeData() calls on the C++ side are made concurrently with that goroutine, so the C++ filter always buffers the request body chunks for later use. The golang-side filter’s decodeData() is called only when the goroutine calls the continue callback (f.callbacks.Continue(api.StopAndBuffer)). That is, the golang filter buffers the entire request body by itself, instead of relying on the filter manager.
There are 2 memory copies here:
- #1 copy envoy Buffer to golang string
Some may wonder why it does not just use func C.GoString(*C.char) string directly. The reason is that the Buffer is not a flattened C string, so copying it is not a trivial job. ;-)
- #2 copy golang string to Buffer
It gets the golang string data pointer and length so that the data can be copied directly on the C++ side.
The latest version uses unsafe.StringData(), while previous versions used a reflect.StringHeader cast.
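The two copies can be illustrated in plain Go. The sketch below is not the golang filter's source: flatten, rawView and copyOut are hypothetical helpers, a [][]byte stands in for the non-contiguous Envoy Buffer, and copyOut plays the role of the C++ side. It assumes Go 1.20 or later, where unsafe.StringData is available.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// flatten copies non-contiguous slices into one Go string
// (an analogue of copy #1; the slices stand in for the Buffer).
func flatten(slices [][]byte) string {
	n := 0
	for _, s := range slices {
		n += len(s)
	}
	var b strings.Builder
	b.Grow(n) // one allocation for the whole body
	for _, s := range slices {
		b.Write(s)
	}
	return b.String()
}

// rawView exposes the string's data pointer and length without copying.
func rawView(s string) (*byte, int) {
	return unsafe.StringData(s), len(s)
}

// copyOut copies n bytes starting at p into a fresh buffer
// (an analogue of copy #2, done by the receiving side).
func copyOut(p *byte, n int) []byte {
	if n == 0 {
		return nil
	}
	dst := make([]byte, n)
	copy(dst, unsafe.Slice(p, n))
	return dst
}

func main() {
	s := flatten([][]byte{[]byte("he"), []byte("llo")})
	p, n := rawView(s)
	fmt.Println(s, n, string(copyOut(p, n))) // hello 5 hello
}
```

The pointer returned by unsafe.StringData must only be read, never written through, because Go strings are immutable.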
lua-resty-ffi-golang
The Lua filter works differently. It does not use a callback style like the golang filter, which maps C++ interfaces to golang interfaces one by one, such as DecodeData(). Instead, it uses coroutines to run logic (one coroutine per request), and all functions (such as getting the body) are encapsulated in wrapper objects and work in a non-blocking manner. Furthermore, it does not export the phases (header, body, encoding, or decoding) explicitly; the wrapper object methods perform the phase transitions implicitly.
With lua-resty-ffi, the golang runtime (main goroutine) is started once. All subsequent messages (or requests) are sent to that goroutine for processing or dispatch. For the echo logic here, the main goroutine creates a worker goroutine to process each message. The worker goroutine responds to the Lua coroutine through envoy dispatcher.post(), which schedules a closure to be executed in the envoy main thread and resumes the Lua coroutine there.
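The dispatch pattern on the golang side can be sketched with channels. This is a simplified illustration under assumptions, not the lua-resty-ffi API: message, post, serve and echoAll are hypothetical names, an in-process channel replaces the real message queue, and the post callback stands in for handing the reply back to the Envoy thread.

```go
package main

import (
	"fmt"
	"sync"
)

// message is a hypothetical request envelope.
type message struct {
	id   int
	body string
}

// post stands in for delivering a reply back to the Envoy side.
type post func(id int, reply string)

// serve plays the main goroutine: it receives messages and starts
// one worker goroutine per message, which echoes the body back.
func serve(inbox <-chan message, reply post) {
	var wg sync.WaitGroup
	for msg := range inbox {
		wg.Add(1)
		go func(m message) {
			defer wg.Done()
			reply(m.id, m.body) // echo logic
		}(msg)
	}
	wg.Wait() // return only after every worker has replied
}

// echoAll sends the bodies through serve and collects the replies by id.
func echoAll(bodies []string) []string {
	inbox := make(chan message)
	out := make([]string, len(bodies))
	done := make(chan struct{})
	go func() {
		// Each worker writes a distinct index, so no lock is needed.
		serve(inbox, func(id int, r string) { out[id] = r })
		close(done)
	}()
	for i, b := range bodies {
		inbox <- message{id: i, body: b}
	}
	close(inbox)
	<-done
	return out
}

func main() {
	fmt.Println(echoAll([]string{"a", "b", "c"})) // [a b c]
}
```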
Note that we have 8 body copies here!
- #1 copy Buffer to lua
One more interesting point here: the body is split into slices in the Buffer, but the lua C API requires a contiguous C string, so, similar to the golang filter, it first flattens the Buffer and then pushes it to the lua stack. This involves two memory allocations.
- #2 marshal the body and auth header
Unlike golang filters, where every operation is a C function call, lua-resty-ffi needs to use IPC (albeit efficient) to communicate with golang. I tested many marshaling (serialization) methods and formats and found that string concatenation was the most efficient in this demo, which also keeps the focus on the efficiency of the filter itself.
- #3 malloc and copy the message containing the auth header and request body to golang runtime
- #4 unmarshal the message in golang
- #5 dispatch the result message to the envoy main thread
- #6 in envoy main thread, the message is pushed to the lua stack
- #7 respond to the HTTP request (echo the request body)
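To make the marshal and unmarshal steps (#2 and #4) concrete, here is a minimal sketch of a concatenation-based format. The wire format is an assumption for illustration only, since the original snippet is not shown here: the auth header value, a single newline, then the raw body. marshal and unmarshal are hypothetical names. It assumes Go 1.18 or later for strings.Cut.

```go
package main

import (
	"fmt"
	"strings"
)

// marshal joins the auth header and the body with one newline.
// Assumed format: an HTTP header value cannot contain a newline,
// so the first newline always marks the boundary.
func marshal(authHeader, body string) string {
	return authHeader + "\n" + body
}

// unmarshal splits the message at the first newline.
// ok is false when the separator is missing.
func unmarshal(msg string) (authHeader, body string, ok bool) {
	return strings.Cut(msg, "\n")
}

func main() {
	msg := marshal("Bearer abc", "line1\nline2")
	h, b, ok := unmarshal(msg)
	fmt.Printf("%q %q %v\n", h, b, ok) // "Bearer abc" "line1\nline2" true
}
```

Both directions still copy the body once, which is consistent with the copy count discussed in this section.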
As you can see, unlike golang filters, there is a lua runtime between envoy and golang, and lua-resty-ffi also requires IPC to complete its work.
Therefore, lua-resty-ffi has the following disadvantages:
- The direction from envoy to golang brings additional OS thread scheduling costs
- To exchange messages you need to marshal and unmarshal the messages and use C malloc as transport, hence the reason for 8 memory allocations and copies in this demo.
But does this mean that lua filters are slower than golang filters? Without benchmarks, you have no answers.
There is an interesting point here that I must point out :-). Since the lua filter returns StopIterationAndBuffer, i.e. it relies on the filter manager to buffer and collect request body chunks, the Buffer parameter of each decodeData() call is the currently collected request body. On the last decodeData() call, it doesn’t have a chance to return StopIterationAndBuffer again, so it returns the buffer to the filter manager and gets the buffer data pointer for later reading and writing.
Benchmark
Check the benchmark code here:
https://github.com/kingluo/ffi-benchmark
golang filter snippet:
lua filter snippet:
Envoy and nginx can use the same golang shared library files compiled for lua-resty-ffi. This is one of the advantages of lua-resty-ffi, similar to how LSP (Language Server Protocol) lets one language server work with many editors, so I include nginx in the benchmark for reference.
I use k6 as the benchmark tool.
Test:
We only check the latency metric here, which is http_req_duration.
- GET
- POST 64k
- POST 1m
- POST 10m
You can see that, due to the shortcomings I mentioned before, especially the memory copies, lua-resty-ffi takes more time than the golang filter, but not much more: the biggest difference is approximately 20% when transferring the 10MB body, and the gap grows in proportion to the size of the body.
However, in most use cases we don’t have to move entire bodies to and from the lua-resty-ffi runtime, because what data gets exchanged is arbitrary and decided on demand. In other words, this demo represents an extreme situation and is for reference only.
Conclusion
Since both lua-resty-ffi-golang and golang filters can meet the same development needs, which one is better?
In my opinion, the performance overhead of lua-resty-ffi is a bit higher than golang filters, but lua-resty-ffi is a better choice for extending envoy functionality in a hybrid programming approach:
- lua-resty-ffi supports multiple languages, not just golang: Rust, Golang, Java, Python, Nodejs, so it can adapt well to different technology stacks
- lua-resty-ffi supports both nginx and envoy, so as long as you develop an extension for envoy, it can also run in nginx (even without recompiling!)
- lua-resty-ffi is relatively simple, while golang’s filter implementation is more complex. I actually spent some time figuring out from the source code how golang filters work.
- Golang filters export the same interface as C++ filters, so deciding which status code to return is subtle, such as the difference between Continue and StopIteration when calling back envoy in a goroutine. golang filters are error-prone, while lua-resty-ffi provides a generic golang sidecar: you can write any normal golang code like in other projects, and the lua filter will handle the correct phase transitions for you.
- The lua filter supports hot-reload of the source code, while the golang filter does not (it requires recompilation). Moreover, lua-resty-ffi supports language runtime hot-reload, e.g. Python, Java and Nodejs.
Welcome to learn more about lua-resty-ffi: