envoy asynchronous HTTP filter: lua-resty-ffi vs golang filter


In Envoy, both lua-resty-ffi (its Envoy port) and the built-in golang filter let you develop asynchronous business logic in Go using goroutines, but which one performs better?

Call flow

When the headers of an incoming request have been parsed by envoy, each HTTP filter’s decodeHeaders() is called. If decodeHeaders() returns StopIteration, the remaining header filters in the chain are skipped for now, but the request body is still received and the request moves on to the data phase.

When a body chunk arrives (a partial body, not necessarily in chunked encoding), Envoy calls the decodeData() of the filters one by one. If decodeData() returns StopIterationAndBuffer, the filter manager buffers this body chunk for the subsequent decodeData() call, i.e. the next time decodeData() is called it receives the accumulated body data. If decodeData() returns StopIterationAndNoBuffer, the filter manager does not buffer this body chunk and continues, i.e. the filter itself is responsible for keeping the body chunk.
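As a rough illustration of the two buffering modes from the Go side (a sketch only; it assumes the same filter type and api package as the golang filter snippets later in this post, and uses the api.StopAndBuffer / api.Continue statuses):

// Illustrative sketch, not the benchmark filter. Returning api.StopAndBuffer
// asks the filter manager to keep accumulating the body; with the
// "no buffer" variant the filter itself would have to keep the chunks.
func (f *filter) DecodeData(buffer api.BufferInstance, endStream bool) api.StatusType {
	if !endStream {
		// let the filter manager buffer this chunk
		return api.StopAndBuffer
	}
	// last chunk: the collected body is available here
	body := buffer.String()
	_ = body // inspect or rewrite the body, then continue the filter chain
	return api.Continue
}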

Consider a simple scenario where Envoy echoes the request body back to the client. Let’s take a look at how golang filter and lua-resty-ffi-golang work respectively.

golang filter

The golang filter consists of two parts: the C++ filter and the golang filter instance (written in Go). There is a subtle point here. When decodeHeaders() creates a goroutine to perform an asynchronous job and returns StopIteration, the decodeData() calls on the C++ side happen concurrently with that goroutine, so the C++ filter always buffers the request body chunks for later use. The golang-side DecodeData() is called only after the goroutine invokes the continue callback (f.callbacks.Continue(api.StopAndBuffer)), as the benchmark snippet later in this post shows. In other words, the golang filter buffers the entire request body by itself, instead of relying on the filter manager.

[Figure: golang filter call flow]

There are 2 memory copies here:

  • #1 copy envoy Buffer to golang string

capi_impl.go

func (c *httpCApiImpl) HttpGetBuffer(r unsafe.Pointer, bufferPtr uint64, length uint64) []byte {
	buf := make([]byte, length)
	res := C.envoyGoFilterHttpGetBuffer(r, C.uint64_t(bufferPtr), unsafe.Pointer(unsafe.SliceData(buf)))
	handleCApiStatus(res)
	return unsafe.Slice(unsafe.SliceData(buf), length)
}

golang_filter.cc

  for (const Buffer::RawSlice& slice : buffer->getRawSlices()) {
    // data is the heap memory of go, and the length is the total length of buffer. So use memcpy is
    // safe.
    memcpy(data, static_cast<const char*>(slice.mem_), slice.len_); // NOLINT(safe-memcpy)
    data += slice.len_;
  }

Someone may wonder why not just use func C.GoString(*C.char) string directly. Well, the Buffer is not a flat C string (it consists of multiple slices), so copying it out is not a trivial job. ;-)
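To make the flattening concrete, here is a small standalone Go sketch (illustrative only, not the filter’s actual code) that copies a list of non-contiguous slices, analogous to the Buffer’s raw slices, into one contiguous byte slice:

package main

import "fmt"

// flatten copies non-contiguous slices (think of the Buffer's RawSlices)
// into one contiguous byte slice: one allocation, then one copy per slice.
func flatten(slices [][]byte) []byte {
	total := 0
	for _, s := range slices {
		total += len(s)
	}
	out := make([]byte, 0, total)
	for _, s := range slices {
		out = append(out, s...)
	}
	return out
}

func main() {
	buf := [][]byte{[]byte("hello, "), []byte("world")}
	fmt.Println(string(flatten(buf))) // hello, world
}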

  • #2 copy golang string to Buffer

It gets the golang string’s data pointer and length so that the data can be copied directly on the C++ side. The latest version uses unsafe.StringData(), while previous versions cast through reflect.StringHeader.

capi_impl.go

func (c *httpCApiImpl) HttpSetBufferHelper(r unsafe.Pointer, bufferPtr uint64, value string, action api.BufferAction) {
	c.httpSetBufferHelper(r, bufferPtr, unsafe.Pointer(unsafe.StringData(value)), C.int(len(value)), action)
}
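
For reference, here is a standalone sketch (Go 1.20+, illustrative only) of what unsafe.StringData() provides: a pointer to the string’s bytes with no copy on the Go side, so the single copy happens on the C++ side.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	value := "hello envoy"
	// pointer to the underlying bytes of the string; no copy involved
	p := unsafe.StringData(value)
	fmt.Printf("ptr=%p len=%d\n", p, len(value))
	// a string view over the same bytes, still no copy
	view := unsafe.String(p, len(value))
	fmt.Println(view == value) // true
}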

lua-resty-ffi-golang

The Lua filter works differently. It does not use a callback style like the golang filter, which maps C++ interfaces to golang interfaces one by one (such as DecodeData()). Instead, it runs the logic in coroutines (one coroutine per request), and all operations (such as getting the body) are encapsulated in wrapper objects that work in a non-blocking manner. Furthermore, it does not expose the phases (header, body, encoding or decoding) explicitly; the wrapper object methods perform the phase transitions implicitly.

With lua-resty-ffi, the golang runtime (main goroutine) is started once. All subsequent messages (requests) are sent to that goroutine for processing or dispatch. For the echo logic here, the main goroutine creates a worker goroutine for each message. The worker goroutine responds to the Lua coroutine through the envoy dispatcher’s post(), which schedules a closure to be executed on the envoy main thread, and the Lua coroutine is resumed there.

[Figure: lua filter call flow]

Note that we have 8 body copies here!

  • #1 copy Buffer to lua

wrapper.cc

  // TODO(mattklein123): Reduce copies here by using Lua direct buffer builds.
  std::unique_ptr<char[]> data(new char[length]);
  data_.copyOut(index, length, data.get());
  lua_pushlstring(state, data.get(), length);

One more interesting point here: the body is split into slices inside the Buffer, but the Lua C API requires a contiguous C string, so, similar to the golang filter, it first flattens the Buffer and then pushes the result onto the lua stack. This involves two memory allocations: the temporary flat buffer and the lua string created by lua_pushlstring().

  • #2 marshal the body and auth header

Unlike the golang filter, where every operation is a C function call, lua-resty-ffi needs IPC (albeit an efficient one) to communicate with golang. I tested many marshaling (serialization) methods and formats and found that plain string concatenation was the most efficient for this demo, which also keeps the focus on the efficiency of the filters themselves.

envoy.yaml

local auth = req:headers():get("Authorization")
local body = req:body()
body = body and body:getBytes(0, body:length()) or "ok"
local tbl = {auth, body}
local data = table.concat(tbl, "\n")
  • #3 malloc and copy the message containing the auth header and request body to the golang runtime
  • #4 unmarshal the message on the golang side

filter.go

var rlen C.int
r := C.ngx_http_lua_ffi_get_req(task, &rlen)
c := unsafe.Pointer(r)
var i int
for ; i < int(rlen); i++ {
    b := *(*byte)(unsafe.Add(c, uintptr(i)*1))
    if b == '\n' {
        break
    }
}
str1 := unsafe.String((*byte)(unsafe.Pointer(r)), i)
str2 := unsafe.String((*byte)(unsafe.Add(unsafe.Pointer(r), uintptr(i+1)*1)), int(rlen)-i-1)
  • #5 dispatch the result message to the envoy main thread
  • #6 in the envoy main thread, the message is pushed onto the lua stack
  • #7 respond to the HTTP request (echo the request body back)

As you can see, unlike the golang filter, there is a lua runtime sitting between envoy and golang, and lua-resty-ffi also requires IPC to do its work.

Therefore, lua-resty-ffi has the following disadvantages:

  1. In the direction from envoy to golang, it incurs extra OS thread scheduling costs
  2. To exchange messages, it has to marshal and unmarshal them and use C-malloc’d buffers as the transport, hence the 8 memory allocations and copies in this demo (see the small cgo sketch below).
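
To illustrate the second point, here is a tiny standalone cgo sketch (illustrative only, not lua-resty-ffi’s code) of using a C-malloc’d buffer as the transport: C.CString mallocs a C buffer and copies the Go string into it, and the receiving side copies it back into Go memory.

package main

/*
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

func main() {
	msg := "auth-header\nrequest-body"
	// C.CString mallocs a C buffer and copies the Go string into it
	cmsg := C.CString(msg)
	defer C.free(unsafe.Pointer(cmsg))
	// the receiver copies it back into Go-managed memory: another copy
	back := C.GoStringN(cmsg, C.int(len(msg)))
	fmt.Println(back == msg) // true
}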

But does this mean that the lua filter is slower than the golang filter? Without benchmarks, there is no answer.

There is an interesting point here that I must point out :-). The lua filter returns StopIterationAndBuffer, i.e. it relies on the filter manager to buffer and collect the request body chunks, so the Buffer parameter of each decodeData() call is the request body collected so far. On the last decodeData() call it has no chance to return StopIterationAndBuffer again, so it adds the final chunk to the filter manager’s buffer itself and then takes a pointer to the buffered body for later reading and writing.

lua_filter.cc

  } else if (state_ == State::WaitForBody && end_stream_) {
    ENVOY_LOG(debug, "resuming body due to end stream");
    callbacks_.addData(data);
    state_ = State::Running;
    resumeCoroutine(luaBody(coroutine_.luaState()), yield_callback_);

lua_filter.cc

body_wrapper_.reset(
    Filters::Common::Lua::BufferWrapper::create(
	state, headers_, const_cast<Buffer::Instance&>(*callbacks_.bufferedBody())),
    true);

Benchmark

Check the benchmark code here:

https://github.com/kingluo/ffi-benchmark

golang filter snippet:

func (f *filter) DecodeHeaders(header api.RequestHeaderMap, endStream bool) api.StatusType {
	go func() {
		if ok, msg := f.verify(header); !ok {
			f.callbacks.SendLocalReply(401, msg, map[string]string{}, 0, "bad-request")
			return
		}
		// StopAndBuffer will buffer the entire body for use by DecodeData()
		f.callbacks.Continue(api.StopAndBuffer)
	}()
	return api.Running
}

func (f *filter) DecodeData(buffer api.BufferInstance, endStream bool) api.StatusType {
	data := buffer.String()
	go func() {
		f.callbacks.SendLocalReply(200, data, map[string]string{}, 0, "basic-auth")
	}()
	return api.Running
}

lua filter snippet:

  function envoy_on_request(req)
    require("resty_ffi")
    local demo = ngx.load_ffi("ffi_go_basic_auth",
      '{"user":"fooname","password":"validpassword"}')
    local auth = req:headers():get("Authorization")
    local body = req:body()
    body = body and body:getBytes(0, body:length()) or "ok"
    local tbl = {auth, body}
    local data = table.concat(tbl, "\n")
    local _, rc, res = demo:auth(data, req)
    req:respond({[":status"] = tostring(rc)}, res)
  end

And the corresponding golang library side (the ffi_go_basic_auth library loaded by lua-resty-ffi):

go func() {
	for {
		task := C.ngx_http_lua_ffi_task_poll(tq)
		if task == nil {
			break
		}
		...
		req := &Request{Auth: str1, Body: str2}
		go func() {
			if ok, msg := verify(req.Auth); !ok {
				C.ngx_http_lua_ffi_respond(task, 401, (*C.char)(C.CString(msg)), C.int(len(msg)))
			} else {
				C.ngx_http_lua_ffi_respond(task, 200, (*C.char)(C.CString(req.Body)), C.int(len(req.Body)))
			}
		}()
	}
}()

Envoy and nginx can use the same golang shared library compiled for lua-resty-ffi. This is one of the advantages of lua-resty-ffi, similar in spirit to how LSP (Language Server Protocol) lets one server work with many editors, so I include nginx in the benchmark for reference.
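
The reason the same .so file works in both is that the Go extension is packaged as a plain C shared library that either host can dlopen. A minimal, purely illustrative packaging sketch (the exported symbol name demo_init is hypothetical; lua-resty-ffi defines its own entry-point contract, inside which the task-poll loop shown earlier runs):

// demo.go -- illustrative packaging sketch only, not the benchmark library
package main

import "C"

//export demo_init
func demo_init() C.int {
	// start the main goroutine / task loop here
	return 0
}

// a c-shared library still needs an (unused) main function
func main() {}

// build once, then load the same .so from nginx or envoy:
//   go build -buildmode=c-shared -o libdemo.so demo.go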

I use k6 as the benchmark tool.

benchmark.js

import http from 'k6/http';

export const options = {
  discardResponseBodies: true,
};

const data = open('/opt/test.data.64k', 'b');
//const data = open('/opt/test.data.1m', 'b');
//const data = open('/opt/test.data.10m', 'b');

export default function () {
  //http.get('http://envoy:10000/get', {headers:{Authorization:'Basic Zm9vbmFtZTp2YWxpZHBhc3N3b3Jk'}});
  http.post('http://envoy:10000/post', data, {
    headers: {
      'Content-Type': 'application/octet-stream',
      Authorization: 'Basic Zm9vbmFtZTp2YWxpZHBhc3N3b3Jk',
      Expect: '',
    }
  });
}

Test:

# build envoy with lua-resty-ffi
git checkout lua-resty-ffi
./ci/run_envoy_docker.sh './ci/do_ci.sh dev.contrib'
ln -sf `find . -name envoy-contrib -executable` /usr/local/bin/envoy

# build openresty with lua-resty-ffi
bash build_openresty.sh

# start golang-filter enabled envoy
GODEBUG=cgocheck=0 envoy -c envoy.yaml --concurrency 1

# start lua-resty-ffi enabled envoy
LD_LIBRARY_PATH=$PWD LUA_PATH='/opt/lua-resty-ffi/?.lua;;' envoy -c envoy.yaml --concurrency 1

# start lua-resty-ffi enabled nginx
LD_LIBRARY_PATH=/opt/ffi-benchmark/lua-filter/ LUA_PATH='/opt/lua-resty-ffi/?.lua;;' \
  nginx -p $PWD -g 'daemon off; error_log /dev/stderr info;'

# generate the test data and run k6
for ((i=0;i<$((1024*64));i++)); do echo -n 'a' >> /opt/test.data.64k; done
for ((i=0;i<$((1024*1024*1));i++)); do echo -n 'a' >> /opt/test.data.1m; done
for ((i=0;i<$((1024*1024*10));i++)); do echo -n 'a' >> /opt/test.data.10m; done
k6 run -u 1 -d 60s -q /opt/envoy_vs_nginx_ffi.js

We only check the latency metric here, which is http_req_duration.

  • GET
# golang filter
# cpu: 66%
avg=105.09µs min=75.32µs med=100.85µs max=5.9ms p(90)=120.53µs p(95)=129.46µs

# lua filter
# cpu: 70%
avg=111.78µs min=78.01µs med=108.98µs max=12.62ms p(90)=129.03µs p(95)=136.13µs

# nginx
# cpu: 62%
avg=74.66µs  min=48.19µs med=72.57µs  max=6.36ms p(90)=83.15µs p(95)=86.51µs
  • POST 64k
# golang filter
# cpu: 77%
avg=218.94µs min=155.97µs med=192.04µs max=12.27ms p(90)=218.93µs p(95)=235.65µs

# lua filter
# cpu: 66%
avg=230.68µs min=170.97µs med=205.81µs max=10.74ms p(90)=235.01µs p(95)=247.15µs

# nginx
# cpu: 60%
avg=164.92µs min=107.25µs med=142.75µs max=10.89ms p(90)=167.14µs p(95)=176.01µs
  • POST 1m
# golang filter
# cpu: 90%
avg=1.47ms min=1.07ms med=1.18ms max=9.67ms p(90)=1.49ms p(95)=3.32ms

# lua filter
# cpu: 80%
avg=1.82ms min=1.5ms med=1.59ms max=8.21ms p(90)=1.65ms p(95)=4.12ms

# nginx
# cpu: 83%
avg=2.04ms min=1.58ms med=1.77ms max=9.73ms p(90)=2.16ms p(95)=4.52ms
  • POST 10m
# golang filter
# cpu: 90%
avg=14.06ms min=12.59ms med=13.66ms max=39.31ms p(90)=16.61ms p(95)=17.01ms

# lua filter
# cpu: 90%
avg=20.56ms min=19ms med=20.19ms max=66.24ms p(90)=22.94ms p(95)=23.56ms

# nginx
# cpu: 93%
avg=23.02ms min=21.08ms med=22.57ms max=42.85ms p(90)=24.71ms p(95)=25.99ms

You can see that, due to the shortcomings mentioned before, especially the memory copies, lua-resty-ffi takes more time than the golang filter, and the gap grows roughly in proportion to the body size: from a few percent in the GET and 64 KB cases to roughly 46% on average when transferring the 10 MB body.

However, in most use cases we don’t have to move the entire body back and forth between envoy and the lua-resty-ffi runtime, because which data gets exchanged is arbitrary and decided on demand. In other words, this demo represents an extreme situation and is for reference only.

Conclusion

Since both lua-resty-ffi-golang and golang filters can meet the same development needs, which one is better?

In my opinion, the performance overhead of lua-resty-ffi is somewhat higher than that of the golang filter, but lua-resty-ffi is the better choice for extending envoy functionality in a hybrid programming approach:

  1. lua-resty-ffi supports multiple languages, not just golang (Rust, Golang, Java, Python, Node.js), so it adapts well to different technology stacks
  2. lua-resty-ffi supports both nginx and envoy, so an extension developed for envoy can also run in nginx (even without recompiling!)
  3. lua-resty-ffi is relatively simple, while the golang filter implementation is more complex. I actually spent some time figuring out from the source code how the golang filter works.
  4. The golang filter exports the same interface as C++ filters, so it is tricky to decide which status code to return, e.g. the difference between Continue and StopIteration when calling back into envoy from a goroutine. This makes golang filters error-prone, while lua-resty-ffi provides a generic golang sidecar: you write ordinary golang code as in any other project, and the lua filter handles the correct phase transitions for you.
  5. The lua filter supports hot-reload of the source code, while the golang filter does not (it requires recompilation). Moreover, lua-resty-ffi supports language-runtime hot-reload, e.g. for Python, Java and Node.js.

Welcome to learn more about lua-resty-ffi:

https://github.com/kingluo/lua-resty-ffi