coco: a simple stackless, single-threaded, and header-only C++11 coroutine library

coco◎ Darkhold

TL;DR

coco is a simple stackless, single-threaded, and header-only C++11 coroutine library.

https://github.com/kingluo/coco

Background

Don’t communicate through shared memory, share memory through communication.

I have always been impressed by Golang’s CSP programming, which is very expressive and couples different business logic together through channels.

In my C++ programming career, I have always struggled with callback hell. Usually, five or more nested callbacks are enough to confuse even me, the author, not to mention parallel asynchronous calls. On modern Linux, we use epoll or io_uring for asynchronous programming, but callback handling fragments the business logic. C++20 introduced coroutines, but they are complicated and far from the simple building blocks I value in Go. I searched GitHub, but did not find a coroutine library simple enough for my needs.

In Rust, a coroutine is implemented as an implicit state machine generated by the compiler; it is essentially a switch statement. So why not use macros to make coroutines look like Go's, while using a state machine behind the scenes? I also don't need multithreaded coroutines (goroutines): in C++ we usually manage threads ourselves for efficiency, and we don't need a complex runtime like Go's to schedule coroutines. Instead, I want coroutines that behave like Lua's, that is, cooperative and managed by the programmer, to keep them as flexible as possible.
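To make the switch-statement idea concrete, here is a minimal self-contained sketch of the trick (hypothetical macros for illustration only, not coco's actual implementation): the object records its resume point in a member, and on re-entry a switch jumps back to the saved case label, so the function "continues" where it left off.

```cpp
// Hypothetical macros sketching the state-machine trick.
// `line` stores the resume point; `case __LINE__:` is the jump target.
#define CO_BEGIN()  switch (this->line) { case 0:
#define CO_YIELD()  do { this->line = __LINE__; return; case __LINE__:; } while (0)
#define CO_END()    } this->finished = true

struct counter_co {
    int line = 0;        // saved resume point (0 = not started)
    bool finished = false;
    int value = 0;       // survives suspensions: lives in the object, not on the stack

    void resume() {
        CO_BEGIN();
        value = 1;
        CO_YIELD();      // suspend; the next resume() continues below this line
        value = 2;
        CO_YIELD();
        value = 3;
        CO_END();        // falls out of the switch and marks the coroutine finished
    }
};
```

Each `resume()` advances `value` one step. This is roughly the shape that coco's macros, and the state machines the Rust compiler generates for async functions, take behind the scenes.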

So, I decided to build one myself (funny enough, I did something similar a decade ago, bringing Java's reflection, annotations, and object proxies to C++). This is coco, a single header file of only about 200 lines of code.

Design

  1. No dependency on C++20 coroutines; C++11 is enough.
  2. No compiler dependencies: simple macros, header-only.
  3. Coroutines are cooperative like Lua's, with a stackless async/await implementation like Rust's.
  4. Channels and waitgroups like Go's.
  5. Single-threaded, no locks.
  6. No performance overhead.
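As a rough illustration of points 4 and 5 (illustration only, not coco's actual channel implementation): with cooperative coroutines on a single thread, a channel reduces to a FIFO buffer plus a closed flag. No locks or atomics are needed, because only one coroutine ever runs at a time.

```cpp
#include <deque>

// A minimal single-threaded channel sketch (hypothetical, not coco's code).
template <typename T>
struct chan_sketch {
    std::deque<T> buf;
    bool closed = false;

    // Writing to a closed channel fails (Go panics here; we just report it).
    bool write(const T& v) {
        if (closed) return false;
        buf.push_back(v);
        return true;
    }

    // Non-blocking read: false means "nothing buffered right now".
    // A real coroutine channel would suspend the reader instead.
    bool try_read(T& out) {
        if (buf.empty()) return false;
        out = buf.front();
        buf.pop_front();
        return true;
    }

    // True once the channel is closed and fully consumed,
    // mirroring Go's `v, ok := <-ch` returning ok == false.
    bool drained() const { return closed && buf.empty(); }
};
```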

Synopsis

    // Coroutine body: a lambda that is re-entered on every resume().
    auto sock_read = new co_t([=](co_t* __self, state_t* _st) {
        auto st = dynamic_cast<sock_read_state_t*>(_st);
        COCO_ASYNC_BEGIN(sock_read);
        for (; st->cnt < 3; st->cnt++) {
            COCO_ASYNC_BEGIN(loop);
            // Write to two channels; each write may suspend the coroutine.
            COCO_WRITE_CHAN(write_ch, fs_write_ch, st->cnt);
            COCO_WRITE_CHAN(write_ch2, kafka_produce_ch, st->cnt);
            COCO_ASYNC_END();
        }
        fs_write_ch->close();
        kafka_produce_ch->close();
        // Wait until the consumers signal the waitgroup.
        COCO_WAIT(wait_all, wg);
        COCO_ASYNC_END();
        COCO_DONE();
    }, new sock_read_state_t);

    // Drive the coroutine to completion.
    while (!sock_read->done())
        sock_read->resume();

Example: A simple webserver based on io_uring

In “Lord of the io_uring”, there is an example that implements a simple webserver. You can see that the callback style makes the code difficult to read and maintain.

void server_loop(int server_socket) {
    struct io_uring_cqe *cqe;
    struct sockaddr_in client_addr;
    socklen_t client_addr_len = sizeof(client_addr);
    add_accept_request(server_socket, &client_addr, &client_addr_len);
    while (1) {
        int ret = io_uring_wait_cqe(&ring, &cqe);
        if (ret < 0)
            fatal_error("io_uring_wait_cqe");
        /* Only read the CQE after the wait succeeded. */
        struct request *req = (struct request *) cqe->user_data;
        if (cqe->res < 0) {
            fprintf(stderr, "Async request failed: %s for event: %d\n",
                    strerror(-cqe->res), req->event_type);
            exit(1);
        }
        switch (req->event_type) {
        case EVENT_TYPE_ACCEPT:
            add_accept_request(server_socket, &client_addr, &client_addr_len);
            add_read_request(cqe->res);
            free(req);
            break;
        case EVENT_TYPE_READ:
            if (!cqe->res) {
                fprintf(stderr, "Empty request!\n");
                break;
            }
            handle_client_request(req);
            free(req->iov[0].iov_base);
            free(req);
            break;
        case EVENT_TYPE_WRITE:
            for (int i = 0; i < req->iovec_count; i++) {
                free(req->iov[i].iov_base);
            }
            close(req->client_socket);
            free(req);
            break;
        }
        /* Mark this request as processed */
        io_uring_cqe_seen(&ring, cqe);
    }
}

Connection acceptance and request handling are all mixed into one big switch. Compared with Go, this is terrible, isn't it? And it is hard to imagine what the code would look like once the business logic becomes genuinely complex.

With coco, it can be expressed as:

    // Accept-loop coroutine: accept a connection, then spawn a
    // per-connection coroutine to handle it.
    go([=](co_t* __self, state_t* _st) {
        auto st = dynamic_cast<iouring_state_t*>(_st);
        st->co = __self;
        while (true) {
            COCO_ASYNC_BEGIN(loop);
            do_accept(server_socket, nullptr, nullptr, st);
            COCO_YIELD();  // resumed by the io_uring event loop
            auto sk = st->res;
            // Per-connection coroutine: read, handle, respond.
            go([=](co_t* __self, state_t* _st) {
                auto req = dynamic_cast<conn_state_t*>(_st);
                req->co = __self;
                COCO_ASYNC_BEGIN(process_req);

                read_request(req);
                COCO_YIELD();

                handle_request(req);
                send_response(req);
                COCO_YIELD();
                delete __self;

                COCO_ASYNC_END();
                COCO_DONE();
            }, new conn_state_t(sk));
            COCO_ASYNC_END();
        }
        COCO_DONE();
    }, new iouring_state_t);

Looks much better, right?

And the io_uring callback does only one thing: resumes the coroutine.

void server_loop(int server_socket) {
    while (true) {
        struct io_uring_cqe *cqe;
        int ret = io_uring_wait_cqe(&ring, &cqe);
        if (ret < 0)
            fatal_error("io_uring_wait_cqe");
        if (cqe->res < 0) {
            fprintf(stderr, "Async request failed: %s\n",
                    strerror(-cqe->res));
            close(server_socket);
            exit(1);
        }

        iouring_state_t *st = (iouring_state_t*)cqe->user_data;
        st->res = cqe->res;
        st->co->resume();
        io_uring_cqe_seen(&ring, cqe);
    }
}

It lets you program in C++ the way you would in Go, without any performance sacrifice.

Coding restrictions

Since coco is simple and has no compiler support, it inevitably comes with some coding restrictions.

  1. Coroutine lambdas must be reentrant: the body is re-executed on every resume and must produce the same control flow after re-entry.
  2. yield, wait, and channel reads/writes must be placed at the first level of an async block.
  3. Local variables do not survive a yield; if you need a variable to live across yields, put it in the state object or a global.
  4. Exceptions must be caught inside the lambda.
  5. State subclasses must inherit from the state_t base class and be dynamic_cast back to the subclass before use.
  6. The state is owned by the coroutine, so it is deleted when the coroutine exits.
  7. You must delete all coroutines, channels, and waitgroups you create yourself.
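Restriction 3 is the one that bites most often. Here is a minimal stand-alone illustration (hypothetical names, not coco's API): a stackless coroutine re-runs its body from the top on every resume, so plain stack locals are reinitialized, while members of the state object persist across yields.

```cpp
#include <cassert>

// Hypothetical state object, standing in for a state_t subclass.
struct demo_state {
    int step = 0;   // resume point
    int kept = 0;   // survives yields: stored in the state
};

// Each call re-enters the "coroutine body" from the top, like a resume().
void resume(demo_state& st) {
    int lost = 0;   // wrong place for cross-yield data: reset on every resume
    switch (st.step) {
    case 0:
        lost = 42;
        st.kept = 42;
        st.step = 1;            // "yield": return and remember where we were
        return;
    case 1:
        assert(lost == 0);      // the local was re-created; its 42 is gone
        assert(st.kept == 42);  // the state member kept its value
        st.step = 2;            // done
        return;
    }
}
```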

Conclusion

Coco is a poor man’s C++11 coroutine library that can improve your productivity when doing asynchronous programming.

If you like it, please star my GitHub repo!

https://github.com/kingluo/coco