coco: a simple stackless, single-threaded, and header-only C++20 coroutine library


TL;DR

coco is a simple stackless, single-threaded, and header-only C++20 coroutine library that leverages native C++20 coroutines for async/await programming with Go-like channels and waitgroups.

https://github.com/kingluo/coco

Background

Don’t communicate through shared memory, share memory through communication.

I have always been impressed by Golang's CSP-style programming, which is very expressive and connects separate pieces of business logic through channels.

In my C++ programming career, I have always struggled with callback hell. A chain of more than five callbacks is enough to confuse me even as the author, to say nothing of parallel asynchronous calls. On modern Linux, we use epoll or io_uring for asynchronous programming, but callback-based handling fragments the business logic.

Initially, I implemented coco using macros to make coroutines look like Go's, driven by a switch-based state machine behind the scenes. However, with the advent of C++20 coroutines, I realized we could leverage native language support for even better performance and cleaner syntax.

Design

  1. Uses C++20 coroutines for native async/await support.
  2. Header-only library with no external dependencies.
  3. Stackless coroutines with async/await implementation.
  4. Channel and waitgroup primitives like Go.
  5. Single-threaded, no locks required.
  6. Minimal performance overhead.
  7. Simple FIFO scheduler for managing multiple coroutines.

Synopsis

Defining, Resuming and Running Coroutines:

// Define a coroutine function - must return co_t and use co_await/co_yield/co_return
co_t my_coroutine(int id) {
    std::cout << "Coroutine " << id << " started" << std::endl;
    co_yield resched;  // Yield control and reschedule for later execution
    std::cout << "Coroutine " << id << " resumed" << std::endl;
    co_return;
}

int main() {
    // Create coroutine instance
    auto coro = my_coroutine(1);

    // Resume the coroutine - schedules it for execution
    coro.resume();

    // Run the scheduler to execute all scheduled coroutines
    scheduler_t::instance().run();

    return 0;
}

Joining Coroutines:

co_t task1() {
    std::cout << "Task 1 started" << std::endl;
    co_yield resched;
    std::cout << "Task 1 finished" << std::endl;
    co_return;
}

co_t task2() {
    std::cout << "Task 2 started" << std::endl;
    co_yield resched;
    std::cout << "Task 2 finished" << std::endl;
    co_return;
}

co_t main_task() {
    // Start task1 and wait for it to complete using join()
    co_await go([]() { return task1(); }).join();

    // After task1 completes, start task2
    co_await go([]() { return task2(); }).join();

    std::cout << "Both tasks completed sequentially" << std::endl;
    co_return;
}

int main() {
    auto task = main_task();
    task.resume();
    scheduler_t::instance().run();
    return 0;
}

Channels:

co_t example() {
    chan_t<int> ch(1);  // Buffered channel with capacity 1

    // Producer coroutine
    auto producer = [&ch]() -> co_t {
        for (int i = 0; i < 3; i++) {
            std::cout << "Sending: " << i << std::endl;
            bool ok = co_await ch.write(i);
            if (!ok) {
                std::cout << "Channel closed, stopping producer" << std::endl;
                break;
            }
        }
        ch.close();
        std::cout << "Producer finished" << std::endl;
        co_return;
    };

    // Consumer coroutine
    auto consumer = [&ch](const std::string& name) -> co_t {
        while (true) {
            auto result = co_await ch.read();
            if (result.has_value()) {
                std::cout << name << " received: " << result.value() << std::endl;
                // Yield and reschedule to allow fair distribution among consumers
                co_yield resched;
            } else {
                std::cout << name << " channel closed" << std::endl;
                break;
            }
        }
        co_return;
    };

    // Start producer and consumers
    auto prod = producer();
    auto cons1 = consumer("Consumer1");
    auto cons2 = consumer("Consumer2");

    // Schedule coroutines for execution
    prod.resume();
    cons1.resume();
    cons2.resume();

    // Run scheduler to execute all scheduled coroutines
    scheduler_t::instance().run();

    co_return;
}

Waitgroups:

co_t worker(int id, wg_t& wg) {
    std::cout << "Worker " << id << " starting" << std::endl;
    co_yield resched;  // Yield control, automatically rescheduled
    std::cout << "Worker " << id << " finished" << std::endl;
    wg.done();
    co_return;
}

co_t example() {
    wg_t wg;

    // Add workers to waitgroup
    wg.add(3);

    // Start workers using go()
    auto w1 = go([&wg](){ return worker(1, wg); });
    auto w2 = go([&wg](){ return worker(2, wg); });
    auto w3 = go([&wg](){ return worker(3, wg); });

    // Wait for all workers to complete
    co_await wg.wait();
    std::cout << "All workers completed!" << std::endl;

    co_return;
}

int main() {
    auto task = example();
    task.resume();
    scheduler_t::instance().run();
    return 0;
}

Example: A Simple Webserver Based on io_uring

In “Lord of the io_uring”, there is an example that implements a simple webserver. You can see that the callback style makes the code difficult to read and maintain.

void server_loop(int server_socket) {
    struct io_uring_cqe *cqe;
    struct sockaddr_in client_addr;
    socklen_t client_addr_len = sizeof(client_addr);
    add_accept_request(server_socket, &client_addr, &client_addr_len);
    while (1) {
        int ret = io_uring_wait_cqe(&ring, &cqe);
        if (ret < 0)
            fatal_error("io_uring_wait_cqe");
        struct request *req = (struct request *) cqe->user_data;
        if (cqe->res < 0) {
            fprintf(stderr, "Async request failed: %s for event: %d\n",
                strerror(-cqe->res), req->event_type);
            exit(1);
        }
        switch (req->event_type) {
        case EVENT_TYPE_ACCEPT:
            add_accept_request(server_socket, &client_addr, &client_addr_len);
            add_read_request(cqe->res);
            free(req);
            break;
        case EVENT_TYPE_READ:
            if (!cqe->res) {
                fprintf(stderr, "Empty request!\n");
                break;
            }
            handle_client_request(req);
            free(req->iov[0].iov_base);
            free(req);
            break;
        case EVENT_TYPE_WRITE:
            for (int i = 0; i < req->iovec_count; i++) {
                free(req->iov[i].iov_base);
            }
            close(req->client_socket);
            free(req);
            break;
        }
        /* Mark this request as processed */
        io_uring_cqe_seen(&ring, cqe);
    }
}

Connection acceptance and request handling are mixed together in one big switch statement. Compared with Go, this is difficult to maintain, and it is easy to imagine how unwieldy the code would become with genuinely complex business logic.

With coco using C++20 coroutines, it can be expressed as:

// Note: This is simplified pseudocode to illustrate the concept.
// For a complete working example, see the examples/ directory in the GitHub repo.

co_t accept_connections(int server_socket) {
    while (true) {
        // Accept connection (async operation using io_uring)
        int client_socket = co_await accept_async(server_socket);

        // Spawn coroutine to handle this connection
        auto handler = handle_connection(client_socket);
        handler.resume();  // Runs concurrently with accept loop
    }
}

co_t handle_connection(int client_socket) {
    // Read request (async operation using io_uring)
    auto request = co_await read_request_async(client_socket);

    // Process request (synchronous business logic)
    auto response = handle_request(request);

    // Send response (async operation using io_uring)
    co_await send_response_async(client_socket, response);

    close(client_socket);
}

Much cleaner and easier to understand, right? The business logic flows naturally from top to bottom.

The io_uring event loop does only one thing: schedules the coroutine for resumption.

void event_loop(io_uring* ring) {
    while (true) {
        io_uring_cqe* cqe;
        io_uring_wait_cqe(ring, &cqe);

        // Get the awaiter from completion event
        iouring_awaiter* awaiter = (iouring_awaiter*)cqe->user_data;
        awaiter->res = cqe->res;  // Store the result

        // CRITICAL: Use scheduler to resume the coroutine handle
        if (awaiter->handle) {
            scheduler_t::instance().schedule(awaiter->handle);
        }

        io_uring_cqe_seen(ring, cqe);

        // Run scheduler to process resumed coroutines
        scheduler_t::instance().run();
    }
}

It lets you write C++ code like Go, but without any performance sacrifices.

Key Features and Benefits

Since coco uses native C++20 coroutines, it provides several advantages over traditional callback-based or macro-based approaches:

  1. Native Language Support: Uses C++20’s built-in coroutine support for better performance and debugging
  2. Type Safety: Better compile-time error checking compared to macro-based approaches
  3. Clean Syntax: Natural async/await syntax without complex macros
  4. Automatic State Management: No manual state tracking - the compiler handles coroutine state automatically
  5. Exception Safety: Exceptions can be used normally within coroutines and are properly propagated through join operations
  6. Memory Management: Full RAII compliance - destructors are called properly and resource cleanup works as expected
  7. Debugging Support: Better debugger integration for stepping through coroutine code
  8. Simple Scheduler: FIFO queue-based scheduler for managing multiple coroutines with cooperative multitasking
  9. Coroutine Join: Ability to wait for specific coroutines to complete with exception propagation
  10. Flexible Yielding: Support for both automatic rescheduling (co_yield resched) and manual control (co_yield no_sched). Auto-rescheduling is for fair distribution of work, and manual control is for fine-grained control (we resume it manually only when ready, e.g., io_uring completion)

Performance Characteristics

  • Zero-cost abstraction: C++20 coroutines compile to efficient state machines with minimal overhead
  • No heap allocation per operation: Only the coroutine frame is heap-allocated once at creation
  • Single-threaded: No lock contention or synchronization overhead
  • Cooperative scheduling: Predictable execution order with FIFO scheduling
  • Stackless: Lower memory footprint compared to stackful coroutines (like Boost.Context)

For high-performance I/O workloads (e.g., with io_uring), coco provides near-native performance while maintaining clean, readable code.

Important Caveats of C++20 Coroutines and coco Implementation

While C++20 coroutines provide powerful async/await capabilities, there are several important limitations and considerations to keep in mind when using coco:

Scheduler-Based Coordination

The channel implementation uses a scheduler-based approach for coroutine coordination, which prevents stack exhaustion:

// In read_wait_t::await_resume()
void wake_up_writer() {
    if (!ch->wq.empty()) {
        auto writer = ch->wq.front();
        ch->wq.pop();
        if (writer && !writer.done()) {
            scheduler_t::instance().schedule(writer);  // Schedules instead of direct resume
        }
    }
}

Benefit: This prevents recursive execution and stack exhaustion by queuing coroutine handles in the scheduler’s FIFO queue.

Impact: The scheduler processes coroutines in FIFO order, providing cooperative multitasking. All channel operations (read/write) automatically use the scheduler to wake up waiting coroutines.

Coroutine Composition with Join

Important Pattern: co_await and co_yield can only appear directly inside a coroutine function (one returning co_t); they cannot be used in ordinary helper functions that the coroutine calls. To compose coroutines, use the join() method.

// ✅ This works - co_await in top-level coroutine function
co_t authenticate_user(const std::string& username) {
    std::cout << "Authenticating " << username << "..." << std::endl;
    co_yield resched;
    std::cout << "Authentication successful" << std::endl;
    co_return;
}

// ✅ Correct: Use join() to compose coroutines
co_t handle_user_request(const std::string& username) {
    // Use go().join() to create, schedule, and join in one expression
    co_await go([&](){ return authenticate_user(username); }).join();
    std::cout << "Request handled!" << std::endl;
    co_return;
}

// ❌ This doesn't work - co_await in helper function
void helper_function() {
    auto result = co_await some_async_operation(); // Compilation error!
}

// ❌ This also doesn't work - mixing return with coroutine keywords
co_t handle_request_WRONG(const std::string& username) {
    if (username.empty()) {
        return authenticate_user(username);  // Compilation error!
    }
    co_yield resched;
    co_return;
}

Solution: Use co_await coroutine.join() to compose coroutines sequentially. The go() helper creates and schedules the coroutine, then join() waits for completion.

Variable Lifetime and RAII

Critical Consideration: C++20 coroutines are stackless - the compiler transforms your coroutine into a state machine whose parameters and local variables live in a heap-allocated coroutine frame. RAII works: objects in the frame keep their identity and their resources across suspension points, and the frame itself stays at a fixed address for the coroutine's whole lifetime.

The lifetime hazards sit at the frame boundary: reference parameters and by-reference lambda captures refer to objects owned by the caller, and they dangle if those objects are destroyed while the coroutine is still suspended. Likewise, pointers into the frame become invalid once the coroutine completes and the frame is freed.

Key Rules:

  • RAII objects in the frame are preserved across suspension points
  • Local variables maintain their identity - no copying or moving occurs during suspension
  • Destructors are called only when the coroutine completes (co_return) or its frame is destroyed
  • Heap allocations remain valid (smart pointers, containers’ internal data)
  • ⚠️ Reference parameters and by-reference captures dangle if the caller-side objects they refer to die before the coroutine resumes
  • ⚠️ Pointers into the frame dangle once the coroutine completes and its frame is freed
co_t raii_example(const std::string& name) {  // Reference parameter: caller-owned
    std::unique_ptr<int> resource = std::make_unique<int>(42);
    std::string data = "Hello";           // Lives in the coroutine frame
    std::string* ptr = &data;             // Pointer into the frame

    co_await some_operation();

    // ✅ Safe - frame-local objects keep their identity across suspension
    data += " World";
    *resource = 100;
    *ptr += "!";  // Also safe: the frame is never relocated

    // ❌ UNSAFE if the caller destroyed its string while we were suspended:
    // std::cout << name;  // Dangling reference - undefined behavior!

    // Frame-local resources are destroyed only when the coroutine completes
    co_return;
}

Why This Happens: The coroutine frame is heap-allocated and never moves - objects inside it are preserved with their state and identity across suspension. What goes wrong is a lifetime mismatch across the frame boundary: the caller's scope may end while the coroutine is still suspended (invalidating reference parameters and by-reference captures), and the frame itself is freed when the coroutine completes (invalidating any pointers into it that escaped).

How Coroutine Suspension Works

When a coroutine suspends (via co_await or co_yield), the coroutine frame is preserved in heap memory:

Coroutine Frame (heap-allocated):
├── Promise object
├── Parameters
├── Local variables (including RAII objects)
├── Temporary objects
└── Suspension state

Key Points:

  • No copy/move constructors called during suspension/resumption
  • No destructors called until the coroutine completes (co_return)
  • Object identity preserved - the same objects exist before and after suspension
  • RAII works perfectly - resources are held across suspension points

This is why C++20 coroutines are so efficient - objects maintain their state and identity across suspension points without any copying or moving overhead.

Best Practices

  1. ✅ RAII works perfectly - Use RAII objects freely across suspension points
  2. ✅ Use value semantics - Local variables are safely preserved in the coroutine frame
  3. ⚠️ Mind the frame boundary - Reference parameters, by-reference lambda captures, and escaped pointers into the frame dangle if the object on the other side of the boundary is destroyed first
  4. ✅ Use the scheduler - Always call scheduler_t::instance().run() to execute scheduled coroutines
  5. ✅ Yield cooperatively - Use co_yield resched to allow other coroutines to run, especially in loops or long-running operations
  6. ✅ Use join for coordination - Use co_await coroutine.join() to wait for specific coroutines to complete and propagate exceptions
  7. ✅ Use wg_guard_t for exception safety - Prefer wg_guard_t over manual wg.done() calls when exceptions are possible
  8. ✅ Compose with join - Use co_await go(...).join() pattern to compose coroutines sequentially
  9. ✅ Exception handling - Exceptions thrown in coroutines are captured and can be propagated via join()

Understanding these patterns is crucial for writing robust coroutine-based code with coco. The key insights are:

  • Coroutine suspension preserves RAII semantics - frame-local objects maintain their state and identity across suspension points
  • Lifetime bugs live at the frame boundary - reference parameters, by-reference captures, and escaped pointers dangle when the object on the other side of the boundary is destroyed first

Comparison with Other Approaches

| Feature | coco (C++20 Coroutines) | Callbacks | Threads | Stackful Coroutines |
|---|---|---|---|---|
| Readability | ✅ Excellent | ❌ Poor (callback hell) | ✅ Good | ✅ Good |
| Performance | ✅ Excellent | ✅ Excellent | ⚠️ Moderate (context-switch overhead) | ⚠️ Good (stack overhead) |
| Memory usage | ✅ Low (stackless) | ✅ Low | ❌ High (stack per thread) | ⚠️ Moderate (stack per coroutine) |
| Debugging | ✅ Good (native support) | ❌ Difficult | ✅ Good | ⚠️ Moderate |
| Synchronization | ✅ Not needed (single-threaded) | ✅ Not needed | ❌ Required (locks, atomics) | Depends |
| Exception handling | ✅ Natural | ⚠️ Complex | ✅ Natural | ✅ Natural |
| Composability | ✅ Excellent (co_await) | ❌ Poor | ⚠️ Moderate | ✅ Good |

Use Cases

coco is ideal for:

  • High-performance I/O servers - Web servers, API gateways, proxy servers using io_uring or epoll
  • Async data processing pipelines - ETL workflows, stream processing with clean sequential logic
  • Protocol implementations - Network protocols, message parsers with state machines
  • Game servers - Single-threaded game loops with async I/O operations
  • Embedded systems - Resource-constrained environments where thread overhead is prohibitive
  • Any scenario where you want Go-like concurrency in C++ - Without the runtime overhead

Conclusion

Whether you’re building high-performance servers with io_uring, complex async workflows, or concurrent data processing pipelines, coco makes concurrent programming in C++ more expressive and enjoyable. The single-threaded, scheduler-based approach eliminates the need for locks while providing cooperative multitasking capabilities. With only a simple header file, it’s easy to integrate and understand.

Key Takeaways:

  • Write async C++ code that reads like synchronous code
  • Leverage native C++20 coroutines for zero-cost abstraction
  • Use Go-like channels and waitgroups for elegant concurrency patterns
  • No external dependencies, just include the header and start coding

If you like it, please star the GitHub repo!

GitHub: https://github.com/kingluo/coco

For complete examples, documentation, and the latest updates, visit the repository. Contributions and feedback are welcome!