TL;DR
coco is a simple stackless, single-threaded, and header-only C++20 coroutine library that leverages native C++20 coroutines for async/await programming with Go-like channels and waitgroups.
https://github.com/kingluo/coco
Background
> Don’t communicate by sharing memory; share memory by communicating.
I have always been impressed by Golang’s CSP-style programming: it is very expressive, and channels tie different pieces of business logic together cleanly.
In my C++ programming career, I have always struggled with callback hell. A chain of more than five callbacks is enough to confuse me even when I am the author, let alone parallel asynchronous calls. On modern Linux we use epoll or io_uring for asynchronous programming, but callback-based handling fragments the business logic.
Initially, I implemented coco using macros to make coroutines look like Go’s, driven by a switch-based state machine behind the scenes. However, with the advent of C++20 coroutines, I realized we could leverage native language support for even better performance and cleaner syntax.
Design
- Uses C++20 coroutines for native async/await support.
- Header-only library with no external dependencies.
- Stackless coroutines with async/await implementation.
- Channel and waitgroup primitives like Go.
- Single-threaded, no locks required.
- Minimal performance overhead.
- Simple FIFO scheduler for managing multiple coroutines.
Synopsis
Defining, Resuming and Running Coroutines:
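(Below is a minimal sketch of the basic usage. `go()`, `co_yield resched`, and `scheduler_t::instance().run()` are the names used throughout this post; the task type name `coco_t` and the exact signatures are illustrative, so check the repository for the real API.)

```cpp
#include "coco.hpp"   // header-only; the header name here is assumed
#include <cstdio>

// A coroutine body: `co_yield resched` puts it at the back of the FIFO
// queue so other coroutines get a turn.
coco_t worker(int id) {
    for (int step = 0; step < 3; step++) {
        std::printf("worker %d, step %d\n", id, step);
        co_yield resched;   // cooperative yield
    }
}

int main() {
    // go() creates and schedules a coroutine (see "Coroutine Composition" below).
    go(worker(1));
    go(worker(2));
    // Drive all scheduled coroutines to completion in FIFO order.
    scheduler_t::instance().run();
}
```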
Joining Coroutines:
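(Again a sketch with the same assumed `coco_t` task type: `go()` schedules the child and `join()` suspends the caller until it completes.)

```cpp
coco_t child() {
    std::printf("child: step 1\n");
    co_yield resched;
    std::printf("child: step 2\n");
}

coco_t parent() {
    // join() suspends parent() until child() completes and rethrows any
    // exception the child left behind.
    co_await go(child()).join();
    std::printf("parent: child finished\n");
}

int main() {
    go(parent());
    scheduler_t::instance().run();
}
```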
Channels:
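(A sketch of Go-style channel usage. The channel read/write operations are described later in this post; the channel type name `chan_t` and the exact return types of `read()`/`write()` are assumptions.)

```cpp
coco_t producer(chan_t<int>& ch) {
    for (int i = 0; i < 3; i++) {
        // write() suspends if no reader is ready; the waiting reader is
        // woken up through the scheduler, not resumed inline.
        co_await ch.write(i);
    }
}

coco_t consumer(chan_t<int>& ch) {
    for (int i = 0; i < 3; i++) {
        int v = co_await ch.read();   // suspends until a value arrives
        std::printf("got %d\n", v);
    }
}

int main() {
    chan_t<int> ch;
    go(producer(ch));
    go(consumer(ch));
    scheduler_t::instance().run();
}
```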
Waitgroups:
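(And a waitgroup sketch. `wg_guard_t` is mentioned in the best practices below; the waitgroup type name `wg_t` and its `add()`/`wait()` signatures are Go-style assumptions.)

```cpp
coco_t task(wg_t& wg, int id) {
    wg_guard_t guard(wg);   // calls wg.done() on scope exit, even if we throw
    std::printf("task %d running\n", id);
    co_yield resched;
    std::printf("task %d done\n", id);
}

coco_t run_all() {
    wg_t wg;
    for (int i = 0; i < 3; i++) {
        wg.add(1);
        go(task(wg, i));
    }
    co_await wg.wait();   // suspends until every task has called done()
    std::printf("all tasks finished\n");
}

int main() {
    go(run_all());
    scheduler_t::instance().run();
}
```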
Example: A Simple Webserver Based on io_uring
In “Lord of the io_uring”, there is an example that implements a simple webserver. You can see that the callback style makes the code difficult to read and maintain.
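(Since the original listing is long, here is a condensed paraphrase of its shape rather than the article’s exact code; `struct request`, the `EVENT_TYPE_*` constants, and the `add_*_request`/`handle_http_request` helpers stand in for the article’s plumbing.)

```cpp
#include <liburing.h>
#include <unistd.h>

// Condensed paraphrase of the callback/state-machine style: one loop over
// completions, dispatching on a per-request event type.
void server_loop(struct io_uring* ring, int listen_fd) {
    add_accept_request(ring, listen_fd);                  // stand-in helper
    for (;;) {
        struct io_uring_cqe* cqe;
        io_uring_wait_cqe(ring, &cqe);
        auto* req = reinterpret_cast<request*>(cqe->user_data);
        switch (req->event_type) {
        case EVENT_TYPE_ACCEPT:
            add_accept_request(ring, listen_fd);          // re-arm accept
            add_read_request(ring, cqe->res);             // read from the new client
            break;
        case EVENT_TYPE_READ:
            handle_http_request(ring, req);               // parse, queue the write
            break;
        case EVENT_TYPE_WRITE:
            close(req->client_fd);                        // response sent, clean up
            break;
        }
        io_uring_cqe_seen(ring, cqe);
    }
}
```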
Connection acceptance and request handling are mixed together in one big switch statement. Compared with Go, this is hard to maintain, and with genuinely complex business logic it is hard to imagine how unwieldy the code would become.
With coco using C++20 coroutines, it can be expressed as:
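(A sketch: the awaitable I/O wrappers `uring_accept`, `uring_recv`, `uring_send` and the `build_http_response()` helper below are illustrative names, not coco’s actual API; the point is the shape of the control flow.)

```cpp
#include <string>
#include <unistd.h>

// One coroutine per connection: the handler reads top to bottom like
// blocking code, but each awaitable submits an SQE and suspends until
// the matching completion arrives.
coco_t handle_client(int client_fd) {
    char buf[4096];                                        // lives in the coroutine frame
    int n = co_await uring_recv(client_fd, buf, sizeof(buf));
    if (n > 0) {
        std::string resp = build_http_response(buf, n);
        co_await uring_send(client_fd, resp.data(), resp.size());
    }
    close(client_fd);
}

coco_t accept_loop(int listen_fd) {
    for (;;) {
        int client_fd = co_await uring_accept(listen_fd);
        go(handle_client(client_fd));                      // spawn, don't block the accept loop
    }
}
```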
Much cleaner and easier to understand, right? The business logic flows naturally from top to bottom.
The io_uring event loop does only one thing: schedules the coroutine for resumption.
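(A sketch of that loop, assuming the convention that each SQE’s `user_data` carries the handle of the coroutine waiting on it; `save_result_for()` is a stand-in for however the completion result gets handed back to the awaiter.)

```cpp
#include <coroutine>
#include <liburing.h>

void event_loop(struct io_uring* ring) {
    for (;;) {
        scheduler_t::instance().run();        // drain coroutines that are ready to run
        io_uring_submit_and_wait(ring, 1);    // submit pending SQEs, wait for a completion
        struct io_uring_cqe* cqe;
        unsigned head, seen = 0;
        io_uring_for_each_cqe(ring, head, cqe) {
            seen++;
            // user_data holds the awaiting coroutine's handle (assumed convention)
            auto h = std::coroutine_handle<>::from_address(
                reinterpret_cast<void*>(cqe->user_data));
            save_result_for(h, cqe->res);     // stand-in: stash cqe->res for the awaiter
            h.resume();                       // coco may instead push it onto the
                                              // scheduler queue and let run() resume it
        }
        io_uring_cq_advance(ring, seen);
    }
}
```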
It lets you write C++ code like Go, but without any performance sacrifices.
Key Features and Benefits
Since coco uses native C++20 coroutines, it provides several advantages over traditional callback-based or macro-based approaches:
- Native Language Support: Uses C++20’s built-in coroutine support for better performance and debugging
- Type Safety: Better compile-time error checking compared to macro-based approaches
- Clean Syntax: Natural async/await syntax without complex macros
- Automatic State Management: No manual state tracking - the compiler handles coroutine state automatically
- Exception Safety: Exceptions can be used normally within coroutines and are properly propagated through join operations
- Memory Management: Full RAII compliance - destructors are called properly and resource cleanup works as expected
- Debugging Support: Better debugger integration for stepping through coroutine code
- Simple Scheduler: FIFO queue-based scheduler for managing multiple coroutines with cooperative multitasking
- Coroutine Join: Ability to wait for specific coroutines to complete with exception propagation
- Flexible Yielding: Support for both automatic rescheduling (`co_yield resched`) and manual control (`co_yield no_sched`). Automatic rescheduling distributes work fairly, while manual control is for fine-grained resumption: the coroutine is resumed manually only when it is ready, e.g., on io_uring completion.
Performance Characteristics
- Zero-cost abstraction: C++20 coroutines compile to efficient state machines with minimal overhead
- No heap allocation per operation: Only the coroutine frame is heap-allocated once at creation
- Single-threaded: No lock contention or synchronization overhead
- Cooperative scheduling: Predictable execution order with FIFO scheduling
- Stackless: Lower memory footprint compared to stackful coroutines (like Boost.Context)
For high-performance I/O workloads (e.g., with io_uring), coco provides near-native performance while maintaining clean, readable code.
Important Caveats of C++20 Coroutines and the coco Implementation
While C++20 coroutines provide powerful async/await capabilities, there are several important limitations and considerations to keep in mind when using coco:
Scheduler-Based Coordination
The channel implementation uses a scheduler-based approach for coroutine coordination, which prevents stack exhaustion:
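The snippet below is an illustrative, self-contained sketch of the idea (not coco’s actual code): instead of resuming a blocked reader inline from inside `write()`, which would nest `resume()` frames on the call stack, the channel pushes the reader’s handle onto the scheduler’s FIFO queue and lets the scheduler resume it later from a flat stack.

```cpp
#include <coroutine>
#include <deque>
#include <queue>

// Illustrative sketch, not coco's actual code.
struct sketch_scheduler {
    std::queue<std::coroutine_handle<>> ready;
    void schedule(std::coroutine_handle<> h) { ready.push(h); }
    void run() {
        while (!ready.empty()) {
            auto h = ready.front();
            ready.pop();
            h.resume();   // always resumed from here, so the stack stays flat
        }
    }
};

struct sketch_chan {
    sketch_scheduler& sched;
    std::deque<int> buffer;
    std::deque<std::coroutine_handle<>> waiting_readers;

    void write(int v) {
        buffer.push_back(v);
        if (!waiting_readers.empty()) {
            auto reader = waiting_readers.front();
            waiting_readers.pop_front();
            sched.schedule(reader);   // queue the wakeup instead of reader.resume()
        }
    }
};
```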
Benefit: This prevents recursive execution and stack exhaustion by queuing coroutine handles in the scheduler’s FIFO queue.
Impact: The scheduler processes coroutines in FIFO order, providing cooperative multitasking. All channel operations (read/write) automatically use the scheduler to wake up waiting coroutines.
Coroutine Composition with Join
Important Pattern: co_await and co_yield can only appear directly in a coroutine’s own body - an ordinary function called from a coroutine cannot suspend on its behalf, and you cannot simply call another coroutine to run it inline. To compose coroutines, use the join() method.
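For example (same assumed `coco_t` type as above), an exception thrown in the child surfaces at the join point in the parent:

```cpp
#include <cstdio>
#include <stdexcept>

coco_t may_fail() {
    throw std::runtime_error("boom");   // captured by the coroutine's promise
    co_return;                          // co_return makes this function a coroutine
}

coco_t caller() {
    // Calling may_fail() on its own only creates the coroutine; go() is what
    // schedules it, and join() is what awaits it and rethrows its exception.
    try {
        co_await go(may_fail()).join();
    } catch (const std::exception& e) {
        std::printf("caught: %s\n", e.what());
    }
}
```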
Solution: Use co_await coroutine.join() to compose coroutines sequentially. The go() helper creates and schedules the coroutine, then join() waits for completion.
Variable Lifetime and RAII
Critical Consideration: C++20 coroutines are stackless - the compiler transforms your coroutine into a state machine with a heap-allocated coroutine frame. RAII works perfectly - objects maintain their identity and resources across suspension points.
However, pointers and references to objects that live outside the coroutine frame - by-reference parameters, by-reference lambda captures, or objects on the caller’s stack - can dangle across suspension points, because the caller may destroy those objects before the coroutine resumes.
Key Rules:
- RAII objects are preserved perfectly across suspension points
- Local variables maintain their identity - no copying or moving occurs during suspension
- Destructors are called only when the coroutine completes (co_return)
- Heap allocations remain valid (smart pointers, containers’ internal data)
- ⚠️ References and pointers to caller-owned objects may DANGLE after suspension - the referenced object can be destroyed before the coroutine resumes (see the sketch below)
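For example (a sketch, using the same assumed `coco_t` type as above), a by-reference parameter is the classic way to end up with a dangling reference after a suspension point:

```cpp
#include <cstdio>
#include <string>

coco_t greet(const std::string& name) {   // ⚠️ reference parameter
    co_yield resched;                     // suspend; the caller keeps running
    // If the caller's string has already been destroyed, 'name' now dangles
    // and this line is undefined behavior.
    std::printf("hello, %s\n", name.c_str());
}

void start() {
    {
        std::string who = "world";
        go(greet(who));   // only the reference is stored in the coroutine frame
    }                     // 'who' is destroyed here, before greet() ever resumes
    scheduler_t::instance().run();
    // Fix: take the parameter by value, so it is copied into the coroutine
    // frame and lives exactly as long as the coroutine does.
}
```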
Why This Happens: The coroutine frame is heap-allocated and is not moved while the coroutine is suspended, so everything stored inside it - local variables and by-value parameters - keeps its address and stays valid. A reference or pointer held in the frame, however, only points at an object; if that object lives outside the frame (a by-reference parameter, a temporary, or something on the caller’s stack), the caller is free to destroy it while the coroutine is suspended, and the coroutine resumes holding a dangling reference.
How Coroutine Suspension Works
When a coroutine suspends (via co_await or co_yield), the coroutine frame is preserved in heap memory:
```
Coroutine Frame (heap-allocated):
├── Promise object
├── Parameters
├── Local variables (including RAII objects)
├── Temporary objects
└── Suspension state
```
Key Points:
- No copy/move constructors called during suspension/resumption
- No destructors called until the coroutine completes (co_return)
- Object identity preserved - the same objects exist before and after suspension
- RAII works perfectly - resources are held across suspension points
This is why C++20 coroutines are so efficient - objects maintain their state and identity across suspension points without any copying or moving overhead.
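A quick illustration with the same assumed `coco_t` type: an RAII object declared before a suspension point stays alive and untouched until the coroutine finishes.

```cpp
#include <cstdio>

struct tracer_t {
    tracer_t()  { std::puts("tracer constructed"); }
    ~tracer_t() { std::puts("tracer destroyed"); }
};

coco_t raii_demo() {
    tracer_t t;          // stored in the coroutine frame
    co_yield resched;    // suspension: no copy, no move, no destructor call
    co_yield resched;    // 't' is still the very same object here
    std::puts("finishing");
    // The frame is torn down only after the coroutine completes,
    // and that is when ~tracer_t() finally runs.
}
```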
Best Practices
- ✅ RAII works perfectly - Use RAII objects freely across suspension points
- ✅ Use value semantics - Local variables are safely preserved in the coroutine frame
- ⚠️ Avoid references to caller-owned data across suspension points - Reference parameters and pointers/references to objects outside the coroutine frame can dangle if the caller destroys them before the coroutine resumes
- ✅ Use the scheduler - Always call `scheduler_t::instance().run()` to execute scheduled coroutines
- ✅ Yield cooperatively - Use `co_yield resched` to allow other coroutines to run, especially in loops or long-running operations
- ✅ Use join for coordination - Use `co_await coroutine.join()` to wait for specific coroutines to complete and propagate exceptions
- ✅ Use wg_guard_t for exception safety - Prefer `wg_guard_t` over manual `wg.done()` calls when exceptions are possible
- ✅ Compose with join - Use the `co_await go(...).join()` pattern to compose coroutines sequentially
- ✅ Exception handling - Exceptions thrown in coroutines are captured and can be propagated via `join()`
Understanding these patterns is crucial for writing robust coroutine-based code with coco. The key insights are:
- Coroutine suspension preserves RAII semantics perfectly - objects maintain their identity across suspension points
- Pointers and references to objects outside the coroutine frame (by-reference parameters, caller-owned objects) can dangle across suspension points - the caller may destroy them before the coroutine resumes
Comparison with Other Approaches
| Feature | coco (C++20 Coroutines) | Callbacks | Threads | Stackful Coroutines |
|---|---|---|---|---|
| Readability | ✅ Excellent | ❌ Poor (callback hell) | ✅ Good | ✅ Good |
| Performance | ✅ Excellent | ✅ Excellent | ⚠️ Moderate (context switch overhead) | ⚠️ Good (stack overhead) |
| Memory Usage | ✅ Low (stackless) | ✅ Low | ❌ High (stack per thread) | ⚠️ Moderate (stack per coroutine) |
| Debugging | ✅ Good (native support) | ❌ Difficult | ✅ Good | ⚠️ Moderate |
| Synchronization | ✅ Not needed (single-threaded) | ✅ Not needed | ❌ Required (locks, atomics) | Depends |
| Exception Handling | ✅ Natural | ⚠️ Complex | ✅ Natural | ✅ Natural |
| Composability | ✅ Excellent (co_await) | ❌ Poor | ⚠️ Moderate | ✅ Good |
Use Cases
coco is ideal for:
- High-performance I/O servers - Web servers, API gateways, proxy servers using io_uring or epoll
- Async data processing pipelines - ETL workflows, stream processing with clean sequential logic
- Protocol implementations - Network protocols, message parsers with state machines
- Game servers - Single-threaded game loops with async I/O operations
- Embedded systems - Resource-constrained environments where thread overhead is prohibitive
- Any scenario where you want Go-like concurrency in C++ - Without the runtime overhead
Conclusion
Whether you’re building high-performance servers with io_uring, complex async workflows, or concurrent data processing pipelines, coco makes concurrent programming in C++ more expressive and enjoyable. The single-threaded, scheduler-based approach eliminates the need for locks while providing cooperative multitasking capabilities. With only a simple header file, it’s easy to integrate and understand.
Key Takeaways:
- Write async C++ code that reads like synchronous code
- Leverage native C++20 coroutines for zero-cost abstraction
- Use Go-like channels and waitgroups for elegant concurrency patterns
- No external dependencies, just include the header and start coding
If you like it, please star the GitHub repo!
GitHub: https://github.com/kingluo/coco
For complete examples, documentation, and the latest updates, visit the repository. Contributions and feedback are welcome!