Asynchronous programming, or async for short, is a concurrent programming model supported by an increasing number of programming languages. It lets you run a large number of concurrent tasks on a small number of OS threads, while preserving much of the look and feel of ordinary synchronous programming through the async/await syntax.
How does Rust implement asynchronous programming?
The most common and straightforward model is OS threading, but its biggest disadvantage is resource consumption, especially the context-switching overhead.
Green threads (stackful coroutines)
Like Golang, Rust provided green threads before it hit 1.0.
Callback-based async, on the other hand, suffers from callback hell, fragmented program logic, and painful data sharing.
Async/Await (stackless coroutines)
- The programming logic is linear, in line with how humans think
- No need to manage stack growth and shrinkage
- No need to save CPU state
- Reuses the existing generator mechanism
The Future trait is at the center of asynchronous programming in Rust. A Future is an asynchronous computation that can produce a value. So all async runtime implementations, e.g. Tokio, take the Future as the scheduling unit. Future itself also enables Promise-style async programming, aka combinators.
In fact, before async/await was born, the futures 0.1 crate was built on combinators.
Moreover, you can wrap futures into a higher-level future to implement new logic. For example, you can write a Future that runs two sub-futures.
What does async/await look like in assembly code?
An async function is just a friendlier presentation of a Future; with .await, you can compose different Futures with ease.
Each async function is compiled into a special generator.
In the compiler, generators are currently compiled as state machines. Each yield expression will correspond to a different state that stores all live variables over that suspension point. Resumption of a generator will dispatch on the current state and then execute internally until a yield is reached, at which point all state is saved off in the generator and a value is returned.
Let’s take a simple example:
We have a top-level async function foobar(), which calls the async functions foo() and bar().
Let's use the playground to compile the code into assembly in debug mode.
Check the assembly code of foobar():
State data structure
The first argument, %rdi, is the generator itself, which contains the state data, saved on the stack.
This type has a number of states (represented here as an enum) corresponding to each of the conceptual states of the generator. At the beginning we’re closing over our outer variable foo and then that variable is also live over the yield point, so it’s stored in both states.
What's the structure of the state data? Let's check the LLVM IR:
Suspend0 is the enum variant that lives across the first yield point, and Suspend1 is the one that lives across the second. In my program, all local variables live until the end of the function, so Suspend1 is a superset of Suspend0, and it's enough to examine Suspend1.
Note that the embedded generators are stored in the parent generator, and because each lives only up to its respective yield point and they have the same size, the embedded generators reuse the same field of the parent state.
In the assembly code, you can see the mapping from state fields to variables:
+0  : %"alloc::string::String" -> str
+24 : i32 -> val
+28 : [4 x i8] -> state discriminator
+32 : async fn body (1 byte) -> embedded generator for foo() or bar()
+33 : [7 x i8] -> padding
+40 : %"alloc::string::String" -> str2
val2 is declared after the last yield point, so it lives on the stack, not in the state.
From the program output, you can confirm the address relationship too:
hello, world! &str: 0x7ffc9e68c9b8, &val: 0x7ffc9e68c9d0 &str2: 0x7ffc9e68c9e0, &val2: 0x7ffc9e68c6ec
.LJTI38_0 is the memory label that stores the offsets of the state resume entries. The assembly code uses %rip-relative addressing to locate the entry address:
You can see that to_string() saves its result at 192(%rsp), which is the field address of str.
Then it calls foo() to get its generator and poll it.
After foobar() completes, it sets the discriminator to 1, which means the next resume will jump to .LBB38_3 and panic the program.
Multi-variant layouts for generators
As said, each state of the generator has its own variant structure layout. For non-overlapping fields of two consecutive states, memory is reused.
In an async function, the compiler determines, via liveness analysis on the MIR, whether each local variable becomes a stack variable or a state field:
- stack variables are not saved in the state
- some state fields may disappear after their state
Let’s talk about some common cases.
Normally, variables live until the end of a function, even if you do not use them after declaration. So, for temporary variables you don't want saved in the state, you need to scope them. For example, if you don't use str after printing it, you can scope it:
Now str is no longer one of the fields in the state.
This is especially meaningful for thread-safe generators, i.e. when the async function will be scheduled by a multi-threaded executor.
Traits like Send and Sync are automatically implemented for a Generator depending on the captured variables of the environment. Unlike closures, generators also depend on variables live across suspension points. This means that although the ambient environment may be Send or Sync, the generator itself may not be due to internal variables live across yield points being not-Send or not-Sync.
A MutexGuard is not Send, so a future holding one across an await point cannot be used with Tokio's multi-thread scheduler. You need to scope it:
When you move a variable out, it is no longer saved in the state.
drop is actually a function call that takes its argument by value, so when you drop a variable manually, it's excluded from the state.
So far, the Rust compiler keeps copied variables in the state even if they are not actually used later. You can test it:
For types that implement the Copy trait, drop receives a copy, so it is a no-op:
This effectively does nothing for types which implement Copy, e.g. integers. Such values are copied and then moved into the function, so the value persists after this function call.
From the LLVM IR, we can see that val is still there:
As is known, shadowed variables are not dropped until they go out of scope.
From the LLVM IR, we can see that the shadowed value "foo" still occupies one field of the state:
What does Pin look like in assembly code?
Pin is a kind of smart pointer that enforces a constraint: the pointee must not be moved to another address, nor have its content swapped. This makes self-referential structs feasible, which is a common case in async functions, because we frequently hold references to local variables in the function body.
Pin does not even appear in the assembly code, and the field projection is optimized out.