Asynchronous programming, or async for short, is a concurrent programming model supported by an increasing number of programming languages. It lets you run a large number of concurrent tasks on a small number of OS threads, while preserving much of the look and feel of ordinary synchronous programming through the async/await syntax.
How does Rust implement asynchronous programming?
Useful references:
https://rust-lang.github.io/async-book/01_getting_started/01_chapter.html
https://cfsamson.github.io/books-futures-explained/introduction.html
OS threading
The most common and easiest approach is OS threading.
Its main disadvantage is resource consumption, especially context-switching overhead.
Green thread (stackful coroutine)
Just like Go, Rust provided green threads before it hit 1.0, but removed them before the 1.0 release.
Callback/Promise
Just like Netty and JavaScript, callback-based dispatching is a major way to implement async.
But callback hell, fragmented program logic, and awkward data sharing are a nightmare.
Async/Await (stackless coroutine)
Just like Python and JavaScript, async/await is a popular approach to async.
Advantages:
- The program logic is linear, in line with how humans think
- No need to manage stack growth and shrinkage
- No need to save CPU state
- Reuses the generator mechanism
Future trait
The Future trait is at the center of asynchronous programming in Rust. A Future is an asynchronous computation that can produce a value.
(original code listing not preserved)
So all async runtime implementations, e.g. tokio, take Future as the unit of scheduling:
(original code listing not preserved)
Future itself provides Promise-style async programming, also known as combinators. In fact, before async/await was born, the futures 0.1 crate used combinators. Example:
(original code listing not preserved)
Moreover, you can wrap futures into a higher-level future to implement new logic. For example, you can write a top-level Select future to run two sub-futures simultaneously:
(original code listing not preserved)
What does async/await look like in assembly code?
An async function is just a friendlier presentation of a Future. Combined with .await, you can compose different Futures with ease.
Each async function is compiled into a special generator, which implements the Future trait.
State machine
In the compiler, generators are currently compiled as state machines. Each yield expression will correspond to a different state that stores all live variables over that suspension point. Resumption of a generator will dispatch on the current state and then execute internally until a yield is reached, at which point all state is saved off in the generator and a value is returned.
Let’s take a simple example:
(original code listing not preserved)
We have a top-level async function foobar(), which calls the async functions foo() and bar().
Let’s use the playground to compile the code to assembly in debug mode.
Check the assembly code of the foobar() generator:
(assembly listing not preserved)
State data structure
The first argument %rdi is the generator itself, which contains the state data, saved on the stack at 192(%rsp).
This type has a number of states (represented here as an enum) corresponding to each of the conceptual states of the generator. At the beginning we’re closing over our outer variable foo and then that variable is also live over the yield point, so it’s stored in both states.
What’s the structure of the state data? Well, let’s check it in the LLVM IR:
(LLVM IR listing not preserved)
Suspend0 is the enum variant that lives across the first yield point foo().await, and Suspend1 is the enum variant that lives across the second yield point bar().await.
In my program, all local variables live until the end, so Suspend1 is a superset of Suspend0, and it’s enough to check Suspend1.
Note that each embedded generator is stored in the parent generator, and because each one is live only up to its own yield point and they have the same size, the embedded generators reuse the same field of the foobar generator.
In the assembly code, you can see the mapping from fields to variables:
+0  : %"alloc::string::String" -> str
+24 : i32 -> val
+28 : [4 x i8] -> state discriminator
+32 : async fn body (1 byte) -> embedded generator for foo() or bar()
+33 : [7 x i8] -> padding
+40 : %"alloc::string::String" -> str2
Note that val2 is a variable after the last yield point, so it lives on the stack, not in the state.
From the program output, you can confirm the address relationship too:
hello, world! &str: 0x7ffc9e68c9b8, &val: 0x7ffc9e68c9d0 &str2: 0x7ffc9e68c9e0, &val2: 0x7ffc9e68c6ec
State resume/yield
.LJTI38_0 is the memory label that stores the offsets of the state resume entry points:
(assembly listing not preserved)
The assembly code uses %rip-relative addressing to locate the entry address:
(assembly listing not preserved)
Let’s check .LBB38_2:
(assembly listing not preserved)
You can see that to_string() saves its result at 192(%rsp), which is the field address of str. Then it calls foo() to obtain its generator and polls it.
(assembly listing not preserved)
After foobar() completes, it sets the discriminator to 1.
(assembly listing not preserved)
Offset 1 means the next resume will jump to .LBB38_3, which panics the program.
(assembly listing not preserved)
Multi-variant layouts for generators
As mentioned, each state of a generator has its own variant structure layout. For non-overlapping fields of two consecutive states, memory is reused.
In an async function, the compiler must decide, via liveness analysis on the MIR, whether each local variable becomes a stack variable or a state field.
- stack variables are not saved in the state
- some state fields may disappear after their state
Useful references:
https://tmandry.gitlab.io/blog/posts/optimizing-await-2/
https://github.com/rust-lang/rust/pull/59897
Let’s talk about some common cases.
Scoped
Normally, variables live until the end of a function, even if you do not use them after declaration. So, for temporary variables you don’t want saved in the state, you need to scope them.
For example, if you don’t use str after printing it, you need to scope it:
(original code listing not preserved)
Then str is no longer one of the fields in the state.
(LLVM IR listing not preserved)
This is especially meaningful for thread-safe generators, i.e., when the async function will be scheduled by a multi-threaded executor.
Traits like Send and Sync are automatically implemented for a Generator depending on the captured variables of the environment. Unlike closures, generators also depend on variables live across suspension points. This means that although the ambient environment may be Send or Sync, the generator itself may not be due to internal variables live across yield points being not-Send or not-Sync.
For example, MutexGuard is not Send, so it cannot be used with tokio’s multi-threaded scheduler. You need to scope it:
(original code listing not preserved)
Move
When you move a variable out, it is no longer saved in the state.
Interestingly, drop is actually a move, so when you drop a variable manually, it’s excluded from the state:
(original code listing not preserved)
Copy
So far, the Rust compiler keeps copied variables in the state even if they are not actually used later.
You could test it:
(original code listing not preserved)
Yes, drop copies for types that implement the Copy trait, although it’s a no-op.
This effectively does nothing for types which implement Copy, e.g. integers. Such values are copied and then moved into the function, so the value persists after this function call.
From the LLVM IR, we can see that val is still there:
(LLVM IR listing not preserved)
Shadowed variables
As is known, shadowed variables are not dropped until they go out of scope.
(original code listing not preserved)
From the LLVM IR, we can see that the shadowed value "foo" still occupies a field of the state:
(LLVM IR listing not preserved)
What does Pin look like in assembly code?
Pin is a kind of smart pointer that enforces a constraint: the compiler must not move its address or swap its content. This makes self-referential structs feasible, which is a common case in async functions, because we frequently hold references to local variables in the function body.
Pin does not appear in the assembly code at all; even Deref and the field projections are optimized out.