Asyncify in Practice

Alon Zakai

Jan 2022

Wasm Stack Switching Subgroup

Goals

Overview of Asyncify
How it is used
Lessons from those experiences

WebAssembly has a simple execution model

1. Synchronous code executes synchronously.


;; this completes fully before the next line
(call $something-async)
(call $after)


// source code compiled to wasm
fread(buffer, 1, num, file);
// the data is ready to be used right here, synchronously

But on the Web, I/O is asynchronous:


const result = fetch("http://example.com/data.dat");
// result is a Promise; the data is not ready yet!

😔

As a workaround, you can preload or embed files.

2. The call stack and locals are not visible in "userspace".


(func $foo
  (local $x i32)
  (local $y i32)
  (local $z i32)
  ..
  ;; no way to observe all the local values here,
  ;; or calls up the stack
  (call $conservative-gc)
)

As a workaround, on the Web you can run a GC when nothing is on the stack. Outside of a Worker that must happen repeatedly anyway, since code has to return to the event loop.

But that does not help various use cases: 😔

Workers with no pauses
Many allocations in a single event
Off the Web

The solution

WebAssembly support for stack switching and stack walking! 🚀

But we need something while we wait...

Asyncify

Asyncify is a transformation that can be run on a wasm module. It adds support for unwinding and rewinding the call stack.

Asyncify is implemented as a generic Binaryen pass (not specific to any particular toolchain), and running it is as easy as:


wasm-opt input.wasm --asyncify -O -o output.wasm

The Asyncify API


(import "asyncify" "start_unwind"
    (func $asyncify_start_unwind (param i32)))

(import "asyncify" "stop_unwind"
    (func $asyncify_stop_unwind))

(import "asyncify" "start_rewind"
    (func $asyncify_start_rewind (param i32)))

(import "asyncify" "stop_rewind"
    (func $asyncify_stop_rewind))

The parameter refers to a buffer in linear memory, used to store the serialized call stack and locals.

Simply call "start unwind" from any point in the program, then "stop unwind" when you reach the point you want to stop at.

Later, call "start rewind". Then call the first function on the previous call stack, which begins the rewind, ending up where it was "paused" before. When you get there, call "stop rewind".

That's it!

Example:


(func $sleep
  (if
    (i32.eqz (global.get $sleeping))
    (block
      ;; Start to sleep.
      (global.set $sleeping (i32.const 1))
      ;; The buffer header at address 16 holds two i32 fields:
      (i32.store (i32.const 16) (i32.const 24))    ;; where the saved stack data starts
      (i32.store (i32.const 20) (i32.const 1024))  ;; where it must end
      (call $asyncify_start_unwind (i32.const 16)))
    (block
      ;; Resume after sleep, when we are called the 2nd time
      (call $asyncify_stop_rewind)
      (global.set $sleeping (i32.const 0)))))


(func $main
  (call $print (i32.const 1))
  (call $sleep)
  (call $print (i32.const 3)))

(func $runtime
  ;; Call main the first time, let the stack unwind.
  (call $main)
  (call $asyncify_stop_unwind)
  ;; We could do anything we want around here while
  ;; the code is paused!
  (call $print (i32.const 2))
  ;; Set the rewind in motion.
  (call $asyncify_start_rewind (i32.const 16))
  (call $main))

The key idea is that we call main twice, the second time to start the rewind.
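
The same control flow can be modeled in plain JavaScript. This is only a toy sketch: the state variable, the hand-written checks inside main(), and the four stand-in functions are assumptions made for illustration, not what the Asyncify pass actually emits.

```javascript
// Toy model of the example above; not real Asyncify output.
const NORMAL = 0, UNWINDING = 1, REWINDING = 2;
let state = NORMAL;
let sleeping = false;
const output = [];

// Stand-ins for the four Asyncify API functions.
function startUnwind() { state = UNWINDING; }
function stopUnwind() { state = NORMAL; }
function startRewind() { state = REWINDING; }
function stopRewind() { state = NORMAL; }

function sleep() {
  if (!sleeping) {
    sleeping = true;
    startUnwind();   // first call: begin pausing
  } else {
    stopRewind();    // second call: we are back at the pause point
    sleeping = false;
  }
}

// main() as it might look after instrumentation (hand-written here).
function main() {
  if (state !== REWINDING) {
    output.push(1);  // skipped on the second call
  }
  sleep();
  if (state === UNWINDING) return; // leave the call stack
  output.push(3);
}

function runtime() {
  main();            // runs until the sleep, then unwinds
  stopUnwind();
  output.push(2);    // the program is paused here
  startRewind();
  main();            // rewinds to the sleep, then continues
}

runtime();
console.log(output.join(" ")); // prints "1 2 3"
```

The second call to main() skips the work that already ran and goes straight back to sleep(), which is why 1 is printed only once.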

Using Asyncify in Practice

By default, expect an overhead of roughly 50% in both code size and speed. (However, the worst case can be far worse.)

This ~50% is achieved by avoiding instrumentation where the optimizer can prove it is not needed.

Manually listing the relevant functions can reduce the overhead much further in many cases.

More on Speed

Splitting functions into pieces (CPS-style) is one way to implement pause/resume. Asyncify takes a different approach because of wasm's nature:

Structured control flow
Code size is critical
Indirect calls are slow

Asyncify preserves wasm structure and adds instrumentation on top.


// conceptual code
function foo() {
  if (rewinding) { ..restore locals.. }
  unwind: {
    if (!(rewinding && notYet())) {
      call();
      if (unwinding) break unwind;
      ...
    }
    ...
    return;
  }
  ..save locals..
}

Those added ifs are often well-predicted branches!
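
To make the conceptual code concrete, here is a runnable sketch with two frames on the stack, each saving and restoring one local. Everything in it is hypothetical: the rt object, the array standing in for the linear-memory buffer, and the functions pause(), inner(), and outer() are illustrative names, and real Asyncify output is organized differently.

```javascript
// Toy runtime: `buffer` stands in for the linear-memory buffer.
const rt = { mode: "normal", buffer: [] };
const log = [];

function pause() {
  if (rt.mode === "rewinding") {
    rt.mode = "normal";              // reached the pause point again
  } else {
    rt.mode = "unwinding";           // start leaving the call stack
  }
}

function inner(n) {
  let doubled;
  if (rt.mode === "rewinding") {
    ({ doubled } = rt.buffer.pop()); // restore locals
  } else {
    doubled = n * 2;
    log.push("inner start");         // must run only once
  }
  pause();
  if (rt.mode === "unwinding") {
    rt.buffer.push({ doubled });     // save locals
    return;
  }
  return doubled + 1;
}

function outer() {
  let base;
  if (rt.mode === "rewinding") {
    ({ base } = rt.buffer.pop());    // restore locals
  } else {
    base = 20;
    log.push("outer start");         // must run only once
  }
  const result = inner(base);
  if (rt.mode === "unwinding") {
    rt.buffer.push({ base });        // save locals
    return;
  }
  return result + base;
}

outer();                             // first call: runs to the pause, then unwinds
rt.mode = "normal";                  // "stop unwind"
const savedFrames = rt.buffer.length;
rt.mode = "rewinding";               // "start rewind"
const value = outer();               // second call: rewinds, then finishes
console.log(savedFrames, value, log.join(", "));
// prints: 2 61 outer start, inner start
```

Frames are saved innermost first and restored outermost first, so a simple stack discipline on the buffer is enough in this model.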

Asyncify in production today

1. Emscripten

Integration at the JavaScript level. Could use Promise API.

Build with -s ASYNCIFY

Enabled APIs: emscripten_sleep(), emscripten_scan_registers(), etc.

Used in e.g. DOSBox, Doom 3, test suites, etc.


  while (1) {
    // Main game loop
    input();
    physics();
    rendering();
    audio();

    // Necessary on the web!
    emscripten_sleep(0);
  }

2. TinyGo

Integration at the Wasm level.

Used to implement goroutines.

3. Ruby's Wasm/WASI port

Integration at the Wasm level.

Used for setjmp/longjmp and conservative stack scanning (PR).

Asyncify: original goals vs practice

As the name suggests, the original use case was just the sync/async problem.

But in practice, setjmp/longjmp as well as conservative GC have become important too.

More detail: setjmp / longjmp

A tricky operation, especially in the setjmp()-ing function itself:


function foo() {
  while (1) {
    a();
    if (setjmp()) {
      b();
    } else {
      c();
    }
  }
  if (d()) longjmp(); // into the middle of a loop!
}

Aside from that, we need to unwind back to the setjmp()-ing function. Fairly easy to do if you carry around a boolean "am I unwinding" at calls, conceptually like this everywhere:


function foo() {
  ...
  result, unwinding = bar();
  if (unwinding) return;
  ...
}
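
A runnable version of that pattern might look like the sketch below. The function names and the [result, unwinding] return convention are assumptions chosen for illustration; they only show how the flag travels up to the function that called setjmp.

```javascript
// Hypothetical helpers; each call returns [result, unwinding].
function leaf(shouldJump) {
  if (shouldJump) return [undefined, true];   // acts like longjmp()
  return ["leaf done", false];
}

function middle(shouldJump, trace) {
  trace.push("middle: before");
  const [result, unwinding] = leaf(shouldJump);
  if (unwinding) return [undefined, true];    // propagate upwards
  trace.push("middle: after");
  return [result, false];
}

function setjmpingFunction(shouldJump) {
  const trace = [];
  const [result, unwinding] = middle(shouldJump, trace);
  if (unwinding) {
    trace.push("top: landed after longjmp");  // unwinding stops here
  } else {
    trace.push("top: returned normally with " + result);
  }
  return trace;
}

console.log(setjmpingFunction(true).join(" | "));
// prints: middle: before | top: landed after longjmp
```

Every call site pays for one extra check, which is the code size cost the boolean approach implies.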

Wasm exception handling and/or stack switching will be the proper solution. But what can we do for now?

Historically Emscripten relied on LLVM to transform the setjmping function, and on JavaScript to provide unwinding (less code size without an extra boolean everywhere).

But we have lacked a solution for pure Wasm environments like WASI.

I gave a wasm meetup talk about Asyncify back in 2019, with a demo there of using Asyncify to implement setjmp/longjmp in pure wasm:

pause at the setjmp
resume it immediately
resume it again when longjmp is called

Someone from the audience mentioned that it might be useful for WASI. Great to see that approach now used in production in Ruby!

More detail: conservative GC

Pretty simple with Asyncify:

pause
scan the pause/resume buffer, which now contains all the locals up the stack
resume
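
The scan step can be sketched as follows. This is a simplified model under stated assumptions: the heap is a Map from object address to a record with a fields array, and the pause buffer is a plain array of saved words. Both layouts and the name conservativeMark are invented for illustration.

```javascript
// Conservative marking: any saved word that equals an object address
// is treated as a pointer, even if it is really just an integer.
function conservativeMark(heap, pauseBuffer) {
  const marked = new Set();
  const worklist = [];
  // Roots: the locals that were serialized when the stack was paused.
  for (const word of pauseBuffer) {
    if (heap.has(word)) worklist.push(word);
  }
  // Follow fields that look like pointers to other objects.
  while (worklist.length > 0) {
    const addr = worklist.pop();
    if (marked.has(addr)) continue;
    marked.add(addr);
    for (const field of heap.get(addr).fields) {
      if (heap.has(field)) worklist.push(field);
    }
  }
  return marked;
}

// Hypothetical heap with three objects.
const heap = new Map([
  [1000, { fields: [1016] }],
  [1016, { fields: [] }],
  [1032, { fields: [1000] }],  // nothing points here
]);
const pauseBuffer = [7, 1000, 42]; // saved locals from the paused stack
const live = conservativeMark(heap, pauseBuffer);
console.log([...live].sort((a, b) => a - b).join(","));
// prints "1000,1016"
```

The object at 1032 is never reached from the buffer, so a collector built on this scan could reclaim it.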

Pause/resume adds overhead, but that is probably OK given how GCs work.

Relevance to stack switching

That people are willing to use Asyncify with the 50% overhead shows the need!

Relevance to stack switching?

Multiple resumes (setjmp/longjmp) may not be relevant since LLVM + Wasm EH already provide a solution for that. (But maybe other use cases?)

Stack scanning (conservative GC) is a separate problem, but perhaps worth thinking about how that interacts? E.g. scanning a paused stack may be enough - is that simpler?

Relevance: Minimalism

Asyncify defines only 4 simple, low-level functions:

start_unwind(buf)    stop_unwind()
start_rewind(buf)    stop_rewind()

The toolchain/VM engineer builds on top of those (e.g. handling state, value passing, types, etc.).

This simple API works well in three very different toolchains: Emscripten, TinyGo, and Ruby.

That exact API can't work for the spec, but perhaps we can find something similarly minimalistic?

Thank you for listening!

Questions / thoughts?