Emscripten & asm.js:
C++'s role in
the modern web



Alon Zakai / @kripken


All major web browsers are written in C++

For the obvious reasons:
fast, familiar, library support


For the same reasons, people want to use C++ to write web content too, that is, websites

That's what this talk is about

The Web


Largest open platform in existence


Modern, standards-compliant websites are built using HTML, CSS, and JavaScript (JS)

No C++ there :(

What about non-standardized approaches (ActiveX, Flash/Alchemy, PNaCl/PPAPI)?


Plugins and proposals for entirely new technologies on the web have failed to reach significant adoption or standardization, for both technical and non-technical reasons


And plugins are on the way out (no plugins on iPhone/iPad, etc.)


This is a good trend - standardization is why websites work on both your laptop and your phone

...where does that leave C++, then?


Well, JavaScript is already standardized, so how about if we... compile C++ into that?


This has happened with many languages, in fact:

  • Java
  • C#
  • Python
  • New languages like TypeScript
  • etc.

Compiling to JavaScript?


JavaScript is a dynamic scripting language


  var x = 42;
  var y = "a string";
  var z = x + y; // z = "42a string"

  eval("z = z.substr(1, 2)"); // z = "2a"

  [1, "two", { three: 3 }].forEach(function(item) {
    if (typeof item === typeof z) console.log([z, item]);
  }); // emits ["2a", "two"]
  

Kind of a weird compiler target...

But from the developer's point of view, compiling to JavaScript can be very conventional!


First, a reminder of compiling to a native executable:


  // hello.cpp
  #include <iostream>
  int main() {
    std::cout << "hello, world!" << std::endl;
  }  

  $ g++ hello.cpp -o a.out
  $ ./a.out
  hello, world!

Compiling to JavaScript using Emscripten:

  $ em++ hello.cpp -o a.html
  $ firefox a.html # or any other browser

Here's the output, running in an iframe right here on this web page:



emcc and em++ are drop-in replacements for a native C or C++ compiler; the workflow is almost identical


Open source (MIT license) LLVM-based C++ to JavaScript compiler


C++  ⇒  LLVM  ⇒  Emscripten  ⇒  JavaScript

Emscripten builds on the LLVM family of projects:


clang C++ frontend

LLVM optimizer

libc++ C++ standard library

libc++abi low-level C++ support


Currently an out-of-tree fork of LLVM, but we hope to get upstream eventually

Other libraries


Hybrid libc: musl + parts written in JavaScript


Implementations of SDL, OpenGL, etc., using Web APIs

You might be curious at this point what the emitted code looks like...


  // C++
  int func(int *p) {
    int r = *p;
    return calc(r, r << 16);
  }

Emscripten


  // JavaScript
  function func(p) {
    var r = HEAP32[p >> 2];
    return calc(r, r << 16);
  }

Almost direct mapping in many cases
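To make the `p >> 2` in that mapping concrete, here is a minimal runnable sketch. It assumes pointers are byte addresses into one buffer that `HEAP32` views in 4-byte elements; `calc` is a hypothetical stand-in, since the slide does not define it.

```javascript
// Pointers are byte offsets into one buffer; HEAP32 indexes it in 4-byte units.
var buffer = new ArrayBuffer(32768);
var HEAP32 = new Int32Array(buffer);

// Hypothetical stand-in for the calc() called on the slide.
function calc(a, b) {
  return (a + b) | 0;
}

function func(p) {
  var r = HEAP32[p >> 2]; // byte address -> int32 index
  return calc(r, r << 16);
}

// Store the int 3 at byte address 64, then "dereference" that address.
HEAP32[64 >> 2] = 3;
var result = func(64); // 3 + (3 << 16)
```

The shift is just a division by 4: byte address 64 is element 16 of the int32 view.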

Another example:

  float array[5000]; // C++
  int main() {
    for (int i = 0; i < 5000; ++i) {
      array[i] += 1.0f;
    }
  }

Emscripten

  var buffer = new ArrayBuffer(32768); // JavaScript
  var HEAPF32 = new Float32Array(buffer);
  function main() {
    var a = 0, b = 0;
    do {
      a = (8 + (b << 2)) | 0;
      HEAPF32[a >> 2] = +HEAPF32[a >> 2] + 1.0;
      b = (b + 1) | 0;
    } while ((b | 0) < 5000);
  }

This "style" of code is a subset of JS called asm.js, which we'll discuss more later

So that's what the code can look like. But there are some fundamental differences here...

Builds


C++
Need to recompile for another CPU or OS


JS
Single build runs the same everywhere


Single build prevents some optimizations

Undefined Behavior


C++
Has undefined behavior, compiler can use it to optimize


JS
No undefined behavior


dev machine: C++  ⇒  JS

user machine: JS  ⇒  Executable (NO undefined behavior)
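A small illustration of what "no undefined behavior" means on the JS side. This is a sketch, not taken from the talk: each operation below is undefined in C++, while JavaScript specifies a single result for it in every engine.

```javascript
var INT_MAX = 0x7fffffff;

// C++: INT_MAX + 1 is undefined. JS: the sum is computed as a double,
// then | 0 wraps it to int32, so the result is always INT_MIN.
var wrapped = (INT_MAX + 1) | 0;

// C++: reading past the end of an array is undefined.
// JS: an out-of-range typed array read yields undefined, and | 0 makes that 0.
var heap = new Int32Array(4);
var outOfRange = heap[1000] | 0;

// C++: shifting an int by 33 bits is undefined.
// JS: the shift count is masked to 5 bits, so this is 1 << 1.
var shifted = 1 << 33;
```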

Security


C++
Applications can use the system libs, access the local filesystem, etc.


JS
Sandboxed, cannot see the machine it is running on


Applications must ship their own system libraries


We "fake" a filesystem to make porting easy

JS sandboxing helps in some unexpected ways!


Remember that we implement C++ functions using JS functions:


  // Simple C++ function compiled to JavaScript
  function func(p) {
    var r = HEAP32[p >> 2];
    return calc(r, r << 16);
  }

The JS call stack is managed, and unobservable/unmodifiable by executing code


Compiled C++ is therefore immune to some types of buffer overflow attacks
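A sketch of why, using a hypothetical memory layout (an 8-byte buffer at address 0 with another variable right after it). An unchecked copy can still corrupt neighboring data in the heap array, but the return address lives on the JS call stack, which is not part of that array, so the overflow cannot redirect control flow.

```javascript
var buffer = new ArrayBuffer(64);
var HEAP8 = new Int8Array(buffer);

var BUF = 0;    // hypothetical: char buf[8] at addresses 0..7
var SECRET = 8; // hypothetical: another variable directly after buf

// Like strcpy with no bounds check: copies every byte of src to dest.
function unsafeCopy(dest, src) {
  for (var i = 0; i < src.length; i++) {
    HEAP8[dest + i] = src[i];
  }
  return dest;
}

HEAP8[SECRET] = 42;

// 12 bytes into an 8-byte buffer: the last 4 spill into SECRET and beyond.
var returned = unsafeCopy(BUF, [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9]);
var secretAfter = HEAP8[SECRET];
```

The data is corrupted (`secretAfter` is no longer 42), yet `unsafeCopy` still returns to its caller normally.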

Numeric Types


C++
char, short, int, int64, float, double


JS
double


We build for a 32-bit target, because 64-bit integers cannot all fit in doubles (but 32-bit ones can)
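The arithmetic behind that choice can be checked directly. The sketch below also shows one way to emulate the narrower C integer types with shifts; the helper names `toInt8` and `toInt16` are made up for illustration.

```javascript
// Every int32 value is exactly representable in a double...
var maxInt32 = 2147483647;
var int32Exact = (maxInt32 + 1) - 1 === maxInt32;

// ...but a double has a 53-bit significand, so not every 64-bit integer is:
// 2^53 + 1 rounds back to 2^53.
var big = Math.pow(2, 53);
var int64Lossy = big + 1 === big;

// Narrower signed types can be emulated by sign-extending an int32.
function toInt8(x) { return (x << 24) >> 24; }  // like a cast to signed char
function toInt16(x) { return (x << 16) >> 16; } // like a cast to short
```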

Perf Model


C++
C-style code maps closely to CPU, higher-level C++ aspects can use RAII, etc., giving predictability


JS
virtual machine (VM), just in time (JIT) compilers w/ type profiling, garbage collection, etc.

But without good and predictable performance, this is pointless...

Historically, JS began as a slow interpreted language


Competition ⇒ type-specializing JITs


Those are very good at implicitly statically typed code


  function add(x, y) {
    x = x | 0;          // | 0  =>  int32
    y = y | 0;
    return (x + y) | 0; // int32 addition!
  }
  

That's what asm.js is: a subset of JavaScript where all the operations are clearly statically typed
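A few more coercions in the same style, as a sketch: unary `+` marks a double, `>>> 0` marks an unsigned int, and `Math.imul` gives a true int32 multiply. This is plain JavaScript written in that style, not a validated asm.js module, and the function names are made up for illustration.

```javascript
function mul(x, y) {
  x = x | 0;
  y = y | 0;
  return Math.imul(x, y) | 0; // int32 multiply, wraps like C unsigned math
}

function half(x) {
  x = +x;            // unary +  =>  double
  return +(x * 0.5);
}

function udiv(x, y) {
  x = x | 0;
  y = y | 0;
  return ((x >>> 0) / (y >>> 0)) >>> 0; // >>> 0  =>  unsigned int32
}
```

`Math.imul` matters because a plain `x * y` is a double multiply, which loses low bits once the product exceeds 2^53.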

Memory access

  var buffer = new ArrayBuffer(32768);
  var HEAP8 = new Int8Array(buffer);
  var HEAP16 = new Int16Array(buffer);
  var HEAP32 = new Int32Array(buffer);

  function mem_access() {
    return HEAP32[HEAP8[100] >> 2];
  }

Loads in C++ become reads from typed arrays in JS, which become loads in machine code


Emscripten's memory representation/layout is identical to LLVM's, including aliasing, so we can use all LLVM optimizations
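The aliasing part can be seen with the views from the slide: they all share one buffer, so a store through one view is visible through the others, just as with pointer casts in C++. A minimal sketch (address 100 is an arbitrary choice):

```javascript
var buffer = new ArrayBuffer(32768);
var HEAP8 = new Int8Array(buffer);
var HEAP16 = new Int16Array(buffer);
var HEAP32 = new Int32Array(buffer);

// One 32-bit store at byte address 100...
HEAP32[100 >> 2] = 0x01020304;

// ...is visible through the narrower views of the same bytes.
var bytes = [HEAP8[100], HEAP8[101], HEAP8[102], HEAP8[103]];
var halves = [HEAP16[100 >> 1], HEAP16[102 >> 1]];
```

Typed arrays use the host's byte order, so on a little-endian machine `bytes` comes out as 4, 3, 2, 1.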

Ok, we've just seen some encouraging things about speed, but earlier we saw some scary things too...?

Performance


Performance / time


source: awfy; lower numbers are better

Overall, performance is around 50-67% of native speed, and still improving


Missing pieces remain, like SIMD, but work is underway in the standards bodies


Already fast enough for many applications, even performance sensitive ones like games

In fact, the game industry has been an early adopter of compiling C++ to JavaScript, using Emscripten:


Unity

Unigine

Minko

Torque 2D

Unreal

Nebula3

Cocos2D-X

Godot

etc.


Products are shipping

Links to online demos from Unity:


         

Adoption and usage in production show that while JS is a weird compiler target, the results can be robust and reliable


One way we work towards that is fuzzing using csmith; not currently aware of any Emscripten-specific bugs


While there are differences between browsers, having a single build for all of them improves reliability

Emscripten supports practically all C++ features, because clang does


But exception handling isn't something we just get for free

Emscripten supports C++ exceptions... differently

  // C++
  void func() {
    try {
      something();
    } catch (Type T) {
      handle(T);
    }
  }

Emscripten

  // JS
  function func() {
    invoke(10); // call a function pointer, checking for throw
    var T = get_thrown();
    if (T) {
      if (can_handle(T, 400)) { // 400 -> typeid of Type
        handle(T);
      } else {
        do_throw(T);
      }
    }
  }

Here are those runtime functions:
  // JS
  function invoke(ptr) {
    __thrown__ = 0;
    try {
      dyn_call(ptr);
    } catch (e) {
      __thrown__ = e;
    }
  }
  function can_handle(ptr, type) {
    // call into libc++abi internals
  }
  function do_throw(ptr) {
    throw ptr;
  }

We implement C++ exceptions using JS exceptions; the JS VM provides stack unwinding


Perf depends on the speed of JS exceptions
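Putting those pieces together as a runnable sketch. Several parts are assumptions made for illustration: the function table `TABLE`, the body of `get_thrown`, the use of JS objects as thrown values (the real runtime passes pointers), and a `can_handle` that simply compares a type id instead of calling into libc++abi.

```javascript
var TABLE = [];     // hypothetical function table; "pointers" are indexes into it
var __thrown__ = 0;

function dyn_call(ptr) { TABLE[ptr](); }

function invoke(ptr) {
  __thrown__ = 0;
  try {
    dyn_call(ptr);
  } catch (e) {
    __thrown__ = e; // remember what was thrown instead of propagating it
  }
}

function get_thrown() { return __thrown__; }

// Stand-in for the libc++abi type matching.
function can_handle(thrown, type) { return thrown.type === type; }

function do_throw(ptr) { throw ptr; }

var handled = [];
function handle(T) { handled.push(T.value); }

// something() throws an exception whose type id is 400.
TABLE[10] = function something() { do_throw({ type: 400, value: 7 }); };

function func() {
  invoke(10);
  var T = get_thrown();
  if (T) {
    if (can_handle(T, 400)) {
      handle(T);
    } else {
      do_throw(T); // not ours: keep unwinding
    }
  }
}

func();
```

If `something` throws a different type id, `func` rethrows and the JS VM unwinds to the next `invoke` further up the stack.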

We can compile C++ into JavaScript and run it on the web, in a fast and standards-compliant way


JavaScript is a weird - but fun! - compiler target



That's it! Questions?


will tweet link to slides @kripken

http://emscripten.org     http://asmjs.org

Back to Memory


Recall that we represent memory using a single flat array

Pointers are indexes into the array

  var buffer = new ArrayBuffer(32768);
  var HEAP8 = new Int8Array(buffer);
  function compiledCode(ptr) {
    HEAP8[ptr] = 12; // write to an address
    return HEAP8[ptr + 4]; // read from an address
  }  

Which is basically how C and C++ see memory: a pointer can point anywhere in all of memory
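For instance, struct fields and array elements become plain address arithmetic on that one array. A sketch, assuming a hypothetical `struct Point { int x; float y; }` laid out in 8 bytes with `x` at offset 0 and `y` at offset 4:

```javascript
var buffer = new ArrayBuffer(32768);
var HEAP32 = new Int32Array(buffer);
var HEAPF32 = new Float32Array(buffer);

function setPoint(ptr, x, y) {
  HEAP32[ptr >> 2] = x;        // p->x at offset 0
  HEAPF32[(ptr + 4) >> 2] = y; // p->y at offset 4
}

function getX(ptr) { return HEAP32[ptr >> 2]; }
function getY(ptr) { return HEAPF32[(ptr + 4) >> 2]; }

// An array of two Points starting at address 16: element i is at 16 + i * 8.
setPoint(16, 1, 0.5);
setPoint(16 + 8, 2, 1.5);
var secondX = getX(16 + 1 * 8);
var secondY = getY(16 + 1 * 8);
```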

But this is not how languages like JavaScript, C#, Java, Python etc. see memory


Each object or array in those languages is in its own "space", which is bounds-checked, and pointers cannot point just anywhere; they are references to distinct objects

This isn't just an academic point!

  // same as before
  var buffer = new ArrayBuffer(32768);
  var HEAP8 = new Int8Array(buffer);
  function compiledCode(ptr) {
    HEAP8[ptr] = 12;
    return HEAP8[ptr + 4];
  }
  // a new function
  function getInput(array) {
    // array is a NORMAL JavaScript array
    // compiledCode cannot refer to it!
    // must *copy* into the HEAP
    var copy = malloc(array.length);
    HEAP8.set(array, copy);
    // 'copy' is now a pointer to a copy of 'array'
    return compiledCode(copy);
  }  
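The same idea as a self-contained, runnable sketch. The `malloc` here is a hypothetical bump allocator that never frees, standing in for the real one so the example can run on its own.

```javascript
var buffer = new ArrayBuffer(32768);
var HEAP8 = new Int8Array(buffer);

// Hypothetical bump allocator standing in for malloc.
var nextFree = 8;
function malloc(size) {
  var ptr = nextFree;
  nextFree = (nextFree + size + 7) & ~7; // keep allocations 8-byte aligned
  return ptr;
}

function compiledCode(ptr) {
  HEAP8[ptr] = 12;
  return HEAP8[ptr + 4];
}

function getInput(array) {
  var copy = malloc(array.length);
  HEAP8.set(array, copy); // copy the JS array into the heap at address 'copy'
  return compiledCode(copy);
}

var input = [1, 2, 3, 4, 5];
var result = getInput(input); // the byte at copy + 4
var originalUntouched = input[0] === 1;
```

The write inside `compiledCode` lands in the copy, so `input` itself is unchanged; getting results back to the caller would need a second copy out of the heap.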

Ideas?