Alon Zakai / @kripken
All major web browsers are written in C++
For the obvious reasons:
fast, familiar, library support
For the same reasons, people want to use C++ to write web content too, that is, websites
That's what this talk is about
Largest open platform in existence
Modern, standards-compliant websites are built using HTML, CSS, and JavaScript (JS)
No C++ there :(
What about non-standardized approaches (ActiveX, Flash/Alchemy, PNaCl/PPAPI)?
Plugins and proposals for entirely new technologies on the web have failed to reach significant adoption or standardization, for both technical and non-technical reasons
And plugins are on the way out (no plugins on iPhone/iPad, etc.)
This is a good trend - standardization is why websites work on both your laptop and your phone
...where does that leave C++, then?
Well, JavaScript is already standardized, so how about if we... compile C++ into that?
This has happened with many languages, in fact:
JavaScript is a dynamic scripting language
var x = 42;
var y = "a string";
var z = x + y; // z = "42a string"
eval("z = z.substr(1, 2)"); // z = "2a"
[1, "two", { three: 3 }].forEach(function(item) {
if (typeof item === typeof z) console.log([z, item]);
}); // emits ["2a", "two"]
Kind of a weird compiler target...
But from the developer's point of view, compiling to JavaScript can be very conventional!
First, a reminder of compiling to a native executable:
// hello.cpp
#include <iostream>
int main() {
std::cout << "hello, world!" << std::endl;
}
$ g++ hello.cpp -o a.out
$ ./a.out
hello, world!
Compiling to JavaScript using Emscripten:
$ em++ hello.cpp -o a.html
$ firefox a.html # or any other browser
Here's the output, running in an iframe right here on this web page:
emcc and em++ are drop-in replacements for a native C or C++ compiler; the workflow is almost identical
Open source (MIT license) LLVM-based C++ to JavaScript compiler
C++ ⇒ LLVM ⇒ Emscripten ⇒ JavaScript
Emscripten builds on the LLVM family of projects:
clang C++ frontend
LLVM optimizer
libc++ C++ standard library
libc++abi low-level C++ support
Currently an out-of-tree fork of LLVM, but we hope to get upstream eventually
Hybrid libc: musl + parts written in JavaScript
Implementations of SDL, OpenGL, etc., using Web APIs
You might be curious at this point what the emitted code looks like...
// C++
int func(int *p) {
int r = *p;
return calc(r, r << 16);
}
⇒ Emscripten ⇒
// JavaScript
function func(p) {
var r = HEAP32[p >> 2];
return calc(r, r << 16);
}
Almost direct mapping in many cases
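To make the `HEAP32[p >> 2]` mapping concrete, here is a minimal runnable sketch. It assumes a 32 KB heap like the one shown elsewhere in this talk; the body of `calc` and the address 16 are made up for illustration. A C++ pointer becomes a byte offset into one buffer, and `>> 2` turns that byte offset into an index into the 32-bit view.

```javascript
// Sketch only: heap size, calc(), and the address 16 are illustrative assumptions.
var buffer = new ArrayBuffer(32768);
var HEAP32 = new Int32Array(buffer);

// Stand-in for some other compiled function.
function calc(a, b) {
  return (a + b) | 0;
}

// Same shape as the emitted code above.
function func(p) {
  var r = HEAP32[p >> 2]; // byte address -> int32 index
  return calc(r, r << 16);
}

var p = 16;            // a "pointer": byte offset 16 in the heap
HEAP32[p >> 2] = 3;    // like *p = 3 in C++
var result = func(p);  // 3 + (3 << 16) = 196611
```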
Another example:
float array[5000]; // C++
int main() {
for (int i = 0; i < 5000; ++i) {
array[i] += 1.0f;
}
}
⇒ Emscripten ⇒
var buffer = new ArrayBuffer(32768); // JavaScript
var HEAPF32 = new Float32Array(buffer);
function main() {
var a = 0, b = 0;
do {
a = (8 + (b << 2)) | 0;
HEAPF32[a >> 2] = +HEAPF32[a >> 2] + 1.0;
b = (b + 1) | 0;
} while ((b | 0) < 5000);
}
This "style" of code is a subset of JS called asm.js, which we'll discuss more later
So that's what the code can look like. But there are some fundamental differences here...
Single build prevents some optimizations
Applications must ship their own system libraries
We "fake" a filesystem to make porting easy
JS sandboxing helps in some unexpected ways!
Remember that we implement C++ functions using JS functions:
// Simple C++ function compiled to JavaScript
function func(p) {
var r = HEAP32[p >> 2];
return calc(r, r << 16);
}
The JS call stack is managed, and unobservable/unmodifiable by executing code
Compiled C++ is therefore immune to some types of buffer overflow attacks
We build for a 32-bit target, because 64-bit integers cannot all fit in doubles (but 32-bit ones can)
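The limit is easy to check in any JS engine. This small sketch relies only on the fact that JS numbers are IEEE 754 doubles, which represent integers exactly only up to 2^53; the variable names are illustrative.

```javascript
// JS numbers are IEEE 754 doubles: integers are exact only up to 2^53.
var maxInt32 = 2147483647;
var int32Exact = (maxInt32 - 1) + 1 === maxInt32; // true: int32 values round-trip exactly

var big = Math.pow(2, 53);                 // 9007199254740992
var collides = big + 1 === big;            // true: 2^53 + 1 is not representable
var isSafe = Number.isSafeInteger(big);    // false: beyond the exact-integer range
```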
But without good and predictable performance, this is pointless...
Historically, JS began as a slow interpreted language
Competition ⇒ type-specializing JITs
Those are very good at implicitly statically typed code
function add(x, y) {
x = x | 0; // | 0 => int32
y = y | 0;
return (x + y) | 0; // int32 addition!
}
That's what asm.js is: a subset of JavaScript where all the operations are clearly statically typed
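A quick way to see what the `| 0` coercions buy: whatever goes in, the result is always an int32. The sketch below reuses the `add` function from the slide; the sample inputs are arbitrary.

```javascript
function add(x, y) {
  x = x | 0; // coerce to int32
  y = y | 0;
  return (x + y) | 0; // result is truncated back to int32
}

var wrapped = add(2147483647, 1); // two's-complement wraparound: -2147483648
var truncated = add(1.9, 2.9);    // fractions are dropped first: 1 + 2 = 3
var fromString = add("5", 2);     // even a string is coerced: 5 + 2 = 7
var unsigned = -1 >>> 0;          // >>> 0 is the unsigned counterpart: 4294967295
```

Because every value has a known type after coercion, a JIT does not need to guess or re-check types at runtime.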
Memory access
var buffer = new ArrayBuffer(32768);
var HEAP8 = new Int8Array(buffer);
var HEAP16 = new Int16Array(buffer);
var HEAP32 = new Int32Array(buffer);
function mem_access() {
return HEAP32[HEAP8[100] >> 2];
}
Loads in C++ become reads from typed arrays in JS, which become loads in machine code
Emscripten's memory representation/layout is identical to LLVM's, including aliasing, so we can use all LLVM optimizations
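The aliasing point can be seen directly: all the typed array views share one buffer, so a 32-bit store is visible through the 8-bit and 16-bit views at the same address. A minimal sketch follows; the address 100 and the stored value are arbitrary, and the value is chosen so that the checks do not depend on byte order.

```javascript
var buffer = new ArrayBuffer(32768);
var HEAP8 = new Int8Array(buffer);
var HEAP16 = new Int16Array(buffer);
var HEAP32 = new Int32Array(buffer);

var ptr = 100;                  // a byte address (4-byte aligned)
HEAP32[ptr >> 2] = 0x01010101;  // 32-bit store at address 100

var b0 = HEAP8[ptr];            // first byte of the int: 1
var b3 = HEAP8[ptr + 3];        // last byte of the int: 1
var h0 = HEAP16[ptr >> 1];      // first 16-bit half: 0x0101 = 257

HEAP8[ptr] = 2;                 // an 8-bit store into the same memory...
var changed = HEAP32[ptr >> 2] !== 0x01010101; // ...changes the 32-bit value
```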
Ok, we've just seen some encouraging things about speed, but before we saw some scary things too...?
(Benchmark chart; source: awfy; lower numbers are better)
Overall, performance is around 50-67% of native speed, and still improving
Missing pieces remain, like SIMD, but work is underway in the standards bodies
Already fast enough for many applications, even performance sensitive ones like games
In fact, the game industry has been an early adopter of compiling C++ to JavaScript, using Emscripten:
Unity
Unigine
Minko
Torque 2D
Unreal
Nebula3
Cocos2D-X
Godot
etc.
Products are shipping
Adoption and usage in production show that while JS is a weird compiler target, the results can be robust and reliable
One way we work towards that is fuzzing using csmith; not currently aware of any Emscripten-specific bugs
While there are differences between browsers, having a single build for all of them improves reliability
Emscripten supports practically all C++ features, because clang does
But exception handling isn't something we just get for free
Emscripten supports C++ exceptions... differently
// C++
void func() {
try {
something();
} catch (Type T) {
handle(T);
}
}
⇒ Emscripten ⇒
// JS
function func() {
invoke(10); // call a function pointer, checking for throw
var T = get_thrown();
if (T) {
if (can_handle(T, 400)) { // 400 -> typeid of Type
handle(T);
} else {
do_throw(T);
}
}
}
// JS
function invoke(ptr) {
__thrown__ = 0;
try {
dyn_call(ptr);
} catch (e) {
__thrown__ = e;
}
}
function can_handle(ptr, type) {
// call into libc++abi internals
}
function do_throw(ptr) {
throw ptr;
}
We implement C++ exceptions using JS exceptions; the JS VM provides the stack unwinding
Perf depends on the speed of JS exceptions
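The pattern above can be tried out in isolation. This is a simplified sketch, not Emscripten's actual runtime: `FUNCTION_TABLE`, the numeric type ids, and the shape of the thrown object are all hypothetical, and `canHandle` just compares ids where the real code would call into libc++abi.

```javascript
// Sketch only: FUNCTION_TABLE, type ids, and the thrown-object shape are hypothetical.
var FUNCTION_TABLE = [];
var thrown = 0;

function invoke(ptr) {
  thrown = 0;
  try {
    FUNCTION_TABLE[ptr](); // call through a "function pointer"
  } catch (e) {
    thrown = e;            // remember the exception instead of propagating it
  }
}

function canHandle(exc, typeId) {
  // The real check consults C++ type info; here we only compare ids.
  return exc.typeId === typeId;
}

var TYPE_A = 400, TYPE_B = 401;
FUNCTION_TABLE[10] = function () { throw { typeId: TYPE_A, value: 7 }; };
FUNCTION_TABLE[11] = function () { throw { typeId: TYPE_B, value: 9 }; };
FUNCTION_TABLE[12] = function () {}; // does not throw

var handled = [];
function func(ptr) {
  invoke(ptr);
  var t = thrown;
  if (t) {
    if (canHandle(t, TYPE_A)) {
      handled.push(t.value); // the "catch (Type T)" body
    } else {
      throw t;               // not ours: keep unwinding
    }
  }
}

func(10); // caught by the handler
func(12); // nothing thrown
var rethrown = null;
try { func(11); } catch (e) { rethrown = e; } // wrong type propagates to the caller
```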
We can compile C++ into JavaScript and run it on the web, in a fast and standards-compliant way
JavaScript is a weird - but fun! - compiler target
Recall that we represent memory using a single flat array
Pointers are indexes into the array
var buffer = new ArrayBuffer(32768);
var HEAP8 = new Int8Array(buffer);
function compiledCode(ptr) {
HEAP8[ptr] = 12; // write to an address
return HEAP8[ptr + 4]; // read from an address
}
Which is basically how C and C++ see memory: a pointer can point anywhere in all of memory
But this is not how languages like JavaScript, C#, Java, Python etc. see memory
Each object or array in those languages lives in its own bounds-checked "space"; pointers cannot point just anywhere, they are references to distinct objects
This isn't just an academic point!
// same as before
var buffer = new ArrayBuffer(32768);
var HEAP8 = new Int8Array(buffer);
function compiledCode(ptr) {
HEAP8[ptr] = 12;
return HEAP8[ptr + 4];
}
// a new function
function getInput(array) {
// array is a NORMAL JavaScript array
// compiledCode cannot refer to it!
// must *copy* into the HEAP
var copy = malloc(array.length);
HEAP8.set(array, copy);
// 'copy' is now a pointer to a copy of 'array'
return compiledCode(copy);
}
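Here is a runnable version of the copy step, as a sketch. The `malloc` below is a hypothetical bump allocator standing in for the compiled allocator, and the input values are arbitrary; the rest follows the code above.

```javascript
var buffer = new ArrayBuffer(32768);
var HEAP8 = new Int8Array(buffer);

// Hypothetical bump allocator standing in for the compiled malloc.
var nextFree = 8; // leave address 0 unused so it can act as NULL
function malloc(size) {
  var ptr = nextFree;
  nextFree = (nextFree + size + 7) & ~7; // keep allocations 8-byte aligned
  if (nextFree > buffer.byteLength) throw new Error("out of memory");
  return ptr;
}

function compiledCode(ptr) {
  HEAP8[ptr] = 12;
  return HEAP8[ptr + 4];
}

function getInput(array) {
  var copy = malloc(array.length); // reserve space inside the heap
  HEAP8.set(array, copy);          // copy the JS array in at that address
  return compiledCode(copy);
}

var input = [1, 2, 3, 4, 5, 6];
var out = getInput(input); // reads the copied input[4]: 5
var untouched = input[0];  // still 1: compiledCode wrote to the copy, not to input
```

Note the cost: the write made by `compiledCode` lands in the copy, so results have to be copied back out if the caller needs them.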
Ideas?