Big Web App?
Compile It!

Alon Zakai / Mozilla

Compiling to JavaScript

JavaScript is standards-based and the only language that runs in all web browsers

You can run only JavaScript in browsers, but you can write in another language - if you compile it to JavaScript

First set of demos




Compiling to JavaScript:
Nothing New!

  • 2006: Google Web Toolkit (GWT), Java to JS
  • 2007: pyjamas, Python into JS


Plenty of Languages

  • Jeremy Ashkenas's list has many dozens
  • Two main kinds:
    • Established languages: C, C++, Java, C#, Python, etc.
    • New languages intended for compilation to JavaScript: CoffeeScript, TypeScript, Dart, Roy, etc.
  • I'll focus on C and C++

I'll also focus on performance

Doesn't matter for all codebases, but tends to matter more in large ones

Why is compilation to JS useful?

1. Preference for other languages

  • Static typing
  • Existing tools

2. Performance

JavaScript engines have gotten fast enough to run large compiled codebases

Late 2008/early 2009: V8, TraceMonkey, and Nitro were released, and the race for JavaScript speed was on

That race enabled running large compiled codebases

Performance: Beyond "enabling"

Compiled JavaScript can be faster than "regular" handwritten JavaScript

Wait, compiled JavaScript is a subset of JavaScript! How can it be faster?

One Step Back:
How Compilation Works

    C/C++    =>    LLVM    =>    Emscripten    =>    JavaScript

LLVM Optimizations

LLVM's optimizer uses type information to perform many useful optimizations. Decades of work have gone into developing optimization passes for C/C++ compilers.

...dce, inline, constmerge, constprop, dse, licm, gvn, instcombine, mem2reg, scalarrepl...

LLVM Optimizations

These optimizations are only available for compiled code!

Running them manually on a "normal" JavaScript codebase would be hard and make the code less maintainable
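
To make the maintainability point concrete, here is a minimal sketch of one such pass (licm, loop-invariant code motion) applied by hand. The functions scaleAll and scaleAllHoisted and the config object are hypothetical names invented for this illustration; a compiler would do the same transformation automatically and leave the source untouched.

```javascript
// Readable version: recomputes Math.sqrt(config.gain) on every iteration.
function scaleAll(values, config) {
  var out = [];
  for (var i = 0; i < values.length; i++) {
    out.push(values[i] * Math.sqrt(config.gain) + config.offset);
  }
  return out;
}

// Hand-optimized version: loop-invariant work is hoisted out of the loop.
// Same result, but the intent is now spread over more lines.
function scaleAllHoisted(values, config) {
  var n = values.length;
  var k = Math.sqrt(config.gain); // hoisted: does not depend on i
  var offset = config.offset;     // hoisted: property read done once
  var out = new Array(n);
  for (var i = 0; i < n; i++) {
    out[i] = values[i] * k + offset;
  }
  return out;
}

var sample = scaleAll([1, 2, 3], { gain: 4, offset: 1 });
var sampleHoisted = scaleAllHoisted([1, 2, 3], { gain: 4, offset: 1 });
```

Both calls produce the same array; only the shape of the source differs.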

JavaScript Engine Optimizations - 1

Modern JavaScript engines infer types at runtime

This especially helps on code that is implicitly typed - which is exactly what compiled code is!

  function compiledCalculation() {
    var x = f()|0;  // x is a 32-bit value
    var y = g()|0;  // so is y
    return (x+y)|0; // 32-bit addition, no type or overflow checks
  }
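
The |0 idiom can be tried directly. The sketch below uses a hypothetical helper, add32, to show that |0 coerces a value to a signed 32-bit integer, so the sum wraps around the way a C int would on typical platforms.

```javascript
// add32 is a hypothetical name used only for this illustration.
function add32(a, b) {
  a = a|0;          // coerce a to a signed 32-bit integer
  b = b|0;          // coerce b to a signed 32-bit integer
  return (a + b)|0; // truncate the sum back to 32 bits
}

var wrapped = add32(2147483647, 1); // wraps past the largest 32-bit value
var truncated = add32(1.9, 2.9);    // fractional parts are dropped before adding
```

Here wrapped is -2147483648 and truncated is 3.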

JavaScript Engine Optimizations - 2

Modern JavaScript engines optimize typed arrays very well

  var MEM8  = new Uint8Array(1024*1024);
  var MEM32 = new Uint32Array(MEM8.buffer); // alias MEM8's data

  function compiledMemoryAccess(x) {
    MEM8[x] = MEM8[x+10]; // read from x+10, write to x
    MEM32[(x+16)>>2] = 100;
  }

Compiled C/C++ uses a typed array as "memory"
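
A small sketch of the "typed array as memory" idea: two views alias one buffer, so a 32-bit store is visible byte by byte. The helpers store32, load32 and load8 are hypothetical names for this illustration, and the buffer is kept tiny on purpose. Byte order depends on the platform, so the example sorts the bytes before inspecting them.

```javascript
var buffer = new ArrayBuffer(64);     // the "memory"
var MEM8  = new Uint8Array(buffer);   // byte view
var MEM32 = new Uint32Array(buffer);  // 32-bit view over the same bytes

// Pointers are byte addresses; a 32-bit access divides the address by 4.
function store32(ptr, value) { MEM32[ptr >> 2] = value; }
function load32(ptr) { return MEM32[ptr >> 2]; }
function load8(ptr) { return MEM8[ptr]; }

store32(16, 0x01020304);
var roundTrip = load32(16);
var bytes = [load8(16), load8(17), load8(18), load8(19)];
// Sort numerically so the check does not depend on endianness.
var sortedBytes = bytes.slice().sort(function (p, q) { return p - q; });
var untouched = load8(20); // the neighbouring byte was not written
```

Reading the value back through MEM32 returns what was stored, and the four bytes at addresses 16 to 19 are 1, 2, 3 and 4 in platform-dependent order.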


Realistic/large benchmarks

Can we do better?



asm.js (spec) is a research project at Mozilla that aims to formally define the subset of JavaScript that compilers like Emscripten and Mandreel already generate (typed arrays as memory, etc.)


  function strlen(ptr) { // calculate length of C string
    ptr = ptr|0;
    var curr = 0;
    curr = ptr;
    while ((MEM8[curr]|0) != 0) {
      curr = (curr + 1)|0;
    }
    return (curr - ptr)|0;
  }

  • Ensure that ptr is always an integer
  • Read an integer from address curr
  • Additions and subtractions are all 32-bit
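
For context, here is a sketch of how a function like this might sit inside a complete module and be called. The "use asm" directive and the (stdlib, foreign, heap) module signature follow the asm.js draft as I understand it; AsmModule and writeCString are hypothetical names for this illustration. In an engine without asm.js support the code should still run as ordinary JavaScript.

```javascript
function AsmModule(stdlib, foreign, heap) {
  "use asm";
  var MEM8 = new stdlib.Uint8Array(heap); // byte view of the shared heap

  function strlen(ptr) { // calculate length of C string
    ptr = ptr|0;
    var curr = 0;
    curr = ptr;
    while ((MEM8[curr]|0) != 0) {
      curr = (curr + 1)|0;
    }
    return (curr - ptr)|0;
  }

  return { strlen: strlen };
}

// Ordinary JavaScript "glue" outside the module.
var heap = new ArrayBuffer(0x10000); // 64 KB heap, size chosen for the sketch
var heapBytes = new Uint8Array(heap);

// Hypothetical helper: copy a JS string into the heap as a NUL-terminated C string.
function writeCString(ptr, text) {
  for (var i = 0; i < text.length; i++) {
    heapBytes[ptr + i] = text.charCodeAt(i) & 0xff;
  }
  heapBytes[ptr + text.length] = 0;
}

var asm = AsmModule({ Uint8Array: Uint8Array }, {}, heap);
writeCString(100, "hello");
var helloLength = asm.strlen(100);
var emptyLength = asm.strlen(105); // points at the terminating NUL
```

With this setup helloLength is 5 and emptyLength is 0.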


asm.js code avoids potential slowdowns in code: no variables with mixed types, etc.

asm.js code does only low-level assembly-like computation, precisely what compiled C/C++ needs (and hence the name)

asm.js - Formal type system benefits

Type check output of a C/C++ to JavaScript compiler

Type check input to a JavaScript engine at runtime

asm.js - Runtime Optimizations (1)

Variable types pop out during type checking. This makes it possible to do ahead-of-time (AOT) compilation, not only just-in-time (JIT) compilation
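
A sketch of what "types pop out" can look like in practice: the coercions at the top of a function act as type annotations, |0 for a 32-bit integer and unary + for a double. The function scale is a hypothetical name for this illustration, written here as plain JavaScript.

```javascript
function scale(count, factor) {
  count = count|0;   // annotation: count is a 32-bit integer
  factor = +factor;  // annotation: factor is a double
  // Convert the integer to a double explicitly, then multiply as doubles.
  return +(+(count|0) * factor);
}

var scaled = scale(3, 1.5);    // integer times double
var coerced = scale(3.9, "2"); // arguments are coerced on entry
```

Here scaled is 4.5 and coerced is 6, because 3.9 is truncated to 3 and "2" is converted to the number 2 before the multiplication. A compiler reading the first two lines of the body knows both types without running the function.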

asm.js - Runtime Optimizations (2)

The JavaScript engine has a guarantee that there are no speed bumps - variable types won't change, etc. - so it can generate simpler and more efficient code

asm.js - Runtime Optimizations (3)

The asm.js type system makes it easy to reason about global program structure: function calls, memory access, etc.

How much faster are we
talking here..?


Realistic/large benchmarks

Seeing is believing



Code can run around 2X slower than native - comparable to Java, C# - and will get even faster

Optimizations can be done quickly and straightforwardly in existing JavaScript engines - not a new VM or JIT, just some additional optimizations to existing engines


Code is just a subset of JavaScript (like Crockford's "Good Parts") so already runs in all browsers

Not a new language

Trying asm.js now

Compile your code using Emscripten with ASM_JS=1

Run it in a build of Firefox from this branch

Not supported yet: C++ exceptions, setjmp/longjmp

...Just C/C++?

C/C++ compiled to JavaScript can be fast (and even faster with asm.js). But what about other languages?

Not just C/C++!

Many languages can be compiled to C, C++ or LLVM IR, which means they can be compiled to JavaScript with the same approach and benefits

Java => C => JavaScript

Demo using XMLVM and Emscripten

Compiling to JavaScript:
Current state of the art

  • C/C++: Mature
  • Java, C#, Objective-C: Good
  • Python, Ruby, Lua: Needs work

Dynamic Languages

Entire C/C++ runtimes can be compiled and the original language interpreted with proper semantics, but this is not lightweight

Source-to-source compilers from such languages to JavaScript ignore semantic differences (for example, numeric types)
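
The numeric-type difference is easy to demonstrate. In the sketch below, the "naive" functions stand for what a straightforward source-to-source translation of integer code might emit, and the "int32" functions add the coercions needed to match 32-bit integer semantics such as Java's int. All four function names are hypothetical, and division by zero is deliberately not handled.

```javascript
// Naive translation: JavaScript numbers are doubles.
function naiveAdd(a, b) { return a + b; }
function naiveDiv(a, b) { return a / b; }

// Translation that preserves 32-bit integer semantics.
// Assumption for this sketch: b is never 0 in int32Div.
function int32Add(a, b) { return (a + b)|0; }
function int32Div(a, b) { return ((a|0) / (b|0))|0; }

var naiveSum = naiveAdd(2147483647, 1); // keeps growing as a double
var int32Sum = int32Add(2147483647, 1); // wraps like a 32-bit int
var naiveQuotient = naiveDiv(7, 2);     // fractional result
var int32Quotient = int32Div(7, 2);     // truncated like integer division
```

The naive versions give 2147483648 and 3.5; the coerced versions give -2147483648 and 3.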

Fast Java, C#

Actually, Java and C# have a similar predicament: Both languages depend on special VMs to be efficient

Source-to-source compilers for them lose out on the optimizations done in those VMs

AOT compilers for them can at least gain LLVM-type optimizations - but still something is missing

A Unified Approach?

Should we compile entire VMs from C/C++ to JavaScript, and implement JavaScript-emitting JITs?

Seems the only way to run most languages with perfect semantics + maximum speed
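
To give a feel for what a "JavaScript-emitting JIT" means, here is a toy sketch under heavy assumptions: an invented stack bytecode is translated at runtime into JavaScript source, which is then handed to the engine through the standard Function constructor. The bytecode format and compileToJS are hypothetical and far simpler than anything a real VM would need.

```javascript
// Hypothetical toy bytecode: each instruction is [opcode, operand].
// Supported opcodes: "push" (constant), "arg" (argument index), "add", "mul".
function compileToJS(bytecode, argCount) {
  var stack = []; // stack of JavaScript expression strings
  for (var i = 0; i < bytecode.length; i++) {
    var op = bytecode[i][0];
    var operand = bytecode[i][1];
    if (op === "push") {
      stack.push("(" + (operand|0) + ")");
    } else if (op === "arg") {
      stack.push("(a" + (operand|0) + "|0)");
    } else if (op === "add" || op === "mul") {
      var right = stack.pop();
      var left = stack.pop();
      if (op === "add") {
        stack.push("((" + left + " + " + right + ")|0)");
      } else {
        stack.push("Math.imul(" + left + ", " + right + ")"); // 32-bit multiply
      }
    } else {
      throw new Error("unknown opcode: " + op);
    }
  }
  var params = [];
  for (var j = 0; j < argCount; j++) {
    params.push("a" + j);
  }
  // Emit JavaScript source and let the engine compile it.
  return new Function(params.join(","), "return " + stack.pop() + ";");
}

// Program computing arg0 * 2 + 1.
var program = [["arg", 0], ["push", 2], ["mul"], ["push", 1], ["add"]];
var compiled = compileToJS(program, 1);
var jitResult = compiled(5);
```

Here jitResult is 11. The emitted code uses the same |0 coercions as the compiled code shown earlier, so the host engine can apply the same type-based optimizations to it.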

This is why I believe C/C++ to JavaScript translation is the core issue regarding compilation to JavaScript


Statically-typed languages and especially C/C++ can be compiled effectively to JavaScript

Expect the speed of compiled C/C++ to get to just 2X slower than native code, or better, later this year

Thanks for listening!