Alon Zakai/@kripken
Many kinds of optimizations are constantly being worked on: object layout (hidden classes, etc.), garbage collection (generational GC, etc.), and more
Here I'll focus only on type-based optimizations
Many benchmarks (most of kraken, parts of octane, etc.) test pure computation
Optimizations include type profiling, type inference, type specialization
Prominent in modern JS VMs (optimizing JITs like Crankshaft, IonMonkey, etc.)
I'll discuss three areas of optimization, all of which only make sense given type information:
JavaScript numbers are 64-bit doubles, but VMs have 32-bit ints as well!
Possible due to a mathematical fact about integer and double math:
x_int +_int y_int
==
int( double(x_int) +_double double(y_int) )
Say we are adding two values known to be 32-bit ints:
x_int + y_int == x_int +_int y_int,
assuming no overflow happened
Thankfully overflow checks are cheap, but can do even better:
(x_int + y_int) | 0 == x_int +_int y_int
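A sketch of how this looks in practice (the function name is illustrative): `|0` truncates to a 32-bit int, so an engine that sees this pattern can keep values in int registers and skip the overflow check entirely - overflow simply wraps, as in native int32 math.

```javascript
function addInt32(x, y) {
  x = x | 0; // hint to the VM: treat x as a 32-bit int
  y = y | 0;
  return (x + y) | 0; // wraps on overflow, matching native int32 addition
}

addInt32(1, 2);          // 3
addInt32(0x7fffffff, 1); // -2147483648: wraps instead of falling back to double
```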
We can do the same for 32-bit floats!
x_f32 +_f32 y_f32
==
f32( f64(x_f32) +_f64 f64(y_f32) )
Also -, *, /, sqrt, etc. (but not everything - e.g., not sin)
Only true because of the exact definition of IEEE754 doubles and floats (# of bits in exponent, definition of rounding, etc.)
Typed arrays, e.g. stuff like this is common in WebGL code
var f32 = new Float32Array(1000);
for (var i = 0; i < 1000; ++i) {
  f32[i] = f32[i] + 1;
}
Optimizable :)
Fragile, though - needs to be cast back to float32 immediately
var f32 = new Float32Array(1000);
for (var i = 0; i < 1000; ++i) {
  f32[i] = 2*f32[i] + 1;
}
Not optimizable, and also no parallel to overflow checks that ints can use :(
Math.fround: explicitly cast a number to 32-bit float precision
var f32 = new Float32Array(1000);
for (var i = 0; i < 1000; ++i) {
  f32[i] = Math.fround(2*f32[i]) + 1;
}
Optimizable :)
Parallel to |0, which as mentioned before can help optimize integer math
In ES6, already in Firefox, Safari, Chrome (behind a flag), easy to polyfill:
Math.fround = Math.fround || (function() {
  var temp = new Float32Array(1);
  return function fround(x) {
    temp[0] = +x;
    return temp[0];
  };
})();
or
Math.fround = Math.fround || function(x) { return x };
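A quick feel for what fround returns (the values are fixed by IEEE754, shown here as Node prints them):

```javascript
// fround rounds a double to the nearest float32, then returns it as a double.
Math.fround(1.5);  // 1.5 - exactly representable in float32, unchanged
Math.fround(0.1);  // 0.10000000149011612 - the nearest float32 to 0.1
Math.fround(Math.fround(0.1)) === Math.fround(0.1); // true - idempotent
```

Note that the second, identity-function polyfill skips this rounding entirely, so results may differ slightly from engines with real float32 support.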
|     | Matrix Inversions | Matrix Graphics | Exponential | FFT |
|-----|-------------------|-----------------|-------------|-----|
| x86 | 33%               | 60%             | 30%         | 16% |
| ARM | 38%               | 38%             | 24%         | 33% |

% speedup with float optimizations (higher is better; source)
Less precise, but often fewer CPU cycles
Half the size, fewer cache misses
WebGL code often uses float32s, doing computation in float32 avoids conversions
Not yet, but possible: more registers/SIMD lanes
Modern JS VMs have powerful optimizing JITs that operate on typed code
But many heuristics lie between source code and actually being fully typed and optimized
What if we made some JS code really really easy to optimize, with as few heuristics as possible?
asm.js is one way to do that
asm.js is a subset of JavaScript (not a new language, no new semantics, all 100% backwards compatible), written in a format that is extremely easy to optimize
E.g., the |0 pattern mentioned before:
function twice(x) {
  x = x | 0;
  return x + x | 0;
}
No need to profile types, can be optimized ahead of time (AOT) from source
function asmCode(global, env, buffer) {
  'use asm';
  var HEAP = new global.Uint8Array(buffer);
  function fib_like(x) {
    x = x|0;
    if ((x >>> 0) < 2) return HEAP[x]|0;
    return (fib_like(x-2|0)|0) + (fib_like(x-1|0)|0) | 0;
  }
  return fib_like;
}
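A module like the one above is instantiated by calling it with the global object, an environment object, and an ArrayBuffer heap. A minimal sketch (this module and its names are illustrative, not from the example above):

```javascript
function addModule(global, env, buffer) {
  'use asm';
  function add(x, y) {
    x = x | 0;
    y = y | 0;
    return (x + y) | 0;
  }
  return add;
}

var heap = new ArrayBuffer(0x10000); // 64KB, a valid asm.js heap size
var add = addModule(this, {}, heap);
add(2, 3); // 5
```

Because asm.js is a subset of JavaScript, this runs correctly (as plain JS) even in engines that don't validate or AOT-compile it.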
Low-level and mostly inconvenient to write by hand - primarily intended as a compiler target
Emscripten is an open source LLVM-based compiler that generates asm.js from C/C++
Can compile existing C/C++ codebases, or write new ones - fits in with the "compile-to-js" movement (CoffeeScript, etc.)
Lots of interest from the games industry: Top-tier game engines starting to support HTML5 as a compilation target, using Emscripten to emit asm.js
Unity - Replacement for their popular web plugin
Unreal Engine - HTML5 as new target platform
& many others
Should see stuff like this ship on the web later this year, will be interesting times!
Around 1.5x slower than native (67% of native speed)
As we saw, sufficient even for AAA game engines
AOT - the full optimizing JIT is always run, using full type info
AOT done in parallel, results cached
No need for bailouts, we know many corner cases are impossible, etc.
On 64-bit, can do zero-cost bounds checks using a signal handler
OdinMonkey is the optimization module for asm.js in SpiderMonkey
Fairly small amount of code
Mostly just type checks; it feeds the results into the normal optimizing JIT
We saw that JavaScript can get very close to native performance
When you get that close, every little thing starts to become significant, even register allocation
Inspired by LLVM's Greedy Register Allocator (added in LLVM 3.0)
LLVM results: 1-2% smaller code, up to 10% faster
Takes longer to run - and regalloc is already among the slowest phases - but worth it
Use priority queue of live ranges, instead of linear scan
If no free physical register, can backtrack - undo a previous allocation, if we need it more
Split live ranges to ensure physical registers for loops and other important places
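The steps above can be sketched as a toy allocator (illustrative only - not SpiderMonkey's or LLVM's actual code; here "priority" drives allocation order and "weight" drives eviction, and real allocators also split ranges, derive weights from use counts, etc.):

```javascript
// Live ranges are pulled off a priority queue; if no physical register is
// free, a conflicting allocation with a lower spill weight is undone and
// re-queued ("we need it more"); otherwise the new range is spilled.
function allocate(ranges, numRegs) {
  var queue = ranges.slice();
  var assigned = []; // entries: { range: ..., reg: ... }
  var spilled = [];

  function byPriority(a, b) { return b.priority - a.priority; }
  function conflicts(a, b) { return a.start < b.end && b.start < a.end; }

  queue.sort(byPriority);
  while (queue.length) {
    var range = queue.shift(); // highest-priority range first
    var used = {};
    var blockers = [];
    assigned.forEach(function (entry) {
      if (conflicts(entry.range, range)) {
        used[entry.reg] = true;
        blockers.push(entry);
      }
    });
    var free = -1;
    for (var r = 0; r < numRegs; r++) {
      if (!used[r]) { free = r; break; }
    }
    if (free >= 0) {
      assigned.push({ range: range, reg: free });
      continue;
    }
    // Backtrack: evict the cheapest blocker if this range matters more.
    var victim = blockers.reduce(function (a, b) {
      return a.range.weight < b.range.weight ? a : b;
    });
    if (victim.range.weight < range.weight) {
      assigned.splice(assigned.indexOf(victim), 1);
      queue.push(victim.range); // retried later; may end up spilled
      queue.sort(byPriority);
      assigned.push({ range: range, reg: victim.reg });
    } else {
      spilled.push(range); // a real allocator would try splitting first
    }
  }
  return { assigned: assigned, spilled: spilled };
}
```

With one register and two overlapping ranges, a heavier-weight range arriving later will evict the lighter one, which then spills on retry.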
SpiderMonkey followed LLVM's progress and in fact mirrored it
Like LLVM, began with a linear scan register allocator, improved it until it became too complex, then wrote a backtracking allocator
LLVM's experience and design has been very informative!
Heuristics for when to split, whether to spill or split, etc.
SpiderMonkey values do not always require a physical register (e.g. for snapshots), can split live ranges based on that
Most benchmarks show little or no difference. But when there is a difference, it can be large!
21% speedup on octane-zlib on x86
7% speedup on x86-64, 11% speedup on ARM (more registers anyhow)
Other benchmarks: 29% speedup on emscripten-life on x86
Enabled a few weeks ago on asm.js code, where it makes the most difference (closest to native speed) and where we compile AOT
Heuristics need tuning before enabling it everywhere, as compilation is slower - hopefully soon
Topics covered: