Embenchen - the Emscripten benchmark suite

v0.0.2


This site contains a snapshot of the Emscripten benchmark suite, which you can run in your browser.


THIS IS DEPRECATED; see Massive.


FAQ

Is this a meaningful benchmark?

The Emscripten benchmark suite has grown over time within the Emscripten project, with two goals: to pinpoint areas where Emscripten output is slower than native code, so those can be focused on, and to prevent regressions. The suite contains two sets of tests: microbenchmarks, which were mainly designed to test performance on specific issues and are probably not representative of real-world code, and macrobenchmarks (bullet, box2d, lua, zlib), which are real-world codebases that we believe are important and interesting.

The macrobenchmark results, therefore, are meaningful and worth looking at. The microbenchmarks, on the other hand, are less important, but may still be interesting in some cases.

What can I compare to what here?

The benchmark changes over time as Emscripten and LLVM improve, so you cannot compare different versions of the benchmark to each other (see the version number above). What you can do is compare the same version, on the same machine, across different browsers.

Will my browser freeze up if I press the button?

Fear not! Each benchmark runs in a worker, so the browser will remain responsive.
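
As an illustration, here is a minimal sketch of that pattern (the file names, message format, and benchmark name are made up for this example, not the suite's actual harness):

    // main page: spawn a worker so the computation stays off the UI thread
    var worker = new Worker('benchmark_worker.js');  // hypothetical file name
    worker.onmessage = function(e) {
      console.log(e.data.name + ': ' + e.data.ms + ' ms');  // result arrives asynchronously
    };
    worker.postMessage({ name: 'fannkuch' });  // ask the worker to run one benchmark

    // benchmark_worker.js: runs inside the worker
    onmessage = function(e) {
      var start = Date.now();
      importScripts(e.data.name + '.js');  // load and run the compiled benchmark
      postMessage({ name: e.data.name, ms: Date.now() - start });
    };

Because the benchmark executes inside the worker, the page's main thread only sends a message and waits for the result, so the UI never blocks.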

How can I build these benchmarks myself?

Run tests/runner.py benchmark in Emscripten, after editing tests/test_benchmark.py to enable the line with "embenchen" on it. (You can also disable other measurements there if you want, and you can set DEFAULT_ARG to 0 to just build, without measuring.)

What settings are these builds made with?

-O3 -profiling. That is a fully optimized build that is not fully minified, so that you can inspect the source code, look at profiler measurements, etc. The minification not done here should only affect startup speed; throughput should still be maximal. (Don't be surprised by the source code size of the benchmarks; minified builds would be far smaller.)

Is startup time a factor in these tests?

The goal here is mainly throughput, not startup, because we are interested in long-running, computation-intensive code. Another reason is that it is not easy to measure "everything but startup": JS engines may start to parse or even compile code as it is downloaded, so ignoring download may also ignore some parsing, and so on. Therefore, to focus on throughput, these tests record the time just before main() is called and again right after it returns.
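
In Emscripten terms, that can be done with the preRun and postRun hooks on the Module object, which Emscripten calls right before and right after main(). A minimal sketch of the idea (the log message is made up for this example, not the suite's actual reporting):

    // Timestamps taken just before main() runs and right after it returns,
    // so download and parse time are excluded from the reported number.
    var startTime;
    var Module = {
      preRun:  [function() { startTime = Date.now(); }],  // just before main()
      postRun: [function() {                              // right after main() returns
        console.log('main() took ' + (Date.now() - startTime) + ' ms');
      }]
    };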

