Benchmarks will take a while to run, and some browsers may become less responsive for part of that time.
| Benchmark | Normalized Result (higher is better) | What it measures |
| --- | --- | --- |
Main Thread Responsiveness measures the user experience as a large codebase is loaded. What is tested is whether the main thread stalls as the codebase is prepared and executed for a short while. The score here can be improved by parsing the code off the main thread, for example. This does not measure how much time is spent, but only how responsive or unresponsive the user experience is (how much time is spent is measured by Preparation, and to some extent Throughput). Technically, we measure responsiveness by seeing if events on the main thread execute at the proper interval (as when the main thread stalls, it stalls both the user experience and other events).
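The interval-drift idea can be sketched as follows (a simplified illustration, not Massive's actual code; the function name and sample numbers are assumptions):

```javascript
// Given timestamps (ms) of events scheduled at a fixed interval,
// compute how late each event fired relative to its expected time.
// Large lateness values indicate the main thread was stalled.
function computeLateness(timestamps, intervalMs) {
  const start = timestamps[0];
  return timestamps.map((t, i) => Math.max(0, (t - start) - i * intervalMs));
}

// Example: events meant to fire every 100ms; the third one was delayed
// by a 150ms stall, which also pushes the fourth one late.
const lateness = computeLateness([0, 100, 350, 450], 100);
console.log(lateness); // [0, 0, 150, 150]
```

A responsive browser keeps these lateness values near zero even while a large codebase is being prepared in the background.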
Throughput measures how fast a large computational workload runs. This is what is typically measured by benchmarks. Massive's throughput tests focus on very large real-world codebases.
Preparation measures how much wall time is spent getting a codebase ready to execute, before any of it runs. This is the time between adding a script tag with that code and being able to call the code (which may or may not cause a user-noticeable pause, depending on whether the code is parsed on or off the main thread; Main Thread Responsiveness tests that aspect). "Preparation" is basically all the time before the code is actually able to run; depending on the JS engine, that may include parsing, conversion to bytecode, JIT compilation, etc.
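The measurement idea is just wall-clock bracketing. In a browser this would bracket appending a script tag and its onload firing; the sketch below stands in a generic prepare() callback so it is self-contained (the function name is an assumption, not Massive's code):

```javascript
// Measure the wall time (ms) between starting to load/prepare code
// and it becoming callable. In Massive this brackets adding a script
// tag; here prepare() stands in for that step.
function measurePreparation(prepare) {
  const start = Date.now();
  prepare(); // e.g. parse and compile the codebase
  return Date.now() - start; // wall-clock preparation time in ms
}

// Usage: time an (artificially busy) preparation step.
const ms = measurePreparation(() => {
  for (let i = 0; i < 1e6; i++) {} // stand-in for parsing work
});
console.log(ms >= 0); // true
```

In real browser code, `performance.now()` would give higher-resolution timestamps than `Date.now()`.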
Variance measures how variable the frame rate is in an application that needs to run in each frame (this is important in things like games, which must finish all their work within each 1/60 of a second in order to be smooth). Specifically, we run many frames and then calculate the statistical variance and worst case. Note that one VM might have a much faster overall frame rate than another, but also more variance: in general, given two VMs with the same average, the one with less variance is "better" since it's smoother. But given a different mean, things are less clear (perhaps we are happy to accept some average slowdown in order to reduce variance, which can cause noticeable but rare pauses?). Hence we measure variance separately from throughput (which is a measurement of the total speed, and is proportional to the average).
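The variance and worst-case calculation can be sketched like this (a minimal illustration; the function name and sample numbers are assumptions, not Massive's code):

```javascript
// Compute the mean, statistical variance, and worst case of per-frame
// times. frameTimes is an array of durations in ms, one per frame.
function frameStats(frameTimes) {
  const mean = frameTimes.reduce((a, b) => a + b, 0) / frameTimes.length;
  const variance =
    frameTimes.reduce((a, t) => a + (t - mean) * (t - mean), 0) /
    frameTimes.length;
  const worst = Math.max(...frameTimes);
  return { mean, variance, worst };
}

// Two runs with the same mean (10ms) but different smoothness:
console.log(frameStats([10, 10, 10, 10]).variance); // 0 (perfectly smooth)
console.log(frameStats([5, 15, 5, 15]).variance);   // 25 (jittery)
```

Both runs have the same throughput, but the first is smoother; this is why variance is reported separately.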
Most of the tests, in particular the throughput ones, are quite consistent, as we run a deterministic workload in a web worker, which minimizes outside noise. We also run a few repetitions and average the results. However, the Main Thread Responsiveness tests in particular need to run on the main thread, and they involve DOM events like adding a script tag, setInterval, etc., which can be fairly variable. We run a larger number of repetitions on those tests to average out the noise, but even so they appear less consistent between runs on some browsers.
When we see that the results of a test are too variable, we mark it with "(±X%!)" next to the score. The cause of such variability might be something else on your machine (perhaps a background indexing service happened to use a CPU core during a test, etc.), or it might be that the browser behaves unpredictably for some reason.
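One way to flag such a result is to compute the spread of the repeated runs relative to their mean; the sketch below illustrates the idea (the formula, function names, and 5% threshold are assumptions for illustration, not Massive's actual criteria):

```javascript
// Half the min-to-max spread of repeated results, as a percent of the mean.
function variabilityPercent(results) {
  const mean = results.reduce((a, b) => a + b, 0) / results.length;
  const spread = (Math.max(...results) - Math.min(...results)) / 2;
  return (100 * spread) / mean;
}

// Append "(±X%!)" when the repetitions vary more than a threshold.
function formatScore(score, results, thresholdPct = 5) {
  const pct = variabilityPercent(results);
  return pct > thresholdPct ? `${score} (±${pct.toFixed(1)}%!)` : `${score}`;
}

console.log(formatScore(1000, [990, 1010])); // "1000" — 1% spread, fine
console.log(formatScore(1000, [800, 1200])); // "1000 (±20.0%!)"
```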
All the benchmarks here are from real-world C or C++ codebases:
All of these codebases are open source, so you can build and inspect them yourself (the build tool, Emscripten, is of course open source as well).
Note that the KLOC numbers mentioned above do not include system libraries like libc and libc++, the necessary parts of which are unavoidably included in the benchmarks.
Generally quite a while, as Massive is designed to execute fixed workloads of sufficient length to measure real-world performance on large applications. How long it takes will depend on the machine and browser, of course, but you can probably expect it to take at least a few minutes on a desktop or laptop machine; a mobile device may take much longer. Massive should not lock up your browser as it runs, however - except for the Main Thread Responsiveness tests, which run first, benchmarks are run in web workers (and even the Main Thread Responsiveness tests should not reduce responsiveness very much). Note that results of individual benchmarks show up when ready, so you can view those before all of Massive is complete.
Sure, use a URL like index.html?box2d-throughput,box2d-throughput-f32 to run just those benchmarks.
Some calculations have an "absolute optimal" result. For example, Variance measures how variable the frame rate is. If the frame rate is practically constant - no jumping around at all - then the result is the maximum score of 10,000. For practical reasons, there is an absolute threshold: in the case of Variance, anything under 5ms is considered perfect. This avoids large differences between results like 2ms and 4ms (double the variance in the second!), because 5ms is already so small as to be below the threshold of noticeability.
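The thresholding can be sketched as follows, using the numbers from the text (10,000 maximum, 5ms considered perfect); the falloff formula above the threshold is an assumption for illustration, not Massive's actual scoring:

```javascript
// Thresholded scoring: anything at or below thresholdMs counts as
// perfect; worse results fall off (here, an assumed inverse falloff).
function varianceScore(worstCaseMs, thresholdMs = 5, maxScore = 10000) {
  if (worstCaseMs <= thresholdMs) return maxScore; // below noticeability
  return (maxScore * thresholdMs) / worstCaseMs;   // assumed falloff
}

console.log(varianceScore(2));  // 10000 — 2ms and 4ms both score perfect,
console.log(varianceScore(4));  // 10000    despite one being double the other
console.log(varianceScore(10)); // 5000 (under the assumed falloff)
```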