Emscripten utilizes the SIMD.js specification to enable support for building SIMD code. While the specification does not expose direct access to native CPU SIMD instructions, it does offer the most basic operations that codebases commonly need. Given the nature of the SIMD.js specification, compiled SIMD code is platform-abstract and the generated .js files will run accelerated on both x86 SSE and ARM NEON enabled CPUs (and any other vector instruction sets that browsers might be targeting), although performance can vary.
For backwards compatibility, all generated SIMD code will embed a polyfill for all SIMD.js instructions. This allows Emscripten compiled pages to be run even on browsers that don’t have SIMD support available. However, due to the nature of the interoperation with asm.js, the SIMD.js polyfill will likely be much slower than scalar execution, so it is best to primarily target browsers that do support SIMD.js in hardware.
There are three different ways to generate code to benefit from SIMD instructions:
These three methods are not mutually exclusive, but may freely be combined.
- Emscripten does not support x86 or any other native inline SIMD assembly or building .s assembly files, but all code should be written to use SIMD intrinsic functions.
- The SIMD types supported by SIMD.js are Float32x4, Int32x4, Uint32x4, Int16x8, Uint16x8, Int8x16 and Uint8x16. In particular, Float64x2 and Int64x2 are currently not supported, however Float64x2 is emulated in software in the current polyfill. 256-bit or wider SIMD types (AVX) are not supported either.
- Even though the full set of SSE1, SSE2, SSE3 and SSSE3 intrinsics are supported, because of the platform-abstract nature of SIMD.js, some of these intrinsics will compile down to scalarized instructions to emulate. To verify which instructions are accelerated and which are not, examine the code in the platform headers xmmintrin.h and emmintrin.h.
- Currently the Intel x86 SIMD support is limited to SSE1, SSE2, SSE3 and SSSE3 instruction sets. The Intel x86 SSE4.1, SSE4.2, AVX, AVX2 and FMA instruction sets or newer are not supported. Also, the old Intel x86 MMX instruction set is not supported.
- SIMD.js does not have control over managing floating point rounding modes or handling denormals.
- Cache line prefetch instructions are not available, and calls to these functions will compile, but be treated as no-ops.
- Asymmetric memory fence operations are not available, but will be implemented as fully synchronous memory fences when SharedArrayBuffer is enabled (-s USE_PTHREADS=1) or as no-ops when multithreading is not enabled (default, -s USE_PTHREADS=0).
- Building ARM NEON -based intrinsics (#include <arm_neon.h>) code is not currently supported, but could be doable. Contributions on this front are welcome.
Emscripten repository has several tests for SIMD support. To run SIMD tests, execute e.g. “python tests/runner.py ALL.test_sse1_full” in Emscripten root directory. For the full list of tests, see test_simd* and test_sse* in test_core.py.
To run a synthetic SSE1 API benchmark, execute “python tests/benchmark_sse1.py” in Emscripten root directory.
SIMD-related bug reports are tracked in the Emscripten bug tracker with the label SIMD.