.. _Building-Projects:

=================
Building Projects
=================

Building large projects with Emscripten is very easy. Emscripten provides two simple scripts that configure your makefiles to use :ref:`emcc <emccdoc>` as a drop-in replacement for *gcc* — in most cases the rest of your project’s current build system remains unchanged.


.. _building-projects-build-system:

Integrating with a build system
===============================

To build using Emscripten you need to replace *gcc* with *emcc* in your makefiles. This is done using *emconfigure*, which sets the appropriate environment variables like CXX (the C++ compiler) and CC (the C compiler).

Consider the case where you normally build with the following commands:

.. code-block:: bash

  ./configure
  make

To build with Emscripten, you would instead use the following commands:

.. code-block:: bash

  # Run emconfigure with the normal configure command as an argument.
  ./emconfigure ./configure

  # Run emmake with the normal make to generate linked LLVM bitcode.
  ./emmake make

  # Compile the linked code generated by make to JavaScript + WebAssembly.
  # 'project.o' should be replaced with the make output for your project, and
  # you may need to rename it if it isn't something emcc recognizes
  # (for example, it might have a different suffix like 'project.so' or
  # 'project.so.1', or no suffix like just 'project' for an executable).
  # If the project output is a library, you may need to add your 'main.c' file
  # here as well.
  # [-Ox] represents build optimisations (discussed in the next section).
  ./emcc [-Ox] project.o -o project.js


*emconfigure* is called with the normal *configure* as an argument (in *configure*-based build systems), and *emmake* with *make* as an argument. If your build system uses **CMake**, replace ``./configure`` with ``cmake .`` etc. in the above example. If your build system doesn't use configure or CMake, then you can omit the first step and just run ``make`` (although then you may need to edit the ``Makefile`` manually).

.. tip:: We recommend you call both *emconfigure* and *emmake* scripts in *configure*- and *CMake*-based build systems. Whether you actually **need** to call both tools depends on the build system (some systems will store the environment variables in the configure step, and others will not).

*Make* generates linked LLVM bitcode. It does not automatically generate JavaScript during linking because all the files must be compiled using the :ref:`same optimizations and compiler options <building-projects-optimizations>` — and it makes sense to do this in the final conversion from bitcode to JavaScript.

.. note::

  The file output from *make* might have a different suffix: **.a** for a static
  library archive, **.so** for a shared library, **.o** or **.bc** for object
  files (these file extensions are the same as *gcc* would use for the different
  types). Irrespective of the file extension, these files contain something that
  *emcc* can compile into the final JavaScript + WebAssembly (typically the
  contents will be wasm object files, but if you build with LTO then they will
  contain LLVM bitcode). If the suffix is something else - like no suffix at all, or
  something like **.so.1** - then you may need to rename the file before sending
  it to *emcc*.

.. note::

  Some build systems may not properly emit bitcode using the above procedure,
  and you may see ``is not a valid input file`` warnings. You can run ``file`` to
  check what a file contains (also you can manually check if the contents
  start with ``\0asm`` to see if they are wasm object files, or ``BC`` if they
  are LLVM bitcode). It is also worth running ``emmake make VERBOSE=1`` which
  will print out the commands it runs - you should see *emcc* being used, and
  not the native system compiler. If *emcc* is not used, you may need to modify
  the configure or cmake scripts.
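
The magic-byte check described in the note above can be scripted. The following is a minimal sketch, not part of Emscripten: ``classify_object`` is a hypothetical helper name, and it assumes ``head -c`` and ``od`` are available in your shell. As an addition to what the note lists, it also recognizes the ``!<arch>`` header that *ar* archives start with.

.. code-block:: bash

  # Hypothetical helper (not an Emscripten tool): report what kind of file
  # a build output is, based on its first four bytes.
  classify_object() {
    # Dump the first four bytes as one hex string, e.g. "0061736d".
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
      0061736d) echo "wasm" ;;      # "\0asm": WebAssembly object or module
      4243*)    echo "bitcode" ;;   # "BC": LLVM bitcode
      213c6172) echo "archive" ;;   # "!<ar": start of an ar archive header
      *)        echo "unknown" ;;
    esac
  }

Running ``classify_object`` on each file that *make* produced is a quick way to spot outputs that were built by the native compiler instead of *emcc*, since those fall into the ``unknown`` case.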


.. _building-projects-build-outputs:

Emscripten build output files
=============================

Emscripten compiler output often consists of several files and not just one. The set of produced files changes depending on the final linker flags passed to `emcc/em++`. Here is a cheat sheet of which files are produced under which conditions:

 - `emcc ... -o output.html` builds an `output.html` file as an output, as well as an accompanying `output.js` launcher file, and an `output.wasm` WebAssembly file.
 - `emcc ... -o output.js` omits generating an HTML launcher file (expecting you to provide it yourself if you plan to run in a browser), and produces two files, `output.js` and `output.wasm`, which can be run in, for example, a node.js shell.
 - `emcc ... -o output.bc` does not produce a final asm.js/wasm build, but stops at the LLVM bitcode generation stage, and produces a single LLVM bitcode file `output.bc`. Likewise, only one bitcode file is produced if the output suffix is `.ll`, `.o`, `.obj`, `.lo`, `.lib`, `.dylib` or `.so`.
 - `emcc ... -o output.a` generates a single archive file `output.a`.
 - `emcc ... -o output.{html,js} -s WASM=0` causes the compiler to target asm.js, and therefore a `.wasm` file is not produced.
 - `emcc ... -o output.{html,js} -s WASM=0 --separate-asm` likewise targets asm.js, but splits up the generated code to two files, `output.js` and `output.asm.js`.
 - `emcc ... -o output.{html,js} --emit-symbol-map` produces a file `output.{html,js}.symbols` if WebAssembly is being targeted (`-s WASM=0` not specified), or if asm.js is being targeted and `-Os`, `-Oz` or `-O2` or higher is specified, but debug level setting is `-g1` or lower (i.e. if symbols minification did occur).
 - `emcc ... -o output.{html,js} -s WASM=0 --memory-init-file 1` causes the generation of `output.{html,js}.mem` memory initializer file. Passing `-O2`, `-Os` or `-Oz` also implies `--memory-init-file 1`.
 - `emcc ... -o output.{html,js} -g4` generates a source map file `output.wasm.map`. If targeting asm.js with `-s WASM=0`, the filename is `output.{html,js}.map`.
 - `emcc ... -o output.{html,js} --preload-file xxx` directive generates a preloaded MEMFS filesystem file `output.data`.
 - `emcc ... -o output.{html,js} -s WASM={0,1} -s SINGLE_FILE=1` merges JavaScript and WebAssembly code in the single output file `output.{html,js}` (in base64) to produce only one file for deployment. (If paired with `--preload-file`, the preloaded `.data` file still exists as a separate file)

This list is not exhaustive, but illustrates the most commonly used combinations.
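
The first rows of the cheat sheet can be turned into a quick post-build check. The sketch below only covers a default WebAssembly build with no extra flags such as ``-s WASM=0`` or ``--preload-file``; ``expected_outputs`` is a hypothetical name, not an Emscripten command.

.. code-block:: bash

  # Hypothetical helper: list the files a default (WebAssembly) build is
  # expected to produce for a given -o target, following the cheat sheet.
  expected_outputs() {
    target=$1
    base=${target%.*}   # strip the final suffix, e.g. "output.html" -> "output"
    case "$target" in
      *.html) echo "$base.html $base.js $base.wasm" ;;
      *.js)   echo "$base.js $base.wasm" ;;
      *)      echo "$target" ;;   # .bc, .o, .a and similar: a single file
    esac
  }

You could loop over the result after a build and test each name with ``[ -e "$f" ]`` to confirm that nothing is missing before deploying.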

.. _building-projects-optimizations:

Building projects with optimizations
====================================

Emscripten performs :ref:`compiler optimization <Optimizing-Code>` at two levels: each source file is optimized by LLVM as it is compiled into an object file, and then JavaScript/WebAssembly-specific optimizations are applied when converting object files into the final JavaScript/WebAssembly.

In order to properly optimize code, it is usually best to use the **same** :ref:`optimization flags <emcc-compiler-optimization-options>` and other :ref:`compiler options <emcc-s-option-value>` when compiling source to object code, and object code to JavaScript (or HTML).

Consider the examples below:

.. code-block:: bash

  # Sub-optimal - JavaScript/WebAssembly optimizations are omitted
  ./emcc -O2 a.cpp -c -o a.o
  ./emcc -O2 b.cpp -c -o b.o
  ./emcc a.o b.o -o project.js

  # Sub-optimal - LLVM optimizations omitted
  ./emcc a.cpp -c -o a.o
  ./emcc b.cpp -c -o b.o
  ./emcc -O2 a.o b.o -o project.js

  # Usually the right thing: The same options are provided at compile and link.
  ./emcc -O2 a.cpp -c -o a.o
  ./emcc -O2 b.cpp -c -o b.o
  ./emcc -O2 a.o b.o -o project.js
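
One way to keep the compile and link options identical is to pass them through a single argument. This is only a sketch: ``build_with_flags`` and the ``EMCC`` override variable are illustrative names, not Emscripten features, and the helper assumes file names without spaces.

.. code-block:: bash

  # Hypothetical wrapper: compile every source with the same flag, then link
  # the resulting objects with that same flag.
  # Usage: build_with_flags <flag> <output> <source>...
  build_with_flags() {
    opt=$1
    out=$2
    shift 2
    objs=""
    for src in "$@"; do
      obj="${src%.*}.o"
      # Compile step: source to object file.
      "${EMCC:-emcc}" $opt "$src" -c -o "$obj" || return 1
      objs="$objs $obj"
    done
    # Link step: object files to the final output, using the same flag.
    "${EMCC:-emcc}" $opt $objs -o "$out"
  }

Calling ``build_with_flags -O2 project.js a.cpp b.cpp`` issues the same three commands as the last example above.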

However, sometimes you may want slightly different optimizations on certain files:

.. code-block:: bash

  # Optimize the first file for size, and the rest using `-O2`.
  ./emcc -Oz a.cpp -c -o a.o
  ./emcc -O2 b.cpp -c -o b.o
  ./emcc -O2 a.o b.o -o project.js

.. note:: Unfortunately each build-system defines its own mechanisms for setting compiler and optimization methods. **You will need to work out the correct approach to set the LLVM optimization flags for your system**.

  - Some build systems have a flag like ``./configure --enable-optimize``.
  - You can control whether LLVM optimizations are run using ``--llvm-opts N`` where N is an integer in the range 0-3. Sending ``-O2 --llvm-opts 0`` to *emcc* during all compilation stages will disable LLVM optimizations but utilize JavaScript optimizations. This can be useful when debugging a build failure.


JavaScript/WebAssembly optimizations are specified in the final step (sometimes called "link", as that step typically also links together a bunch of files that are all compiled together into one JavaScript/WebAssembly output). For example, to compile with :ref:`-O1 <emcc-O1>`:

.. code-block:: bash

  # Compile the linked bitcode to JavaScript with -O1 optimizations.
  ./emcc -O1 project.o -o project.js


.. _building-projects-debug:

Building projects with debug information
========================================

Building a project containing debug information requires that debug flags are specified for both the LLVM and JavaScript compilation phases.

To make *Clang* and LLVM emit debug information in the bitcode files you need to compile the sources with :ref:`-g <emcc-g>` (exactly the same as with :term:`clang` or *gcc* normally). To get *emcc* to include the debug information when compiling the bitcode to JavaScript, specify :ref:`-g <emcc-g>` or one of the ``-gN`` :ref:`debug level options <emcc-gN>`.

.. note:: Each build-system defines its own mechanisms for setting debug flags. **To get Clang to emit LLVM debug information, you will need to work out the correct approach for your system**.

  - Some build systems have a flag like ``./configure --enable-debug``.

The flags for emitting debug information when compiling from bitcode to JavaScript are specified as an *emcc* option in the final step:

.. code-block:: bash

  # Compile the linked bitcode to JavaScript.
  # -g or -gN can be used to set the debug level (N)
  ./emcc -g project.o -o project.js

For more general information, see the topic :ref:`Debugging`.


Using libraries
===============

Built-in support is available for a number of standard libraries: *libc*, *libc++* and *SDL*. These will automatically be linked when you compile code that uses them (you do not even need to add ``-lSDL``, but see below for more SDL-specific details).

If your project uses other libraries, for example `zlib <https://github.com/emscripten-core/emscripten/tree/master/tests/zlib>`_ or *glib*, you will need to build and link them. The normal approach is to build the libraries to bitcode and then compile library and main program bitcode together to JavaScript.

For example, consider the case where a project "project" uses a library "libstuff":

.. code-block:: bash

  # Compile libstuff to bitcode
  ./emconfigure ./configure
  ./emmake make

  # Compile project to bitcode
  ./emconfigure ./configure
  ./emmake make

  # Compile the library and code together to HTML
  emcc project.o libstuff.a -o final.html


Emscripten Ports
================

Emscripten Ports is a collection of useful libraries, ported to Emscripten. They reside `on GitHub <https://github.com/emscripten-ports>`_, and have integration support in *emcc*. When you request that a port be used, *emcc* will fetch it from the remote server, set it up and build it locally, link it with your project, and add the necessary include paths to your build commands. For example, SDL2 is in ports, and you can request that it be used with ``-s USE_SDL=2``:

.. code-block:: bash

  ./emcc tests/sdl2glshader.c -s USE_SDL=2 -s LEGACY_GL_EMULATION=1 -o sdl2.html

You should see some notifications about SDL2 being used, and built if it wasn't previously. You can then view ``sdl2.html`` in your browser.

.. note:: *SDL_image* has also been added to ports, use it with ``-s USE_SDL_IMAGE=2``. To see a list of all available ports, run ``emcc --show-ports``. For SDL2_image to be useful, you generally need to specify the image formats you are planning on using with e.g. ``-s SDL2_IMAGE_FORMATS='["bmp","png","xpm"]'`` (note: jpg support is not available yet as of Jun 22 2018 - libjpeg needs to be added to emscripten-ports). This will also ensure that ``IMG_Init`` works properly when you specify those formats. Alternatively, you can use ``emcc --use-preload-plugins`` and ``--preload-file`` your images, so the browser codecs decode them (see :ref:`preloading-files`). A code path in the SDL2_image port will then load them through :c:func:`emscripten_get_preloaded_image_data`, but your calls to ``IMG_Init`` with those image formats will fail: the images work through preloading, but ``IMG_Init`` only reports support for formats that are compiled in, not for formats that work only through preloading.

.. note:: *SDL_net* has also been added to ports, use it with ``-s USE_SDL_NET=2``. To see a list of all available ports, run ``emcc --show-ports``.

.. note:: Emscripten also has support for older SDL1, which is built-in. If you do not specify SDL2 as in the command above, then SDL1 is linked in and the SDL1 include paths are used. SDL1 has support for *sdl-config*, which is present in `system/bin <https://github.com/emscripten-core/emscripten/blob/master/system/bin/sdl-config>`_. Using the native *sdl-config* may result in compilation or missing-symbol errors. You will need to modify the build system to look for files in **emscripten/system** or **emscripten/system/bin** in order to use the Emscripten *sdl-config*.

Adding more ports
-----------------

Adding more ports is fairly easy. Basically, the steps are

 * Make sure the port is open source and has a suitable license.
 * Add it to emscripten-ports on github. The ports maintainers can create the repo and add the relevant developers to a team for that repo, so they have write access.
 * Add a script to handle it under ``tools/ports/`` (see existing code for examples) and use it in ``tools/ports/__init__.py``.
 * Add testing in the test suite.


Build system issues
===================

Build system self-execution
---------------------------

Some large projects generate executables and run them in order to generate input for later parts of the build process (for example, a parser may be built and then run on a grammar, which then generates C/C++ code that implements that grammar). This sort of build process causes problems when using Emscripten because you cannot directly run the code you are generating.

The simplest solution is usually to build the project twice: once natively, and once to JavaScript. When the JavaScript build procedure fails because a generated executable is not present, you can then copy that executable from the native build, and continue to build normally. This approach was successfully used for compiling Python (see `tests/python/readme.md <https://github.com/emscripten-core/emscripten/blob/master/tests/python/readme.md>`_ for more details).
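
The copy step can be scripted so that it is easy to repeat. The sketch below assumes the native and Emscripten builds live in two separate directory trees with the same layout; ``reuse_native_tool`` and all of the paths in the usage line are hypothetical.

.. code-block:: bash

  # Hypothetical helper: if a generated tool is missing from the Emscripten
  # build tree, copy the natively built one into the same relative location.
  # Usage: reuse_native_tool <native-build-dir> <emscripten-build-dir> <tool>
  reuse_native_tool() {
    native_dir=$1
    em_dir=$2
    tool=$3
    if [ ! -e "$em_dir/$tool" ]; then
      mkdir -p "$(dirname "$em_dir/$tool")"
      # -p keeps the executable bit and timestamps of the native binary.
      cp -p "$native_dir/$tool" "$em_dir/$tool"
    fi
  }

For instance, ``reuse_native_tool build-native build-js tools/gen_parser`` followed by re-running ``emmake make`` would let the build continue past the missing executable.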

In some cases it makes sense to modify the build scripts so that they build the generated executable natively. For example, this can be done by specifying two compilers in the build scripts, *emcc* and *gcc*, and using *gcc* just for generated executables. However, this can be more complicated than the previous solution because you need to modify the project build scripts, and you may have to work around cases where code is compiled and used both for the final result and for a generated executable.


Dynamic linking
---------------

Emscripten's goal is to generate the fastest and smallest possible code, so it focuses on generating a single JavaScript file for an entire project. Dynamic linking should therefore be avoided when possible.

By default, Emscripten ``.so`` files are the same as ``.bc`` or ``.o`` files, that is, they contain LLVM bitcode. Dynamic libraries that you specify in the final build stage (when generating JavaScript or HTML) are linked in as static libraries. *Emcc* ignores commands to dynamically link libraries when linking together bitcode (i.e., not in the final build stage). This is to ensure that the same dynamic library is not linked multiple times in intermediate build stages, which would result in duplicate symbol errors.

There is `experimental support <https://github.com/emscripten-core/emscripten/wiki/Linking>`_ for true dynamic libraries, loaded at runtime, either via dlopen or as a shared library. See that link for the details and limitations.


Configure may run checks that appear to fail
--------------------------------------------

Projects that use *configure*, *cmake*, or some other portable configuration method may run checks during the configure phase to verify that the toolchain and paths are set up properly. *Emcc* tries to get checks to pass where possible, but you may need to disable tests that fail due to a "false negative" (for example, tests that would pass in the final execution environment, but not in the shell during *configure*).

.. tip:: Ensure that if a check is disabled, the tested functionality does work. This might involve manually adding commands to the make files using a build system-specific method.

.. note:: In general *configure* is not a good match for a cross-compiler like Emscripten. *configure* is designed to build natively for the local setup, and works hard to find the native build system and the local system headers. With a cross-compiler, you are targeting a different system, and ignoring these headers etc.


Archive (.a) files
------------------

Emscripten supports **.a** archive files, which are bundles of object files. This is an old format for libraries, and it has special semantics - for example, the order of linking matters with **.a** files, but not with plain object files (in **.bc**, **.o** or **.so**). For the most part those special semantics should work in Emscripten. However, we support **.a** files using LLVM's tools, which have a few limitations.

The main limitation is that if you have multiple files in a single **.a** archive that have the same basename (for example, ``dir1/a.o, dir2/a.o``), then llvm-ar cannot access both of those files. Emscripten will attempt to work around this by adding a hash to the basename, but collisions are still possible in principle.
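
Before creating an archive you can check whether any of its members would share a basename. This is a small sketch using only standard shell utilities; ``duplicate_basenames`` is a hypothetical name.

.. code-block:: bash

  # Hypothetical helper: print each basename that occurs more than once
  # among the object file paths given as arguments.
  duplicate_basenames() {
    for path in "$@"; do
      basename "$path"
    done | sort | uniq -d
  }

With the example from the paragraph above, ``duplicate_basenames dir1/a.o dir2/a.o`` prints ``a.o``. Empty output means no two members collide.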

Where possible it is better to generate shared library files (**.so**) rather than archives (**.a**) — this is generally a simple change in your project's build system. Shared libraries are simpler, and are more predictable with respect to linking.


Manually using emcc
===================

The :ref:`Tutorial` showed how :ref:`emcc <emccdoc>` can be used to compile single files into JavaScript. *Emcc* can also be used in all the other ways you would expect of *gcc*:

::

  # Generate a.out.js from C++. Can also take .ll (LLVM assembly) or .bc (LLVM bitcode) as input
  ./emcc src.cpp

  # Generate src.o containing LLVM bitcode.
  ./emcc src.cpp -c

  # Generate result.js containing JavaScript.
  ./emcc src.cpp -o result.js

  # Generate result.o containing LLVM bitcode (the suffix matters).
  ./emcc src.cpp -c -o result.o

  # Generate a.out.js from two C++ sources.
  ./emcc src1.cpp src2.cpp

  # Generate src1.o and src2.o, containing LLVM bitcode
  ./emcc src1.cpp src2.cpp -c

  # Combine two LLVM bitcode files into a.out.js
  ./emcc src1.o src2.o

  # Combine two LLVM bitcode files into another LLVM bitcode file
  ./emcc src1.o src2.o -o combined.o

In addition to the capabilities it shares with *gcc*, *emcc* supports options to optimize code, control what debug information is emitted, generate HTML and other output formats, etc. These options are documented in the :ref:`emcc tool reference <emccdoc>` (``./emcc --help`` on the command line).


Detecting Emscripten in Preprocessor
====================================

Emscripten provides the following preprocessor macros that can be used to identify the compiler version and platform:

 * The preprocessor define ``__EMSCRIPTEN__`` is always defined when compiling programs with Emscripten.
 * The preprocessor variables ``__EMSCRIPTEN_major__``, ``__EMSCRIPTEN_minor__`` and ``__EMSCRIPTEN_tiny__`` specify, as integers, the currently used Emscripten compiler version.
 * Emscripten behaves like a variant of Unix, so the preprocessor defines ``unix``, ``__unix`` and ``__unix__`` are always present when compiling code with Emscripten.
 * Emscripten uses Clang/LLVM as its underlying codegen compiler, so the preprocessor defines ``__llvm__`` and ``__clang__`` are defined, and the preprocessor defines ``__clang_major__``, ``__clang_minor__`` and ``__clang_patchlevel__`` indicate the version of Clang that is used.
 * Clang/LLVM is GCC-compatible, so the preprocessor defines ``__GNUC__``, ``__GNUC_MINOR__`` and ``__GNUC_PATCHLEVEL__`` are also defined to represent the level of GCC compatibility that Clang/LLVM provides.
 * The preprocessor string ``__VERSION__`` indicates the GCC compatible version, which is expanded to also show Emscripten version information.
 * Likewise, ``__clang_version__`` is present and indicates both Emscripten and LLVM version information.
 * Emscripten is a 32-bit platform, so ``size_t`` is a 32-bit unsigned integer, ``__POINTER_WIDTH__=32``, ``__SIZEOF_LONG__=4`` and ``__LONG_MAX__`` equals ``2147483647L``.
 * When targeting asm.js, the preprocessor defines ``__asmjs`` and ``__asmjs__`` are present.
 * When targeting SSEx SIMD APIs using one of the command line compiler flags ``-msse``, ``-msse2``, ``-msse3``, ``-mssse3``, or ``-msse4.1``, one or more of the preprocessor flags ``__SSE__``, ``__SSE2__``, ``__SSE3__``, ``__SSSE3__``, ``__SSE4_1__`` will be present to indicate available support for these instruction sets.
 * If targeting the pthreads multithreading support with the compiler & linker flag ``-s USE_PTHREADS=1``, the preprocessor define ``__EMSCRIPTEN_PTHREADS__`` will be present.
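
To see which of these macros your installation actually defines, you can ask the compiler to dump its predefined macros. The sketch below assumes that *emcc* forwards the standard Clang options ``-E`` and ``-dM`` unchanged; ``emscripten_macros`` and the ``EMCC`` override variable are illustrative names.

.. code-block:: bash

  # Hypothetical helper: print the predefined macros whose names mention
  # Emscripten, by preprocessing an empty C file.
  emscripten_macros() {
    probe=$(mktemp -d) || return 1
    : > "$probe/probe.c"   # create an empty source file
    # -E stops after preprocessing; -dM prints macro definitions instead of
    # the preprocessed source.
    "${EMCC:-emcc}" -dM -E "$probe/probe.c" | grep -i 'emscripten'
    status=$?
    rm -rf "$probe"
    return $status
  }

Changing the ``grep`` pattern (for example to ``__unix``, ``__clang`` or ``__SSE``) lets you inspect the other groups of macros listed above.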


Examples / test code
====================

The Emscripten test suite (`tests/runner.py <https://github.com/emscripten-core/emscripten/blob/master/tests/runner.py>`_) contains a number of good examples — large C/C++ projects that are built using their normal build systems as described above: `freetype <https://github.com/emscripten-core/emscripten/tree/master/tests/freetype>`_, `openjpeg <https://github.com/emscripten-core/emscripten/tree/master/tests/openjpeg>`_, `zlib <https://github.com/emscripten-core/emscripten/tree/master/tests/zlib>`_, `bullet <https://github.com/emscripten-core/emscripten/tree/master/tests/bullet>`_ and `poppler <https://github.com/emscripten-core/emscripten/tree/master/tests/poppler>`_.

It is also worth looking at the build scripts in the `ammo.js <https://github.com/kripken/ammo.js/blob/master/make.py>`_ project.


Troubleshooting
===============

- Make sure to use ``emar`` (which calls ``llvm-ar``), as the system ``ar`` may
  not support our object files. ``emmake`` and ``emconfigure`` set the AR
  environment variable correctly, but a build system might incorrectly hardcode
  ``ar``.
- Similarly, using the system ``ranlib`` instead of ``emranlib`` (which calls
  ``llvm-ranlib``) may lead to problems, like not supporting our object files
  and removing the index, leading to
  ``archive has no index; run ranlib to add one`` from ``wasm-ld``. Again, using
  ``emmake``/``emconfigure`` should avoid this by setting the env var RANLIB,
  but a build system might have it hardcoded, or require you to
  `pass an option <https://github.com/emscripten-core/emscripten/issues/9705#issuecomment-548199052>`_.
- The compilation error ``multiply defined symbol`` indicates that the project has linked a particular static library multiple times. The project will need to be changed so that the problem library is linked only once.

  .. note:: You can use ``llvm-nm`` to see which symbols are defined in each bitcode file.

  One solution is to use the `Dynamic linking`_ approach described above. This ensures that libraries are linked only once, in the final build stage.
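
To find out whether a makefile hardcodes the archiver tools, you can search it for direct assignments. This is a rough sketch: ``find_hardcoded_archiver`` is a hypothetical name, and the pattern only catches simple ``AR = ar`` or ``RANLIB := ranlib`` style lines.

.. code-block:: bash

  # Hypothetical helper: print (with line numbers) makefile lines that assign
  # the plain system "ar" or "ranlib" to AR or RANLIB.
  # Returns non-zero when no such line is found.
  find_hardcoded_archiver() {
    grep -nE '^[[:space:]]*(AR|RANLIB)[[:space:]]*:?=[[:space:]]*(ar|ranlib)([[:space:]]|$)' "$1"
  }

A match suggests that the tool name is fixed in the makefile, in which case the value exported by ``emmake`` may be ignored and the line needs to be edited or overridden.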