Synchronous Virtual XHR Backed File System Usage

Emscripten supports lazy loading of binary data from HTTP servers using XHR. This functionality can be used to create a backend for synchronous file access from compiled code.

The backend can improve startup time because the whole file system does not need to be preloaded before compiled code is run. It can also be very efficient if the web server supports byte serving: in this case Emscripten can read just the parts of files that are actually needed.

Warning

This mechanism is only possible in Web Workers (due to browser limitations).

Note

If byte serving is not supported then Emscripten will have to load the whole file (however big) even if only a single byte is read.
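To make the benefit of byte serving concrete, the sketch below computes the HTTP Range header a client could send to fetch just one chunk of a remote file. The helper name, its parameters, and the 1 MiB default chunk size are assumptions chosen for illustration; they are not part of Emscripten's API.

```javascript
// Hypothetical helper (not an Emscripten API): build the Range header value
// needed to fetch chunk number `chunkNum` of a file of `fileLength` bytes.
// The default chunk size of 1 MiB is an assumption for this example.
function rangeHeaderForChunk(chunkNum, fileLength, chunkSize = 1024 * 1024) {
  if (!Number.isInteger(chunkNum) || chunkNum < 0) {
    throw new RangeError("chunkNum must be a non-negative integer");
  }
  const from = chunkNum * chunkSize;
  if (from >= fileLength) {
    throw new RangeError("chunk is past the end of the file");
  }
  // HTTP byte ranges are inclusive at both ends, so clamp to the last byte.
  const to = Math.min(from + chunkSize, fileLength) - 1;
  return `bytes=${from}-${to}`;
}

// Reading a single byte at offset 3,000,000 of a 10,000,000-byte file
// only requires the chunk that contains that offset.
const offset = 3000000;
const chunk = Math.floor(offset / (1024 * 1024));
console.log(chunk, rangeHeaderForChunk(chunk, 10000000));
// prints: 2 bytes=2097152-3145727
```

With byte serving, only that one range crosses the network; without it, the server would answer with all 10,000,000 bytes.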

Test code

An example of how to implement a synchronous virtual XHR backed file system is provided in the test code at test/test_browser.py (see test_chunked_synchronous_xhr). The test case also contains an HTTP server (see test_chunked_synchronous_xhr_server) showing CORS headers that might need to be set (if the resources are hosted from the same domain Emscripten runs from, there is no issue).
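As a rough illustration of the server side, the sketch below describes the status code and headers a byte-serving server on a different origin might return for a given Range request. It is not the test server from test_browser.py: the function name, the returned object shape, and the exact set of exposed headers are assumptions, and a real deployment may need different CORS settings.

```javascript
// Hypothetical helper (not taken from the Emscripten test server): describe
// the response to a request carrying `rangeHeader`, for a file of
// `totalLength` bytes, when the page is served from `allowedOrigin`.
function describeRangeResponse(rangeHeader, totalLength, allowedOrigin) {
  const headers = {
    // Let pages from the given origin read the response.
    "Access-Control-Allow-Origin": allowedOrigin,
    // Assumption: expose the headers a chunked loader is likely to inspect.
    "Access-Control-Expose-Headers": "Content-Length, Content-Range, Accept-Ranges",
    "Accept-Ranges": "bytes",
    "Content-Type": "application/octet-stream",
  };
  const match = /^bytes=(\d+)-(\d+)$/.exec(rangeHeader || "");
  if (!match) {
    // No (or unsupported) Range header: the whole file is sent.
    headers["Content-Length"] = String(totalLength);
    return { status: 200, start: 0, end: totalLength - 1, headers };
  }
  const start = Number(match[1]);
  const end = Math.min(Number(match[2]), totalLength - 1);
  if (start > end) {
    // The requested range lies outside the file.
    headers["Content-Range"] = `bytes */${totalLength}`;
    return { status: 416, headers };
  }
  headers["Content-Range"] = `bytes ${start}-${end}/${totalLength}`;
  headers["Content-Length"] = String(end - start + 1);
  return { status: 206, start, end, headers };
}

console.log(describeRangeResponse("bytes=0-99", 1000, "http://localhost:8888"));
```

Keeping the header logic in a pure function like this makes it easy to check separately from whatever HTTP server actually sends the bytes.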

The tests use checksummer.c as the Emscripten-compiled program. This is simply a vanilla C program using synchronous libc file system calls like fopen(), fread(), fclose() etc.

JavaScript code is added (using emcc’s --pre-js option) to map the file system calls in checksummer.c to a file in the virtual file system. This file is created early in Emscripten initialisation using FS.createLazyFile(), but only loaded with content from the server when the file is first accessed by compiled code. The added JavaScript code also sets up communication between the web worker and the main thread.

Instructions

  1. You will need to add JavaScript to the generated code to map the file accessed by your compiled native code to the corresponding file on the server.

    The test code simply creates a file in the virtual file system using FS.createLazyFile() and sets the compiled code to use the same file (/bigfile):

          Module.arguments = ["/bigfile"];
          Module.preInit = () => {
            FS.createLazyFile('/', "bigfile", "http://localhost:11111/bogus_file_path", true, false);
          };

    Note

    • The compiled test code (in this case) gets the file name from command line arguments, which are set in Emscripten using Module.arguments.

    • The call to create the file is added to Module.preInit. This ensures that it is run before any compiled code.

    • The additional JavaScript is added using emcc’s --pre-js option.

  2. The added JavaScript should also include code to allow the web worker to communicate with the original thread.

    The test code adds the following JavaScript to the web worker for this purpose. It uses postMessage() to send its stdout back to the main thread.

          // doTrace is cleared after the first stderr message, so at most one stack trace is sent.
          var doTrace = true;
          Module.print = (s) => self.postMessage({channel: "stdout", line: s});
          Module.printErr = (s) => {
            self.postMessage({channel: "stderr", char: s, trace: ((doTrace && s === 10) ? new Error().stack : null)});
            doTrace = false;
          };

    Note

    If you use the above solution, the parent page should probably contain handwritten glue code to handle the stdout data.
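A minimal sketch of such glue code is given here, assuming messages shaped like the ones posted by the worker code above: an object with a channel field plus either a line field (stdout) or a char field (stderr). The createOutputCollector name and its return shape are hypothetical and only serve the example.

```javascript
// Hypothetical glue for the parent page (not part of Emscripten).
// Collects what the worker posts on the "stdout" and "stderr" channels.
function createOutputCollector() {
  const lines = { stdout: [], stderr: [] };
  return {
    lines,
    // Returns true if the message was one of ours, false otherwise.
    handle(event) {
      const data = event.data || {};
      if (data.channel !== "stdout" && data.channel !== "stderr") {
        return false;
      }
      // stdout messages carry `line`; the stderr messages above carry `char`.
      const text = data.line !== undefined ? data.line : data.char;
      lines[data.channel].push(String(text));
      return true;
    },
  };
}

const collector = createOutputCollector();
// In a browser this would be wired up as:
//   worker.onmessage = (event) => collector.handle(event);
collector.handle({ data: { channel: "stdout", line: "checksum done" } });
console.log(collector.lines.stdout);
```

Returning a boolean from handle() lets the page ignore unrelated messages without throwing, which is useful if the worker also posts other kinds of data.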

  3. You will need a page that spawns the web worker.

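    The test code that does this is shown below. The %s placeholders in the page are filled in by the Python test harness with the worker script name and the test server port: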

      @no_wasmfs('https://github.com/emscripten-core/emscripten/issues/19608')
      @no_firefox('keeps sending OPTIONS requests, and eventually errors')
      def test_chunked_synchronous_xhr(self):
        main = 'chunked_sync_xhr.html'
        worker_filename = "download_and_checksum_worker.js"
    
        create_file(main, r"""
          <!doctype html>
          <html>
          <head><meta charset="utf-8"><title>Chunked XHR</title></head>
          <body>
            Chunked XHR Web Worker Test
            <script>
              var worker = new Worker("%s");
              var buffer = [];
              worker.onmessage = (event) => {
                if (event.data.channel === "stdout") {
                  var xhr = new XMLHttpRequest();
                  xhr.open('GET', 'http://localhost:%s/report_result?' + event.data.line);
                  xhr.send();
                  setTimeout(function() { window.close() }, 1000);
                } else {
                  if (event.data.trace) event.data.trace.split("\n").map(function(v) { console.error(v); });
                  if (event.data.line) {
                    console.error(event.data.line);
                  } else {
                    var v = event.data.char;
                    if (v == 10) {
                      var line = buffer.splice(0);
                      console.error(line = line.map(function(charCode){return String.fromCharCode(charCode);}).join(''));
                    } else {
                      buffer.push(v);
                    }
                  }
                }
              };
            </script>
          </body>
          </html>