An Introduction to Low-Latency Scripting for Game Engines

May 22, 2024

The basics condensed into one very large document

libriscv is a mature RISC-V emulator that is currently being used in game engines. As far as I know, it is the only emulator that focuses solely on latency, and provides specialized solutions and tools to accomplish fast in-and-out function calls wrapped around a safe sandbox. It has much lower latencies than gold standard emulators.

Many people have asked how to use it, or how you can even think about using C++ for scripting. Shouldn’t it be very hard? The answer is: not really. I’ve been scripting for one big and one small game for several years now, and I’ve rarely felt that C++ or the scripting APIs were the reason for any trouble. I’ve been using Lua for years, and before that I used plain C. Today, modern idiomatic C++ hits the right spot for me. I can use the same language both inside and outside of the game engine, with many of the same abstractions (literally) and the same data structures, and C++ is also just very powerful overall. Obviously, people like different things and there are tradeoffs.

So, how do you actually use it?

This is (so far) a 5-part series: the first two parts are about scripting with C++, part 3 is about Nelua, part 4 is about Nim, and part 5 is about Rust.

It is of course possible to add tutorials for every single systems language, but the example gamedev repository will be very crowded in the end!

1. Import libriscv into your project

Importing a CMake library is fairly straightforward:

cmake_minimum_required(VERSION 3.14)
project(example LANGUAGES CXX)

include(FetchContent)
FetchContent_Declare(libriscv
  GIT_REPOSITORY https://github.com/fwsGonzo/libriscv
  GIT_TAG        master
  )
FetchContent_Declare(libfmt
  GIT_REPOSITORY https://github.com/fmtlib/fmt
  GIT_TAG        master
  )

FetchContent_MakeAvailable(libriscv)
FetchContent_MakeAvailable(libfmt)

add_executable(example example.cpp script.cpp)
target_link_libraries(example riscv fmt)

This will pull in the latest fmtlib and libriscv, and link them into our project executable. CMake also takes care of include directories and such. fmtlib gives us a nice way to print and format things. And libriscv is the sandbox. This guide follows the gamedev example from the libriscv repository. So, if you don’t want to try to piece this together yourself, the simple and advanced projects are in the libriscv repository under examples.

With this we can build and link on Linux, Mac and MinGW. I know that libriscv also builds in MSVC when using CMake, but I am unable to test this right now.

2. Install a RISC-V compiler

On Linux this is fairly straightforward. It’s usually a package. For example, on Ubuntu 20.04 I can write sudo apt install g++-10-riscv64-linux-gnu, or sudo apt install g++-12-riscv64-linux-gnu on Ubuntu 22.04. Looking at Launchpad I see that 24.04 has g++-14. If this is not available for you, you can also build it from source using my guide here. As a bonus for building from source, you will get the fastest compiler for my emulator, and the best results. On Windows you have WSL2, with the same packages as mentioned above. That is, it’s the same command.

3. Execute a sandboxed main() function

The emulator can be instantiated with just an in-memory binary:

#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include <libriscv/machine.hpp>
using namespace riscv;

int main(int argc, char** argv)
{
  if (argc < 2) {
    std::cout << argv[0] << ": [program file] [arguments ...]" << std::endl;
    return -1;
  }

  // Read the RISC-V program into a std::vector:
  std::ifstream stream(argv[1], std::ios::in | std::ios::binary);
  if (!stream) {
    std::cout << argv[1] << ": File not found?" << std::endl;
    return -1;
  }
  const std::vector<uint8_t> binary(
    (std::istreambuf_iterator<char>(stream)),
    std::istreambuf_iterator<char>());

  // Create a new 64-bit RISC-V machine with at most 64 MiB of memory
  Machine<RISCV64> machine{binary, {.memory_max = 64UL << 20}};
  ...

For a normal terminal program, this is almost enough to run through a simple hello world. We load the program from file into memory, and instantiate the emulator with the binary.

In order to be able to run various Linux/POSIX programs, we will call setup_linux(...), to set up the run-time environment with program arguments, environment variables and the aux-vector. And finally, we will call setup_linux_syscalls(false, false), which installs Linux-related system call handlers and disables filesystem and networking. It’s like enabling a strict sandbox mode:

// Use string vector as arguments to the RISC-V program
machine.setup_linux(
  {"micro", "Hello World!"},
  {"LC_TYPE=C", "LC_ALL=C", "USER=groot"});
machine.setup_linux_syscalls(false, false);

Now that the environment is there, we can just execute the program:

try {
  // Run through main(), but time out after 32 million instructions
  machine.simulate(32'000'000ull);
} catch (const std::exception& e) {
  std::cout << "Program error: " << e.what() << std::endl;
  return -1;
}

For an ordinary program, this is all that’s needed.

4. A simple example sandbox

Here is the full simple_example program:

#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include <libriscv/machine.hpp>
using namespace riscv;

int main(int argc, char** argv)
{
  if (argc < 2) {
    std::cout << argv[0] << ": [program file] [arguments ...]" << std::endl;
    return -1;
  }

  // Read the RISC-V program into a std::vector:
  std::ifstream stream(argv[1], std::ios::in | std::ios::binary);
  if (!stream) {
    std::cout << argv[1] << ": File not found?" << std::endl;
    return -1;
  }
  const std::vector<uint8_t> binary(
     (std::istreambuf_iterator<char>(stream)),
     std::istreambuf_iterator<char>());

  // Create a new 64-bit RISC-V machine
  Machine<RISCV64> machine{binary, {.memory_max = 64UL << 20}};

  // Use string vector as arguments to the RISC-V program
  machine.setup_linux(
    {"micro", "Hello World!"},
    {"LC_TYPE=C", "LC_ALL=C", "USER=groot"});
  machine.setup_linux_syscalls(false, false);

  try {
    // Run through main(), but time out after 32 million instructions
    machine.simulate(32'000'000ull);
  } catch (const std::exception& e) {
    std::cout << "Program error: " << e.what() << std::endl;
    return -1;
  }

  std::cout << "Program exited with status: " << machine.return_value<int>() << std::endl;
  return 0;
}

So, instantiate, setup, simulate.

5. Running a test program

Let’s build a simple hello world using any RISC-V compiler:

#include <stdexcept>
#include <iostream>

int main(int, char** argv)
{
    try {
        throw std::runtime_error(argv[1]);
    } catch (const std::exception& e) {
        std::cout << e.what() << std::endl;
        return 0;
    }
    return 1;
}

Build it statically, and then run simple_example test:

gamedev$ riscv64-linux-gnu-g++-10 -static -O2 test.cpp -o test
gamedev$ .build/simple_example test
Hello World!
Program exited with status: 0

It’s a stupid test, but it’s for figuring out if the run-time environment is golden 👌. It runs through main(), throws an exception with the second program argument, which is Hello World!, catches that and prints it to the terminal, then exits normally. This shows us that the environment is sane.

With this, we can implement scripting.

6. A VM function call

A VM function call is called a vmcall. Any public symbol can be called, but it’s easier for everyone if the function follows the standard RISC-V C calling convention and has an unmangled name. In C++ we can simply use extern "C" when implementing that function:

extern "C"
int my_function(const char* str)
{
    std::cout << str << std::endl;
    return 1234;
}

Adding this function and building test.cpp again, we can now call this function right after simulate():

machine.vmcall("my_function", "Hello Sandboxed World!");

std::cout << "Program exited with status: " << machine.return_value<int>() << std::endl;
return 0;

If we run the simple_example now, it should print Hello Sandboxed World! and say that the program exited with status: 1234, which is what we returned from the function call!

Hello World!
Hello Sandboxed World!
Program exited with status: 1234

This is not guaranteed to work, though, because we already returned from main(), which usually flushes and closes open files. In this instance it worked. It’s better to implement our own print function that calls write() directly, or even our own system call that prints. There are also several ways to avoid returning from main() at all.
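As a rough sketch of the write() suggestion, my_function could skip iostream entirely and write straight to standard output. This assumes a POSIX-style environment inside the sandbox and that the host still forwards writes to stdout; the -1 error value is my own addition for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <unistd.h>

// Same entry point as before, but printing with the unbuffered write()
// call, so it does not depend on iostream state after main() has returned.
extern "C"
int my_function(const char* str)
{
    const std::size_t len = std::strlen(str);
    const bool ok =
        write(STDOUT_FILENO, str, len) == static_cast<ssize_t>(len)
        && write(STDOUT_FILENO, "\n", 1) == 1;
    // 1234 matches the earlier example; -1 is a hypothetical error value
    return ok ? 1234 : -1;
}
```

The host side does not change: it is still a vmcall to my_function followed by reading the return value.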

Now that we have the ability to make a call into the program, how can we call out? Well, the usual way is through system calls, which we will not go into here. It’s a bunch of RISC-V assembly and such things. Instead, I will go through some helper functions that I wrote for the gamedev example. I wrote the helpers to be simple to use, not necessarily to be fully understood.

7. Callable host functions

In order to call out from the sandbox, and ask the game engine to do something for us, we need to move on from the simple example and go to the gamedev example, example.cpp.

In example.cpp we can see that there’s a reference to a ScriptCallable:

// ScriptCallable is a function that can be requested from the script
using ScriptCallable = std::function<void(Script&)>;
// A map of host functions that can be called from the script
static std::array<ScriptCallable, 64> g_script_functions {};
static void register_script_function(uint32_t number, ScriptCallable&& fn) {
  g_script_functions.at(number) = std::move(fn);
}

These are functions that the program inside the sandbox can call. So, when the sandboxed program is running, it can at any time say “Call number 1 with these arguments”. The meaning of each function is up to the API designer. That would be you! The numbers have no meaning of their own and are just used to identify a callable function.
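To make the idea concrete, here is a minimal, self-contained sketch of a numbered function table. The Script struct and the dispatch() function are hypothetical stand-ins, not the real classes from the gamedev example; only the table and register_script_function mirror the code above.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the example's Script class: it only carries
// one argument and one result, so the dispatch logic can be shown alone.
struct Script {
    int argument = 0;
    int result = 0;
};

using ScriptCallable = std::function<void(Script&)>;

// Fixed-size table indexed by call number, as in the gamedev example
static std::array<ScriptCallable, 64> g_script_functions {};

static void register_script_function(uint32_t number, ScriptCallable&& fn) {
    g_script_functions.at(number) = std::move(fn); // at() rejects numbers >= 64
}

// Hypothetical dispatcher: roughly what a host could do when the
// sandboxed program says "call number N".
static void dispatch(uint32_t number, Script& script) {
    auto& fn = g_script_functions.at(number);
    if (!fn)
        throw std::runtime_error("No handler registered for this call number");
    fn(script);
}
```

An empty std::function converts to false, which is what makes the "no handler has been set" check a one-liner here.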

Right inside main() in example.cpp we can see that the first callable is implemented like so:

// Register a custom function that can be called from the script
// This is the handler for dyncall1
register_script_function(1, [](Script& script) {
  auto [arg] = script.machine().sysargs<int>();

  fmt::print("dyncall1 called with argument: 0x{:x}\n", arg);

  script.machine().set_result(42);
});

So, the first callable takes a single integer argument (machine().sysargs<int>()), and it sets an integral result as some kind of return value (machine().set_result(42)), so it should be something like int myfunc(int arg) from inside the sandbox, right? Let’s have a look at script_program/program.cpp:

// A dynamic call for testing integer arguments and return values
DEFINE_DYNCALL(1, dyncall1, int(int));

Indeed it is: int(int) is the function type. It’s also being used in the program like this:

 // Call a function that was registered as a dynamic call
 const int result = dyncall1(0x12345678);
 printf("dyncall1(1) = %d\n", result);

The second callable seems to be taking a std::string_view and a std::string:

// This is the handler for dyncall2
register_script_function(2, [](Script& script) {
  // string_view consumes 2 argument registers: the first is the pointer, the second is the length
  // unlike std::string, which consumes only 1 register (zero-terminated string pointer)
  auto [view, str] = script.machine().sysargs<std::string_view, std::string>();

  fmt::print("dyncall2 called with arguments: '{}' and '{}'\n", view, str);
});

So, the first argument gives a zero-copy view into the program’s memory, while the str argument is a std::string, owning its data. It’s defined like this in the program:

// A dynamic call for testing string arguments
DEFINE_DYNCALL(2, dyncall2, void(const char*, size_t, const char*));

And called from main():

 // Call a function that passes a string (with length)
 dyncall2("Hello, Vieworld!", 16, "A zero-terminated string!");

For std::string_view to be fast, it must know the length, so we pass both the pointer to the string and the length. In other words, a std::string_view argument consumes two registers, while the zero-terminated string consumes only one.
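The register counting can be modelled without the emulator. The following is only a sketch: FakeGuest and ArgReader are made-up names, and the real sysargs implementation in libriscv works differently. It just shows why a view needs a pointer and a length, while a zero-terminated string needs only a pointer.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical model of a paused guest: flat memory plus the eight
// integer argument registers (a0..a7). Not libriscv's real types.
struct FakeGuest {
    std::vector<char> memory;
    std::array<uint64_t, 8> regs {};
};

// Reads arguments left to right and tracks how many registers were used
class ArgReader {
public:
    explicit ArgReader(const FakeGuest& g) : m_guest(g) {}

    // Pointer + length: two registers, no copy
    std::string_view next_view() {
        const uint64_t addr = next_reg();
        const uint64_t len  = next_reg();
        check_range(addr, len);
        return std::string_view(m_guest.memory.data() + addr, len);
    }

    // Zero-terminated pointer: one register, copied into an owning string
    std::string next_string() {
        const uint64_t addr = next_reg();
        check_range(addr, 0);
        std::size_t end = addr;
        while (end < m_guest.memory.size() && m_guest.memory[end] != '\0')
            ++end;
        if (end == m_guest.memory.size())
            throw std::runtime_error("Unterminated guest string");
        return std::string(m_guest.memory.data() + addr, end - addr);
    }

    std::size_t registers_used() const { return m_next; }

private:
    uint64_t next_reg() { return m_guest.regs.at(m_next++); }

    void check_range(uint64_t addr, uint64_t len) const {
        const uint64_t size = m_guest.memory.size();
        if (addr > size || len > size - addr)
            throw std::out_of_range("Guest address range is out of bounds");
    }

    const FakeGuest& m_guest;
    std::size_t m_next = 0;
};
```

With the dyncall2 arguments from above placed in the fake registers, next_view() followed by next_string() would leave registers_used() at 3.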

For plain data, there is another callable:

// This is the handler for dyncall_data
register_script_function(4, [](Script& script) {
  struct MyData {
   char buffer[32];
  };
  auto [data_span, data] = script.machine().sysargs<std::span<MyData>, const MyData*>();

  fmt::print("dyncall_data called with args: '{}' and '{}'\n", data_span[0].buffer, data->buffer);
});

So, this callable function seems to take a span<MyData> as argument, as well as a const MyData*. Since span<MyData> is dynamic, it must consume two arguments in order to know the pointer and the number of elements. const MyData* acts like a fixed-size span<MyData, 1>, and consumes only one argument. In both cases, the data is not copied, rather we are safely viewing the memory of the paused sandbox.

In the program, the callable is defined like so:

// A dynamic call that passes a view to complex data
struct MyData {
  char buffer[32];
};
DEFINE_DYNCALL(4, dyncall_data, void(const MyData*, size_t, const MyData&));

And it’s called from a test function:

PUBLIC(void test5())
{
  std::vector<MyData> vec;
  vec.push_back(MyData{ "Hello, World!" });
  MyData data = { "Second data!" };

  dyncall_data(vec.data(), vec.size(), data);
}

So, the dynamic std::span<MyData> did indeed require two arguments, while the fixed-size const MyData* only required one. That makes sense.

Running the program I get this on my machine:

dyncall1 called with argument: 0x12345678
dyncall2 called with arguments: 'Hello, Vieworld!' and 'A zero-terminated string!'
Hello, World from a RISC-V virtual machine!
dyncall1(1) = 42
>>> myscript initialized.
test1 returned: 10
Call overhead: 5ns
Benchmark: std::make_unique[1024] alloc+free  Elapsed time: 14ns
test1(1, 2, 3, 4)
Caught exception: Oh, no! An exception!
Data: 1 2 3 4 5.000000 6.000000 7.000000 8.000000 9.000000 10.000000 11.000000 12.000000 Hello, World!
Benchmark: Overhead of dynamic calls  Elapsed time: 4ns
dyncall_data called with args: 'Hello, World!' and 'Second data!'

Dynamic calls aren’t designed to have record-holding low latency, largely because of std::function. But it’s nice to have capture storage and a check that fails when no handler has been set. We like our convenience.

I hope that this example code was somewhat understandable. All these APIs are designed to be low-latency, and to require less scrutiny on the host side. As an example, if you read out a std::string on the host that was passed to you from the VM program, the host will immediately stop parsing if the string turns out to be very long. I believe the default limit is 16MB, but you can make up your own limits on a case-by-case basis. The same is true for every other method of viewing or extracting guest memory, with the exception of machine.copy_to_guest and machine.copy_from_guest. For those, you provide the lengths/limits yourself, similar to a memcpy.
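The string length limit can be pictured with a small sketch. This is not libriscv code: read_guest_string() and the flat memory vector are hypothetical, and the limit is whatever you pass in.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Read a zero-terminated string from guest memory, scanning at most
// max_len bytes before giving up. Name and signature are hypothetical.
std::string read_guest_string(const std::vector<char>& memory,
                              uint64_t addr, std::size_t max_len)
{
    if (addr >= memory.size())
        throw std::out_of_range("Guest string address is out of bounds");
    const std::size_t start = static_cast<std::size_t>(addr);
    const std::size_t available = memory.size() - start;
    const std::size_t limit = available < max_len ? available : max_len;
    for (std::size_t i = 0; i < limit; ++i) {
        if (memory[start + i] == '\0')
            return std::string(memory.data() + start, i);
    }
    // No terminator within the limit: refuse instead of reading on
    throw std::length_error("Guest string is too long or unterminated");
}
```

The point is that the host never trusts the guest to terminate its strings, so a hostile or buggy script cannot make the host scan or copy an unbounded amount of memory.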

Part 2 can be found here.

-gonzo