An Introduction to Low-Latency Scripting for Game Engines, Part 4

fwsGonzo
8 min read · May 25, 2024


Scripting in Nim, a systems language

Scripting with Nim

Nim can be transpiled to C or C++, which makes it easy to integrate into a custom run-time environment.

Setting up a Nim build system for RISC-V cross-compilation ended up being complicated, but doable. We have to detect the Nim system include path, extract the build artifacts, and build the final executable with our RISC-V compiler manually. However, once it was building, the rest was just remembering a few options and disabling threads. I also ended up porting the native malloc/free API to Nim, which I did with plain linker wrapping.

Building a RISC-V executable with Nim

Alright, so apparently just passing --cpu=riscv64 makes Nim assume a bunch of things. To avoid that, I decided to build manually: have Nim output C code, and then build that myself. Let’s go through the process:

  1. Detect Nim path, in order to find includes
# Detect nim libraries: find nim and replace /bin/nim with /lib
NIM_LIBS=`whereis nim`
NIM_LIBS="${NIM_LIBS##*: }"
NIM_LIBS="${NIM_LIBS/bin*/lib}"

I just used whereis nim to figure out where the includes are.

2. Nim outputs some C files into a nimcache folder. In order to create a single build folder I decided to just put it under .build

NIMCACHE=$PWD/nimcache
mkdir -p $NIMCACHE

3. Invoke nim with the nimcache, 64-bit RISC-V and Linux options, so that it outputs the C files

nim c --nimcache:$NIMCACHE $NIMCPU --colors:on --os:linux --mm:arc --threads:off -d:release -d:useMalloc=true -c ${NIMFILE}

jq '.compile[] [0]' $NIMCACHE/*.json -r > buildfiles.txt
files=""
for i in $(cat buildfiles.txt); do
files="$files $i"
done

Then I used jq to extract the build files and put them in a variable (without quotes). Notice the options I passed to Nim: I enable malloc (after all, it is accelerated to native performance) and disable threads. The rest is just preference.

With that we can now attempt to build the final executable with our RISC-V compiler:

$CC -static -O2 -ggdb3 -Wall -Wno-unused -Wno-maybe-uninitialized -Wno-discarded-qualifiers -Wl,--wrap=malloc,--wrap=free,--wrap=calloc,--wrap=realloc -I$NIM_LIBS -o $binfile $NIMAPI $files

It’s the usual static compilation with some warnings disabled. You don’t strictly need those flags, but they silence warnings from generated code we don’t control. The --wrap linker options allow us to take over heap allocations.
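To make that concrete, here is a minimal sketch of the wrapping idea, written in Nim purely for illustration (the article does this in the C API file, and sys_malloc/sys_free are hypothetical names for the host-accelerated helpers):

proc sys_malloc(size: csize_t): pointer {.importc.}
proc sys_free(p: pointer) {.importc.}

# With --wrap=malloc,--wrap=free the linker redirects malloc/free calls
# to __wrap_malloc/__wrap_free, which simply forward to the host helpers:
proc wrapMalloc(size: csize_t): pointer {.exportc: "__wrap_malloc".} =
  sys_malloc(size)

proc wrapFree(p: pointer) {.exportc: "__wrap_free".} =
  sys_free(p)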

As always, remember to edit the build.sh script to pick your local RISC-V compiler.

That concludes building and linking.

Program setup

Just like with Nelua, we have an API file and a program file. In the API file I export some functions using the star (*):

proc print*(content: string) =
  fast_write(1, content, len(content))

proc exit*(status: int) =
  fast_exit(status)

This adds the two basic functions for a minimal scripting run-time: write (to the terminal) and exit (stop running immediately).

These functions are implemented in api.nim. To use them we import api, after which we have access to, for example, api.exit(0).

We will use the exit function to prevent a normal return from main, which may close things up. Here’s the initial test program:

import api

api.print("Hello Nim World!\n")

var i = api.dyncall1(0x12345678)
api.print("i = " & $i & "\n")

api.exit(0)

So we’re printing to terminal with the minimal API, calling api.dyncall1 (which we will implement later), and we’re building a string using & and to-string ($):

api.print("i = " & $i & "\n")

The & will concatenate all the pieces together, and $i will turn the integer into a string. It works on pretty much everything.
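As a quick, engine-independent illustration, $ handles most built-in types out of the box:

let xs = @[1, 2, 3]
echo "xs = " & $xs      # xs = @[1, 2, 3]
echo "pi ~ " & $3.14    # pi ~ 3.14
echo "ok = " & $true    # ok = true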

As a final "does Nim _actually_ work?" test, we will use a JSON feature in Nim, turning objects directly into JSON and pretty-printing the result:

import std/json

var hisName = "John"
let herAge = 31
var j = %*
  [
    { "name": hisName, "age": 30 },
    { "name": "Susan", "age": herAge }
  ]

var j2 = %* {"name": "Isaac", "books": ["Robot Dreams"]}
j2["details"] = %* {"age":35, "pi":3.1415}
echo j2

With the expected output:

{"na":"Isaac","books":["Robot Dreams"],"details":{"age":35,"pi":3.1415}}

API creation

To be able to create callable functions in Nim we can use pragmas. There’s a pragma for what is essentially extern "C" called .exportc, and if you want to use a C function from Nim it’s .importc.
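A minimal illustration of both pragmas (host_log is a hypothetical C symbol here, not part of the article’s API):

# Export a Nim proc under an explicit C name so the host can find it:
proc my_entry(x: cint): cint {.exportc: "my_entry".} =
  return x * 2

# Import a C symbol so Nim code can call it:
proc host_log(msg: cstring) {.importc.}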

A callable function

To create a test1 function that can be called directly from the game engine, we can use the .exportc pragma:

proc test1(a: int, b: int, c: int, d: int): int {.exportc.} =
  api.print("test1 called with: " & $a & " " & $b & " " & $c & " " & $d & "\n")
  return a + b + c + d

Again, it matches the expectations in the game engine. Remember, this is how it was called in example.cpp:

// Create an event for the 'test1' function with 4 arguments and returns an int
Event<int(int, int, int, int)> test1(script, "test1");
if (auto ret = test1(1, 2, 3, 4))
    fmt::print("test1 returned: {}\n", *ret);
else
    throw std::runtime_error("Failed to call test1!?");

Continuing down the line, I implemented test2 like this:

proc test2() {.exportc.} =
  var data = alloc(1024)
  dealloc(data)
  return

So, test2 is used to benchmark heap allocations.

For test3 we are trying out exception handling. It actually didn’t work with goto-based exceptions when they were unhandled, and I had to switch to --exceptions:setjmp to see what I was doing wrong; once fixed, it worked and I could switch back to the default:

proc test3(str: cstring) {.exportc.} =
  try:
    raise newException(IOError, $str)
  except IOError as e:
    api.print("Test3 called with " & e.msg & "\n")

I accidentally caught the wrong exception, leading to it being unhandled.

Test3 called with Oh, no! An exception!
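As an aside, one way to guard against catching the wrong exception type is to also catch the catchable base class as a fallback. A small sketch of that variant (using the same api.print as above):

proc test3_safe(str: cstring) {.exportc.} =
  try:
    raise newException(IOError, $str)
  except IOError as e:
    api.print("Test3 called with " & e.msg & "\n")
  except CatchableError as e:
    api.print("Unexpected exception: " & e.msg & "\n")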

For the data test, things were fairly straightforward:

type
  Data* = object
    a: int32
    b: int32
    c: int32
    d: int32
    e: float32
    f: float32
    g: float32
    h: float32
    i: float64
    j: float64
    k: float64
    l: float64
    buffer: array[32, char]

proc test4(d: Data) {.exportc.} =
  var str = cast[cstring](addr d.buffer[0])
  api.print("test4 called with: " & $d.a & " " & $d.b & " " & $d.c & " " & $d.d & " " & $d.e & " " & $d.f & " " & $d.g & " " & $d.h & " " & $d.i & " " & $d.j & " " & $d.k & " " & $d.l & " " & $str & "\n")
  return

Reading from the fixed-size buffer was not simple, but the Nim Discord answered quickly and precisely. The dollar sign ($) is used to convert something into a regular string. For example, a cstring is just a pointer to a zero-terminated string in Nim, just like in C. In order to really use it, we should convert it using $str.

test4 called with: 1 2 3 4 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 Hello, World!
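The buffer-reading pattern used above, shown in isolation (plain Nim, no engine involved):

var buf: array[32, char]                # zero-initialized, so NUL-terminated
buf[0] = 'H'
buf[1] = 'i'
let view = cast[cstring](addr buf[0])   # view the buffer as a C string
echo $view                              # $ copies it into an owned Nim string: Hi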

Calling back out from inside the Nim program

The first host function was int(int), which we can declare like this in Nim:

proc dyncall1*(i: int): int {.importc}

This works because dyncall1 is defined in api.c, and a C function is essentially just a symbol name, so we get to decide its type on the Nim side. We simply match it with the expectations in the game engine. Nim’s integers may be 64-bit, but when the engine reads the value, it’s just a register-sized integral type; if it then casts it to int, it gets truncated to 32 bits. So it’s safe to use Nim’s int type in this case.
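If you prefer to make the width explicit on the guest side, the same symbol could just as well be imported with C integer types. This is only an alternative declaration for illustration, not what the article uses:

proc dyncall1_explicit(i: cint): cint {.importc: "dyncall1".}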

For benchmarking the host function call overhead, call number 3 is used like in the other programs:

proc bench_dyncall_overhead() {.exportc.} =
  api.dyncall3()

I measured it to be 6ns, which I believe is the same as with C++.

In the final test we call back into the engine with some fixed-size data:

proc test5() {.exportc.} =
  var data: MyData
  data.buffer[0..20] = "Hello Buffered World!".toOpenArray(0, 20)
  api.dyncall4(addr data, 1, addr data)

Apparently we can fill a 32-byte string buffer from a string almost directly. With that last test out of the way, we have our final program.

type
  MyData* = object
    buffer*: array[32, cchar]

proc dyncall4*(d1: ptr MyData, s1: csize_t, d2: ptr MyData) {.importc}

Creating the callback into the game engine was simple enough.
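As an aside, an equivalent and perhaps more explicit way to fill the fixed-size buffer is copyMem; a small sketch:

var data: MyData
let msg = "Hello Buffered World!"
copyMem(addr data.buffer[0], unsafeAddr msg[0], msg.len)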

The final executable ended up being 380kb. Its size is mostly debug info (103kb stripped). Let’s test it:

Hello Nim World!
dyncall1 called with argument: 0x12345678
i = 42
>>> myscript initialized.
test1 called with: 1 2 3 4
test1 returned: 10
Call overhead: 11ns
Benchmark: std::make_unique[1024] alloc+free Elapsed time: 24ns
Test3 called with Oh, no! An exception!
test4 called with: 1 2 3 4 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 Hello, World!
Benchmark: Overhead of dynamic calls Elapsed time: 6ns
dyncall_data called with args: 'Hello Buffered World!' and 'Hello Buffered World!'

The output seems to ~match the other programs we made. And the performance looks quite good, too. That’s nice!

Finally, in order to test whether the Nim program was really using the native-performance heap functions, I disabled them and ran the program again, and I got this:

Benchmark: std::make_unique[1024] alloc+free  Elapsed time: 136ns

So, when heap allocation is native, the alloc/free test runs in ~18% of the regular time. Quite good!

Debugging Nim

Nim is more complex than Nelua, and it’s probably nice to have at least basic remote debugging with GDB.

I don’t know how to make this smaller

Nim embeds its own debug-info pretty printers in the program when it’s built with --debugger:native. Debugging is then just a matter of jumping to the function we want to debug and instantiating the RSP server. Once the server is listening, we start GDB with the Nim program as argument and make GDB connect remotely to the waiting program.

// If GDB=1, start the RSP server for debugging
if (getenv("GDB"))
{
    // Setup the test6 function to be debugged from GDB
    // This is a vmcall that doesn't start running:
    script.machine().setup_call();
    script.machine().cpu.jump(script.machine().address_of("remote_debug_test"));
    // Start the RSP server on port 2159
    fmt::print("Waiting for GDB to connect on port 2159...\n");
    riscv::RSP rsp(script.machine(), 2159);
    if (auto client = rsp.accept(); client) {
        fmt::print("GDB connected\n");
        // From here on GDB (through you) is in charge:
        while (client->process_one());
        fmt::print("GDB session ended\n");
    } else {
        fmt::print("Failed to accept GDB connection. Waited too long?\n");
    }
}

A remote GDB session is started when the program is run with GDB=1. We are making a manual vmcall using setup_call(args...) followed by a jump to the remote_debug_test function. That is essentially a vmcall; we’re doing it manually so that it doesn’t actually start running and finish before we can debug it! Start multiarch GDB with the Nim program, and then connect remotely like so:

$ gdb-multiarch .build/output.elf
(gdb) target remote :2159

The only unexpected thing I found was that I had to disable stack traces from exceptions when debugging, with --stackTrace:off. Something to look into later, but otherwise it works great!

So, I think we can conclude that Nim works exceptionally well as a scripting language. It doesn’t require much work to define functions the engine can call in the program, and calling back out the other way is equally simple. The build settings are a roadblock, but just use my build script. It works.

-gonzo
