Using C++ as a scripting language, part 3

fwsGonzo
Nov 10, 2020

Extending the script with statically type-checked functions

If you’re wondering what’s going on here, check out my previous musings on this subject: Using Fast Virtual Machines In A Game Engine

These writings have become continuously more complex, so I don’t blame anyone for getting lost in the sea of words below.

Dynamically extending guest functionality

If we are going to add a group of functions to a guest dynamically, we have to make sure that the overhead is low and that there is no room for collisions with other functionality. If you extend the guest environment by adding a new system call, the system call number you choose must be unique, and both sides have to agree on it somehow. You are also stuck with that number for this feature, forever. But what if you could just pick any free system call number, and both sides would use it without knowing in advance which one it was?

This is possible, and I did implement a solution that took free system call numbers and wrote them into guest memory, forming trampoline functions for the new functionality. It turns out, however, that simply using one system call backed by a hash map is faster. Oh well. The reason is fairly straightforward: when you assign a memory range to a certain purpose and then split it into group IDs and indexes within those groups, you still have to do a lookup. You might as well just look up a hash.
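Here is a minimal host-side sketch of that single-system-call approach. It assumes a RISC-V-style register file where the name hash arrives in a0 (x10) and the handler's own arguments start at a1 (x11); Machine, DynamicHandler and DynamicCalls are hypothetical stand-ins, not the engine's actual types.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <unordered_map>
#include <utility>

// Hypothetical guest machine: only the integer register file matters here.
struct Machine {
    uint64_t regs[32] = {};
};

using DynamicHandler = std::function<void(Machine&)>;

// One shared system call, backed by a hash map from name hash to handler.
class DynamicCalls {
public:
    // Register a handler under a precomputed name hash. A second handler with
    // the same hash is rejected instead of silently replacing the first.
    void add(uint32_t hash, DynamicHandler handler) {
        const bool inserted = m_table.try_emplace(hash, std::move(handler)).second;
        if (!inserted)
            throw std::runtime_error("dynamic call: hash collision");
    }

    // The system call handler: read the hash from a0 (x10) and dispatch.
    // The selected handler reads its own arguments from a1 (x11) onward.
    void syscall(Machine& m) const {
        const auto hash = static_cast<uint32_t>(m.regs[10]);
        const auto it = m_table.find(hash);
        if (it == m_table.end())
            throw std::runtime_error("dynamic call: no handler registered");
        it->second(m);
    }

private:
    std::unordered_map<uint32_t, DynamicHandler> m_table;
};
```

The whole mechanism costs one hash lookup per call, and a name that hashes to an already registered key fails loudly at registration time rather than at call time.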

So, let’s have a look at the feature itself. We need to be able to define a function by type, and give it a name. Let’s look at the simplest example: Stopping a timer.

inline void Timer::stop() const {
    constexpr Call<void(int)> stop_timer {"timer_stop"};
    stop_timer(this->id);
}

We define a callable object, stop_timer, which performs the dynamic call named “timer_stop”. Its function type is given by the template parameter. Calling it with an integer stops the given timer, provided the game engine has a handler registered for “timer_stop”.

On the host side we hook up a handler for “timer_stop” like so:

script.set_dynamic_call("timer_stop",
    [&] (Script& script) {
        const auto [timer_id] = script.machine().sysargs<int>();
        timers.stop(timer_id);
    });

Notice that timers is captured by reference. The sysargs template call takes each template argument, works out which register it must come from based on the C calling convention, and places it at the corresponding index of the returned tuple. All of this is resolved at compile time, so there is zero overhead.
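To illustrate how a sysargs-style helper can map template arguments to registers at compile time, here is a sketch over a hypothetical register file. It assumes RISC-V's integer argument registers a0..a7 (x10..x17), arguments starting at a0, and integer arguments only; Regs, read_arg and this free-function sysargs are illustrative names, not the engine's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical guest register file (x0..x31).
struct Regs {
    uint64_t x[32] = {};
};

// Argument N lives in register a0+N, i.e. x[10 + N]. The register index is a
// template parameter, so no decision is made at runtime.
template <typename T, std::size_t N>
T read_arg(const Regs& r) {
    static_assert(std::is_integral_v<T>, "sketch handles integer arguments only");
    static_assert(N < 8, "sketch handles register arguments (a0..a7) only");
    return static_cast<T>(r.x[10 + N]);
}

// Expand Args and their indices in lockstep: tuple{ read_arg<Arg0, 0>, ... }.
template <typename... Args, std::size_t... I>
std::tuple<Args...> sysargs_impl(const Regs& r, std::index_sequence<I...>) {
    return std::tuple<Args...>(read_arg<Args, I>(r)...);
}

// sysargs<int, int64_t>(regs) reads (int)a0 and (int64_t)a1.
template <typename... Args>
std::tuple<Args...> sysargs(const Regs& r) {
    return sysargs_impl<Args...>(r, std::index_sequence_for<Args...>{});
}
```

Combined with structured bindings, this gives the same shape as the handler above: const auto [timer_id] = sysargs<int>(regs);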

This system uses CRC32 checksums of strings to avoid string comparisons, and in the script we can compute them at compile time in C++. When the hashes are registered on the host, we also check for collisions, which removes a potential headache, however unlikely a collision is. The polynomial used for checksumming is a template parameter on both sides, so if a collision does happen, simply changing it can resolve it.
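A compile-time CRC32 can be written as a plain constexpr loop. This sketch uses the common reflected, bitwise (table-free) variant with the polynomial as a template parameter; whether the scripts use exactly this variant and default polynomial is an assumption here.

```cpp
#include <cstdint>

// Reflected, bitwise CRC-32, usable in constant expressions.
// 0xEDB88320 is the reflected form of the standard CRC-32 polynomial 0x04C11DB7.
// Changing POLY on both sides changes every hash, which is the escape hatch
// if two names ever collide.
template <uint32_t POLY = 0xEDB88320u>
constexpr uint32_t crc32(const char* s) {
    uint32_t crc = 0xFFFFFFFFu;
    for (; *s != '\0'; ++s) {
        crc ^= static_cast<uint8_t>(*s);
        for (int bit = 0; bit < 8; ++bit) {
            // If the low bit is set, shift and XOR in the polynomial; else just shift.
            crc = (crc >> 1) ^ (POLY & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

With the default polynomial, crc32("123456789") gives the standard check value 0xCBF43926, and switching to the Castagnoli polynomial (crc32<0x82F63B78u>) gives 0xE3069283. Because the function is constexpr, a declaration like constexpr Call<void(int)> stop_timer {"timer_stop"} can bake the hash straight into the guest binary.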

template <typename Func>
struct Call {
    const uint32_t hash;  // CRC32 of the function name
    constexpr Call(const char* f) : hash(crc32(f)) {}
    constexpr Call(uint32_t h) : hash(h) {}

    template <typename... Args>
    auto operator() (Args... args) const {
        // Compile-time check: the arguments must fit the declared function type
        static_assert( std::is_invocable_v<Func, Args...> );
        using Ret = std::invoke_result_t<Func, Args...>;
        // Reinterpret the trampoline as a C function taking the hash first
        using FCH = Ret(*)(uint32_t, Args...);
        auto fch = reinterpret_cast<FCH> (&dyncall_helper);
        return fch(hash, args...);
    }
};

The implementation is fairly voodoo. Call is a callable object whose template argument is the function type. When invoked, it checks at compile time that the arguments passed to it match that function type, which serves as static type checking. It then reinterprets a trampoline function as a C function that takes the hash as its first argument, followed by the original arguments, and calls it. On the host side we can then read these same arguments back out, as if it were a regular C function call. Fairly neat, actually.
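The static type check boils down to std::is_invocable_v applied to the function type. A few compile-time assertions show what it accepts and rejects for a signature like void(int):

```cpp
#include <type_traits>

// A function type such as void(int) can be queried directly: would a function
// of this type accept these argument types? Everything here is compile-time.
static_assert(std::is_invocable_v<void(int), int>);           // exact match
static_assert(std::is_invocable_v<void(int), short>);         // implicit conversion is allowed
static_assert(!std::is_invocable_v<void(int), const char*>);  // wrong type: rejected
static_assert(!std::is_invocable_v<void(int), int, int>);     // wrong arity: rejected

// std::invoke_result_t recovers the return type the same way.
static_assert(std::is_same_v<std::invoke_result_t<long(int, float), int, float>, long>);
```

One thing worth keeping in mind: implicit conversions pass this check too (std::is_invocable_v<void(int), double> is true), while FCH in the snippet is built from the deduced Args rather than from Func's own parameter list. As far as the snippet goes, stop_timer(1.5) would therefore compile and pass a double (on a hard-float RISC-V ABI, in a floating-point register) where the host handler reads an int.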

Benchmarking

Using a single shared system call number for all dynamic calls, keyed by a CRC32 hash, yielded decent results: the median call took about 13ns on my machine. Part of that is the overhead of a std::function call and a hash lookup; implementors always have the option of using system calls directly instead, which have a fixed overhead of about 3ns.
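For anyone curious how much of that cost sits in the host-side std::function call and hash lookup alone, here is a rough micro-benchmark sketch. It times batches of calls with std::chrono::steady_clock and reports the median per call; it measures only host-side dispatch, not the guest-to-host transition, and makes no attempt to reproduce the numbers above. DispatchTimes, bump and compare_dispatch are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Median nanoseconds per call, measured over `rounds` batches of `batch` calls.
// Batching amortizes the cost of reading the clock itself.
template <typename F>
double median_ns_per_call(F&& call, int rounds = 51, int batch = 1000) {
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(rounds));
    for (int r = 0; r < rounds; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < batch; ++i)
            call();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::nano>(t1 - t0).count() / batch);
    }
    const auto mid = samples.begin() + rounds / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

struct DispatchTimes {
    double via_table_ns;  // hash lookup + std::function call
    double direct_ns;     // plain call, likely inlined, so roughly a lower bound
};

inline void bump(uint64_t& counter) { ++counter; }

// A handler stored in an unordered_map keyed by a 32-bit hash,
// compared against doing the same work directly.
DispatchTimes compare_dispatch(uint64_t& counter) {
    std::unordered_map<uint32_t, std::function<void()>> table;
    table.emplace(0xABCDu, [&counter] { bump(counter); });
    DispatchTimes t{};
    t.via_table_ns = median_ns_per_call([&] { table.find(0xABCDu)->second(); });
    t.direct_ns = median_ns_per_call([&] { bump(counter); });
    return t;
}
```

Build with optimizations enabled; in a debug build the gap will mostly reflect missing inlining rather than the dispatch design.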

Stack mirroring

A remote call is just one virtual machine calling into another, more or less directly. In case you haven’t read my earlier posts: remote calls are what you use when you don’t want to bake everything into one script. A complex game script often has common code that depends only on a single thing, such as an object. For example, when a bush is destroyed it should spawn leaf sprites that fall down and blow in the wind, play a sound, and so on. If you wanted to reuse that effect in a particular cutscene or event, you could simply call into the machine that has this code using a remote call. Perhaps you would construct a temporary object and pass that, or maybe the function takes everything by value. Either way, it’s a nice thing to have, and it’s easy to implement because none of the virtual machines has an on/off state: you just call into them when you need to.

When you execute a function remotely (in another machine), there are caveats. For example, you can’t pass a reference to a local object unless that object is accessible through a shared memory area, which requires coordination and helper tools. It also means recursive remote calls will not work. This is bearable, but what if there was a way to mount the stack of one machine into another just for the duration of a remote call?

If the remote machine is running the same binary as the caller, you can also use read-only static data and code: strings, constants, functions, etc.

It is possible to make the stack of one machine available in another, provided the two stacks lie at different places in memory. So when you make a remote call, all you have to do is mount the stack of the caller machine into the remote machine for the duration of the call. You don’t even have to mount all of it: the stack pointer tells you where the current “top” is. Moreover, you only need to mount the stack if one of the integer registers passed to the remote machine contains a value within that area.
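The mount decision described above can be sketched as a pure function. This assumes a downward-growing stack (as on RISC-V), so the live part of the caller's stack is [sp, stack_top); Range, overlaps and stack_range_to_mount are hypothetical names, not the engine's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// A half-open guest address range [begin, end).
struct Range {
    uint64_t begin;
    uint64_t end;
};

// Two ranges overlap if each starts before the other ends. The caller's stack
// can only be mirrored if it does not overlap the remote machine's own stack.
constexpr bool overlaps(Range a, Range b) {
    return a.begin < b.end && b.begin < a.end;
}

// Decide what, if anything, to mount for a remote call. Only the live part of
// the stack is mounted, and only if some integer argument register holds an
// address inside it, i.e. the callee was handed a pointer or reference to
// something on the caller's stack.
std::optional<Range> stack_range_to_mount(uint64_t sp, uint64_t stack_top,
                                          const uint64_t* arg_regs, std::size_t count)
{
    assert(sp <= stack_top);
    for (std::size_t i = 0; i < count; ++i) {
        if (arg_regs[i] >= sp && arg_regs[i] < stack_top)
            return Range{sp, stack_top};
    }
    return std::nullopt;  // nothing points into the stack: skip mounting entirely
}
```

The common case, where all arguments are plain values, costs only a handful of comparisons, which matches the "no extra cost when the stack isn’t in use" property.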

Using these ideas we can make direct C++ function calls from one machine into a remote machine with very low overhead: no copying, and no extra cost when the stack isn’t in use. Using what we learned from dynamic function calls, we can make farcalls type-safe too:

constexpr FarCall<void()> fc("gameplay2", "empty_function");
fc();

In the example above we had to specify the type of the function, both for type safety and so that the call uses the correct ABI. There’s an optimization for when both machines run the same binary, which also lets you skip specifying the function type:

constexpr ExecuteRemotely fc("gameplay2", empty_function);
fc();

The difference is that we point to the function directly, and from that we can derive the return type and arguments, as well as the address. We don’t have to look up the function’s address when we already have it. The name gameplay2 still refers to a machine dynamically. Another difference, and potential benefit, is not having to use the C calling convention: the call is invoked as a regular C++ function call.
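Recovering the return type and arguments from a function pointer takes only a small traits template. The sketch below is one way to do it, not necessarily what ExecuteRemotely does internally; fn_traits is a hypothetical name, and some_function is only declared (with SomeStruct left incomplete) because only its type is inspected.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Primary template left undefined: only function pointers are supported.
template <typename F> struct fn_traits;

// Matches plain function pointers R(*)(A...). Noexcept function pointers
// would need a second specialization in C++17 and later.
template <typename R, typename... A>
struct fn_traits<R (*)(A...)> {
    using ret = R;                        // return type
    using args = std::tuple<A...>;        // parameter types, in order
    static constexpr std::size_t arity = sizeof...(A);
};

// Declarations only: decltype never calls or odr-uses the function.
struct SomeStruct;
long some_function(int value, SomeStruct& some);

using SomeTraits = fn_traits<decltype(&some_function)>;
static_assert(std::is_same_v<SomeTraits::ret, long>);
static_assert(SomeTraits::arity == 2);
```

The address itself is simply the pointer value, which is why ExecuteRemotely can store func directly instead of hashing a name.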

We can pass readable, writable data from the caller’s stack:

constexpr ExecuteRemotely somefunc("gameplay2", some_function);
SomeStruct some {
    .string = "Hello 123!",
    .value = 42
};
int r = somefunc(1234, some);

The function simply prints the values:

long some_function(int value, SomeStruct& some)
{
    print("Hello Remote World! value = ", value, "!\n");
    print("Some struct string: ", some.string, "\n");
    print("Some struct value: ", some.value, "\n");
    return value;
}

And that data will be visible as-is on the remote machine, as seen in the logs:

>>> [gameplay2] says: Hello Remote World! value = 1234!
>>> [gameplay2] says: Some struct string: Hello 123!
>>> [gameplay2] says: Some struct value: 42

Because the function is referenced directly in the code, it does not need to be made public, nor does it need to be added to the file listing public symbols.

template <typename Func>
struct ExecuteRemotely {
    const uint32_t mhash;  // CRC32 of the destination machine's name
    const Func func;       // the function itself, valid because the binary is shared
    constexpr ExecuteRemotely(const char* m, Func f)
        : mhash(crc32(m)), func(f) {}
    constexpr ExecuteRemotely(uint32_t m, Func f)
        : mhash(m), func(f) {}

    template <typename... Args>
    auto operator() (Args&&... args) const {
        static_assert( std::is_invocable_v<Func, Args...> );
        using Ret = std::invoke_result_t<Func, Args...>;
        // Plain C++ function type: machine hash, target function, then the arguments
        using FCH = Ret(uint32_t, Func, Args...);
        auto* fch = reinterpret_cast<FCH*> (&direct_farcall_helper);
        return fch(mhash, func, args...);
    }
};

The remote execution looks very much like the dynamic call implementation, but we also have to provide a hash of the name of the destination machine.

There is one huge gotcha with remote function calls that reference a function directly: we do NOT cast the trampoline function to a C function call. It is still a C++ function call, which means you cannot remotely call a C function this way, unless you add something like a bool cxx = true template argument. I don’t know if you can actually determine the calling convention of a function you are referencing. That said, when a function is referenced by name as a string, that name must not be mangled, so we can assume it is a C function call.

That is all I have for now. I recently compared some of the modern interpreted languages and WebAssembly emulators, and I noticed that the APIs you are forced to deal with when building your own APIs between host and guest are simply horrid. Hopefully this will give the implementors of those well-known projects some ideas.

-gonzo
