Introduction
It’s been a while since my last post, where I introduced a nice linker quirk 1 that allows us to collect a global list of resources, spanning multiple compilation units, into a single array at compile time. In this post, we will explore how we can utilize this trick to introduce the concept of a module, and how modules can be combined to create a modular game engine. A lot had to happen behind the scenes to bring this all together. Let’s start by answering a simple question: What is a module?
What is a Module?
I briefly touched on the topic of modules in the introduction of my previous post, but let’s explore it in more detail. Like with many unimaginative speeches, we could start with the dictionary definition of a module. Instead, since we’re being quirky, we’re going to use an AI-generated one:
A module is a self-contained unit of code that can be loaded and unloaded at runtime. It provides a way to organize and manage the functionality of a game engine, allowing for greater flexibility and modularity.
Really, using modularity in the definition of module wouldn’t have been my first choice, but, other than that, this is roughly what I expected. Generally speaking, when you mention modules to other developers, they will think of one of two things:
- Weakly coupled components in a codebase to avoid “spaghetti code” and improve maintainability.
- Plugins that can be loaded and unloaded at runtime, providing a way to extend the functionality of a game engine without modifying the core codebase.
Personally, I tend to believe that the first definition is really just good coding practice, if not taken to absurdity, so I’ll stick with the second definition. Let’s do the obvious thing and see what others have done when implementing modules.
Audio Plugins and CLAP
Recently, I have been dabbling in the world of audio programming and came across the CLAP project 2, which is an ABI-stable interface for audio plugins and Digital Audio Workstations.
I am not going to go into all the details of CLAP, but the way they accomplish this on the plugin side is by requiring the presence of a specific entry point in a compiled shared library:
typedef struct clap_plugin_descriptor {
clap_version_t clap_version;
const char *id;
const char *name;
const char *vendor;
const char *url;
const char *manual_url;
const char *support_url;
const char *version;
const char *description;
const char *const *features;
} clap_plugin_descriptor_t;
typedef struct clap_plugin {
const clap_plugin_descriptor *descriptor;
void *plugin_data;
bool(CLAP_ABI *init)(const struct clap_plugin *plugin);
void(CLAP_ABI *destroy)(const struct clap_plugin *plugin);
bool(CLAP_ABI *activate)(const struct clap_plugin *plugin,
double sample_rate,
uint32_t min_frames_count,
uint32_t max_frames_count);
void(CLAP_ABI *deactivate)(const struct clap_plugin *plugin);
bool(CLAP_ABI *start_processing)(const struct clap_plugin *plugin);
void(CLAP_ABI *stop_processing)(const struct clap_plugin *plugin);
void(CLAP_ABI *reset)(const struct clap_plugin *plugin);
clap_process_status(CLAP_ABI *process)(const struct clap_plugin *plugin,
const clap_process_t *process);
const void *(CLAP_ABI *get_extension)(const struct clap_plugin *plugin, const char *id);
void(CLAP_ABI *on_main_thread)(const struct clap_plugin *plugin);
} clap_plugin_t;
typedef struct clap_plugin_factory {
uint32_t(CLAP_ABI *get_plugin_count)(const struct clap_plugin_factory *factory);
const clap_plugin_descriptor_t *(CLAP_ABI *get_plugin_descriptor)(
const struct clap_plugin_factory *factory, uint32_t index);
const clap_plugin_t *(CLAP_ABI *create_plugin)(const struct clap_plugin_factory *factory,
const clap_host_t *host,
const char *plugin_id);
} clap_plugin_factory_t;
typedef struct clap_plugin_entry {
clap_version_t clap_version;
bool(CLAP_ABI *init)(const char *plugin_path);
void(CLAP_ABI *deinit)(void);
const void *(CLAP_ABI *get_factory)(const char *factory_id);
} clap_plugin_entry_t;
/* Entry point of the plugin */
CLAP_EXPORT extern const clap_plugin_entry_t clap_entry;
This is the CLAP plugin interface, stripped down to its essentials. A plugin developer would start by defining the clap_entry entry point, where they specify a factory of type clap_plugin_factory_t, which, in turn, provides a way to create instances of the plugins defined in the library.
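To make the flow concrete, here is a rough sketch of what the plugin side might look like, assembled from the definitions above. The descriptor and the actual plugin construction (my_plugin_descriptor, my_create_plugin_instance) are left as hypothetical helpers:
#include <string.h>
#include <clap/clap.h>
/* Hypothetical: defined elsewhere in the plugin. */
extern const clap_plugin_descriptor_t my_plugin_descriptor;
extern const clap_plugin_t *my_create_plugin_instance(const clap_host_t *host);
static uint32_t my_get_plugin_count(const struct clap_plugin_factory *factory) {
    (void)factory;
    return 1;
}
static const clap_plugin_descriptor_t *my_get_plugin_descriptor(
    const struct clap_plugin_factory *factory, uint32_t index) {
    (void)factory;
    return index == 0 ? &my_plugin_descriptor : NULL;
}
static const clap_plugin_t *my_create_plugin(const struct clap_plugin_factory *factory,
                                             const clap_host_t *host,
                                             const char *plugin_id) {
    (void)factory;
    if (strcmp(plugin_id, my_plugin_descriptor.id) != 0) return NULL;
    return my_create_plugin_instance(host);
}
static const clap_plugin_factory_t my_factory = {
    .get_plugin_count = my_get_plugin_count,
    .get_plugin_descriptor = my_get_plugin_descriptor,
    .create_plugin = my_create_plugin,
};
static bool my_init(const char *plugin_path) { (void)plugin_path; return true; }
static void my_deinit(void) {}
static const void *my_get_factory(const char *factory_id) {
    /* Hosts request factories by id; CLAP_PLUGIN_FACTORY_ID selects this one. */
    return strcmp(factory_id, CLAP_PLUGIN_FACTORY_ID) == 0 ? &my_factory : NULL;
}
CLAP_EXPORT const clap_plugin_entry_t clap_entry = {
    .clap_version = CLAP_VERSION_INIT,
    .init = my_init,
    .deinit = my_deinit,
    .get_factory = my_get_factory,
};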
As can be seen in the definition of the clap_plugin struct, and as should be expected from an audio plugin, plugins are quite restricted in their functionality: they consist only of an optional state and a set of callbacks, and the host application is in control at all times. This is typical for audio plugins, as their main purpose is to process audio signals coming from the host application at regular intervals, and to display the occasional UI for changing audio settings.
While this example of a plugin interface serves as a good starting point, it is not sufficient for our purposes: we require the ability to define general-purpose modules that can depend on other modules, and even initiate procedures of their own. So, let’s continue with the other end of the complexity spectrum, namely kernel modules for the Linux kernel.
Linux Kernel Modules
The Linux kernel follows a monolithic design, where, unlike with a microkernel, most or all of the kernel functionality runs in kernel space. This design has the upside of being faster than a microkernel, but brings challenges similar to those of any large monolithic codebase. This is why, to improve extensibility and maintainability, much of the kernel functionality, like drivers, is implemented as kernel modules, or even as loadable kernel modules, which can be loaded and unloaded at runtime.
Like with CLAP plugins, we are not going to go into the details. Here is the implementation of a simple hello world kernel module 3:
/*
* <PD> Hello World Module </PD>
*/
#include <linux/module.h>
#include <linux/init.h>
/*
* This is the starting point of the kernel module's code execution.
* Right now we will just print a hello and return from here.
*/
static int __init my_init(void)
{
printk(KERN_INFO "Hello from Hello Module");
return 0;
}
/*
* This is the exit point of the kernel module. When the module is removed
* this function is called to do any sort of cleanup activities and other such
* stuff.
*
* For example you have made a tree where you keep some information - you would
* like to place the code for removing the nodes of the tree here.
*/
static void __exit my_exit(void)
{
printk(KERN_INFO "Bye from Hello Module");
}
module_init(my_init);
module_exit(my_exit);
MODULE_DESCRIPTION("Sample Hello World Module");
MODULE_AUTHOR("Rishi Agrawal <rishi.b.agrawal@gmail.com>");
MODULE_LICENSE("GPL");
Unlike in our CLAP example, the module is defined using the MODULE_* macros, which register some module metadata in a predefined program section that the kernel can access at runtime to initialize the module. At first glance, the interface of kernel modules seems pretty similar to the special-purpose interface for audio plugins, were it not for the fact that, once loaded, the modules become part of the kernel, as is evident from the hello world module’s use of the internal printk function.
In fact, kernel modules are even more powerful, as they can introduce new symbols (functions and data) into the kernel’s symbol table with the help of the EXPORT_SYMBOL macro. This allows modules to import symbols from other modules, enabling inter-module communication, as in the following example 4:
File: mymodule1.c
/*
* <PD> Module 1 for demonstration of circular dependency </PD>
*/
#include <linux/module.h>
#include <linux/init.h>
int GLOBAL_VARIABLE = 1000;
EXPORT_SYMBOL(GLOBAL_VARIABLE);
/*
* Function to print hello for num times.
*/
void print_hello(int num)
{
while (num--) {
printk(KERN_INFO "Hello Friend!!!\n");
}
}
EXPORT_SYMBOL(print_hello);
/*
* Function to add two passed numbers.
*/
void add_two_numbers(int a, int b)
{
printk(KERN_INFO "Sum of the numbers %d", a + b);
}
EXPORT_SYMBOL(add_two_numbers);
static int __init my_init(void)
{
printk(KERN_INFO "Hello from Export Symbol 1 module.");
return 0;
}
static void __exit my_exit(void)
{
printk(KERN_INFO "Bye from Export Symbol 1 module.");
}
module_init(my_init);
module_exit(my_exit);
MODULE_DESCRIPTION("Module to demonstrate the EXPORT_SYMBOL functionality");
MODULE_AUTHOR("Rishi Agrawal <rishi.b.agrawal@gmail.com");
MODULE_LICENSE("GPL v2");
File: mymodule2.c
/*
* <PD> Module 2 for exporting symbol demonstration </PD>
*/
#include <linux/module.h>
#include <linux/init.h>
extern void print_hello(int);
extern void add_two_numbers(int, int);
extern int GLOBAL_VARIABLE;
/*
* The function has been written just to call the functions which are in the other module.
* This way you can also write modules which provide some functionality to
* other modules.
*/
static int __init my_init(void)
{
printk(KERN_INFO "Hello from Hello Module");
print_hello(10);
add_two_numbers(5, 6);
printk(KERN_INFO "Value of GLOBAL_VARIABLE %d", GLOBAL_VARIABLE);
return 0;
}
static void __exit my_exit(void)
{
printk(KERN_INFO "Bye from Hello Module");
}
module_init(my_init);
module_exit(my_exit);
MODULE_DESCRIPTION("Module to demonstrate the EXPORT_SYMBOL functionality");
MODULE_AUTHOR("Rishi Agrawal <rishi.b.agrawal@gmail.com>");
MODULE_LICENSE("GPL v2");
In this example, we have created two modules, where the first module exports the GLOBAL_VARIABLE, print_hello, and add_two_numbers symbols. The second module then imports these symbols and can use them as if they were defined in the same module. Internally, this is achieved by introducing a link step during module loading that wires the required symbols to the module.
Inspired by kernel modules and WebAssembly modules 5, we can now start exploring modules in the fimo engine.
Fimo Modules
After surveying other implementations of modules, I determined the following list of requirements that I wanted to implement for fimo:
- Modules must be general enough to implement any engine functionality.
- Modules must be programming language agnostic.
- The module ABI must be stable but extensible.
- Modules must be explicit in their requirements and exposed resources.
Most of those requirements should be obvious from the previous and the current article. I envision fimo to be fully modularized, so the modules must be powerful enough to implement all engine subsystems, and since the engine itself is language-agnostic, the same must apply to the modules. Essentially, it should be possible to implement a module in any language and environment that supports the module ABI. Since we are now talking about plugins, ABI changes will not cause any compilation errors, instead causing runtime crashes at best, and weird behavior at worst. So we must ensure that, whatever we define the module ABI to be, it supports backwards-compatible improvements. The last requirement is really a nice-to-have, as it simplifies the loading process. By knowing all imports and exports of a module, we can statically verify that they can be satisfied, providing a more user-friendly experience, with nicer error messages and a better development experience.
With those requirements set in stone, we can now proceed to define the module ABI.
Exporting Modules
We define a module by exporting an instance of a FimoModuleExport struct, which is defined as follows:
typedef struct FimoModuleExport {
const void *next;
FimoVersion version;
const char *name;
const char *description;
const char *author;
const char *license;
const FimoModuleParamDecl *parameters;
FimoUSize parameters_count;
const FimoModuleResourceDecl *resources;
FimoUSize resources_count;
const FimoModuleNamespaceImport *namespace_imports;
FimoUSize namespace_imports_count;
const FimoModuleSymbolImport *symbol_imports;
FimoUSize symbol_imports_count;
const FimoModuleSymbolExport *symbol_exports;
FimoUSize symbol_exports_count;
const FimoModuleDynamicSymbolExport *dynamic_symbol_exports;
FimoUSize dynamic_symbol_exports_count;
const FimoModuleExportModifier *modifiers;
FimoUSize modifiers_count;
} FimoModuleExport;
I am not going to bore you with all the details, but as can be seen from the struct definition, a module consists of the following components:
- The version of the host context that the module was compiled against.
- Module metadata, like a unique name.
- A global list of parameters that may be accessible to other modules.
- A list of resource paths relative to the module’s root directory.
- A list of symbol namespaces to import.
- A list of required symbols.
- A list of provided symbols, either statically or instantiated at runtime.
- A list of “modifiers” that change the way the module is loaded or add additional metadata.
Of interest for this article are symbols and modifiers, so let’s dive into them.
Symbols
Inspired by the likes of kernel modules and WebAssembly modules, fimo modules support exporting and importing functionality to and from other modules. We do this by listing the symbols required and provided by the module, and by implementing a module linker in the module loader. In this context, a symbol is just a pointer to a variable or function managed by some module, and it is uniquely identified by its name and the namespace it belongs to, as can be seen in the definition of an import:
typedef struct FimoModuleSymbolImport {
FimoVersion version;
const char *name;
const char *ns;
} FimoModuleSymbolImport;
In addition to a name and namespace, each symbol also specifies a semantic version 6. This is essential, as modules don’t actually define dependencies on other modules, but rather on the symbols they provide, and we want to allow module developers to make breaking changes to a symbol’s interface without having to define a new name. In our implementation, the module loader will, at link time, verify that a symbol with a compatible version is available, or abort the module loading process.
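For illustration, an import of the print_hello symbol from the earlier kernel example might be declared like this; the namespace string and the exact FimoVersion field layout are assumptions on my part:
/* Assumed: FimoVersion exposes major/minor/patch fields. */
static const FimoModuleSymbolImport my_imports[] = {
    {
        /* Require a provider that is semver-compatible with 0.1.0. */
        .version = {.major = 0, .minor = 1, .patch = 0},
        .name = "print_hello",
        .ns = "examples", /* hypothetical namespace */
    },
};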
As a sidenote, the repository is split into two types of packages: meta packages and module packages. Module packages contain the implementation of some module and will, in the case of compiled languages, produce either a shared library or a linkable artifact containing the plugin. Meta packages, on the other hand, provide the definitions of symbols that can be imported or implemented by modules. This is similar to the header/source split in C/C++ projects. In some cases, like the bindings for the Zig or Rust languages, the meta packages also provide some utilities that improve the development experience when using the defined symbols, but we generally only require a C ABI. This split has the advantage that we can port the engine to other languages by just providing bindings to the required meta packages.
Going back to the topic of module symbols. How do we actually declare them? As you might expect, by filling another struct:
typedef enum FimoModuleSymbolLinkage : FimoI32 {
FIMO_MODULE_SYMBOL_LINKAGE_GLOBAL,
} FimoModuleSymbolLinkage;
typedef struct FimoModuleSymbolExport {
const void *symbol;
FimoModuleSymbolLinkage linkage;
FimoVersion version;
const char *name;
const char *ns;
} FimoModuleSymbolExport;
This definition shows how we declare a statically known symbol, i.e. a symbol that is known at compile time. Since the symbol is already known at that point, we can simply expose an opaque pointer to it. While this is, strictly speaking, not standard-conforming C, as data pointers and function pointers can differ in size, it works for all targeted platforms; we could replace the pointer with a union if we wanted to support some esoteric platform in the future. Additionally, the design of the module symbols is inspired by the symbols in an ELF file, which support various linkages, including global, local, and weak. A global symbol participates in the symbol resolution process and must be unique among all loaded modules. If the need arises, we can also introduce the local and weak linkages, along with others.
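Continuing the illustration from above, a static export of a global variable might then look like this, with the same caveats: the FimoVersion field layout and the namespace string are assumptions:
static int global_variable = 1000;
static const FimoModuleSymbolExport my_exports[] = {
    {
        .symbol = &global_variable, /* opaque pointer to the exported data */
        .linkage = FIMO_MODULE_SYMBOL_LINKAGE_GLOBAL,
        .version = {.major = 0, .minor = 1, .patch = 0},
        .name = "global_variable",
        .ns = "examples", /* hypothetical namespace */
    },
};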
A runtime-known symbol, i.e. a dynamic symbol, is declared similarly by constructing an instance of an FimoModuleDynamicSymbolExport, where the symbol field is replaced with a constructor and destructor function. The constructor function is a bit unique, for reasons that will become clear later.
Modifiers
The last piece of the puzzle is the modifiers field, which was introduced as a mechanism for attaching metadata that enables new features in an extensible manner. The FimoModuleExportModifier struct is a key/value pair that we can use to bolt on essentially any future optional extension without having to break the module ABI. As an example, hot-reloading support could be added in the future by defining a modifier enabling this feature. Currently, fimo specifies a limited list of modifiers, including module lifecycle management.
By default, a module is considered pure by the module loader, as it is not bound to any state. This can be changed by adding a state modifier, which defines a module constructor and destructor, and adding additional modifiers that subscribe the module to lifecycle events, like the module start and stop events.
Managing the Lifecycle of a Module
Now comes the part where we implement the module loader. From the previous section, we know that a module is stateless by default, but can subscribe to optional lifecycle events by specifying the corresponding modifiers. Considering all current modifiers, this gives rise to the following finite state machine (FSM) of a module:
stateDiagram-v2
[*] --> Loaded: load
Loaded --> Constructed: state constructor
Constructed --> Loaded: state destructor
Constructed --> Started: start event
Started --> Constructed: stop event
Started --> [*]
For the sake of clarity, we can assume that when the modifiers are omitted, the module defines noop event handlers, allowing us to omit some transitions from our diagram. I have implemented FSMs multiple times, and this one seems almost trivial. So, how long could it take?
Months, literal months.
But, let’s not get ahead of ourselves. First, we need to paint a clearer picture of the module interface by defining a module instance.
Module Instances
Let’s start with the definition of a module instance (I promise that this is the last struct definition in this article):
typedef struct FimoModuleInfoVTable {
void (*acquire)(const struct FimoModuleInfo *info);
void (*release)(const struct FimoModuleInfo *info);
void (*mark_unloadable)(const struct FimoModuleInfo *info);
bool (*is_loaded)(const struct FimoModuleInfo *info);
bool (*try_acquire_module_strong)(const struct FimoModuleInfo *info);
void (*release_module_strong)(const struct FimoModuleInfo *info);
} FimoModuleInfoVTable;
typedef struct FimoModuleInfo {
const void *next;
const char *name;
const char *description;
const char *author;
const char *license;
const char *module_path;
FimoModuleInfoVTable vtable;
} FimoModuleInfo;
typedef struct FimoModuleInstanceVTable {
    void (*acquire)(struct FimoModuleInstance *ctx);
    void (*release)(struct FimoModuleInstance *ctx);
    FimoResult (*query_namespace)(const FimoModuleInstance *ctx, const char *ns,
                                  bool *has_dependency, bool *is_static);
    FimoResult (*add_namespace)(const FimoModuleInstance *ctx, const char *ns);
    FimoResult (*remove_namespace)(const FimoModuleInstance *ctx, const char *ns);
    FimoResult (*query_dependency)(const FimoModuleInstance *ctx,
                                   const FimoModuleInfo *info,
                                   bool *has_dependency,
                                   bool *is_static);
    FimoResult (*add_dependency)(const FimoModuleInstance *ctx,
                                 const FimoModuleInfo *info);
    FimoResult (*remove_dependency)(const FimoModuleInstance *ctx,
                                    const FimoModuleInfo *info);
    FimoResult (*load_symbol)(const FimoModuleInstance *ctx, const char *name,
                              const char *ns, FimoVersion version,
                              const void **symbol);
    FimoResult (*read_parameter)(void *ctx, void *value,
                                 FimoModuleParamType type, const char *module,
                                 const char *param);
    FimoResult (*write_parameter)(void *ctx, const void *value,
                                  FimoModuleParamType type, const char *module,
                                  const char *param);
} FimoModuleInstanceVTable;
typedef void FimoModuleParamTable;
typedef void FimoModuleResourceTable;
typedef void FimoModuleSymbolImportTable;
typedef void FimoModuleSymbolExportTable;
typedef struct FimoModuleInstance {
const FimoModuleInstanceVTable *vtable;
const FimoModuleParamTable *parameters;
const FimoModuleResourceTable *resources;
const FimoModuleSymbolImportTable *imports;
const FimoModuleSymbolExportTable *exports;
const FimoModuleInfo *module_info;
FimoContext context;
void *module_data;
} FimoModuleInstance;
At first glance, this should appear pretty similar to the clap_plugin type, except for two differences. First, the definition of the instance type declares some opaque table types, like FimoModuleParamTable. These tables are initialized by the module loader, and in the case of the imports and exports fields, they contain all symbols available to the module. And here the first challenge already arises: since the symbol tables of a module are initialized at instantiation and remain valid until its destruction, we must ensure that all dependencies are also kept alive. But we can go even further.
Imagine the scenario of a hypothetical module named fimo-graphics, which provides a platform-agnostic abstraction over the various rendering APIs, like Vulkan, DirectX and Metal. This could be built naturally by introducing the fimo-vulkan, fimo-directx and fimo-metal modules, with each providing the low-level bindings to its own API. The fimo-graphics module could then implement its own API on top of one of those lower-level APIs. The module could also expose this relationship, for instance by exposing the internal handles of textures and buffers, which would give the user a higher degree of control when the situation calls for it. Sounds nice, doesn’t it? Well, there is a catch. It doesn’t work, because the dependencies of a module must be known before it is instantiated, so that the loader can validate that all dependencies are satisfied.
Let’s try it anyway. You might have noticed that the module instance also contains a vtable field, a pointer to a table of function pointers. Among those, add_dependency and load_symbol are the ones required to implement the hypothetical fimo-graphics module: add_dependency registers the fimo-graphics module as being dependent on another existing module, in our case one of fimo-vulkan, fimo-directx or fimo-metal. Once registered, it can access that module’s symbols by calling load_symbol. This design is similar to how shared libraries work on Linux, where loading a shared object causes the dynamic linker to find all required symbols defined at compile time, but the same process can be accomplished manually with the likes of dlopen and dlsym. How do we implement it? Let’s talk about lifetimes.
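To sketch how this could look in practice, here is a hypothetical snippet of fimo-graphics binding to fimo-vulkan through the vtable above; the symbol name, namespace, version, and the error-checking macro are all assumptions for illustration:
static FimoResult bind_vulkan(const FimoModuleInstance *self,
                              const FimoModuleInfo *vulkan_info,
                              const void **out_symbol) {
    /* Register the runtime dependency, so the loader keeps fimo-vulkan alive. */
    FimoResult result = self->vtable->add_dependency(self, vulkan_info);
    if (FIMO_RESULT_IS_ERROR(result)) return result; /* hypothetical error check */
    /* Resolve a symbol from the now-registered dependency, dlsym-style. */
    const FimoVersion version = {.major = 0, .minor = 1, .patch = 0};
    return self->vtable->load_symbol(self, "vk_get_instance_proc_addr",
                                     "fimo-vulkan", version, out_symbol);
}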
Module Lifetimes
The fundamental question we should ask ourselves is: how long does a module live? And also: who is responsible for managing the lifecycle of a module? Should each module manage itself, like a shared_ptr in C++ or an Arc in Rust, or should there be a central registry of loaded modules that is in charge?
The search for a satisfying answer to those questions led to multiple rewrites of the engine over the years, but I am now fairly certain that modules should be centrally managed by a context, represented by the FimoContext handle in the module instance. Unlike simple reference counting, this design has several upsides, the main one being explicit lifetime management, which is desirable if you want the ability to load and unload modules. Regrettably, this is not a silver bullet and brings its own set of compromises, but one step after the other.
Let’s use the prior Linux kernel modules as an example, where we have a module A which exports the symbols print_hello, add_two_numbers and GLOBAL_VARIABLE, and another module B importing those. These relations can be modelled as a directed graph, as follows:
graph TD
A ==> S0((print_hello))
A ==> S1((add_two_numbers))
A ==> S2((GLOBAL_VARIABLE))
B ==> A
B --> S0
B --> S1
B --> S2
In all cases, the symbols of a module are represented as leaf nodes in our graph, since only modules can depend on symbols, and symbols are strongly bound to their owners. There are technically also edges between each module and a hidden node representing the context, as the modules are owned by the context and must keep it alive, but let’s forget about that for now.
Let’s assume, for a moment, that we only support static dependencies that must be satisfied by the linker. Let’s also assume that we have an empty context and want to load \(n\) modules sequentially. Starting with the first one, the module loader just adds the node and its symbols to the graph and is done. For all following modules, the loader first adds the nodes to the graph, and then the linker may connect the new module to existing modules in the graph. This is possible, as we require that each dependency is satisfied before we load a module, or in other words, a loaded module cannot depend on an unloaded module. If we think about this for a moment, we might notice that this property is sufficient to guarantee that our dependency graph is acyclic, which is pretty nice, as it simplifies our lifetime management a lot.
Going back to our little example of two modules A and B, the context now puts on its hat as a module garbage collector and can try to determine whether the modules are still alive. We still don’t know how long B must be kept alive, but we know without a shadow of a doubt that A must outlive B. We can write this relation as \(B \leq A\). If the context now wants to remove A, it knows to wait until B is also removed.
Now, this can be generalized. For any two loaded modules \(M_1\) and \(M_2\), we define the order as \(M_1 \leq M_2\) if \(M_1\) has a dependency on \(M_2\). Notably, this relation is also transitive, i.e., if \(M_1 \leq M_2\) and \(M_2 \leq M_3\), then \(M_1 \leq M_3\) must also be true, because if a module \(M_1\) depends on a module \(M_2\), which depends on another module \(M_3\), then \(M_3\) must outlive both \(M_2\) and \(M_1\). In math, this is known as a partial order 7.
Can we determine this order? You bet we can! With a little bit of graph theory, we might notice that this problem is the same as finding a topological ordering 8 of our dependency graph. If we now want to perform a garbage collection cycle, we can implement it by computing a topological sort of the current graph and removing the modules in that order. And here it should become evident why we require our dependency graph to be acyclic: otherwise the topological ordering is not defined, and we cannot automatically determine a safe unloading order. A sketch of such a pass follows.
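The adjacency-matrix representation below is a stand-in for the loader’s real graph structure; the point is only the order in which modules become unloadable:
#include <stddef.h>
enum { MAX_MODULES = 64 };
typedef struct DepGraph {
    size_t count;
    /* depends_on[i][j] != 0 iff module i depends on module j, i.e. i <= j. */
    unsigned char depends_on[MAX_MODULES][MAX_MODULES];
} DepGraph;
/* Writes a valid unload order into `order`; returns the number of modules
   emitted, which equals `count` as long as the graph is acyclic. */
static size_t unload_order(const DepGraph *g, size_t order[]) {
    unsigned char removed[MAX_MODULES] = {0};
    size_t n = 0;
    for (int progressed = 1; progressed && n < g->count;) {
        progressed = 0;
        for (size_t i = 0; i < g->count; ++i) {
            if (removed[i]) continue;
            size_t dependents = 0; /* modules still relying on module i */
            for (size_t j = 0; j < g->count; ++j)
                if (!removed[j] && g->depends_on[j][i]) ++dependents;
            if (dependents == 0) {
                removed[i] = 1;
                order[n++] = i;
                progressed = 1;
            }
        }
    }
    return n; /* n < count would indicate a cycle, which the loader prevents */
}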
With that out of the way, we can try to implement dynamic dependency management on top of the static one, which boils down to implementing the add_dependency, remove_dependency and load_symbol methods.
Since we have an internal dependency graph, we can implement add_dependency and remove_dependency by adding/removing an edge in the dependency graph. This is simple enough, but must respect two constraints: the graph must remain acyclic, and we cannot remove static dependencies. Since we know that the graph was acyclic before adding the dependency, we only need to check that there is no path from the module to itself after adding the edge. This is easy to implement with a depth-first search, and is also cheaper than checking the entire graph for cycles.
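A minimal sketch of that check, again over a stand-in graph representation: inserting a new edge is safe exactly when no existing path leads from the target node back to the source node:
#include <stdbool.h>
#include <stddef.h>
/* Stand-in adjacency structure: succ[i] lists the successors of node i. */
typedef struct Graph {
    size_t node_count;
    const size_t *const *succ;
    const size_t *succ_count;
} Graph;
static bool reaches(const Graph *g, size_t node, size_t target, bool *visited) {
    if (node == target) return true;
    if (visited[node]) return false;
    visited[node] = true;
    for (size_t i = 0; i < g->succ_count[node]; ++i)
        if (reaches(g, g->succ[node][i], target, visited)) return true;
    return false;
}
/* `scratch` must hold node_count entries; returns true if adding the edge
   from -> to keeps the graph acyclic. */
static bool edge_keeps_acyclic(const Graph *g, size_t from, size_t to, bool *scratch) {
    for (size_t i = 0; i < g->node_count; ++i) scratch[i] = false;
    return !reaches(g, to, from, scratch);
}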
The only thing that remains is load_symbol, where we must ask ourselves if we should allow a module \(M_1\) to access a symbol from a module \(M_3\) if there is only an indirect dependency, like \(M_1 \leq M_2 \leq M_3\). The answer is no, as \(M_2\) could remove its dependency on \(M_3\), allowing \(M_3\) to be unloaded and leaving \(M_1\) with a dangling symbol. This is a nice excuse to do the simple thing and only allow access to the symbols of \(M_3\) if there is a direct edge from \(M_1\) to \(M_3\). There is technically still the possibility of a dangling symbol, if the caller first calls add_dependency, followed by load_symbol and finally remove_dependency, but this is a clear API contract violation, and we don’t need to concern ourselves with it.
So, we are finally done! Right?
Not even close.
– Narrator
*Sigh*. What is the problem now? Let’s try with the following example:
sequenceDiagram
participant Context
participant A
participant B
participant C
A->>Context: add "B"
Context-->>A: return
A->>B: pass callback
B-->>A: return
Context->>A: unload
C->>B: emit event
destroy A
A-->>Context: destroyed
B->>A: invoke callback
B-->>C: return
The sequence diagram shows a plausible sequence of operations involving three modules: A first adds B as a dependency and then uses it to pass a callback capturing itself to B. This callback, along with others, is then invoked by another module C at some point in the future. Imagine B being a GUI framework, A a module implementing a GUI widget, and C another module spawning a pointer event to repaint the GUI. Since the engine is multithreaded, those operations can execute in parallel, which in this case causes a segmentation fault.
Since we don’t allow cyclic dependencies, it is impossible for A to communicate to the context that it must be kept alive until the callback is unregistered, so the context is free to garbage collect it. So, the solution is simple, right? Just unregister the callback before destroying A:
sequenceDiagram
participant Context
participant A
participant B
participant C
A->>Context: add "B"
Context-->>A: return
A->>B: pass callback
B-->>A: return
Context->>+A: unload
C->>+B: emit event
A->>B: unregister callback
B->>A: invoke callback
B-->>-C: return
B-->>A: return
A-->>-Context: destroyed
In this version, A will try to clean up all shared resources before being destroyed by the context. And it seems to work, but let’s introduce a small change:
sequenceDiagram
participant Context
participant A
participant B
participant C
A->>Context: add "B" and "D"
Context-->>A: return
A->>B: pass callback
B-->>A: return
Context->>+A: unload
A->>A: lock mutex
C->>+B: emit event
A->>B: unregister callback
B->>A: invoke callback
A->>A: lock mutex
A->>A: do stuff
A->>A: unlock mutex
A-->>B: return
B-->>-C: return
B-->>A: return
A->>A: unlock mutex
A-->>-Context: destroyed
Since the engine is multithreaded, the modules must also be thread-safe, one way of doing so being a mutex guarding the critical sections. Aaaaand, it deadlocks.
We motivated the central lifetime management of modules with a better development experience, but now it appears that it is far too easy to shoot yourself in the foot. Honestly, I don’t want to implement a full garbage collector with cycle detection, so should we just use reference counts, like all major libc implementations do for shared libraries? At least then we would have leaked modules instead of segmentation faults.
And this was the moment when I said: “Screw it, we will do both!”.
The plan is as follows. We keep the acyclic dependency graph, as described earlier, and use it to determine if the context is allowed to provide access to the symbols of another module. In addition, we introduce a reference count to signal that a module is actively being referenced in a way that the context cannot comprehend, for instance a cyclic dependency. Most importantly, from the view of the context, increasing the reference count does not grant a module access to the symbols of another module, but the context guarantees that a module is only unloaded if there are no dependents in the dependency graph and the reference count is zero. In the future, I’ll likely introduce a prepare-stop event to notify the module that it should try to release its shared resources.
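Boiled down to code, the unload rule might read as follows; the Module struct here is a stand-in for the loader’s internal bookkeeping, not a real fimo-std type:
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
typedef struct Module {
    size_t dependents;         /* incoming edges in the dependency graph */
    atomic_size_t strong_refs; /* references the context cannot reason about */
} Module;
static bool can_unload(const Module *m) {
    /* Unload only when no loaded module depends on this one AND nobody holds
       a strong reference, e.g. a registered callback capturing the module. */
    return m->dependents == 0 && atomic_load(&m->strong_refs) == 0;
}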
To make this more explicit, the context is not allowed to share module instances with other module instances. Instead, we impose module isolation by default, and each module implementor who decides to give out their own instance must ensure that they are doing the correct thing, or risk a segmentation fault. Publicly, other modules are encouraged to use the explicit dependency management of the context, which, although more restrictive, will always do the safe thing. They can still refer to other modules by using the FimoModuleInfo handles, which are reference-counted weak references to the owning modules; these can be used to check if the owning module is still loaded, and to try to increase its reference count in a thread-safe manner.
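Using the FimoModuleInfo vtable from above, the safe pattern might look like this (the callback signature is a placeholder):
static void invoke_if_loaded(const FimoModuleInfo *info, void (*callback)(void)) {
    /* Try to upgrade the weak handle; this fails if the owning module has
       already been unloaded in the meantime. */
    if (!info->vtable.try_acquire_module_strong(info)) return;
    callback();
    /* Drop the strong reference, so the module can be unloaded again. */
    info->vtable.release_module_strong(info);
}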
This leaves us with the itsy-bitsy problem that we cannot actually access modules from the outside. It is the classical “chicken or the egg” dilemma, or in our case: if only modules can access the symbols of another module, how do we bootstrap our modules? You guessed it, another hack.
Pseudo Module Instances
Well, I lied. There kind of is a way to bootstrap the engine, namely implementing the bootstrapping logic in an empty module, but this would require some nasty workarounds, as the caller would not be managed by the context.
Instead, we introduce a nicer hack. If we require a module instance to access the resources of other modules, we just create an empty unmanaged module at runtime.
We call this instance a pseudo module instance, as it looks like an empty module from the outside, allowing it to add dependencies on other modules, but is unmanaged in the sense that it does not partake in the automatic garbage collection by the runtime. Instead, this module is purely reference counted, and the owner is responsible for destroying it when it is no longer required, typically at engine shutdown. As a side effect, it is not possible to add a pseudo module instance as a dependency, but this is typically not required, as it does not provide any symbols anyway.
Now! Now, we should be done…except that loading modules is a pain in the ass.
Module Loading Sets
Imagine you have the modules A, B, C and D, and would like to load them. In which order should they be loaded?
Well?
You don’t know? That is precisely the problem.
Since we require that all dependencies of a module be satisfied before loading it, we have created a situation where the user must search for an appropriate load order manually. This is not only insane, but negligent. So how do we solve it? We load multiple modules at the same time.
The idea is simple. If we know that the dependencies of a set of modules can be satisfied, we let the context figure out an appropriate module load order automatically.
Again, let’s consider only the static dependencies for the moment. Instead of loading all modules individually, we register our intent to load a module into a FimoModuleLoadingSet and commit a batch of load operations to the context. The loading set will then detect dependency conflicts, compute a load order for the modules, and commit them in that order.
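In usage, this might look roughly like the following; note that every function name here is invented for illustration, as this article does not show the real loading-set API:
/* Hypothetical sketch of batched loading; the real fimo-std API differs. */
static void load_engine_modules(FimoContext context) {
    FimoModuleLoadingSet set = loading_set_create(context);
    /* Registration order does not matter; the set computes a load order. */
    loading_set_add_module(set, &module_a_export);
    loading_set_add_module(set, &module_b_export);
    loading_set_add_module(set, &module_c_export);
    /* Detect conflicts, topologically sort, and load in one batch. */
    loading_set_commit(set, context);
}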
We can proceed like in the context and create a dependency graph for the loading set, which we fill with the modules to be loaded. Unlike in the context, the dependency graph may contain incomplete modules, as some of their dependencies might already be satisfied by modules in the context, by other modules in the set, or might still be unsatisfied. For this reason, we must ensure that adding a module does not introduce a cycle into the dependency graph. Then, we can proceed with another topological sort of the dependency graph, yielding again relations \(M_1 \leq M_2\), where we now know that \(M_2\) must be loaded before \(M_1\).
Now, we extend this again to dynamic dependencies, and…wait a minute! I thought that this was an article about the lifecycle of modules. This is quite a tangent I have gotten myself into.
Asynchronous Module Initialization
So, you want to initialize a module. Well, you’re in luck! It is as simple as the following:
void* module_initializer(const FimoModuleInstance *module) {
// Initialize the state.
void *state = malloc(...);
init_state(state);
return state;
}
void module_destructor(const FimoModuleInstance *module, void *state) {
uninit_state(state);
free(state);
}
// Add the functions as a modifier to the module export.
...
Ahh, as you can see, it’s nice and simple. Wait, assume that the malloc call fails, or that the module requires a specific set of CPU features. Then, the initializer should be able to signal an error and abort the initialization process. Simple enough:
FimoResult module_initializer(const FimoModuleInstance *module, void **state) {
// Initialize the state.
*state = malloc(...);
if (*state == NULL) return FIMO_ENOMEM;
    init_state(*state);
return FIMO_EOK;
}
Now, the initializer returns a result object, which carries the information of whether the operation was successful. So, we modify the implementation of the module loader to allow aborts, and we’re good to go. It feels like I am forgetting something; let’s look at the documentation of the constructor function:
Constructor function for a module.
The module constructor allows a module implementor to initialize some module specific data at module load time. Some use cases for module constructors are initialization of global module data, or fetching optional symbols. Returning an error aborts the loading of the module. Is called before the symbols of the modules are exported/initialized.
Ahh, yes, loading symbols. This is already possible, as the constructor is provided the FimoModuleInstance of the instantiated module as a parameter, so it can just add dependencies and load symbols. But wait.
We want to improve the developer experience by making the loading of modules as simple as possible. We noticed that determining the module load order manually is a pain in the ass, and automated it with module loading sets. But then, in the very next step, we allow a module to add a dependency on a module that might not yet exist, leading to an error anyway? This is asinine.
Oh, I know, let’s just pass the module loading set along to the module constructor so that it can request additional modules. That ought to solve everything:
FimoResult module_initializer(
const FimoModuleInstance *module,
FimoModuleLoadingSet set,
void **state
) {
// Add the dependency.
if (!dependency_exists(module->context)) {
add_dependency_to_set(set);
// What now?
}
add_dependency(module);
// Initialize the state.
*state = malloc(...);
if (*state == NULL) return FIMO_ENOMEM;
    init_state(*state);
return FIMO_EOK;
}
That does not seem to solve it. We are still stuck. If the dependency is not loaded at that moment, we cannot proceed, so what we actually need is something like the following:
FimoResult module_initializer(
const FimoModuleInstance *module,
FimoModuleLoadingSet set,
void **state
) {
// Add the dependency.
if (!dependency_exists(module->context)) {
add_dependency_to_set(set);
return FIMO_RETRY_ONCE_THE_DEPENDENCY_HAS_BEEN_LOADED_AND_NO_EARLIER;
}
add_dependency(module);
// Initialize the state.
*state = malloc(...);
if (*state == NULL) return FIMO_ENOMEM;
    init_state(*state);
return FIMO_EOK;
}
I see absolutely no problem with this approach. Well, maybe a tiny one: as we have not told the module loader which dependency we are waiting on, it would retry polling the constructor indefinitely. Also, this does not seem that easy for more complex examples. Perhaps we need a different approach. Indulge me for a bit (really, if you are still reading at this point, kudos to you):
async FimoResult module_initializer(
const FimoModuleInstance *module,
FimoModuleLoadingSet set,
void **state
) {
// Add the dependency.
if (!dependency_exists(module->context)) {
add_dependency_to_set(set);
    await wait_until_dependency_is_loaded(set);
}
add_dependency(module);
// Initialize the state.
*state = malloc(...);
if (*state == NULL) return FIMO_ENOMEM;
    init_state(*state);
return FIMO_EOK;
}
Turns out, what we really need if we want to have proper modules is an asynchronous initializer. With asynchronous initializers, we can simply suspend the initialization until all dependencies are loaded. Alas, C doesn’t have asynchronous functions, so we have to improvise.
There are generally two ways to implement async/await, also known as coroutines: stackful coroutines and stackless coroutines. With stackful coroutines, we allocate a separate stack for the coroutine, and the yield operation is implemented by saving the current register state onto the stack and swapping the stack and register state with a previously stored one. Think of it like setjmp and longjmp, except that we don’t jump to a prior point in our own call stack, but into a different one. I prefer this style of coroutines, as it works without any code modification, but it is much more difficult to implement. The alternative is stackless coroutines, where the coroutine consists of a state that carries the current “stack” variables and is progressed by continuous polling, kind of like the prior example. The main disadvantage of stackless coroutines is that not all code is compatible with them, as it must be written in a manner that supports polling. An example is Rust, which implements async by transforming a function into a finite state machine. Our initializer could then be represented as follows:
stateDiagram-v2
s0 : Check dependency or init wait
s1 : Wait until loaded
s2 : Add dependency and init
[*] --> s0
s0 --> s1 : dependency does not exist
s0 --> s2 : dependency exists
s1 --> s1 : dependency not yet loaded
s1 --> s2 : dependency loaded
s2 --> [*]
In this example, the only yield point is when we wait for a dependency to be loaded by the module loader, visible in the state diagram as the node with a transition to itself. The coroutine can then be implemented by remembering the current FSM state and following the transitions until we reach the end:
struct InitializerState {
int state;
const FimoModuleInstance *module;
FimoModuleLoadingSet set;
wait_until_loaded_future fut;
};
struct InitializerResult {
FimoResult result;
void *state;
};
static bool poll_initializer(
struct InitializerState *state,
struct InitializerResult *result
) {
while (true) {
switch (state->state) {
case 0:
if (!dependency_exists(state->module->context)) {
add_dependency_to_set(state->set);
                state->fut = wait_until_dependency_is_loaded(state->set);
state->state = 1;
continue;
}
state->state = 2;
continue;
case 1:
if (poll_future(&state->fut) == false) return false;
state->state = 2;
continue;
case 2:
add_dependency(state->module);
result->state = malloc(...);
if (result->state == NULL) {
state->state = 3;
result->result = FIMO_ENOMEM;
return true;
}
init_state(result->state);
state->state = 3;
result->result = FIMO_EOK;
return true;
        default:
            break;
        }
    }
}
This construct should not be foreign if you have ever had the need to implement an FSM by hand. Not depicted here is the initialization routine, which would allocate and initialize the InitializerState and InitializerResult structs, but the poll function is conceptually simple: it tries to make as much progress as possible, and if it can’t continue, it returns false, signaling that it should be polled again. Actually, we can improve this further by not polling the FSM continuously when we know that no progress can be made. The way the Rust language does this, and also how it is implemented in the fimo engine, is to pass an additional FimoAsyncWaker instance, which has a wake method to signal the event loop to schedule another poll operation, as sketched below.
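A simplified stand-in for that pattern; the real FimoAsyncWaker type surely differs from this sketch:
#include <stdbool.h>
/* Simplified waker: a wake callback plus a context pointer. */
typedef struct Waker {
    void (*wake)(void *data);
    void *data;
} Waker;
typedef struct Future {
    bool ready;
    Waker pending_waker; /* stored when the future cannot make progress */
} Future;
/* Poll with a waker: instead of busy-looping, the future remembers the waker,
   and the event loop parks it until wake() is invoked by whoever completes
   the awaited operation, e.g. the loader finishing a dependency. */
static bool poll_with_waker(Future *fut, const Waker *waker) {
    if (fut->ready) return true;
    fut->pending_waker = *waker;
    return false;
}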
Now, this FSM version is not much better than our original one, but it serves to demonstrate that any function can be transformed into an FSM and then turned into a future. A fully synchronous function can be implemented by an FSM with a single state and is therefore trivial, but the complexity can skyrocket fairly quickly, depending on the control flow surrounding the yield points, and on whether it should support other control flow constructs, like unwinding.
It may console you to know that I had to implement the entire module loader in this manner, as it is written in Zig, which disabled its async support in version 0.12.
We then do the same for the start event and the dynamic symbol constructors, and call it a day. The stop event and the destructors could be asyncified in the future, but there is no imminent need for it.
So, where does that leave us now?
The current state
The module loader described here is just one part of the module subsystem of the fimo engine, which is implemented in the fimo-std library 9, using Zig as the implementation language. I have finally released its first version, version 0.1.0, after many years of iterations and failures.
The library is the foundation of the engine and of all future modules. Its purpose is to provide the fundamental definitions and abstractions that allow the development of modules. To that end, we implemented cross-platform time and file system path utilities, error definitions, like the FimoResult type as an abstract ok-or-error union, and, most importantly, the FimoContext type.
The context is an opaque object which provides access to various subsystems, like the aforementioned module subsystem. In addition, it provides an async subsystem with utilities to manage the single-threaded event loop used by the module subsystem to implement module loading and unloading; the event loop is intended to be general-purpose and allows spawning custom futures if the need arises. The last subsystem currently implemented is a tracing subsystem, which provides utilities for structured logging and profiling.
The C API 10 of the library is purposefully kept bare-bones, providing only the required type and function definitions, so it may not be as user-friendly as it could be. Instead, I would advise looking into the Zig or Rust bindings to the library, which provide additional, much needed, utilities, like integrated async support in Rust and helpers to generate the correct module instance type definitions.
Let’s close this section with an example of how to write two simple modules sharing symbols in Zig:
const std = @import("std");
const fimo_std = @import("fimo_std");
const Context = fimo_std.Context;
const Module = Context.Module;
// Symbol definitions.
const global_var_sym = Module.Symbol {
.name = "global_variable",
.version = .{ .major = 0, .minor = 1, .patch = 0 },
.T = i32,
};
const print_hello_sym = Module.Symbol {
.name = "print_hello",
.version = .{ .major = 0, .minor = 1, .patch = 0 },
.T = fn (i32) callconv(.c) void,
};
const add_two_numbers_sym = Module.Symbol {
.name = "add_two_numbers",
.version = .{ .major = 0, .minor = 1, .patch = 0 },
.T = fn (i32, i32) callconv(.c) void,
};
// Module A
var global_variable: i32 = 1000;
fn printHello(num: i32) callconv(.c) void {
    // Range bounds must be usize, so cast the C-ABI integer first.
    for (0..@as(usize, @intCast(num))) |_| std.debug.print("Hello Friend!!!\n", .{});
}
fn addTwoNumbers(a: i32, b: i32) callconv(.c) void {
std.debug.print("Sum of the numbers {}\n", .{ a + b });
}
fn onStartA(ctx: *const Module.OpaqueInstance) !void {
    _ = ctx;
    std.debug.print("Hello from module A.\n", .{});
}
fn onStopA(ctx: *const Module.OpaqueInstance) void {
    _ = ctx;
    std.debug.print("Bye from module A.\n", .{});
}
const ModuleA = Module.exports.Builder.init("module-a")
.withDescription("Module to demonstrate the export functionality")
.withAuthor("Gabriel Borrelli")
.withLicense("MIT OR APACHE 2.0")
.withExport(global_var_sym, "exp1", .global, &global_variable)
.withExport(print_hello_sym, "exp2", .global, &printHello)
.withExport(add_two_numbers_sym, "exp3", .global, &addTwoNumbers)
.withOnStartEventSync(onStartA)
.withOnStopEvent(onStopA)
.exportModule();
// Module B
fn onStartB(ctx: *const Module.OpaqueInstance) !void {
const module: *const ModuleB = @ptrCast(ctx);
std.debug.print("Hello from module B.\n", .{});
const imports = module.imports();
imports.print_hello(10);
imports.add_two_numbers(5, 6);
std.debug.print("Value of global_variable.\n", .{imports.global_variable.*});
}
fn onStopB(ctx: *const Module.OpaqueInstance) void {
    _ = ctx;
    std.debug.print("Bye from module B.\n", .{});
}
const ModuleB = Module.exports.Builder.init("module-b")
.withDescription("Module to demonstrate the import functionality")
.withAuthor("Gabriel Borrelli")
.withLicense("MIT OR APACHE 2.0")
.withImport(global_var_sym, "global_variable")
.withImport(print_hello_sym, "print_hello")
.withImport(add_two_numbers_sym, "add_two_numbers")
.withOnStartEventSync(onStartB)
.withOnStopEvent(onStopB)
.exportModule();
comptime {
_ = ModuleA;
_ = ModuleB;
}
Where to go from here?
This was supposed to be a short article, but it ended up being longer than my previous one. Like I said, I just released the first version of the engine on GitHub 11. There is still an endless list of things left to do for the fimo-std library, like implementing Python support (Python bindings are “easy”, whereas Python modules are quite a bit harder), and better module discovery.
All that work has left me exhausted, so I am officially calling it quits for now; instead, I am going to implement some modules. I would really like to bring the engine to a state where it can show its first triangle by the end of this year, but I’m much more interested in getting the fundamentals right, so that it becomes a project I would actually consider using myself.
To that end, I am trying to introduce a structured development process, where I lay out all features that I consider appropriate in the issue tracker and assign them to a future roadmap version. For the moment, I plan to follow the Zig development process, where I try to release a new version at roughly 6-month intervals. Everything that does not fit will be postponed to a later version.
Since each symbol is versioned, I could start to introduce stable versions sooner rather than later, if I don’t encounter any major issues when developing new modules.
Up next is the fimo-tasks module, an implementation of multithreaded stackful coroutines and synchronization primitives. I have been working on the implementation for the last month, and it is almost finalized. This is a major milestone for the engine, as it is the one module I have wanted to implement since I started the fimo project five years ago. So, saying that I am excited is an understatement.
See you soon! Hopefully, with a shorter article.