Fimo Engine I: Auto Registration

What is fimo

Recently, I have started working on a small game engine, which I named fimo. By “recently” I really mean “a couple of years ago”, and “started” should really be understood as me constantly restarting from scratch once I encountered some major design flaw or started to get bored… Nevertheless, I have now settled on what is, in my humble opinion, an interesting core architecture, with many challenges ahead of me.

Some might question the utility of having projects that I not only never intend to bring to completion, but even frequently restart. To them, I would say, you’re right. I’m sorry, how dare a man have some hobbies and not spend all of their time for their sustenance and the greater good?

But really, sarcasm aside, some only see programming as a 9-5 job, or a means to an end, if you will. Needless to say, my view differs a bit, and I don’t mind spending large chunks of my time on something that never comes to fruition. And besides, I wholeheartedly encourage everyone, especially beginners and less experienced programmers, to work on something that seems outside of their capability.

You may never know how much you can really accomplish if you never find the limits of your knowledge, and even then, there is often nothing preventing you from broadening your reach and learning new things. How much can someone really learn by creating yet another to-do page with JavaScript and React? But maybe that is just my biased view as a researcher.

Personal Goals

Tangent aside… fimo, the latest revision of my work-in-progress game engine. The goal of fimo is to build a flexible, modular, cross-platform and language-agnostic game engine with a minimal set of dependencies. For the supported platforms, I’m currently targeting the usual suspects of Windows, Linux and macOS, but also have WebAssembly in my sights, for when some extensions that are relevant to the planned features are stabilized.

My approach when designing a project without any time constraints is to work bottom-up: I spend most of the time on what I consider fundamental building blocks, instead of speed-running for my personal best time-to-triangle.

The challenge this time around is to have a fully modular architecture, with a plugin-system and dependency injection ¹ as core building blocks. The plugins, which in fimo lingo are called modules, are isolated units that have a statically known set of imports and exports, such that their lifetime and dependencies can be fully managed by the core runtime, which in turn enables features like hot-code reloading.

Plugin-System

I envision the module-system to be fully programming language agnostic, such that one can choose an appropriate language for the task specific to the module. For instance, one may choose to write one module in an interpreted language to quickly iterate, such as writing GUIs, but then rewrite it in a language that compiles to native code if the performance characteristics require it.

In practice, we aren’t quite that flexible, as not all languages tend to collaborate. Specifically, we require languages that expose a stable interface, such that the different languages can be made to properly understand each other. In the computing world, this means that any language that we may like to use, is required to support the C ABI, as it is the de facto lingua franca of computing ².

This requirement considerably shrinks the list of suitable languages. What remains includes those compiling to native machine code, like C (obviously), C++, Rust and Zig, but also some that run in a virtual machine, like C#, Java, Kotlin and, believe it or not, Python. Also, I am fairly confident that it should be possible to write modules in JavaScript, by embedding an engine, like V8 ³. Regardless of the language used, all modules will be usable in all other supported languages.

The plan is to write the majority of the modules in either Rust or Zig, entirely dependent on the amount of unsafe required to implement the function of said module, but also include Python for some rapid prototyping.

The core library, tasked with managing module loading, unloading and dependencies, along with other common utilities, is written in Zig, since it’s easier to start with a language closer to the metal.

Roadmap

The engine is currently in its infancy, with only the core library implemented, but I plan to tackle a multithreaded job system, with cross-language coroutines similar to Go’s goroutines ⁴ and tailored scheduling algorithms, an entity-component-system ⁵ leveraging Zig’s comptime reflection to improve performance and cache locality, a bespoke GUI system, and many other modules required along the way.

If you are interested, or just want to watch, you can find the project on GitHub ⁶. I also plan to post development updates and considered designs here.

TL;DR

I’m building a game engine… maybe.

Auto Registration

With the rather long introduction behind us, we finally arrive at the main part of this article, namely auto registration, although global registration ⁷ might be the more descriptive name. To describe the problem at hand, we will start with a simple introductory example.

What is Auto Registration

Let’s assume that our source code defines a set of resources, be they types, functions, values or whatever. For the sake of simplicity, we assume that the type of resource is homogeneous, like the following example of Python code:

# foo.py
from typing import Any

def foo() -> Any:
    return "foo"

# bar.py
from typing import Any

# Register this
def bar() -> Any:
    return "bar"

# Don't register this
def bar_no_export(a: int) -> None:
    print(a)

Sometimes, it would be awfully convenient, if it were possible to somehow collect this set of resources, such that we may access them from a single place afterward, like so:

from typing import Any, Callable, Iterable

# Function to access all registered resources
def registered_resources() -> Iterable[Callable[[], Any]]:
    ...

# Call all registered functions
for res in registered_resources():
    print(res())

Possible output:

foo
bar

Some may already see some use cases, especially during initialization or finalization. For instance, we may want to collect a list of test functions for a custom unit-testing framework, list a set of components that may be used with our ECS, or maybe collect all defined modules in our source code.

Now that I have, hopefully, convinced you somewhat about the utility of auto registration, let’s walk through different approaches to implement this.

The Non-Solution

A first instinct may simply be to create an array containing all resources, like so:

from typing import Any, Callable, Iterable

import foo
import bar

# Contains all registered resources
_resources: list[Callable[[], Any]] = [
    foo.foo,
    bar.bar,
]

def registered_resources() -> Iterable[Callable[[], Any]]:
    return iter(_resources)

This method has some advantages. For one, it works in every language that lets you define a global array of references. But there is a major downside.

The presented approach requires us to manually list all elements, which is not only inconvenient and error-prone, but sometimes not even possible, as we may not have access to the list. It would be much more convenient, and useful, if we could register our resource where we defined it, like the following:

# foo.py
from typing import Any

def foo() -> Any:
    return "foo"

# Register foo
register_resource(foo)

# bar.py
from typing import Any

def bar() -> Any:
    return "bar"

def bar_no_export(a: int) -> None:
    print(a)

# Register bar
register_resource(bar)

Auto Registration in Python

Thankfully, implementing this correctly in Python is not much more complex. We just need to create another Python module:

# resource_registry.py
from typing import Any, Callable, Iterable

# Contains all registered resources
_resources: list[Callable[[], Any]] = []

# Creates an iterator over the registered resources
def registered_resources() -> Iterable[Callable[[], Any]]:
    return iter(_resources)

# Registers a new resource and returns it unchanged,
# which also makes the function usable as a decorator
def register_resource(x: Callable[[], Any]) -> Callable[[], Any]:
    _resources.append(x)
    return x

Then we can just import the module:

# foo.py
from typing import Any

from resource_registry import register_resource

def foo() -> Any:
    return "foo"

# Register foo
register_resource(foo)

# bar.py
from typing import Any

from resource_registry import register_resource

# Or using Python's decorator syntax
@register_resource
def bar() -> Any:
    return "bar"

def bar_no_export(a: int) -> None:
    print(a)

There are two factors that make this work. First, in Python there is no separate compilation step, and everything written in the global scope of a module will be executed once the module is imported. Second, the Python runtime ensures that each module is only initialized once, therefore registering all resources only once.
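Both factors can be observed directly. The following is a minimal sketch that writes a throwaway module to a temporary directory and imports it several times. The names demo_resource, demo_log and count_initializations are made up for this illustration and are not part of the registry above.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Top-level code of the throwaway module: it runs when the module is initialized.
MODULE_SOURCE = "import demo_log\ndemo_log.runs.append(__name__)\n"


def count_initializations(num_imports: int) -> int:
    """Import a freshly written module `num_imports` times and report how
    often its top-level code actually ran."""
    # In-memory helper module (hypothetical name) that records each execution.
    log = types.ModuleType("demo_log")
    log.runs = []
    sys.modules["demo_log"] = log

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_resource.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            for _ in range(num_imports):
                importlib.import_module("demo_resource")
        finally:
            # Undo our changes to the interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop("demo_resource", None)
            sys.modules.pop("demo_log", None)
    return len(log.runs)


print(count_initializations(3))  # the module body ran once, not three times
```

The first import executes the module body, every later import is served from the cache in sys.modules, which is exactly why each resource ends up in the registry only once.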

The question is, can we achieve something similar in C, or generally, any language that compiles to machine code?

First attempt in C

Let’s try to do the same thing in C. For simplicity’s sake, let’s collect a list of const int* ⁸:

/// main.c
#include <stdlib.h>

const int **resources = NULL; 
int num_resources = 0;

int register_resource(const int* x) {
    num_resources++;
    resources = realloc(resources, sizeof(const int*) * num_resources);
    resources[num_resources - 1] = x;
    return *x;
}

const int foo = 5;
const int bar = 10;

int foo_ = register_resource(&foo);
int bar_ = register_resource(&bar);

Now, we just need to compile it, and…

<source>:16:12: error: initializer element is not constant
   16 | int foo_ = register_resource(&foo);
      |            ^~~~~~~~~~~~~~~~~
<source>:17:12: error: initializer element is not constant
   17 | int bar_ = register_resource(&bar);
      |            ^~~~~~~~~~~~~~~~~

Oh, no! So, it turns out that C requires the initializer of a global to be a constant expression, and offers no way to execute a function at compile-time, which is what we need to fill our array. But I know, let’s try C++, where we have constexpr and consteval exactly for that purpose.

Second attempt in C++

const int **resources = nullptr;
int num_resources = 0;

consteval bool register_resource(const int* x) {
    const int **new_resources = new const int*[num_resources + 1];
    for (auto i = 0; i < num_resources; i++) {
        new_resources[i] = resources[i];
    }
    delete[] resources;

    resources = new_resources;
    num_resources++;
    return true;
}

constexpr const int foo = 5;
constexpr const int bar = 10;

static_assert(register_resource(&foo));
static_assert(register_resource(&bar));

So, now it should work ⁹:

<source>:19:32: error: non-constant condition for static assertion
   19 | static_assert(register_resource(&foo));
      |               ~~~~~~~~~~~~~~~~~^~~~~~
<source>:19:32: error: 'consteval bool register_resource(const int*)' called in a constant expression
<source>:4:16: note: 'consteval bool register_resource(const int*)' is not usable as a 'constexpr' function because:
    4 | consteval bool register_resource(const int* x) {
      |                ^~~~~~~~~~~~~~~~~
<source>:5:48: error: the value of 'num_resources' is not usable in a constant expression
    5 |     const int **new_resources = new const int*[num_resources + 1];
      |                                                ^~~~~~~~~~~~~
<source>:2:5: note: 'int num_resources' is not const
    2 | int num_resources = 0;
      |     ^~~~~~~~~~~~~
<source>:20:32: error: non-constant condition for static assertion
   20 | static_assert(register_resource(&bar));
      |               ~~~~~~~~~~~~~~~~~^~~~~~
<source>:20:32: error: 'consteval bool register_resource(const int*)' called in a constant expression

F@$#k. So, constexpr is pretty limited in what you can do with it; in particular, anything related to IO is off the table.

In this example, there are already multiple problems. The first problem is that a constant-evaluated function cannot access runtime-known values. Qualifying resources and num_resources as constexpr also doesn’t solve the problem, as then we wouldn’t be able to modify them.

Second, before C++20 it was not allowed to perform any allocation in a constant-evaluated context, so the new[] would have failed either way. C++20 lifted that restriction, but the standard only allows transient allocations.

What is a “transient allocation”, you may ask? It is an allocation that exists only during constant evaluation, and must be deallocated again before that evaluation ends. So it wouldn’t be of any use to us either way.

Is there really no way to fill an array at compile-time? Well, there is one way that even works in C.

A ray of hope

Let me show you a magic trick ¹⁰:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

const int a = 5;
const int b = 10;
const int c = 15;
const int d = 20;
const int e = 25;
const int f = 30;
const int g = 35;
const int h = 40;
const int i = 45;

int main() {
    size_t num_elements = ((intptr_t)&i - (intptr_t)&a) / sizeof(a);
    for(size_t i = 0; i <= num_elements; i++) {
        printf("%d\n", (&a)[i]);
    }
}

Which, remarkably I may add, prints the values of all nine globals, from 5 up to 45, in declaration order.

It is not immediately clear what is actually happening here, and why it even works, so let’s try playing with it for a bit. As a side note, don’t even think about doing something like this in production, as it is highly illegal, and will blow up in your face the moment that the compiler gods detect even a trace of disrespect.

Let’s start by marking some of the globals as mutable ¹¹:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

const int a = 5;
int b = 10;
const int c = 15;
int d = 20;
const int e = 25;
int f = 30;
const int g = 35;
int h = 40;
const int i = 45;

int main() {
    size_t num_elements = ((intptr_t)&i - (intptr_t)&a) / sizeof(a);
    for(size_t i = 0; i <= num_elements; i++) {
        printf("%d\n", (&a)[i]);
    }
}

Somehow, now we are only printing the constant globals, that is 5, 15, 25, 35 and 45.

From the output, it appears that the compiler chooses to place all constant globals together, and we could hypothesize that it would do the same thing for the mutable globals. Let’s try it out ¹²:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

const int a = 5;
int b = 10;
const int c = 15;
int d = 20;
const int e = 25;
int f = 30;
const int g = 35;
int h = 40;
const int i = 45;

int main() {
    size_t num_elements_const = ((intptr_t)&i - (intptr_t)&a) / sizeof(a);
    for(size_t i = 0; i <= num_elements_const; i++) {
        printf("constant: %d\n", (&a)[i]);
    }

    printf("\n");

    size_t num_elements_mut = ((intptr_t)&h - (intptr_t)&b) / sizeof(a);
    for(size_t i = 0; i <= num_elements_mut; i++) {
        printf("mutable: %d\n", (&b)[i]);
    }
}

As expected, we can now access both the constant and mutable globals in two different locations.

constant: 5
constant: 15
constant: 25
constant: 35
constant: 45

mutable: 10
mutable: 20
mutable: 30
mutable: 40

It gets even weirder, if we access out of bounds ¹³:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

const int a = 5;
int b = 10;
const int c = 15;
int d = 20;
const int e = 25;
int f = 30;
const int g = 35;
int h = 40;
const int i = 45;

int main() {
    size_t num_elements_const = 16;
    for(size_t i = 0; i <= num_elements_const; i++) {
        printf("constant: %d\n", (&a)[i]);
    }

    printf("\n");

    size_t num_elements_mut = 16;
    for(size_t i = 0; i <= num_elements_mut; i++) {
        printf("mutable: %d\n", (&b)[i]);
    }
}

Not only are the constants placed together with other constants, and mutable globals with other mutable globals, but as we can see, the locations where they are placed have entirely different characteristics:

constant: 5
constant: 15
constant: 25
constant: 35
constant: 45
constant: 1936617315
constant: 1953390964
constant: 1680154682
constant: 1970077706
constant: 1818386804
constant: 622869093
constant: 2660
constant: 990059265
constant: 40
constant: 4
constant: -4116
constant: 108

mutable: 10
mutable: 20
mutable: 30
mutable: 40
mutable: 0
mutable: 0
mutable: 0
mutable: 0
mutable: 0
mutable: 0
mutable: 0
mutable: 0
mutable: 0
mutable: 0
mutable: 0
mutable: 0
mutable: 0

Executable File Formats

To figure out what is actually happening, we must first understand how a binary is actually stored after being compiled and linked.

As we might suspect, binaries aren’t just an amorphous blob of bytes, as that would make it really difficult for an operating system to actually load and execute them. Rather, each binary follows a platform-dependent format, which stores additional metadata alongside the code and data.

On Windows, this data structure is called PE (Portable Executable) ¹⁴, on Apple systems it’s Mach-O (Mach object) ¹⁵, and other *nix, like Linux, use ELF ¹⁶.
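All three formats can be told apart by the first few bytes of the file. The sketch below checks these magic bytes; the helper names detect_format and detect_file are my own, and the check is deliberately shallow: a proper PE check would also follow the DOS header to the PE signature, and universal (fat) Mach-O binaries are not handled.

```python
ELF_MAGIC = b"\x7fELF"
DOS_MAGIC = b"MZ"  # DOS header that PE images start with
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce",  # 32-bit, big-endian
    b"\xce\xfa\xed\xfe",  # 32-bit, little-endian
    b"\xfe\xed\xfa\xcf",  # 64-bit, big-endian
    b"\xcf\xfa\xed\xfe",  # 64-bit, little-endian
}


def detect_format(header: bytes) -> str:
    """Guess the executable format from the leading bytes of a file."""
    if header.startswith(ELF_MAGIC):
        return "ELF"
    if header.startswith(DOS_MAGIC):
        return "PE"
    if header[:4] in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"


def detect_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as file:
        return detect_format(file.read(4))


print(detect_format(b"\x7fELF\x02\x01\x01\x00"))  # ELF
```

Calling detect_file on any binary lying around, for instance the Python interpreter itself, is a quick way to see which of the three formats a platform uses.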

Conceptually, these three formats work similarly, and group the compiled code into regions called sections. There is a small caveat for Mach-O, but we’ll get there shortly.

For the ELF format, we can use the readelf utility to print the sections and their contents ¹⁷:

There are 23 section headers, starting at offset 0xd18:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       000000000000009d  0000000000000000  AX       0     0     1
  [ 2] .rela.text        RELA             0000000000000000  00000760
       00000000000000a8  0000000000000018   I      20     1     8
  [ 3] .data             PROGBITS         0000000000000000  000000e0
       0000000000000010  0000000000000000  WA       0     0     4
  [ 4] .bss              NOBITS           0000000000000000  000000f0
       0000000000000000  0000000000000000  WA       0     0     1
  [ 5] .rodata           PROGBITS         0000000000000000  000000f0
       000000000000002f  0000000000000000   A       0     0     4
  [ 6] .debug_info       PROGBITS         0000000000000000  00000120
       00000000000000df  0000000000000000   C       0     0     8
  [ 7] .rela.debug_info  RELA             0000000000000000  00000808
       0000000000000318  0000000000000018   I      20     6     8
  [ 8] .debug_abbrev     PROGBITS         0000000000000000  00000200
       00000000000000a9  0000000000000000   C       0     0     8
  [ 9] .debug_aranges    PROGBITS         0000000000000000  000002b0
       000000000000002f  0000000000000000   C       0     0     8
  [10] .rela.debug_[...] RELA             0000000000000000  00000b20
       0000000000000030  0000000000000018   I      20     9     8
  [11] .debug_line       PROGBITS         0000000000000000  000002e0
       0000000000000081  0000000000000000   C       0     0     8
  [12] .rela.debug_line  RELA             0000000000000000  00000b50
       00000000000000d8  0000000000000018   I      20    11     8
  [13] .debug_str        PROGBITS         0000000000000000  00000368
       00000000000000a7  0000000000000001 MSC       0     0     8
  [14] .debug_line_str   PROGBITS         0000000000000000  00000410
       0000000000000084  0000000000000001 MSC       0     0     8
  [15] .comment          PROGBITS         0000000000000000  00000494
       000000000000003a  0000000000000001  MS       0     0     1
  [16] .note.GNU-stack   PROGBITS         0000000000000000  000004ce
       0000000000000000  0000000000000000           0     0     1
  [17] .note.gnu.pr[...] NOTE             0000000000000000  000004d0
       0000000000000030  0000000000000000   A       0     0     8
  [18] .eh_frame         PROGBITS         0000000000000000  00000500
       0000000000000038  0000000000000000   A       0     0     8
  [19] .rela.eh_frame    RELA             0000000000000000  00000c28
       0000000000000018  0000000000000018   I      20    18     8
  [20] .symtab           SYMTAB           0000000000000000  00000538
       00000000000001f8  0000000000000018          21     9     8
  [21] .strtab           STRTAB           0000000000000000  00000730
       000000000000002d  0000000000000000           0     0     1
  [22] .shstrtab         STRTAB           0000000000000000  00000c40
       00000000000000d3  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

As we can see, the compiled program is made up of multiple sections, with the most important being .text, .data, .bss, and .rodata. The .text section contains all functions of our program, whereas .data, .bss, and .rodata contain all our global variables.

As we have seen with our examples, mutable and constant global variables are placed at different positions, namely in the .data section if they are mutable and in the .rodata section if they are constant. The reason is that the operating system can enforce write protection on a per-section basis, to prevent bugs from accidentally overwriting a constant or even a function. The .bss section is a special case for globals that are initialized to 0, meaning that their contents don’t actually need to be stored in the binary.
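As a rough mental model, the placement rule can be written down in a few lines. This is a simplification that I am assuming for illustration (real compilers know more sections and more special cases), and section_for is a made-up helper, applied here to the nine globals from the experiment:

```python
def section_for(is_const: bool, initial_value: int) -> str:
    """Simplified placement rule for a global integer variable."""
    if is_const:
        return ".rodata"  # read-only data
    if initial_value == 0:
        return ".bss"     # zero-initialized, not stored in the file
    return ".data"        # mutable, initialized data


# The nine globals from the experiment: (name, is_const, initial value).
example_globals = [
    ("a", True, 5), ("b", False, 10), ("c", True, 15),
    ("d", False, 20), ("e", True, 25), ("f", False, 30),
    ("g", True, 35), ("h", False, 40), ("i", True, 45),
]

placement: dict[str, list[int]] = {}
for _name, is_const, value in example_globals:
    placement.setdefault(section_for(is_const, value), []).append(value)

print(placement)
# {'.rodata': [5, 15, 25, 35, 45], '.data': [10, 20, 30, 40]}
```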

To ensure that the global variables are truly placed where we think they are, we can dump the contents of the .data and .rodata sections:

Hex dump of section '.data':
  0x00000000 0a000000 14000000 1e000000 28000000 ............(...

Hex dump of section '.rodata':
  0x00000000 05000000 0f000000 19000000 23000000 ............#...
  0x00000010 2d000000 636f6e73 74616e74 3a202564 -...constant: %d
  0x00000020 0a006d75 7461626c 653a2025 640a00   ..mutable: %d..

We need some base 16 knowledge to decipher the content of the section, but we can clearly see 0x0a = 10, 0x14 = 20, 0x1e = 30 and 0x28 = 40 in the .data section, corresponding to our four mutable variables, and 0x05 = 5, 0x0f = 15, 0x19 = 25, 0x23 = 35 and 0x2d = 45 in the .rodata section for the remaining constant variables.
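If base 16 is not your thing, we can let Python do the deciphering. The sketch below copies the bytes from the two hex dumps and reads them as little-endian 32-bit integers. It also goes the other way for the first seven stray numbers of the out-of-bounds experiment, which turn out to be anything but random. The variable names are mine, and the layout is assumed to be the one shown in the dumps.

```python
import struct

# Bytes copied from the hex dumps above (bytes.fromhex skips the spaces).
data = bytes.fromhex("0a000000 14000000 1e000000 28000000")
rodata = bytes.fromhex(
    "05000000 0f000000 19000000 23000000"
    "2d000000 636f6e73 74616e74 3a202564"
    "0a006d75 7461626c 653a2025 640a00"
)

# Interpret the bytes as little-endian 32-bit signed integers.
mutable = struct.unpack("<4i", data)
constant = struct.unpack("<5i", rodata[:20])
print(mutable)   # (10, 20, 30, 40)
print(constant)  # (5, 15, 25, 35, 45)

# Whatever follows the five constants in .rodata:
strings = rodata[20:]
print(strings)   # b'constant: %d\n\x00mutable: %d\n\x00'

# The first seven stray values printed by the out-of-bounds loop,
# packed back into the bytes they came from.
stray = struct.pack(
    "<7i",
    1936617315, 1953390964, 1680154682,
    1970077706, 1818386804, 622869093, 2660,
)
print(stray[:len(strings)] == strings)  # True
```

The “garbage” we read past the last constant is simply the two printf format strings, which the compiler also placed in .rodata, right behind our integers.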

As we have seen, there are specific rules for how a compiler lays out the compiled code and data, and how it is afterward loaded by the operating system. As a last step, we need to briefly explain the last utility that is involved when compiling a program.

Enter the linker

The final part of the compilation process, which I neglected to mention, is that the compiler is not actually responsible for outputting the final executable. Instead, its only job is to generate an intermediate object file for each individual compilation unit. In the context of C and C++, a compilation unit is one .c/.cpp file, but for Rust, it is an entire crate.

The final step of merging all compilation units into a single executable is left to the linker. It usually does its work quietly, in the shadow of the compiler, but you may have encountered it if you defined a function of the same name multiple times in different compilation units, or used an undefined function.

A deep dive into the workings of a linker would take too long for this blog post, but suffice it to say that, provided with a list of object files, it generates the final executable by merging sections of the same name. In other words, it essentially copy-pastes the contents of each section into a bigger section of the same name.
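The copy-paste description can be captured in a toy model. In the sketch below, an object file is nothing more than a mapping from section names to bytes, and the hypothetical link function concatenates sections of the same name in the order the objects are passed. A real linker additionally deals with alignment, symbol resolution and relocations, none of which is modeled here.

```python
def link(object_files: list[dict[str, bytes]]) -> dict[str, bytes]:
    """Merge same-named sections by concatenating their contents."""
    executable: dict[str, bytes] = {}
    for sections in object_files:
        for name, content in sections.items():
            executable[name] = executable.get(name, b"") + content
    return executable


# Two made-up object files with placeholder contents.
foo_o = {".text": b"<foo code>", ".rodata": b"foo\x00"}
bar_o = {".text": b"<bar code>", ".rodata": b"bar\x00", ".data": b"\x2a\x00\x00\x00"}

for name, content in link([foo_o, bar_o]).items():
    print(name, content)
```

Note that the merged .rodata contains the data of both files back to back, without any marker telling us where one contribution ends and the next begins.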

With this, we have all the puzzle pieces in place, to make the compiler bend to our will.

(Ab)Using the linker for Auto Registration

As discussed, the compiler places the globals defined in each translation unit contiguously into one specific section, and the linker then merges these per-unit sections into a single one. We don’t have much control over how, and in what order, the linker concatenates the sections, which is why we cannot use this for our use case. That is, if we only consider preexisting sections…

ELF

The trick is to instruct the compiler to place our data into a section that is entirely controlled by us. Luckily, this can be accomplished through non-standard extensions to the C compilers ¹⁸:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

const int *dummy_resource 
        __attribute__((retain, used, section("my_section"))) = NULL;

extern const int *__start_my_section;
extern const int *__stop_my_section;

const int a = 5;
const int b = 10;
const int c = 15;
const int d = 20;
const int e = 25;
const int f = 30;
const int g = 35;
const int h = 40;
const int i = 45;

const int *a_ __attribute__((retain, used, section("my_section"))) = &a;
const int *b_ __attribute__((retain, used, section("my_section"))) = &b;
const int *c_ __attribute__((retain, used, section("my_section"))) = &c;
const int *d_ __attribute__((retain, used, section("my_section"))) = &d;
const int *e_ __attribute__((retain, used, section("my_section"))) = &e;
const int *f_ __attribute__((retain, used, section("my_section"))) = &f;
const int *g_ __attribute__((retain, used, section("my_section"))) = &g;
const int *h_ __attribute__((retain, used, section("my_section"))) = &h;
const int *i_ __attribute__((retain, used, section("my_section"))) = &i;

int main() {
    // Walk every pointer-sized slot between the linker-provided bounds.
    for (const int **it = &__start_my_section; it != &__stop_my_section; it++) {
        if (*it == NULL) continue; // skip the dummy entry
        printf("resource: %d\n", **it);
    }
}

Outputting:

resource: 5
resource: 10
resource: 15
resource: 20
resource: 25
resource: 30
resource: 35
resource: 40
resource: 45

Here we have the same setup as beforehand, where we want to register 9 variables, but now we do that by introducing a custom section.

Let’s start by dissecting the __attribute__((retain, used, section("my_section"))) statement. __attribute__(...) is a GCC and Clang extension to the C and C++ languages, and is required to instruct the compiler to function in a way that differs from the standard.

Starting from right to left, section("my_section") instructs the compiler to place the definition into the my_section section so that we can iterate over the contents of that section.

Placing the values in the correct section is sadly not enough to ensure correctness, as the compiler and the linker are allowed to remove symbols they deem not useful, if they cannot detect any references to them. The used attribute assures the compiler that the symbol is truly required and may not be removed. The retain attribute does the same, just for the linker.

We use the same attributes to define dummy_resource, which forces the creation of the my_section section, containing a single NULL pointer. Without this, it is not possible to have an empty resource list, as the section is only created when it is filled.

For each section, the linker defines the __start_{name} and __stop_{name} symbols, which identify the starting and ending position of the section in memory. We can then use those to iterate over the contents of our section.

To actually fill the section, we can proceed like with dummy_resource, but with a pointer to an actual global. To note, we don’t know the actual number of elements in our section, so we always iterate over the entire section, and determine if a slot is filled by comparing it with NULL.
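To make the iteration scheme concrete, here is a small model of such a section as a flat byte buffer of pointer slots. I am assuming 64-bit little-endian pointers, the addresses are invented, and build_section and registered_addresses are illustrative helpers rather than anything the toolchain provides.

```python
import struct

POINTER = struct.Struct("<Q")  # assumption: 64-bit little-endian pointers


def build_section(entries: list[int]) -> bytes:
    """Lay out the given addresses as consecutive pointer slots."""
    return b"".join(POINTER.pack(address) for address in entries)


def registered_addresses(section: bytes) -> list[int]:
    """Visit every slot from start to stop and keep the non-NULL ones."""
    return [address for (address,) in POINTER.iter_unpack(section) if address != 0]


# dummy_resource contributes a NULL slot; the other addresses are made up.
my_section = build_section([0x0, 0x4010, 0x4014, 0x4018])
print([hex(address) for address in registered_addresses(my_section)])
# ['0x4010', '0x4014', '0x4018']

# A section holding only the dummy entry yields an empty resource list.
print(registered_addresses(build_section([0x0])))
# []
```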

Unlike our first experiment with C, this behavior is well-defined, and relied upon by many software packages. We adapt this a little to also work on Apple platforms.

Mach-O

In Mach-O there is a subtle difference in how sections are linked. Notably, Mach-O also requires that we define which segment a section belongs to, where a segment is a group of multiple sections ¹⁹:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

const int *dummy_resource 
        __attribute__((retain, used, section("__DATA,my_section"))) = NULL;

extern const int *start_my_section __asm("section$start$__DATA$my_section");
extern const int *stop_my_section __asm("section$end$__DATA$my_section");

const int a = 5;
const int b = 10;
const int c = 15;
const int d = 20;
const int e = 25;
const int f = 30;
const int g = 35;
const int h = 40;
const int i = 45;

const int *a_ __attribute__((retain, used, section("__DATA,my_section"))) = &a;
const int *b_ __attribute__((retain, used, section("__DATA,my_section"))) = &b;
const int *c_ __attribute__((retain, used, section("__DATA,my_section"))) = &c;
const int *d_ __attribute__((retain, used, section("__DATA,my_section"))) = &d;
const int *e_ __attribute__((retain, used, section("__DATA,my_section"))) = &e;
const int *f_ __attribute__((retain, used, section("__DATA,my_section"))) = &f;
const int *g_ __attribute__((retain, used, section("__DATA,my_section"))) = &g;
const int *h_ __attribute__((retain, used, section("__DATA,my_section"))) = &h;
const int *i_ __attribute__((retain, used, section("__DATA,my_section"))) = &i;

int main() {
    // Walk every pointer-sized slot between the linker-provided bounds.
    for (const int **it = &start_my_section; it != &stop_my_section; it++) {
        if (*it == NULL) continue; // skip the dummy entry
        printf("resource: %d\n", **it);
    }
}

There are only two differences here. The first is that the argument of section(...) must be of the form section("{SEGMENT},{SECTION}"); here we specify that our section belongs to the __DATA segment.

The second is that the linker chooses other names for our section-range symbols, namely section$start$__DATA${SECTION} and section$end$__DATA${SECTION}. These are invalid C identifiers, but we can refer to them by explicitly binding start_my_section and stop_my_section to them with __asm.
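The naming schemes of the two formats seen so far can be summarized in a tiny helper. The function boundary_symbols is hypothetical and only encodes the patterns described in the text. Python’s str.isidentifier is used as a stand-in for the rules of C identifiers, which is close enough for these names.

```python
def boundary_symbols(binary_format: str, section: str,
                     segment: str = "__DATA") -> tuple[str, str]:
    """Return the (start, stop) symbol names the linker provides for a section."""
    if binary_format == "elf":
        return (f"__start_{section}", f"__stop_{section}")
    if binary_format == "macho":
        return (f"section$start${segment}${section}",
                f"section$end${segment}${section}")
    raise ValueError(f"no boundary symbols known for {binary_format!r}")


for binary_format in ("elf", "macho"):
    start, stop = boundary_symbols(binary_format, "my_section")
    # Names that are not plain identifiers have to be bound via __asm.
    print(binary_format, start, stop, start.isidentifier())
```

For ELF the names are ordinary identifiers that can be declared directly, whereas the Mach-O names contain a $ and therefore need the explicit binding.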

PE

Finally, there is PE, which is the odd one out, as the linker functions slightly differently ²⁰:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

/// Define the sections as read only.
#pragma section("my_sec$a", read)
#pragma section("my_sec$u", read)
#pragma section("my_sec$z", read)

__declspec(allocate("my_sec$a")) const int *start_my_section = NULL;
__declspec(allocate("my_sec$z")) const int *stop_my_section = NULL;

const int a = 5;
const int b = 10;
const int c = 15;
const int d = 20;
const int e = 25;
const int f = 30;
const int g = 35;
const int h = 40;
const int i = 45;

__declspec(allocate("my_sec$u")) const int *a_ = &a;
__declspec(allocate("my_sec$u")) const int *b_ = &b;
__declspec(allocate("my_sec$u")) const int *c_ = &c;
__declspec(allocate("my_sec$u")) const int *d_ = &d;
__declspec(allocate("my_sec$u")) const int *e_ = &e;
__declspec(allocate("my_sec$u")) const int *f_ = &f;
__declspec(allocate("my_sec$u")) const int *g_ = &g;
__declspec(allocate("my_sec$u")) const int *h_ = &h;
__declspec(allocate("my_sec$u")) const int *i_ = &i;

int main() {
    // Walk every pointer-sized slot between our two marker variables.
    for (const int **it = &start_my_section; it != &stop_my_section; it++) {
        if (*it == NULL) continue; // skip the start marker and any padding
        printf("resource: %d\n", **it);
    }
}

Unlike other platforms, we first have to specify which sections we want to create, and what access permission we grant at runtime. Here, __declspec() has the same function as __attribute__.

There is an additional limitation: section names may be at most 8 characters long. An additional quirk of the linker is that it does not create section-delimiter symbols, unlike on other platforms. We can force the same outcome by using three different variants of our section, my_sec$a, my_sec$u and my_sec$z. Thanks to another quirk, the linker orders the contents of each section lexicographically by the suffix after the $ in our section name. In other words, all the data in my_sec$a will be placed before all the data of my_sec$u, which is placed before the start of my_sec$z.

With this knowledge, we can force our own section-delimiter symbols, by allocating one global in the my_sec$a and my_sec$z sections, while allocating all other data in the my_sec$u section.
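The ordering rule is easy to model: sort all contributions by the part after the $, then concatenate them into a section named after the part before it. The sketch below does just that. merge_grouped is a made-up name, the contents are placeholders, and keeping entries with the same suffix in their input order is a property of Python’s stable sort, not a promise made by the real linker.

```python
def merge_grouped(contributions: list[tuple[str, bytes]]) -> dict[str, bytes]:
    """Merge `name$suffix` contributions into `name`, ordered by suffix."""
    ordered = sorted(contributions, key=lambda item: item[0].partition("$")[2])
    merged: dict[str, bytes] = {}
    for name, content in ordered:
        base = name.partition("$")[0]
        merged[base] = merged.get(base, b"") + content
    return merged


# Contributions in the arbitrary order the linker might encounter them.
contributions = [
    ("my_sec$u", b"[a_]"),
    ("my_sec$z", b"[stop]"),
    ("my_sec$a", b"[start]"),
    ("my_sec$u", b"[b_]"),
]
print(merge_grouped(contributions))
# {'my_sec': b'[start][a_][b_][stop]'}
```

No matter in which order the object files arrive, the start marker ends up first and the stop marker last, with all registered entries in between.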

Macro Magic

With a bit of macro magic we can abstract the platform away ²¹:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

#define CONCAT(a, b) CONCAT_INNER(a, b)
#define CONCAT_INNER(a, b) a ## b

#ifdef _WIN32
#pragma section("my_sec$a", read)
#pragma section("my_sec$u", read)
#pragma section("my_sec$z", read)

__declspec(allocate("my_sec$a")) const int *start_my_section = NULL;
__declspec(allocate("my_sec$z")) const int *stop_my_section = NULL;

#define register_export(SYM) \
    __declspec(allocate("my_sec$u")) const int *CONCAT(SYM, __COUNTER__) = &(SYM)
#elif defined(__APPLE__)
const int *dummy_resource 
        __attribute__((retain, used, section("__DATA,my_section"))) = NULL;

extern const int *start_my_section __asm("section$start$__DATA$my_section");
extern const int *stop_my_section __asm("section$end$__DATA$my_section");

#define register_export(SYM) const int *CONCAT(SYM, __COUNTER__) \
        __attribute__((retain, used, section("__DATA,my_section"))) = &(SYM)
#else
const int *dummy_resource 
        __attribute__((retain, used, section("my_section"))) = NULL;

extern const int *start_my_section __asm("__start_my_section");
extern const int *stop_my_section __asm("__stop_my_section");

#define register_export(SYM) const int *CONCAT(SYM, __COUNTER__) \
        __attribute__((retain, used, section("my_section"))) = &(SYM)
#endif

const int a = 5;
const int b = 10;
const int c = 15;
const int d = 20;
const int e = 25;
const int f = 30;
const int g = 35;
const int h = 40;
const int i = 45;

register_export(a);
register_export(b);
register_export(c);
register_export(d);
register_export(e);
register_export(f);
register_export(g);
register_export(h);
register_export(i);

int main() {
    // Walk every pointer-sized slot between the two delimiter symbols.
    for (const int **it = &start_my_section; it != &stop_my_section; it++) {
        // Skip NULL placeholders (the start delimiter or the dummy entry).
        if (*it == NULL) continue;
        printf("resource: %d\n", **it);
    }
}

With this, we have something that resembles our initial Python example, but that also works across languages such as Rust and Zig, as long as everything is statically linked into the same binary.
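The payload does not have to be an int. As a rough sketch of where this could lead, the snippet below registers pointers to a hypothetical module_desc struct and looks modules up by name. Everything named here (module_desc, register_module, count_modules, find_module, the fimo_mods section) is made up for illustration and is not fimo's actual interface. It assumes GCC or Clang targeting ELF or Mach-O; a PE branch would follow the $a/$u/$z pattern from above. Unlike the listing above, the delimiters are declared as arrays of unknown size and the slot count is derived from integer addresses. That is a deliberately defensive variation, so that an optimizing compiler has no object bounds to reason about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor; the type and its fields are illustrative only. */
struct module_desc {
    const char *name;
    int (*init)(void);
};
typedef const struct module_desc *module_ptr;

#define CONCAT(a, b) CONCAT_INNER(a, b)
#define CONCAT_INNER(a, b) a ## b

#if defined(__APPLE__)
#define MODS_ATTR __attribute__((retain, used, section("__DATA,fimo_mods")))
extern module_ptr mods_begin[] __asm("section$start$__DATA$fimo_mods");
extern module_ptr mods_end[] __asm("section$end$__DATA$fimo_mods");
#else /* assumption: GCC or Clang targeting ELF */
#define MODS_ATTR __attribute__((retain, used, section("fimo_mods")))
extern module_ptr mods_begin[] __asm("__start_fimo_mods");
extern module_ptr mods_end[] __asm("__stop_fimo_mods");
#endif

/* Keeps the section alive even if nothing else is registered. */
module_ptr dummy_module MODS_ATTR = NULL;

#define register_module(SYM) \
    module_ptr CONCAT(SYM, __COUNTER__) MODS_ATTR = &(SYM)

static int init_renderer(void) { return 1; }
static int init_audio(void) { return 2; }

static const struct module_desc renderer = { "renderer", init_renderer };
static const struct module_desc audio = { "audio", init_audio };

register_module(renderer);
register_module(audio);

/* Number of pointer-sized slots in the section, NULL placeholders included. */
static size_t module_slots(void) {
    return ((uintptr_t)mods_end - (uintptr_t)mods_begin) / sizeof(module_ptr);
}

/* Number of registered (non-NULL) descriptors. */
size_t count_modules(void) {
    size_t count = 0;
    for (size_t i = 0; i < module_slots(); i++) {
        if (mods_begin[i] != NULL) count++;
    }
    return count;
}

/* Finds a registered descriptor by name, or returns NULL if there is none. */
const struct module_desc *find_module(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < module_slots(); i++) {
        module_ptr m = mods_begin[i];
        if (m != NULL && strcmp(m->name, name) == 0) return m;
    }
    return NULL;
}
```

Calling find_module("audio")->init() from main would then run init_audio, without main ever naming that function directly.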

Rust

In Rust, we can use a strategy similar to the one we used in C and C++, based on macros, but the result is a lot cleaner thanks to Rust’s macro hygiene ²²:

#[macro_export]
macro_rules! register_resource {
    ($val:literal) => {
        const _: () = {
            #[allow(dead_code)]
            #[repr(transparent)]
            struct Wrapper(&'static i32);

            #[used]
            #[cfg_attr(windows, link_section = "my_sec$u")]
            #[cfg_attr(
                all(unix, target_vendor = "apple"),
                link_section = "__DATA,my_section"
            )]
            #[cfg_attr(
                all(unix, not(target_vendor = "apple")),
                link_section = "my_section"
            )]
            static EXPORT: Wrapper = Wrapper(&$val);
        };

        // For ELF targets the linker garbage collection tends to
        // remove our custom section. On the C/C++ side, we can
        // use the `retain` attribute to force the linker to keep
        // the section. As a workaround, we can keep the section
        // alive by adding a relocation.
        #[cfg(all(unix, not(target_vendor = "apple")))]
        core::arch::global_asm!(
            ".pushsection .init_array,\"aw\",%init_array",
            ".reloc ., BFD_RELOC_NONE, my_section",
            ".popsection"
        );
    };
}

#[allow(dead_code)]
const _: () = {
    #[allow(dead_code)]
    #[repr(transparent)]
    struct Wrapper(*const i32);
    unsafe impl Sync for Wrapper {}

    #[used]
    #[no_mangle]
    #[cfg(windows)]
    #[link_section = "my_sec$a"]
    static start_my_section: Wrapper = Wrapper(std::ptr::null());

    #[used]
    #[no_mangle]
    #[cfg(windows)]
    #[link_section = "my_sec$z"]
    static stop_my_section: Wrapper = Wrapper(std::ptr::null());

    #[used]
    #[no_mangle]
    #[cfg_attr(
        all(unix, target_vendor = "apple"),
        link_section = "__DATA,my_section"
    )]
    #[cfg_attr(
        all(unix, not(target_vendor = "apple")),
        link_section = "my_section"
    )]
    static dummy_resource: Wrapper = Wrapper(std::ptr::null());
};

// Only needed for discovery; can be removed if only registration is required.
extern "C" {
    #[cfg_attr(
        all(unix, target_vendor = "apple"),
        link_name = "section$start$__DATA$my_section"
    )]
    #[cfg_attr(
        all(unix, not(target_vendor = "apple")),
        link_name = "__start_my_section"
    )]
    static start_my_section: *const i32;

    #[cfg_attr(
        all(unix, target_vendor = "apple"),
        link_name = "section$end$__DATA$my_section"
    )]
    #[cfg_attr(
        all(unix, not(target_vendor = "apple")),
        link_name = "__stop_my_section"
    )]
    static stop_my_section: *const i32;
}

register_resource!(5);
register_resource!(10);
register_resource!(15);
register_resource!(20);
register_resource!(25);
register_resource!(30);
register_resource!(35);
register_resource!(40);
register_resource!(45);

pub fn main() {
    unsafe {
        let start: *const *const i32 = &start_my_section;
        let stop: *const *const i32 = &stop_my_section;
        let num_resources = stop.offset_from(start);

        for i in 0..num_resources {
            let cursor = start.offset(i);
            if (*cursor).is_null() { continue; }
            println!("resource: {}", **cursor);
        }
    }
}

Here, we can export a pointer to the literal directly, because Rust’s constant promotion rules give a reference to a literal, such as &5, the 'static lifetime.

Zig

For good measure, we can also implement this in Zig ²³:

const std = @import("std");
const builtin = @import("builtin");

comptime {
    _ = if (builtin.os.tag == .windows)
        struct {
            export const start_my_section: ?*const i32 linksection("my_sec$a") = null;
            export const stop_my_section: ?*const i32 linksection("my_sec$z") = null;
        }
    else if (builtin.os.tag.isDarwin())
        struct {
            export const dummy: ?*const i32 linksection("__DATA,my_section") = null;
            comptime {
                asm (
                    \\.global start_my_section
                    \\start_my_section = section$start$__DATA$my_section
                    \\
                    \\.global stop_my_section
                    \\stop_my_section = section$end$__DATA$my_section
                );
            }
        }
    else
        struct {
            export const dummy: ?*const i32 linksection("my_section") = null;
            comptime {
                asm (
                    \\.pushsection .init_array,"aw",%init_array
                    \\.reloc ., BFD_RELOC_NONE, my_section
                    \\.popsection
                    \\
                    \\.global start_my_section
                    \\start_my_section = __start_my_section
                    \\
                    \\.global stop_my_section
                    \\stop_my_section = __stop_my_section
                );
            }
        };
}

extern const start_my_section: ?*const i32;
extern const stop_my_section: ?*const i32;

pub inline fn registerResource(comptime x: i32) void {
    const section_name = if (builtin.os.tag == .windows)
        "my_sec$u"
    else if (builtin.os.tag.isDarwin())
        "__DATA,my_section"
    else
        "my_section";
    
    _ = struct {
        const val_ptr: *const i32 = &x;
        comptime {
            @export(&val_ptr, .{
                .name = "resource_" ++ @typeName(@This()),
                .section = section_name,
                .linkage = .strong,
                .visibility = if (builtin.os.tag == .windows) .default else .hidden,
            });
        }
    };
}

comptime {
    registerResource(5);
    registerResource(10);
    registerResource(15);
    registerResource(20);
    registerResource(25);
    registerResource(30);
    registerResource(35);
    registerResource(40);
    registerResource(45);
}

pub fn main() void {
    const start: [*]const ?*const i32 = @ptrCast(&start_my_section);
    const stop: [*]const ?*const i32 = @ptrCast(&stop_my_section);
    const num_resources = stop - start;
    for (start[0..num_resources]) |cursor| {
        if (cursor) |cur| std.debug.print("resource: {d}\n", .{cur.*});
    }
}

Limitations

As presented, this trick only works if the linker can see all the resources that we wish to collect, which is only possible with static linking. In my case, this is not a big deal, as I don’t intend to link my executables dynamically.

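Note that there is a difference between dynamic linking and producing a shared library that is later loaded with something like dlopen. A shared library is itself the output of a single link step, so the resources linked into it still end up in that library’s own section.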
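To make the distinction a bit more concrete: since each shared library gets its own copy of the section and its own delimiters, one conceivable way for a host to reach them after dlopen is a small accessor exported by every library. The sketch below shows only the library side. The names resource_range, example_resources and sum_resources are hypothetical, the host side (dlopen plus dlsym) is left out, and GCC or Clang on ELF or Mach-O is assumed. This is not necessarily how fimo will end up doing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__APPLE__)
#define SECTION_ATTR __attribute__((retain, used, section("__DATA,my_section")))
extern const int *section_begin[] __asm("section$start$__DATA$my_section");
extern const int *section_end[] __asm("section$end$__DATA$my_section");
#else /* assumption: GCC or Clang targeting ELF */
#define SECTION_ATTR __attribute__((retain, used, section("my_section")))
extern const int *section_begin[] __asm("__start_my_section");
extern const int *section_end[] __asm("__stop_my_section");
#endif

static const int a = 5;
static const int b = 10;

/* Entries collected by the linker inside this binary only. */
const int *dummy_resource SECTION_ATTR = NULL;
const int *a_ SECTION_ATTR = &a;
const int *b_ SECTION_ATTR = &b;

/* Hypothetical view type handed across the library boundary. */
struct resource_range {
    const int *const *begin;
    size_t len;
};

/* Hypothetical entry point; a host would look it up with dlsym. */
__attribute__((visibility("default")))
struct resource_range example_resources(void) {
    struct resource_range r;
    r.begin = section_begin;
    r.len = ((uintptr_t)section_end - (uintptr_t)section_begin)
          / sizeof(section_begin[0]);
    return r;
}

/* Stands in for whatever the host does with the entries it received. */
int sum_resources(struct resource_range r) {
    assert(r.begin != NULL);
    int sum = 0;
    for (size_t i = 0; i < r.len; i++) {
        if (r.begin[i] != NULL) sum += *r.begin[i];
    }
    return sum;
}
```

The host never touches the delimiter symbols itself. It only sees the range that each library hands back, which keeps the platform-specific names inside the library.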

Another problem is that the number of sections is limited on PE, so care should be taken to pick a section name that nothing else uses.

Future steps

With this post, we have taken the first big step towards the fimo engine: we can now specify how modules are registered and how they are discovered in compiled binaries. In the next, hopefully shorter, post, we will look into the module interface and how to load modules at runtime.
