Wednesday, October 10, 2018

A milestone for Enki

So, I can now write an Enki source file like this:


import sys
sys:write("Hello world!\n")
return 0

and compile like so:

./enki inanna_amd64_target.ini test.e
jo@isis:~/Enki$ md5sum a.enk
339294406a9953c910984b87a8bd2334  a.enk

and then run this one binary, which is 100% native, statically compiled machine code, on all three x86-64 OSes Enki currently supports (Linux, Windows and macOS):

jo@isis:~/Enki$ ./a.enk
Hello world!

D:\git\Enki>a.enk
Hello world!

Joels~Mac:Enki jo$ ./a.enk 
Hello world!

    What arcane wizardry is this?

    As far as I'm aware, no other language out there can do that. Either you have a binary compiled for a specific OS, or something like a Java jar with bytecode which a compiler on the target system will turn into code for a specific OS, or an interpreter on the target system. That's because each OS has its own standard library with its own ABI, its own calling convention (though MacOS and Linux use the same calling convention on x86-64; Windows, Microsoft being Microsoft, decided to do their own thing) and its own executable file format.
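    To make the calling-convention point concrete: on x86-64, Linux and MacOS both follow the System V AMD64 ABI, while Windows uses the Microsoft x64 convention, and the two disagree on which registers carry the first integer arguments. The sketch below just encodes those two register orders as data. The names are made up for illustration and nothing here is part of Enki.

```python
# Integer/pointer argument registers, in order, for the two x86-64 conventions.
SYSV_AMD64_INT_ARGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")  # Linux, macOS
MS_X64_INT_ARGS = ("rcx", "rdx", "r8", "r9")                    # Windows


def arg_location(os_name: str, index: int) -> str:
    """Where the index-th (0-based) integer argument lives at a call site."""
    regs = MS_X64_INT_ARGS if os_name == "windows" else SYSV_AMD64_INT_ARGS
    if index < len(regs):
        return regs[index]
    # Further integer arguments go on the stack. Windows callers also reserve
    # 32 bytes of "shadow space" for the callee, so even the stack offsets
    # differ between the two conventions.
    return "stack"


for os_name in ("linux", "macos", "windows"):
    print(os_name, [arg_location(os_name, i) for i in range(3)])
```

    So a function compiled to expect its first argument in rdi will read garbage when Windows code passes it in rcx. That is one reason a native binary normally can't just call into whichever OS library it happens to find.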

    So, how is this accomplished?


    I defined my own executable file format, which I called Inanna. There's a dynamic linker, written in Enki, with a separate build for each OS. It knows how to load and relocate Inanna binaries, and it also contains the Enki standard library (such as it is; it's very much a temporary placeholder), which is cross-platform: sys:write has the same binary interface on every OS, but inside the loader it boils down to a write syscall on Linux and MacOS or a call to WriteFile on Windows. This is possible because I also define my own calling convention for Enki (which I would need to do anyway, since I want language support for continuations rather than using a conventional stack). Since the loader is just the equivalent of ld-linux plus libc (no compiler, JIT or otherwise; the code is already ready to go), it's a lot more lightweight and faster to start than something like a JVM.
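    As a rough analogue of one sys:write entry point hiding the OS difference, here is a Python sketch using ctypes. The function name enki_sys_write is hypothetical. The real loader is written in Enki and makes the raw write syscall itself, whereas this sketch goes through the C library's write() (via os.write) on Unix and kernel32's WriteFile on Windows.

```python
import os
import sys


def enki_sys_write(data: bytes) -> int:
    """Write bytes to standard output and return how many were written.

    Hypothetical stand-in for a cross-platform sys:write: callers see one
    interface, and only this function knows which OS it is running on.
    """
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.WriteFile.argtypes = [
            wintypes.HANDLE, ctypes.c_char_p, wintypes.DWORD,
            ctypes.POINTER(wintypes.DWORD), ctypes.c_void_p,
        ]
        kernel32.WriteFile.restype = wintypes.BOOL

        STD_OUTPUT_HANDLE = 0xFFFFFFF5  # (DWORD)-11
        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        written = wintypes.DWORD(0)
        if not kernel32.WriteFile(handle, data, len(data), ctypes.byref(written), None):
            raise ctypes.WinError(ctypes.get_last_error())
        return written.value

    # Linux and macOS: file descriptor 1 is standard output.
    return os.write(1, data)
```

    Even on the two Unix-like systems, a loader that issues raw syscalls still needs a build per OS: write is syscall number 1 on x86-64 Linux but 0x2000004 on x86-64 macOS.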
    The executable format is defined as starting with '#!/usr/bin/env enkiloader', which means that on Unix running it transparently invokes the Enki dynamic linker; on Windows I just associate .enk files with the loader. Either way, enkiloader gets invoked with the executable as argv[1] and does its thing.
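    A minimal sketch of the first thing a loader like enkiloader might do with argv[1]: read the file and skip the shebang line to find where the records begin. The function name is hypothetical, and what follows the first newline is an assumption, since the post doesn't describe the byte layout after the shebang.

```python
SHEBANG = b"#!/usr/bin/env enkiloader"


def payload_offset(image: bytes) -> int:
    """Return the offset just past the shebang line (hypothetical layout)."""
    if not image.startswith(SHEBANG):
        raise ValueError("not an Inanna executable (missing enkiloader shebang)")
    newline = image.find(b"\n", len(SHEBANG))
    if newline < 0:
        raise ValueError("truncated header")
    return newline + 1


def load_from_argv(argv: list) -> bytes:
    """On Unix the kernel runs `/usr/bin/env enkiloader ./a.enk`, and env then
    execs enkiloader with the program path as argv[1]. A Windows file
    association can pass the path the same way."""
    with open(argv[1], "rb") as f:
        image = f.read()
    return image[payload_offset(image):]
```

    Because the header is a plain text line, the same file works unchanged on Windows, where the loader simply ignores it.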
    An executable file is actually less arcane than you might think. It really just consists of a series of records saying 'this bit of the file is code, so mmap it as readable and executable; it assumes it'll be loaded at this address, but here are the bits you need to change if it gets put elsewhere; this bit of the file is constant data, so map it read-only', and so on. By tagging the records with an architecture too, I think it should be easy enough to make a cross-platform executable which contains both ARM and x86-64 programs and shares segments between the two where appropriate (strings, for example). No one else is doing this as far as I'm aware; the closest are Mach-O fat binaries, which are really two full executables smooshed together in an archive, or maybe the MacOS app bundle system, though that does mean your app is a directory, not just a single file.
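    The record idea can be sketched with a made-up layout. Everything here is an assumption for illustration, not the actual Inanna format: the field names, the permission bits, the architecture tags, and the representation of relocations as offsets of 64-bit absolute addresses.

```python
import mmap
import struct
from dataclasses import dataclass, field

# Hypothetical permission bits for a segment record.
R, W, X = 1, 2, 4
# Hypothetical architecture tags; ARCH_ANY marks segments every architecture shares.
ARCH_ANY, ARCH_X86_64, ARCH_ARM64 = 0, 1, 2


@dataclass
class SegmentRecord:
    arch: int            # which architecture this segment belongs to
    perms: int           # R/W/X bits to map it with
    file_offset: int     # where the segment's bytes live in the file
    size: int
    preferred_addr: int  # address the segment was linked to run at
    relocs: list = field(default_factory=list)  # offsets of 64-bit absolute addresses


def segments_for(records, arch):
    """This machine's code plus any architecture-neutral data such as strings."""
    return [r for r in records if r.arch in (arch, ARCH_ANY)]


def unix_prot(perms: int) -> int:
    """Translate record permission bits into mmap PROT_* flags (Unix only)."""
    prot = 0
    if perms & R:
        prot |= mmap.PROT_READ
    if perms & W:
        prot |= mmap.PROT_WRITE
    if perms & X:
        prot |= mmap.PROT_EXEC
    return prot


def relocate(data: bytearray, rec: SegmentRecord, actual_addr: int) -> None:
    """Patch absolute addresses if the segment wasn't loaded where it expected.

    A real loader would typically map the segment writable, patch it, and then
    change it to its final protection (e.g. with mprotect).
    """
    delta = actual_addr - rec.preferred_addr
    if delta == 0:
        return
    for off in rec.relocs:
        (value,) = struct.unpack_from("<Q", data, off)
        struct.pack_into("<Q", data, off, (value + delta) & 0xFFFFFFFFFFFFFFFF)
```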

    I guess the next step is to make this work on ARM. There are likely to be 32- vs 64-bit issues, and the relocations will be different (on x86-64 we have the luxury of inline 64-bit constants; on ARM, not so much). Plus, while Enki supports generic functions à la CLOS, the linker doesn't yet (and supporting that is going to be complicated, given that a program should be able to add specialisations to generic functions declared in the library). But I thought low-level nerds might find this sort of thing interesting. :)
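    To illustrate why the relocations differ: on x86-64 a 64-bit address can sit inline in a single `movabs rax, imm64` instruction (bytes 48 B8 followed by the 8-byte value), so relocating it means rewriting 8 contiguous bytes. AArch64 instructions are a fixed 32 bits, so one way to materialise a 64-bit address is a MOVZ followed by three MOVKs, each carrying 16 bits, and a relocation then has to patch the immediate field of all four instructions. The function names below are made up, and this is not a description of how Enki's linker handles it.

```python
import struct

MOVZ_X = 0xD2800000  # MOVZ Xd, #imm16, LSL #(hw*16)  (64-bit form)
MOVK_X = 0xF2800000  # MOVK Xd, #imm16, LSL #(hw*16)  (64-bit form)


def encode_movabs_rax(value: int) -> bytes:
    """x86-64: REX.W + B8 (mov rax, imm64), then the raw little-endian value."""
    return b"\x48\xb8" + struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)


def encode_mov_imm64(rd: int, value: int) -> bytes:
    """AArch64: MOVZ + 3x MOVK loading a 64-bit constant into register Xrd."""
    if not 0 <= rd <= 30:
        raise ValueError("rd must be in X0..X30")
    words = []
    for hw in range(4):
        imm16 = (value >> (16 * hw)) & 0xFFFF
        base = MOVZ_X if hw == 0 else MOVK_X  # first zeroes the register, rest keep
        words.append(base | (hw << 21) | (imm16 << 5) | rd)
    return struct.pack("<4I", *words)
```

    Rebasing the x86-64 form is a single 8-byte add at a known offset. For the AArch64 form, the loader has to reassemble the four 16-bit fields, add the delta, and re-encode all four instructions.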