Wednesday, October 10, 2018

A milestone for Enki

So, I can now write an Enki source file like this:


import sys
sys:write("Hello world!\n")
return 0

and compile like so:

jo@isis:~/Enki$ ./enki inanna_amd64_target.ini test.e
jo@isis:~/Enki$ md5sum a.enk
339294406a9953c910984b87a8bd2334  a.enk

and then run this one single binary, which is 100% native statically compiled machine code, on all three x86 OSes Enki currently supports:

jo@isis:~/Enki$ ./a.enk
Hello world!

D:\git\Enki>a.enk
Hello world!

Joels~Mac:Enki jo$ ./a.enk 
Hello world!

    What arcane wizardry is this?

    As far as I'm aware, no other language out there can do that. Either you have a binary compiled for a specific OS, or something like a Java jar with bytecode which a compiler on the target system will turn into code for a specific OS, or an interpreter on the target system. That's because each OS has its own standard library with its own ABI, its own calling convention (though MacOS and Linux use the same calling convention on x86-64; Windows, Microsoft being Microsoft, decided to do their own thing) and its own executable file format.

    So, how is this accomplished?


    I defined my own executable file format, which I called Inanna. There's a dynamic linker, written in Enki, which has a specific build for each OS. This knows how to load and relocate Inanna binaries and also contains the Enki standard library (such as it is! it's very much a temporary placeholder), which is cross-platform - sys:write is binary compatible across different OSes, but within the loader boils down to making a write syscall on Linux and MacOS or calling WriteFile on Windows. This is possible because I also define my own calling convention for Enki (which I would need to do anyway since I want language support for continuations rather than using a conventional stack). Since the loader is just the equivalent of ld-linux plus libc - no compiler, JIT or otherwise; the code is already ready to go - it's a lot more lightweight and faster to start than something like a JVM.
    The executable format is defined as starting with '#!/usr/bin/env enkiloader' which means on Unix it transparently invokes the Enki dynamic linker; on Windows I just associate .enk files with it. In either case enkiloader gets invoked with the executable as argv[1] and does its thing.
    An executable file is actually less arcane than you might think - it really just consists of a series of records saying 'this bit of this file is code so you should mmap it as readable and executable, it's assuming it'll be loaded at this address but here's the bits you need to change if it gets put elsewhere, this bit of the file is constant data so map it read only' etc etc. By tagging the records with an architecture too I think it should be easy enough to make a cross-platform executable which contains both ARM and x86-64 programs and is able to share segments between the two where appropriate (strings, for example, come to mind). No one else is doing this as far as I'm aware; the closest are Mach-O fat binaries, which are really two full executables smooshed together in an archive, or I guess maybe the MacOS app resource system, though that does mean your app is a directory, not just a single file.
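To make the 'series of records' idea concrete, here is a minimal Python sketch of segments tagged with an architecture, plus the relocation step a loader would perform when a segment lands somewhere other than its assumed address. This is not the real Inanna layout: the Segment fields, the "any" architecture tag and the relocation encoding (a list of offsets of 64-bit little-endian absolute addresses) are all assumptions made for illustration.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical record layout, not the actual Inanna format.
    arch: str          # "x86-64", "arm64", or "any" for data shared by all
    perms: str         # "rx" for code, "r" for constant data
    assumed_base: int  # address the segment was linked to run at
    data: bytes
    relocs: list = field(default_factory=list)  # offsets of 64-bit addresses


def select(segments, arch):
    """Pick the segments a loader for the given architecture should map."""
    return [s for s in segments if s.arch in (arch, "any")]


def relocate(segment, actual_base):
    """Return the segment's bytes patched for loading at actual_base."""
    image = bytearray(segment.data)
    delta = actual_base - segment.assumed_base
    for offset in segment.relocs:
        (value,) = struct.unpack_from("<Q", image, offset)
        patched = (value + delta) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", image, offset, patched)
    return bytes(image)


# x86-64 code holding one inline 64-bit constant (movabs rax, imm64).
code = Segment("x86-64", "rx", 0x400000,
               b"\x48\xb8" + struct.pack("<Q", 0x400010), [2])
strings = Segment("any", "r", 0, b"Hello world!\n")

patched = relocate(code, 0x7F0000000000)
```

A real loader would mmap each selected segment with the requested permissions after patching it; the sketch stops at producing the patched bytes.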

    I guess the next step is to make this work on ARM - there are likely to be 32- vs 64-bit issues and the relocations will be different (on x86 we have the luxury of inline 64-bit constants, on ARM not so much). Plus, while Enki supports generic functions a la CLOS, the linker doesn't yet (and supporting that is going to be complicated given a program should be able to add specialisations to generic functions declared in the library). But I thought low level nerds might find this sort of thing interesting. :)


Wednesday, March 29, 2017

Enki - my pet compiler project

So, for the last few years I've worked a bit at a time on a pet project - a compiler for a language of my own design, Enki, to Linux/MacOS/Windows/Android native code. Enki syntactically looks like Python with a dash of Pascal mixed in and is a statically-typed, statically-compiled language. I'm intending to explore things like cross-platform native-code binaries, extensible OO and just generally cool and fun programming language features.

Here's interview classic FizzBuzz:

Uint64 max = 100
Uint64 count = 0
Byte[20] number
while count < max
    if (count % 15) == 0
        write("Fizzbuzz\n")
    elif (count % 3) == 0
        write("Fizz\n")
    elif (count % 5) == 0
        write("Buzz\n")
    else
        num_to_str(count, @number)
        write(@number)
        write("\n")
    count = count + 1

Also supported are nested functions, Python-style generators and CLOS-style multimethods/OO.

GitHub link is here: https://github.com/jotheberlock/Enki

Links to HTML documentation (note the internal links won't work, unfortunately):

Overview
FAQ
What the compiler does when

Wednesday, February 1, 2012

Wayland, Android and desktop Linux - a marriage made in heaven?

I noticed today that Wayland is rapidly reaching 1.0. It's no secret that many people (Canonical for one) see this as being the desktop graphics solution for the future...but I got to thinking; what if it could help us get proper Linux like Ubuntu onto Android devices too?

See, Android has its own userland, wildly different from a normal Linux system. I'd personally much rather be able to have the normal POSIX/GNU/Linux API and utilities available to me, along with all that lovely desktop open-source software (which is why I was rather upset when Nokia canned MeeGo, http://en.wikipedia.org/wiki/MeeGo). The kernel is GPLed, so we generally have the ability enforced by law to build Linux kernels for these devices, and for most devices it's possible to take the ICS source and combine it with the binary blobs from the OEM and install your own version of Android, so you'd think bringing up something like Ubuntu on the phone or tablet of your choice would be simple enough. Sadly this turns out not to be the case, and the main problem is the graphics drivers.

Essentially all Android devices ship with closed-source graphics drivers. The support for the hardware is provided in the form of a binary userspace library that provides OpenGL ES and is linked with Bionic, Android's stripped-down libc, and not glibc. That means for Linux programs to use it they have to use Android's userspace; so getting X11 running on an Android device would a) mean porting it to Bionic (and it would not surprise me if Bionic is missing some POSIX stuff X would like) and b) mean using OpenGL ES as its driver backend, which to my knowledge no one has done. So while you do see Ubuntu running on Android tablets, the usual method is to run Ubuntu in a chroot with an X11 server running as an unaccelerated VNC server, then run an Android VNC viewer to actually see the desktop. No hardware acceleration whatsoever and an extra trip over the network to boot.

Ugh. Performance is bad enough on an 800x480 phone. On a tablet it's untenable, and it will get worse when 'retina display' tablets become a thing. You pretty much need at least accelerated compositing and bitblt (for scrolling), which is why OEMs are required to provide hardware acceleration for 2D in ICS if they want Google certification.

However! Wayland is built to use OpenGL ES already, and it's small by design. So, what if we were to port Wayland to Bionic/Android and SurfaceFlinger, while keeping the Wayland client libraries on glibc? The common cases of 'composite windows' and 'move windows around' become hardware accelerated, as they should be. You can build support for this into something like CyanogenMod and every CyanogenMod device can suddenly run Ubuntu, Debian and co. as a first class citizen. Or, you can not use the Android stuff and turn your Galaxy Tab or Nook Color into a proper Ubuntu tablet with just enough Android userland to run Wayland and deal with wifi, talking to the cellphone or anything else that needs a binary driver. Seems like a win-win to me.






Sunday, January 29, 2012

Taking another look at LLDB

I decided to take a small break from working on my ereader and see how lldb (the LLVM project's debugger) is coming along for Linux. Unfortunately, it doesn't seem to have got very far since the last time I looked at it, when I provided a patch to fix problems with ptrace() -

http://lists.cs.uiuc.edu/pipermail/lldb-dev/2011-October/000686.html
http://lists.cs.uiuc.edu/pipermail/lldb-dev/2011-October/000690.html

That fix is in, but there are various small problems in the source (mostly missed header includes) that prevent compilation, so I've resubmitted a patch for that.
There's a FreeBSD/Linux fork that some people are working on, but it seems the same problems apply there too, so I decided to supply compile patches for both -

http://lists.cs.uiuc.edu/pipermail/lldb-dev/2012-January/000783.html

The FreeBSD fork does at least provide a valid stack trace of sorts when debugging a random little test application that just loops and printf's 'Hello world', but it doesn't seem to be in main(), and attaching to an already running process straight-up segfaults. I guess I'll poke around a bit in there and see what I can come up with, because the project as a whole is pretty exciting. It's a shame it's not being worked on more outside the Mac OS X world.

Wednesday, January 18, 2012

Calliope hits alpha

I've gotten to the stage where my ereader has the basic functionality I wanted - you can read books with it, you can correct misspelled words on the fly, and the corrections are persistent. So I put it up on the Android market in case anyone wants to play with it -

https://market.android.com/details?id=org.kde.necessitas.example.calliope

It's glitchy, partly because of some bugs in my code, partly because Qt for Android is in alpha and has its own quirks (for example, it's a known bug that the onscreen keyboard defaults to upper case for some reason, and there seems to be an issue with settings not being saved; this'll no doubt be corrected by the next Qt release). I've also started working on making the UI work differently for Android versus the desktop; the button bar shows up by default above the page for the desktop, and pops up with the menu button on Android.

Still, bugs and all, you can read with it, and I've added some nice-to-have as opposed to essential features as well - the reader uses a filesystem watcher on the directories in which it searches for books, so drop a new one in and it'll show right up on the menu, and it interacts with Windows/X11 session management so you can log off and on again and not lose your place.

The coolest thing I've added, though, is the filter manager. Basically, the reader works with a stream of elements parsed out of the HTML - some images, some pagebreaks, but mostly paragraphs of text. Filters operate at various points in the paragraph's transition from 'list of words with attributes (e.g. bold, italic)' to 'group of words at given x/y coordinates in a bounding box', and also when someone clicks/touches the screen inside a paragraph.

The spelling-correction filter is run before the text layout process; it has a map of corrections of the form 'the third word of the paragraph, which is coler, should actually be colour' and makes the appropriate alterations in the list.
There's also a dictionary-lookup filter which is invoked (if set as the active touch filter) when a word is pressed, after the text has been laid out. At some point I'll likely also add a filter that operates after text layout but before rendering to justify the text (such that it lines up on both left and right margins as opposed to the default ragged right alignment).
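As a rough illustration of the spelling-correction filter, here is a small Python sketch (Calliope itself is C++/Qt). The function name, the shape of the corrections map and the use of plain strings instead of words with attributes are assumptions for the example, not Calliope's actual API.

```python
def apply_corrections(paragraph_index, words, corrections):
    """Return a copy of a paragraph's word list with corrections applied.

    corrections maps (paragraph_index, word_index) to a pair
    (misspelled_word, replacement). The stored misspelling is checked
    before replacing, so a stale entry cannot clobber the wrong word.
    """
    fixed = list(words)
    for (para, position), (wrong, right) in corrections.items():
        if para != paragraph_index or position >= len(fixed):
            continue
        if fixed[position] == wrong:
            fixed[position] = right
    return fixed


# 'The third word of paragraph 0, which is coler, should be colour.'
corrections = {(0, 2): ("coler", "colour")}
before = ["What", "a", "coler", "that", "is"]
after = apply_corrections(0, before, corrections)
```

Because the filter returns a new list and runs before layout, the later stages never need to know a correction happened.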

I've not done much with the dictionary yet, and the API needs work (it should be asynchronous for a start). That done, it would be easy enough to, for example, look a word up on Wikipedia from within the application given Qt's HTTP support. Right now, though, the only dictionary is for Latin (which I'm in the process of learning), which in itself took a bit of work. Latin is a highly inflected language - that is, where we in English add words to change the meaning of a word, it tends to use different endings instead. 'I have' is habeo, 'we have' is habemus, 'we were having' is habebamus, and so forth, so it's not a straightforward thing for a computer to go from a random Latin word to the canonical form in which it appears in a conventional dictionary.

There is a program that does know how to do this, though, using some very clever algorithms and knowledge of Latin grammar; it's known as Whitaker's Words, and it is open source. Unfortunately, its author made the somewhat...unusual choice of Ada as its implementation language; unsurprisingly, an Ada compiler is not part of the Android NDK. The 'nice' way to bring that capability to Android would be to reimplement the program in C++, but that would involve quite a bit of work, and this is more for my own use than anything else, so I took a quick and dirty route.

Available for Whitaker's Words is a list of every word understood by the program in all its forms (so it would include habeo, habemus, habebamus etc). I wrote a little program which reads that list, invokes Whitaker's Words on each one, and writes the output into a file, writing an index into another file of the form source word, current position in the output file, length of the string from Words. This takes quite a while to run (about as long as ICS takes to build on my machine) and generates about a 250 meg output file and 20 meg index. My dictionary loads the index the first time a word is queried and uses that to seek into the output file and pull out the word's definition (I originally tried simply using Qt's built-in IO facilities to write out the QHash into a binary index file, but that actually ended up producing a bigger file for some reason).
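The index-plus-output-file scheme can be sketched in a few lines of Python. This is an illustration rather than the code in whitaker.cpp: the tab-separated index format, the file names and the placeholder definitions are assumptions, and the real generator gets each definition by running Words instead of taking it from a list.

```python
import os
import tempfile


def build(entries, out_path, index_path):
    """Write each definition to out_path and record word, offset, length."""
    with open(out_path, "wb") as out, \
            open(index_path, "w", encoding="utf-8") as idx:
        for word, definition in entries:
            blob = definition.encode("utf-8")
            idx.write(f"{word}\t{out.tell()}\t{len(blob)}\n")
            out.write(blob)


class Dictionary:
    def __init__(self, out_path, index_path):
        self.out_path = out_path
        self.index_path = index_path
        self.index = None  # loaded on the first lookup, not at startup

    def lookup(self, word):
        if self.index is None:
            self.index = {}
            with open(self.index_path, encoding="utf-8") as idx:
                for line in idx:
                    key, offset, length = line.rstrip("\n").split("\t")
                    self.index[key] = (int(offset), int(length))
        entry = self.index.get(word)
        if entry is None:
            return None
        offset, length = entry
        with open(self.out_path, "rb") as out:
            out.seek(offset)  # jump straight to this word's output
            return out.read(length).decode("utf-8")


# Placeholder definitions standing in for real Words output.
sample = [("habeo", "habeo: I have"), ("habemus", "habemus: we have")]
```

Only the index has to sit in memory; the large output file is touched just long enough to read one definition.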

On the off chance this would be useful for someone else in the same situation, the utility is at

https://github.com/jotheberlock/whitakerwords

and the source for the dictionary is whitaker.cpp in Calliope's source.

Incidentally, some of the books I've been working with make me really sympathetic towards the developers of browsers (an ebook reader is, after all, functionally a simplified HTML renderer with some special needs). For the most part Calliope basically displays anything in <p> tags as paragraphs, but one book in particular was a long stream of text, not in any form of block tag, broken only with <br>'s at the end of each paragraph. From what I can find on the web this was all the rage back in about 1992.  I put something in to deal with that case, but there's at least one other book out there where most of the text doesn't show up; I'm investigating why.


Saturday, December 24, 2011

How to get Android Ice Cream Sandwich working on a Pandaboard ES

I have, after some messing about, compiled Android from the AOSP source release and got it running on my Pandaboard ES. Here are some random hints for anyone else trying to do the same -


Most importantly, make sure you're pulling the latest master branch from AOSP. A fix went in on, oh, Tuesday or so that actually made the bootloader and fastboot work on the ES board. Without this, you are doomed. Thread here -

http://groups.google.com/group/android-building/browse_thread/thread/9d784fc702451c9f?pli=1

With this, the instructions in device/ti/panda/README more or less work out of the box (exception: fastboot flash userdata and fastboot flashall both need -p panda specified). Without it you get stuck at the 'Waiting for Omap43xx...' step because fastboot doesn't recognise the ID of the ES board, only the original Pandaboard.

If using HDMI, make sure to use the connector on the corner of the board - I saw SGX crashes trying to use the other.

The engineering build is sloooooow and would be a pain to develop with, I think; the user-with-root build is still a bit sluggish at first before code gets JITed but is usable.

Don't be tempted to try and format the SD card yourself, as I was before finding out how to get fastboot working; there's a script floating around out there called omap3-mkcard.sh, but for me it didn't produce anything I could boot. It has a bug in it, too - there's a section that goes like this:

if [ -x `which kpartx` ]; then
     kpartx -a ${DRIVE}
fi


If you actually have kpartx installed then this will 'hold onto' the partitions on your SD card, causing the filesystem creation code later to fail, every time, because the device is busy. Not sure how that got through testing, but it can be replaced with

sfdisk -R $DRIVE

Also, if you're adding udev rules so that you can use fastboot and adb as non-root, these are the entries needed for the ES (different from the plain Pandaboard) -

# fastboot protocol on panda (PandaBoard ES)
SUBSYSTEM=="usb", ATTR{idVendor}=="0451", ATTR{idProduct}=="d010", MODE="0600", OWNER="<user>"
# adb protocol on panda (PandaBoard ES)
SUBSYSTEM=="usb", ATTR{idVendor}=="0451", ATTR{idProduct}=="d101", MODE="0600", OWNER="<user>"

where <user> should of course be replaced with your login. On my (Kubuntu 11.10) system these appear to live in /lib/udev/rules.d and not /etc/udev.

Hope this helps someone! I'm pretty happy with the board now it's running; I have accelerated 3D (make sure to get the latest 4.0.3 binary driver drop from the Nexus drivers page) and working wifi. No audio, but I can live with that for now. I was also pleasantly surprised by the build time - Google recommends an absolute monster of a build machine, but my fairly middle-of-the-road triple-core machine compiled ICS from scratch with -j 6 in about two hours.









Thursday, September 29, 2011

Digging up some old code

A year or so ago I started writing my own little debugger, basically so I could learn a bit more about what's going on under the hood in such programs. It was a natural follow-on from writing my own compiler/linker, which I sort of stopped working on after I satisfied myself I knew how to write such a thing (and came across LLVM, which does that sort of thing about a thousand times better than I was going to be able to do on my own). Life events got in the way of my doing too much with it at the time, but here's the code anyway for those who might be curious - http://hu.gs/~emily/debugtoy.tar.gz

It uses the Linux ptrace() system call and my own code to parse ELF binaries and their DWARF debug information; there are libraries out there that can help you do the latter, but as I say I was doing this to learn my way around the formats myself. It can attach to processes, halt/single-step them, display/edit registers and memory, figure out the name of the function you're in, and identify the line in the source code that corresponds with the current instruction pointer.
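For anyone wondering where hand-rolled ELF parsing starts, here is a small Python sketch that reads a few fields from the ELF file header. It is not taken from debugtoy; the function name and the returned dictionary are made up for the example, and only the field offsets come from the ELF specification.

```python
import struct


def parse_elf_header(data):
    """Read a few fields from the start of an ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[data[4]]      # EI_CLASS
    endian = {1: "<", 2: ">"}[data[5]]  # EI_DATA
    # e_type, e_machine and e_version follow the 16-byte e_ident block.
    e_type, e_machine, _version = struct.unpack_from(endian + "HHI", data, 16)
    # e_entry is 8 bytes in a 64-bit file and 4 bytes in a 32-bit one.
    entry_format = "Q" if bits == 64 else "I"
    (e_entry,) = struct.unpack_from(endian + entry_format, data, 24)
    return {"bits": bits, "type": e_type,
            "machine": e_machine, "entry": e_entry}


# A hand-built header prefix: 64-bit, little-endian, ET_EXEC, EM_X86_64.
header = (b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
          + struct.pack("<HHIQ", 2, 62, 1, 0x401000))
info = parse_elf_header(header)
```

From there a debugger walks the section headers to find the symbol table and the .debug_* sections that hold the DWARF data.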