Thursday, August 18, 2011

Calliope, a Mobipocket (Kindle) compatible ereader

I've lately been working on writing an ereader in Qt, with Android in mind. I let the project sit for a few weeks because of a combination of personal life events and bugs in the (very much still in development) Android port of Qt, but a couple of days ago I achieved the milestone of being able to read a commercially-released book on my Galaxy Tab.

Why work on an ereader, one might well ask? Amazon already have one for Android, and it's quite nice. Well, one thing that always annoys me when reading ebooks is the somewhat... variable quality of the spellchecking; it seems quite common for them to be poorly spelled compared to printed books. This is a hundredfold more true of public-domain books from places like Project Gutenberg that have been OCR-scanned and tossed up on the site. How nice it would be, then, to be able to edit the book in situ and correct such errors. I plan to allow this, not by physically editing the ebook file (I'm wary of the legal implications of modifying a copyrighted work) but by storing an overlay file which in essence says "word 20 of paragraph 32 should be 'mistake' and not 'mistaek'". As a side benefit, I can also provide a filter which automatically converts all spellings of a word from one form to another ("any time you see 'color', substitute 'colour'"); it always feels a little weird to me, as a Briton resident in the US, to buy a book by a British author whose spelling has been Americanised. I'm not yet at the point of doing any of this; right now I'm happy simply to be able to read the text as written, but I figured I might as well document what I've done to get to that point.
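To make the overlay idea concrete, here is a minimal sketch in Python (for brevity, rather than the Qt/C++ the app is written in). The data layout and the names `corrections`, `substitutions` and `apply_overlay` are assumptions for illustration only; nothing like this exists in the app yet.

```python
# Hypothetical overlay: (paragraph index, word index) -> corrected word.
corrections = {(0, 3): "mistake"}

# Hypothetical global filter: whole-word spelling substitutions.
substitutions = {"color": "colour"}

PUNCTUATION = ".,;:!?\"'"

def apply_overlay(paragraphs, corrections, substitutions):
    """Return corrected copies of the paragraphs; the book text is untouched."""
    result = []
    for p_index, text in enumerate(paragraphs):
        words = text.split()
        for w_index, word in enumerate(words):
            # Positional fixes from the overlay take priority.
            word = corrections.get((p_index, w_index), word)
            # Then apply the global filter to the word minus its punctuation.
            core = word.strip(PUNCTUATION)
            if core in substitutions:
                word = word.replace(core, substitutions[core], 1)
            words[w_index] = word
        result.append(" ".join(words))
    return result
```

This simple version is case-sensitive and collapses runs of whitespace, both of which a real implementation would need to handle.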

The reader, like the Kindle, reads Mobipocket files. These were originally designed to be read on Palm OS devices, and are stored as Palm-style database files. These consist of some headers and a series of blocks of data, each of which can be individually compressed with a form of Lempel-Ziv encoding (similar to what gzip uses, for instance). Mobipocket books consist of header information including optional tags for things like ISBN numbers, followed by the blocks that comprise the actual book (compressed, each block holding up to 4k of text before compression), followed by any images, each in its own block (required to be GIF format). The book text itself is HTML 3.2 with some customisations for purposes such as referring to images. The first image in the book is by convention the book cover. Books bought from Amazon are encrypted; while that encryption has been cracked and I could probably add support for reading them, I haven't because I'm not sure of the legal implications, even though it would only be used for books I've legitimately purchased.
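As an illustration of the container layout, the sketch below (Python, not the app's own code) reads the block table from a Palm database: a 78-byte header whose last two bytes give the block count, followed by one 8-byte entry per block that starts with a big-endian file offset. The function names are made up for this example.

```python
import struct

PDB_HEADER_SIZE = 78   # fixed-size Palm database header
RECORD_ENTRY_SIZE = 8  # 4-byte offset, 1-byte attributes, 3-byte unique id

def read_record_offsets(data):
    """Return (database name, list of block offsets) from raw file bytes."""
    name = data[:32].split(b"\0", 1)[0].decode("latin-1")
    (count,) = struct.unpack_from(">H", data, 76)
    offsets = []
    for i in range(count):
        entry_pos = PDB_HEADER_SIZE + RECORD_ENTRY_SIZE * i
        (offset,) = struct.unpack_from(">L", data, entry_pos)
        offsets.append(offset)
    return name, offsets

def get_record(data, offsets, index):
    """Slice out one block; the last block runs to the end of the file."""
    end = offsets[index + 1] if index + 1 < len(offsets) else len(data)
    return data[offsets[index]:end]
```

In a Mobipocket file the first block holds the book's own headers, with the text blocks following it.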

I've written a 'bookshelf' that looks in a standard directory (currently fixed by platform; ~/Documents on desktop Linux, /sdcard/kindle on Android), sniffs all files in it to see if they're ebooks it can read, and displays those books in a list. Choosing one fires up a Page widget which actually displays (currently) the img and p tags from the book.
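The sniffing step can be sketched roughly as follows, again in Python rather than the app's Qt code. It relies on the fact that a Palm database stores a four-byte type and four-byte creator at offset 60, which read `BOOK` and `MOBI` for Mobipocket books; the function names are hypothetical.

```python
from pathlib import Path

def looks_like_mobi(path):
    """True if the Palm database type/creator fields read BOOK/MOBI."""
    with open(path, "rb") as f:
        header = f.read(68)
    return header[60:68] == b"BOOKMOBI"

def find_books(directory):
    """Return the readable books in a directory, sorted by file name."""
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and looks_like_mobi(p))
```

Checking the file contents rather than the extension means a book is still found if it has been renamed.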

Doing this is more involved than one might think. Firstly there's the challenge of turning those compressed buckets into an uncompressed text stream. I accomplish this by writing a custom QIODevice (the underlying abstraction for Qt's files, network sockets etc) which wraps the document text and uncompresses it on the fly. Since it's a QIODevice I can also easily write the uncompressed text to disc. The Mobipocket header includes information on the character format used in the book (generally either Latin-1 or UTF-8) so I can construct a QTextStream that will do the appropriate conversions.
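For reference, the usual PalmDoc flavour of Lempel-Ziv compression can be decoded in a few lines. This is a Python sketch of that byte format, not the QIODevice described above: `iter_text` is a hypothetical stand-in that decompresses one block at a time, and it does no validation of malformed input.

```python
import codecs

def palmdoc_decompress(data):
    """Decode one PalmDoc-compressed block into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b == 0x00 or 0x09 <= b <= 0x7F:
            out.append(b)                      # plain literal byte
        elif b <= 0x08:
            out += data[i:i + b]               # next b bytes are literals
            i += b
        elif b <= 0xBF:
            # Two-byte back reference: 11 bits of distance, 3 bits of length.
            pair = ((b << 8) | data[i]) & 0x3FFF
            i += 1
            distance = pair >> 3
            length = (pair & 0x07) + 3
            for _ in range(length):            # byte by byte: copies may overlap
                out.append(out[-distance])
        else:
            out.append(0x20)                   # a space followed by one character
            out.append(b ^ 0x80)
    return bytes(out)

def iter_text(records, encoding="utf-8"):
    """Lazily yield decoded text, one compressed block at a time."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for record in records:
        yield decoder.decode(palmdoc_decompress(record))
```

The incremental decoder is there so that a multi-byte character split across two blocks still decodes correctly.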

The book being HTML 3.2, Qt's inbuilt XML parser naturally chokes on it (I really can't blame it), so I had to write my own simple HTML parser. I've written it in more of a SAX than a DOM style (i.e. it parses the HTML stream incrementally rather than all at once) because by the nature of the thing an ereader is only concerned with displaying a small part of the document at a time; parsing the whole book on opening it would both take time at startup and needlessly consume RAM.
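To show what the SAX style looks like in practice, here is a sketch using Python's standard `html.parser` module, which is forgiving of old HTML and accepts input in chunks. It is not the parser I wrote for the app; the `ParagraphFeeder` class and its callback are invented for the example.

```python
from html.parser import HTMLParser

class ParagraphFeeder(HTMLParser):
    """Calls on_paragraph(text) as soon as each <p>...</p> is complete."""

    def __init__(self, on_paragraph):
        super().__init__()
        self.on_paragraph = on_paragraph
        self._current = None      # text chunks of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current = []

    def handle_data(self, data):
        # Text may arrive in several pieces, so collect rather than assume one.
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            self.on_paragraph("".join(self._current))
            self._current = None
```

Each call to `feed()` processes only the text handed to it, so the caller can stop feeding once enough paragraphs have arrived to fill the page.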

The stream is then split into Elements, each representing a block to be displayed on the screen; currently image and paragraph elements, the latter consisting of string fragments, which are collections of words with the same attributes (bold, italic, etc). Each element can report a size to the page layout algorithm and render itself; paragraph elements are careful not to render text where it would be partially cut off by the bottom of the page. Currently, the page renders as many elements as it can before reaching the bottom of the page; pressing the right-hand side of the window to go to the next page moves them 'up' by the height of the page, then resumes rendering beginning with the element that was at the end of the last page.
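The page-filling logic can be sketched as below, in Python and under simplifying assumptions: each element is reduced to a list of line heights (an image being a single tall 'line'), and `fill_page` is a hypothetical name. It never draws a line that would be cut off, and reports where the next page should resume.

```python
def fill_page(elements, start, page_height):
    """Lay out whole lines until the page is full.

    elements: list of elements, each a list of positive line heights.
    start: (element index, line index) at which to resume.
    Returns (drawn, next_start), where drawn is a list of
    (element index, first line, one past last line).
    """
    elem, line = start
    y = 0
    drawn = []
    while elem < len(elements):
        lines = elements[elem]
        first = line
        # Always place at least one line per page (y == 0), so that a line
        # taller than the page cannot stall pagination.
        while line < len(lines) and (y == 0 or y + lines[line] <= page_height):
            y += lines[line]
            line += 1
        if line > first:
            drawn.append((elem, first, line))
        if line < len(lines):
            break                  # page is full part-way through this element
        elem, line = elem + 1, 0
    return drawn, (elem, line)
```

Calling it again with the returned position gives the following page, which matches the forwards-only behaviour described above.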

Unfortunately, since I don't want to keep all elements parsed forever, I don't really have a good way to go back at present; an ereader that only goes forwards is a bit limited! I'm trying to figure out both how to handle this and how best to store position within the book, bearing in mind these problems:

- The page can be resized both during reading and between runs of the application (including on Android; think of rotating the tablet, for instance). Either will cause paragraphs to reflow, taking up more or fewer lines on the page.

- There are ways to put pagebreaks into the book, which I do not yet support but will need to in order to handle, for example, the ends of chapters.

I think the way I'm going to approach it is by viewing the book as a very, very tall virtual screen of fixed width; so elements will have a virtual y coordinate starting at 0 and going down all the way to the end of the book, divided into pages every height-of-window pixels. The page itself acts as a window onto this virtual screen; going backwards will involve putting the page back by its height, then reparsing the book from the beginning until reaching elements that are visible on the page. I'll have to evaluate this for speed; it may be worth caching the previous page or two's elements since generally readers don't go much further back than that.
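A sketch of the virtual-screen idea, in Python and with hypothetical names: element heights arrive one at a time (standing in for the incremental parser), each gets a virtual y coordinate, and only the elements overlapping the requested page are kept. For simplicity it works on whole elements rather than individual lines.

```python
def visible_elements(heights, page, page_height):
    """Return [(element index, y relative to the top of the page)].

    heights: iterable of element heights in document order; it can be a
    generator, so parsing stops as soon as the page has been passed.
    """
    top = page * page_height
    bottom = top + page_height
    y = 0                          # virtual y of the current element
    visible = []
    for index, height in enumerate(heights):
        if y >= bottom:
            break                  # past the page: no need to parse further
        if y + height > top:
            visible.append((index, y - top))
        y += height
    return visible
```

Going back a page is then just a call with `page - 1`; the cost is rescanning from the start of the book, which is the part that would need measuring.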

As for resizing, as well as storing the virtual y coordinate of the page, I'll keep track of the topmost element on the page (and in the case of a paragraph the word within it). Resizing involves repaginating and parsing from the beginning until that element and word appear on a page, then displaying that page.
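The resize step might look something like this sketch. It is Python with loud simplifications: text is measured in characters per line via the standard `textwrap` module rather than in pixels, every line has the same height, and `page_of_anchor` is a made-up name.

```python
import textwrap

def page_of_anchor(paragraphs, anchor, chars_per_line, lines_per_page):
    """Find the page that holds the anchored word after a reflow.

    anchor: (paragraph index, word index) of the word that was at the
    top of the page before the resize.
    """
    anchor_para, anchor_word = anchor
    line_number = 0   # virtual line number counted from the start of the book
    for p_index, text in enumerate(paragraphs):
        # Wrap on whitespace only, so every line holds whole words.
        lines = textwrap.wrap(text, width=chars_per_line,
                              break_long_words=False, break_on_hyphens=False)
        if p_index == anchor_para:
            words_seen = 0
            for line in lines:
                words_seen += len(line.split())
                if anchor_word < words_seen:
                    return line_number // lines_per_page
                line_number += 1
            raise ValueError("anchor word is beyond the end of the paragraph")
        line_number += len(lines)
    raise ValueError("anchor paragraph not found")
```

Because the anchor is a position in the text rather than a pixel offset, it survives both a resize during reading and a change of window size between runs.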

Source for the app is at https://github.com/jotheberlock/reader