Writing workflow

So how do I write?

Well, that depends on how you mean the question. So for now, let's just talk about the mechanics. The tools I write WITH.

When you look under the hood, the EPUB ebook format is a zip file containing XHTML files. But nobody would write a book in handwritten XHTML. It would drive you crazy. One misplaced character could screw up an entire chapter. [X]HTML is not truly a language intended for humans. The only sane choice is to write in some kind of actual office document application, be it MS Word, OpenOffice/LibreOffice, WordPerfect, or some walled-garden Mac app.
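If you want to see the "zip file full of XHTML" part for yourself, here is a minimal Python sketch that lists the XHTML files inside an EPUB. The function name list_xhtml_members is just something made up for this illustration, and the file extensions it checks are an assumption about how the files happen to be named.

```python
import zipfile


def list_xhtml_members(epub_path):
    """Return the sorted names of the XHTML/HTML files inside an EPUB.

    An EPUB is an ordinary zip archive, so the standard zipfile module
    can open it directly. The extension check is an assumption; a strict
    tool would read the manifest in the package (.opf) file instead.
    """
    with zipfile.ZipFile(epub_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        )


if __name__ == "__main__":
    import sys

    # Usage: python list_xhtml.py mybook.epub
    for member in list_xhtml_members(sys.argv[1]):
        print(member)
```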

I actually remember a time when at least some Microsoft software was really pretty good. Word for Windows was paradigm-changing when it came out. But Microsoft software has become progressively more and more shit over the years. There was a bug in MS Excel that Microsoft knew about perfectly well, that went unfixed for over 14 years solely because too much other Microsoft code depended on that specific code being broken in that specific way, and nobody could be arsed to fix all the bits of wrong code. And then there's stuff like Windows 11's new Copilot Recall feature, which creates a record of every single thing you do or look at on a Windows 11 PC, a record that can be trivially exfiltrated (stolen) in seconds, by nearly anyone, with minimal difficulty.

Nope. No Microsoft crap. Not happening. And I've never been a fan of walled gardens, and I detest the Mac anyway.

So I write using LibreOffice, then export from LibreOffice into EPUB.

And therein lies a problem, because LibreOffice's EPUB export is both incomplete, and ... not as good as it could be. Also, because of the inefficient, lazy way that LibreOffice (and Word, and probably all the others as well) stores text formatting information, the export ends up with a HUGE number of redundant SPAN tags in the XHTML. Places where a span of class X ends and is IMMEDIATELY followed by the start of another span of exactly the same class. Every time you make an edit, even to fix a single-character typo, LibreOffice wraps your edit in a new span. Word does the same thing. But just because everybody does it, doesn't mean it's not lazy and sloppy.

So the next step is to import that exported EPUB into Calibre, to add missing EPUB metadata like author and publisher, fix the table of contents that LibreOffice doesn't export correctly, clean up incomplete or redundant CSS, and occasionally hand-fix incorrect CSS, then optimize and clean up the HTML. (LibreOffice generates incorrect CSS for superscripts, for example, and often gets table column widths wrong.) Then, after that, I clean up the redundant SPAN tags with a custom tool that unpacks the archive, processes each XHTML section looking for redundant SPANs and combining them, and then repackages it all. That tool removes literally tens of thousands of redundant SPANs. This reduces the uncompressed size of the XHTML files by as much as 30%, which means they load faster in your ebook reader and take less memory.
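To give a rough idea of what that span cleanup involves, here is a minimal Python sketch. It is NOT my actual tool, just an illustration of the technique under some loud assumptions: spans carry only a double-quoted class attribute, the first span of each pair contains plain text with no nested tags, and the XHTML files are UTF-8. The names merge_adjacent_spans and clean_epub are made up for this example. A real tool would want a proper XML parser rather than a regex.

```python
import re
import zipfile

# Matches <span class="X">text</span><span class="X"> where the first span
# holds only text (no nested tags), so its </span> is certain to be its own.
_ADJACENT = re.compile(r'(<span class="([^"]*)">)([^<]*)</span><span class="\2">')


def merge_adjacent_spans(xhtml):
    """Merge runs of back-to-back spans that share exactly the same class."""
    while True:
        # Each pass merges non-overlapping pairs; repeat until nothing changes.
        xhtml, count = _ADJACENT.subn(r"\1\3", xhtml)
        if count == 0:
            return xhtml


def clean_epub(src_path, dst_path):
    """Unpack an EPUB, merge redundant spans in each XHTML file, repackage."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        names = src.namelist()
        # The EPUB container format wants "mimetype" as the first entry,
        # stored uncompressed. The sort is stable, so other order is kept.
        names.sort(key=lambda name: name != "mimetype")
        for name in names:
            data = src.read(name)
            if name.lower().endswith((".xhtml", ".html")):
                # Assumption: the XHTML files are UTF-8 encoded.
                data = merge_adjacent_spans(data.decode("utf-8")).encode("utf-8")
            compression = (
                zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            )
            dst.writestr(name, data, compress_type=compression)
```

Calling clean_epub("book.epub", "book-clean.epub") would write a cleaned copy and leave the original alone. Spans of different classes, or spans separated by even a single space, are deliberately left as they are.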

And now we're almost there. One last step remains, to open up the EPUB in Sigil to have it automatically fix some of the missing XHTML metadata that the previous steps omitted. DOCTYPE declarations and the like.

And NOW, at last, after a final visual inspection via send-to-Kindle, it's ready for upload to Kindle Direct Publishing.
