Wednesday, November 24, 2010

Sigil 0.3.2

You can get the release in the downloads area. Changelog follows:

  • added a new toolbar button for turning Tidy cleaning on/off; this option is also available from the Tools menu (issue #553)
  • added support for TrueType Collection fonts with extension TTC
  • InDesign (as of CS5) refuses to list the fonts it embeds in the OPF manifest of the epub files it exports, even though the epub standard demands it. This causes Sigil to not pick up these fonts when opening such epub files. A workaround has been added that will detect such problematic epubs and then load the font files.
  • worked around a Tidy bug that added blank lines to the start of <pre> and <style> elements (issue #655)
  • fixed a rare crash issue when loading epub files

All of these fixes have been in the repo for a while now; I wanted to get a few more things in before pushing a new release, but I’m so pressed for time right now that I know I won’t be able to do it any time soon.

The most notable change is a new toolbar button for turning off Tidy cleaning. A bare-bones version of Tidy will still run to make sure your source is valid XHTML, but there is no element rewriting or CSS extraction etc. When you turn it off, it’s off for loading, view switches… everywhere.

The second notable change is a workaround for InDesign’s crappy epubs with embedded fonts. Since ID doesn’t list them in the manifest, previous versions of Sigil wouldn’t pick them up during import (that’s what the standards say should happen). 0.3.2 will notice when such ID epubs are loaded and pick up the fonts.

Same thing goes for TrueType Collection fonts; ID puts them in the epub even though the standard only allows TTF’s and OTF’s… Sigil will pick up TTC’s too now if your epub has them. You shouldn’t use them, but Sigil shouldn’t silently drop them if you do. The choice should be up to the user.
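To make the workaround a bit more concrete, here is a minimal Python sketch of the idea: look at what is actually inside the epub archive and pick out font files (TTF, OTF and TTC) that the OPF manifest fails to list. This is not Sigil's actual code; the function name `unlisted_fonts` and its parameters are made up for illustration, and the sketch ignores percent-encoded hrefs for brevity.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"
FONT_EXTENSIONS = (".ttf", ".otf", ".ttc")


def unlisted_fonts(opf_xml, opf_path, archive_names):
    """Return font files present in the archive but absent from the OPF manifest.

    opf_xml       -- contents of the OPF file
    opf_path      -- path of the OPF inside the archive, e.g. "OEBPS/content.opf"
    archive_names -- every file path found in the epub zip archive
    (All three names are hypothetical, chosen for this sketch.)
    """
    root = ET.fromstring(opf_xml)
    opf_dir = posixpath.dirname(opf_path)

    # Manifest hrefs are relative to the OPF file, so resolve them
    # against the OPF's folder before comparing with archive paths.
    listed = set()
    for item in root.iter(OPF_NS + "item"):
        href = item.get("href")
        if href:
            listed.add(posixpath.normpath(posixpath.join(opf_dir, href)))

    return sorted(
        name
        for name in archive_names
        if name.lower().endswith(FONT_EXTENSIONS) and name not in listed
    )
```

Anything this returns is a font that would have been silently dropped on import before 0.3.2.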

With these and previous font de-obfuscation changes in place, support for InDesign epubs with embedded fonts should be fairly complete.

Thursday, November 11, 2010

FlightCrew 0.7.1

The release is here. Changelog follows:

  • added an automatic update checker to the GUI app
  • the GUI now displays a "No problems found" message when the epub passes all checks (issue #9)
  • fixed an issue with missing XHTML files causing the GUI to show a dialog about an std::exception and the CLI to report that the epub itself was not present (issue #8)
  • fixed an issue that was causing empty error messages for incorrect uses of XML encodings (issue #5)
  • fixed an issue with anchor links to the current file (links with fragments only) incorrectly throwing errors in the reachability analysis (issue #3)

Besides some nice bug fixes, you also get two nice features: a message is now displayed when no problems have been found (lots of people asked for that one… and I agree, it should have been there from day one) and an update checker for the GUI app. The update checker is the same module used in Sigil, so it works the same way. It will notify you when you start the app that a new version exists if it detects one on the server. The check is only performed if the last one was more than six hours ago.
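The six-hour rule is about as simple as it sounds. A minimal Python sketch of the throttling logic, assuming the time of the last check is stored somewhere between runs (the names `CHECK_INTERVAL` and `should_check_for_update` are invented here and are not taken from the real module):

```python
from datetime import datetime, timedelta

# Assumption for this sketch: the interval mentioned in the post.
CHECK_INTERVAL = timedelta(hours=6)


def should_check_for_update(last_check, now):
    """Return True when the server should be contacted on startup.

    last_check -- datetime of the previous check, or None if there was none
    now        -- current datetime
    """
    if last_check is None:
        return True
    return now - last_check > CHECK_INTERVAL
```

For example, `should_check_for_update(None, datetime.now())` is true on the very first launch, since nothing has been recorded yet.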

This of course means that FlightCrew-gui will now access the Internet when you start it.

Also note that the installers now install everything in a “FlightCrew” folder by default, and not in “FlightCrew-gui”. This means that FC 0.7.1 will not overwrite an installation of FC 0.7.0. You should uninstall the old one before installing the new one, otherwise you’ll end up with two different versions of FC on your computer. It won’t prevent either from working correctly, it’s just annoying. :)

Mac users will see that there is now a “FlightCrew-cli” application along with “FlightCrew-gui.app” in the DMG. It should have been there in the first release, but I forgot to add it. Such is life.

Sunday, November 7, 2010

0.3.1

You know the drill. You can get the new release here, and the changelog follows:

  • added a new "Font Obfuscation" context menu for font files in the Book Browser; the user can now select (or de-select) the use of Adobe's or the IDPF's font obfuscation methods; this also resolves the problem where Sigil refused to open epub files that use such obfuscated fonts with the message that the epub has DRM
  • fixed a validation issue caused by Sigil using "application/x-font-opentype" as the OPF mimetype instead of "application/vnd.ms-opentype"
  • fixed a crash when opening the TOC editor for some epub files (issue #654)

There are two reasons why you are seeing a new release so soon: the first one is an unfortunate bug that causes a crash when opening the TOC editor for certain epub files. While rare, it can still happen and frankly, it’s not rare enough. This needed to be fixed pronto.

The second reason is the recent introduction of the “this epub has DRM and cannot be opened” message for epubs that have an “encryption.xml” file in the META-INF folder. The message is valid since Sigil can’t open encrypted epub files (nothing can without the key, that’s the sad point of DRM), but unfortunately Adobe InDesign obfuscates embedded font files. This, naturally, creates an entry in the “encryption.xml” and now Sigil refuses to open your epub. And the only thing you did was tell InDesign to embed a font.

As far as I know, it’s not possible to tell InDesign not to obfuscate embedded fonts. It does this by default. The sad thing is that most people don’t know this is happening; they just embed the fonts and since they appear to work fine when the epub is opened with ADE, they call it a day. Personally, I have a problem with this: if Adobe wants to placate the font foundries with some patently absurd font mangling scheme, fine. But give the user a choice of turning this crap off. Or at the very least tell them it’s happening. If I were Adobe, I’d spend less time thinking up such nonsense and more time improving their epub export option since it’s currently… somewhat unusable[1].

Don’t get me wrong, I know perfectly well why they want fonts obfuscated: licensing issues. Most font licenses don’t allow embedding a font in a way that the font itself can be easily extracted from the distributed file. Font obfuscation is thought to solve this.

It doesn’t.

De-obfuscating a font mangled in such a way is laughably trivial; a twelve-year-old could write a program in 10 minutes that de-obfuscates the font files. The same thing goes for the IDPF’s method. It doesn’t prevent anyone from doing anything. While I find the whole thing ridiculous, I understand some want to give themselves at least an illusion of legal justification. All I’m saying is that Adobe shouldn’t be forcing people who know better into using something so silly.
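To show just how trivial, here is a Python sketch of both schemes as I understand them. The details are my reading of the two methods and should be checked against the respective specs before you rely on them: the IDPF method XORs the first 1040 bytes of the font with the SHA-1 digest of the book's unique identifier (whitespace removed), and Adobe's method XORs the first 1024 bytes with the 16 raw bytes of the book's UUID. The function names are made up for illustration.

```python
import hashlib
import re


def _xor_prefix(data, key, length):
    """XOR the first `length` bytes of `data` with `key`, repeating the key."""
    prefix = bytes(b ^ key[i % len(key)] for i, b in enumerate(data[:length]))
    return prefix + data[length:]


def idpf_key(unique_identifier):
    # Assumption: strip space, tab, CR and LF, then SHA-1 the result.
    stripped = re.sub(r"[\x20\x09\x0d\x0a]", "", unique_identifier)
    return hashlib.sha1(stripped.encode("utf-8")).digest()


def adobe_key(uuid_urn):
    # Assumption: drop the "urn:uuid:" prefix and the dashes,
    # then read the remaining 32 hex digits as 16 bytes.
    without_prefix = re.sub(r"(?i)^\s*urn:uuid:", "", uuid_urn)
    hex_digits = re.sub(r"[^0-9a-fA-F]", "", without_prefix)
    return bytes.fromhex(hex_digits)


def toggle_idpf(font_data, unique_identifier):
    """Obfuscate or de-obfuscate; XOR is its own inverse."""
    return _xor_prefix(font_data, idpf_key(unique_identifier), 1040)


def toggle_adobe(font_data, uuid_urn):
    """Obfuscate or de-obfuscate; XOR is its own inverse."""
    return _xor_prefix(font_data, adobe_key(uuid_urn), 1024)
```

Note that the "key" in both cases is sitting in plain sight inside the epub itself, which is the whole point of the rant.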

Little do the users of InDesign know that this scheme makes their epub files invalid and that such fonts will only work in Reading Systems that support this. It works in ADE since Adobe naturally made sure that their RS supports their font obfuscation method. While I personally think that font obfuscation is utterly pointless, lots of Sigil’s users have epubs coming from InDesign… So Sigil has to cave.

You will notice that there is now a new right-click context menu entry for font files in the Book Browser. Under the “Font Obfuscation” sub-menu, you will see two actions: “Use Adobe’s Method” and “Use IDPF’s Method”. These select which font obfuscation method to use on the font files.

By default, the state in which the font file was found when the epub was opened is preserved when saving; so if you open an epub file with InDesign-embedded fonts, you will see a check-mark next to “Use Adobe’s Method” in the “Font Obfuscation” sub-menu. This means that the obfuscation is in place and will be preserved after a save. You can also click on “Use Adobe’s Method” to remove the check-mark, thus un-obfuscating the font file (I highly recommend this). Same thing goes for “Use IDPF’s Method”; clicking on a checked method unchecks it and removes the obfuscation.

Since Sigil now supports font obfuscation (and de-obfuscation), you will not see that “cannot open because of DRM” message for epubs that use such fonts.

Don’t forget that InDesign still[2] flat-out refuses to list the embedded fonts in the OPF file, even though the specs say it has to. That means the fonts won’t be picked up when the epub is opened in Sigil. I’ll probably add yet another InDesign workaround down the line to handle this.

Footnotes

[1] That’s the most restraint I’ve shown in a year.

[2] As of CS5. I just checked.

Thursday, November 4, 2010

0.3.0 FINAL

So here it is, the final version of 0.3.0. The changelog from 0.3.0 FINAL follows. Don’t forget that if you’re coming from 0.2.4, changes from RC1 and RC2 also apply.

  • root rights no longer needed to install on Linux
  • fixed an issue with some child headings being attached to incorrect parent headings in the TOC if the parent was marked as "not in TOC" (issue #476)
  • fixed an issue with some UTF-8 characters outside the BMP (usually CJK characters) not being saved properly (issue #180)
  • fixed an issue with certain path types not being correctly updated as a result of the fix for issue #501 (issue #561)
  • the Book Browser now prevents adding files that already exist in the epub
  • previously, when adding external XHTML files through the Book Browser, any files (like CSS stylesheets or images) that were linked from that file were included in the epub under a different name if their original name was "taken"; this caused duplicates so this behavior has changed: files whose names are "taken" are now skipped over (issue #482)
  • fixed a rare issue that caused incorrect path updates for anchor links pointing to file names that were suffixes of other chapter file names, and the anchor had a fragment ID (issue #598)
  • fixed an issue with the image paths in background-image CSS rules not being updated when the path changes (issue #594)
  • Sigil now informs the user that DRMed files cannot be opened, instead of just crashing (issue #624)
  • this time *really* fixed the "acknowledgments" error that was reported as fixed in RC1
  • fixed a crash on load (with an error dialog) issue on Linux systems occurring when multiple users on the same machine tried to use Sigil (issue #642)
  • fixed a randomly occurring crash, usually triggered on Macs during loading (issue #643)

And now I’m off to give FlightCrew some much-needed love.

Friday, October 8, 2010

Sigil 0.3.0RC2

Well that was fast.

A file-corruption-on-save issue was detected in RC1 after it was published. RC1 was taken down until the problem was resolved, and now it has been.

RC2 is now available for download.

Sigil 0.3.0RC1

I’ve just released Sigil 0.3.0RC1. Please note that this release includes some major changes under the hood, and as such may not be very stable. From the tests I’ve run, it works great. Any problems I’ve found I’ve quickly fixed. But since this release brings a new version of Qt and completely replaces the internal XML DOM provider from QDom to Xerces, there are bound to be some regressions I’ve missed. Bear that in mind.

Here’s the changelog:

  • fixed a validation issue caused by using the American spelling for "acknowledgments" where the OPF spec uses the British "acknowledgements" (issue #611)
  • Sigil now uses "application/x-font-ttf" as the media type in the OPF for TrueType fonts (issue #609)
  • on Mac OS X, the universal build now includes an x64 version of Sigil, and builds now use Cocoa instead of Carbon; support for Mac OS X 10.4 is dropped along with support for PowerPC Macs
  • fixed a problem with opening files from the Ubuntu "Open With" menu (issue #524)
  • made Tidy handle common user errors like "&co." in the HTML source instead of "&amp;co."
  • fixed a rare Tidy bug with disappearing spaces when the only whitespace in an element was the newline following a start tag (issue #387)
  • changed the internal DOM engine from Qt's QDom to Xerces; this should also bring numerous bug fixes and performance improvements plus a small (~10%) decrease in memory consumption (issue #367)
  • updated Qt to 4.7: this should bring a 400%+ performance increase in Book View rendering along with countless smaller performance improvements and bug fixes across the board
  • switched to the MSVC 10 compiler for Windows releases; should bring ~5% general performance improvement
  • fixed several crash/error problems relating to opening, saving and modifying epub files which have onerous file permissions set for internal content files (issue #574)
  • added a workaround for broken epubs created by other epub-producing software which caused a crash on certain searches with the Find dialog (issue #548)
  • fixed a problem with Book Browser's "Merge with previous" action if a file was previously deleted from the Book Browser (issue #565)
  • fixed a problem with chapter splits being placed in the wrong reading order if a file was previously deleted from the Book Browser (issue #497)

Performance improvements all around, plus some major fixes. Mac and Linux users would sometimes get a “permission denied” error when opening an epub with Sigil; this was caused by badly constructed epub files. A workaround has been implemented so you should not see this anymore.

Mac users also get an x64 build in the universal binary, and this should bring another 10% performance improvement to people who have a Mac that can support this architecture. Support for Tiger is dropped along with support for PowerPC Macs.

Linux users get some custom love too: you should now be able to use Ubuntu’s “Open With” menu with Sigil.

If you see any regressions, please report them ASAP. Sigil 0.2.4 stays the “official” version until any major problems with 0.3.0 are resolved.

Saturday, October 2, 2010

Introducing FlightCrew, the epub validator

I’ve been talking about this for a while under the name of “that epub validating library”, and now it has a name. That name is FlightCrew.

It’s a C++, cross-platform, native code epub validator (it’s also open source). The project is composed of three parts:

  1. FlightCrew, the validation library;
  2. FlightCrew-cli, the command-line front-end to the FlightCrew library;
  3. FlightCrew-gui, the GUI front-end to the FlightCrew library.

There are installers and DMG’s for download that package FlightCrew-gui, which provides a nice GUI interface to the underlying library. Errors have a reddish background (ok, it’s pink), while warnings have a yellow one. Here’s a screenshot:

I’ve kept the interface to a minimum on purpose. There’s something to be said about simplicity. As Antoine de Saint-Exupéry said: “Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.”

You can also drag files from your desktop or file browser and drop them on the FlightCrew-gui window. This will instantly run validation checks on it. This drag-and-drop interface works on all platforms.

FlightCrew-cli is included in all packages of FlightCrew-gui, and does the same thing as that application only from a command-line interface. It works the way epubcheck works—feed it a file, it spews out warnings and errors if necessary.

The current version number for all this is 0.7.0, which I’m using as a sort-of indication of its completeness. I started working on this back in July, but since there was a rather lovely summer between now and then, it has only had about a month and a half’s work put into it. It’s still roughly 20 KLOC with a complete test suite, so it’s no slouch.

Why FlightCrew is better than epubcheck

First off, “better” is a dirty word. Each tool has its pros and cons. Epubcheck’s (EC) advantage is that it checks for a few things FlightCrew (FC) doesn’t (yet). But the reverse is also true: FC checks for a lot of things EC doesn’t. Off the top of my head, FC performs an extensive reachability analysis and will warn you if you have some resources listed in the manifest that are not used anywhere. It will also report an error if you have an OPS[1] document that the user can reach—through the <guide> or <tours> element, the NCX or just normal links in the text—but that is not listed in the <spine>. This is one crucial mistake that can now be caught. Reachability analysis also catches files that are used but not present in the manifest.
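For the curious, the reachability analysis boils down to a graph walk. The Python sketch below is an illustration of the idea and not FlightCrew's actual implementation (which is C++). It assumes the links between files have already been extracted into a dictionary, and every name in it (`analyze_reachability`, `is_document`, the result keys) is invented for this example.

```python
from collections import deque

# Assumption for this sketch: documents are recognized by extension only.
DOCUMENT_EXTENSIONS = (".xhtml", ".html", ".htm")


def is_document(path):
    return path.lower().endswith(DOCUMENT_EXTENSIONS)


def analyze_reachability(manifest, spine, entry_points, links):
    """Walk everything a reader can reach and compare it to the OPF.

    manifest     -- set of paths listed in the <manifest>
    spine        -- list of paths listed in the <spine>
    entry_points -- paths referenced from <guide>, <tours> or the NCX
    links        -- dict mapping a path to the set of paths it references
    """
    reachable = set()
    queue = deque(list(spine) + list(entry_points))
    while queue:
        path = queue.popleft()
        if path in reachable:
            continue
        reachable.add(path)
        queue.extend(links.get(path, ()))

    in_spine = set(spine)
    return {
        # Warning: listed in the manifest but never used.
        "unused_in_manifest": sorted(manifest - reachable),
        # Error: used somewhere but missing from the manifest.
        "used_but_not_in_manifest": sorted(reachable - manifest),
        # Error: a document the reader can reach that is not in the spine.
        "reachable_documents_not_in_spine": sorted(
            p for p in reachable if is_document(p) and p not in in_spine
        ),
    }
```

The three result lists map onto the three problems described above.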

There are many other things that FC will check for that EC will not, and most of those you care about deeply and just don’t know it. The things EC checks for that FC doesn’t? Two “big ones”: OPF-listed fallbacks and DTBook syntax verification. If you haven’t heard of either, then you’ve never used them, never will and probably shouldn’t. These are very rarely used features of epub that I have personally never seen used in practice. But they’re big parts of the epub specifications so FC should check for them (and will, fairly soon) for the sake of completeness. There are a few other odds and ends that EC looks for but FC doesn’t.

But here’s where FC blows EC out of the water…

Error reporting done right

Let’s pretend I don’t know most of the epub specs by heart and that I’m a newcomer to epub. I made my first epub book, and I’ve heard that I should validate it. I’ve downloaded both FC and EC and now I’m going to use both. I’m going to use EC first because I’ve heard it’s “what the pros use”. Note that each pair of EC/FC examples refers to the exact same problem with the file, and the messages usually come with line numbers (unless otherwise specified) which have been omitted. Commentary has been added for the sake of ridicule.

EC: length of first filename in archive must be 8, but was 19 [no line number, ed.]

Um… what? WTF is that supposed to mean? What filename? And why must it be exactly 8? What the hell are you talking about?

FC: Bytes 30-60 of your epub file are invalid. This means that one or more of the following rules are not satisfied:
  1. There needs to be a "mimetype" file in the root folder.
  2. Its content needs to be *exactly* "application/epub+zip".
  3. It needs to be the first file in the epub zip archive.
  4. It needs to be uncompressed. [no line number, ed.]

Ah… not only does this point out the problem (correctly!), it also tells me how to fix it. Nice.
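If you want to see what those four rules look like as code, here is a small Python sketch using the standard `zipfile` module. It is an illustration only and has nothing to do with how either tool is implemented; the function name `mimetype_problems` is made up, and it leans on `ZipInfo.header_offset` being zero to decide that an entry is physically first in the archive.

```python
import io
import zipfile

EXPECTED_CONTENT = b"application/epub+zip"


def mimetype_problems(epub_bytes):
    """Return a list of human-readable problems with the 'mimetype' entry."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        if "mimetype" not in archive.namelist():
            return ['There needs to be a "mimetype" file in the root folder.']

        info = archive.getinfo("mimetype")
        if archive.read("mimetype") != EXPECTED_CONTENT:
            problems.append('Its content needs to be *exactly* "application/epub+zip".')
        # The first entry's local header starts at byte 0 of the archive.
        if info.header_offset != 0:
            problems.append("It needs to be the first file in the epub zip archive.")
        if info.compress_type != zipfile.ZIP_STORED:
            problems.append("It needs to be uncompressed.")
    return problems
```

An empty list means the archive passes all four checks.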

EC: required attributes missing

Huh? I understand that you’re trying to tell me that some required attributes are missing (one or more? you haven’t said), but how about telling me which ones you frigging bastard. Am I supposed to read through the entire XHTML specification, hunting down which attributes this element should declare or even know that I’m supposed to do exactly that?

FC: missing required attribute 'alt'

Thanks! That was awesome. Saved me a ton o’ hassle.

EC: unfinished element

Exceedingly useful, that. Mind telling me how it’s unfinished?

FC: The <title> element is missing.

Now that’s more like it. I’d kiss you if I could.

EC: unfinished element

Didn’t you just say this? And on the exact same element? Why am I getting this again, I thought I fixed this…

FC: The <identifier> element is missing.

You just keep getting better and better!

EC: unfinished element

Fuck you, epubcheck. Fuck you.

FC: The <language> element is missing.

Want to pet my cat? ‘Cause you’re awesome, and I only let awesome things pet my cat.

Back to reality. I hope you got the point, cause if you haven’t, I can pull out tons of other examples.

Now I know that most of the error messages from EC are actually coming from an internal component of it called Jing and that has crappy error messages, but as a user I don’t care. Adobe should use something better instead, or fix Jing. And lots of the crappy messages come from EC core; that little “length of filename” gem was all theirs.

Have I also mentioned that EC development is pretty much dead? From its public source repository, it has had a whopping one source code commit in the last ten months, and that commit was four days ago.

In short, use FC first, then EC to get some of the checks FC doesn’t (yet) perform. After FC becomes a strict superset of all EC functionality (roughly a couple of months), drag epubcheck down to the cellar and shoot it in the back of the head.

Or just stop using it, if you prefer.

Footnotes

[1] A funny way of saying HTML document. It’s more than that of course, but for now mentally replace “OPS” with “HTML” .

Friday, October 1, 2010

Mac OS X 10.4 support is going the way of the dodo

Just a quick note to people who may still be using Sigil on Tiger: 0.2.4 was the last version to feature support for that operating system. Why? Many reasons. Let’s start with a few:

  • Sigil 0.3.0 will use Qt 4.7, and Nokia is slowly discontinuing support for Carbon with this release. It’s still there, but it’s second-tier.
  • So far, Sigil was offered in 32bit versions for PPC and Intel architectures on Mac (as a universal binary). I want to provide 64bit versions, and for that I need to use the Cocoa version of Qt, which doesn’t exist for Tiger.
  • From site statistics, about 4.5% of all Mac users of Sigil use Tiger. 73% use Snow Leopard. These people would like explicit 64 bit support, and could certainly put it to good use. As previously reported, Sigil runs about 10% faster on such architectures.

There are a few other reasons.

In short, this is the right way to go. I know some people will be seriously pissed off at this, but I have to think about the majority of the users. Sigil 0.2.4 will be indefinitely available for download on the project site (as with all versions of Sigil), so you can always just continue using that.

Tuesday, September 21, 2010

So Qt 4.7 is out… where’s Sigil?

As the title says, Qt 4.7 is out. The trunk version of Sigil has been running on Qt 4.7 RC1 for weeks now, and any problems that I’ve noticed I’ve already fixed. But it’s not all roses…

I’ve also transitioned to Visual Studio 2010 and the MSVC 10.0 C++ compiler… and it apparently miscompiles Qt 4.7 when compiling as x64. Here’s the bug report on Nokia’s tracker. To be clear, this is not a Nokia bug, but a Microsoft bug. MS apparently has a hotfix already, and it’s about to be released any day now. They’re testing it to make sure it doesn't break most of the world’s software. :)

What does that mean for Sigil? Well I was hoping that the hotfix would be released before Qt, but that didn’t happen. So the release of Sigil 0.3.0RC1 is delayed for the time being. There were a few more bugs I wanted to fix anyway.

I’m going to give MS about a week more to release the hotfix; if it’s not out by then, Sigil goes back to MSVC 9.0 (the same version used for all Sigil releases so far) until it is. 

Sunday, September 5, 2010

Typing like crazy at 25wpm

So what have I been doing for the last few weeks? Two things: working furiously on the epub checking library and trying to transition from the QWERTY keyboard layout to Dvorak (touch typing, of course).

You have no idea how hard the latter is.

I tried to switch a couple of times this year, but failed both times. The problem is that it takes you about two to three weeks of regular, daily practice (we’re talking a couple of hours a day just typing) to get to something like 50wpm. During that time, you’re crawling like a snail. It’s absolutely insufferable when you’re used to your thoughts just flowing out of your fingers at 60wpm. And if you have something with a deadline, something you just have to get done, forget it. You’re either not going to do it, or you’re going to switch back to QWERTY. And that will kill you.

You absolutely have to go cold turkey on your old layout to be able to switch to a new one. Trying to “ease yourself into it” doesn’t work; believe me, I’ve tried twice :). You basically have to rewire your brain to the new layout, and reminding it of the old one repeatedly undoes everything you learn.

So this is a whole new world of frustration for me. But it’s worth it in the end, it’s much more comfortable to type this way. My hands used to hurt after a long day; they don’t anymore. Others have told me that after you really get used to it (a couple of months), you type about 30% faster than you did, and without pain.

But enough about that. Epub!

The epub checking library[1] is progressing quite nicely:

  • OPS checking is done (including SVG Full and OPS <switch>), except for the DTBook syntax[2].
  • OPF checking is about 90% done with fallbacks remaining plus a few other checks.
  • OCF checking is done for all six META-INF XML files as it pertains to schemas.
  • NCX checking is done at the schema level; link checking remains.
  • The CLI and GUI clients have been designed, but no code has yet been written.

Note: “done” means done to the point I’m aiming for in the first release, and which includes most of the things you care about.

All in all, it’s going well. My last “class-based” semester is starting tomorrow, so naturally things are going to slow down now. Otherwise I’d be done in a week. With classes, it’s more likely to be three weeks, barring unexpected complications.

Also, Qt 4.7 is at the RC stage, and I’m aiming to integrate Xerces as a QDom substitute for the next release of Sigil to go along with the new Qt version. So that will eat at least a week too.

Footnotes

[1] It has a name now, but for a few fairly ridiculous reasons I’m not willing to reveal it until the first release.

[2] Because nobody uses it! AFAIK it’s also scheduled for deprecation in epub 2.1. Validation for this syntax will eventually make it in for the sake of completeness, but not for the first release.

Thursday, August 12, 2010

0.2.4

I just pushed a new release. Here’s the changelog (highlights bolded):

  • fixed a problem with updating image paths for images with the same filename but coming from different parent directories (issue #501)
  • added a new "Merge With Previous" context menu action for XHTML files in the Book Browser (issue #265)
  • changed Tidy to handle the common typing mistake of ending entities with a ':' instead of a ';' (issue #535)
  • fixed a bug where double-clicking a file in the Book Browser for a file that was already opened in a tab switched that tab back to Book View; the tab now retains whatever View it was in previously
  • newly opened tabs now default to the View of the current tab (issue #468)
  • re-engineered the locations where Sigil stores its work files; the system-provided temp folder is now used; this should alleviate some permissions issues on certain machines, especially Macs (issue #404)
  • Sigil now prevents the renaming of files in the Book Browser to file-system invalid names (issue #493)
  • changed the keyboard shortcut that opens the Replace dialog from Cmd+H to Cmd+Shift+F on Macs only; Cmd+H is used by Mac OS X for window hiding (issue #477)
  • fixed an issue with Sigil using XHTML 1.0 for OPS doctypes, instead of XHTML 1.1 (issue #503)
  • several files can now be marked as having the Text semantic type (issue #522)
  • fixed an issue with Direction: All in book wide searching skipping last XHTML file (issue #520)
  • fixed an issue with the declared XML encoding not being picked up if it was wrapped in single quotes instead of the more standard double quotes
  • fixed an issue where the user could avoid the warning dialog for book-wide searching in Book View if he switched to this mode in Code View, and then switched back

This is mostly a bugfix release, with the notable exception of the “Merge With Previous” feature. Lots of people asked for that, so there it is.

I’ve also changed the location where Sigil stores its work files since it was causing an error when starting on certain Mac machines. I’ve also changed the “open Replace dialog” shortcut from Cmd+H to Cmd+Shift+F. You can now use Cmd+H to hide Sigil windows. :)
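One of the smaller fixes in the changelog above, the one about the declared XML encoding being wrapped in single quotes, is easy to illustrate. The Python sketch below is not Sigil's code (Sigil is C++); it just shows a declaration sniffer that accepts either quote style. The name `declared_encoding` is invented, and the sketch only handles documents whose first bytes are ASCII-compatible.

```python
import re

# Accept either quote character, and require the same one to close the value.
_ENCODING_RE = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)

_UTF8_BOM = b"\xef\xbb\xbf"


def declared_encoding(document_bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    if document_bytes.startswith(_UTF8_BOM):
        document_bytes = document_bytes[len(_UTF8_BOM):]
    match = _ENCODING_RE.match(document_bytes)
    return match.group(2).decode("ascii") if match else None
```

A sniffer that only looks for double quotes would miss the single-quoted form, which is just as legal in XML.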

Sunday, July 25, 2010

Vacation time over, get back to work!

So I just returned from vacation.

I didn’t write a single line of code for the never-ending doxygen comment conversion, but I did end up working quite a bit on the epub validating library. It’s progressing quite nicely, it’s already a couple of KLOC. I’m also doing it TDD-style which I’m finding to be a rather pleasant workflow, once you get used to it. Seeing a test fail and catch a regression I wouldn’t notice until after it was too late brings a smile to my face. On a related note, gtest and gmock are awesome.

Oh, remember when I said that writing XML validation checkers by hand instead of using schemas would be painful? Guess what, it is. It really, really is. :)

One of the other benefits of working on this library is that I’m solving all sorts of problems with Xerces integration. The library uses Xerces for everything XML, and working with it has given lots of insights that will be applicable to Sigil when I start replacing QDom with it.

I’ll be mostly working on this library over the coming weeks, so Sigil will see only bug fixes going forward. Hey, the library is the major future feature for Sigil since it will be integrated into it, so transitively I’ll still be working on Sigil all this time.

Qt 4.7 is just around the corner too[1]. When it does arrive, you can expect a version of Sigil integrating it very quickly, provided there are no problems migrating to the newer Qt[2]. The QtWebKit improvements alone make me giddy like a little schoolgirl. 4x faster rendering? Hell yeah!

Footnotes

[1] Actually I expected it to be released while I was on vacation. Or at least QtWebKit 2.0. Nokia devs said that was coming in May, and it’s the very end of July now… grumble

[2] I seriously expect no problems. Nokia has always been adamant about backwards compatibility, and from past experience I can say they usually do a good job on this.

Thursday, July 8, 2010

Vacation and validation

The semester is finally over so I’ll be heading on vacation tomorrow. I’ll be gone for about two weeks, and in that time no issues will be examined, no code will be written etc. Well, not really. In general, I won’t have internet access but I may pop in from time to time to check the Sigil forum on MobileRead. No promises though.

While I’ll be spending most of my time lying on the beach, soaking up the sun with a comfortable book[1] by my side, I’ll also be doing some light development work. By “light” I mean finishing the doxygen comments conversion for Sigil (which has been taking forever) and some code for the new epub validation library I’m writing. Not a lot of code, just a bit.

What’s that last part about? Well what’s the most requested feature for Sigil on the tracker? It’s integrated epub validation. I’ve planned for months to start working on this as a separate project in August, and I’ve just started doing some architectural design for the library[2]. The dependencies are in place, the build system works across platforms and I have a pretty good idea of how things are going to mesh together.

The project will in fact be composed of the following:

  • the main epub validating library, written in portable C++;
  • a CLI application that uses that library (think epubcheck);
  • a GUI application with pretty buttons and drag-and-drop support.

I know a ridiculous number of people who don’t validate their epub books just because they are too scared of the command line to use epubcheck. While I think that’s absurd and childish, I can’t deny that having a simple GUI would be easier for the average user. And if the GUI supported dragging an epub file from the desktop and dropping it on the app window to initiate the validation, well that would be very useful. It could certainly speed things up.

The goals of the project are as follows:

  • check for everything epubcheck checks for, and much more;
  • be easily embeddable into native code applications (no frigging Java);
  • be easy to use and easy to understand: “unfinished element” means diddly-squat to people who don’t already know what the element is supposed to contain[3];
  • have developers that are active and responsive (and an active development process in general);
  • optionally warn about valid epub constructs that cause problems for certain high-profile Reading Systems.

That last part is interesting. We all know ADE has many quirks, and so do the iPad and other RS’s. Wouldn’t it be lovely if you could instruct your epub validator to check for at least some of those problems as well?

I think that would be fairly useful.

This library will then be embedded into Sigil, and a simple toolbar button press will validate your epub and report any problems.

Before anyone gets ahead of themselves, bear in mind I won’t hit all those goals for the first release. It will be incremental and it will get better in time. But the scope of this project is thankfully much smaller than Sigil’s, so that’s a big relief.

Footnotes

[1] Currently it’s The Stand by Stephen King. That book has like a billion pages. It’s probably not helping matters that I’m reading the extended version.

[2] A software library, not a building. There was a misunderstanding about this a couple of days ago, and needless to say I laughed my ass off.

[3] This will entail writing some of the internal validators by hand and not resorting to schemas. Painful, yes, but necessary. Not just because of usability, but correctness too. For instance, a schema can’t check that the files listed in the manifest are actually in the epub. It won’t tell you if you’ve included the same file multiple times with different ID’s. There are many examples, most of which are checks that are way more important than an ID starting with a number.

I won’t do this for everything (I still intend to use an XML Schema for XHTML validation), but I will do it for the OPF. Most validation problems are there anyway. The OPF is the heart of an epub.
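
To make the two manifest checks from footnote [3] concrete, here is a minimal sketch in Python (the library itself is C++, and this is not its code). The function name check_manifest and the message wording are made up for illustration. It assumes the standard OPF namespace and that manifest hrefs are relative to the OPF file.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from collections import defaultdict
from urllib.parse import unquote

OPF_NS = "{http://www.idpf.org/2007/opf}"


def check_manifest(archive: zipfile.ZipFile, opf_path: str) -> list[str]:
    """Return human-readable problems a schema alone cannot detect."""
    problems = []
    names_in_archive = set(archive.namelist())
    root = ET.fromstring(archive.read(opf_path))
    opf_dir = posixpath.dirname(opf_path)

    # Group manifest IDs by the archive path their href resolves to.
    ids_by_path = defaultdict(list)
    for item in root.iter(OPF_NS + "item"):
        href = unquote(item.get("href", ""))
        full_path = posixpath.normpath(posixpath.join(opf_dir, href))
        ids_by_path[full_path].append(item.get("id"))

    for path, ids in ids_by_path.items():
        if path not in names_in_archive:
            problems.append(f"manifest lists '{path}' but the file is not in the archive")
        if len(ids) > 1:
            problems.append(f"'{path}' is listed {len(ids)} times with IDs {ids}")
    return problems
```

Messages of this kind name the offending file directly, which is the usability point the footnote makes.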

Tuesday, June 22, 2010

0.2.3

Bumpy ride lately. Changelog follows:

  • fixed an issue with the new data from one view sometimes not being saved in the final epub when switching to the other view
  • worked around a Qt focus issue causing current tab data to sometimes not be saved; this was uncovered by fixing the hang-on-save issue, which was caused by the same underlying problem (issue #466)

The work in 0.2.1 has caused some unfortunate synchronization issues.

Problem #1: You work in Code View, make some changes and switch to Book View. You see the changes transfer to BV, then save the epub. Opening the epub, the changes are not there.

This was my fault and should now be fixed.

Problem #2: You work in Book View (or Code View), make changes and save. Sometimes the changes do save, sometimes they don’t. Same thing goes for the TOC editor sometimes not seeing your changes, or the Find/Replace dialog messing things up and losing data.

This is caused by a “lost” focus event. I’ve observed this only once on Windows 7, but it’s reportedly much more frequent on Win XP machines. Linux and Mac machines seem to be immune (although similar issues caused by problem #1 can masquerade as this problem). This seems to be a Qt issue, and I’ve now worked around it. The same underlying “missing focus event” issue was causing the previous hang problems.

There was a third problem reported: you make changes in CV, then switch directly to BV. Any changes are now gone. I’ve only had one user report this; all other reports appear to be caused by problems #1 and #2. I’m still unable to reproduce this, no matter what I do or what machine I use. If you have this issue with the new 0.2.3, please report it ASAP. Use the issue tracker.

I’ve been stealing away hours from my other obligations to work on Sigil since these are major problems, but man, doing this has already started to haunt me…

Sunday, June 20, 2010

0.2.2

This is a very minor bugfix release. Changelog follows:

  • simplified the resource locking mechanism; should eliminate the hang-on-save issue
  • fixed a problem with Book View chapter splitting sometimes not being registered on save, causing duplicate content (issue #450)

I’m trying to leave a rather stable version of Sigil before going on my coding break, so this release is important. Users have reported a higher chance of encountering the hang-on-save problem, and now I've completely changed a fundamental aspect of the resource locking architecture and also minimized all the critical sections so this should really be fixed now. If it’s not… then it’s caused by something completely unrelated. Which is possible, seeing as how I can’t reproduce the issue and thus get an exact fix on the culprit. The bug is not deterministic, which clearly points to the threading-enabled code.

The second major bug was caused by a line of code I forgot to add after some recent refactorings.

If you see the hang-on-save issue with this new release (or any other kind of hang bug, or any other kind of bug at all :) ), please report it ASAP with as much detail as possible.

Friday, June 18, 2010

0.2.1

I’ve just pushed 0.2.1, which is mostly a bugfix-only release. Changelog follows:

  • XHTML files that specify two different encodings are now fixed by removing the incorrect one
  • Sigil now checks the XML encoding attribute for an encoding before the HTML metatag and charset; should now be more compatible with Calibre created epub books
  • created/used 16px versions of all icons; icons in menus are not blurry anymore (issue #121)
  • the Find&Replace dialog now uses the currently selected text (if any) as the default search term (issue #370)
  • fixed issues with unnecessary reloads of the code view (issue #412, issue #398)
  • fixed an issue with the HTML file filter in the open file dialog not correctly filtering files (issue #416)
  • fixed an issue with files without extensions not being saved in the final epub (issue #400)
  • fixed an issue with XPGT resources sometimes being saved blank (issue #433)
  • fixed a regression that made it impossible to add removed headings back into the TOC (issue #439)
  • fixed a problem with some file-wide replacements reverting
  • fixed a problem with the opened tabs not being updated until the user gave them keyboard focus when a file-wide replace was performed (issue #408)
  • fixed a problem with Book View not reflecting changes done in Code View when the Code View was used for editing, and then the tab closed
  • use of custom synchronization primitives should resolve most infrequent hang bugs
  • fixed a problem with the search not progressing in Book View find&replace when using recursive replacements
  • fixed an issue with inserting images that have apostrophes in the filename (issue #391)
  • TOC text now has leading and trailing whitespace trimmed, and inner whitespace condensed (issue #422)
  • an empty ALT attribute is now added to "img" elements that don't have them (issue #406)
  • added the build time to the About dialog, showing date and time in UTC
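
The encoding lookup order from the changelog (XML declaration first, HTML meta tag second) could look roughly like the Python sketch below. The function name detect_encoding, the regular expressions, the 1024-byte window and the UTF-8 fallback are all assumptions for illustration, not Sigil’s actual code.

```python
import re

# Matches encoding="..." inside an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
# Matches charset=... inside a <meta> tag (both the http-equiv form and the short form)
META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def detect_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Prefer the XML declaration; fall back to the meta tag, then to a default."""
    head = raw[:1024]  # declarations live near the top of the file
    for pattern in (XML_DECL, META_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return default
```

With this order, a file that declares two different encodings is read using the one from the XML declaration.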

Remember when I said that 0.2.0 was surprisingly free of major bugs? I spoke too soon. :)

The major issue was that replacements performed with the Find&Replace dialog could sometimes revert back. Horrible, I know. This should now be fixed.

The next major issue actually goes way back to the start of 0.2.0 betas, I just couldn’t track it down: Sigil could sometimes completely hang. This usually happened during saves, and should also be fixed now. I’ve written some custom synchronization primitives[1] that should alleviate these problems. This also involved rewriting the way tabs release their locks etc. Lots of nice things. Should work now.

And the last major problem was a regression that made it impossible to add removed TOC items back in.

Footnotes

[1] Nothing too fancy, I needed a ReadWriteLock that was shallow and non-recursive. Qt provides one that is non-recursive by default, but not shallow. By “shallow”, I mean a lock that allows only one level of locking by the same thread: repeated lock calls from that thread silently succeed, even though only the first one actually took the lock. Same thing goes for unlock: the first call unlocks, the others just skip. The lock and unlock calls are appropriately called LockIfNeeded() and UnlockIfNeeded().

Admittedly, a shallow lock is rarely needed, and if you’re using one, you damn well better know what you’re doing.
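
For readers who want to see the “shallow” idea in code, here is a simplified Python sketch. It wraps a plain mutex rather than a read-write lock, and the class and method names are Python-styled stand-ins for LockIfNeeded() and UnlockIfNeeded(), so treat it as an illustration of the semantics only, not the Sigil implementation.

```python
import threading


class ShallowLock:
    """A mutex that ignores repeated lock/unlock calls from the owning thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = threading.local()  # per-thread "do I hold the lock?" flag

    def lock_if_needed(self) -> bool:
        """Take the lock unless this thread already holds it. Returns True if it locked."""
        if getattr(self._state, "held", False):
            return False  # already held by this thread: skip silently
        self._lock.acquire()
        self._state.held = True
        return True

    def unlock_if_needed(self) -> bool:
        """Release the lock if this thread holds it. Returns True if it unlocked."""
        if not getattr(self._state, "held", False):
            return False  # nothing to release: skip silently
        self._state.held = False
        self._lock.release()
        return True
```

Note that a single unlock call releases the lock no matter how many lock calls preceded it, which is exactly why a shallow lock is easy to misuse.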

Monday, June 14, 2010

A brighter future

Well it’s been quite a while since the last post. I’ve been busy with university work, and while I have one more paper to submit in a few days and finals in about ten, I have a bit of free time now.

I’ll be working through the bug reports I received since 0.2.0 final went live. I must say I’m pleasantly surprised that nothing major was reported. I’d like to thank all the people who have written thorough bug reports. I haven’t had time to go through them all and respond appropriately, but I’m getting there.

Sigil 0.2.1 will probably come sometime in late July/early August. While I have a few days free this week, I’ll be studying the following three weeks and immediately after that I’m going on a two-week vacation.

There’s an ever-so-slight chance that I’ll push a very minor bugfix release in about a week and postpone all the work I wanted to do for 0.2.1 into 0.2.2.

I’ve also started work on replacing all the uses of QDom with Xerces-C++ in a separate repo, but that’s going to take quite a bit of time.

What I’ve really spent a lot of time with over the last several days is QtWebKit. There was a bug on the WebKit Bugzilla for the general QtWebKit performance problems… When the bug was reported, my testcase on one machine took 55 seconds to load and render. For Qt 4.7beta1, Nokia got that down to 21 seconds. With the trunk version of Qt 4.7 from a couple of days ago, it’s down to 13.3 seconds. Bottom line, they’ve done some good work speeding it up, and some more work still remains. It’s not native-WebKit or Firefox fast, where the testcase renders instantaneously, but it’s improving.

I downloaded the Qt trunk source a few days ago and extensively profiled QtWebKit rendering. 95% of the time is spent shaping glyphs in the call tree of QFontMetrics::width(), and man does a lot of code get run just to get the width of a text string. The calls drop into Harfbuzz, and all sorts of functions end up eating a bit of wall time here and there, and this all adds up. There’s no single point that does something very stupid that could then be optimized away, at least not in this module. There’s even an internal “simplified” codepath for getting the width of a text string, but I’m not sure why the Nokia guys are not using it in QtWebKit. I’m probably missing a piece of the puzzle. Either way, there was no low-hanging fruit to remove.

The most fruitful optimization work will probably come from higher up the call chain. For instance, there are superfluous layout calls for page rendering, which (if I’m reading the bug comments right) make the whole thing calculate the render tree three times. Ouch. But they’re working on it.

The point to take home is that when Qt 4.7 lands, Sigil should see a very nice performance boost for loading books and switching to the Book View. Probably for general WYSIWYG actions in the Book View too. Hurrah for that.

Tuesday, May 11, 2010

0.2.0 FINAL

And here it finally is. After months and months and months of finger-breaking[1] work, it’s finally done. Changelog is minimal going from RC4:

  • added new entries to the help menu for the online manual and the FAQ

Now cue tens of thousands of 0.1.9 users screaming “WHY DO I ONLY SEE JUST THE FIRST PAGE??!!”. See the FAQ.

Footnotes

[1] Like back-breaking, only for programmers.

Sunday, May 9, 2010

Manually, but for the web

When I decided I wanted to write the manual for Sigil, the first thing that came to mind was putting it online as some sort of website. A web version can be always up to date, you can put a link to it in the Help menu of your application and you don’t have to blow up the download size by including it in the installers.

But that was wishful thinking; Sigil is hosted at Google Code, and it doesn’t offer a way to host custom HTML pages, and I’m not buying a domain and webhosting just for a damn manual. So sadly, that couldn’t work.

The second idea was writing the manual in pure LaTeX, since if I’m going to write an “offline” version of the manual, I’m going to typographically do the best job I can. Epub can be used for technical manuals, but some of the nice typesetting tricks I had in mind couldn’t be done in it.

Then it occurred to me that having a PDF-only version of the manual for an epub editor was… not the best political decision. I wanted to find a way to eat my cake and have it too: to offer both a nice PDF and a nice epub. I was willing to sacrifice some of the things I could do with custom LaTeX if the resulting PDF was “good enough”.

reStructuredText and Sphinx came into the picture. As I mentioned in a previous post, the PDF version of the manual sucks. Links don’t go where they’re supposed to, the layout is shoddy and in general, it just looks bad (to me at least). Sphinx was primarily designed for creating HTML documentation, and the LaTeX generator suffers because of it. I decided to upload what I had now, gather some feedback on the content and then rewrite the thing in custom LaTeX.

Then came the expected politics

While I’m convinced that I’m right[1], I wanted to ask a few friends who lead or are members of other open source projects (some hosted on GC too) what their thoughts were on the subject. I got universal agreement that a LaTeX PDF was the way to go, but one of them pointed out that GC indeed does provide a way of hosting sites, just… unofficially. The Google guys apparently have no problems with people hosting their site inside their source code repositories. This could of course only work if your site serves only static content, which a manual would certainly be. They’ve even improved the SVN backend for direct web content serving, so they certainly approve of the practice.

Sigil uses Mercurial, and Google provides every mercurial-based project with many repositories. So I used Sphinx to generate the HTML, committed it all to a new “web” repo and voilà! An online manual.

Sphinx produces wonderful HTML, and I’m willing to sacrifice some typesetting beauty for online accessibility. It’s also much easier to customize the HTML output by injecting new CSS. Tweaking the LaTeX output was a nightmare.

There won’t be an offline epub version since there’s no point anymore and I want people to see the code samples and images on a large screen. But if someone wants one, they can easily download the manual sources and build one with Sphinx.

I think everyone should be satisfied now.

Footnotes

[1] Big surprise there.

Saturday, May 8, 2010

0.2.0RC4

Another minor update:

  • fixed a regression that broke FindNext opening new tabs when searching across HTML files (issue #384)
  • fixed an issue with autocompletion in Find dialog ignoring the case of search terms (issue #385)

I’ll be doing these until the explosions stop.

Manually

I’ve just uploaded the manual for Sigil. You can get it from the downloads section.

The manual was written in reStructuredText (RST) and then processed with Sphinx. Honestly, I’m not too happy with the layout of the final file compiled from the Sphinx-generated LaTeX. Some of the hyperlinks don’t lead to where they’re supposed to (bug in Sphinx I guess), and typographically in general… well I don’t like it.

Initially, I wanted to write the manual in straight hand-coded LaTeX. I decided against that because that would mean the manual would only be available in PDF form.

Now, I’m the kind of person who always tries to use the right tool for the job. I love epub and absolutely abhor the PDF “ebook” novels. PDF for novels just doesn’t work, and if you’re producing them… don’t. Make epubs. But for technical manuals with complex layouts, code listings, figures, diagrams, sidebars and the like, PDF is the right choice IMO. Epub is just not the right call for these kinds of books.

Sure, you could make epub versions of technical manuals, but you’d have to sacrifice some of your layout.

Then there’s politics. An epub editor with a manual in PDF form only? That just looks bad. That’s kind of like Microsoft employees using Linux.

I was pondering this until someone told me that the next version of Sphinx will have an epub generator. You would give it RST and it would spit out a nice epub file from the same source it would generate LaTeX markup. That sounded like an amazing deal, so I went with that.

I loved RST until I started using it. I don’t love it anymore. It has a weird syntax and to get anything above the barebones, you have to do all these strange contortions etc. Add to this that the resulting LaTeX markup produces a PDF that I dislike. I tried to tweak it here and there, and I’ve improved the output considerably over the last few days, but still… Having an epub file to silence the naysayers is good, but not at the expense of quality.

I’m pretty sure I’ll be rewriting the manual in pure LaTeX when I get the time (a few weeks from now, probably). Until then, enjoy the current version.

And if someone feels that the new PDF-only manual shows Sigil in a bad light, I say screw ‘em.

Thursday, May 6, 2010

0.2.0 RC3

Ok, RC3 is now up on the site. Download it here.

Changelog:

  • fixed a regression that messed up the tab order of the controls in the Find dialog (issue #380)
  • fixed an issue with cross-file FindNext causing a hang when the search term is not in the book and "Direction: All" is used (issue #378)
  • the Text folder in the Book Browser is now expanded by default after loading

This is also the first release that was completely built and uploaded by an automatic process composed of several Python scripts. Since it’s brand new, it could have introduced some problems. The whole point of this automated build system is to eliminate those nasty situations where I would upload the wrong file, forget to include some critical libraries into the installers or something else.

Once the system has been set up to produce correct builds, it should stay that way.

If anyone’s wondering when 0.2.0 final will be released, my rule of thumb is three days without any showstopper bugs reported. In RC2, the showstopper was the FindNext hang that has now been fixed.

Tuesday, May 4, 2010

0.2.0RC2

The second release candidate is now available. Changelog follows:

  • fixed an issue with ReplaceAll across files not using correct replacement lengths
  • fixed an issue with code in Code View not being pretty-printed
  • fixed an issue with the ReplaceAll across files not informing the opened tabs of the change

Small but important. I consider all of these to be showstoppers. Hopefully these are all there are.

If anyone finds any other bugs, please report them on the tracker. If this release stays showstopper-free, it will be promoted to a full release later this week.

Monday, May 3, 2010

0.2.0RC1

The first Release Candidate is here and you can get it from the downloads section. Between now and final 0.2.0, I’m fixing only showstopper bugs and maybe a few of the trivial ones. If all goes according to plan[1], there won’t be any need for further changes before the final release and thus you’ll see it a few days from now. This is not called a Release Candidate for nothing.

So hammer this version hard, especially the Find & Replace functionality.

Here’s the changelog. It’s by far the largest batch of changes to Sigil for one release (disregarding the first 0.2.0 beta, of course). Also, some very important bugs were finally fixed plus lots of new features, so this really is a big release. Some highlights have been… highlighted:

  • changes in the Book Browser now update the modified state of the main window (issue #331)
  • the Book Browser can now be opened/closed from the View menu (issue #335)
  • all the toolbars now have UI-facing names
  • by injecting a custom XML reader into QDom, the following issues were fixed:
    • Book View search sometimes skipping over instances (issue #253)
    • Book View ReplaceAll causing Sigil to hang on rare occasions (issue #293)
    • spaces disappearing from some HTML constructs (issue #352)
  • implemented component-wide search&replace for Code View searches (issue #372)
  • the Find&Replace dialog now remembers up to 20 previously used search and replace strings (issue #369)
  • fixed an issue with positive regex lookaheads in normal Replace (not ReplaceAll) (issue #261)
  • fixed a rare off-by-one error in Book View searching when the caret was at the start of the matched string; this made the search skip that instance of the match (issue #280)
  • fixed an issue with the Find Dialog not correctly scrolling to the found text in Book View (issue #195)
  • fixed an issue with Tidy not fixing free ampersands into "&amp;", even when configured to do so (issue #365)
  • fixed an issue with the current tab unnecessarily reloading after book saves (issue #354)
  • fixed issues with filename basenames being read only until the first dot; was causing problems with OPF manifest ID generation (issue #351)
  • hitting the keyboard shortcut for the Find&Replace window while the window is open now switches focus to that window (issue #362)
  • fixed an issue with the applied headings not "sticking" and not showing up in the TOC editor (issue #300)
  • the special iPad- and Calibre-friendly cover meta tag information is now preserved after loading
  • added a new "Cover Image" entry for image resource in the "Add Semantics" Book Browser menu
  • if an image is not set as a cover image manually, Sigil now uses heuristics on save to determine if the epub has a cover image
  • if an epub has an image set as a cover image, Sigil will now write a special meta tag that identifies this image in the OPF; this tag is then used by the iPad (and Calibre) for the book cover, for instance
  • all OPF <guide> element information when loading epubs is now preserved
  • added a new "Add Semantics" menu for XHTML documents; it can be used to mark XHTMLs as "Dedication", "Colophon", "Glossary" etc. for the <guide> element of the OPF
  • the status bar now shows a message after chapter split operations
  • fixed an issue with filenames with characters that should not appear in valid XML IDs having those characters added anyway (issue #344)
  • fixed an issue with files with uppercase extensions not having a mimetype set in the OPF (issue #349)
  • fixed an issue with Sigil rewriting headings when the TOC was opened and no heading was edited (issue #327)
  • fixed an issue where adding an existing HTML file through the Book Browser would clear the current metadata in the book (issue #329)
  • added a check that prevents Sigil from loading the same resource multiple times in invalid epubs (issue #339)
  • fixed a bug that made the direct XHTML references in the NCX file less likely (issue #333)
  • fixed an issue with Sigil crashing when trying to save a loaded epub that had some badly formed metadata elements (issue #325)

It’s pretty damn huge.
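
Two of the fixes above (issues #351 and #344) concern turning filenames into OPF manifest IDs. A minimal Python sketch of the idea follows. The function name manifest_id_for, the underscore replacement and the “id_” prefix are assumptions for illustration, and the character set is a conservative ASCII subset of what XML actually allows in IDs.

```python
import posixpath
import re


def manifest_id_for(filename: str) -> str:
    """Derive a conservative XML ID from a filename."""
    # Strip only the final extension, so "chapter.one.xhtml" keeps "chapter.one"
    # instead of being cut at the first dot.
    stem = posixpath.splitext(posixpath.basename(filename))[0]
    # Replace anything outside a safe ASCII subset of XML name characters.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", stem)
    # IDs must not start with a digit, dot or hyphen.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "id_" + cleaned
    return cleaned
```

A real implementation would also have to make sure two different filenames never map to the same ID; this sketch does not handle that.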

Book-wide search

For now, you have to perform your book-wide searches in Code View. I know, I know. It’s not everything you wanted. But technical restrictions are causing problems[2]. I’ll try to write a post about it tomorrow explaining the issue, I’m too tired to go into it now. Just learn to live with only Code View searching until I work around this[3]. It should cover 95% of all your needs.

I’m making it up with the new remember-20-last-used-search-strings feature. The input fields also provide automatic text completion for previous searches. All of this was scheduled for 0.2.1, but hey.

Footnotes 

[1] Do I even need to chuckle at that?

[2] Cookies to the first person that points out the framework giving me grief.

[3] Something like 0.2.2, maybe sooner.

Monday, April 26, 2010

The new “Add Semantics” menu

After the iPad came out, everyone (myself included) wanted to know how well it would handle epub. As it turns out, not great. No embedded font support? Bad Apple, bad!

Anyway, the iBooks application for reading epub books on the iPad comes with this large shelf view where all the books display their covers. Shiiiiny. But as a content producer, you have to explicitly tell the iPad which image in the epub archive is the cover.

There was an interesting discussion about this on MobileRead. The idea to take away is that you need a special meta tag in the metadata section of your OPF file. Something like this:

<meta name="cover" content="coverID" />

The “coverID” needs to be the ID of the cover image in the manifest section.
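
As a rough illustration of how a reader application might follow that meta tag to the actual image, here is a small Python sketch. The function name find_cover_href is hypothetical, and it assumes the standard OPF 2.0 namespace.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_cover_href(opf_xml: str):
    """Return the href of the manifest item named by <meta name="cover">, or None."""
    root = ET.fromstring(opf_xml)
    meta = root.find(".//opf:metadata/opf:meta[@name='cover']", OPF_NS)
    if meta is None:
        return None
    cover_id = meta.get("content")
    # The meta tag holds a manifest ID, not a path, so look the ID up.
    for item in root.iterfind(".//opf:manifest/opf:item", OPF_NS):
        if item.get("id") == cover_id:
            return item.get("href")
    return None
```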

Until now, you had to save your epub with Sigil, extract the epub, edit the OPF, add this meta tag and then zip everything up properly (mimetype file first with no compression etc.). Oh and you could never open that epub again with Sigil since it would remove that special meta tag.
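
For anyone still doing the re-zipping step by hand, here is a hedged Python sketch of the “mimetype first, no compression” part. The function name write_epub and the dict-of-files input are made up for illustration; only the standard zipfile module is used.

```python
import zipfile


def write_epub(out, files: dict[str, bytes]) -> None:
    """Write an epub archive with the mimetype entry first and uncompressed."""
    with zipfile.ZipFile(out, "w") as archive:
        # The mimetype entry must be the first one and must be stored, not deflated.
        archive.writestr("mimetype", b"application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            if name == "mimetype":
                continue  # already written above
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Here `out` can be a file path or a writable binary file object.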

A horrible PITA, wasn’t it?

Well that’s fixed now. The next version of Sigil will have a right-click context menu for images with which you’ll be able to mark an image as a cover. Done. Sigil does everything else required for you.

Along with this “Add Semantics” menu come new entries for the HTML resources. For those, you can now add the <guide> element semantic information. So if you mark one HTML file as, say, a “Title Page”, Sigil will add this information to the <guide> element in the OPF.

Naturally, this feature would be useless if Sigil wouldn’t preserve all of this information after opening an epub that already has it. So it does that now too: all information from the <guide> element is now preserved, including custom values for the “title” attribute (even though you can’t see that value in Sigil, it’s stored). The special meta tag identifying the cover image is read and that information is preserved, too.

Being smart about it

Sigil always does its best to help you out. Well, it at least tries :).

I’ve added heuristics to Sigil that will mark the appropriate image as the cover if you don’t do it yourself. If the first HTML file in the reading order is “very small” and has only one image in it[1], that image will be selected as the cover.
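
Roughly, the heuristic amounts to something like the Python sketch below. The 1024-character threshold for “very small”, the regular expression and the function name guess_cover are assumptions for illustration; Sigil’s real cutoff and parsing are not described here.

```python
import re

# Matches the image path in <img src="..."> or SVG <image xlink:href="..."> tags.
IMAGE_TAG = re.compile(
    r'<(?:img|image)\b[^>]*?(?:src|xlink:href|href)\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)
SMALL_FILE_LIMIT = 1024  # assumed cutoff for "very small", in characters


def guess_cover(first_html: str):
    """Return the single image referenced by a very small first file, else None."""
    if len(first_html) > SMALL_FILE_LIMIT:
        return None
    images = IMAGE_TAG.findall(first_html)
    return images[0] if len(images) == 1 else None
```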

So if you follow best practices, Sigil helps you out. Still, mark it by hand if you can. You will always know better than the machine.

Things to note…

While the OPF spec technically does allow you to, for instance, specify several HTML files as the title page in the <guide> element (for God knows what reason…), Sigil stops you from doing this. It allows you to set only one instance of one <reference> type per book. So if one file is set as the title page, setting another file as the title page will unmark the previous one.

The exception is loaded epub files. If your loaded file specifies several HTML files as, for instance, the preface, then all of those are still marked as such in Sigil after they’re loaded. While I personally think you should never use more than one reference type instance per book[2], if you did this to one of your books before opening it in Sigil, that information will remain. Sigil won’t step on your toes here.
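
The bookkeeping described above can be pictured with a small Python sketch. The class name GuideSemantics and its methods are hypothetical. What happens when you re-mark a type that was loaded with several files is not spelled out above, so the sketch simply assumes the new file replaces them all.

```python
class GuideSemantics:
    """Tracks which files carry which <guide> reference type."""

    def __init__(self, loaded=()):
        # Entries read from an existing epub are kept as-is, duplicates included.
        self._files_by_type = {}
        for ref_type, href in loaded:
            self._files_by_type.setdefault(ref_type, []).append(href)

    def mark(self, ref_type: str, href: str) -> None:
        """Mark a file in the editor: at most one file per type (assumed to replace all)."""
        self._files_by_type[ref_type] = [href]

    def files_for(self, ref_type: str) -> list:
        return list(self._files_by_type.get(ref_type, []))
```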

Finally, here are two images showing off the new menu. The file loaded is Three Men in a Boat. If you don’t want to wait for the next release[3], you can build from repo source and start using this right now.




Footnotes

[1] Sigil looks for a normal <img> tag or an SVG <image> one.

[2] It’s a terrible idea, and it would probably wreak havoc on unsuspecting Reading Systems. The spec actually doesn’t explicitly allow it, it just doesn’t talk about this possibility at all. I’m still not convinced that the spec writers didn’t just forget to forbid this behavior.

So Sigil won’t let you do this.

[3] Which will be RC1. No more betas! :)

Tuesday, April 20, 2010

QtWebKit 2.0 AKA pigs flying

So by now, you should know that QtWebKit is slow. Damn slow. If you don’t, refresh your memory.

If you're performing an editing operation in Sigil, and the UI blocks for 10+ seconds, it’s QtWebKit. If you’re loading a large HTML file into the Book View and everything grinds to a halt for 30+ seconds, yeah, that’s still QtWebKit. I’m not saying all the code I write is perfect (far from it), but QtWebKit has a special place in my heart of hatred.

Back in that loading performance post I linked, I talked about how it takes 75 seconds to load Three Men in a Boat in Sigil 0.1.9. Some of the Linux users may have been confused by that, since on comparable hardware, it would load much faster on a Linux machine.

And it’s true. Sigil performs about an order of magnitude faster on Linux than on Windows. Why is that?

Well it’s because while Qt is thoroughly tested on Windows, it’s certainly much less tested there than on Linux. Qt developers are Linux users, and a lot of them are KDE developers as well. So the machines they use don’t run Windows. It’s a lot like the Sigil UI being fine-tuned for Windows, since I use it and develop on it almost exclusively.

So when there’s a performance regression in Qt on Windows, there’s a fair chance someone at Nokia will miss it. And they did.

The test case I attached to the bug report loads in 55 seconds on Windows and 6 seconds on Linux. Bear in mind the same test case renders instantaneously in Firefox, and probably also in Safari (although I haven’t tested it) which uses vanilla WebKit. Horrible, I know. They’re finally coming around to fixing that, since the bug is now included as one of the “release critical” bugs for QtWebKit 2.0 (which is now officially a separately released project from Qt). That means they’ll kill it before the next release, which should be sometime in May (Qt 4.7 a couple of months after that).

Here’s to pigs flying.

…and another thing

Remember the font variants issue preventing official font embedding support in Sigil? The QtWebKit bug causing the underlying problem has been tracked for the last fifteen months on their various trackers, and I’ve been told just a few days ago that it’s “not considered critical to the release” of QtWebKit 2.0… which probably means it won’t be fixed for at least another six months.

I ♥ Nokia.

Tuesday, March 23, 2010

β3

The third beta of Sigil 0.2.0 has just been released. Here are the release notes:

  • added two new WYSIWYG actions that work for both Views: "Insert SGF Chapter Marker" which inserts the old SGF horizontal rule chapter breaking marker and "Split On SGF Chapter Markers" which splits the current chapter according to the placement of these markers (issue #262)
  • chapter splitting now works in Code View
  • fixed an issue with Sigil adding "xmlns='http://www.w3.org/1999/xhtml'" to every element when performing a chapter break operation (issue #313)
  • fixed a rare issue with false spaces being inserted into words during import (issue #139)
  • added a confirmation dialog for removing items in the Book Browser (issue #306)
  • fixed an issue with the line number area overlapping the text in the Code View
  • made Sigil remove the CSS cruft WebKit was adding to the "body" element
  • fixed an issue with spaces in filenames causing bad anchor element path updates
  • fixed an issue with spaces in filenames not being URL encoded in "href" and "src" attributes in the OPF and NCX files
  • fixed an issue with spaces in filenames causing invalid IDs (issue #301)
  • fixed a regression causing Sigil to crash when importing HTML files that reference resources that don't exist on disk
  • Tidy now converts all uppercase attributes to lowercase; mixed-case attributes are left as is
  • fixed an issue with Tidy choking on uppercase attribute names
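
The URL-encoding fix for spaces in filenames boils down to percent-encoding the path before writing it into an “href” or “src” attribute. A minimal Python illustration (the helper name href_for is made up, and this is not Sigil’s code):

```python
from urllib.parse import quote


def href_for(path: str) -> str:
    """Percent-encode a relative path for use in an href/src attribute."""
    # Keep "/" unescaped so the directory structure survives.
    return quote(path, safe="/")
```

So a file named “Text/chapter one.xhtml” is referenced as “Text/chapter%20one.xhtml”.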

The beta process is taking substantially longer than I expected. This is mostly because I have less time to work on Sigil than I expected to have, but it’s also because of a huge number of suggestions and bug reports contributed by users since the process began. I can’t thank you all enough, Sigil is better for it.

Some of the feature requests have surprised me, but in a good way. The long awaited SGF chapter markers and splitting are now finally in, so I hope you all put it to good use (and report bugs if you find any).

Epub-wide search&replace is still MIA since I’ve yet to figure out a way to do it that doesn’t suck. I’m not satisfied with the approaches I’ve tried. But I’ll figure it out, don’t worry.

The next three weeks will see little development activity for Sigil since I’m heading into another round of university exams.

Wednesday, March 10, 2010

β2

The second beta of Sigil 0.2.0 has just been released. Here are the release notes:

  • fixed an issue with exported HTML/CSS/etc. files inside EPUBs having superfluous newlines
  • fixed an issue with the TOC editor adding empty "class" attributes to headings (issue #297)
  • added a new "Window" menu item with new "Next Tab", "Previous Tab" and "Close Tab" actions (issue #273)
  • fixed an issue with the font used in the line number area in the Code View being incorrect when the Code View is first opened; the problem affected mostly Mac machines (issue #290)
  • Sigil now handles corrupt epub files with an OPF referencing non-existent files (issue #289)
  • the Book Browser now doesn't scroll back to the top when an item is deleted or added (issue #263)
  • the Book Browser now allows a file's extension to change between HTM, HTML, XHTML and XML (issue #264)
  • OPF and NCX files don't rely anymore on UTF-8 XML default parsing, but specify their UTF-8 encoding directly in the declaration
  • fixed an issue with changes in the TOC editor not being reflected in the book (issue #277)
  • fixed an issue with the TOC editor not recognizing the "title" attribute on headings (issue #271)
  • fixed an issue with the user seeing the old, unclean source in the Code View (issue #286)
  • fixed an issue with the user being prompted to save when quitting even when no changes have been performed on the new/loaded file (issue #276)
  • fixed an issue with Book/Code View keyboard shortcuts firing in the wrong view (issue #266)
  • tentatively fixed an issue with Sigil locking up when chapter breaking (issue #267)
  • fixed an issue with Tidy adding a superfluous “lang” attribute that is also not allowed in XHTML 1.1
  • making sure that ID attributes used in the manifest section of the OPF are always valid
  • fixing export of epubs with XML files for OPS documents
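
To illustrate the OPF/NCX encoding item: an XML file with no declaration is parsed as UTF-8 by default, but stating the encoding in the declaration removes any guesswork. Below is a small Python sketch using the standard library; the bare `package` element is a stand-in, not a real OPF, and none of this is Sigil's own code:

```python
import xml.etree.ElementTree as ET

# A stand-in for an OPF root element; a real file has much more in it.
package = ET.Element("package", {"version": "2.0"})

# xml_declaration=True needs Python 3.8 or newer.
data = ET.tostring(package, encoding="utf-8", xml_declaration=True)

# The first line is the XML declaration, with the encoding spelled out.
print(data.decode("utf-8").splitlines()[0])
```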

Quite a few things made it to the second beta. As you can see, a great many bugs were fixed. Some of the things I wanted to include in this release didn’t make it in (like breaking chapters on SGF chapter breaks, issue #262). Sorry guys, but there were some really critical bugs in the previous beta (like the TOC editor not working… at all), and I wanted to push fixes for those ASAP.

My university duties and responsibilities are picking up again, so Sigil development will slow down a bit. Not too much I hope…

Tuesday, March 2, 2010

β1

Let’s get this out of the way: the first beta of Sigil 0.2.0 is now out. It’s what all of you want to know, right? There you go. There is also a release thread in the Sigil subforum on MobileRead, and all discussion on the beta should happen there.

You can only get it from the “Downloads” section, not from the main page. The 0.1.9 version is still the “official” one, and the new version notification will not be triggered. Bear in mind that this is the very first public release of the code I’ve been working on for the past 3+ months, so it’s going to be buggy. I’m hoping most of the major bugs will be reported quickly and the second beta will have them remedied.

So really try to bash it. Do whatever you normally do in Sigil and report any (unintentional) differences on the tracker. Or anything else you feel should be reported. Just don’t forget to state that you’re using the β1 release in your issue. This would also be a good opportunity to read the Reporting Issues wiki page if you haven’t already.

Why a beta and not a Release Candidate as I’ve previously announced? Well because not all the features that I want in the 0.2.0 release are done. The biggest omission is the cross-file search: in the current beta, you can only search (and replace) in the currently open tab, but not across all the files in the book. The whole search mechanism is about to be overhauled, since Qt 4.6 finally brought a native API for interacting with elements in a QWebPage.

There are numerous other minor annoyances that need to be dealt with before I proclaim a “real” 0.2.0. Things like being able to select multiple items in the Book Browser and delete them; currently you have to delete them one by one, which I find annoying.

Bottom line, it needs a bit of polish. I tried to stay away from that since you always try to make things work first, and then gradually improve them later.

So, betas first, and then after a few of those you’ll see at least one Release Candidate. When I’m satisfied with the way it looks and behaves, I’ll mark the next release after that as “official”.

With that settled, let’s move on.

SGF is now dead

The SGF format was created to provide Sigil with a native file format that could be changed and modified as needed. Initially, I wanted to make epub Sigil’s native format, but that didn’t seem like a good idea at the time. Back then, I wanted Sigil to eventually be able to save many different e-book formats, so I needed to make sure Sigil could store anything the epub format couldn’t but that was potentially in use by the other formats. So SGF was born. Whenever someone asked about it, the short answer was “SGF is to Sigil what PSD is to Photoshop”.

It was a product of thinking for the future… but the future has changed. Sigil is now mainly focused on producing epub books. With that, the major reason for SGF’s existence was gone.

SGF had a few advantages over epub in Sigil 0.1.x. For one thing, it stored the text you saw in the Code View raw, with no preprocessing on save. Saving as epub would split the One Huge Flow™ into different XHTML files according to your chapter breaks, and a great deal of other “book normalization” transformations were applied as well. So SGF was definitely the more “native” format.

In Sigil 0.2.0, the separate files are now kept separate, and any advantage the SGF format had is gone. With no advantage and no real future need for it, SGF format export has been removed: epub is now Sigil’s native file format.

Before you start screaming “but I have hundreds of SGF files!”, 0.2.0 can still open SGF. From there, you’re just a click away from saving it as epub. I’m not leaving you people out in the rain. :)

Show & Tell

I prepared this little screen cast of 0.2.0 in action. Things to watch out for:

  • The loading speed;
  • The “paragraph merging” speed;
  • The Table of Contents open/close speed;
  • The CSS tab;
  • The XPGT tab;
  • The image tab, which supports SVG.

And here is the video…

Friday, February 12, 2010

Loading performance and 0.2.0

I’m the first person to acknowledge that loading epub and HTML files in Sigil 0.1.x is slow. Very slow. Abysmally.

Why is that? Well there are two components in the 0.1.x loading process:

  1. Extracting the epub file, reading the OPF, running Tidy on the source, updating resource references etc. Let’s call this the “file load”.
  2. After the file load creates one large HTML flow, this is then sent to the Book View (integrated QtWebkit) for rendering. Let’s call this the “QtWebkit load”.

You can tell when the file load has finished and the QtWebkit load is starting: the moment you see “File Loaded” in the status bar is the moment that all Sigil code stops executing and now it’s up to QtWebkit to render the page.[1]

QtWebkit load

QtWebkit load takes longer, by far. Using the Three Men in a Boat epub file[2] from the MobileRead ebook uploads forum as a reference, the whole loading procedure takes 75 seconds in Sigil 0.1.9.[3] Way, way too long.

Of these 75 seconds, only 14.5 are spent in the file load. The rest is all QtWebkit, so not something I can directly influence. The QtWebkit load used to be more than twice as fast in Sigil 0.1.5, but the subsequent versions of Sigil include Qt 4.6 (instead of 4.5), and in that version QtWebkit is much slower. Nokia developers admit they introduced some major performance regressions and are currently working on fixing that.

I have no intention of waiting for that to happen. It was bad before, but now it’s horrible. I considered it far too slow in Qt 4.5, and now?… So how about I come up with a way to work around this problem?

Currently, Sigil takes in all of the XHTML files in an epub, puts them all together and displays them as one. So you get one large “flow” where you can do all the editing. I chose this model because it’s the one used in the popular[4] Book Designer (which doesn’t support epubs).

This is where Sigil 0.2.0 comes in.

Sigil no longer does that. All the original XHTML files are preserved and are edited one by one. Since there is no one huge flow, QtWebkit rendering performance goes up tremendously (since there is less to render).

Now, when Sigil loads your epub file, the first XHTML file by reading order is loaded in the initial tab. Since this first XHTML file is usually a cover page, it takes less than half a second to render. So now instead of 60 seconds for the QtWebkit load, you get 0.3 seconds (for the TMB file).

Great, ha? :)
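
For the curious, “first by reading order” means the first entry in the OPF spine, resolved to a file through the manifest. A rough Python sketch of that lookup follows; the sample OPF and the `first_in_reading_order` function are made up for illustration and are not taken from Sigil:

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# A tiny made-up OPF; real ones also carry metadata, an NCX entry etc.
SAMPLE_OPF = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ch1" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="cover" href="Text/cover.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="cover"/>
    <itemref idref="ch1"/>
  </spine>
</package>"""


def first_in_reading_order(opf_text: str) -> str:
    """Return the href of the first spine item (hypothetical helper)."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    # The manifest maps item IDs to file paths.
    hrefs = {item.get("id"): item.get("href")
             for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    # The spine lists item IDs in reading order; find() returns the first.
    first = root.find("opf:spine/opf:itemref", OPF_NS)
    return hrefs[first.get("idref")]


print(first_in_reading_order(SAMPLE_OPF))  # Text/cover.xhtml
```

Note that manifest order doesn't matter here: the cover is listed second in the manifest but comes first in the spine.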

File load

But then again, there are those 14.5 seconds for the file load. It would be great if I could get that down.

Most of that time is spent doing two things: running Tidy on the large concatenated HTML document, and updating the resource reference paths. The updating process takes much longer.

The resource updating process is necessary since Sigil 0.1.x renames your resource files. Since images, CSS files, HTML files, fonts etc. now have different names, all the HTML tags and style rules referencing them have to be updated. This takes a long time.

In Sigil 0.2.0, there are now multiple XHTML files, and they all have to be updated the same way. The original resource filenames are now preserved, but the file structure changes so we still need to update the paths. All the XHTML’s have different content, but the file path updates are universal. This means we can now parallelize this:

  1. We create a thread pool with as many threads as there are logical CPU’s[5] on the system;
  2. We split the updating process into “tasks”, where each task represents the required update operations to be performed on each XHTML file;
  3. We let the threads munch the tasks as they become ready to process them.
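
The three steps above can be sketched in a few lines of Python. This only shows the shape of the design, not Sigil's real implementation (which is C++/Qt); `PATH_UPDATES`, `update_paths` and `update_all` are hypothetical names, and the plain string replacement is a placeholder for the real updating logic:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical old-path -> new-path table for the book's resources.
PATH_UPDATES = {"images/boat.png": "../Images/boat.png"}


def update_paths(xhtml: str) -> str:
    """One task: rewrite the resource paths in a single XHTML file."""
    for old, new in PATH_UPDATES.items():
        xhtml = xhtml.replace(f'"{old}"', f'"{new}"')
    return xhtml


def update_all(files: dict) -> dict:
    """Run one task per XHTML file on a pool sized to the logical CPUs."""
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands tasks to idle threads and keeps results in input order.
        updated = pool.map(update_paths, files.values())
        return dict(zip(files.keys(), updated))


book = {
    "cover.xhtml": '<img src="images/boat.png"/>',
    "chapter1.xhtml": "<p>No images here.</p>",
}
print(update_all(book))
```

One caveat: in CPython the global interpreter lock keeps pure-Python string work from truly running in parallel across threads, so this sketch shows the task structure rather than the speedup a native thread pool would get.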

So if you have a dual core system like I do, two different threads execute two tasks at the same time. As they finish the task they have been working on, they arbitrarily pick a new one and work on that. So the more logical CPU’s you have, the more threads you can run, the more tasks your computer can work on at the same time, the faster the file loading will be.

I then plugged in the old updating subsystem into this multi-threaded architecture and ran it on my dual core. The file load on TMB dropped to 11 seconds. Not quite the ideal linear behavior, but that’s to be expected since there’s overhead in talking to the threads, managing the task pool etc. And the OS eats your cores too, so your threads can’t stay active all the time. Also, not everything in the file load can be parallelized; lots of things have to stay sequential.
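
One way to put a number on “not everything can be parallelized” is Amdahl's law. That framing is an addition here, not something the measurements were based on, and it ignores the thread-management overhead mentioned above. Plugging in the two measured times (14.5 s single-threaded, 11 s on two cores) gives the fraction of the file load that behaved as if it were parallel:

```python
def parallel_fraction(t_serial: float, t_parallel: float, workers: int) -> float:
    """Solve Amdahl's law, t_parallel = t_serial * ((1 - p) + p / workers), for p."""
    ratio = t_parallel / t_serial
    return (1 - ratio) / (1 - 1 / workers)


def predicted_time(t_serial: float, p: float, workers: int) -> float:
    """Load time the same model predicts for a given number of threads."""
    return t_serial * ((1 - p) + p / workers)


p = parallel_fraction(14.5, 11.0, workers=2)
print(f"parallel fraction: {p:.2f}")
print(f"projected 4-thread load: {predicted_time(14.5, p, 4):.2f} s")
```

Under this model a bit less than half of the file load was parallel work. The four-thread figure is a projection from the model, not a measurement.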

The other major problem is that TMB has a huge number of images, meaning that many HTML “<img>” elements have to be updated. With a more conservative epub file, the numbers would be even better.

But a 25% improvement on a measly dual core isn’t half bad. It would certainly be faster on a quad. But I can bring that down even further, I know I can.

So I spent about six hours in front of a code profiler and Visual Studio, tracing the bottlenecks and optimizing the “hot” paths. The major bottleneck was—as expected—the large and cumbersome resource updating subsystem. After rewriting it in what must have been ten different ways (each version slightly faster than the previous), I came up with the final design.

For the sake of reference, my profiler says that the old version takes on average 470 milliseconds to run through one XHTML file in TMB. After six hours messing with it, the final version takes 15 milliseconds. That’s 31 times faster.

File load for TMB? It’s now 3.3 seconds. Including the 0.3 for the rendering of the cover page, it’s 3.6.

So from 75 seconds in Sigil 0.1.9 down to 3.6 seconds in the development version of 0.2.0, I think I’ve done a pretty good job improving the loading speed.

For epubs with a “normal” number of images and computers with more logical CPU’s, it’s even faster.

epub name                  Time – 0.1.9 (s)[3]    Time – dev0.2.0 (s)[3]
Three Men in a Boat        75                     3.6
Sylvie and Bruno           82.5                   6
Savage Stories of Conan    90.2                   4.5
David Copperfield          98                     3.2

These are all x86 times. For x64, knock off 10%.
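
To turn those load times into speedup factors, divide the old time by the new one. A quick Python sketch using only the numbers from the table:

```python
# (epub name, seconds in 0.1.9, seconds in dev 0.2.0), copied from the table.
LOAD_TIMES = [
    ("Three Men in a Boat", 75, 3.6),
    ("Sylvie and Bruno", 82.5, 6),
    ("Savage Stories of Conan", 90.2, 4.5),
    ("David Copperfield", 98, 3.2),
]


def speedup(before: float, after: float) -> float:
    """How many times faster the new load is."""
    return before / after


for name, old, new in LOAD_TIMES:
    print(f"{name}: {speedup(old, new):.1f}x faster")
```

That works out to roughly 14x for the slowest case and a little over 30x for the fastest.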

Footnotes

[1]  “Render” means that the colors for the pixels on the screen have to be calculated, i.e. the screen has to be “painted”.

[2] Written by Jerome K. Jerome and painstakingly hand-crafted by MobileRead user zelda_pinwheel. It’s a great book, you should read it. It’s also an amazing epub file, I use it as my main reference during Sigil development.

[3] x86 Windows version of Sigil, on Windows 7 x64. Computer is a Core 2 Duo 6400 with 4GB RAM.

[4] But horrible.

[5] The number of logical CPU’s is the number of actual cores on your system plus any “virtual” cores from HyperThreading.

Sunday, February 7, 2010

63 bits plus one

Some of you may have noticed that while you can get precompiled binaries of Sigil for Linux in x86 and x64[1] flavors, you only have an x86 version for Windows. Why? Well, Microsoft did a very good job ensuring that 32 bit applications still run on 64 bit Windows. So there was no need for x64 Sigil. On Linux, it’s slightly different and nowhere near that easy.[2]

So what are the main benefits (from an application’s point of view) to the newer instruction set/architecture?

  1. The application has access to a larger address space (both physical and virtual),
  2. Registers are 64 bits, not 32; this allows better/faster 64 bit math,
  3. Double the number of general-purpose registers,
  4. Double the number of XMM registers,
  5. SSE instructions can be safely used knowing that all x64 CPU’s have to support them.

And several other things. The drawback is that your compiled code is now larger: pointers are all 64 bits, but the caches on the CPU’s are the same size. This is a big issue.
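
The pointer-size point is easy to check from a script. This small Python sketch asks the running interpreter how wide a C pointer is in its own build; it illustrates the size difference only and has nothing to do with Sigil's code:

```python
import struct

# "P" is the struct format code for a C void*; its size follows the build.
pointer_bytes = struct.calcsize("P")
print(f"{pointer_bytes * 8}-bit pointers ({pointer_bytes} bytes each)")

# A pointer-heavy structure of one million pointers needs this much for them:
print(f"{pointer_bytes * 1_000_000 / 1e6:.0f} MB of pointers")
```

On a 64 bit build each pointer takes twice as many bytes as on a 32 bit one, so the same data structure fills up the CPU cache sooner.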

In the end, whether you’ll see direct performance improvements from moving to x64 depends entirely on the application. Some will, but some won’t. The Visual Studio devs have chosen not to make the transition just yet. I know other developers who have stated that their apps also behave worse on x64.

So you have to profile it to know for sure.

I used to think Sigil wouldn’t benefit from x64, performance wise. But then there was that nagging feeling telling me I should test it and see. The main reason why I didn’t want to do this is because I’d have to set up an entirely new build system, with an entirely new Qt etc., etc. I already have four: Win x86, Lin x86, Lin x64 and Mac Universal. And building Qt (AGAIN) takes about five hours. You basically have to sit in front of the console, watching green text fly by because it likes to flake out in the middle and then you have to start it again. Not my ideal way to spend an afternoon.

But today I came down with a fever so I was too weak to do anything useful anyway. I may as well sit dozing in front of a screen. So I did.

Many hours later, after I got everything compiled and working, I set out to test Sigil 0.1.8 in x86 versus the same in x64. The test would measure the time it takes to load an epub book, from start to finish (this is easily the longest running operation in Sigil 0.1.8). I chose three epub files, did five runs for each on both versions, recorded the times and voila!

The x64 version was consistently 10% faster.
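
A test like this is simple to script. Below is a hedged Python sketch of the method: time several runs and compare the two builds. Averaging the runs is an assumption (the post only says the times were recorded), the workload is a stand-in for loading an epub, and `time_runs` and `percent_faster` are hypothetical names:

```python
import statistics
import time


def time_runs(operation, runs=5):
    """Return the wall-clock duration of each run, in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def percent_faster(baseline_s, candidate_s):
    """How much less time the candidate takes, as a percentage of the baseline."""
    return (baseline_s - candidate_s) / baseline_s * 100


# Stand-in workload; the real test loaded an epub in each Sigil build.
mean_s = statistics.mean(time_runs(lambda: sum(range(200_000))))
print(f"mean over 5 runs: {mean_s:.4f} s")

# Example with made-up times: 10 s on one build versus 9 s on the other.
print(f"{percent_faster(10.0, 9.0):.0f}% faster")
```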

So my assumption was wrong. In light of these results, the next public release of Sigil (which should be 0.2.0) will in all likelihood include an x64 version for Windows.

In other news, I’ll be getting the 2010 version of Visual Studio when it ships in two months, for several reasons. But it will also provide tangible performance improvements for Sigil, since MSVC10 C++ compiler optimizations have improved. That’s another 10%.

And then there’s Sigil 0.2.0 and multithreaded, multi-flow loading. Now if only Nokia could get QtWebKit in a respectable shape…

 

Footnotes

[1] Some say “64 bit”, some “x64” or “x86-64” or “AMD64” or “Intel 64” and they all argue which is the correct one… I don’t care. I’m calling it “x64”, since that’s what people around me seem to be using. You know what I mean: the 64 bit extension to the x86 instruction set that AMD came up with and then Intel licensed.

[2] Also, Linux users tend to yell a lot more when their needs and/or desires are not met. Believe me, you don’t want to know.

Saturday, January 30, 2010

Back in town

Finals… finally… over!

I have a couple of papers due in a few days, so I’m still working, but after that I’ll have about three weeks of vacation time. The first few days I’ll spend soaking my brain in warm water and sleeping, but after that, I plan to spend most of my time working on Sigil.

In other news, I’ve decided to shift Sigil’s long-term development focus.

Way back when I started thinking about creating an eBook editor (more than a year ago), I primarily wanted to make an epub editor. The format badly needed one, since no such editor existed at the time. But I also wanted to replace Book Designer for most of the formats it exports. If you’re not familiar with BD, believe me, you don’t want to be. Let’s just leave it at “it’s horrible”. BD development is also completely dead (and has been for many years), and the application isn’t open source so no one can pick up where the original developers left off. The beauty of closed-source apps…

So the plan was this: make an eBook editor with an emphasis on epub, and then slowly add export support for all the other formats. A general-purpose eBook editor.

Back then I recognized several major formats I wanted to support, those that had more than a niche market: LRF (AKA BBeB), MOBI and LIT[1]. But the scene has changed a lot since then…

LRF

First off, LRF is dead. It was proprietary to Sony’s reading devices, and Sony just switched their store over to epub. All of their Readers that support LRF support epub, too[2]. LRF’s are not being sold anymore. Some people are still clinging to the format for personal use though; mostly because all epub books are displayed left-justified on PRS-505’s. Newer Sony Readers don’t have this limitation, but there it is. It still doesn’t change the fact that the format is dead. I’m sorry, that’s just the way it is.

I can’t say I’m sad it’s gone. There’s basically zero information on the format itself: what the OSS community knows about it, it knows mostly through reverse engineering. That’s not a happy place to be.

Anyway, it’s not on Sigil’s roadmap anymore. Even Kovid has stopped fixing bugs in Calibre’s LRF output plug-in.

LIT

This is the format for Microsoft Reader. It’s pretty much dead too. Even the website I linked got this shiny new look just a couple of months ago; until then it looked like something out of the ’90s. I guess Microsoft thinks that eBooks are cool now and wants back in. They are, but the gravy train has left. LIT is only popular on PocketPC and Windows Mobile, and now epub is chewing that away. LIT is basically OEBPS 1.0, a somewhat direct precursor to epub.

MS abandoned LIT a long time ago, and now it’s too late to play catch up.

MOBI

I’ve never actually used MOBI. I hear it’s very popular with cell phones and similar devices. But ever since Amazon bought Mobipocket SA, the format seems to be on the way out. Amazon uses a custom version (AZW) for the Kindles, and you can’t read Kindle books on anything but a Kindle or an Amazon-sanctioned reading application. That’s not much of an open format. They also seem to be going with something called “Topaz”[3].

The main problem for MOBI is that Mobipocket SA doesn’t allow device makers to support both MOBI DRM and some other eBook DRM on their electronic readers. And the manufacturers are seeing the writing on the wall and switching to Adobe’s Reader Mobile SDK which supports epub and PDF DRM. More money in that.

So MOBI is not in great shape, long-term. But it’s still popular enough that I’m keeping it in mind.

Conclusion

Seeing as how the other three major formats are either dead or dying—mostly thanks to epub—and the new player in town is solidifying his position quite nicely, it may be prudent to focus just on that. Sigil will from now on focus on bringing the best epub editing experience. It also means I won’t be spreading myself too thin.

Do understand that I’m only talking about MOBI, LIT and LRF exporting. Sigil will certainly one day be able to import files of those types (and others). That hasn’t changed one bit. Also, HTML and RTF export are not in jeopardy: those are very general-purpose and quite useful, so exporters for those will be written.

The current importers and exporters will be rewritten to use a plug-in architecture, so anyone who still wishes to develop export plug-ins for the above three formats will be able to do so with ease[4]. All I’m saying is that I’m personally not going to develop them. Not any time soon, at least. They’re off the roadmap.

Sidenote

Google Analytics reports a 500% traffic spike on Sigil’s project page starting last Wednesday and peaking on Thursday. Seems a lot of people suddenly felt the urge to search for “epub editor”… I wonder what could have caused that…

I personally like the device and seeing as how the iBookstore will be selling epub books, it can only mean good things for the format. I plan on getting one of these when they become available. Hopefully this will also push Adobe to improve Adobe Digital Editions, which has more than a few rendering quirks. But honestly the biggest complaint I have against it is its utterly abysmal Unicode coverage with the default fonts. I’m pretty sure the iPad won’t have these problems: the videos clearly show user-selectable fonts, and the ones we can see preloaded[5] are known to have good coverage.

Hurrah for competition.

Footnotes

[1] I’m not counting PDF since that’s not an eBook format, or RTF and HTML which are more general-purpose and have to be supported.

[2] The PRS-500 users can send their Readers in for a free firmware upgrade for epub support; the PRS-505 already has this upgrade and the others read it natively.

[3] Which I hear is actually Amazon’s proprietary implementation of epub. Go figure.

[4] And drop them in Sigil without violating the GPL if they choose not to provide the source code. I’m trying to lend a hand to the publishers that currently use Sigil.

[5] Like Times New Roman.