Monday, December 12, 2011

0.4.902 (0.5 beta) Available

The first beta for 0.5 (0.4.902) is now available.

There are a few new features I'm most interested in getting feedback on: inline spell check, translations, and the new PCRE engine. Of course, crashes and major issues will be looked into and hopefully fixed before the final release.

Tuesday, November 8, 2011

Sigil and Data Loss Bugs

The majority of the data loss issues have been mitigated at this point. With a work flow of open, save as after major changes, and save after minor ones, catastrophic data loss can be worked around to the point that Sigil can be, and is being, used on a day to day basis.

That said, there are issues with data loss in Sigil and they are a priority. I'm currently finishing up the 0.5 release (I do not have a set release date at this point), which is mainly a feature release and only addresses some of the data loss issues. For example, you can still have everything in an entire XHTML document removed by putting a malformed XML header in the document.

The issue has three components that require major work to fix. I hope to have it all completed for the 0.6 release but it's going to be some time before it's ready.

The issues are:

1) Sigil currently uses Tidy to clean all XHTML to ensure it conforms (as much as it can) to the XHTML spec. I have seen Tidy remove tags it thinks are empty when they influence how the document is rendered. I want to keep Tidy as part of Sigil, but I believe it should only be run when the user asks for it, and the user should be able to revert any changes it makes.

2) An intermediate data store is used that requires valid XML. This store shuffles data between the book and code view. Because this store requires valid XML (valid XHTML conforms), there is the potential for data loss if it has to auto correct the XHTML. If you are in code view, have malformed structural issues with the XHTML, and move out of it, there is a warning dialog. This only appears when you are working on one file at a time. If you are replacing across multiple files, auto correction is used and this can lead to data loss. This data store needs to be replaced with one that does not require valid XML.

3) Putting malformed content into the book view will cause the book view to try to correct it. Again, auto correction can lead to data loss. This is mitigated by the malformed error dialog, but many users just disable it and find that sections of their document are missing after looking at it in book view. Also, the book view is a WYSIWYG tool, so it does make structural changes to the document and these may or may not be what the user expects. As with Tidy, changes made by the book view need to be able to be reverted. I am thinking about ways to make it more obvious that the book view makes changes to the document. This way the user is aware that they need to use undo (which doesn't currently work for book view changes) to revert the changes if they don't like them. I'm thinking about using a preview mode by default that doesn't make any changes and an edit mode to make this distinction obvious.

The above issues can be fixed but they are not quick or easy changes. I plan on making them for the 0.6 release as part of the changes necessary to support EPUB 3. However, there is the possibility that they will slip to 0.7 due to how large they are. Unfortunately, all I can say right now is I'm aware of the issue, I know what the cause is, and I have an idea of how to correct it but it's not going to happen tomorrow.

Saturday, October 8, 2011

Sigil Now Supports Translations

One of the new features that has been implemented for 0.5 (release date yet to be determined) is support for translations. For Sigil's first supported language, Grzegorz Wolszczak has provided a Polish translation. Currently translations are loaded based upon the current system locale. There is no support for choosing the language via preferences. This may come at a later time, but for now I believe that using the system locale will handle the majority of user needs.
I've put together a wiki page with instructions for creating translations. This first revision is a bit basic but as people have questions I plan to update it to make it more robust.
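
As a rough illustration of locale-based loading, here is a minimal Python sketch of how a translation file could be picked from the system locale name. It is not Sigil's actual code: the function name `pick_translation` and the `sigil_<locale>.qm` file naming are assumptions made for the example.

```python
def pick_translation(locale_name, available_files):
    """Return the best matching translation file, or None for the built-in text.

    locale_name is something like "pl_PL"; available_files is a set of file
    names. The "sigil_<locale>.qm" naming is hypothetical.
    """
    # Try the full locale first ("pl_PL"), then just the language ("pl").
    candidates = [locale_name, locale_name.split("_")[0]]
    for candidate in candidates:
        file_name = f"sigil_{candidate}.qm"
        if file_name in available_files:
            return file_name
    return None


available = {"sigil_pl.qm"}
polish = pick_translation("pl_PL", available)    # falls back from "pl_PL" to "pl"
german = pick_translation("de_DE", available)    # no match, so None
print(polish, german)
```

With only a Polish file available, a Polish system locale gets the translation and every other locale falls through to the untranslated interface.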

Saturday, October 1, 2011

Sigil Keyboard Shortcuts

Thanks to Grzegorz Wolszczak, Sigil now allows users to change keyboard shortcuts for many actions (this will be part of the 0.5 release). Grzegorz has been helping out a lot: he helped to introduce a preferences dialog and provided user configurable keyboard shortcuts.

Sunday, September 4, 2011

Sigil's Future Direction (Post 0.4.x)

Introduction

With 0.4 my focus has been on getting the existing features in a stable state. I foresee 0.4 being around for quite some time as development shifts to accommodate new features. I wanted to be sure a relatively bug free version is available for people to use. If data loss is a constant then there wouldn't be any point in using Sigil. Now that 0.4 is done it's time to start working on what's next.

Just what is next? For the time being I've marked a number of issues on the issue tracker as Milestone-0.5. My plan is to have 0.5 just implement the most commonly requested and most interesting features. 0.5 has no vision and is just a stop gap while I familiarize myself with Sigil's code base. 0.5 is my short term plan. It's not grand but it's functional and sufficient.

Recently I posted the conclusion of my Sigil user study. The findings are that Sigil is most used by, and most useful to, power users and small professional ebook creating houses. Also, the overlap between the two is significant. Thus I want to target these two groups and make Sigil even more useful for them. Keep this in mind because these two groups are who is going to shape my views of where I want to take Sigil.

Please realize that not everything I'm going to talk about is set in stone. A lot of it probably will never happen. Also, this is part plans, part what I want to do, and part rant about what Sigil does that I don't like. This is what my ideal Sigil would look like and it is what I'm going to work toward. However, nothing is set in stone.

Plugins

If you've ever used calibre or Firefox you will know that plugins are amazing. They allow for easy and quick changes and additions to be made without having to change the main application. Both calibre and Firefox have large third party plugin communities. I would like to bring this to Sigil and I want a framework where all book manipulation is available over a plugin interface.

My feelings with Sigil are plugins should make small self contained changes. Similar to calibre's heuristic processing. For instance, italicize common cases, up / down shift headings, and normalize CSS. To make plugins really useful I want to have a system where multiple plugins can be chained together and run in sequence. This would be super basic internal script functionality.

For plugins themselves I'm undecided about how they should be implemented. I don't mean API wise because that isn't even a thought at this point. I'm talking about what languages they should be able to be written in. C++ as a shared library will of course be supported because Sigil is written in C++. However, I want Sigil to be able to load plugins written in scripting languages.

My first thought is Python because I'm very familiar with it and love to work with it. I'm also thinking about Lua and QtScript (JavaScript without DOM). I don't want to support frameworks for every one of these languages due to the amount of maintenance required, so I want to support only one scripting language. Python is big and slow. Lua is small but doesn't have the advanced text manipulation libraries Python offers. QtScript is JavaScript, which is an abomination of a language. Added size of Sigil's install, execution speed, ease of supporting, knowledge by contributors and text manipulation support are all major considerations.
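
To make the chaining idea concrete, here is a minimal sketch in Python. It is used purely for illustration, since the plugin language is undecided and no plugin API exists. The sketch assumes a plugin is simply a function that takes XHTML text and returns new text; the names `shift_headings_down`, `collapse_blank_lines` and `run_chain` are hypothetical.

```python
import re
from typing import Callable, Iterable

# Assumption for this sketch: a plugin is a function from XHTML text to XHTML text.
Plugin = Callable[[str], str]


def shift_headings_down(xhtml: str) -> str:
    """Turn h1 into h2, h2 into h3, and so on (h6 stays h6)."""
    def bump(match):
        slash, level = match.group(1), int(match.group(2))
        return f"<{slash}h{min(level + 1, 6)}"
    return re.sub(r"<(/?)h([1-6])\b", bump, xhtml)


def collapse_blank_lines(xhtml: str) -> str:
    """Replace runs of blank lines with a single newline."""
    return re.sub(r"\n\s*\n+", "\n", xhtml)


def run_chain(xhtml: str, plugins: Iterable[Plugin]) -> str:
    """Feed the output of each plugin into the next one, in order."""
    for plugin in plugins:
        xhtml = plugin(xhtml)
    return xhtml


source = "<h1>Title</h1>\n\n\n<p>Text</p>"
result = run_chain(source, [shift_headings_down, collapse_blank_lines])
print(result)
```

Each plugin stays small and self contained, and the chain is nothing more than an ordered list, which is about as basic as internal scripting can get.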

Editor

Currently Sigil does not respect the structure of existing files. When you open an EPUB in Sigil it restructures the file layout. It even goes as far as to rewrite each XHTML file by running it through Tidy. With 0.4.0 cleaning with Tidy can be disabled but pretty printing is still used and alters the XHTML. I absolutely hate this! If I want my XHTML or file structure changed I'll do it myself.

I want to change Sigil to not be as automatic. Restructuring and cleaning of the XHTML should be moved to plugins and run when the user requests it. This way a user can open Sigil, change the metadata, save, and the only thing that changes is the OPF with the metadata changes. Not every single piece of the EPUB.

I also hate WYSIWYG editing because it inherently must make drastic changes to the underlying code. I don't think it's a good idea to remove it though. I would prefer to have the book view default to a preview mode that is read only. There wouldn't be any changes made to the code by using book view. Read only is the default, but the user should be able to have an edit toggle that will set the book view to edit mode, which will work like it already does. This way a user can make changes that may not be valid or work, check them, and see there is an error (say a missing tag) without losing any work. They can see the issue, fix it, and still be able to use WYSIWYG editing when they want.

Data Store

Right now XML (XHTML included) data is stored as a Xerces DOMDocument. This is then loaded into the book or code view depending on which one is focused. The use of a DOMDocument often leads to data loss. Putting malformed XML into a DOMDocument can have unintended consequences, especially when that is then loaded into a QWebView and read back as a string.

I want to replace the DOMDocument with a plain string as the data store. This will prevent a lot of data loss, especially combined with the book view defaulting to read only. Further, this combined with not making automatic changes to the code will make the well-formed error warning unnecessary.

Not auto processing with Tidy and checking for errors automatically will allow Sigil to produce invalid EPUBs. I really don't care that this can happen. The tools (FlightCrew) will still be there to check that the file conforms to the spec. It's up to the author to ensure they're publishing valid EPUBs. An EPUB that is being actively edited doesn't have to be valid at all times. I'd rather put the onus on the person using Sigil to ensure their EPUB is correct before publishing versus having Sigil force validity at every moment.
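
The idea of a plain string store with checking only on request can be sketched in a few lines of Python. This is an illustration under assumptions, not how Sigil works: it uses Python's standard `xml.dom.minidom` parser rather than Xerces, and `check_well_formed` is a hypothetical helper. The point is that the check reports a problem and never rewrites the stored text.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def check_well_formed(xhtml):
    """Return None if the text parses as XML, otherwise the parser's message.

    The text is never modified; the caller decides what to do about errors.
    """
    try:
        xml.dom.minidom.parseString(xhtml)
    except ExpatError as error:
        return str(error)
    return None


# The "data store" is nothing more than the string the user typed.
stored = "<html><body><p>Unclosed paragraph</body></html>"

problem = check_well_formed(stored)
print(problem)   # a parser message about the mismatched tag
print(stored)    # still exactly what the user typed
```

A real XHTML check would also have to deal with things like named entities, which this bare XML parse does not know about.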

Undo

Undo is terrible right now. Some actions cannot be undone, some can. The book view's undo is completely separate from the code view. You can't undo a replacement when doing it across all HTML files on files that aren't open in a tab. I want to see a unified single undo that allows for stepping back out of any change.
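
One way to picture a unified undo is a single history for the whole book, where each step records the old text of every file it touched, whether or not that file is open in a tab. The Python sketch below is only an illustration of that idea; the `UndoStack` class and its methods are hypothetical and say nothing about how Sigil will implement it.

```python
class UndoStack:
    """One history for the whole book, whichever view or file a change came from."""

    def __init__(self, files):
        self.files = dict(files)   # file name -> current text
        self._history = []         # list of (description, {file name: old text})

    def apply(self, description, changes):
        """Apply {file name: new text} as a single undoable step."""
        previous = {name: self.files[name] for name in changes}
        self._history.append((description, previous))
        self.files.update(changes)

    def undo(self):
        """Revert the most recent step; return its description, or None if empty."""
        if not self._history:
            return None
        description, previous = self._history.pop()
        self.files.update(previous)
        return description


book = UndoStack({"ch1.xhtml": "<p>colour</p>", "ch2.xhtml": "<p>colour</p>"})

# A replace across every file is recorded as one step.
replaced = {name: text.replace("colour", "color") for name, text in book.files.items()}
book.apply("replace in all files", replaced)
book.apply("code view edit", {"ch1.xhtml": "<p>color!</p>"})

first_undone = book.undo()    # reverts the code view edit
second_undone = book.undo()   # reverts the replace in both files at once
print(first_undone, second_undone, book.files)
```

Storing whole-file snapshots is the simplest possible approach; it trades memory for not having to track individual edits.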

Further along this line I would like some graphical display where you can look at the changes that have been made to make it easy to find exactly how far back to undo. Something like Apple's Time Machine but for the state of the book.

Conclusion

Here is where I want to take Sigil: less hand holding, fewer automatic changes and more advanced text manipulation through a plugin interface. The big question is, should I skip putting out a 0.5.0 release with just the Milestone-0.5.0 marked changes and get started on the above now?

Saturday, September 3, 2011

Sigil and Linux Distribution Packages

The official Linux packages for Sigil are generic packages. They're bundled in an InstallJammer installer and contain a number of libraries that Sigil depends on. This is not ideal but it's not possible to provide Linux packages for every distro.

I've created a wiki page where I'm putting together a list of Linux distributions that have their own Sigil packages. These are the best packages for users to install because they're smaller and tailored.

If your distro isn't listed and it has Sigil packages let me know and I'll add it to the list. If your distro doesn't package Sigil let them know you would like to see them package it. I'm always willing to lend a hand to get Sigil in more Linux distros.

Friday, September 2, 2011

Sigil 0.4.2 Released

Sigil 0.4.2 is complete and available. This is mainly a maintenance release and fixes a number of bugs, specifically a few bugs related to data loss. There is one major user visible change: the well-formed error dialog can be toggled not to show. This will cause errors to be auto fixed. Use this with care because the auto fix Sigil makes might not be what you want. As always, see the changelog for a complete list of changes.

Wednesday, August 31, 2011

Sigil User Study

Introduction

Since taking over as the maintainer of Sigil I have spent some time reaching out to specific people in the ebook community to ask them about Sigil. Specifically: Do they use Sigil? Why or why not? What do they see as Sigil's shortcomings? How do they use Sigil in their work flow? Why doesn't Sigil work in their work flow? Basically, their thoughts and opinions on Sigil.

I asked specific people privately because I didn't want to be inundated with responses. The people can be broken down into three different groups: self publishers, power users, and professionals. After talking to professionals I've come to realize that they can be broken down into small and large. The size relating to the size of the company and production volume. I spoke with about 8 people total and I tried to keep it even between the various groups.

I wanted to find out who is using Sigil, who isn't using Sigil and why so I can determine where I want to take Sigil in the future. The only ebook editing I do is cleaning up a few books here and there. Learning how people use Sigil will help me to determine the best direction to take the project.

Self Publishers

Self publishers are authors. These are people who write their book and then want to sell it as an ebook themselves. Typically these people are using Word for writing. They export their work as HTML, then import it into an ebook editor for final adjustments and saving as an ebook file. The two biggest things self publishers are looking for are easy, high quality .doc or .docx import and one click send to store functionality.

Self publishers are also interested in WYSIWYG editing and don't want to know about the internals of ebooks. They are primarily writers who see ebooks as one of many distribution methods. They don't care about the intricacies of EPUB, for instance; they just want their work to look good and be readable by their audience.

The typical tools I hear being used by self publishers are calibre for format shifting, and Atlantis Word Processor and Jutoh for formatting and base ebook creation. Atlantis and Jutoh both provide very easy to use WYSIWYG interaction and you can use these without ever seeing a line of code.

Power Users

These are people who prepare works in their spare time as a hobby. They are not motivated by money and do not sell the works they publish. Typically the works power users deal with are public domain such as Shakespeare. This group also encompasses people who do not distribute works covered by copyright but spend their time cleaning and reformatting their favorite books strictly for their own enjoyment and personal use.

Power users are comfortable using both WYSIWYG and code editors. The biggest feature requested and talked about by power users is robust regular expression support for search and replace. Many of the books power users work with have terrible and often nonexistent formatting. These works typically started life as either a scanned copy of a print book or a PDF file, both of which typically leave broken paragraphs and misspellings throughout the document. This leads to spell check being the next most common request from this group. They are trying to take a jumble of half sentences and put them back together into a visually appealing layout.

The tools used by power users are Sigil, calibre, Word or Open Office macros, and many custom scripts. An advanced text editor like BBEdit or Notepad++ is also a must have tool.

Professionals

Professionals format ebooks for one purpose, money. This is what they do for a living. An author comes to them and pays to have the company turn their work into an ebook. For a modest fee an author can have a beautiful ebook produced without any headaches or hassle. Many authors prefer paying someone to do this portion of publishing for them just like they will pay an editor to edit, a print house to print, cover artist to design a cover and so forth. Authors write and typically want to concentrate solely on writing. Many self publishers format their own ebooks out of necessity because of the cost of hiring a professional.

With both small and large professionals I'm specifically talking about ebook publishing and digitization services. I'm not talking about huge publishers like Macmillan that do everything. However, talking to the larger publishers makes me believe their process is the same as the huge publishers'. The big difference between small and large professionals is the tools they use.

Small

Small professionals tend to use either Sigil or Adobe's InDesign for a good portion of their work. Both fill a very similar role in ebook creation. The big draw of InDesign over Sigil is that InDesign supports print book layout creation. It's an all in one tool. This type of professional tends to use off the shelf tools that are readily available. Sigil and InDesign are not the only tools they use, but one or the other tends to be a heavily used tool in their tool box.

Large

Large professionals tend to use custom tools. They staff people whose sole job is to develop and maintain ebook creation and formatting tools. They can afford to have custom tools that integrate directly into their process. They don't use off the shelf or vanilla tools. This group is all about custom everything. This allows them to quickly adapt to changes.

Professional Tools

Sigil or InDesign and custom tools are all I know. Many professionals are vague about their process and tools. Some even declined to talk to me at all. They use tools in some way that works for them but their methods and implementation are proprietary.

What Does This Mean For Sigil?

Out of all of these groups I have little desire to target self publishers. There are existing tools that do a great job of meeting this group's needs. Sigil has a WYSIWYG editor and it can certainly be improved, but I don't want to tie Sigil to a particular store or stores like Amazon or B&N. Also, I want to keep Sigil as an EPUB editor and not a generic ebook editor. I believe that Sigil's strength lies in being able to manipulate the internals of the EPUB format itself. I want to target this aspect more.

Power users are the major group I want to target. Out of all of the people I spoke with power users use Sigil the most and get the most out of it. Advanced editing of an EPUB's structure and code is where I want to take Sigil. That along with advanced text manipulation. Think expansion of calibre's heuristic processing.

Small professionals are major users of Sigil and I do not want to discount them. I believe that their use of Sigil overlaps with power users enough that targeting power users will also target small professionals. I do not want to alienate small professionals and will continue to take their needs seriously. From what I've learned about small professionals, tools that make code manipulation easier will be a benefit and hopefully reduce their need for other formatting tools.

The last group, large professionals, do not use Sigil. I don't believe that changing Sigil to accommodate this group will get them to use Sigil. They use their own custom tools and Sigil doesn't fit into their work flow and I don't see it ever doing so. Thus I don't see it being worthwhile to work toward making Sigil "the tool" for this group.

Friday, August 26, 2011

Sigil 0.4.1 Released

Sigil 0.4.1 is complete and available. This is mainly a maintenance release and fixes a number of bugs. There are a few new features mostly around the code view. As always see the changelog for a complete list of changes.

Saturday, August 20, 2011

SCM Move to Git Completed Successfully

Last night I moved Sigil's SCM from Mercurial (hg) to Git (git). The change was completed successfully and without any issues. 0.4.0 had locations for things like the user manual pointing to the git locations. These are now live and working again. These links are now broken in any release before 0.4.0. This is unfortunate but unavoidable.

Monday, August 15, 2011

Sigil 0.4.0 and FlightCrew 0.7.2 Released

The long awaited Sigil 0.4.0 release is now out. Along with Sigil is a new release of FlightCrew. FlightCrew version 0.7.2 is the latest version bundled with Sigil.

If you're an OS X user then one thing you need to be aware of is that these two releases bump the minimum OS X version to 10.6 (Snow Leopard). The Linux builds (x86 and x64) are built on Ubuntu 11.04.

A lot of work went into 0.4.0. New features and bug fixes galore. See the ChangeLog for full details. I have to thank Strahinja Marković (the original creator) for leaving 0.4.0 in a nearly finished state. Also, Charles King for being a bug fixing monster and helping make this release great.

One major change that is going to take place next week is that I am switching the source code management (SCM) system from Mercurial (hg) to Git. This will break the update checker in previous versions, hence waiting a week before making the change. I have already converted FlightCrew (needed to test), so FlightCrew's updater in previous versions won't inform you of an update. This is unfortunate but unavoidable.

Due to the change to Git the links for the manual in Sigil are not going to work for a week. This is because I've updated the locations in Sigil to point to the new locations that won't be available until the switch to Git.

Saturday, August 6, 2011

Donation change

I've updated the donation links; they now point to my PayPal account. Going forward donations will be directed toward me instead of Strahinja. This is something we had talked about and planned. I just want to warn people who have donated before because you will see a different email address than last time.

Tuesday, August 2, 2011

Thinking of Changing Sigil's SCM

Right now Sigil is using the Mercurial SCM (source control management). I'm thinking of switching to Git. Since taking over Sigil I have had one person contributing and he is okay with the change.

I'm not looking to change simply for the sake of changing. Before Sigil I had never used Mercurial. I am not well versed with it and I have spent 50% of my time fighting with Mercurial. Charles (the person contributing bug fixes) has had the same experience.

My personal preference and favorite SCM is Bazaar. If I could I would switch to it in an instant. However, Google Code does not support Bazaar. Google Code is a good platform and I like it a lot. I think that Google Code does everything except for SCM right. If it supported Bazaar it would be perfect. That said I have no plans to move away from Google Code to Launchpad.

Since I can't use Bazaar I'm left with Git. Git works well enough and I'm more familiar with it than Mercurial. One reason I'm thinking of switching is that Git is very popular. Using an SCM someone isn't familiar with will prevent them from submitting patches. I'm hoping that moving to Git will make Charles's and my lives easier. I'm also hoping it will encourage more people to hack on Sigil.

Windows Acquired

Thanks to a Sigil user (Bryan) I now have in hand a copy of Windows 7 Ultimate. Thanks Bryan for sending me a copy of Windows! The plan is to get RC 2 builds out this weekend.

Wednesday, July 27, 2011

All 0.4 blocker bugs squashed!

Tonight I was able to fix bug 813 (see 837 for details). This is the last blocker bug for the 0.4 release. I merged a few patches that were sitting around. I've also fixed an issue with changes to metadata not being relayed to the GUI as the file having been changed. With these bugs fixed 0.4 is in a state that I feel is ready for release.

I have little experience with building releasable binary packages for OS X and Windows, so it will be a few days (maybe weeks) before they're out. There is a strong possibility I need to buy a copy of Windows and that will take a few days to get. Also, I need to actually set up a Windows build environment. I'm going to try building using an old netbook I have (it's the only computer I can use that has a copy of Windows for building packages).

I don't want to do this release piece by piece so I'm going to wait to release all packages (OS X and source) together. Also, I plan to put out an RC 2 before 0.4 final. The RC 2 will be exactly the same as 0.4 final as far as the code is concerned. RC 2 is to make sure I package Sigil properly. It won't do any good to put out 0.4 that can only be run if you have all of the development libraries Sigil uses installed independently of Sigil.

Sunday, July 24, 2011

Taking Over Sigil

The other day it was announced that I am now the maintainer of Sigil. Back in June Strahinja announced that he was looking for someone to take over the project. I highly respect the work Strahinja has done with Sigil and this is a project I don't want to see die. I myself use Sigil on occasion and when dealing with e-books I often find myself recommending its use.

After seeing the announcement by Strahinja that it was time to move on I contacted him about taking over in his stead. He agreed and has now given me control over the project. From this point on I will be handling releases, bug wrangling and everything else that goes into managing an open source project. I don't plan to remove Strahinja's access to the code repository. If he asks I will but Sigil started as his baby and if he wants to start working on it again I fully trust him and I would have no problem with this.

The good news is Sigil is not going to die or stagnate. I am fully committed to continuing the project and bettering the application. That said, things are different now than when Strahinja was here. Strahinja was pretty much a one man show when it came to fixing bugs and implementing new features in Sigil. He had a lot of time he was able to spend working on it. I unfortunately do not have nearly that amount of free time to work on Sigil. I have a day job unrelated to programming, publishing, or books in general. I can only work on Sigil in my spare time (mainly after work and weekends). I also work on other projects, which will be sharing my free time with Sigil development. I cannot match the pace of development Sigil users have come to expect, so unless others step up to help me with coding, development will slow considerably.

My plans for Sigil are as follows. Short term I want to get 0.4 released. Currently it's sitting at RC 1 and there are only a few small bugs I would like to fix for it. I have been spending the past few days becoming familiar with Sigil's build system. Once 0.4 is out I will need to spend some time getting to know the ins and outs of the code itself. From there I will move onto working toward the 0.5 release. Right now I'm going to commit to a hyperlink editor and spell check support for 0.5. I will need to look over the existing bugs and see what else would be a good fit for 0.5. At some point when the EPUB 3 specification is finished I will work on bringing Sigil up to speed with it. That will take place in whatever release number Sigil is at at that time.

I'm not going to give any release target dates for either 0.4 or 0.5. Each Sigil release will take the form of when it's ready it will ship. One big difference between me and Strahinja is undoubtedly how we handle version numbers. I only use this system: major.minor.revision. With major numbers, 0 is for feature incomplete and an unstable API. Going from 0 to 1 simply means that I feel the application is mature, stable and has a set API. Going from 1 to any other number means it's a massive change in some way (features, functionality, UI, API...). Minor numbers are for new features. Revision changes mean there are no new features, only bug fixes. Up until 1.0 is released the majority of releases you will see will be minor release numbers. Such as 0.4 followed by 0.5 instead of 0.3.0 followed by 0.3.1 and so forth.
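
The major.minor.revision scheme described above can be expressed as a small Python sketch. The helper names `parse_version` and `describe_change` are made up for this example, and treating "0.4" as "0.4.0" is an assumption of the sketch.

```python
def parse_version(text):
    """Turn '0.4' or '0.4.1' into a (major, minor, revision) tuple."""
    parts = [int(piece) for piece in text.split(".")]
    parts += [0] * (3 - len(parts))   # assumption: '0.4' means '0.4.0'
    return tuple(parts[:3])


def describe_change(old, new):
    """Say what kind of release `new` is relative to `old` under this scheme."""
    old_version, new_version = parse_version(old), parse_version(new)
    if new_version <= old_version:
        return "not newer"
    if new_version[0] != old_version[0]:
        return "major: massive change"
    if new_version[1] != old_version[1]:
        return "minor: new features"
    return "revision: bug fixes only"


print(describe_change("0.4", "0.4.1"))   # revision: bug fixes only
print(describe_change("0.4.2", "0.5"))   # minor: new features
print(describe_change("0.9", "1.0"))     # major: massive change
```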

Aside from Sigil development there is also the Sigil development blog. I plan to use it to communicate Sigil announcements. I do want to point out that I do keep a personal blog which I also use. I will be posting all Sigil announcements on my blog as well as on the Sigil blog. However, I plan to keep posts on the Sigil blog to only Sigil content. If you want to keep up with everything I'm working on including Sigil then read my blog. If you want to keep up with only Sigil then read the Sigil blog.

As always feel free to contact me with any questions or concerns. I am always available to help if I can. MobileRead, email or blog comments are all ways to get in touch with me. However, I do ask that support type questions be directed to the appropriate section of MobileRead as I'm not an expert on all aspects of e-books. There are a lot of smart people there who can help too and often you'll get an answer faster than asking me directly.

Saturday, July 23, 2011

All good things…

As of today, the official maintainer for both Sigil and FlightCrew is John Schember (“user_none” on MobileRead). He’s a very bright and competent guy. He’s got what it takes, trust me. As a community, please give him the same consideration and respect you gave me. It will take him many weeks (if not a few months) to get fully up to speed with the codebase, so patience and a warm welcome from the community will make sure things go smoothly.

To those that are still on the fence on whether to contribute to Sigil’s development or not, get off that fence and help out John. :)

Thursday, June 16, 2011

New maintainer needed

Ok, so this is a tough post to write. A lot of you are going to take it the wrong way, I know that. It’s human nature. But try to resist that urge.

Here goes: I plan to transfer project ownership and development to a new maintainer by mid-September at the latest.

Don’t panic!

Let’s take this step by step. First, the reasons.

The mighty “Reasons”

As some of you may know already, I started Sigil while I was a CS undergrad. I made it both my Bachelor’s and my Master’s thesis so that I could justify the time investment and continue working on it.

All of that is now over. I graduated. I got a job. Life got complicated in numerous other ways. That’s three reasons right there. But there are others. For instance, I’ve been looking at this codebase for over two years now, and frankly I’ve grown weary of it. I want to work on other things in my spare time, contribute to other open-source projects. Running such a popular project comes with its own set of obligations and responsibilities, and I’d like to not have them anymore. I’m worn out and tired. Some of this has been covered in the “serious conversation” post.

Bottom line, come mid-September, I don’t believe I’ll be able to continue contributing to Sigil at the level both you and I are used to. It will be time for someone else to take over and lead this project to future pastures.

How’s all this going to work exactly?

I’m not just going to hand this over to the first person who raises his hand. Fuck that. If you want to be the new maintainer, you’re going to have to earn it. What do I mean by that? Well you’re going to have to convince me that you can manage this beast. There’s a distinct reason why I brought this all up three months in advance. Currently, I have all the time in the world[1]. This means I can and will dedicate a lot of time to Sigil. I plan to get a lot of stuff done in this time frame. I’ll also be more than willing and able to answer any questions a new Sigil developer might have.

If you want to become a new maintainer, here’s the process:

  1. Send me an email to sigil@gmx.com (or my personal email address, or a PM on MobileRead). Introduce yourself, your background etc. Basically say “hi”.
  2. Come up with some feature you’d like to introduce to Sigil or a bug you’d like to fix. Start small. Explain to me how you’re going to implement the change, code-wise. If you don’t have a clue what you want to work on, just say so and I’ll get you something small and easy to whet your appetite.
  3. Create a new clone of the codebase. Sigil uses Mercurial. Don’t know Mercurial? Here’s a great tutorial by Joel Spolsky.
  4. Implement the change and push it to your clone. Please do your best to follow the code style. I know it’s not written down (I’ll fix that eventually), but it shouldn’t be too hard to figure it out by just looking at the current code. Don’t worry too much about it, I’ll point out any issues if they arise. A few pointers:
    • Always keep readability in mind, it’s the single most important factor of any codebase.
    • Use long, descriptive names for functions and variables. Again, look at the current code.
    • Keep functions short and simple. Split long functions into multiple smaller ones.
    • Comment the why, not the how. If the how is really complicated, then do comment it, but try to make the code simpler first.
  5. Tell me when you’re done so I can take a look at it.
  6. I’ll merge your changes after any points of contention are resolved.
  7. Don’t forget you can ask as many questions as you want. I’ll be glad to help.

Continue making good changes to the codebase and we’ll get along swimmingly. Do great and supervision will gradually be reduced. Do awesome and become the new maintainer, eventually.

I’d love it if several people came forward wishing to contribute, even if they didn’t want the maintainer’s role at the end. The more the merrier. In fact, I’d love it if one of the major commercial users of Sigil hires a group of people to work on the project. I’d gladly transfer control to them as a group, as long as promises are made that Sigil will remain FOSS.

I’m a user of Sigil. Should I panic?

Don’t panic. Popular open-source projects don’t just die out when the original creators move on. That just doesn’t happen. In fact, new maintainers taking over is a natural progression of the FOSS model.

I’m not going to disappear either. I’ll be around and active for the next three months at least, and I’ll probably contribute small fixes/features on an irregular basis after that too.

[Related discussion can also be found in this MobileRead thread.]

Footnotes

[1] Actually nowhere near it, but let’s pretend.

Thursday, June 9, 2011

Thesis done, job secured… next?

Two important things happened. The first is that I have finished my Master’s Thesis. My mentor signed off on it and it’s currently being printed and bound. That’s a huge weight off my shoulders. Everything is on track so I should be graduating in July.

The second important event is that after months of job offers, interview prep, flying around the planet so the interviews could be conducted etc., I have finally accepted an offer. From Google, no less. I’ll be moving to Mountain View in the San Francisco Bay Area some time in September or October of this year.

I’d like to thank all the companies I’ve interviewed with/received offers from for their time and consideration.

What does this all mean for Sigil? Well it doesn’t impact my timeline much. I have pretty much the entire summer to screw around, and a lot of that time will go to Sigil dev work. After that… well that’s material for a different blog post you should see in a few days.

Thursday, May 5, 2011

Why is Sigil 0.4.0 taking so long?

You, the users of Sigil, deserve an answer to this question.

First of all, it has nothing to do with Sigil itself. The amount of work needed to fix the remaining release-blocking bugs is not that large. It’s just that I don’t have enough time to dedicate to Sigil right now.

Let’s back up a bit. Remember that in February I talked about how my university studies are pretty much over? A part of that meant that I needed to start looking for a job. There were (and there still are) a lot of options and offers on the table, but I wanted to explore opportunities of working for one of the big international tech corps.

So I wrote a resume and sent it to several companies I thought looked interesting. All of them contacted me with a desire to start the interview process.

This has required two things:

  • Time to prepare for all the technical interviews.
  • Time to fly from one part of the planet to another so the interview could be conducted, and then back again.

This doesn’t leave a lot of room for Sigil work. Sorry. All schedules for 0.4.0 are in the water since as soon as I return to Zagreb from one set of interviews, I have to prepare to leave for a different set with only a few days between them to rest up and handle local affairs. Don’t forget, this all goes on top of my usual non-Sigil related workload.

I have at least another week of this. Ugh. After it’s all done and all the offers (or rejections) are in, I’ll examine all of my options and then pick a company.

Friday, April 22, 2011

Sigil 0.4.0RC1

We’re in Release Candidate territory now. You can get the new release from the downloads area. Changelog follows:

  • fixed an issue with splitting on SGF chapter markers creating the new HTML files in the wrong order (issue #828)
  • fixed a rare crash/memory corruption issue with automatic OPF updates
  • made the CSS resource path updates faster and more robust
  • updated FlightCrew so that CSS resource use is now far more robustly detected (issue #822)
  • fixed a rare hang when opening the Meta Editor on Win XP machines
  • a more accurate error message is now displayed for problems with file loading/saving (issue #772)
  • fixed an issue with incorrect font filepath updates in the CSS (issue #736)
  • fixed an issue with paths in the OPF not being URL-encoded (issue #823)
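
To illustrate the last item: hrefs in the OPF are URI references, so a file name containing a space or a non-ASCII letter has to be percent-encoded when it is written there. What follows is a minimal Python sketch of that idea, not Sigil's actual code; the helper name `encode_opf_href` is made up for this example.

```python
from urllib.parse import quote, unquote


def encode_opf_href(relative_path):
    """Percent-encode a book-relative path for use as an OPF href.

    Hypothetical helper for illustration only. The '/' is kept as-is
    because it is the path separator inside the epub container.
    """
    return quote(relative_path, safe="/")


def decode_opf_href(href):
    """Turn an encoded href back into the plain file path."""
    return unquote(href)


print(encode_opf_href("Text/chapter 1.xhtml"))  # prints Text/chapter%201.xhtml
```

Decoding with `decode_opf_href` gives back the original path, which is what a loader would need when mapping hrefs to files on disk.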

Looking at the outstanding bug reports, 0.4.0 is pretty stable now. There’s only one bug that I really want to fix before making the FINAL, and that’s issue 813. You can’t see the issue on the tracker since it’s private, but issue 837 is a duplicate of that issue and it’s public. Basically, there’s something wonky going on with FC validation on Macs when the FC engine is called from Sigil. I still haven’t been able to fix this, mostly because my access to Mac hardware is limited (and I can’t really debug a Mac issue on a Windows box, can I?). I expect to kill this before FINAL.

Anyway, use the new RC and report any bugs you find.

Thursday, March 24, 2011

Sigil 0.4.0β3

The third beta of 0.4.0 is now in the downloads area. Changelog follows:

  • added a workaround for loading broken epub files that use the incorrect mimetype for the NCX (issue #815)
  • fixed a rare issue with loading epub files with OPF's that had the XML version set to 1.1, but also had other attributes in the XML declaration (issue #812)
  • fixed an issue with entries appearing at random in the TOC after a split on SGF markers is followed up with "Generate TOC from headings" request (issue #804)
  • fixed an issue with files not appearing in the Book Browser after a split on SGF markers (issue #816)
  • fixed a regression that caused the auto-cover-setting heuristics to unset covers when they are already set (issue #806)
  • opening an HTML file now automatically builds a TOC from the headings
  • fixed a regression that caused the Add Semantics menu to stop working (issue #807)

Again, keep the bug reports coming. :)

Monday, March 21, 2011

Sigil 0.4.0β2

New beta is now in the downloads area. Changelog follows:

  • fixed an issue with the ID of the manifest item of a resource not being updated when the resource was renamed
  • fixed an issue with chapter splits being created in the wrong reading order (issue #797)
  • fixed an issue with some HTML files disappearing after a save/load cycle if chapter splitting was performed before the save
  • fixed an issue with renaming a file and then splitting it causing duplicate ID's in the OPF manifest (issue #800)
  • fixed a validation issue with the Meta Editor not adding the correct namespace prefix to some Dublin Core metadata element attributes

Keep the bug reports coming. The more you report, the more I can fix and the happier we'll all be.

Sunday, March 20, 2011

Sigil 0.4.0β1

So the first beta of 0.4.0 is now out. You can get it from the downloads area. The changelog follows:

  • fixed an issue with CSS @import rules in the '@import "something.css"' format not being recognized and thus not updated on import
  • removed the "CustomID" basic metadata entry from the Meta Editor; those wishing to use custom ID's can now add them directly to the OPF
  • Sigil now preserves custom unique identifiers in the OPF (issue #552)
  • removed support for the Sigil-proprietary SGF format
  • the user can now edit the OPF file by hand (issue #281)
  • the user can now edit the NCX file by hand (issue #282)
  • the OPF file is now preserved on import (issue #586)
  • the NCX file is now preserved on import (issue #283)
  • the Table Of Contents editor has been replaced with a new Table of Contents sidebar; clicking on an item in this sidebar takes the user to the target location, enabling TOC navigation (issue #100)
  • a dialog now informs the user if his XHTML, NCX or OPF documents are not well-formed XML (with error location and description), thus allowing him to fix the potential problems by hand instead of leaving them to Tidy to fix (issue #519)
  • fixed a rare issue with no tab opened by default when loading epubs
  • made the sigil.sh startup script more robust (courtesy of Craig Sanders) (issue #737)
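
As background for the first item: CSS allows both `@import url("something.css")` and the bare `@import "something.css"` form, and a path updater has to catch both. Here is a rough Python sketch of one way to find them. It is an illustration only, not how Sigil does it; the names `IMPORT_RULE` and `find_css_imports` are invented here, and the pattern assumes paths without spaces.

```python
import re

# Matches '@import "x.css"', "@import 'x.css'", '@import url("x.css")'
# and '@import url(x.css)'. Assumption: the path contains no whitespace.
IMPORT_RULE = re.compile(
    r"""@import\s+
        (?:url\(\s*)?          # optional url( wrapper
        (?P<quote>["']?)       # optional opening quote
        (?P<path>[^"')\s;]+)   # the stylesheet path itself
        (?P=quote)             # the same quote again (or nothing)
        \s*\)?                 # optional closing parenthesis
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_css_imports(css_text):
    """Return the stylesheet paths referenced by @import rules, in order."""
    return [match.group("path") for match in IMPORT_RULE.finditer(css_text)]


sample = '@import "something.css";\n@import url("other.css");\n@import url(third.css);'
print(find_css_imports(sample))
```

A real implementation would more likely use a proper CSS tokenizer, since a regex like this will happily match `@import` text inside comments.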

The big features everyone has been waiting for are now in. The OPF and NCX are now preserved on import, and you can edit them by hand. You can still let the GUI do everything, but you now have the power to make changes directly.

The TOC editor is gone and has been replaced by the TOC sidebar. A button at the bottom of the TOC sidebar allows the generation of the NCX from the headings in your epub. You can ignore that button if you want and just build the NCX by hand. Typing in the NCX file updates the TOC sidebar live, using a background thread to keep things fast and responsive. Clicking on an item in the TOC sidebar takes you directly to the target location. Yes, you can now navigate the epub with the TOC.

A new dialog is now presented to the user when you screw up the markup in your XHTML (and NCX and OPF) files so badly that they become ill-formed XML. Previously Tidy was used to correct such errors, but now this dialog informs you of the problem and lets you fix it by hand if you wish, with the “Fix Manually” option. By selecting the “Fix Automatically” option in the dialog, everything goes through Tidy and it will then fix it for you, just like it did previously. Basically the dialog is there to let power users skip Tidy’s error correction and fix the problem by hand. Naturally, the dialog tells you the line/column location of the error and a brief description.
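
For the curious, the check behind such a dialog boils down to running the text through an XML parser and reporting where it gave up. The sketch below shows the idea in Python with the standard library parser. It is not Sigil's implementation, and `check_well_formed` is a name made up for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed, else (line, column, message).

    Illustrative sketch only. The line is 1-based and the column is
    0-based, as reported by the underlying expat parser.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        line, column = error.position
        return line, column, str(error)
    return None


problem = check_well_formed("<p>This <b>bold text never closes</p>")
if problem is not None:
    line, column, message = problem
    print("Line %d, column %d: %s" % (line, column, message))
```

Note that this only tests well-formedness, not validity: a document can pass this check and still break the XHTML, NCX or OPF rules.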

With this dialog and the previous Tidy cleaning on/off button (which controlled the dreaded “clean” option), the negative aspects of Tidy are now something you can completely avoid.

This whole NCX/OPF preservation and editing by hand deal required a very thorough rewriting of a lot of Sigil’s internals. This has certainly brought many bugs so bear that in mind.

The whole point of the beta process is to discover and report bugs in unstable versions of the software. So report any bugs you find! And don’t be surprised when you find them, because you will certainly encounter some of them in this beta. If you’re uncomfortable with that, then don’t use the beta. Wait for 0.4.0 FINAL. This is pre-release software, that’s why it’s called a beta.

There were some features that I initially planned for the first beta but that eventually didn’t make the cut. I spent days on one of them to finally realize that it simply isn’t going to work (through no fault of mine, but it’s a long story for a new blog post). The other major feature that didn’t make it in was simply pushed back to a future beta. I’d like feedback and bug reports on the massive rewrite needed to implement OPF/NCX preservation/editing sooner rather than later, so I thought it wise to push the first beta out today.

Tuesday, March 8, 2011

Sigil 0.3.4b for Windows

There was an interesting problem with the recent OpenCandy integration: Microsoft Security Essentials started flagging the Sigil Windows installers as containing adware. From discussions with the OC people, this appeared to be because the MS people didn’t like that OC was storing a certain key in the registry and then transmitting it to the OC servers. The key acted exactly like a browser cookie, and pretty much every website you’ve ever visited stores one on your computer. Other AV vendors and security researchers agreed with the OC folks that this is just fine and doesn’t constitute “adware” behavior, but the MS folks disagreed.

So after two weeks of back-and-forth, the OC folks have decided to remove this specific behavior from their SDK. For their part, MS has agreed that the new OC plugin isn’t adware even by their definition, so installers using it will not throw warnings.

I have just uploaded Sigil 0.3.4b for Windows to the downloads area, and it contains this new version of the plugin. MSE has no complaints about the new installers.

Sigil itself remains unchanged in this version, so if you have 0.3.4 installed, there is no reason to update.

Sunday, February 27, 2011

The last nail in the coffin of SGF

The next version of Sigil (0.4.0) is dropping support for opening documents in the old SGF format that was proprietary to Sigil versions 0.1.0 through 0.1.9. Sigil 0.2.0 dropped support for writing files in that format (that was a year ago), and now the second part of that transition is happening. That previous link describes what SGF is if you’re unfamiliar with it. Bottom line, if you’ve never heard of it, you never will and shouldn’t care about this change at all.

For those that still have some SGF files left, just open them with any version of Sigil <=0.3.4 and save them as epub. That’s all there is to it. All those versions of Sigil will be available for download pretty much forever, so there’s no rush.

Sunday, February 20, 2011

An analysis of EPUB3 (and, uh, a bit more)

[I swear when I’m frustrated. That makes this post obscene even by Chris Rock’s standards. Proceed with caution. Also, this was (and is) supposed to be about EPUB3, but as I kept writing it, it kept growing. Fuck it, I’ll just post it as it is.]

The IDPF published the current draft of the new EPUB3 spec a few days ago. Time to see if it was worth the wait. Note that this will be a long post.

I’ve read all the sub-specs of EPUB3, and my general feeling about them is one of… “meh”. That’s the best way I can describe it. “meh” leaning towards “not good”. I jotted down notes in my notebook while I was reading them[1], and what follows is a digested summary of my views and sentiments.

Assume I agree with everything in the specs I don’t explicitly disagree with here[2]. Also, while I’ll take this opportunity to mostly rant about the bad parts of EPUB3, keep in mind that there are quite a few good parts as well.

All hail the mighty iPad

I’ll start this section by saying how I have absolutely nothing against JavaScript on the web. That would be stupid. I mean, without JS all of this web app business would not exist, and while arguments against “cloudification” of applications have weight, no matter where you stand on the debate, you can’t say that Gmail isn’t useful.

But we’re talking about books here. In EPUB2, JavaScript execution was under RFC2119 “SHOULD NOT”; for all intents and purposes, this means forbidden. It’s not a “MUST NOT”, but still.

In EPUB3, JS support is now optional. This means you can start using JS in your epub books, yippee! You can now go all Web 2.0 on your e-books. I’ll talk about why this is bad in a moment, but first I’d like to give credit where credit is due and note that the spec text explicitly mentions that content creators should avoid using JS if at all possible. Here’s a quote:

Scripting consequently should be used only when essential to the User experience, as it greatly increases the likelihood that content will not be portable across all Reading Systems and creates barriers to accessibility and content reusability.

Sadly, no one will listen to this. But at least the IDPF has this warning, even though it won't do shit. Now that JS support has moved from "should not" to "optional", people will go out of their way to redefine "essential to the user experience" so that it includes JS. This will break horribly. We'll get epub books created solely for iBooks and all other Reading Systems can go to hell. Progressive enhancement? We will never see it. The people who create epub books are not web developers, they work in publishing. They have no idea what writing code for the web looks like (or writing code at all for that matter), so we'll see hacks upon hacks that work on iBooks pretty much by accident and on nothing else. I've always said that the day that the epub specs start mandating JS support is the day those same specs jump the shark. We're not quite there yet, but the gates of Hell are now slightly ajar.

This is what will happen:

  1. EPUB3 brings "optional" JS support.
  2. Publishers start adding crappy JS to their books hoping it will make them "stand out", "embrace the future", "fuck goats" or whatever.
  3. We now have thousands of books with JS scripts in them that are absolutely useless but whose execution is nevertheless required, otherwise reading the book is impossible. You know, things like special navigation menus, buttons to expand example source code, footnotes in "tooltip" style windows or similar "brilliant design ideas" that stop working when you don't run the book's JS.
  4. EPUB4 now demands JS support. I mean really, you can't expect publishers to go over all those crappy epubs and rework them with progressive enhancement in mind, do you? No, no, no. They'll just lobby the IDPF to make JS support mandatory, and they'll succeed.
  5. Welcome to the web circa 2000! Ah, what a fun place that was.

But I don't blame the IDPF for moving JS support from "should not" to "optional". Actually, that's a lie. Of course I blame them. But I understand they had little choice when a lot of the people who make up the EPUB Working Group are the same people who have abused the term "HTML5" to the point where it doesn't mean anything anymore. Quite literally, nothing. It’s become a whizz-bang-pow marketing term hyped into oblivion by a fruit company. This “HTML5 is love, sex and the future of human civilization” nonsense has even pushed the WHATWG into renaming the spec to just “HTML” (even though they won’t admit the reason publicly). That’s right, the term “HTML5” now officially stands for nothing. Here’s a funny link about this “HTML5 means everything”. Bruce Lawson is specifically calling out the CSS3 == HTML5 == JavaScript idiocy, but you get the point.

Interactivity in books? My God, how ever did books survive the last five thousand years without JavaScript, <video>, <audio> and <canvas>? It boggles the mind.

Publishers: "HTML5 BOOKS? MOAR! LOOK MA IM IN TEH FUTUAR! We're totally not going to go extinct now!"

JavaScript, <video>, <audio>, <canvas> in books == "This book needs more cowbell."

I know I’m being a cynic, but I can’t help myself. The iPad came along, was declared “the savior of the publishing industry” and now everyone seems to be losing their mind.

Again, “HTML5”? Great for the web. Actually, fucking awesome for the web. For e-books? I don’t remember the last time I thought “this book really needs some video”. In fact, 99.99% of all epubs would be far better off with only the most basic HTML and maybe a few lines of CSS.

I know it’s not the IDPF’s fault this is all going to be so shamefully abused, but I still think it should have all stayed at the “SHOULD NOT” level. You want interactivity in e-books? That’s not an e-book, that’s an app. Go make one. You are not going to be able to write an interactive book and expect it to run on all the Reading Systems the same (or at all). That will not happen. Save yourself a lot of trouble and just make an app.

For those with the brilliant ideas of tooltip windows, custom navigation menus and the like which books would be far better without, just don’t do it. No, it does not look “sharp”. Or “hip”. Not even “trendy”. What it is, is stupid. You’re making a book, not a website. Please bear that in mind.

I’m sure that there are valid use cases for all of these technologies in books. A smart person using them appropriately can truly make something wondrous. Sadly, most people think they’re the smart ones. They’re usually not.

I remember how all this got started. Back in the old days, when EPUB was just an idea, this was the train of thought: “How are we going to represent electronic books? Raw, custom XML like in DocBook? Huh… maybe it would be better to use web tech like HTML. It is widely understood and there are ready-made components that will make it easier to build Reading Systems.”

So web tech was used because it lowered the barrier to entry. Instead of using DocBook’s <para>, why not use HTML’s <p>? We get free styling with CSS too.

But this changed. Now it’s not “we’re using web tech to make e-books”, it’s “we’re using e-books to package web tech”. It’s not about making books anymore, it’s about using web tech offline. You think I’m exaggerating? Do you know what term was used to “succinctly describe EPUB” during development of EPUB3? Here it comes: “website in a box”. I’m not kidding. It was used in the IDPF meetings and was even in the November 12, 2010 draft of the EPUB Overview document. Here’s a direct quote:

An EPUB publication can be thought of as a "website in a box".

No. No it shouldn’t be. Never ever.

No required glyph coverage

Honestly, the worst problem with EPUB2 was that there was no required Unicode glyph coverage. Let me explain what that means. On the surface, EPUB is all Unicode. Everything has to be either in UTF-8 or in UTF-16. “That’s great! This means I can use any letter I want!” Not quite. While you can specify any letter you want, Reading Systems aren’t required to display that letter. It would certainly be unreasonable to mandate that all RS’s have glyphs for the entire Unicode range, but there is no minimum coverage specified either, and that would be a good thing. I was hoping this would be fixed in EPUB3, but no such luck.

With the way things are, RS’s can just support ASCII and be done with it. Some support more than that, some support only that. Yes, you can get around this problem by embedding fonts with the required glyphs in your epubs, but most people don’t know they have to. See this FAQ entry for the most popular question about Sigil. I couldn’t even begin to describe the number of people who say infinitely moronic things like “Sigil doesn’t support Unicode” because the book they saved displays as a bunch of question marks in ADE and in all the hardware readers that use Adobe’s RMSDK.

It’s not just Sigil’s problem, it’s everyone’s. People have made epubs that were tested only on the iPad, and since iBooks has fonts with wider glyph coverage than ADE, some characters in those books end up as question marks over there.

There should be some minimum coverage specified. One might ask "but where do we draw the line at mandated coverage? Should CJK support be mandatory? Where is the line?" You're right, those are tough questions. That's why we have a Working Group, to answer them. Too hard to draw a line somewhere? Ok, how about adding one of those shiny RFC2119 "SHOULD" statements asking for greater coverage? It wouldn't do shit, but hey, it just might.

The problem is that nobody at the IDPF seems to give a crap about this problem. That's what we get when the vast majority of Working Group members live in ASCII land, I know, but these guys are making an international standard. Show some breadth of understanding. This lack of mandated coverage is a far bigger problem of EPUB2 than "well damn I can't put video in my books". Trust me.

Living in fairyland

There are plenty of things in these new specs that are wonderful or interesting on the surface, but will never see the light of day. Things like "container-constrained" for JavaScript (great idea!) or the “epub:trigger” element (silly idea, people will just use JS). But they will never be supported on the various Reading Systems, and if they will be, then no one will use them. People who make RS's are by-and-large hacks (exceptions do exist though) who slap some custom controls onto WebKit and call it a day. They won't modify WebKit to support epub-specific elements. That's "too hard". Am I the only one who remembers EPUB2's custom "switch" and "case" elements? Or inline XML islands? Or the whole of DTBook? The only RS that supported those was ADE (hats off to Adobe for trying, I really mean that). Everyone else just pretended it didn't exist. And not even Adobe implemented support for “oeb-page-foot” and “oeb-page-head”, and those were damn useful (on paper at least).  

History has shown that wherever the EPUB specs went beyond what popular browser engines implemented, the specs were actively ignored. It's just "too hard". It's not, of course. It "merely" requires two things: competent developers and people who give a shit. Both of these are very, very rare. Combined? Good luck. Oh yes, it requires one more thing: actual fucking work. Not just taking an existing browser engine, making it display XHTML in pages and calling that an RS. No, actual software development is required, and the most difficult kind of all: working with a huge, foreign code base. That's too much for most.

As an example, I have to work with HTML Tidy since it's an internal component of Sigil. I can't tell you how happy it makes me to know I'll have to implement an HTML5 parser for it because of EPUB3. I'm truly ecstatic about this prospect. I fucking love the very idea of it. But I'll do it because it has to be done. And for the love of God bear in mind the difference in the quality of code between something like WebKit and Tidy.

Tidy could easily be the world's most horrible code base. It's 40k lines of straight-up C, written in the most god-awful way. 800 line functions; cryptic, single-letter variable names; hacks upon hacks that step on other hacks; source comments that are either out of date, worthless[3] or usually just plain wrong. Just... absolute, worthless junk abandoned by the original devs (and those that followed them) many years ago. Nobody works on Tidy anymore, at least not with the official project.

And yet I work with it because I know I have to. WebKit source is worked on and maintained by hundreds of people and it's extremely well written. RS developers, get off your damn asses!

To tell you the truth, I've been thinking about implementing an open-source RS for both the desktop and memory- and power-constrained devices "just to show 'em how it's done". I have some sweet, sweet ideas for it. But I can bloody barely find the time to work on Sigil and FlightCrew. A third project? I can't put the gun in my mouth fast enough.

And don’t even get me started on the “quality” of the Reading Systems out there. I remember the day when we used to complain about ADE. Today, ADE is pretty much the best RS available. Do note that I said “the best available”, not “great”. Today, I’d give it a C. Everyone else gets an F on a good day, and a kick in the balls otherwise.

The worst is certainly iBooks, as any epub creator will tell you. Ask one about their opinion of iBooks, and you can rest assured that the response will be filled to the brim with “fuck”s, “shit”s, “cunt”s, “motherfucker”s and “asshole”s. Apple loves to boast about support for open standards and how they’re important. As long as we’re talking about killing Flash. The EPUB specs can go fuck themselves. It’s not that they’re lazy, incompetent or don’t feel like investing the resources to improve their support for EPUB. It’s not about “missing”  functionality. They intentionally went out and broke things in Mobile WebKit to further their agenda. If they ever tried something similar in Safari, there would be a pitchfork-mob in Cupertino.

But the number of people who make EPUB books (or work in publishing in general) compared to the number of people who develop for the web is… somewhat small. Not to mention that we as an industry are too busy sucking Apple’s dick to notice what’s going on. They can safely brick in our mouth. Oh no, Apple demands 30% off all subscriptions, in-app purchases and a lowest price? That really is the last straw. We’re now going to start sucking that dick very, very slowly in protest. That will teach ‘em!

You’d think I have something against what Apple is doing. Not at all. If someone lets you exploit them, by all means, go right ahead. Apple is screwing us only so much as we as an industry let them screw us. And now that people are starting to come around, we’re all like “OMG we have fifteen inches of Apple’s cock up our ass! What the hell happened?!”. We let it happen, that’s what. Inch by inch they kept shoving it, and we let it slide (yes, pun intended). Now we’d like them to back up a bit. Well guess what, when you have fifteen inches of cock up your ass, it’s hard to negotiate. The cock is the one setting the terms.

Weren’t we talking about EPUB3?

You’re right, I forgot. Here are some bullet points since I’m tired:

  • Greater emphasis on accessibility in the specs. Good that someone realized that the e-book movement is a godsend to people with poor vision. There was some support for this before, true, but now it's more front-and-center.
  • "xml-stylesheet" support is required? Interesting and unexpected. I doubt any RS's will actually support it though.
  • "This schema is normative. In case of conflicts between the specification prose and this schema, the schema shall be considered definitive." Hell yeah! At least now we know which is considered definitive. Trust me, you'd encounter this problem if you ever tried to implement a validator. But NVDL, RELAX NG and Schematron? May as well say "you have to use Jing". Some of us don't want to. How about providing an XML Schema schema? It's standard practice. Great, now I'll have to write my own... again...
  • Supplementary resources with <link> in <metadata>? Fancy.
  • DCMES metadata elements are being replaced by DCTERMS properties. This really is a good idea, the new system should be much more flexible. The transition period will be ugly, yes, but it's necessary. Good call on both the replacement and the transition.
  • "Although the EPUB Navigation Document is required in EPUB Publications, it is optional to include it in the spine." Yes! This will eliminate the need for those ugly "inline TOCs" people like to build where they would basically end up with two different files describing the TOC. Now the NCX is basically an XHTML document that can be styled, and if you really want it in the reading order, go right ahead and include it in the <spine>. Very nice.
  • "page-spread-left" and "page-spread-right" on <spine> <itemref>s. Nice, but how many books use two-page spreads?
  • Embedded MathML support is great. Nobody will care about the restrictions the IDPF has placed on its use. When RS's support MathML, it will be because the browser engine they use internally supports it, and that engine couldn't care less about the IDPF's MathML "restricted subset".
  • page-list nav gives support for cross-referencing an epub with the page numbers of a printed edition. This is important and as such will be used by publishers and (should be) supported by RS's… eventually.
  • landmarks nav replaces the OPF <guide>. This is also very good.
  • Media Overlays: feel free to ignore the existence of the entirety of the Media Overlays sub-spec. I really mean that, you don't even have to read it. Just pretend it's not there since nobody will ever implement it. To add insult to injury, support for it is officially optional, so nobody even needs to implement it. It's dead on arrival, much like DTBook was as a valid EPUB2 OPS syntax.

    Don't get me wrong, it would be great if RS's supported this. But they won't. Nobody ever made crazy money by catering to the visually impaired.
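For the curious, here's a rough sketch of what the toc, page-list and landmarks navs mentioned in the list above look like from a reading system's side. It's only an illustration: the sample document, its file names and the nav_entries helper are all made up for this example, and a real Navigation Document would have nested lists and more metadata than this.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
OPS = "http://www.idpf.org/2007/ops"  # namespace of the epub:type attribute

# Hypothetical, heavily trimmed Navigation Document (an assumption for illustration).
SAMPLE_NAV = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
<body>
<nav epub:type="toc"><ol><li><a href="ch1.xhtml">Chapter 1</a></li></ol></nav>
<nav epub:type="page-list"><ol><li><a href="ch1.xhtml#p1">1</a></li></ol></nav>
<nav epub:type="landmarks"><ol><li><a epub:type="bodymatter" href="ch1.xhtml">Start</a></li></ol></nav>
</body>
</html>"""


def nav_entries(nav_xml):
    """Map each nav's epub:type to a list of (href, label) pairs."""
    root = ET.fromstring(nav_xml)
    result = {}
    for nav in root.iter(f"{{{XHTML}}}nav"):
        nav_type = nav.get(f"{{{OPS}}}type")
        links = [
            (a.get("href"), "".join(a.itertext()).strip())
            for a in nav.iter(f"{{{XHTML}}}a")
        ]
        result[nav_type] = links
    return result


entries = nav_entries(SAMPLE_NAV)
for nav_type, links in entries.items():
    print(nav_type, links)
```

The point is that all three navs live in one ordinary XHTML file and differ only in their epub:type value, which is why the same document can be styled and dropped into the <spine> like any other content document.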

Canonical Fragment Identifiers

This deserves a separate section.

Canonical Fragment Identifiers are ridiculous, at least the scheme presented. They're complex to the point of absurdity. Even-numbered indices so as "not to be sensitive to XML parser handling of whitespace, entity references, and CDATA sections."? This is ridiculously over-engineered. It has support for not only pointing at elements and their textual content (worthy goal), but at pixels in raster images, logical units in vector images, temporal locations in audio and video and, if I understand the exclamation mark rules, even for crossing documents.
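To show what the even-numbered indices amount to, here's a minimal sketch covering only the simplest part of the scheme: the n-th child element of a node gets step 2n, leaving the odd numbers for the text between elements. The element_steps function and the sample markup are hypothetical names made up for this illustration. Everything else (spine indirection through "!", ID assertions, character offsets, the image and media extensions) is deliberately left out.

```python
import xml.etree.ElementTree as ET


def element_steps(root, target):
    """Return the CFI-style element steps from root down to target, e.g. '/4/4'.

    Only element steps are handled: the n-th child element is step 2*n.
    Odd steps (text between elements) are not produced by this sketch.
    """
    def walk(node, path):
        if node is target:
            return path
        for position, child in enumerate(node, start=1):
            found = walk(child, path + [2 * position])
            if found is not None:
                return found
        return None

    steps = walk(root, [])
    if steps is None:
        raise ValueError("target is not a descendant of root")
    return "".join(f"/{step}" for step in steps)


# Hypothetical content document, trimmed down for the example.
doc = ET.fromstring("<html><head/><body><p>a</p><p>b</p></body></html>")
second_paragraph = doc.find("body")[1]
print(element_steps(doc, second_paragraph))
```

Here <body> is the second child element of <html> (step 4) and the target is the second <p> inside it (step 4 again), so the path is /4/4 no matter how a parser splits the surrounding whitespace into text nodes.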

Look, you can't cram all that into a single scheme. You just can't. The WG should have just stopped at trying to point at textual content. The CFI scheme as written is silly and reminds me of the crap in SVG 1.2. Most people don't know that SVG requires support for things like raw sockets and file uploading. There's this desire in specification working groups to support every single use case imaginable and then some. Common sense goes out the door, and nobody is either willing or able to just say "NO!" to some of the requests. This is exactly the kind of thing people think of when they say “design by committee”.

This CFI scheme is absurd. The way it's designed, no one will support it. I know I certainly won't.

Summa summarum

That’s it. I’m all out. EPUB3 is nice, but most of it will be either a) misused or b) ignored. Neither is really the IDPF’s fault, but some of it is. The parts that people will support are nice and shiny, like page-list nav instead of the NCX or DCTERMS instead of DCMES.

What do I know. Ignore me.

Footnotes

[1] I ended up writing more than five pages, but most of the notes relate to low-level understanding of the changes from EPUB2, possible contradictions and implementation problems and the like. You know, the things I’ll need to pay attention to whilst working on Sigil and FlightCrew, but usually not things content creators need to care about.

[2] That’s probably not accurate, but let’s pretend it is. There are some things that are just not worth complaining about.

[3] "Thanks to X.Y. for reporting a problem with this function!" What kind of developer would actually write that above a function?

Thursday, February 17, 2011

Official Twitter account for Sigil

For those who missed the MobileRead thread, here’s a copy of what I said there:

I've decided to create an official Twitter account for Sigil. Why? Because on many occasions I've felt the desire to provide more up-to-date info on Sigil's development, but didn't think it was worth writing a whole blog post about.

Now it will be possible to occasionally say something like "I'm working on X, and it looks like this: [PICTURE]", or "Life got complicated, dev work stopping for a couple of weeks", or "A new release should be coming in the next few days" etc. That last example is a good one; such pre-announcements would enable people (and especially larger organizations) that are using Sigil to prepare for a new release.

The updates should be fairly low-volume (also, erratic; I'll post when I feel I have something worthwhile to share with others).

FlightCrew-related info will also end up there.

The dev blog will still be around as it's not being replaced.

There’s already a screenshot of a few new 0.4.0 features in action. Check it out.

Tuesday, February 1, 2011

And final exams are over… for good

About a week ago I had my final final exam (that sounds weird). This is it now. For the entire spring semester I have only my Master’s thesis to write, and it’s all finally over. While my friends will spend the next several months writing code for their thesis project, I’m basically free of this chore. Why? Well I’ve been working on Sigil for the last two years, and while it will never technically be finished, it is as far as the university is concerned. My internal thesis goals for it were reached many moons ago (back in the 0.2.0 rewrite). That’s not to say I plan to stop working on it, but that I have no deadline pressures. Yippee.

I’ll be working on it (hopefully) pretty much the same way I’ve been working on it until now, and that’s “when I find the time and the inclination”. Seeing how I don’t have any immediate obligations (I can write the ~50 pages for my thesis in a week or so, and the deadline is in June), the “time” part has gone up immensely and the “inclination” part is still holding strong. So hopefully I’ll be getting some serious work done.

Actually there’s a mix of things I want to do in this fairly large block of free time. Here’s a list in approximate order in which I plan on doing them:

  1. REST! I spent the last week basically just sleeping and breathing. The obligations, pressures and sleep-deprivation of the last several months have taken their toll. This R&R goal is fairly short-term. On a related note, it’s been a while since I spent a whole 24 hour period with my girlfriend. It’s strange how I almost forgot that those make life worth living.
  2. Read some fucking novels. I make a goddamn e-book editor, it’s about time I get the chance to read a few of the e-books I’ve made for myself. I used to devour novel after novel, and while I still read voraciously, the books I read don’t exactly qualify as literature. If I see just one more textbook on machine learning, cryptography, project management, numerical analysis or discrete math in general, I’ll claw my eyes out.
  3. Read some technical books. I enjoy learning new things and deepening my understanding of the things I think I already know; besides, in this industry one needs to be constantly sharpening their skills. For instance, I’m currently going through Jon Skeet’s C# in Depth, Second Edition[1]. After that, it’s Chris Smith’s Programming F#. I know a bit of F# (and I really mean a bit), but it’s time to get serious. It’s a truly wonderful language, and I’ve been enamored with the functional paradigm since my brush with Haskell two years ago, but F# strikes me as more practical than Haskell. Also, a book on advanced Python wouldn’t kill me, but I’ve yet to find an appropriate one[2].
  4. Start looking for a job, I guess. I hate this last part…

Anyway, this change of pace will do me good. A lot of it, I expect.

Footnotes

[1] I’ve literally been waiting for the second edition to come out for years. It came out in November, and I’ve been waiting for some free time to read it ever since. Now, I’ve written tens of thousands of lines of C# code and I’d like to think I’m more than just proficient with the language, but Skeet can teach anyone something new. That includes me.

[2] I really mean “advanced” Python. I know Python fairly well; I’ve read Dive Into Python 2 many years ago and I’ve used the language on many projects since then, but I’m looking for a book from a true expert in the field that goes into the nitty-gritty. Something like Skeet’s C# in Depth, but for Python. The official Python docs only go so far. They’re also a bit… let’s say conceptually scattered.

Wednesday, January 12, 2011

Sigil 0.3.4

There was a regression in Sigil 0.3.3. This release is meant to address that. Changelog follows:

  • fixed a regression ("Not a folder" error) with opening certain epub files on Mac and Linux systems (issue #731)

This was caused by a bug in the ZipArchive library which I updated from 4.0.1 to 4.1.0 for Sigil 0.3.3. The bug has now been fixed.

The bug only affected Mac and Linux users, so Windows users don’t really have a reason to update.

Saturday, January 8, 2011

Sigil 0.3.3

The new release is in the downloads area. Changelog follows:

  • added a small "Donate" button to the toolbar and a related entry in the Help menu
  • added a .desktop file for the make install target (courtesy of Richard Gibert)
  • this time *really* worked around a Tidy bug that added blank lines to the start of <pre> and <style> elements (issue #655)
  • updated ZipArchive from 4.0.1 to 4.1.0
  • fixed a regression crash bug with loading extremely rare HTML content documents that have an internal DTD subset
  • added a workaround for a crash bug caused by invalid epubs that use obfuscated fonts but with incorrect UUID URN key syntax (issue #709)
  • integrated the FlightCrew epub validation library; a new toolbar icon triggers epub validation and displays the results (issue #28)
  • fixed a rare input truncation problem when the input file contains a unicode nbsp and also specifies standalone="yes" in the XML declaration (issue #677)

So there it is, FlightCrew is finally a part of Sigil. There’s a green checkmark in the toolbar now; use it to trigger validation. You can also double-click in the list of validation results and Sigil will take you to the location of the error, if available and possible. Why “if possible”? Well Sigil doesn’t display the OPF and NCX files to the user (yet), so if you get a validation error in one of those, it can’t take you there. There are other limitations to the “take me to the error” feature, but it should work fairly well for XHTML files.

Also, this release brings OpenCandy to the Windows installers. Read about the reasoning for that in this blog post. A new installer builder is used for Windows installers (Inno instead of InstallJammer) which should fix some of the outstanding bugs with the IJ ones. Please uninstall the old version of Sigil first! Nothing really bad should happen if you don’t, but just to be on the safe side, please do.

Several people have also suggested and asked for a donate button in the UI, so one has been added. It’s a small red heart. You’ll be able to turn it off as soon as I make an options screen. :D

I’ve also made some changes to the Mac build system, so if I screwed something up, please yell as loudly as possible… or just create an issue on the tracker about it.

Sunday, January 2, 2011

A more serious conversation

[This (very long) post contains a certain amount of frustrated swearing. You have been warned.]

Open source applications usually face one of three fates:

  1. If they’re not popular, they will be abandoned and forgotten.
  2. If they’re merely useful on occasion, they stagnate.
  3. If they’re popular and used often, they will get bloated beyond both recognition and usefulness.

I’m going to talk about the third one.

A developer has an idea for an application. There’s a certain task he needs help with, and nothing out there really meets his needs. So he decides to sit down and design an application that will perform the required functions and help him achieve his goals.

He decides that he will open source the code, for whatever reason. Maybe he’s an idealist. Maybe he’s a moron. It doesn’t matter, he just does it.

The application proves popular, and people start coming to him with requests for new features and bug reports for existing ones. Proud of what he has created and more than happy to help others, he starts spending more and more time working on his app.

He has a vision of what the application is supposed to look like; he sees the big picture and where all the features—both current and future ones—fit in. Some people disagree with this picture, but that’s their problem; he likes it. He thinks it’s the bee’s knees, the cat’s whiskers and a bottle of Johnny Blue all making sweet, sweet love together. Basically, he thinks his view of what it’s supposed to look like is fuckin’ A.

He also realizes that any piece of software can’t just keep piling up obscure features; there’s a certain threshold of diminishing returns where the new “features” start getting in the way of using the old ones even for the old-timers, and the new users have no fucking clue what they’re looking at. It all just becomes an ungodly mess of buttons, windows and menus that no one sane can use to get any sensible work done even after reading through a 500 page manual.

For a software application to stay away from bloat, the features added need to be useful for most of the users; not for just a vocal minority, and certainly not for just a few. Every new feature you add makes it slightly harder to use the old ones; it adds to the noise and eventually makes it difficult for the user to find what he wants and impossible to discover it through exploration.

All of this entails saying “no” to the requests that could otherwise be considered sensible, but that just don’t “jive” with the idea of what he’s trying to build. For instance, a request to “play FLAC encoded audio files” would make sense if he were building a music player, but makes somewhat less sense for an accounting application. Again, a request for a “counter that would track the number of noobs owned”; great for Unreal Tournament, less great for an FTP client.

But these requests aren’t the problem. The requests that are the problem are the ones that are useful to the person requesting them and maybe five other people.

I’ll elaborate shortly. Let’s imagine that our imaginary developer’s application has something to do with editing text. Now let’s imagine that this application is called… Ligis. Yeah, that’s what we’re going to call it. Sounds kinda catchy.

So someone comes along and says: “I work with romance novels every day. My publisher demands that all the text backgrounds be bright pink. Now, I could select the text in every chapter and apply the color to each of these, but my life would be much easier if the main toolbar had a ‘Make all backgrounds pink’ button. Could you add one?”.

Now, what our developer wants to say is: “What are you, fucking retarded? Of course I’m not going to add that. Stop wasting my time. GTFO”. But being impolite to strangers is not something he’s willing to do (especially not over the Internet; there's too much unkindness here as it is), so he merely responds with: “No, that would not be a useful feature”. He understands that the person who’s asking for this would really find it useful, it’s just that no one else would.

The biggest problem is the people who come in with obscure feature requests like these but who are also willing to implement them themselves. This being open source and all, they’re willing to write code to get them in the mainline releases. It’s hard to say no to these folks, but a crappy feature is a crappy feature no matter who writes it. So he still says no, but hey if you’re willing to write code, the issue tracker is full of accepted feature requests you could implement. But the other dev doesn’t feel like it; he just wants his one issue in and doesn’t have the time for something else. And our original developer can respect that, honestly. Working on something for free purely for the benefit of others instead of spending the time to, say, watch a movie or read a book requires an unfortunately rare amount of altruism.

And probably stupidity and/or naiveté.

Ligis becomes even more popular. The more popular it gets, the more feature requests and bug reports that are actually valuable start coming in.

For our intrepid developer, this reaches a point where there just isn’t enough time in the day to both work on Ligis and to fulfill his other responsibilities in life. Or he just becomes too stressed out to work on it. Or he starts hating it all. Or he just flakes out. Or whatever. Again, it doesn’t matter; for some reason, he decides to stop working on it and hands off the reins to a new maintainer.

The new maintainer does what most maintainers do: he fixes the major bugs as they’re reported and maybe adds a minor feature or two. He pushes a new release once every six months, tops. He doesn’t add major features since he doesn’t have the time it would take to implement them. Hell, that’s the reason why the original developer left.

But now random people start showing up willing to work on Ligis. The maintainer gladly accepts their help. The new devs start adding the features they want, not necessarily the features that most of the users would find useful. I mean c’mon, this is open source, if the lusers want something done they should do it themselves. The new devs certainly don’t feel any kind of obligation towards them. And why should they? The new devs make drive-by changes they need and usually disappear afterwards. Even newer devs that also need some obscure feature replace them.

This goes on and on, until Ligis becomes freakin’ huge. Now it doesn’t merely have the kitchen sink; it has the sink, the toilet, the washing machine, the hair dryer, the bathtub and that little bidet shit the French use to wash their ass in. In twenty colors each. Yes, gold and papaya too.

The new devs were just throwing shit up against the wall until there was no more wall left, just a mountain of shit. Nobody had any vision of what the whole app was supposed to look like, no sense of scope. Scope? Fuck you and the scope you rode in on. I need my backgrounds pink and I need them pink yesterday.

A user of an old version of Ligis tries out one of the newer ones and feels the need to talk to the devs.

User: “Hi guys! I like the bidets in the new version. They really bring out the splash screen. I also enjoy the new mango-cyan color scheme. I have just one question though: I’ve noticed that there’s a mountain of shit where this wall used to be. There was a door in this wall, and I used to go through it every day on my way to work. Really useful door, that. Do you know what happened to it?”

Devs: “Oh the door is still there. We just didn’t have anywhere to put all this shit so we just piled it on top of the door. We have a couple of shovels in the back if you still want to get to it. This is our usual advice to the ever-popular how-do-I-get-to-the-door question. Have you read the Manual?”

User: “Uh… can’t you just remove the shit so that people wouldn’t have to shovel their way through?”

Devs: “Like we said, the door is still there. Take this shovel and the Manual, volumes three through sixteen; they will explain how to dig out a path. Watch for the landmines, the shark and the koala. The shark is friendly but the koala bites.”

User: “But why do I have to dig this long and winding path? Why can’t I for instance remove this pile of shit right here for good?”

Devs: “That pile makes all the backgrounds pink. You can’t remove that, it’s our killer feature.”

User: “And this pile right here?”

Devs: “That’s a Lisp interpreter. Don’t touch that.”

User: “*sigh*, forget the door. Could you just fix this one obvious bug? Every time I hit the spacebar the whole computer reboots. It makes it difficult to type.”

Devs: “Yeah, we’ve been meaning to get to that one. Steve has been working on it since ‘95.”

Steve: “Still haven’t been able to crack it though. I can follow the call from the GUI layer to the new COBOL backend and through the kernel modules, but I get lost in the Space Shuttle diagrams.”

User: “Screw you guys, I’m going back to the old version.”

This is what happens when the original developers go away. True, sometimes it happens while the original developers are still very much active, but that’s a different issue entirely.

The point, please?

Why am I talking about all this?

I’m talking about it because this scenario[1] literally keeps me up at night. I’m able to justify the time I put into Sigil while I’m still in grad school, but that’s slowly coming to an end. I graduate soon. College takes a lot of your energy, but it’s in ups and downs. There are weeks where you spend twelve hours a day doing school-related work, and then there are “quiet” weeks. It’s during those that I worked on Sigil. Even then, I could only justify the time investment by making Sigil both my Bachelor’s and my Master’s thesis.

Since that’s all ending, it would be great to find some funding for future development. To that end, I’d like to add an OpenCandy powered advertisement to the Sigil installers for Windows[2]. It’s one of those “would you like to also install software X?” recommendations you sometimes see in installers. It would be turned OFF by default, meaning that if you just click Next, Next, Next, Finish, you will only ever install Sigil and nothing else. You can read an interesting CNET article about OpenCandy here.

Some key points:

  • As mentioned, any such “do you want to also install X” question would be set to NO by default (or deselected entirely).
  • The installers wouldn’t grow in size since the recommended software wouldn’t be included in the installer, only a piece of logic that would query the OpenCandy servers for a recommendation and download a different installer if the user chose to install that extra software.
  • No ads are displayed after the installer finishes. No ads would be in Sigil.
  • No malware, no adware and no spyware would be allowed to advertise. Developers get full control over the applications that can advertise in their installers, and can block anything they disagree with. Examples of the companies/apps that currently advertise in the OpenCandy network are Kaspersky, Bing, Winamp, StumbleUpon etc. Respectable companies.
  • This OpenCandy service seems popular in the OSS world as a funding source. WinSCP uses it, Miro uses it, Audacity uses it.
  • OpenCandy would also advertise other OSS software inside the installer. That seems nice. Sigil is becoming more and more prominent, and I’d like to use that to promote other OSS apps as well.
  • Sigil is still free and open source. No proprietary code will enter the repository or Sigil.
  • Sigil functionality is not changed in any way, regardless of whether you install the extra software or not.
  • Mac and Linux users are not affected in any way.

Basically it’s just a single ad in the Windows installer. Nothing else.

But since the Sigil user community is something absolutely awesome, I’m asking for opinions. If enough of you come forward with valid concerns and arguments against this deal, I’m saying no. To tell you the truth, I don’t expect much from this deal in terms of revenue; peanuts per month, if that. But it would give me something to point at and say: “See? I can justify the time investment”. And hey, maybe it amounts to something one day.

You can post your opinions here in the comments or in this MobileRead thread. I’m not looking for a vote, I’m looking for a discussion. Bring your eloquence.

Footnotes

[1] OK, something less insane but you get the point.

[2] They actually approached me about it. Good timing.