Not All eBooks are Created Equal

Our colleague, Ross Carter, told us about a clever comparison he devised for testing eBook readability on his iPhone. We expected he’d find some differences, but were surprised by just how different eBooks can be, even eBooks of the same book! Fortunately, Ross documented his observations and posted them on his site, rosscarter.com, and we’ve reprinted them here. -Professor Walrus

Take Huckleberry Finn, for example

– by Ross Carter, WalrusInk Editor & Software Developer

Recently I decided to put Huckleberry Finn on my iPhone. The iBookstore offers numerous editions, so I downloaded some samples to decide which was the best.

The results surprised me: the best was very much better than I expected, and the worst was very much worse than I thought possible.

Here’s what I found.

Source material

Assembling the source text for Huckleberry Finn is not as simple as one might think. If we take the 1885 first edition as our source, we immediately confront the problem that its punctuation contains numerous eccentricities and downright mistakes. Let’s note some peculiarities that will help us determine how a given ePub differs from the first edition.

The title page, below the title, prints these two lines:

SCENE: THE MISSOURI VALLEY.

TIME: FORTY TO FIFTY YEARS AGO.

At the bottom of the title page is the year of publication, 1885, providing a context for the somewhat peculiar expression “forty to fifty years ago.”

The first sentence of Chapter 1 reads:

YOU don’t know about me, without you have read a book by the name of “The Adventures of Tom Sawyer,” but that ain’t no matter.

Note the small caps in the first word, the commas after me and Sawyer, and the double quotes around the book title.

Regarding punctuation, inconsistency is the rule. Consider this excerpt from the sixth paragraph of Chapter 1:

Miss Watson would say, “Dont put your feet up there, Huckleberry;” and “dont scrunch up like that, Huckleberry—set up straight;” and pretty soon she would say, “Don’t gap and stretch like that, Huckleberry—why don’t you try to behave?”

The word don’t is spelled both with and without the apostrophe, and is improperly lowercase in the second instance. Later in that paragraph, the word she appears in italics:

she was going to live so as to go to the good place.

The foregoing observations provide a signature for the first edition; if multiple publications differ from the first edition in exactly the same way, we can bet that they share a common source. The most likely common source is Project Gutenberg, which provides the book as plain text, HTML, or already packaged as an ePub (both with and without images).

One quickly notes that the Gutenberg text differs considerably from the first edition. The year of publication is omitted, leaving “forty to fifty years ago” devoid of context. The first sentence reads:

YOU don’t know about me without you have read a book by the name of The Adventures of Tom Sawyer; but that ain’t no matter.

Small caps are omitted, the first comma is omitted and the second is replaced with a semicolon, and double quotes are omitted.

The text version reads:

Miss Watson would say,

“Don’t put your feet up there, Huckleberry;” and “Don’t scrunch up like that, Huckleberry–set up straight;” and pretty soon she would say,

“Don’t gap and stretch like that, Huckleberry–why don’t you try to

behave?”

she was going to live so as to go to the good place.

WordPress alters the quotation a bit, so I will spell out the differences: the quotes are straight, not curled; missing apostrophes have been added; dashes are two hyphens; lines are delimited with hard returns; there are two spaces between sentences; and of course the italic is missing. The HTML edition uses a true dash character and removes the hard returns; puts a regular space and a nonbreaking space between sentences; and, disappointingly, fails to supply the missing italic. The ePub versions contain all the mistakes of the HTML version.

These markers allow us to determine very readily whether the publisher of a Huckleberry Finn ePub has added value through careful editing, or has merely passed off the Gutenberg text.
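For readers who like to automate this kind of check, here is a minimal Python sketch of the marker test, assuming you have already extracted an ePub's text as a plain string. The marker names and the helper functions `gutenberg_markers` and `looks_like_gutenberg` are hypothetical, drawn only from the differences listed above; this is not a Project Gutenberg tool or an official fingerprint.

```python
import re

# Hypothetical markers, taken from the comparison above (not an official fingerprint).
GUTENBERG_MARKERS = {
    # First sentence: Gutenberg drops the comma after "me" (\s+ tolerates hard returns)...
    "no_comma_after_me": re.compile(r"about\s+me\s+without\s+you\s+have\s+read"),
    # ...and replaces the comma after the book title with a semicolon.
    "semicolon_after_title": re.compile(r'Tom\s+Sawyer["”]?;'),
    # Typographic leftovers from the plain-text edition.
    "straight_quotes": re.compile('"'),
    "double_hyphen_dashes": re.compile("--"),
    "two_spaces_after_sentence": re.compile(r'[.?!]["”’]?  \S'),
}


def gutenberg_markers(text: str) -> dict:
    """Report which of the hypothetical markers appear in the given text."""
    return {name: bool(p.search(text)) for name, p in GUTENBERG_MARKERS.items()}


def looks_like_gutenberg(text: str) -> bool:
    """True when both first-sentence readings match Gutenberg rather than 1885."""
    found = gutenberg_markers(text)
    return found["no_comma_after_me"] and found["semicolon_after_title"]


# The two first sentences quoted above.
gutenberg_first = ("YOU don't know about me without you have read a book by the name of "
                   "The Adventures of Tom Sawyer; but that ain't no matter.")
first_edition = ("YOU don\u2019t know about me, without you have read a book by the name of "
                 "\u201cThe Adventures of Tom Sawyer,\u201d but that ain\u2019t no matter.")

print(looks_like_gutenberg(gutenberg_first))  # True
print(looks_like_gutenberg(first_edition))    # False
```

The textual markers (the missing comma and the semicolon) carry the most weight, because the typographic ones can be cleaned up by a converter without anyone actually correcting the text.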

Let’s start by looking at the worst of the lot.

Publisher: Lulu.com

One can only laugh. No one in his right mind would want to read a book that starts like this:

[Screenshots: Lulu.com edition, opening pages]

This ePub bears the hallmarks of the Gutenberg edition (although the italicized book title is an appropriate correction), presented in a form that is worse than the original. The publisher has taken the Gutenberg text and removed value from it. It’s hard to believe that anyone even bothered to look at this ePub before offering it for sale. It’s like an app that crashes on launch.

This edition is priced at $8.99—the most expensive of all the editions I examined.

That’s right. $8.99. The word swindle comes to mind.

Publisher: MobileReference

[Screenshot: MobileReference edition]

This edition manages to present a tolerably acceptable layout. The text plainly comes from Gutenberg. The price is $0.99.

I call this a very sloppy job. Quotes are straight, dashes are two hyphens, and everything is in the same font face and size. No observable attempt was made to add value to the Gutenberg edition. Save your 99 cents.

Publisher: Digireads.com

[Screenshot: Digireads.com edition]

Dashes are single hyphens, but at least they have no flanking spaces. The problem with this edition is the aggressive hyphenation caused by full justification, which yields such awkward breaks as was-n’t, want-ed, and Huckleber-ry. $2.99. Keep your money.

Publisher: The Floating Press

Quality gets a substantial bump up with this edition:

[Screenshot: The Floating Press edition]

The chapter heading uses a different font style and size. Dashes are correct. Front matter is sensibly presented, including a reference to the year of first publication. But quotes are still straight, and the text is still uncorrected Gutenberg.

In my view, this edition is not worth the $4.99 that the seller asks.

Publisher: LibreDigital

This one is almost indistinguishable from the Floating Press edition:

[Screenshot: LibreDigital edition]

The dashes are ugly hyphens flanked by spaces. At $2.99, it’s still expensive for a simple repackaging of the Gutenberg text.

Publisher: Vigo Books

[Screenshot: Vigo Books edition]

There’s little to distinguish this edition from the previous two; it is still a Gutenberg text with straight quotes. The front matter is laid out rather nicely. $3.99 and not worth it.

Publisher: HarperCollins

Now we’ve climbed out of the sewer of Gutenberg knock-offs. Clearly, some effort went into this edition:

[Screenshot: HarperCollins edition]

This is what I expect from an established publishing house: true quotes and dashes, an introductory essay, and a text that has been modernized and corrected from the original (and, unlike the Gutenberg text, corrected properly). Paragraph indentation is too wide for my liking, but I really cannot cavil about the layout and typography. The italic she is preserved, hyphenation is sensible, and front matter is neatly presented.

But did you notice the whopper of a mistake in the first sentence? I was about to commend the publisher for setting the book title in italics, instead of in quotes as the first edition does. But what can I say about dropping Tom from Tom Sawyer? At only $1.99 I would call this edition a steal, but that awful mistake on the first page destroys my confidence in the work as a whole. Who knows? Maybe they left out a page somewhere, or a chapter. Sadly, we must keep looking, and we start to wonder: how hard can it be to publish a decent ePub of Huckleberry Finn?

Publisher: Penguin

One would think that the first name in paperbacks would have a good grip on portable formats. Well, not quite.

[Screenshot: Penguin edition]

The text and typography are excellent. The problem is a silly proliferation of footnotes. Really, three footnotes in the first paragraph? I have no idea what those footnotes say; they didn’t make it into the free sample, and I didn’t pay $4.99 to find out. I call them footnotes, but I guess the e in the links means they are endnotes.

If I am doing scholarly research on a text—the kind where I need to read a gloss after the first five words—I’m not going to be using an ePub on my iPhone as my source. In my view the ubiquitous footnote links make this edition as annoying as the poorly prepared editions described earlier.

Publisher: Sterling

Now let’s take a look at what’s possible when a publisher sets out to add value to an ePub.

The first thing you notice while perusing the front matter is the acknowledgment of the content creators: book design by Deborah Kerner and illustrations by Scott McKowen. The front matter includes a book cover, copyright page, title page, table of contents, and a delightful rendition of Twain’s Notice. Here is what it looked like in the first edition:

[Image: Twain’s Notice in the first edition]

Here is what Sterling did with it:

[Image: Twain’s Notice in the Sterling edition]

Beautiful! Clearly this publisher wants the reader to enjoy the experience of this book. Let’s look at page one:

[Screenshot: Sterling edition, page one]

Wow. The typography and layout are superb. Punctuation has been modernized and corrected. Mistakes in the Gutenberg text are not to be found.

Footnotes appear, but in moderation; there are only three in all of Chapter 1. If you follow a footnote link, you can tap the footnote marker to return to your place in the text.

The price is $5.99 and worth every penny. At last! A publisher got it right.

Conclusion

I have two print editions of Huckleberry Finn in my house. The Sterling edition beats them both. I can happily read the Sterling ePub and feel that I have missed nothing from the printed book experience. All the other ePubs I examined fall far short of the readability of a printed edition. I call them carrion. They somewhat resemble a book, but only as a dead and putrid remnant.

The Sterling ePub demonstrates that obtaining textual content for an ePub is only the first step in publication; it is by no means the last step. A publisher must add value to the textual content by designing the ePub that will contain it. Publishers who merely take some text and run it through an ePub converter discredit the entire ePub industry. Small wonder if most ePub buyers conclude that all ePubs are ugly.

I’ve been disappointed at the quality of ePub books I’ve bought on the iBookstore. Even well-established publishing houses seem in a hurry to get the text into the store without pausing to think whether anybody will enjoy reading it.

Certainly, the ePub format suffers from a few maddening limitations. I can’t imagine how difficult it must be to design a book when you have no control over the page size or font. Maybe that’s why most eBook publishers simply give up and consign their readers to a second-rate experience. That’s a pity, because a bit of design can provide a first-rate experience.