In the preceding post we uploaded a Microsoft Word file of Fighting France, by Edith Wharton. I would like to introduce you to a few tools that can be used to turn this file into an e-book in ePub format, which can be imported into most of today’s e-readers and can also be read on a computer with the right software. I have also included an explanation of the software required to produce an e-book from scratch yourself – either from your own original writing or from text files of classic books that you can download from the internet. ePub is a free and open e-book standard that lets text reflow to suit different screens and platforms and which has considerable language support, both in text and in user interface. It should go without saying that you should only publish literary works that are out of copyright, but that should already be the case for books from reputable download locations such as the ones referenced here.
WordPress.com does not presently support the upload of books in ePub format but I will be trying to contact them in the near future to see if that is an upcoming feature. I did e-mail support staff at Scribd and they advise that they expect to support the upload of ePubs by the next quarter so it should be possible to provide a link on blogs in order to download e-books that have been uploaded there.
As I’m a Windows creature, the software that I use is Windows-centric, but I’ve indicated which programs are cross-platform and which are exclusive to Windows, and I’ve tried to source Mac programs (although I have no way of trying them to see how they work). My apologies to users of Linux for the lack of documentation for their OS of choice (but hey – those guys are so smart that they’re already miles ahead of me)!
It might be best to start at the beginning (it usually is) and suppose that you might want to download a text file from an online location (such as archive.org or Project Gutenberg) and progress through the steps involved in turning it into a clean ePub file. In sequence, the steps/software involved would be:
Download – Download the text file from your location of choice and save it to a new folder on your computer. These online locations will often have ePub files available for download, but they have normally been assembled automatically from the text files available at the same location and usually carry the scanning errors inherent in OCR (optical character recognition) scans. We will have to clean those errors up before publishing an ePub file, so download the “Full Text” version and work from that.
EmailStripper – A free Windows application developed by PaperCut Software (to whom I will be eternally grateful). It was originally developed to clean up the “<” and “>” characters and broken line endings in plain-text email messages that have been forwarded or replied to, but it will also clean up the original line endings in books that have been scanned with OCR software. Older books were often printed in a smaller format than modern books, with wider margins, so their short lines do not transpose properly into modern word processing programs; EmailStripper will rejoin them into paragraphs that you can work with. When pasting an entire book into EmailStripper you will have to wait a minute or two while it chugs through the book doing its reformatting magic. Mac users might try this page at about.com to see if the applications listed might work on their systems.
A screenshot of EmailStripper with the first few pages of Gertrude Jekyll’s book Roses for English Gardens (copied from the full text, available at archive.org – http://archive.org/details/rosesforenglish00mawlgoog). I copied it from the browser window and clicked on “Paste” in EmailStripper.
Here’s what you get when you click on “Strip It!” in EmailStripper.
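For readers who can't run EmailStripper, the same kind of clean-up can be roughed out in a few lines of Python. This is only a sketch of the general idea (remove leading “>” markers, then join hard-wrapped lines until a blank line is reached); it is not how EmailStripper itself works, and the function name is my own invention for illustration.

```python
import re


def strip_quotes_and_unwrap(text):
    """Remove leading '>' quote markers and join hard-wrapped lines into paragraphs."""
    paragraphs = []
    current = []
    for raw_line in text.splitlines():
        # Drop leading email quote markers such as "> " or ">> ".
        line = re.sub(r"^\s*(?:>\s?)+", "", raw_line).strip()
        if line:
            current.append(line)
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    # Separate paragraphs with one blank line.
    return "\n\n".join(paragraphs)


sample = "> The rose is the\n> queen of flowers.\n\nIt has many\nforms."
print(strip_quotes_and_unwrap(sample))
```

Note that this treats every blank line as a paragraph break, so a scan with no blank lines between paragraphs would come out as one enormous paragraph and would need a different rule.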
Text editor – Once I have cleaned my text in EmailStripper I paste it into Notepad (which comes preinstalled on all Windows computers) to see how the formatting looks. It is usually sufficient, but there have been times when I’ve had to paste back and forth between Notepad and EmailStripper (or even Word) to get it right (not sure why – typical computer magic). I usually then save the Notepad file, close it and open it again to see if the formatting seems correct and stable. There are plenty of other text editors out there. (I often keep an instance of Notepad running so that I can paste into it to clear formatting when I’m pasting between programs.)
I’ve clicked on “Copy” in EmailStripper (to copy it to the Windows clipboard) and here it is pasted into Notepad; it looks good in terms of formatting.
Microsoft Word – Once the general formatting is correct in Notepad I copy it, paste the whole works into Microsoft Word (I use Word 2003) and immediately save it with a Doc extension. This will be my working file, in which I can make use of the spell checker and formatting tools to edit the book and make it acceptable for further processing on our way to an ePub file. MS Word is also available for the Mac platform; if you do not have access to MS Word, OpenOffice, a free open source project for both Windows and Mac, contains a word processor. I hang on to the text file as a backup measure. If you have photos or illustrations to accompany your book, insert them at the appropriate locations in the word processing file; it is usually best to do this at the end of the editing process. The question of whether you should “align left” your text or “justify” it arises. In printed books the text has generally been justified, but I tend to simply align left. Justification has always been a design issue, and there’s no doubt that it looks better on the page, but I’ll have to experiment with it before deciding how to proceed.
I’ve selected the text in Notepad, copied it and here it is pasted into MS Word – ready for editing. Some spelling errors are visible, including hyphenated words (line endings) in the original text that have been retained in our new copy; there are also a few artifacts (e.g. ‘^ ) caused by misinterpretation by the OCR program used in scanning the original page. Even pencil marks or smudges can cause unexpected results in the output of an OCR scan. You can also see page numbers that were scanned as well as book-title and chapter headings that often appear on each page of an original book; these would normally be part of your editing process but, for our purposes, we will continue with our exercise while retaining all of these “errors”.
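The hyphenated line endings mentioned above are tedious to hunt for by eye. As a rough aid, here is a small Python sketch that lists likely candidates (a word fragment, a hyphen, a space, then a lowercase fragment) together with the joined spelling. It only suggests; the pattern and the function name are my own assumptions, and each hit still needs to be checked by a human because genuine hyphenated words that happened to fall at a line end will be listed too.

```python
import re

# A word fragment, a hyphen, whitespace, then a lowercase fragment: "gar- den".
BROKEN_WORD = re.compile(r"\b([A-Za-z]+)-\s+([a-z]+)\b")


def find_broken_words(text):
    """Return (found_text, joined_word) pairs for words split by a line-end hyphen."""
    return [
        (match.group(0), match.group(1) + match.group(2))
        for match in BROKEN_WORD.finditer(text)
    ]


sample = "The gar- den roses of mid-summer are well- known"
for found, joined in find_broken_words(sample):
    print(found, "->", joined)
```

In the sample, “gar- den” is correctly joined to “garden”, but “well- known” is offered as “wellknown” when it should really be “well-known”, which is exactly why the list is for review rather than automatic replacement.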
I’ve taken what we’ve pasted into Word and saved it as a “Web Page, Filtered (*.htm; *.html)” – I’m using Word 2003 – your version of Word, or other word processor, may be different. Refer to Sigil’s documentation for information. Answer “Yes” to any warnings in Word when saving, or changing and saving, your html document. Saving as HTML will change your default onscreen view in Word to “Web Layout”. You can change it back to whatever it was previously – probably “Print Layout” – from the “View” menu at the top of the Word window.
PDF reader – You can count on a long slog editing an entire book, as you will have to correct the misreadings introduced by the OCR scanning program (many instances of $ ^ * + @ ~ ` etc) and will have to check against an original copy of the book to look for italics, which are lost when the scan is saved as a text file. For this purpose I normally download the PDF version that is usually available at the same location where I’ve downloaded the OCR scan saved as “Full Text” (so you know that you are comparing the same edition of the book) and run the word processor and PDF reader side by side, page by page, as I’m editing. The PDF file is normally a “snapshot” or “picture” of each page of the original book, so the italics can be recognized and applied in the word processor. You can also check for original spelling in the PDF source file – older names and spelling styles are common in old books and, in gardening books, are the norm. It’s up to you how you want to deal with spelling issues – spelling has its regional and international variations as well – but I think that it is best to stick to the locale and era of the writer that you are processing. Finally, the issue of the en dash or em dash raises its ugly head; I often leave them intact if the word processor has interpreted them consistently but, deep in my heart, I would like to banish them and simply use a hyphen in all situations. To that end I often compose an article in Notepad and then paste it into MS Word, and the hyphens prevail. Adobe Reader is the best-known PDF reader but I prefer PDF-XChange Viewer – it is much less resource intensive, has many more bells and whistles and is also free.
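A script can't do the proofreading for you, but it can at least point you to the lines that need attention. The Python sketch below, offered as an illustration only, flags every line containing one of the stray characters listed above and, separately, swaps en and em dashes for plain hyphens if that is your preference. The character list and both function names are my own choices, not part of any of the programs discussed here.

```python
import re

# The characters listed above as typical OCR debris; adjust to taste.
SUSPECT_CHARS = re.compile(r"[$^*+@~`]")


def find_suspect_lines(text):
    """Return (line_number, line) pairs for lines containing a suspect character."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPECT_CHARS.search(line)
    ]


def dashes_to_hyphens(text):
    """Replace en dashes (U+2013) and em dashes (U+2014) with a plain hyphen."""
    return text.replace("\u2013", "-").replace("\u2014", "-")


sample = "Garden roses\nThe old ^ rose\nA fine display"
for number, line in find_suspect_lines(sample):
    print(number, line)
```

Bear in mind that a book may use some of these characters legitimately (an asterisk marking a footnote, for example), so treat the flagged lines as a to-do list for checking against the PDF rather than as a list of certain errors.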
Sigil – The best-known software that can turn your Word (or other word processor) document into an ePub file is called Sigil. It is a free, open source application that runs on Windows (32 or 64 bit), Mac and Linux. It can be downloaded here. Sigil can only open ePub or html files, so the first requirement is to save your Word (Doc) file as a new file using the htm or html extension (in “Save as type” choose Web Page, Filtered) and then close it. (A handy conversion chart for importing into Sigil is here). You would then start Sigil, open your htm file, edit it and save it as an ePub file. There is a very useful user’s guide on Sigil’s website (both online and as a downloadable ePub) but be aware that version 6.0 of the program has recently been released and that the user guide is lagging behind and is currently in draft form; if you download it, check back from time to time for the final version. Documentation for Sigil is excellent – I’m just learning the application myself so I’ve been reading it.
I’ve closed the Word web-page document and opened it in Sigil (File – open…); the only change that I made in the original (Doc) Word document was to format the first chapter heading – CHAPTER I GARDEN ROSES NEW AND OLD – as “Heading 1” so that Sigil would see it as a chapter heading in order to create a table of contents (TOC). In Sigil, I’ve only added the metadata (title and author name) and made a few changes to accommodate the TOC and then saved it as an ePub file. If it was a real book, it could now be added to an e-reader.
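Before opening the filtered web page in Sigil, it can be reassuring to confirm that your chapter headings actually made it into the file. The Python sketch below lists the text of every h1 element in an HTML document, on the assumption that Word writes its “Heading 1” style out as an h1 tag when saving as a filtered web page. The class and function names are hypothetical, and this has nothing to do with how Sigil builds its TOC internally.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every <h1> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._inside_h1 = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._inside_h1 = True
            self._parts = []

    def handle_data(self, data):
        if self._inside_h1:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside_h1:
            # Collapse line breaks and runs of spaces into single spaces.
            self.headings.append(" ".join("".join(self._parts).split()))
            self._inside_h1 = False


def list_chapter_headings(html_text):
    """Return the heading text of each <h1> in document order."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    return collector.headings


sample = "<html><body><h1>CHAPTER I <span>GARDEN ROSES</span>\nNEW AND OLD</h1><p>Text</p></body></html>"
print(list_chapter_headings(sample))
```

If a chapter is missing from the list, the likely cause is that its heading was left in the “Normal” style in Word, which is easier to fix there than in Sigil.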
Calibre – The final piece of software (Windows, OS X, Linux & portable) that you should have on your computer is Calibre. When installed on a computer it will function as both an e-book library and an e-book reader; you can read e-books with it or preview ones that you have been working on. It will also convert e-books between formats so that they can be read on devices other than those for which they were originally published. I use it on my netbook (10″ Acer Aspire One) as well as my home desktop to read e-books, regardless of their format. I’m in the market for an e-reader but haven’t yet decided on which one to buy. Calibre should be installed on every e-book aficionado’s home computer, notebook or netbook. The same goes for Sigil – between the two you have the proper tools to produce and read e-books which are, more and more, becoming the way of the future.
I’ve now opened Calibre and added my new ePub book to its library; it appears at the top of the main library column when it is first added. Columns can be sorted according to your taste – title, author etc.
I’ve double-clicked the entry in the library in order to open it in Calibre’s e-reader. This looks very similar to what you would see in a mobile e-reader. The vast majority of e-readers will support the ePub format; Amazon’s Kindle is an exception but I understand that they are planning to support it soon.
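If an e-reader or library program ever refuses a file that you have produced, it helps to know that an ePub is an ordinary ZIP archive whose first entry is a file named mimetype containing the text application/epub+zip, alongside a META-INF/container.xml file that points to the book’s contents. The Python sketch below checks just those two things. It is a quick sanity check under those assumptions, not a full validation, and the function names are my own.

```python
import io
import zipfile

EXPECTED_MIMETYPE = b"application/epub+zip"


def check_epub_container(source):
    """Return a list of problems found in the ePub's ZIP container (empty if none)."""
    problems = []
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        if not names or names[0] != "mimetype":
            problems.append("mimetype is not the first entry")
        elif archive.read("mimetype") != EXPECTED_MIMETYPE:
            problems.append("mimetype has unexpected content")
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
    return problems


def build_demo_container():
    """Build a tiny in-memory archive with just the two entries checked above."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and stored without compression.
        archive.writestr("mimetype", EXPECTED_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", "<container/>")
    buffer.seek(0)
    return buffer


print(check_epub_container(build_demo_container()))
```

To check a real book, pass the path of your saved file to check_epub_container instead of the demo archive; the demo archive is not a readable book, it only exists to show the check running.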
Sigil is a powerful program and, as such, not something that you will grasp overnight but, with some effort on your part, you should be able to master it sufficiently to convert material into ePub format that can be conveniently consumed, at your leisure, on an e-reader. I collect interesting articles from various sources on the web, dump them into MS Word Docs in a folder called Internet Docs and save them up for the day when I’ll have the time to read them. Once I’ve mastered Sigil, it should be easy enough to condense them into my own e-magazine that can be loaded onto an e-reader (with a nice long battery-life) and, hopefully, read on the beach in Mexico (dream on!!!).
I hope that this introduction to e-book creation software has inspired you to experiment with some of them and to try your hand at creating e-books or e-articles and, perhaps, to purchase an e-book reader in order to access some of the material that has already been assembled. If you are passionate about a particular subject, these software programs can give you the opportunity to rescue cherished older books on the subject from their undeserved obscurity and present them to a new audience who share the same passion but have not had, and are unlikely to have, the opportunity to access the writing and wisdom contained in them.
A good comparison of e-readers on the market can be found at MobileRead – this excellent site also has forums relating to e-readers and all aspects of e-reading (including Sigil & Calibre) to bring you up to speed and to help you deal with the inevitable digital frustrations that will arise as you learn this craft. It is not necessary to create an account in order to read the posts in the forums but, if you wish to ask questions and receive answers from knowledgeable forum participants, then you should create one – it only takes a few minutes and can save a lot of time spent trying to puzzle out the solutions to your digital dilemmas on your own.
Download Sigil and Calibre now and help to launch the e-book revolution.
Update: 06/11/2012: If you have Calibre (or other e-reading software) installed on your computer, or if you own an e-reader and would like to see an excellent collection of authors that can be downloaded in ePub format, then head over to eBooks @ Adelaide (from The University of Adelaide Library in Australia). The formatting and editing on these files (that I have seen) is truly professional and the selection is excellent – an ideal way to stock up on your mobile library.