Making books with XML


When I wrote my first book, I decided to use XML for the book contents. Given my background in academia, I might have chosen LaTeX instead.

But I wanted to be able to generate different output formats. Specifically, I wanted to generate html and pdf.

And I wanted to learn XML.

I chose PSGML and Emacs as my editing tools. I found this combination really powerful, and eventually (after a lot of practice) I could write almost as fast as in a word processor such as MS Word.

In early 2015, I found that my PSGML and Emacs combination did not work as well as before.

I did some troubleshooting, but without success.

I realized that it was perhaps time for something newer. I searched for alternatives, and after a while I found nxml. This turned out to be an interesting alternative, but I had to do some porting work to get started.

I translated my old DTD files to RelaxNG schemas.

I added the following code to my .emacs file:

(require 'nxml-mode)
(add-to-list 'auto-mode-alist '("\\.xml$" . nxml-mode))

(eval-after-load 'rng-loc
  '(add-to-list 'rng-schema-locating-files "~/rng/schemas.xml"))

The file schemas.xml contains references to the actual schemas. As an example, my own schemas.xml file contains a reference to a file called chapter.rnc:

  <documentElement prefix="" localName="chapter" typeId="chapter"/>
  <typeId id="chapter" uri="chapter.rnc"/>

Inside chapter.rnc we find, for example,

    element chapter_intro {
       element para_list {
          element para { external "para_elements.rnc"}*}},

which defines an element called chapter_intro, with an element called para_list inside, which in turn is a list of para elements.

The para element itself is defined in the file para_elements.rnc. As an example of contents from this file, we have the element item_list, which is a list of item elements (which in turn can contain text). The element item_list is part of a mixed element, and its Relax NG definition is

     element item_list {
         element item { text }+ } |

Using Relax NG schemas in this way, I found the actual editing fairly straightforward. Emacs highlights (in red) the parts of the XML file that do not comply with the schema. This is very helpful when editing, and it is a feature I did not have when I used the PSGML and Emacs combination.

In PSGML, I used quite a few editing commands (which had to be learned, and practiced, to get up to speed). When using Relax NG schemas, I use only the command for completion.

The completion command is bound to Alt-Tab by default. As we know, Alt-Tab has another standard meaning: it switches focus to another program. It turns out, however, that completion can still be reached from the keyboard.

On Mac, I could use Alt-Tab as is, since program switching on a Mac uses the Command key (Command-Tab) rather than Alt.

On Linux, I could use Esc Tab (the Escape key followed by Tab; the two keys are pressed in sequence, and Escape must be released before Tab is pressed). This is not so surprising, since it is the alternative (old) way to enter Alt-commands in Emacs.

In Emacs, Alt is often referred to as the Meta key. A Meta key combination can be entered either by holding down Alt while pressing another key, or by pressing and releasing Escape before pressing the next key (Tab in this case).

The net result was that I could continue writing books in XML.

You can see some examples of generated files here:

  • Into Computers, a book about creating your own computer, in html
  • Into Programming, a book about programming, in pdf (Python) and pdf (C)
  • Into Embedded, a book about embedded systems, in epub

Books with Software


Books about software are best enjoyed with software.

In a book with views, there may be different pieces of software associated with the different views.

However, a reader may still like to enjoy one software package, covering all the views, for a given book.

This approach has been taken for the books produced here. As a first example, a software package for the book Into Embedded has been created.

The software package can be downloaded as a zip-file from the Book Software page.

But wait, there’s more!

As you might have guessed, the actual content of the books is generated, automatically, from some kind of template. I use XML for the template and Python for the book generation program.

And since a computer program processes the XML, and generates html for the web and epub or mobi for the e-book variants, we can take advantage of this and let the computer program perform other tasks as well.
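To make the idea concrete, here is a minimal Python sketch of turning a few book elements into html. It uses the element names chapter_intro, para_list, para, item_list and item from the schemas shown earlier. The mapping to html tags is my own illustrative choice, including div for para, since a list may not appear inside an html p element. It is not the actual book generator.

```python
import xml.etree.ElementTree as ET
from html import escape

# Illustrative mapping from book elements to html tags (an assumption).
# para becomes a div because a list is not allowed inside an html <p>.
TAG_MAP = {"para": "div", "item_list": "ul", "item": "li"}

def to_html(elem: ET.Element) -> str:
    """Recursively render an element; unmapped elements only contribute their content."""
    tag = TAG_MAP.get(elem.tag)
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_html(child)
        # Text after a child element (its "tail") belongs to the parent.
        inner += escape(child.tail or "")
    return f"<{tag}>{inner}</{tag}>" if tag else inner

source = """<chapter_intro><para_list>
<para>Some text.<item_list><item>first</item><item>second</item></item_list></para>
</para_list></chapter_intro>"""
print(to_html(ET.fromstring(source)))
```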

As an example, all figures showing program code are extracted from real programs. The source code of each program from which a figure is to be extracted is annotated with markup showing where each figure starts and ends. The result can be seen, for example, in Figure 2, which shows startup code for a processor.

The corresponding source file for this figure is found in the file startup.s.
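The figure extraction could work roughly as in the following sketch. The marker words FIGURE_START and FIGURE_END, and the function name extract_figures, are hypothetical, because the post does not show the actual annotation syntax. Only the marker text is matched, so the markers can sit in any comment style. An example is @ comments in ARM assembly.

```python
import re

# Hypothetical marker syntax; only the marker text is matched, so any comment
# style works (e.g. "@" comments in ARM assembly, "//" in C).
START = re.compile(r"FIGURE_START\s+(\S+)")
END = re.compile(r"FIGURE_END\s+(\S+)")

def extract_figures(source: str) -> dict[str, str]:
    """Return a mapping from figure name to the code lines between its markers."""
    figures: dict[str, list[str]] = {}
    current = None
    for line in source.splitlines():
        if m := START.search(line):
            current = m.group(1)
            figures[current] = []
        elif m := END.search(line):
            if m.group(1) != current:
                raise ValueError(f"unexpected end marker for {m.group(1)}")
            current = None
        elif current is not None:
            figures[current].append(line)
    if current is not None:
        raise ValueError(f"missing end marker for {current}")
    return {name: "\n".join(lines) for name, lines in figures.items()}

startup = """    .text
@ FIGURE_START reset
_start:
    ldr sp, =stack_top
    bl main
@ FIGURE_END reset
hang: b hang
"""
print(extract_figures(startup)["reset"])
```

The sketch does not handle nested or overlapping figures. A real generator might need that.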

While the book-generating program processes the XML, it can also make a list of all the software files needed. Such a list is shown on the Book Software page. You might note that the list contains links to figures within the book and also to the actual software files used.
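Such a list can be collected in a similar way while walking the XML. In this sketch, the element name code_figure and its attributes id and file are hypothetical. They stand in for whatever markup the book uses to reference a source file.

```python
import xml.etree.ElementTree as ET

def software_files(book_xml: str) -> dict[str, list[str]]:
    """Map each referenced source file to the ids of the figures taken from it."""
    root = ET.fromstring(book_xml)
    files: dict[str, list[str]] = {}
    # iter() walks the whole tree, so code figures may appear at any depth.
    for fig in root.iter("code_figure"):  # hypothetical element name
        files.setdefault(fig.get("file"), []).append(fig.get("id"))
    return files

example = """<book>
  <chapter><code_figure id="fig2" file="startup.s"/></chapter>
  <chapter><code_figure id="fig5" file="main.c"/>
           <code_figure id="fig6" file="startup.s"/></chapter>
</book>"""
print(software_files(example))
```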

The software can be tried out: it can be built and executed. This requires some tools, such as a compiler and a linker, as well as a simulator (QEMU).

The software covers the Bare Metal chapter of the book, and it shows how to make a small program that can run entirely on its own, i.e. it can run without the help of an operating system.

Here is a direct download link.

And yes, there is a README file inside (with a HOWTO for building the software, and with hints on how to download and install the necessary tools).

Have fun,

The bookmaker (a.k.a. Ola)

Three (3) books about programming? – Nope, there’s only one!


The first book is about C. It starts with a description of a classical “Hello, world”-program, and briefly describes how such a program can be compiled, linked, and run. It proceeds with a description of variables and values, how values can be assigned to variables, and how variables can be combined into expressions and computations, executed in sequence. Then, the concepts of alternative and iteration are discussed, and illustrated using if-statements, for-statements, and while-statements. The combination of actions into larger pieces, called functions, is then discussed. Then, structured datatypes are treated, starting with arrays and lists, followed by the use of struct for creating data structures. The book concludes with an overview of the functionality available in C for performing mathematical and logical computations, and how the C standard library can be used for this purpose.

The second book is about Java. It starts with a description of a classical “Hello, world”-program, and briefly describes how such a program can be compiled and run. It proceeds with a description of variables and values, how values can be assigned to variables, and how variables can be combined into expressions and computations, executed in sequence. Then, the concepts of alternative and iteration are discussed, and illustrated using if-statements, for-statements, and while-statements. The combination of actions into larger pieces, called methods, is then discussed. Then, structured datatypes are treated, starting with arrays and lists, followed by the use of objects for creating inheritable data structures with associated behavior. The book concludes with an overview of the functionality available in Java for performing mathematical and logical computations, and how the Java class library can be used for this purpose.

The third book is about Python. It starts with a description of a classical “Hello, world”-program, and briefly describes how such a program can be run. It proceeds with a description of variables and values, how values can be assigned to variables, and how variables can be combined into expressions and computations, executed in sequence. Then, the concepts of alternative and iteration are discussed, and illustrated using if-statements, for-statements, and while-statements. The combination of actions into larger pieces, called functions, is then discussed. Then, structured datatypes are treated, starting with arrays and lists, followed by the use of objects for creating inheritable data structures with associated behavior. The book concludes with an overview of the functionality available in Python for performing mathematical and logical computations, and how the Python standard library can be used for this purpose.


“But”, you might ask, “did you really have to write three books?”

“No, actually not”, I might answer, indicating with a little smile that I might know something you are not yet aware of.

“In fact, I only create one book.”

“???”, you might think, wondering what I am talking about.

“You see”, I would say, “I use this system called Books with Views, making it possible for me to write only one book, but it is a book containing more than one topic”.

“Hmm, that sounds interesting. Tell me more!”, you might say.

“I write the common parts for all books, and then I write the specific parts separately for each view.”

“Aha. But how can I read such a thing? – it seems to have more than one dimension!”

“It is easy: you can read it on the web, using links inside the book to change view. You can also read it in your e-book reader, as epub (e.g. for iBooks) or as mobi (for the Kindle).”

“I will try it right now!”, you now say, “and don’t forget to inform me about the release date”. Then you start your favourite browser or your favourite e-book reader, eager to see what this is all about.

After a while, you might say to yourself: “Interesting, and whatever the price it certainly looks like a bargain. I get three books for the price of one!”
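In XML, the idea of writing common parts once and view-specific parts separately could be expressed by tagging view-specific elements with an attribute. The sketch below assumes a hypothetical view attribute, where elements without it are common to all views. It produces the tree for a single view. It is an illustration only, not the actual Books with Views implementation.

```python
import copy
import xml.etree.ElementTree as ET

def select_view(root: ET.Element, view: str) -> ET.Element:
    """Return a copy of the tree keeping common elements and those for `view`."""
    result = copy.deepcopy(root)
    # Snapshot the elements first, so removals do not disturb the iteration.
    for parent in list(result.iter()):
        for child in list(parent):
            # Hypothetical convention: no view attribute means "common to all views".
            if child.get("view") not in (None, view):
                parent.remove(child)
    return result

book = ET.fromstring(
    '<section><para>Print a greeting.</para>'
    '<code view="c">printf("Hello, world\\n");</code>'
    '<code view="java">System.out.println("Hello, world");</code>'
    '</section>'
)
c_view = select_view(book, "c")
print(ET.tostring(c_view, encoding="unicode"))
```

Working on a deep copy leaves the full multi-view tree untouched, so the same source can be filtered once per view.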

Linking in and out of a book

A recent article, EXTRA ETHER: eBooks Gone in 5 Years?, brought to my attention in a tweet from Joanna Penn, discussed the possibility of linking in to an e-book. In a Book with Views this is possible. The reason is not the views themselves. The book is written in a common format, which is then used as the base for generating the different book versions: a web version, e-book versions in epub and mobi format, and versions suitable for print, one for each of the views.

It is possible to link in to a book, for example to a chapter, a section, but also to a specific figure, and to pages collecting meta-information, e.g. the page collecting all URLs used in the book.

Of course it is always possible to link out of a book. This helps the reader probe further, a point also made recently by Seth Godin in What do you do when they don’t understand?

If a common format is used, software can help the author keep hyperlinks consistent between the different book formats. It can also create meta-information automatically. Examples are the page showing all the URLs, mentioned above, and a page listing all concepts defined and used in a book.
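As an example of such generated meta-information, here is a sketch that collects every distinct URL in a book, in order of first appearance. It then builds an html list for a URL page. The link element with a url attribute is assumed markup and may differ from what the books actually use.

```python
import xml.etree.ElementTree as ET
from html import escape

def url_page(book_xml: str) -> str:
    """Build an html list of every distinct URL, in order of first appearance."""
    root = ET.fromstring(book_xml)
    seen: list[str] = []
    for link in root.iter("link"):  # hypothetical element name
        url = link.get("url")
        if url and url not in seen:
            seen.append(url)
    items = "".join(f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in seen)
    return f"<ul>{items}</ul>"
```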

Connecting the views

In the book drafts created so far, there are links between the views inside the books, placed at the end of each section. There are also links between the views on the about pages, e.g. on the about page showing the URLs used in a book. In the web versions of the books, there are also links in the sidebar of each page.

Recently I added links between the views directly after figures showing code. As an example of this newer feature, take a look at Figure 1 in Into Programming, where you can see the links to the other views.

The links work as illustrated in this figure. It shows some text and some code from a program written in C in one view and in Java in another view.

(Figure: view illustration)

As you can see in the above figure, parts of the text are common to both programs, while other parts differ. The code, of course, is different, since that is the whole point of having these views, with one view for each programming language!

The connections between the views are shown in the figure as dashed red arrows, corresponding to the hyperlinks you will find inside the book.
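One way to generate such hyperlinks is to give each section the same id in all views. A link to another view then only needs to swap the view part of the address. The URL layout below (base/view/section.html) and the function name view_links are assumptions made for illustration.

```python
def view_links(section_id: str, current: str, views: list[str],
               base: str = "/into_programming") -> list[tuple[str, str]]:
    """Return (view, href) pairs pointing to the same section in the other views."""
    # Assumed URL layout: <base>/<view>/<section_id>.html
    return [(v, f"{base}/{v}/{section_id}.html") for v in views if v != current]

for view, href in view_links("hello_world", "c", ["c", "java", "python"]):
    print(f'<a href="{href}">{view}</a>')
```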

Into Embedded – another book with views

I started a new book. It is called Into Embedded, and it is about embedded systems. It has two views (to start with), representing two processor architectures – 32-bit Intel-x86 is one view and ARM is another view.

I wanted to test the concept of books with views on one more book. I also wanted to start replacing an old book called Realtidsprogrammering (Swedish for real-time programming, written in Swedish and no longer available in print) with a newer version. And of course it gave me a chance to procrastinate (I recently read Turning Pro, so there is hope for a better situation) and to find excuses for not creating more content for Into Programming.

You can also have a look at the book (a draft in terms of content, but in fairly good shape in terms of form) in epub and mobi format, using these links:

Preparing for print

A book with views may be somewhat difficult to print. One idea is to print several books, one for each view, using pdf as the print format. I tried this for the book Into Programming, generating one pdf from LaTeX for each view (C, Java, Python).

The formatting was done using tufte-latex, which produces a rather nice layout, using typesetting inspired by the works of Edward Tufte.
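Generating one LaTeX source per view could look roughly like the sketch below. tufte-book is one of the document classes that tufte-latex provides. The function latex_document, the view keys and the placeholder chapter body are hypothetical. The actual generator most likely emits much more.

```python
def latex_document(title: str, view_name: str, body: str) -> str:
    """Wrap the LaTeX body generated for one view in a tufte-book document."""
    return "\n".join([
        r"\documentclass{tufte-book}",
        rf"\title{{{title} ({view_name})}}",
        r"\begin{document}",
        r"\maketitle",
        body,
        r"\end{document}",
    ])

# Hypothetical view keys and a placeholder body; one LaTeX source per view.
VIEWS = {"c": "C", "java": "Java", "python": "Python"}
docs = {key: latex_document("Into Programming", name, r"\chapter{Hello, world}")
        for key, name in VIEWS.items()}
```

Each string could then be written to its own .tex file and run through pdflatex, giving one pdf per view.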

You can see the results using the links in the list below. Note that these are works in progress: the book contents are added incrementally, while I try to figure out how to produce the actual books (web version, epub and mobi versions, and now also pdf):

Links to web-version and e-book versions are found on the Books page.

Doing the Math

Sometimes it is useful to have equations in an e-book. This is obvious for a book on mathematics, but it can apply to other books too. I made an attempt to include equations in the book Into Programming.

For the web version of the book, I used MathJax, a JavaScript library that lets you write equations in LaTeX notation. As an example, the following html code

$$
a_n = a \prod_{i = 1}^n \left (1 + \frac{p_i}{100} \right )
\quad \quad \quad (5)
$$

can be used (together with the appropriate script tag to include MathJax) to create an equation. You can see the equation as equation number 5 on the About page for equations in the book Into Programming.
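As a worked example of equation (5), take a as a starting amount and p_i as the percentage change in step i. This is my interpretation, since the post does not define the symbols. Under that reading, a_n is the amount after n steps, and it can be computed directly:

```python
from math import prod

def grow(a: float, p: list[float]) -> float:
    """Equation (5): a_n = a * prod over i = 1..n of (1 + p_i / 100)."""
    return a * prod(1 + p_i / 100 for p_i in p)

# Start at 1000, then changes of +5 %, +2 % and +10 % (n = 3).
print(grow(1000, [5, 2, 10]))  # close to 1178.1
```

With an empty list the product is empty (equal to 1), so a_0 = a, as expected.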

For the epub and mobi versions of the book I used images, since I did not expect typical e-book reader software of today to handle MathJax yet. The images are created from LaTeX, using notation similar to that used with MathJax, and each equation is put into its own LaTeX file, e.g. like this:

\documentclass[b5paper, 9pt]{memoir}
\usepackage{amsmath}
\pagestyle{empty}
\begin{document}
\begin{displaymath}
a_n = a \prod_{i = 1}^n \left (1 + \frac{p_i}{100} \right )
\quad (5)
\end{displaymath}
\end{document}

Then, images can be created, one for each equation, using LaTeX and the tool dvipng.
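The per-equation pipeline could be scripted roughly as follows. The options used are standard options of the tools: -interaction=nonstopmode for latex, and for dvipng, -T tight to crop the image, -D for the resolution and -o for the output name. The function names, the 150 dpi resolution and the file naming are assumptions, not the author's actual script.

```python
import subprocess
from pathlib import Path

# Same preamble as the LaTeX example above; %s is replaced by the equation.
TEMPLATE = r"""\documentclass[b5paper, 9pt]{memoir}
\usepackage{amsmath}
\pagestyle{empty}
\begin{document}
\begin{displaymath}
%s
\end{displaymath}
\end{document}
"""

def equation_commands(name: str) -> list[list[str]]:
    """Commands that turn name.tex into name.dvi and then a cropped name.png."""
    return [
        ["latex", "-interaction=nonstopmode", f"{name}.tex"],
        ["dvipng", "-T", "tight", "-D", "150", "-o", f"{name}.png", f"{name}.dvi"],
    ]

def render_equation(name: str, latex_body: str, workdir: Path) -> Path:
    """Write one equation to its own LaTeX file and convert it to a png image."""
    (workdir / f"{name}.tex").write_text(TEMPLATE % latex_body)
    for cmd in equation_commands(name):
        subprocess.run(cmd, cwd=workdir, check=True)  # stop on the first failure
    return workdir / f"{name}.png"
```

A call such as render_equation("eq5", r"a_n = a \prod_{i = 1}^n \left (1 + \frac{p_i}{100} \right ) \quad (5)", Path(".")) would then produce eq5.png, provided latex and dvipng are installed.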

You can see the result in the e-books, in the chapter called Again and Again. The books are found, as epub and mobi, on the Books page.

Update on epub-readers

Here is a summary of the current status regarding epub-readers suitable for the book Into Programming.

The following readers have been tested OK on an iPad 2:

  • The Bluefire reader. In contrast to what was said in an earlier post, external links are now supported in Bluefire for iPad, resulting in a quite nice reading experience.
  • The iBooks app.

For the iPhone, the book opened fine in the Stanza reader, but unfortunately not in iBooks for iPhone.

The following readers have been tested OK on an Android phone:

  • The Aldiko reader.
  • A reader app called just that, Reader. It was pre-installed on my HTC phone, and I am not sure if it can be found in the Android Market (nowadays known as Google Play).

Links to the book are found on the Books page.

Fixing an epub problem – it now works also in iBooks

In my first attempt, the epub version of the book Into Programming did not open in iBooks on the iPad.

After some debugging I found the problem, and it is now fixed. It turned out that I had an extra blank line in the mimetype file, and that caused the problem.

It was a little bit tricky, since the book opened fine in other readers, and it was also considered OK when I validated it in Sigil.
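Since the problem was in the mimetype file, here is a sketch of writing an epub container with Python's zipfile module so that this file comes out right. The EPUB container format expects mimetype to be the first entry in the zip archive. It must be stored uncompressed and contain exactly application/epub+zip. A trailing newline or blank line is the kind of deviation that some readers reject. The function name write_epub is hypothetical.

```python
import zipfile

def write_epub(target, files: dict[str, bytes]) -> None:
    """Write an epub container whose first entry is an uncompressed mimetype."""
    with zipfile.ZipFile(target, "w") as epub:
        # Exactly these 20 bytes: no trailing newline or blank line.
        epub.writestr("mimetype", b"application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        # The remaining files (container.xml, OPF, XHTML, images) may be compressed.
        for name, data in files.items():
            epub.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```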

Links to the book, in its different formats, are now collected on this blog’s Books page.