Making an XML Document

To create DocBook documents in XML, you'll need an XML version of DocBook. We've included one on the CD, but it hasn't been officially adopted by the OASIS DocBook Technical Committee yet. If you're interested in the technical details, Appendix B describes the specific differences between the SGML and XML versions of DocBook.

XML, like SGML, requires a specific prologue in your document. The following sections describe the features of the XML prologue.

An XML Declaration

XML documents should begin with an XML declaration. Unlike the SGML declaration, which is a grab bag of features, the XML declaration identifies a few simple aspects of the document:

<?xml version="1.0" standalone="no"?>

Identifying the version of XML ensures that future changes to the XML specification will not alter the semantics of this document. The standalone declaration simply makes explicit the fact that this document cannot “stand alone,” and that it relies on an external DTD. The complete details of the XML declaration are described in the XML specification.
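To see what a parser actually reads from the declaration, here is a minimal sketch using Python's standard expat binding. The helper name read_xml_declaration is ours, chosen for illustration; it is not part of DocBook or of any DocBook tool.

```python
import xml.parsers.expat


def read_xml_declaration(text):
    """Return (version, encoding, standalone) as reported by expat."""
    seen = {}

    def on_decl(version, encoding, standalone):
        # expat reports standalone as 1 for "yes", 0 for "no",
        # and -1 when the declaration omits it.
        seen["decl"] = (version, encoding, standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(text, True)
    return seen.get("decl")


decl = read_xml_declaration('<?xml version="1.0" standalone="no"?><book/>')
print(decl)
```

The first item of the result is the version string and the last is the standalone flag, which is 0 here because the declaration says standalone="no".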

A Document Type Declaration

Strictly speaking, XML documents don't require a DTD. Realistically, DocBook XML documents will have one.

The document type declaration identifies the DTD that will be used by the document and what the root element of the document will be. A typical doctype declaration for a DocBook document looks like this:

<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
                         "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd">

This declaration indicates that the root element will be book and that the DTD used will be the one identified by the public identifier -//Norman Walsh//DTD DocBk XML V3.1.4//EN. External declarations in XML must include a system identifier (the public identifier is optional). In this example, the DTD is stored on a web server.
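As an illustration, the three pieces of the declaration (root element name, public identifier, and system identifier) can be read back with Python's standard xml.dom.minidom module. This is a sketch, not a validation step: minidom does not fetch or check the DTD, it only records what the declaration says.

```python
from xml.dom import minidom

SOURCE = """<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
                         "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd">
<book/>"""

doc = minidom.parseString(SOURCE)
doctype = doc.doctype

# The name in the declaration must match the root element.
print("root element:", doctype.name)
print("public id:   ", doctype.publicId)
print("system id:   ", doctype.systemId)
```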

System identifiers in XML must be URIs. Many systems may accept filenames and interpret them locally as file: URLs, but it's always correct to fully qualify them.
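For example, a local filename can be qualified as a file: URL before it is used as a system identifier. The sketch below assumes an absolute POSIX-style path; the path shown and the helper name to_file_uri are hypothetical.

```python
from urllib.parse import quote


def to_file_uri(absolute_path):
    """Turn an absolute POSIX-style path into a file: URI."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute path such as /dir/file.dtd")
    # quote() percent-encodes characters such as spaces but leaves "/" intact.
    return "file://" + quote(absolute_path)


# Hypothetical location of a local copy of the DTD.
system_id = to_file_uri("/usr/local/share/docbook/xml/db3xml.dtd")
print(system_id)  # file:///usr/local/share/docbook/xml/db3xml.dtd
```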

An Internal Subset

It's also possible to provide additional declarations in a document by placing them in the document type declaration:

<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
                         "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
]>

These declarations form what is known as the internal subset. The declarations stored in the file referenced by the public or system identifier in the DOCTYPE declaration are called the external subset, which is technically optional. It is legal to put the DTD in the internal subset and to have no external subset, but for a DTD as large as DocBook, that would make very little sense.

Note

The internal subset is parsed first in XML and, if multiple declarations for an entity occur, the first declaration is used. As a result, declarations in the internal subset override declarations in the external subset.
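The first-declaration rule can be observed with a small sketch using Python's standard xml.etree.ElementTree module. The second declaration of nwalsh is invented for the demonstration, and both declarations sit in the internal subset because this parser does not load an external DTD.

```python
import xml.etree.ElementTree as ET

SOURCE = """<?xml version='1.0'?>
<!DOCTYPE book [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY nwalsh "A later declaration that is ignored">
]>
<book>&nwalsh;</book>"""

root = ET.fromstring(SOURCE)

# The reference expands to the replacement text of the first declaration.
print(root.text)
```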

The Document (or Root) Element

Although comments and processing instructions may occur between the document type declaration and the root element, the root element usually immediately follows the document type declaration:

<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
                         "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
]>
<book>...</book>

The important point is that the root element must be physically present in the document itself, after the document type declaration. You cannot place the root element of the document in an external entity.
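As a rough check that comments and processing instructions are allowed before the root element, the following sketch lists the top-level nodes of a small document with Python's standard xml.dom.minidom module. The comment text and the processing instruction target example-pi are made up for the example.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

SOURCE = """<?xml version='1.0'?>
<!DOCTYPE book>
<!-- a comment before the root element -->
<?example-pi some data?>
<book/>"""

doc = minidom.parseString(SOURCE)

# Collect the node types that appear at the top level of the document.
kinds = [node.nodeType for node in doc.childNodes]

print(Node.COMMENT_NODE in kinds)                 # the comment was accepted
print(Node.PROCESSING_INSTRUCTION_NODE in kinds)  # so was the PI
print(doc.documentElement.tagName)                # the root element follows them
```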

Typing an XML Document

If you are entering XML using a text editor such as Emacs or vi, there are a few things to keep in mind. Using a structured text editor designed for XML hides most of these issues.

XML and SGML Markup Considerations in This Book

Conceptually, almost everything in this book applies equally to SGML and XML. But because DocBook V3.1 is an SGML DTD, we naturally tend to use SGML conventions in our writing. If you're primarily interested in XML, there are just a few small details to keep in mind.

For a more detailed discussion of DocBook and XML, see Appendix B.