Introduction to DTD

Introduction

The Document Type Definition (DTD) describes the vocabulary and structure of an XML document. The vocabulary is the set of elements and attributes that are valid in the document. The structure is how those elements and attributes can be combined with each other.

This page is part of a series of reference blog entries on XML and related technologies.

DTD Declarations

There are two ways to associate a DTD with an XML document: internally or externally.

Internal DTD

An internal DTD is declared inside the XML document itself, between square brackets in the <!DOCTYPE> declaration. Internal DTDs are not very common, as they make the XML more verbose and cannot be reused by other documents. They are, however, useful during development.
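A minimal sketch of an internal DTD: the declarations sit between [ and ] inside the <!DOCTYPE> declaration, directly above the content they describe. The note, to and message element names are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Internal DTD: the declarations live inside the DOCTYPE, between [ and ] -->
<!DOCTYPE note [
  <!ELEMENT note (to, message)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT message (#PCDATA)>
]>
<note>
  <to>Alice</to>
  <message>The DTD travels with this document.</message>
</note>
```

The name that follows <!DOCTYPE (note in this sketch) must match the root element for the document to be valid.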

External DTD

An external DTD lives outside the XML document, in a separate file such as simple.dtd. The XML document uses the <!DOCTYPE> declaration to associate itself with that file.
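A hypothetical example of what simple.dtd could contain, followed by an XML document that references it through a system identifier. The element names are assumptions for illustration, not the contents of any particular file.

```xml
<!-- simple.dtd (hypothetical contents) -->
<!ELEMENT note (to, message)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT message (#PCDATA)>
```

The referencing document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- SYSTEM is followed by the URI of the external DTD,
     here a path relative to this document -->
<!DOCTYPE note SYSTEM "simple.dtd">
<note>
  <to>Alice</to>
  <message>The DTD lives in simple.dtd.</message>
</note>
```

Because simple.dtd is given as a relative URI, the parser resolves it relative to the location of the document that declares it.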

System Identifiers and Public Identifiers

There are two types of external references: system identifiers and public identifiers.

A System Identifier locates the DTD in an external file, either on the local file system or on the Internet. It has two parts: the keyword SYSTEM and a URI pointing to the location of the DTD.

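A Public Identifier locates the DTD through a catalogue, which maps public identifiers to DTD locations (typically local copies). It is made of the keyword PUBLIC followed by a Formal Public Identifier (FPI). An FPI is a string of fields separated by double slashes (//): a prefix (usually - for an unregistered owner or + for a registered one), the name of the owner, the type of document followed by a description, and a language code. For example, in "-//W3C//DTD XHTML 1.0 Strict//EN" the owner is W3C, the document is the DTD for XHTML 1.0 Strict, and the language is English.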

In XML, a public identifier in a <!DOCTYPE> declaration must always be followed by a system identifier. If the parser cannot resolve the public identifier through a catalogue, it falls back to the URI given in the system identifier.
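As a real-world illustration, XHTML 1.0 Strict documents use the document type declaration below: the keyword PUBLIC, then the FPI, then the system identifier that serves as the fallback URL. The title and paragraph text are placeholder content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- PUBLIC, then the FPI (- unregistered owner, W3C owner,
     DTD XHTML 1.0 Strict class and description, EN language),
     then the system identifier used when no catalogue entry is found -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Example</title>
  </head>
  <body>
    <p>Hello</p>
  </body>
</html>
```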

DTD Contents

A DTD is made up of three basic kinds of declarations:

  • Element declarations
  • Attribute declarations
  • Entity declarations

Element Declarations

An element declaration has three parts:

  • Keyword ELEMENT
  • Element name
  • Content type

The list below summarises the different types of element declarations:

  • An element with one child: <!ELEMENT employee (name)>
  • Sequence. Several children that must all appear, in the order given: <!ELEMENT employee (name, age, join-date)>
  • Choice. The element must contain either one child or the other, but not both: <!ELEMENT employee (name | passport-number)>
  • Combining sequences and choices: <!ELEMENT employee (name | (first-name, last-name))>
  • Text content: <!ELEMENT name (#PCDATA)>
  • Empty element: <!ELEMENT contact EMPTY>
  • Optional child (zero or one time): <!ELEMENT employee (name, comment?)>
  • Zero or more times: <!ELEMENT employee (name, contact*)>
  • One or more times: <!ELEMENT employee (name, phone+)> or, applied to a group, <!ELEMENT persons (adult | child)+>
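The sketch below combines several of these content models in one internal DTD, together with an instance document that is valid against it. The staff root element and the sample values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE staff [
  <!-- one or more employee children -->
  <!ELEMENT staff (employee+)>
  <!-- a name OR a first-name/last-name pair, then one or more phones,
       then an optional comment -->
  <!ELEMENT employee ((name | (first-name, last-name)), phone+, comment?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT first-name (#PCDATA)>
  <!ELEMENT last-name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT comment (#PCDATA)>
]>
<staff>
  <employee>
    <name>Jane Smith</name>
    <phone>555-0100</phone>
  </employee>
  <employee>
    <first-name>John</first-name>
    <last-name>Doe</last-name>
    <phone>555-0101</phone>
    <phone>555-0102</phone>
    <comment>Part-time</comment>
  </employee>
</staff>
```

Removing every phone element from an employee would make the document invalid, because phone+ requires at least one occurrence.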

Attribute Declarations

An attribute declaration has five parts:

  • Keyword ATTLIST
  • Name of the element the attribute belongs to
  • Attribute name
  • Attribute type
  • Default declaration (#REQUIRED, #IMPLIED, #FIXED or a default value)

The list below summarises the different types of attribute declarations:

  • Optional attribute: <!ATTLIST employee employeeId CDATA #IMPLIED>
  • Required attribute: <!ATTLIST contact phone CDATA #REQUIRED>
  • Enumerated attribute values: <!ATTLIST contact contact-type (Home | Work) #IMPLIED>
  • Default attribute value: <!ATTLIST contact contact-type (Home | Work) "Work">
  • Fixed attribute value: <!ATTLIST xml version CDATA #FIXED "1.0">
  • Multiple attributes in the same declaration: <!ATTLIST contact phone CDATA #REQUIRED contact-type (Home | Work) "Work">
  • Multiple attributes in separate declarations: <!ATTLIST contact phone CDATA #REQUIRED> <!ATTLIST contact contact-type (Home | Work) "Work">
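A minimal sketch that applies several of these attribute declarations in one internal DTD. The contacts root element, the label attribute and the sample values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact*)>
  <!-- version is fixed: if present it must be "1.0", and it defaults to "1.0" -->
  <!ATTLIST contacts version CDATA #FIXED "1.0">
  <!ELEMENT contact EMPTY>
  <!-- several attributes declared in a single ATTLIST -->
  <!ATTLIST contact
    phone        CDATA         #REQUIRED
    contact-type (Home | Work) "Work"
    label        CDATA         #IMPLIED>
]>
<contacts>
  <!-- contact-type is omitted, so the parser supplies the default "Work" -->
  <contact phone="555-0100"/>
  <!-- an explicit value overrides the default; label is optional -->
  <contact phone="555-0101" contact-type="Home" label="Mobile"/>
</contacts>
```

Leaving out the phone attribute on either contact would make the document invalid, since phone is declared #REQUIRED.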

Entity Reference Declarations

An entity reference is an abbreviation that the XML parser replaces with its value when it processes the XML document. An entity reference starts with an ampersand, followed by the entity name, and ends with a semicolon (e.g. &amp;).

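There are five predefined (built-in) entity references:

  • &amp; for the ampersand (&)
  • &lt; for the less-than sign (<)
  • &gt; for the greater-than sign (>)
  • &quot; for the double quote (")
  • &apos; for the apostrophe (')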
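A short illustration: text that contains markup characters escapes them with the predefined entity references, which need no declaration in a DTD. The condition element and its text are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- < and & must be escaped in text; >, " and ' may be escaped but usually need not be -->
<condition>if (a &lt; b &amp;&amp; c &gt; d) print(&quot;It&apos;s true&quot;)</condition>
```

A parser reports the text content of condition as: if (a < b && c > d) print("It's true")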

We can also define our own entities in the DTD using the ENTITY declaration:

  • Non-breaking space character: <!ENTITY nbsp "&#xA0;">
  • Copyright sign (©): <!ENTITY cr "&#xA9;">
  • Copyright footer: <!ENTITY copyright "&#xA9; All rights reserved">
  • Contents of an external file: <!ENTITY file-contents SYSTEM "file.txt">
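A minimal sketch that declares two of these entities in an internal DTD and references them in the document body. The page and footer elements and the "Example Ltd." text are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page [
  <!ELEMENT page (footer)>
  <!ELEMENT footer (#PCDATA)>
  <!-- the replacement text is a character reference to U+00A0 -->
  <!ENTITY nbsp "&#xA0;">
  <!-- replacement text mixing a character reference and plain text -->
  <!ENTITY copyright "&#xA9; All rights reserved">
]>
<page>
  <!-- &nbsp; becomes U+00A0 and &copyright; becomes "© All rights reserved" -->
  <footer>Example&nbsp;Ltd. &copyright;</footer>
</page>
```

The external file-contents entity is left out of this sketch because its replacement text would have to be read from file.txt, which must be available at that location.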



Eduard Manas

Eduard is a senior IT consultant with over 15 years in the financial sector. He is an experienced developer in Java, C#, Python, WordPress, Tibco EMS/RV, Oracle, Sybase and MySQL. Outside of work, he likes spending time with family, friends, and watching football.

