Introduction to XML Schema

Introduction

An XML Schema defines the structure and vocabulary of an XML document. The W3C developed XML Schema to create a more powerful and expressive XML validation language than DTD.

This page is part of a series of reference blog entries on XML and related technologies.

XML Schema vs DTD

These are the benefits of XML Schemas over DTD:

  • They are written in XML, unlike DTDs, which use a separate, non-XML syntax.
  • They support namespace-aware elements and attribute declarations.
  • They support simple and complex data types.
  • They support text validation based on built-in and user-defined data types.
  • They provide type derivation and inheritance.
  • They provide finer-grained element multiplicity constraints.
  • They are more precise and expressive than DTD.

A brief XML Schema example

This section explores how to associate an XML Schema with an XML document.

This is our target XML file:
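As an illustration, here is a minimal sketch of such a file. The <person> vocabulary, the namespace http://example.com/people and the schema file name people.xsd are all hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document: people.xsd is an assumed schema file name -->
<person xmlns="http://example.com/people"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://example.com/people people.xsd"
        id="p1">
  <name>Alice</name>
  <age>30</age>
</person>
```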

And this is the XML Schema declaration:
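A minimal sketch of such a schema, assuming a hypothetical <person> vocabulary in the namespace http://example.com/people, stored in a file called people.xsd:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical people.xsd: declares <person> in the http://example.com/people namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/people"
           elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
      </xs:sequence>
      <!-- Attribute declared with <xs:attribute>; optional by default -->
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```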

The XML Schema declaration starts with the <schema> tag, and uses <element> and <attribute> tags to declare elements and attributes respectively. The targetNamespace attribute specifies the namespace of the vocabulary that the XML Schema defines.

The XML document is associated with the XML Schema using the xsi:schemaLocation attribute.

XML Schema Declaration

The XML document that an XML Schema describes is called an instance document. The vocabulary and grammar being defined is called the target vocabulary. An XML document is schema-valid if it satisfies all the constraints in the XML Schema.

There are two ways to associate an XML document with an XML Schema:

  • xsi:noNamespaceSchemaLocation attribute – the target vocabulary is not part of any namespace
  • xsi:schemaLocation attribute – the target vocabulary is part of a namespace

XML Schema declaration without target namespace

The XML document below is associated with an XML Schema without any target namespace.

First, it declares the XML Schema instance namespace, http://www.w3.org/2001/XMLSchema-instance, which is conventionally bound to the xsi prefix.

Then, it uses the xsi:noNamespaceSchemaLocation attribute to point to the XML Schema that validates the target vocabulary.
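A minimal sketch of such an instance document, assuming a hypothetical <person> vocabulary with <name> and <age> children:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document: the elements belong to no namespace -->
<person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="simple_no_ns.xsd">
  <name>Alice</name>
  <age>30</age>
</person>
```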

The simple_no_ns.xsd file has an <xs:schema> root element that declares the XML Schema namespace (http://www.w3.org/2001/XMLSchema) but no target namespace for the target vocabulary:
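A minimal sketch of what simple_no_ns.xsd could contain, assuming the same hypothetical <person>, <name> and <age> elements. Note the absence of a targetNamespace attribute:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of simple_no_ns.xsd: no targetNamespace, so the declared
     elements belong to no namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```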

XML Schema declaration with target namespace

The following XML document is associated with an XML Schema that has a target namespace:
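A minimal sketch, assuming a hypothetical library vocabulary in the namespace http://example.com/library, defined by a schema file called library.xsd:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical: library.xsd defines the http://example.com/library namespace -->
<book xmlns="http://example.com/library"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://example.com/library library.xsd">
  <!-- Unprefixed elements fall into the default namespace set by xmlns -->
  <title>XML Basics</title>
  <author>Jane Doe</author>
</book>
```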

The xsi:schemaLocation attribute holds pairs of values: a namespace URI followed by the URI of the XML Schema document that defines that namespace.

The xmlns attribute declares the default namespace of the XML document, so unprefixed elements belong to the target vocabulary.

Finally, the XML Schema declaration of a vocabulary associated with a namespace looks like this:
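A minimal sketch of such a schema, assuming the hypothetical namespace http://example.com/library and a file called library.xsd. The default namespace (xmlns) is set to the target namespace so that the unprefixed type reference BookType resolves to the type declared in this schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical library.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/library"
           xmlns="http://example.com/library"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- type="BookType" resolves through the default namespace above -->
  <xs:element name="book" type="BookType"/>

  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="isbn" type="xs:string"/>
  </xs:complexType>
</xs:schema>
```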

The elementFormDefault and attributeFormDefault attributes control whether locally declared elements and attributes must be namespace-qualified in the instance document. Their default value is “unqualified”. It is common for XML Schema declarations to set elementFormDefault="qualified". That is, all elements must be qualified in the XML document, either by adding a namespace prefix to the element name or by being part of the default namespace.

The attributeFormDefault attribute is usually left as “unqualified”. This is because attributes don’t inherit the default namespace: a qualified attribute must always carry a prefix to be associated with a namespace.
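To illustrate, assume a hypothetical schema with targetNamespace="http://example.com/library", elementFormDefault="qualified", attributeFormDefault="unqualified", a <book> element with <title> and <author> children, and an isbn attribute. This instance qualifies every element with a prefix, while the attribute stays unprefixed:

```xml
<!-- Elements are qualified with the lib prefix; the isbn attribute is
     unqualified, so it carries no prefix -->
<lib:book xmlns:lib="http://example.com/library" isbn="1234567890">
  <lib:title>XML Basics</lib:title>
  <lib:author>Jane Doe</lib:author>
</lib:book>
```

Declaring xmlns="http://example.com/library" on <book> and dropping the lib: prefixes would be equally valid, since the elements would then be qualified through the default namespace.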

Complex Types

Simple Types vs Complex Types

XML documents are composed of nested elements forming a tree-like structure. XML Schema uses <xs:element> declarations to define each element's name and type, and through the type its attributes and structure.

Elements can be declared with one of two kinds of type:

  • Simple Types: they can only contain text. They can have neither attributes nor children. Unlike in DTDs, their text can be restricted to a data type such as an integer or a date, or constrained by a regular expression.
  • Complex Types: they can have attributes, child elements, and text.

The example below shows two elements associated with the built-in simple types xs:string and xs:integer.
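A minimal schema fragment, assumed to sit inside an <xs:schema> element that binds the xs prefix to http://www.w3.org/2001/XMLSchema. The element names are hypothetical:

```xml
<!-- Two elements with built-in simple types -->
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>

<!-- Matching instance content:
     <name>Alice</name>
     <age>30</age>
-->
```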

Attributes always have simple types, as they can contain neither attributes nor child elements:
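A short sketch of two attribute declarations (hypothetical names), one using a built-in simple type and one using an anonymous simple type derived from xs:string:

```xml
<!-- Attribute with a built-in simple type -->
<xs:attribute name="lang" type="xs:language"/>

<!-- Attribute with an anonymous simple type; still text only -->
<xs:attribute name="status">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="active"/>
      <xs:enumeration value="inactive"/>
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>
```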

Complex types are declared using the <xs:complexType> element. The example below associates the element person with the complex type PersonType.
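A minimal sketch, assuming a schema without a targetNamespace (so the unprefixed reference PersonType resolves directly) and hypothetical child elements firstName and lastName:

```xml
<xs:element name="person" type="PersonType"/>

<!-- Named complex type, declared at the top level so that other
     elements can reuse it -->
<xs:complexType name="PersonType">
  <xs:sequence>
    <xs:element name="firstName" type="xs:string"/>
    <xs:element name="lastName" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
```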

Anonymous types

Custom types can be either named or anonymous. An anonymous type is declared inside an <xs:element> and cannot be reused by other elements. A named type must be declared at the top level of the schema and can be reused by many <xs:element> declarations.

The example below uses an anonymous type to declare the person type:
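A minimal sketch with hypothetical child elements. The <xs:complexType> has no name attribute and is nested directly inside the element declaration:

```xml
<xs:element name="person">
  <!-- Anonymous type: no name, so no other element can reference it -->
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="lastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```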

Empty Elements

An empty element is an element that can contain neither child elements nor text. To create an empty element, define an <xs:complexType> without any <xs:sequence>, <xs:all> or <xs:choice> children.

An empty element can, however, contain attributes.

XML documents represent empty elements in two ways: as a start tag immediately followed by its end tag, with no text in between, or as a single abbreviated empty-element tag that ends in />.
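A minimal sketch of an empty element that carries one attribute. The element name image and the attribute src are hypothetical:

```xml
<!-- No sequence/all/choice, so the element has no content; it only
     has an attribute -->
<xs:element name="image">
  <xs:complexType>
    <xs:attribute name="src" type="xs:anyURI" use="required"/>
  </xs:complexType>
</xs:element>

<!-- Both instance forms are valid for this declaration:
     <image src="logo.png"></image>
     <image src="logo.png"/>
-->
```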

Mixed content

A complex type is mixed when its content combines text and child elements.

The <person>  element below is mixed:
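A hypothetical instance of mixed content, with character data interleaved between the <name> and <age> children:

```xml
<!-- Text and child elements appear side by side inside <person> -->
<person>My name is <name>Alice</name> and I am <age>30</age> years old.</person>
```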

To declare a mixed complex type, we only need to set the mixed="true" attribute on the <xs:complexType>.
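A minimal sketch of a mixed complex type for a hypothetical <person> element with <name> and <age> children:

```xml
<xs:element name="person">
  <!-- mixed="true" allows character data between the child elements;
       the children must still follow the sequence -->
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```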

Occurrence constraints

The minOccurs  and maxOccurs  attributes determine the number of times an element can appear.

If we want an element to be optional, we use minOccurs="0". If we want an element to appear an unlimited number of times, we use maxOccurs="unbounded". The default value for both minOccurs and maxOccurs is 1.
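A short sketch showing the three common cases, using hypothetical element names:

```xml
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <!-- Defaults (minOccurs="1", maxOccurs="1"): exactly once -->
      <xs:element name="name" type="xs:string"/>
      <!-- Optional: zero or one time -->
      <xs:element name="nickname" type="xs:string" minOccurs="0"/>
      <!-- Zero or more times -->
      <xs:element name="phone" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```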

Element Groups and Order

There are three ways to group elements and specify their order:

  • <xs:sequence> : each element must appear exactly once (by default), and in the specified order
  • <xs:all> : each element must appear exactly once (by default), but order is not important
  • <xs:choice> : only one of the listed elements can appear

<xs:sequence>

The <xs:sequence> definition requires the elements to appear in the specified order. By default each element appears exactly once; the minOccurs and maxOccurs attributes on each element change how many times it can appear.
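A minimal sketch with hypothetical address elements. The order is fixed, while maxOccurs="2" lets street repeat:

```xml
<xs:element name="address">
  <xs:complexType>
    <!-- Children must appear in this order: street (1 or 2 times), city, postcode -->
    <xs:sequence>
      <xs:element name="street" type="xs:string" maxOccurs="2"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="postcode" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

With this declaration, <street>, <street>, <city>, <postcode> is valid, but placing <city> before <street> is not.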

<xs:all>

The <xs:all> definition allows the elements to appear in any order. By default each element must appear exactly once.

<xs:all> has several restrictions (in XML Schema 1.0):

  • The only valid values of minOccurs and maxOccurs for the elements inside <xs:all> are 0 and 1.
  • It can only contain element declarations. That is, it cannot contain <xs:sequence>, <xs:choice> or another <xs:all>.

This is an example <xs:all> declaration:
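A minimal sketch of an <xs:all> group with hypothetical contact elements. The children may appear in either order, and phone is optional:

```xml
<xs:element name="contact">
  <xs:complexType>
    <!-- Each child at most once, in any order -->
    <xs:all>
      <xs:element name="email" type="xs:string"/>
      <xs:element name="phone" type="xs:string" minOccurs="0"/>
    </xs:all>
  </xs:complexType>
</xs:element>
```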

<xs:choice>

The <xs:choice> element specifies that only one of the declared elements can appear. It can also be combined with minOccurs and maxOccurs. When maxOccurs is greater than 1, the choice can repeat, so the listed elements can appear several times and in any order.
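Two short sketches with hypothetical element names: a plain choice, and a repeating choice that behaves like an unordered group:

```xml
<!-- Exactly one of card or cash must appear -->
<xs:element name="payment">
  <xs:complexType>
    <xs:choice>
      <xs:element name="card" type="xs:string"/>
      <xs:element name="cash" type="xs:decimal"/>
    </xs:choice>
  </xs:complexType>
</xs:element>

<!-- Repeating choice: any number of text and link children, in any order -->
<xs:element name="notes">
  <xs:complexType>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="text" type="xs:string"/>
      <xs:element name="link" type="xs:anyURI"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
```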

Simple Types

The W3C XML Schema Language has 44 built-in simple types. These can be categorised as follows:

  • String data types
  • Numeric data types
  • Date data types
  • XML data types
  • Miscellaneous data types

String Data Types

  • xs:string – A sequence of characters, or an empty string. All whitespace is preserved.
  • xs:normalizedString – A string in which each tab, carriage return and line feed is replaced by a space.
  • xs:token – A string in which each tab, carriage return and line feed is replaced by a space, consecutive spaces are collapsed into one space, and leading and trailing spaces are trimmed.

Note: a whitespace character is a space (#x20), tab (#x9), carriage return (#xD) or line feed (#xA).

Numeric Data Types

  • xs:decimal – An arbitrary-precision decimal number. Corresponds to Java’s java.math.BigDecimal.
  • xs:integer – An arbitrarily large or small integer. Corresponds to Java’s java.math.BigInteger.
  • xs:int – A 32-bit integer. Same as Java’s int.
  • xs:long – A 64-bit integer. Same as Java’s long.
  • xs:short – A 16-bit integer. Same as Java’s short.
  • xs:byte – An 8-bit integer. Same as Java’s byte.
  • xs:float – A 32-bit floating-point number. Same as Java’s float.
  • xs:double – A 64-bit floating-point number. Same as Java’s double.

Date Data Types

  • xs:dateTime – A particular moment in time, with a date and a time part (e.g. 2012-01-15T12:30:00).
  • xs:date – A calendar date, without a time part (e.g. 2012-01-15, 0001-12-31).
  • xs:time – A time of day, without a date part (e.g. 12:30:00, 12:30:00.000, 12:30:00.000Z, 12:30:00.000-05:00).

XML Data Types

  • xs:ID – An XML 1.0 ID attribute type. Its value must be unique within the document.
  • xs:IDREF – A reference to an xs:ID value declared elsewhere in the document.
  • xs:language – A valid language identifier, as used by xml:lang (e.g. en, en-GB, en-US).

Miscellaneous Data Types

  • xs:boolean – Represents true or false (valid values: 0, 1, true, false).
  • xs:anyURI – A relative or absolute URI.
  • xs:base64Binary – Base64-encoded binary data.
  • xs:hexBinary – Hexadecimal-encoded binary data.

User-defined data Types

XML Schema allows creating new data types based on existing types. The most common type of derivation is by restriction.

Restricted types are declared using the <xs:restriction> element, and their values are a subset of the values of the base type. The base attribute names the base type of the restriction. These are the main restriction facets:

  • xs:minInclusive / xs:maxInclusive – Define the lower and upper bounds of a numeric value, including the bounds themselves
  • xs:minExclusive / xs:maxExclusive – Define the lower and upper bounds of a numeric value, excluding the bounds themselves
  • xs:length – Defines the exact number of characters
  • xs:minLength / xs:maxLength – Define the minimum and maximum number of characters
  • xs:totalDigits – Defines the maximum total number of digits allowed
  • xs:fractionDigits – Defines the maximum number of digits allowed in the fractional part
  • xs:enumeration – Defines a set of acceptable values
  • xs:whiteSpace – Defines how whitespace is processed
  • xs:pattern – Defines a regular expression that all acceptable values must match

<xs:minInclusive> and <xs:maxInclusive>

The <xs:minInclusive>  and <xs:maxInclusive>  define the lower and upper bounds of a numeric value.

For example, we can restrict an age to a number between 0 and 150.
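A minimal sketch of that restriction for a hypothetical <age> element, using an anonymous simple type derived from xs:integer:

```xml
<xs:element name="age">
  <xs:simpleType>
    <!-- Derived from xs:integer; only 0 to 150 inclusive are valid -->
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="150"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

With this declaration, <age>42</age> is valid, while <age>151</age> and <age>-1</age> are not.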

<xs:enumeration>

The <xs:enumeration> facet lists the acceptable values.

For example, an enumeration can restrict the valid department values to Accounts, IT and Sales.
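A minimal sketch, using a hypothetical named type DepartmentType so that several elements could reuse it (assuming a schema without a targetNamespace):

```xml
<!-- Only the three listed strings are valid values -->
<xs:simpleType name="DepartmentType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Accounts"/>
    <xs:enumeration value="IT"/>
    <xs:enumeration value="Sales"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="department" type="DepartmentType"/>
```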

<xs:whiteSpace>

The <xs:whiteSpace> facet defines how to process white space. XML Schema treats the following characters as white space: space (#x20), tab (#x9), carriage return (#xD) and line feed (#xA).

The <xs:whiteSpace>  has three possible values:

  • preserve – the white space is unchanged.
  • replace – each tab, carriage return and line feed is replaced by a single space.
  • collapse – each tab, carriage return and line feed is replaced by a single space. In addition, consecutive spaces are collapsed into a single space, and leading and trailing spaces are removed.
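A minimal sketch of a hypothetical type that collapses white space before validation. It behaves like the built-in xs:token:

```xml
<xs:simpleType name="CollapsedText">
  <xs:restriction base="xs:string">
    <!-- "  Hello   world " is normalised to "Hello world" -->
    <xs:whiteSpace value="collapse"/>
  </xs:restriction>
</xs:simpleType>
```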

<xs:pattern>

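The <xs:pattern> facet defines a regular expression that values must match. The syntax is similar to the regular expressions in Perl or grep, but the pattern is implicitly anchored: it must match the entire value, so no ^ or $ anchors are needed.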
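A minimal sketch of a hypothetical product code made of three upper-case letters followed by three digits:

```xml
<xs:simpleType name="ProductCodeType">
  <xs:restriction base="xs:string">
    <!-- Matches the whole value, e.g. ABC123; "xABC123" or "ABC1234" fail -->
    <xs:pattern value="[A-Z]{3}[0-9]{3}"/>
  </xs:restriction>
</xs:simpleType>
```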

Attributes

Attributes are declared with the <xs:attribute> element. Attributes are restricted to simple types, that is, they can only contain text data. Only elements with a complex type can carry attributes, so attribute declarations appear inside a complex type:
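A minimal sketch with a hypothetical product type. The attribute declaration sits inside the complex type, after the content model:

```xml
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="description" type="xs:string"/>
  </xs:sequence>
  <!-- Attribute declarations come after the sequence -->
  <xs:attribute name="code" type="xs:string"/>
</xs:complexType>
```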

Attributes are optional by default. To make an attribute mandatory, we must set use="required".

We can also set, with the default attribute, a default value that will be used if the attribute is not provided.

Sometimes we might want to fix the value of an attribute, which the fixed attribute does.
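A minimal sketch combining the three options in one hypothetical complex type:

```xml
<xs:complexType name="PersonType">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
  <!-- Mandatory: validation fails if id is missing -->
  <xs:attribute name="id" type="xs:string" use="required"/>
  <!-- Optional: if absent, the value "en" is assumed -->
  <xs:attribute name="lang" type="xs:language" default="en"/>
  <!-- If present, the value must be exactly "1.0" -->
  <xs:attribute name="version" type="xs:string" fixed="1.0"/>
</xs:complexType>
```

A default value only makes sense for an optional attribute, so default cannot be combined with use="required".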

Attributes must always be part of complex types. Within a complex type, the <xs:attribute> declarations must come after the <xs:sequence>, <xs:all> or <xs:choice> content model.

Example XML Document and XML Schema

The following is an example XML Document referring to an XML Schema:
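A sketch of such a document, assuming a hypothetical staff vocabulary in the namespace http://example.com/staff, validated by a schema file called staff.xsd:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document for an assumed staff.xsd schema -->
<staff xmlns="http://example.com/staff"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://example.com/staff staff.xsd">
  <employee id="e1">
    <name>Alice Smith</name>
    <age>34</age>
    <department>IT</department>
  </employee>
  <employee id="e2">
    <name>Bob Jones</name>
    <age>41</age>
    <department>Sales</department>
  </employee>
</staff>
```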

And below the matching XML Schema:
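A sketch of a schema for a hypothetical staff vocabulary (namespace http://example.com/staff, file staff.xsd). It combines a target namespace, named complex and simple types, a required attribute, occurrence constraints, a numeric range and an enumeration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical staff.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/staff"
           xmlns="http://example.com/staff"
           elementFormDefault="qualified">

  <xs:element name="staff">
    <xs:complexType>
      <xs:sequence>
        <!-- One or more employees -->
        <xs:element name="employee" type="EmployeeType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="EmployeeType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="age" type="AgeType"/>
      <xs:element name="department" type="DepartmentType"/>
    </xs:sequence>
    <!-- Unique identifier, mandatory on every employee -->
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>

  <xs:simpleType name="AgeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="150"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="DepartmentType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Accounts"/>
      <xs:enumeration value="IT"/>
      <xs:enumeration value="Sales"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```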


Eduard Manas

Eduard is a senior IT consultant with over 15 years in the financial sector. He is an experienced developer in Java, C#, Python, Wordpress, Tibco EMS/RV, Oracle, Sybase and MySQL.Outside of work, he likes spending time with family, friends, and watching football.
