XML Schema and Namespaces
Description
Namespaces in XML are a way to organize names (e.g. element names) so as to avoid name conflicts. XML Schema lets you define the rules for XML content so that a given XML document is correct for your application. This article discusses how the two concepts interact.
This article assumes you already know:
- How to write simple XML schemas.
- The basics of how namespaces in XML work.
Namespaces in XML provide two basic functions:
- They allow you to disambiguate XML element names, so that you can intermingle XML from different vocabularies that happen to use the same names.
- They allow you to identify element and attribute names according to their namespace. In other words, given a name, the reader can see that the name belongs to a collection of other names.
To use namespaces, you create qualified names that often use a prefix to identify the namespace. Then you map the prefix to a globally unique string, such as a URL (which can be unique because domain names are unique). Thus, a prefix-qualified name consists of a prefix and a local name, as shown in Figure 1.
Figure 1: Qualified Names
To map or assign a prefix to a namespace URL, you use the xmlns attribute as shown in Listing 1:
<hr:employee xmlns:hr="http://goliath.com/hr">
  <hr:name>Paul Westerberg</hr:name>
  <hr:address>123 Elm Street</hr:address>
</hr:employee>
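To see what the prefix mapping means to a parser, the following minimal sketch parses Listing 1 with Python's standard xml.etree.ElementTree module (our choice of tool for illustration, not something the article depends on). ElementTree reports each element name in the form {namespace URI}local-name, which shows that the prefix hr is only shorthand for the URI.

```python
import xml.etree.ElementTree as ET

# Listing 1, written with a proper closing tag.
doc = """
<hr:employee xmlns:hr="http://goliath.com/hr">
  <hr:name>Paul Westerberg</hr:name>
  <hr:address>123 Elm Street</hr:address>
</hr:employee>
"""

root = ET.fromstring(doc)

# Each tag is reported as "{namespace URI}local name"; the prefix is gone.
expanded_names = [elem.tag for elem in root.iter()]
for name in expanded_names:
    print(name)
```

The prefix itself never appears in the parsed names, so two documents that use different prefixes for the same URI are equivalent as far as namespaces are concerned.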
The XML Namespaces specification also allows for a default namespace, which lets you avoid using a prefix while still qualifying names, at the expense of human readability (it makes it more difficult for human readers to determine whether a given element is in a namespace or not). To use a default namespace, you use the xmlns attribute without a prefix. Listing 2 shows an example of using a default namespace:
<employee xmlns="http://goliath.com/hr">
  <name>Paul Westerberg</name>
  <address>123 Elm Street</address>
</employee>
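As a quick check that a default namespace qualifies names in the same way a prefix does, this sketch (again using Python's xml.etree.ElementTree purely for illustration) parses a prefixed and a default-namespace version of the employee document and compares the resulting names.

```python
import xml.etree.ElementTree as ET

prefixed = """<hr:employee xmlns:hr="http://goliath.com/hr">
  <hr:name>Paul Westerberg</hr:name>
</hr:employee>"""

defaulted = """<employee xmlns="http://goliath.com/hr">
  <name>Paul Westerberg</name>
</employee>"""

def expanded_names(xml_text):
    """Return every element name as '{namespace URI}local name'."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]

# Both documents yield identical namespace-qualified names.
same = expanded_names(prefixed) == expanded_names(defaulted)
print(same)
```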
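Schema itself defines and uses three namespaces, with the following URIs:
- http://www.w3.org/2001/XMLSchema-datatypes
- http://www.w3.org/2001/XMLSchema-instance
- http://www.w3.org/2001/XMLSchema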
The first namespace URI, for datatypes, separates out the part of Schema used to define XML data types, such as integer, decimal and string. This separate namespace exists primarily to support non-Schema technologies (e.g. RelaxNG) that use the Schema types without using the rest of Schema.
The second namespace URI, for instances, lets XML documents reference schemas. We will see examples of it later in this article.
The third namespace URI, for Schema itself, defines the XML vocabulary used to write schemas, e.g. minOccurs. Note that this namespace "includes" the datatypes namespace so that when you write a schema, you don't need to define the datatypes namespace separately.
When you use these namespace URIs, you can map them to any prefix, but it's traditional to use xsi for the instances namespace.
Referencing a Schema from an Instance Document
Now let's see how we can use the instances namespace so that an XML document (the instance) can reference one or more schemas. Once we do that, then we can parse the instance and the parser will fetch the schema and use it to validate the instance. There are two syntaxes to reference a schema, depending on whether the instance document itself uses namespaces as shown in Figure 2.
Figure 2: Referencing a Schema from an Instance
As shown in Figure 2, if the instance document has no elements or attributes in an application-defined namespace, it uses xsi:noNamespaceSchemaLocation to reference its schema. Note that the reference is itself a URL - Figure 2 uses a relative "file" URL, but it could be a remote URL, perhaps using HTTP.
But if the instance document itself has elements or attributes in a namespace, the instance uses xsi:schemaLocation to reference one or more schemas, each of which validates content from a single namespace. So that makes the syntax of the xsi:schemaLocation attribute more complex; it needs to specify at least TWO things: the namespace URI that the schema validates, and the location of the schema. As shown in Figure 2, for an instance document that uses a single namespace, the value of the xsi:schemaLocation attribute contains two substrings, separated by whitespace. http://myURI is the first substring and schema2.xsd is the second. We discuss this syntax in more detail next.
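The two reference styles can be sketched as follows. The element name order and the file name schema1.xsd are hypothetical, chosen only for illustration; http://myURI and schema2.xsd are the values discussed above. The sketch reads both attributes with Python's xml.etree.ElementTree, which addresses a namespaced attribute as {namespace URI}name.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance with no application-defined namespace.
no_namespace_doc = """<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="schema1.xsd"/>"""

# Hypothetical instance whose root element is in the namespace http://myURI.
namespaced_doc = """<p:order xmlns:p="http://myURI"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://myURI schema2.xsd"/>"""

def schema_hints(xml_text):
    """Return the two xsi schema-reference attributes of the root element."""
    root = ET.fromstring(xml_text)
    return {
        "noNamespaceSchemaLocation": root.get(f"{{{XSI}}}noNamespaceSchemaLocation"),
        "schemaLocation": root.get(f"{{{XSI}}}schemaLocation"),
    }

print(schema_hints(no_namespace_doc))
print(schema_hints(namespaced_doc))
```

This only reads the attributes. Fetching the schema and validating against it is the job of a schema-aware parser, which Python's standard library does not include.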
The Syntax of the xsi:schemaLocation Attribute
The xsi:schemaLocation attribute's value consists of one or more pairs of substrings, the first of which identifies a namespace URI, while the second references a schema. The substrings within a pair, and the pairs themselves, must be separated by whitespace, which includes spaces, tabs and new-line characters. Figure 3 shows the general syntax:
Figure 3: xsi:schemaLocation Attribute Value Syntax
If you have an instance document that uses many namespaces, the xsi:schemaLocation attribute's value can get quite complex. For example, this listing shows an XML file from an application that uses the Spring Framework:
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:jee="http://www.springframework.org/schema/jee"
       xmlns:jms="http://www.springframework.org/schema/jms"
       xmlns:lang="http://www.springframework.org/schema/lang"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation=
         "http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.5.xsd
          http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
          http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd
          http://www.springframework.org/schema/jee http://www.springframework.org/schema/jee/spring-jee-2.5.xsd
          http://www.springframework.org/schema/lang http://www.springframework.org/schema/lang/spring-lang-2.5.xsd
          http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.5.xsd
          http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.5.xsd">
  . . .
</beans>
In the above code listing, note how the xsi:schemaLocation attribute's value consists of seven pairs of substrings, separated by new-line characters. The first of each pair (e.g. http://www.springframework.org/schema/aop) is a namespace URI, while the second (e.g. http://www.springframework.org/schema/aop/spring-aop-2.5.xsd) is the actual location of an XML schema file. Note how the second item of each pair ends in the conventional file extension for Schema, .xsd.
When you validate a document like the one above, the parser fetches each of the schemas (.xsd files), ensures that each actually validates the namespace specified, and then uses the schemas to validate the document.
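The pair structure is easy to take apart in code. The sketch below is our own illustration in Python, and parse_schema_location is a hypothetical helper name. It splits an xsi:schemaLocation value on whitespace and groups the tokens into (namespace URI, schema location) pairs. To keep it short, it rebuilds the Spring value from the seven namespace names instead of repeating the whole string.

```python
def parse_schema_location(value):
    """Split an xsi:schemaLocation value into (namespace URI, location) pairs."""
    # str.split() with no argument splits on any run of spaces, tabs or newlines.
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of whitespace-separated tokens")
    return list(zip(tokens[0::2], tokens[1::2]))

# Rebuild the seven-pair value from the Spring listing, one pair per line.
base = "http://www.springframework.org/schema"
value = "\n".join(
    f"{base}/{name} {base}/{name}/spring-{name}-2.5.xsd"
    for name in ["aop", "beans", "context", "jee", "lang", "tx", "util"]
)

pairs = parse_schema_location(value)
print(len(pairs))
print(pairs[0])
```

An odd number of tokens means a namespace URI or a location is missing, so the helper raises an error instead of guessing.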
The targetNamespace Attribute
A given schema can only validate a single namespace. This bears repeating: A given schema validates a single namespace. In fact, you could derive this concept by looking closely at the xsi:schemaLocation attribute in the Spring listing above: each schema reference requires both a namespace URI and the location of the schema file. So it follows that if a given instance document uses multiple namespaces, it must reference multiple schemas.
So when you write a schema that validates XML that's in a namespace, you must indicate which namespace the schema validates. You do so by writing a targetNamespace attribute on the schema root element as shown in Listing 4:
<xs:schema targetNamespace="http://goliath.com/ABCD"
           xmlns="http://goliath.com/ABCD"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  . . .
</xs:schema>
In Listing 4, we also specify the target namespace as the default namespace in the schema - that's not required, but is a common way to write schemas.
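The link between a schema reference and a schema's target namespace can be sketched in a few lines of Python. The helper names target_namespace and schema_covers are hypothetical. The idea is simply that the namespace URI named in an xsi:schemaLocation pair should equal the targetNamespace of the schema found at that location.

```python
import xml.etree.ElementTree as ET

# The schema root from Listing 4, with the elided body left empty.
schema_text = """<xs:schema targetNamespace="http://goliath.com/ABCD"
           xmlns="http://goliath.com/ABCD"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
</xs:schema>"""

def target_namespace(schema_xml):
    """Return the schema's targetNamespace, or None if it declares none."""
    return ET.fromstring(schema_xml).get("targetNamespace")

def schema_covers(namespace_uri, schema_xml):
    """True if the schema declares namespace_uri as its target namespace."""
    return target_namespace(schema_xml) == namespace_uri

print(target_namespace(schema_text))
print(schema_covers("http://goliath.com/ABCD", schema_text))
print(schema_covers("http://goliath.com/hr", schema_text))
```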
Global and Local Elements in a Schema
When you write a schema, you declare the elements and attributes that will appear in an instance document. You can write those declarations at either global or local scope in the schema, and it turns out that this choice has a significant side effect on how the instance document uses namespaces.
Globally defined element and attribute declarations live at so-called root level in the schema - that is, their parent is the schema root element itself. Locally defined declarations are nested within other declarations as shown in Figure 4.
Figure 4: Global and Local Declarations in a Schema
The primary benefit, or rationale, for declaring globally is that you can reference globally declared elements and attributes elsewhere in the schema. In other words, global declarations are a form of reuse: they let you factor out commonly used declarations and reference them as needed, thus reducing redundancy in the schema. For example, in Figure 4, if some other declaration in the schema needed to use firstname, it could not do so, since that declaration is not at global scope.
Global vs Local Element and Attribute Declarations
As mentioned earlier, whether an element or attribute is declared globally or locally in the schema has a significant side-effect on how an instance document uses namespaces. We can summarize how it works with two straightforward rules:
- RULE 1: Elements and attributes declared at global scope in a schema MUST be namespace qualified in an instance document.
- RULE 2: By default, elements and attributes declared at local scope in a schema MUST NOT be namespace qualified in an instance document.
For Rule 1, by "namespace qualified", we mean that the globally declared element must be in a namespace in an instance document, either with a prefix or a default namespace.
For Rule 2, we can override this default, as we will see in a moment.
The upshot of these two rules is that the schema author has complete control over how namespaces are used in an instance document. Let's look at an example.
Namespace and Schema Example, Number One
Let's start by looking at a schema that has some elements declared globally, and some locally.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://goliath.com/ABCD"
           xmlns="http://goliath.com/ABCD"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="student-name" type="xs:string" />
  <xs:element name="student">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="student-name" />
        <xs:element name="student-id" type="xs:positiveInteger" />
        <xs:element name="gpa" type="xs:decimal" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
In Listing 5, note that the elements student and student-name are declared globally, while student-id and gpa are local declarations. By applying Rules 1 and 2, we can predict that in an instance document:
- student-name and student must be namespace-qualified due to Rule 1.
- student-id and gpa must NOT be namespace-qualified due to Rule 2 (we have not overridden any defaults).
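The global/local split can also be read straight off the schema. This Python sketch is an illustration only, using the standard xml.etree.ElementTree module. It treats direct children of the schema root as global declarations and any nested xs:element that carries a name as a local declaration. A nested xs:element with only a ref attribute is a reference to a global declaration, so it is skipped.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The schema from Listing 5 (XML declaration omitted).
schema_text = """<xs:schema targetNamespace="http://goliath.com/ABCD"
    xmlns="http://goliath.com/ABCD"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="student-name" type="xs:string"/>
  <xs:element name="student">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="student-name"/>
        <xs:element name="student-id" type="xs:positiveInteger"/>
        <xs:element name="gpa" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(schema_text)
top_level = list(root)  # direct children of xs:schema

# Global declarations: xs:element nodes whose parent is the schema root.
global_names = [e.get("name") for e in root.findall(XS + "element")]

# Local declarations: nested xs:element nodes that have a name attribute.
local_names = [
    e.get("name")
    for e in root.iter(XS + "element")
    if e not in top_level and e.get("name") is not None
]

print(global_names)
print(local_names)
```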
Listing 6 shows an instance document that follows these rules:
<?xml version="1.0" encoding="UTF-8"?>
<p:student xmlns:p="http://goliath.com/ABCD"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://goliath.com/ABCD namespaces1.xsd">
  <p:student-name>Sam</p:student-name>
  <student-id>12</student-id>
  <gpa>3.44</gpa>
</p:student>
In Listing 6, we defined a prefix p for the target namespace of the schema in Listing 5. Then, applying Rule 1, we explicitly prefix student and student-name since they are declared globally in the schema. Applying Rule 2, student-id and gpa have no prefix, and thus are not namespace-qualified (there is no default namespace in this instance document).
If we had violated either of those rules, then the instance document would not be valid, and if we parsed it, the parser would complain.
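You can confirm which elements of Listing 6 end up in a namespace without a validating parser. The sketch below uses Python's xml.etree.ElementTree, which does not validate against a schema. It only reports, for each element, the namespace URI the parser resolved, or None when the element is unqualified. The helper name split_expanded_name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The instance document from Listing 6 (XML declaration omitted).
instance = """<p:student xmlns:p="http://goliath.com/ABCD"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://goliath.com/ABCD namespaces1.xsd">
  <p:student-name>Sam</p:student-name>
  <student-id>12</student-id>
  <gpa>3.44</gpa>
</p:student>"""

def split_expanded_name(tag):
    """Return (namespace URI or None, local name) for an ElementTree tag."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

# Map each element's local name to the namespace it is in (None = unqualified).
qualification = {}
for elem in ET.fromstring(instance).iter():
    uri, local = split_expanded_name(elem.tag)
    qualification[local] = uri

for local, uri in qualification.items():
    print(local, "->", uri)
```

The two globally declared elements resolve to the target namespace, while student-id and gpa resolve to no namespace at all, which is what Rules 1 and 2 require.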
Overriding the Default for Rule 2
Rule 2 states:
- RULE 2: By default, elements and attributes declared at local scope in a schema MUST NOT be namespace qualified in an instance document.
The implication here is that we can override Rule 2 (we can never change how Rule 1 works: globally-declared elements and attributes MUST ALWAYS be namespace-qualified). Before we see how, let's talk about why.
Using the default Rule 2 makes it easier for XML authors - only the globally-declared elements and attributes have to be qualified. And that often solves the basic problem that namespaces address: many times, name collisions only occur on top-level elements in an instance document. But occasionally, nested elements might be ambiguous and thus need namespace qualification. If that's the case, we can override Rule 2.
The form Attribute
Schema provides two different ways to override Rule 2. The first way is to use the form attribute. On any locally-declared element or attribute in a schema, you can use the form attribute to force qualification in an instance document, as shown in Listing 7:
<xs:schema . . .>
  <xs:element name="student">
    . . .
    <!-- locally defined element -->
    <xs:element form="qualified" name="gpa" type="xs:decimal" />
    . . .
  </xs:element>
</xs:schema>
So in Listing 7, we have decided that the gpa element must always be namespace-qualified in an instance document, even though it's declared locally in the schema.
If you have many such locally-declared elements or attributes, it can get tedious writing individual form attributes. To address that, Schema provides the elementFormDefault and attributeFormDefault attributes.
The elementFormDefault and attributeFormDefault Attributes
On the schema root element, you can establish a new default for locally-declared elements and attributes in a schema. The default for both of these (that is, if you don't specify them at all) is unqualified, thus leading to the Rule 2 we've already discussed. But as Listing 8 demonstrates, you can explicitly specify elementFormDefault or attributeFormDefault or both:
<xs:schema elementFormDefault="qualified" ...>
  <xs:element name="student">
    . . .
    <!-- locally defined element -->
    <xs:element name="gpa" type="xs:decimal" />
    . . .
  </xs:element>
</xs:schema>
So we can predict that in an instance document, the gpa element will need to be namespace-qualified. In fact, ALL locally declared elements in this schema will need to be namespace-qualified.
Note that you can combine the approaches: specify elementFormDefault and/or attributeFormDefault on the schema root, AND specify form on an individual element or attribute declaration. In case of a conflict, the form attribute takes precedence. This gives the schema author complete control over how namespaces are used in an instance document. Listing 9 shows a schema that demonstrates this:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://goliath.com/ABCD"
           xmlns="http://goliath.com/ABCD"
           elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="student-name" type="xs:string" />
  <xs:element name="student">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="student-name" />
        <xs:element name="student-id" type="xs:positiveInteger" />
        <xs:element form="unqualified" name="gpa" type="xs:decimal" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Looking at Listing 9, we see that the schema's root element specifies elementFormDefault of qualified and that the declaration for the gpa element specifies form of unqualified. Based on that, we can predict that in an instance document, the student, student-name and student-id elements will need namespace qualification, while gpa will not. Listing 10 demonstrates such an instance document:
<?xml version="1.0" encoding="UTF-8"?>
<p:student xmlns:p="http://goliath.com/ABCD"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://goliath.com/ABCD namespaces3.xsd">
  <p:student-name>Sam</p:student-name>
  <p:student-id>12</p:student-id>
  <gpa>3.44</gpa>
</p:student>
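The precedence between form and elementFormDefault can be written down as a small function. The sketch below is an illustration of the rule described above, not a schema processor, and local_element_forms is a hypothetical helper name. For each locally declared element it takes the form attribute if present, otherwise the schema's elementFormDefault, otherwise unqualified.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The schema from Listing 9 (XML declaration omitted).
schema_text = """<xs:schema targetNamespace="http://goliath.com/ABCD"
    xmlns="http://goliath.com/ABCD"
    elementFormDefault="qualified"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="student-name" type="xs:string"/>
  <xs:element name="student">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="student-name"/>
        <xs:element name="student-id" type="xs:positiveInteger"/>
        <xs:element form="unqualified" name="gpa" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def local_element_forms(schema_xml):
    """Map each locally declared element name to 'qualified' or 'unqualified'."""
    root = ET.fromstring(schema_xml)
    # With no elementFormDefault on the root, the default is unqualified (Rule 2).
    default_form = root.get("elementFormDefault", "unqualified")
    top_level = list(root)
    forms = {}
    for decl in root.iter(XS + "element"):
        if decl in top_level or decl.get("name") is None:
            continue  # skip global declarations and ref= references
        # An explicit form attribute takes precedence over the schema default.
        forms[decl.get("name")] = decl.get("form", default_form)
    return forms

forms = local_element_forms(schema_text)
print(forms)
```

Global declarations are left out of the result because Rule 1 always applies to them. Only the local declarations have a form that the schema author can choose.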
In this article, you've learned how schemas validate XML content that uses namespaces, and how to control whether or not locally-declared elements and attributes are namespace-qualified in instance documents.
If you found this article helpful, and want to learn more, we recommend that you go to the Descriptor Systems Web site and see how you can obtain training on this or other topics.