2.

Validating XML Documents

 

There are two prominent features of the XML Schema world. One is the creation of an xsd file, which provides a succinct description of a DataSet. The other is the validation of an xml file against a schema. For instance, if there is an xml file, such as an invoice representing a purchase order, then this invoice must have a certain format or syntax; in other words, it must be bound by a set of rules. The XML Schema, which is a substitute for the DTD (Document Type Definition), represents these rules. We dealt with DTDs much earlier.

 

In this chapter, our primary focus shall be on learning how to validate an XML file, and on exploring the depths of the Schema Description Language.

 

DTDs belong to a world alien to XML. In other words, a DTD is not an XML document. Therefore, one has to learn two diverse syntaxes; one for XML and the other for DTDs. This results in a large number of inconsistencies.

 

The major flaw of the DTD world surfaces when new data types are needed. For instance, with a schema it is elementary to create a data type that restricts the value of an element to a maximum of 1000, whereas a DTD offers no means of doing so. DTD supports only 10 data types, whereas the Schema world supports more than 44 data types. Besides, the Schema world allows creation of user-defined data types.

 

A major portion of this book delineates features analogous to those of the XML Schema world.

 

It is often said that programmers devote about 50 to 70 percent of their time to determining whether the data they are dealing with is in the right format. Thus, the mainstay of XML Schema is that it lets the programmer write the rules of data validity once and then leave it to the XML validator to verify whether the data satisfies them. This leaves the programmer with plenty of time to focus on the job that he or she is paid for, i.e. writing code that implements business applications.

 

When machines are required to transact business with each other, it is the XML Schema that is used to validate the data being sent across. Before incorporating the data in its databases, the receiving machine first validates the data against the schema. It is not mandatory for the schema file to be present on the same machine. It could reside on some other site on the net and could have been created by a standards body.

 

An XML Schema specifies the properties of a resource, whereas the XML file specifies a set of values for the above properties.

 

Create the following three files in the xmlprg subdirectory of the root drive.

 

a.cs

using System;

using System.Xml;

using System.Xml.Schema;

public class zzz

{

public static void Main()

{

XmlTextReader r = new XmlTextReader("b.xml");

XmlValidatingReader v = new XmlValidatingReader(r);

v.ValidationType = ValidationType.Schema;

XmlSchemaCollection c;

c = v.Schemas;

c.Add(null, "b.xsd");

v.ValidationEventHandler += new ValidationEventHandler(abc);

while(v.Read()) ;

}

public static void abc(object s, ValidationEventArgs a)

{

Console.WriteLine(a.Message);

}

}

 

b.xml

<?xml version="1.0"?>

<zzz >

</zzz>

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

</xs:element>

</xs:schema>

 

In the program a.cs, the XmlTextReader 'r' represents the xml file b.xml, which is the file to be validated. The class that handles xml validation is called the XmlValidatingReader. The XmlTextReader 'r' is supplied as a parameter to the constructor of this class.

 

The XmlValidatingReader class is equipped to handle several validation types, namely DTD, XDR (an earlier representation of XML Schema) and XSD.

 

Then, the ValidationType property is set to Schema, since the XML file is to be validated against a schema. The Schemas property is of type XmlSchemaCollection, which holds the schemas that are to be utilized for validation. Since it is a collection object, the Add function is used to add the schema file to it.

 

 

In the Add function, the first parameter is the namespace URI of the schema. Since our schema does not declare a target namespace, a value of null is supplied for the namespace. The namespace only comes into play whenever the owner of a certain tag is to be ascertained. If the entire world got together and thought up a way by which every tag could be assigned a unique name, then there would never be a necessity to preface the tag with its namespace.

 

The second parameter is the URL of the xsd file, which incorporates the rules. In this case, the rules are recorded in the file b.xsd. Next, the function abc is registered as a handler for the ValidationEventHandler event, so that it is called each time a validation error occurs. Thus, while reading the file, the Read function applies the validation rules contained in the file b.xsd, and in the event of an error, it calls the function abc.

 

The function abc takes a parameter 'a' of type ValidationEventArgs. This class contains a member Message that describes the error.
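
In later versions of the .NET Framework, XmlValidatingReader and XmlSchemaCollection were marked obsolete in favour of XmlReaderSettings and XmlSchemaSet. What follows is a minimal sketch of the same validation loop written with those newer classes. The inline schema and document strings, and the names ModernValidation and Validate, are our own assumptions for illustration; they are not part of a.cs.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ModernValidation
{
    // Hypothetical inline schema: zzz must contain a single integer element bbb.
    public const string Xsd = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='zzz'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='bbb' type='xs:integer' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Validates xml against Xsd, prints each problem, and returns the error count.
    public static int Validate(string xml)
    {
        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        // The schema declares no target namespace, so null is passed here.
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        // This handler plays the role of the function abc in a.cs.
        settings.ValidationEventHandler += (sender, e) =>
        {
            Console.WriteLine(e.Severity + ": " + e.Message);
            if (e.Severity == XmlSeverityType.Error) errors++;
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // reading the document drives the validation
        }
        return errors;
    }

    public static void Main()
    {
        Console.WriteLine(Validate("<zzz><bbb>100</bbb></zzz>")); // valid document
        Console.WriteLine(Validate("<zzz><bbb>bye</bbb></zzz>")); // bbb is not an integer
    }
}
```

As in a.cs, nothing is validated until the reader is actually read to the end; the handler merely collects what the reader reports along the way.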

 

To substantiate the above statements beyond doubt, effect the following changes in the file b.xml. The errors, if any, shall aid and enhance our understanding of the XML Schemas.

 

b.xml

<?xml version="1.0"?>

<zzzz>

</zzzz>

 

Error

The 'zzzz' element is not declared. An error occurred at file:///c:/xmlprg/b.xml(2, 2).

 

The error occurs because the xsd file declares an element called zzz, since the name attribute of the element tag is zzz, whereas in the xml file, the tag is zzzz.

 

Thus, the first rule is that it is the element tag that decides the names of the tags in the xml file; or to be more precise, the name attribute. Every tag in the xml file must have a corresponding element tag with the name attribute in the xsd file. 

Now, change the tag to zzz and the error completely disappears.

 

Modify the xsd file to contain the following:

 

b.xsd

<xs:schema >

<xs:element name = "zzz">

</xs:element>

</xs:schema>

 

Error

Unhandled Exception: System.Xml.XmlException: 'xs' is an undeclared namespace. Line 1, position 2.

 

The xmlns attribute identifies the namespace that the tags belong to, if and only if they are not qualified with a namespace prefix. The attribute xmlns:xs specifies that the xs prefix is owned by or belongs to, the namespace http://www.w3.org/2001/XMLSchema.

 

The error materializes since the xs prefix has been used in the document without apprising the framework of the owner of this prefix.

 

b.xsd

<schema >

<element name = "zzz">

</element>

</schema>

 

Error

Unhandled Exception: System.Xml.Schema.XmlSchemaException: Expected XML Schema root. Make sure that the root element is <schema> and the namespace is 'http://www.w3.org/2001/XMLSchema'. An error occurred at file:///c:/xmlprg/b.xsd(1, 2).

 

On realising that the xs prefix was the real culprit, we decided to side-step it completely. Thus, we eliminated all its instances.

 

 

 

On doing so, the system hurled a new error, indicating that the root element should be schema located within a certain namespace.

 

Every xsd file must necessarily start with the root element of 'schema' and should also end with it. It has a likeness to life, where we cannot escape from either death or taxes. All other elements have to be located within this root element.

 

b.xsd

<vijay:schema xmlns:vijay="http://www.w3.org/2001/XMLSchema">

<vijay:element name = "zzz">

</vijay:element>

</vijay:schema>

 

The point being made here is that it is the namespace URI assigned to the prefix that is of prime significance, and not the name of the prefix itself. Thus, in the above case, no errors are cast at us, since we have merely changed the prefix from xs to vijay, and have assigned the value of http://www.w3.org/2001/XMLSchema to vijay.

 

b.xsd

<vijay:schema xmlns:vijay="http://www.w3.org/2002/XMLSchema">

<vijay:element name = "zzz">

</vijay:element>

</vijay:schema>

 

Error

Unhandled Exception: System.Xml.Schema.XmlSchemaException: Expected XML Schema root. Make sure that the root element is <schema> and the namespace is 'http://www.w3.org/2001/XMLSchema'. An error occurred at file:///c:/xmlprg/b.xsd(1, 2).

 

On changing the URI pointed at by vijay, the schema parser throws an exception. The XML framework had expected 2001 instead of 2002. The rule of the game is that the plain xmlns attribute names the namespace that owns all unqualified tags, while an attribute such as xmlns:vijay binds a prefix to a namespace. We too can create namespace prefixes and point them to their URI or owner in the schema tag.

b.xsd

<schema xmlns="http://www.w3.org/2001/XMLSchema">

<element name = "zzz">

</element>

</schema>

 

The above xsd file does not generate any error, as the xmlns attribute assigns the correct URI to all the unqualified tags. Here, the tags carry no prefix at all; they simply take the default namespace specified by the xmlns attribute.

 

b.xsd

<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:aaa="http://www.w3.org/2001/XMLSchema">

<aaa:element name = "zzz">

</aaa:element>

</schema>

 

To place things in the right perspective, tags such as schema belong to the namespace http://www.w3.org/2001/XMLSchema, since they are not qualified by any prefix and therefore take the default namespace. Tags such as 'element' are qualified with the aaa prefix, which the xmlns:aaa attribute binds to the very same namespace. We prefer using the xs prefix in our files, despite being aware that it is not essential.

 

b.xml

<?xml version="1.0"?>

<zzz>

<aaa />

</zzz>

 

What we had merely intended was to allow the user to add a tag within the tag zzz. In the XML file, we added the tag, and when we ran the program a.exe, we were greeted by the following exception:

 

Error

The 'aaa' element is not declared. An error occurred at file:///c:/xmlprg/b.xml(3, 2).

The exception was bound to occur since in the xsd file, there is no element named aaa within the element zzz.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

<xs:element name="aaa" type="xs:string" />

</xs:element>

</xs:schema>

 

Error

Unhandled Exception: System.Xml.Schema.XmlSchemaException: The 'http://www.w3.org/2001/XMLSchema:element' element is not supported in this context. An error occurred at file:///c:/xmlprg/b.xsd(3, 2).

 

Adding an element called aaa directly within the element zzz, as we did, is simply unacceptable. Hence, the above exception is generated. What actually needs to be done is that the element aaa has to be placed within the tags of complexType.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

<xs:complexType>

<xs:element name = "aaa" type="xs:string" />

</xs:complexType>

</xs:element>

</xs:schema>

 

Error

Unhandled Exception: System.Xml.Schema.XmlSchemaException: The 'http://www.w3.org/2001/XMLSchema:element' element is not supported in this context. An error occurred at file:///c:/xmlprg/b.xsd(4, 2).

 

The above exception is reported because an element cannot be placed directly within a complexType either. It has to be enclosed in a tag that groups the child elements, such as sequence.

 

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

<xs:complexType>

<xs:sequence>

<xs:element name = "aaa" type="xs:string" />

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema>

 

On entering the elements as required, the error vanishes. The element tag contains name=zzz, followed by a complexType tag that defines the content of zzz. Within the sequence tag, multiple child elements of zzz can be declared. As of now, the complexType tag has no name attribute. Thus, it creates an anonymous type. If we had assigned it a name, then we could have even used it elsewhere in the file.

 

b.xml

<?xml version="1.0"?>

<zzz>

<aaa> hi </aaa>

<bbb> 100 </bbb>

</zzz>

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

<xs:complexType>

<xs:sequence>

<xs:element name = "aaa"  type="xs:string" />

<xs:element name = "bbb"  type="xs:integer" />

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema>

 

The xsd file now encloses two elements named aaa and bbb within the element zzz. Thus, in the xml file, zzz can now contain these tags. Since aaa has the data type string, the value "hi" is enclosed within aaa. The element bbb has the data type integer. Therefore, it is assigned the value of 100. Now, attempt to place "bye" within bbb as follows:

 

<bbb> bye </bbb>

 

This generates the following error:

 

Error

The 'bbb' element has an invalid value according to its data type. An error occurred at file:///c:/xmlprg/b.xml(4, 13).

 

Thus, the XML framework performs error checks on the data type for each element.

 

b.xml

<?xml version="1.0"?>

<zzz>

<bbb> 100 </bbb>

<aaa> hi </aaa>

</zzz>

 

Error

Element 'zzz' has invalid child element 'bbb'. Expected 'aaa'. An error occurred at file:///C:/xmlprg/b.xml(3, 2).

 

The task of the sequence tags is to enforce the correct sequence, i.e. the tag aaa comes first, followed by the tag bbb.

 

b.xml

<?xml version="1.0"?>

<zzz>

<aaa> hi </aaa>

<aaa> hi1 </aaa>

<bbb> 100 </bbb>

</zzz>

 

Error

Element 'zzz' has invalid child element 'aaa'. Expected 'bbb'. An error occurred at file:///c:/xmlprg/b.xml(4, 2).

The sequence is very exacting. Unless stated otherwise, every element in it must appear exactly once, with no scope for duplicate tags. Therefore, when aaa appears twice, it generates the error given above.

 

On doing away with the aaa tags completely, the following error is displayed:

 

Error

Element 'zzz' has invalid child element 'bbb'. Expected 'aaa'. An error occurred at file:///c:/xmlprg/b.xml(3, 2).

 

Most of the explanations rendered here have already been covered earlier. However, another revision would surely do no harm. This revision would explore the xsd elements in greater detail. Also, we shall desist from repeating the entire xsd. So, we shall only focus on the changes that are to be incorporated.

 

b.xsd

...

<xs:element name = "aaa" type="xs:string" id="a1"/>

<xs:element name = "bbb" type="xs:integer" id="a1"/>

...

 

Error

Unhandled Exception: System.Xml.Schema.XmlSchemaException: Invalid 'id' attribute value - Duplicate id attribute. An error occurred at file:///c:/xmlprg/b.xsd(6, 2).

 

When used, the id attribute must hold unique values, for it is of type ID and it marks each entity with a distinct name. Since both the elements have been assigned the same id, an exception is thrown. The id attribute has no kinship whatsoever with the data in the xml file.

 

b.xml

<?xml version="1.0"?>

<zzz>

<aaa> Hi </aaa>

<aaa> Hi </aaa>

<bbb> 100 </bbb>

</zzz>

 

b.xsd

...

<xs:element name = "aaa" type="xs:string" maxOccurs ="2"/>

<xs:element name = "bbb" type="xs:integer" maxOccurs ="4"/>

...

 

The value assigned to the maxOccurs attribute determines the maximum number of times that the element can occur at that position in the file. Based on the values specified in the xsd file, aaa can appear up to two times, while bbb can appear up to four times.

 

The maxOccurs attribute only specifies the maximum number, and not the minimum number of occurrences. Therefore, even if bbb is present only once in the file, it does not generate any error. If you do not desire an upper limit, you should specify the value of 'unbounded', and if you want to prevent the element from being used at all, you should set the value to 0. After doing so, any attempt to use the element generates an error.
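
The effect of maxOccurs can be checked in isolation with a small self-contained program. The sketch below assumes the newer XmlReaderSettings API rather than the XmlValidatingReader used in a.cs, and keeps the schema and the test documents in strings; the class name OccursDemo and the helper IsValid are our own hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class OccursDemo
{
    // Same limits as in the text: aaa at most twice, bbb at most four times.
    public const string Xsd = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='zzz'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='aaa' type='xs:string' maxOccurs='2' />
        <xs:element name='bbb' type='xs:integer' maxOccurs='4' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns true when the document produces no validation errors.
    public static bool IsValid(string xml)
    {
        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors == 0;
    }

    public static void Main()
    {
        string bbb = "<bbb>100</bbb>";
        Console.WriteLine(IsValid("<zzz><aaa>hi</aaa>" + bbb + "</zzz>"));                   // one aaa: allowed
        Console.WriteLine(IsValid("<zzz><aaa>hi</aaa><aaa>hi</aaa>" + bbb + "</zzz>"));      // two aaa: still allowed
        Console.WriteLine(IsValid("<zzz><aaa>a</aaa><aaa>b</aaa><aaa>c</aaa>" + bbb + "</zzz>")); // three aaa: over the limit
    }
}
```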

 

b.xsd

<xs:element name = "aaa" type="xs:string" minOccurs ="2" />

 

Error

Unhandled Exception: System.Xml.Schema.XmlSchemaException: minOccurs value cannot be greater than maxOccurs value. An error occurred at file:///C:/xmlprg/b.xsd(5, 2).

 

By default, the value of the maxOccurs attribute is 1. Thus, when it is not specified, the tag can appear only once. The above modification results in an exception, because the minOccurs value of 2 mandates that the tag should be present at least twice; however, this is in conflict with the value of maxOccurs, which confines the tag to making only a single appearance.

 

b.xsd

<xs:element name = "aaa" type="xs:string" minOccurs ="2" maxOccurs="4"/>

 

The solution to the above poser is to expressly specify values for both the minOccurs and the maxOccurs attributes. The minOccurs attribute has a value of 2, which decrees that the tag should appear at least twice in the file.

 

The maxOccurs with a value of 4 ensures that the tag occurrences do not exceed the count of 4. Bear in mind that the minOccurs value cannot possibly be larger than that of maxOccurs.

 

b.xml

<?xml version="1.0"?>

<zzz>

<aaa>hi </aaa>

<bbb>100</bbb>

<bbb>100</bbb>

<aaa>hi </aaa>

</zzz>

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

<xs:complexType>

<xs:sequence minOccurs="0" maxOccurs="unbounded">

<xs:element name = "aaa" type="xs:string" minOccurs="0" />

<xs:element name = "bbb" type="xs:integer" minOccurs="0" />

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema>

 

The minOccurs and maxOccurs attributes, which now make numerous appearances, behave in the same manner wherever they are used, be it on an element or on a sequence.

 

In the XML file, the aaa and bbb tags appear in no fixed order. This is possible because the sequence as a whole may now repeat any number of times, since its maxOccurs is 'unbounded', or not occur at all, since its minOccurs is 0. In addition, the minOccurs of each tag within the sequence is set to 0, so that any single repetition of the sequence may supply only an aaa or only a bbb.

 

b.xml

<?xml version="1.0"?>

<zzz>

<yyy>

<aaa>Bye </aaa>

<bbb>100</bbb>

</yyy>

<yyy>

<aaa>Bye </aaa>

<bbb>100</bbb>

</yyy>

</zzz>

 

b.xsd

<?xml version="1.0" encoding="utf-8"?>

<xs:schema id="zzz" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" >

<xs:element name="zzz">

<xs:complexType>

<xs:choice maxOccurs="unbounded">

<xs:element name="yyy">

<xs:complexType>

<xs:sequence>

<xs:element name="aaa" type="xs:string" minOccurs="0" />

<xs:element name="bbb" type="xs:integer" minOccurs="0" />

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:choice>

</xs:complexType>

</xs:element>

</xs:schema>

 

In the xml file, we have attempted to place the tag yyy in the root tag. The tag yyy in turn encloses two tags, viz. aaa and bbb. In addition to this, the tag yyy can enjoy multiple occurrences in the file.

 

In the xsd file, there exists an element named zzz, which is the root element. It can occur only once. As is the normal case, the name of the element is followed by its definition. This definition is placed in a complexType tag. Then, depending upon the situation, either a choice or a sequence tag can be implemented.

 

A choice permits exactly one of the alternatives listed within it, whereas a sequence demands all of its elements in the stated order. Here, since the choice holds only one alternative, i.e. yyy, the choice tag does not play a very significant role, and a sequence tag could be used instead. However, only one of them may appear directly within the complexType, or else an exception will be thrown. The value of 'unbounded' assigned to maxOccurs allows yyy to occur more than once.
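
A choice becomes more interesting once it offers at least two alternatives. The following sketch, which assumes the newer XmlReaderSettings API and uses a hypothetical inline schema of our own, lets zzz contain either aaa or bbb, but not both.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ChoiceDemo
{
    // Hypothetical schema: zzz holds exactly one of aaa or bbb.
    public const string Xsd = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='zzz'>
    <xs:complexType>
      <xs:choice>
        <xs:element name='aaa' type='xs:string' />
        <xs:element name='bbb' type='xs:integer' />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns true when the document produces no validation errors.
    public static bool IsValid(string xml)
    {
        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors == 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("<zzz><aaa>hi</aaa></zzz>"));                // first alternative
        Console.WriteLine(IsValid("<zzz><bbb>100</bbb></zzz>"));               // second alternative
        Console.WriteLine(IsValid("<zzz><aaa>hi</aaa><bbb>100</bbb></zzz>"));  // both at once: rejected
    }
}
```

Replacing choice with sequence in the schema string reverses the outcome: only the third document, which supplies both elements in order, is then accepted.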

 

Thus, using the element named yyy, the yyy tag can be placed within zzz. In turn, this tag can carry other tags. Thus, we have a complexType followed by a sequence. The tags of aaa and bbb are placed in sequence. Hence, they can occur within the tag yyy. A value of 0 for minOccurs implies that the presence of the tags aaa and bbb remains optional. Finally, all the open tags are closed.

 

b.xsd

<xs:element name="zzz" minOccurs="0">

 

Error

Unhandled Exception: System.Xml.Schema.XmlSchemaException: The 'minOccurs' attribute cannot be present. An error occurred at file:///c:/xmlprg/b.xsd(3, 2).

 

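The above exception is thrown because zzz is declared directly within the schema tag, and such a top-level element cannot carry occurrence constraints. Hence, the minOccurs attribute can never be used on it; the same holds true for maxOccurs. A root element is, in any case, always present exactly once in an xml file. You can see that there are numerous error checks to be made for just a single attribute!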

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="zzz">

<xs:complexType>

<xs:sequence>

<xs:element name="aaa" type="xs:string" minOccurs="0" maxOccurs="0" />

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema>

 

b.xml

<?xml version="1.0"?>

<zzz>

<aaa />

</zzz>

 

Error

Element cannot contain text or whitespace. Content model is empty. An error occurred at file:///c:/xmlprg/b.xml(2, 6).

Element 'zzz' has invalid child element 'aaa'. An error occurred at file:///c:/xmlprg/b.xml(3, 2).

The 'aaa' element is not declared. An error occurred at file:///c:/xmlprg/b.xml(3, 2).

Element cannot contain text or whitespace. Content model is empty. An error occurred at file:///c:/xmlprg/b.xml(3, 8).

 

All the above errors point to the xml file. In the element aaa, the implication of setting the value of minOccurs to 0 is that aaa can occur a minimum of zero times, i.e. it can remain absent. On the other hand, the implication of setting the value of maxOccurs to 0 is that it can occur a maximum of zero times, which implies that it can never be present.

 

Thus, in order to satisfy both these conditions, the element cannot be present in the xml file.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="zzz">

<xs:complexType>

<xs:sequence>

<xs:element name="aaa" type="xs:string" minOccurs="0" maxOccurs="0" />

<xs:element name="bbb" type="xs:string"/>

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema>

 

b.xml

<?xml version="1.0"?>

<zzz>

<bbb />

</zzz>

 

Just to corroborate the above statement, we have eliminated the aaa tag from the xml file, and merely added a bbb element. This generates no error whatsoever.

 

Thus, by setting the minOccurs and maxOccurs to 0, the element aaa does exist in the complexType, but is rendered ineffective.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="zzz">

<xs:complexType>

<xs:sequence minOccurs="0" maxOccurs="0">

<xs:element name="aaa" type="xs:string"/>

<xs:element name="bbb" type="xs:string"/>

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema>

 

b.xml

<?xml version="1.0"?>

<zzz />

 

The same rules are applicable across the board. When we add minOccurs and maxOccurs to the sequence, both with a value of zero each, the entire sequence gets disabled. Therefore, the sequence can never be used. In the elements aaa and bbb, even though the default values of minOccurs and maxOccurs are 1, they can never be used.

 

The above combination of attributes effectively disables everything that ensues. But, no XML Schema error is thrown.

b.xsd

<xs:sequence minOccurs="-1" maxOccurs="0">

 

Error

Unhandled Exception: System.Xml.Schema.XmlSchemaException: The value for the 'minOccurs' attribute is invalid - The value for the 'minOccurs' attribute must be xs:nonNegativeInteger. An error occurred at file:///c:/xmlprg/b.xsd(4, 15).. An error occurred at file:///c:/xmlprg/b.xsd(4, 15).

 

Every attribute has a data type, which in the case of minOccurs is nonNegativeInteger. Whenever a value of the wrong type is used, the above generic error gets displayed. Being of this type, the minOccurs attribute can only have a value of 0 or more.

 

b.xml

<?xml version="1.0"?>

<zzz>

<aaa>Bye </aaa>

<bbb>

<a1>no</a1>

<a2>yes</a2>

</bbb>

</zzz>

 

In the xml file, the root tag zzz encloses a tag aaa. Following aaa is another tag bbb, which in turn contains the tags a1 and a2. This kind of layout of 'tags within tags within tags' is encountered very often in various files. In order to avert all errors in the xml file, the xsd file should be as follows:

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

<xs:complexType>

<xs:sequence>

<xs:element name = "aaa" type="xs:string"/>

<xs:element name = "bbb">

<xs:complexType>

<xs:sequence>

<xs:element name = "a1" type="xs:string"/>

<xs:element name = "a2" type="xs:string"/>

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema>

 

The sequence element first contains the element aaa, followed by another element bbb. The element tag for bbb, unlike the one for aaa, is not closed immediately with a /. Instead, it encloses the familiar complexType and sequence, which in turn hold the two elements a1 and a2, and only then is it closed with an explicit end tag. Thus, an element tag need not necessarily end with a /, and it can be made as complex as desired.

 

However, there is a slight problem here that warrants attention. We shall elucidate it with an example.

 

Suppose the bbb tag is an address tag that spans at least six tags, containing details of the street, zip code, city, state, country etc. Now, each time an address is to be declared in the schema, all these tags have to be repeated. Would it not be far more efficient to define the address just once and re-use the definition every time? The next example implements this concept.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

<xs:complexType>

<xs:sequence>

<xs:element name = "aaa" type="xs:string"/>

<xs:element name = "bbb" type="ttt" />

</xs:sequence>

</xs:complexType>

</xs:element>

<xs:complexType name = "ttt">

<xs:sequence>

<xs:element name = "a1" type="xs:string"/>

<xs:element name = "a2" type="xs:string"/>

</xs:sequence>

</xs:complexType>

</xs:schema>

 

In the xsd file, the complexType tag is placed outside the element tag, but within the schema tag. The name attribute is set to ttt, but had we so desired, we could have named it as 'address'. We use the same tags wherein the sequence comes first, followed by the element names. The element bbb is then assigned the type of ttt, which is the name of the user-defined type. Thus, within the bbb tag, the tags of a1 and a2 can be used. As before, minOccurs and maxOccurs can be implemented to determine the frequency.

 

The advantage here is that the type ttt can be reused with diverse elements. Its definition can also be safely placed above the element zzz.
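
To see the re-use at work, the sketch below declares one named type and assigns it to two different elements. The names address, street, city, billTo and shipTo are hypothetical, chosen by us for illustration, and the program assumes the newer XmlReaderSettings API rather than the classes used in a.cs. Note that the named type is deliberately placed above the element zzz.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class TypeReuseDemo
{
    // One named complexType, used by two elements.
    public const string Xsd = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:complexType name='address'>
    <xs:sequence>
      <xs:element name='street' type='xs:string' />
      <xs:element name='city' type='xs:string' />
    </xs:sequence>
  </xs:complexType>
  <xs:element name='zzz'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='billTo' type='address' />
        <xs:element name='shipTo' type='address' />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns true when the document produces no validation errors.
    public static bool IsValid(string xml)
    {
        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors == 0;
    }

    public static void Main()
    {
        string full = "<street>Main Road</street><city>Mumbai</city>";
        // Both elements follow the address type.
        Console.WriteLine(IsValid("<zzz><billTo>" + full + "</billTo><shipTo>" + full + "</shipTo></zzz>"));
        // shipTo lacks the city that the address type demands.
        Console.WriteLine(IsValid("<zzz><billTo>" + full + "</billTo><shipTo><street>Main Road</street></shipTo></zzz>"));
    }
}
```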

 

b.xml

<?xml version="1.0"?>

<zzz>

<aaa>Bye </aaa>

<bbb>

<a3> yes </a3>

<a4>

<a1>no</a1>

<a2> vijay </a2>

 </a4>

</bbb>

</zzz>

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

<xs:complexType>

<xs:sequence>

<xs:element name = "aaa" type="xs:string"/>

<xs:element name = "bbb" type="uuu" />

</xs:sequence>

</xs:complexType>

</xs:element>

<xs:complexType name = "ttt">

<xs:sequence>

<xs:element name = "a1" type="xs:string"/>

<xs:element name = "a2" type="xs:string"/>

</xs:sequence>

</xs:complexType>

<xs:complexType name = "uuu">

<xs:sequence>

<xs:element name = "a3" type="xs:string"/>

<xs:element name = "a4" type="ttt"/>

</xs:sequence>

</xs:complexType>

</xs:schema>

 

Types can be nested within one another to any depth. The xsd file has the tag zzz, which consists of the tag aaa with a data type of 'string'. Then comes the tag bbb, whose type is uuu, which implies that it shall embody whatever uuu contains. This user-defined type uuu commences with the tag a3 of data type string, followed by the tag a4, which in turn incorporates the definition of type ttt. Thus, the tag a4 will contain the tags a1 and a2, since they define the type ttt.

 

This substantiates the fact that types are extremely useful and can be reused as and when required. The salient point to remember is that while creating an XML schema, the complexType tag has to be used to create types that are meant for re-use. Further, within the complexType, element tags must be employed to declare what the type finally contains. A complexType tag cannot be reused if it is not assigned a name.

 

XML Schema offers a wide range of simple types, such as string, integer, etc., which can be combined to build types that represent real-life business objects.

 

b.xml

<?xml version="1.0"?>

<zzz>

</zzz>

<yyy>

</yyy>

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name = "zzz">

<xs:complexType>

</xs:complexType>

</xs:element>

<xs:element name = "yyy">

<xs:complexType>

</xs:complexType>

</xs:element>

</xs:schema>

 

Error

Element cannot contain text or whitespace. Content model is empty. An error occurred at file:///c:/xmlprg/b.xml(2, 6).

 

Unhandled Exception: System.Xml.XmlException: There are multiple root elements. Line 4, position 2.

 

The above exception erupts since only a single root tag is permitted in an xml file. The schema itself is not at fault: it declares two independent elements, zzz and yyy, and either of them may serve as the root. The xml file, however, uses both of them as root elements at once. The first message arises because zzz has an empty content model, yet the xml file places a line break between its start and end tags.

 

Thus, bear in mind that an xml file must always have a single main element, which in turn can contain as many elements as its schema permits. Also, a complexType that is meant for re-use should be placed outside the element tag and given a name.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="zzz" type="xs:string"/>

<xs:element name="aaa" type="xs:string"/>

</xs:schema>

 

b.xml

<?xml version="1.0"?>

<zzz />

 

 

b.xml

<?xml version="1.0"?>

<aaa />

 

The above xsd file is similar to the earlier one, and as before, it gives no xsd error. Each of the two xml files validates, because each uses only one of the two elements as its root tag. If both the zzz and aaa elements were used as roots in a single xml file, it would result in the same error as observed earlier.

 

b.xml

<?xml version="1.0"?>

<zzz aa="hi" />

 

b.xsd

<?xml version="1.0" encoding="utf-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >

<xs:element name="zzz">

<xs:complexType>

<xs:attribute name="aa" type="xs:string" />

</xs:complexType>

</xs:element>

</xs:schema>

 

The root tag can also contain attributes. To substantiate this, the zzz tag is modified to contain an attribute aa that accepts a string. In the xsd file, the attribute tag is entered with its name attribute set to 'aa' and its type set to 'xs:string'. This tag is then placed within the complexType tag. It cannot get any simpler than this!

 

Thus, the root tag is capable of having multiple elements as well as multiple attributes.

 

b.xml

<?xml version="1.0"?>

<zzz />

 

In the above XML file, the attribute aa has been omitted. Despite this, no error message is displayed, since attributes are optional by default.
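
An attribute can be made mandatory by adding use="required" to its declaration. The sketch below is our own illustration of this, under the assumption that the newer XmlReaderSettings API is available; the class name AttributeDemo and the helper IsValid are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class AttributeDemo
{
    // The attribute aa on zzz is now mandatory.
    public const string Xsd = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='zzz'>
    <xs:complexType>
      <xs:attribute name='aa' type='xs:string' use='required' />
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Returns true when the document produces no validation errors.
    public static bool IsValid(string xml)
    {
        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors == 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("<zzz aa='hi' />")); // attribute supplied
        Console.WriteLine(IsValid("<zzz />"));         // required attribute missing
    }
}
```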

 

b.xml

<?xml version="1.0"?>

<zzz aa="10000" />

 

b.xsd

<?xml version="1.0" encoding="utf-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >

<xs:attribute name="aa">

<xs:simpleType>

<xs:restriction base="xs:integer" />

</xs:simpleType>

</xs:attribute>

<xs:element name="zzz">

<xs:complexType>

<xs:attribute ref="aa" />

</xs:complexType>

</xs:element>

</xs:schema>

 

The attribute tag within zzz has a 'ref' attribute. This refers to an attribute that has been declared elsewhere, directly under the schema tag. Therefore, the properties of the referenced attribute now become part of the referring one. Thus, the element zzz acquires the attribute aa along with all its properties.

 

The declaration of the user-defined attribute aa encloses a simpleType tag. This tag defines a simple type, which determines the constraints or rules that apply to the values of attributes or elements.

 

In the above case, the restriction tag is used, which establishes the restrictions that are imposed on the simpleType. Its base attribute specifies the name of the data type on which the new type is based.

 

As per the xsd file, the new type derives from the built-in type integer. Thus, the user-defined type behaves exactly like the built-in type integer for now, though restrictions can be added to make it deviate from the base type.

 

 

In the above xsd file, aa is created with a simple type that accepts integers. So, specifying a value of 'hi' in place of 10000 would result in the following error:

 

Error

The 'aa' attribute has an invalid value according to its data type. An error occurred at file:///c:/xmlprg/b.xml(2, 6).

 

b.xsd

<?xml version="1.0" encoding="utf-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >

<xs:attribute name="aa">

<xs:simpleType>

<xs:restriction base="xs:integer">

<xs:maxInclusive value="1000"/>

</xs:restriction>

</xs:simpleType>

</xs:attribute>

<xs:element name="zzz">

<xs:complexType>

<xs:attribute ref="aa" />

</xs:complexType>

</xs:element>

</xs:schema>

 

The above xsd file adds a specific check on the value of the attribute aa. Assigning the value of 10000 to the attribute generates the following error:

 

Error

The 'aa' attribute has an invalid value according to its data type. An error occurred at file:///c:/xmlprg/b.xml(2, 6).

 

The error can be attributed to the maxInclusive element. This element prohibits the attribute from having a value that exceeds 1000. Here, since the value of the attribute aa exceeds 1000, the error is raised. The maxInclusive element is called a facet, since it adds a constraint that restricts the values of the base type.

 

 

b.xsd

<xs:restriction base="xs:integer">

<xs:maxInclusive value="1000"/>

<xs:minInclusive value="100"/>

</xs:restriction>

 

The minInclusive facet stipulates the minimum value that the simpleType can have, whereas the maxInclusive facet determines the maximum value that it can assume. As the names suggest, both limits are themselves permitted values.

 

Thus, in the above case, the value of the attribute is confined to the range of 100 to 1000.
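
To make the range concrete, here are a few hypothetical variations of the root element in b.xml. This is a sketch, assuming the schema with both the minInclusive and the maxInclusive facets in place. Since both limits are inclusive, the first two variations should validate, while the last two should be rejected.

b.xml (variations of the root element, one per file)
<zzz aa="100" />  <!-- acceptable: equal to the minInclusive value -->
<zzz aa="1000" /> <!-- acceptable: equal to the maxInclusive value -->
<zzz aa="99" />   <!-- rejected: below the minimum of 100 -->
<zzz aa="1001" /> <!-- rejected: above the maximum of 1000 -->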

 

XML Schemas come into play while validating xml documents. The xml files that are validated against a schema, such as b.xml, are termed 'instance documents' by the XML Schema specification.

 

The simple definition of a complex type is that it contains other elements or attributes within it. We shall use the word 'element' instead of 'tag', in order to adhere to the terminology of the XML documentation. A simple type carries neither sub-elements nor attributes. Attributes are always of a simple type, because an attribute cannot contain other elements or attributes.

 

The Schema world has a large number of built-in simple types, which are employed for creating complex types. Thus, complex types may contain elements and attributes, whereas simple types do not contain either elements or attributes. The two kinds of types are also created differently: a complex type is defined with the complexType element, whereas a simple type is either built-in or defined with the simpleType element. Moreover, their usage also varies.

 

Normally, a complex type declaration contains other element and attribute declarations, as well as references to elements that have been declared elsewhere. Declaring a complex type not only creates a type, but also associates with it a set of constraints that decide what may appear within the element and where. For instance, the sequence element ensures that the child elements appear in the order specified in the declaration.
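
The role of the sequence element can be sketched as follows. The child element names aaa and bbb are hypothetical and are used for illustration only. Under this schema, the instance document is expected to validate only if aaa appears before bbb within zzz. Note that the attribute declaration is placed after the sequence element within the complexType.

b.xsd (sketch)
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xs:element name="zzz">
<xs:complexType>
<!-- the children must appear in exactly this order -->
<xs:sequence>
<xs:element name="aaa" type="xs:string" />
<xs:element name="bbb" type="xs:integer" />
</xs:sequence>
<xs:attribute name="aa" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:schema>

b.xml (sketch)
<?xml version="1.0"?>
<zzz aa="hi">
<aaa>hello</aaa>
<bbb>10</bbb>
</zzz>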

 

Any type created within an element is local to that element. If the type is created outside of an element, directly under the schema element, it is called a global type. A global type is used by placing its name in the 'type' attribute of an element or an attribute, whereas the 'ref' attribute refers to a global element or a global attribute. All the entities in the world of schemas, such as schema, element, complexType, etc., must belong to a namespace.

 

This namespace is assigned the name http://www.w3.org/2001/XMLSchema, which has the form of a URL. This name is reserved. Thus, it cannot be replaced with any other name, although the prefix xs that is bound to it is merely a convention. Certain things in life have to be reserved!
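
The two points above, a global type and the reserved namespace, can be seen together in the following sketch. The type name t1 is hypothetical and is used for illustration only. The prefix xs is bound to the reserved namespace name, the simpleType t1 is created directly under the schema element, and the attribute aa picks it up by name through its type attribute.

b.xsd (sketch)
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<!-- a global type: a child of the schema element, and therefore named -->
<xs:simpleType name="t1">
<xs:restriction base="xs:integer">
<xs:maxInclusive value="1000" />
</xs:restriction>
</xs:simpleType>
<xs:element name="zzz">
<xs:complexType>
<!-- the global type is referred to by its name -->
<xs:attribute name="aa" type="t1" />
</xs:complexType>
</xs:element>
</xs:schema>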

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

</xs:schema>

 

b.xml

<?xml version="1.0"?>

 

Error

Unhandled Exception: System.Xml.XmlException: The root element is missing.

 

The above exception occurs due to the absence of the root element in the xml file. As has been mentioned at the very outset of this chapter, every xml file requires a root element, and it is the schema that determines which root element is acceptable. Here, b.xml contains nothing but the XML declaration. Hence, the file is not well-formed, and the parser throws an XmlException, as opposed to an XmlSchemaException, irrespective of the fact that no element has been defined in the xsd file either.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="zzz" type="xs:string">

<xs:complexType>

</xs:complexType>

</xs:element>

</xs:schema>

 

b.xml

<?xml version="1.0"?>

<zzz/>

Exception

Unhandled Exception: System.Xml.Schema.XmlSchemaException: The type element cannot be present with either simpleType or complexType. An error occurred at file:///c:/xmlprg/b.xsd(2, 2).

 

We cannot have an element with a type attribute that also encloses a complexType element, since both of them attempt to supply the type of the element. The enclosed complexType element does not override the string type of element zzz. The same rule applies to a simpleType.

 

Thus, the type of an element must be supplied either through the type attribute or through an enclosed complexType element, but never through both.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="zzz" type="xs:string"/>

<xs:complexType>

</xs:complexType>

</xs:schema>

 

Exception

Unhandled Exception: System.Xml.Schema.XmlSchemaException: The required attribute 'name' is missing. An error occurred at file:///c:/xmlprg/b.xsd(2, 2).

 

Any complexType element that is a child of the schema element must have a name attribute. This rule is not applicable if it is the child of an element. The above error occurs because an anonymous type has been created directly under the schema element. Without a name, nothing can ever refer to the type, which denies it any utility.

 

b.xsd

....

<xs:complexType name="c1">

....

 

The error vanishes when the type is assigned a name. Next, the schema is reduced to the root element alone, and the root element in the xml file is given some text.

 

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="zzz" type="xs:string"/>

</xs:schema>

 

b.xml

<?xml version="1.0"?>

<zzz>

hi

</zzz>

 

The type attribute determines the type of content that can be placed within the tags.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="zzz">

<xs:complexType>

</xs:complexType>

</xs:element>

</xs:schema>

 

Error

Element 'zzz' has invalid child element '#PCDATA'. An error occurred at file:///c:/xmlprg/b.xml(2, 6).

Element cannot contain text or whitespace. Content model is empty. An error occurred at file:///C:/XMLBOO~1/try/b.xml(2, 6).

 

The above error is generic in nature. Therefore, it surfaces very often.

 

The error emanates from the fact that the empty complexType does not permit any content between the zzz tags, whereas b.xml still carries the text 'hi' between them. Thus, if we declare only the requisite attributes, but do not specify a type for the content that is to be placed between the tags, the same error would meet the eye.

 

b.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="zzz">

<xs:complexType>

<xs:attribute name="aa" type="xs:string" />

</xs:complexType>

</xs:element>

</xs:schema>

 

b.xml

<?xml version="1.0"?>

<zzz aa="hi"/>

 

The above xml file gives no errors, since the element zzz carries the attribute aa of type 'string' and is devoid of all content.
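
If an element is to carry both an attribute and text, one possibility offered by the schema language is the simpleContent element with an extension of a simple type. The following is only a sketch, assuming the same zzz element and aa attribute as above; it is not one of the original listings.

b.xsd (sketch)
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz">
<xs:complexType>
<!-- simpleContent: text only, no child elements -->
<xs:simpleContent>
<!-- the text is of type string, extended with the attribute aa -->
<xs:extension base="xs:string">
<xs:attribute name="aa" type="xs:string" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:schema>

b.xml (sketch)
<?xml version="1.0"?>
<zzz aa="hi">bye</zzz>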