2. Validating XML Documents
There are two prominent features of the XML Schema world. One is the creation
of an xsd file, which provides a succinct description of a DataSet. The other
is the validation of an xml file against a schema. For instance, if there is an
xml file, such as an invoice representing a purchase order, then this invoice
must have a certain format or syntax; that is, it must be bound by a set of
rules. The XML Schema, which is a substitute for the DTD or Document Type
Definition, represents these rules. We have dealt with them much earlier.
In this chapter,
our primary focus shall be on learning how to validate an XML file, and on
exploring the depths of the Schema Description Language.
DTDs belong to a
world alien to XML. In other words, a DTD is not an XML document. Therefore,
one has to learn two diverse syntaxes; one for XML and the other for DTDs. This
results in a large number of inconsistencies.
The major flaw with the DTD world surfaces during the creation of new data
types. For instance, with a Schema it is elementary to create a data type that
restricts the value of an element to 1000, whereas a DTD offers no means of
doing so. Moreover, DTD supports only 10 data types, whereas the Schema world
supports more than 44 data types. Besides, the Schema world allows creation of
user-defined data types.
A major portion
of this book delineates features analogous to those of the XML Schema world.
It is commonly said that almost every programmer devotes about 50 to 70 percent
of his or her time to determining whether the data being dealt with is in the
right format or not. Thus, the mainstay of XML Schema is that it grants the
programmer the liberty to write the rules of data validity and thereafter
leave it to the XML Validator to verify whether the data satisfies these
conditions or not. This leaves the programmer with plenty of time on hand to
focus on the job that he or she is paid for, i.e. writing code that represents
business applications.
When machines are
required to transact businesses with each other, it is the XML Schema that
validates the data being sent across. Before incorporating the data in its
databases, the receiving machine first validates the data by scrutinizing the
schema. It is not mandatory for the schema file to be present on the same
machine. It could reside on some other site on the net and could have been
created by some Standards Body.
An XML Schema
specifies the properties of a resource, whereas the XML file specifies a set of
values for the above properties.
Create the
following three files in the xmlprg subdirectory of the root drive.
a.cs
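using System;
using System.Xml;
using System.Xml.Schema;
public class zzz
{
    public static void Main()
    {
        // Reader for the xml file that is to be validated.
        XmlTextReader r = new XmlTextReader("b.xml");
        // Wrap it in a validating reader and ask for XSD validation.
        XmlValidatingReader v = new XmlValidatingReader(r);
        v.ValidationType = ValidationType.Schema;
        // Register the rules stored in b.xsd; no namespace is supplied.
        XmlSchemaCollection c;
        c = v.Schemas;
        c.Add(null, "b.xsd");
        // abc is called each time a validation error occurs.
        v.ValidationEventHandler += new ValidationEventHandler(abc);
        // Reading the file from start to end triggers the validation.
        while(v.Read()) ;
    }
    public static void abc(object s, ValidationEventArgs a)
    {
        // Message describes the error.
        Console.WriteLine(a.Message);
    }
}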
b.xml
<?xml version="1.0"?>
<zzz >
</zzz>
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
</xs:element>
</xs:schema>
In the program
a.cs, the XmlTextReader 'r' represents the xml file b.xml, which is the file to
be validated. The class that handles xml validation is called the
XmlValidatingReader. The XmlTextReader 'r' is supplied as a parameter to the
constructor of this class.
The XmlValidatingReader class is equipped to handle a number of validation
types, namely DTD, XDR (an earlier representation of XML Schema) and XSD
validations.
Then, the ValidationType property is set to Schema, since the XML file is to be
validated against an XSD schema. The property Schemas is of type
XmlSchemaCollection, which holds the schemas that are to be utilized for
validation. Since it is a collection object, the Add function is used to add
the xsd file to it.
In the Add function, the first parameter is the namespace URI of the schema.
Since b.xsd does not declare a target namespace, a value of null is supplied
for the namespace. The namespace only comes into play whenever the owner of a
certain tag is to be ascertained. If the entire world got together and thought
up a way by which every tag could be assigned a unique name, then there would
never be a necessity to preface the tag with its namespace.
The second
parameter is the URL of the xsd file. The xsd file incorporates the rules. In
this case, the rules are recorded in the file b.xsd. Moreover, the event
ValidationEventHandler is initialized to a function abc, which is called each
time an error occurs. Thus, while reading the file, the Read function will
apply the validation rules contained in the file b.xsd, and in the event of an
error, it will call the function abc.
The function abc
takes a parameter 'a' of type ValidationEventArgs. This class contains a member
Message that describes the error.
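The XmlValidatingReader and XmlSchemaCollection classes used in a.cs were
marked as obsolete from version 2.0 of the .NET Framework onwards. The sketch
below shows how the same check might be written with the XmlReaderSettings and
XmlSchemaSet classes instead. The class name ModernValidation, the function
Validate, and the decision to pass the schema and the document as strings
rather than as files are our own assumptions, made only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ModernValidation
{
    // Same rule as b.xsd: a single element named zzz.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='zzz'/>" +
        "</xs:schema>";

    // Hypothetical helper: validates xml against xsd and returns the
    // validation messages. Strings are used so that no files are needed.
    public static List<string> Validate(string xsd, string xml)
    {
        var messages = new List<string>();

        // XmlSchemaSet plays the role of XmlSchemaCollection. Passing null
        // uses the target namespace declared in the schema itself.
        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(xsd)));

        // XmlReaderSettings plays the role of XmlValidatingReader.
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationEventHandler += (s, a) => messages.Add(a.Message);

        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }   // reading the document triggers validation
        }
        return messages;
    }

    public static void Main()
    {
        // Print whatever the validator reports for each document.
        foreach (string m in Validate(Xsd, "<zzz></zzz>"))
            Console.WriteLine(m);
        foreach (string m in Validate(Xsd, "<zzzz></zzzz>"))
            Console.WriteLine(m);
    }
}
```

As in a.cs, the handler is merely handed each message; here it collects them
in a list, so that the caller can decide what is to be done with them.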
To substantiate
the above statements beyond doubt, effect the following changes in the file
b.xml. The errors, if any, shall aid and enhance our understanding of the XML
Schemas.
b.xml
<?xml version="1.0"?>
<zzzz>
</zzzz>
Error
The 'zzzz' element is not declared. An error occurred at
file:///c:/xmlprg/b.xml(2, 2).
The error occurs
because the xsd file creates an element called zzz since the name property of
the element tag is zzz, whereas in the xml file, the tag is zzzz.
Thus, the first
rule is that it is the element tag that decides the names of the tags in the xml
file; or to be more precise, the name attribute. Every tag in the xml file must
have a corresponding element tag with the name attribute in the xsd file.
Now, change the
tag to zzz and the error completely disappears.
Modify the xsd
file to contain the following:
b.xsd
<xs:schema >
<xs:element name = "zzz">
</xs:element>
</xs:schema>
Error
Unhandled Exception: System.Xml.XmlException: 'xs' is an
undeclared namespace. Line 1, position 2.
The xmlns
attribute identifies the namespace that the tags belong to, if and only if they
are not qualified with a namespace prefix. The attribute xmlns:xs specifies
that the xs prefix is owned by or belongs to, the namespace
http://www.w3.org/2001/XMLSchema.
The error materializes since the xs prefix has been used in the document
without apprising the framework of the namespace that owns this prefix.
b.xsd
<schema >
<element name = "zzz">
</element>
</schema>
Error
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
Expected XML Schema root. Make sure that the root element is <schema> and
the namespace is 'http://www.w3.org/2001/XMLSchema'. An error occurred at
file:///c:/xmlprg/b.xsd(1, 2).
On realising that
the xs prefix was the real culprit, we decided to side-step it completely.
Thus, we eliminated all its instances.
On doing so, the
system hurled a new error, indicating that the root element should be schema
located within a certain namespace.
Every xsd file
must necessarily start with the root element of 'schema' and should also end
with it. It has a likeness to life, where we cannot escape from either death or
taxes. All other elements have to be located within this root element.
b.xsd
<vijay:schema
xmlns:vijay="http://www.w3.org/2001/XMLSchema">
<vijay:element name = "zzz">
</vijay:element>
</vijay:schema>
The point being made here is that it is the namespace URI assigned to the
prefix that is of prime significance, and not the name of the prefix itself.
Thus, in the above case, no errors are cast at us, since we have merely changed
the prefix from xs to vijay, and have assigned the value of
http://www.w3.org/2001/XMLSchema to vijay.
b.xsd
<vijay:schema
xmlns:vijay="http://www.w3.org/2002/XMLSchema">
<vijay:element name = "zzz">
</vijay:element>
</vijay:schema>
Error
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
Expected XML Schema root. Make sure that the root element is <schema> and
the namespace is 'http://www.w3.org/2001/XMLSchema'. An error occurred at
file:///c:/xmlprg/b.xsd(1, 2).
On changing the URI pointed at by vijay, the XML framework throws an exception
at run time, since it had expected 2001 instead of 2002. The rule of the game
is that the xmlns attribute points to the namespace, or the owner, of all
unqualified tags. We too can create namespace prefixes in the schema tag and
point them to their URI or owner.
b.xsd
<schema
xmlns="http://www.w3.org/2001/XMLSchema">
<element name = "zzz">
</element>
</schema>
The above xsd file does not generate any error, as the xmlns attribute points
to the correct URI, and this default namespace applies to all the unqualified
tags. Thus, tags without a prefix take their namespace from the default
namespace specified by the xmlns attribute.
b.xsd
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:aaa="http://www.w3.org/2001/XMLSchema">
<aaa:element name = "zzz">
</aaa:element>
</schema>
To place things
in the right perspective, tags such as schema originate from the namespace
http://www.w3.org/2001/XMLSchema, since they are not qualified by any namespace
prefix, whereas tags such as 'element' are to be qualified with the aaa prefix.
These tags originate from the namespace http://www.w3.org/2001/XMLSchema, as
specified with the xmlns:aaa attribute. We prefer using the xs prefix in the
file, despite being aware that it is not essential.
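The following sketch is one way of confirming that the choice of prefix is
immaterial. It loads three renderings of the same schema, one with the xs
prefix, one with the vijay prefix and one with a default namespace, and asks
the framework whether each of them declares an element named zzz. The class
name PrefixDemo and the function DeclaresZzz are hypothetical names chosen by
us; XmlSchemaSet belongs to the newer schema classes of the framework and not
to the ones used in a.cs.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class PrefixDemo
{
    const string Uri = "http://www.w3.org/2001/XMLSchema";

    // The same schema written with three different prefix choices.
    public static readonly string[] Schemas =
    {
        "<xs:schema xmlns:xs='" + Uri + "'>" +
            "<xs:element name='zzz'/></xs:schema>",
        "<vijay:schema xmlns:vijay='" + Uri + "'>" +
            "<vijay:element name='zzz'/></vijay:schema>",
        "<schema xmlns='" + Uri + "'>" +
            "<element name='zzz'/></schema>"
    };

    // Hypothetical helper: compiles the schema and reports whether it
    // declares a top-level element named zzz (in no namespace).
    public static bool DeclaresZzz(string xsd)
    {
        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        schemas.Compile();
        return schemas.GlobalElements.Contains(new XmlQualifiedName("zzz"));
    }

    public static void Main()
    {
        foreach (string xsd in Schemas)
            Console.WriteLine(DeclaresZzz(xsd));
    }
}
```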
b.xml
<?xml version="1.0"?>
<zzz>
<aaa />
</zzz>
What we had
merely intended was to allow the user to add a tag within the tag zzz. In the
XML file, we added the tag, and when we ran the program a.exe, we were greeted
by the following exception:
Error
The 'aaa' element is not declared. An error occurred at
file:///c:/xmlprg/b.xml(3, 2).
The exception was
bound to occur since in the xsd file, there is no element named aaa within the
element zzz.
b.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
<xs:element name="aaa" type="xs:string"
/>
</xs:element>
</xs:schema>
Error
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
The 'http://www.w3.org/2001/XMLSchema:element' element is not supported in this
context. An error occurred at file:///c:/xmlprg/b.xsd(3, 2).
Adding an element
called aaa within the element zzz as pursued by us, is simply unacceptable.
Hence, the above exception is generated. What actually needs to be done is
that, the element aaa has to be placed within the tags of complexType.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
<xs:complexType>
<xs:element name = "aaa" type="xs:string"
/>
</xs:complexType>
</xs:element>
</xs:schema>
Error
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
The 'http://www.w3.org/2001/XMLSchema:element' element is not supported in this
context. An error occurred at file:///c:/xmlprg/b.xsd(4, 2).
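The above exception is reported because an element cannot be placed directly
within a complexType. It has to be enclosed within a tag such as sequence,
which lays down the order in which the elements appear.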
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
<xs:complexType>
<xs:sequence>
<xs:element name = "aaa" type="xs:string"
/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
On entering the elements as required, the error vanishes. The element tag
contains name=zzz, followed by a complexType tag that defines the content of
the tag zzz. Within the sequence tag, multiple child elements of zzz can be
placed. As of now, the complexType tag has no name attribute. Thus, it creates
an anonymous type. If we had created it as a named type directly under the
schema tag, then we could have even used it elsewhere in the file.
b.xml
<?xml version="1.0"?>
<zzz>
<aaa> hi </aaa>
<bbb> 100 </bbb>
</zzz>
b.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
<xs:complexType>
<xs:sequence>
<xs:element name = "aaa" type="xs:string" />
<xs:element name = "bbb" type="xs:integer" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The xsd file now encloses two elements named aaa and bbb within the element
zzz. Thus, in the xml file, zzz can now contain these tags. Since aaa has the
data type string, the value "hi" is enclosed within aaa. The element bbb has
the data type integer. Therefore, it is assigned the value of 100. Now, attempt
to place "bye" within bbb as follows:
<bbb> bye </bbb>
This generates
the following error:
Error
The 'bbb' element has an invalid value according to its data
type. An error occurred at file:///c:/xmlprg/b.xml(4, 13).
Thus, the XML
framework performs error checks on the data type for each element.
b.xml
<?xml version="1.0"?>
<zzz>
<bbb> 100 </bbb>
<aaa> hi </aaa>
</zzz>
Error
Element 'zzz' has invalid child element 'bbb'. Expected
'aaa'. An error occurred at file:///C:/xmlprg/b.xml(3, 2).
The task of the
sequence tags is to enforce the correct sequence, i.e. the tag aaa comes first,
followed by the tag bbb.
b.xml
<?xml version="1.0"?>
<zzz>
<aaa> hi </aaa>
<aaa> hi1 </aaa>
<bbb> 100 </bbb>
</zzz>
Error
Element 'zzz' has invalid child element 'aaa'. Expected
'bbb'. An error occurred at file:///c:/xmlprg/b.xml(4, 2).
The sequence is very exacting. Unless stated otherwise, each tag within it may
appear only once, leaving no scope for duplicate tags. Therefore, when aaa is
repeated, it generates the error given above.
On doing away
with the aaa tags completely, the following error is displayed:
Error
Element 'zzz' has invalid child element 'bbb'. Expected
'aaa'. An error occurred at file:///c:/xmlprg/b.xml(3, 2).
Most of the
explanations rendered here have already been covered earlier. However, another
revision would surely do no harm. This revision would explore the xsd elements
in greater detail. Also, we shall desist from repeating the entire xsd. So, we
shall only focus on the changes that are to be incorporated.
b.xsd
...
<xs:element name = "aaa" type="xs:string"
id="a1"/>
<xs:element name = "bbb"
type="xs:integer" id="a1"/>
...
Error
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
Invalid 'id' attribute value - Duplicate id attribute. An error occurred at
file:///c:/xmlprg/b.xsd(6, 2).
When used, the id
attribute must hold unique values, for it is of type ID and it marks each
entity with a distinct name. Since both the elements have been assigned the
same id, an exception is thrown. The id attribute has no kinship whatsoever
with the data in the xml file.
b.xml
<?xml version="1.0"?>
<zzz>
<aaa> Hi </aaa>
<aaa> Hi </aaa>
<bbb> 100 </bbb>
</zzz>
b.xsd
...
<xs:element name = "aaa" type="xs:string"
maxOccurs ="2"/>
<xs:element name = "bbb"
type="xs:integer" maxOccurs ="4"/>
...
The value
assigned to the maxOccurs attribute determines the maximum number of times that
the element can occur in the file. Based on the values specified in the xsd
file, aaa can be repeated twice, while bbb can be repeated four times.
The maxOccurs attribute only specifies the maximum number, and not the minimum
number of occurrences. Therefore, even if bbb is present only once in the file,
it does not generate any error. If you do not desire an upper limit, you should
specify the value of 'unbounded', and if you want to prevent the usage of the
element, you should set the value to 0 (along with a minOccurs of 0). After
doing so, any attempt at using the element generates an error.
b.xsd
<xs:element name = "aaa" type="xs:string"
minOccurs ="2" />
Error
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
minOccurs value cannot be greater than maxOccurs value. An error occurred at
file:///C:/xmlprg/b.xsd(5, 2).
By default, the
value of the maxOccurs attribute is 1. Thus, when it is not specified, the tag
can appear only once. The above modification results in an exception, because
the minOccurs value of 2 mandates that the tag should be present at least
twice; however, this is in conflict with the value of maxOccurs, which confines
the tag to making only a single appearance.
b.xsd
<xs:element name = "aaa" type="xs:string"
minOccurs ="2" maxOccurs="4"/>
The solution to the above problem is to expressly specify values for both the
minOccurs and the maxOccurs attributes. The minOccurs attribute has a value of
2, which decrees that the tag should appear at least twice in the file.
The maxOccurs
with a value of 4 ensures that the tag occurrences do not exceed the count of
4. Bear in mind that the minOccurs value cannot possibly be larger than that of
maxOccurs.
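The effect of the two attributes can be observed by validating documents that
hold a varying number of aaa tags. The sketch below does so for a schema in
which aaa has a minOccurs of 2 and a maxOccurs of 4. It is written with the
newer XmlReaderSettings and XmlSchemaSet classes rather than with the classes
of a.cs, and the names OccursDemo, Validate and Doc are our own hypothetical
choices.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

public static class OccursDemo
{
    // zzz must contain between two and four aaa tags.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='zzz'><xs:complexType><xs:sequence>" +
        "<xs:element name='aaa' type='xs:string' minOccurs='2' maxOccurs='4'/>" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "</xs:schema>";

    // Hypothetical helper: returns the validation messages for xml.
    public static List<string> Validate(string xsd, string xml)
    {
        var messages = new List<string>();
        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationEventHandler += (s, a) => messages.Add(a.Message);
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return messages;
    }

    // Builds a document whose root zzz holds n copies of the aaa tag.
    public static string Doc(int n)
    {
        var sb = new StringBuilder("<zzz>");
        for (int i = 0; i < n; i++) sb.Append("<aaa>hi</aaa>");
        return sb.Append("</zzz>").ToString();
    }

    public static void Main()
    {
        for (int n = 1; n <= 5; n++)
        {
            int errors = Validate(Xsd, Doc(n)).Count;
            Console.WriteLine(n + " aaa tag(s): " +
                              (errors == 0 ? "valid" : "invalid"));
        }
    }
}
```

Only the documents with two, three or four aaa tags should pass, since one
tag falls short of minOccurs and five tags exceed maxOccurs.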
b.xml
<?xml version="1.0"?>
<zzz>
<aaa>hi </aaa>
<bbb>100</bbb>
<bbb>100</bbb>
<aaa>hi </aaa>
</zzz>
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
<xs:complexType>
<xs:sequence minOccurs="0"
maxOccurs="unbounded">
<xs:element name = "aaa" type="xs:string"
minOccurs="0" />
<xs:element name = "bbb" type="xs:integer"
minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The minOccurs and maxOccurs attributes behave alike wherever they appear, be
it on an element or on a sequence.
In the XML file, the aaa and bbb tags appear in no fixed order. This is because
the sequence as a whole may now be repeated any number of times, or not at all,
since its minOccurs is 0 and its maxOccurs is 'unbounded'. However, the
minOccurs of each of the tags also needs to be set to 0, so that any given
repetition of the sequence may omit either aaa or bbb.
b.xml
<?xml version="1.0"?>
<zzz>
<yyy>
<aaa>Bye </aaa>
<bbb>100</bbb>
</yyy>
<yyy>
<aaa>Bye </aaa>
<bbb>100</bbb>
</yyy>
</zzz>
b.xsd
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="zzz" xmlns=""
xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xs:element name="zzz">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="yyy">
<xs:complexType>
<xs:sequence>
<xs:element name="aaa" type="xs:string"
minOccurs="0" />
<xs:element name="bbb" type="xs:integer"
minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema>
In the xml file,
we have attempted to place the tag yyy in the root tag. The tag yyy in turn
encloses two tags, viz. aaa and bbb. In addition to this, the tag yyy can enjoy
multiple occurrences in the file.
In the xsd file,
there exists an element named zzz, which is the root element. It can occur only
once. As is the normal case, the name of the element is followed by its
definition. This definition is placed in a complexType tag. Then, depending
upon the situation, either a choice or a sequence tag can be implemented.
A choice permits any one of the elements listed within it, whereas a sequence
determines the order. Here, since we have only one element within the choice,
i.e. yyy, the choice tag does not play a very significant role. A sequence tag
could be used instead. However, only one of them may be placed directly within
the complexType, or else an exception will be thrown. The value of 'unbounded'
assigned to maxOccurs allows yyy to occur any number of times.
Thus, using the
element named yyy, the yyy tag can be placed within zzz. In turn, this tag can
carry other tags. Thus, we have a complexType followed by a sequence. The tags
of aaa and bbb are placed in sequence. Hence, they can occur within the tag
yyy. A value of 0 for minOccurs implies that the presence of the tags aaa and
bbb remains optional. Finally, all the open tags are closed.
b.xsd
<xs:element name="zzz" minOccurs="0">
Error
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
The 'minOccurs' attribute cannot be present. An error occurred at
file:///c:/xmlprg/b.xsd(3, 2).
The above exception is thrown because zzz is declared directly under the schema
tag. The minOccurs and maxOccurs attributes are permitted only on elements that
are placed within a complexType, and can never be used on such a top-level
element. You can see that there are numerous error checks to be made for just a
single attribute!
b.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz">
<xs:complexType>
<xs:sequence>
<xs:element name="aaa" type="xs:string"
minOccurs="0" maxOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
b.xml
<?xml version="1.0"?>
<zzz>
<aaa />
</zzz>
Error
Element cannot contain text or whitespace. Content model is
empty. An error occurred at file:///c:/xmlprg/b.xml(2, 6).
Element 'zzz' has invalid child element 'aaa'. An error occurred
at file:///c:/xmlprg/b.xml(3, 2).
The 'aaa' element is not declared. An error occurred at
file:///c:/xmlprg/b.xml(3, 2).
Element cannot contain text or whitespace. Content model is
empty. An error occurred at file:///c:/xmlprg/b.xml(3, 8).
All the above
errors point to the xml file. In the element aaa, the implication of setting
the value of minOccurs to 0 is that aaa can occur a minimum of zero times, i.e.
it can remain absent. On the other hand, the implication of setting the value
of maxOccurs to 0 is that it can occur a maximum of zero times, which implies
that it can never be present.
Thus, in order to
satisfy both these conditions, the element cannot be present in the xml file.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz">
<xs:complexType>
<xs:sequence>
<xs:element name="aaa" type="xs:string"
minOccurs="0" maxOccurs="0" />
<xs:element name="bbb"
type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
b.xml
<?xml version="1.0"?>
<zzz>
<bbb />
</zzz>
Just to
corroborate the above statement, we have eliminated the aaa tag from the xml
file, and merely added a bbb element. This generates no error whatsoever.
Thus, by setting
the minOccurs and maxOccurs to 0, the element aaa does exist in the complexType,
but is rendered ineffective.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz">
<xs:complexType>
<xs:sequence minOccurs="0"
maxOccurs="0">
<xs:element name="aaa"
type="xs:string"/>
<xs:element name="bbb" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
b.xml
<?xml version="1.0"?>
<zzz />
The same rules
are applicable across the board. When we add minOccurs and maxOccurs to the
sequence, both with a value of zero each, the entire sequence gets disabled.
Therefore, the sequence can never be used. In the elements aaa and bbb, even
though the default values of minOccurs and maxOccurs are 1, they can never be
used.
The above
combination of attributes effectively disables everything that ensues. But, no
XML Schema error is thrown.
b.xsd
<xs:sequence minOccurs="-1"
maxOccurs="0">
Error
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
The value for the 'minOccurs' attribute is invalid - The value for the
'minOccurs' attribute must be xs:nonNegativeInteger. An error occurred at
file:///c:/xmlprg/b.xsd(4, 15).. An error occurred at
file:///c:/xmlprg/b.xsd(4, 15).
Every attribute has a data type, which in the case of minOccurs is a
nonNegativeInteger. Whenever a value of the wrong type is used, the above
generic error gets displayed. Being of the above type, the attribute minOccurs
can only have a value of 0 or more.
b.xml
<?xml version="1.0"?>
<zzz>
<aaa>Bye </aaa>
<bbb>
<a1>no</a1>
<a2>yes</a2>
</bbb>
</zzz>
In the xml file, the root tag zzz encloses a tag aaa. Following the tag aaa,
there exists another tag bbb, which in turn contains the tags a1 and a2. This
kind of layout of 'tags within tags within tags' is encountered very often in
various files. In order to avert all errors in the xml file, the xsd file
should be as follows:
b.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
<xs:complexType>
<xs:sequence>
<xs:element name = "aaa"
type="xs:string"/>
<xs:element name = "bbb">
<xs:complexType>
<xs:sequence>
<xs:element name = "a1"
type="xs:string"/>
<xs:element name = "a2"
type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The sequence element first contains the element aaa, followed by another
element bbb. This element bbb, unlike aaa, is not an empty tag ending with a /.
Instead, it encloses the familiar complexType and sequence, which in turn hold
the two tags a1 and a2, before it is finally closed. Thus, an element need not
necessarily end with a /, and it can be made as complex as desired.
However, there remains a slight problem, which warrants attention. We shall
elucidate it with an example.
Assume that the bbb tag is an address tag that spans at least six tags,
containing details of the street, zip code, city, state, country etc. Now, each
time an address is to be declared in the schema, all the tags have to be
repeated. Would it not be more efficient to define the address just once and
re-use it every time? The next example implements this concept.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
<xs:complexType>
<xs:sequence>
<xs:element name = "aaa"
type="xs:string"/>
<xs:element name = "bbb" type="ttt" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name = "ttt">
<xs:sequence>
<xs:element name = "a1" type="xs:string"/>
<xs:element name = "a2"
type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
In the xsd file,
the complexType tag is placed outside the element tag, but within the schema
tag. The name attribute is set to ttt, but had we so desired, we could have
named it as 'address'. We use the same tags wherein the sequence comes first,
followed by the element names. The element bbb is then assigned the type of
ttt, which is the name of the user-defined type. Thus, within the bbb tag, the
tags of a1 and a2 can be used. As before, minOccurs and maxOccurs can be
implemented to determine the frequency.
The advantage here is that the type ttt can be reused with diverse elements.
Its definition can also be safely placed above the element zzz.
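To see the reuse at work, the sketch below assigns the type ttt to two
different elements, bbb and ccc, and then validates one document that honours
the type in both places and one that leaves out a2 within ccc. The element ccc,
the class name NamedTypeDemo and the function Validate are hypothetical
additions of ours, and the validation is written with the newer
XmlReaderSettings and XmlSchemaSet classes rather than with those of a.cs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class NamedTypeDemo
{
    // The named type ttt is declared once and used by both bbb and ccc.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='zzz'><xs:complexType><xs:sequence>" +
        "<xs:element name='bbb' type='ttt'/>" +
        "<xs:element name='ccc' type='ttt'/>" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "<xs:complexType name='ttt'><xs:sequence>" +
        "<xs:element name='a1' type='xs:string'/>" +
        "<xs:element name='a2' type='xs:string'/>" +
        "</xs:sequence></xs:complexType>" +
        "</xs:schema>";

    public const string Good =
        "<zzz><bbb><a1>no</a1><a2>yes</a2></bbb>" +
        "<ccc><a1>no</a1><a2>yes</a2></ccc></zzz>";

    // a2 is missing within ccc, which breaks the rules of ttt.
    public const string Bad =
        "<zzz><bbb><a1>no</a1><a2>yes</a2></bbb>" +
        "<ccc><a1>no</a1></ccc></zzz>";

    // Hypothetical helper: returns the validation messages for xml.
    public static List<string> Validate(string xsd, string xml)
    {
        var messages = new List<string>();
        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationEventHandler += (s, a) => messages.Add(a.Message);
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return messages;
    }

    public static void Main()
    {
        Console.WriteLine("Good: " + Validate(Xsd, Good).Count + " error(s)");
        foreach (string m in Validate(Xsd, Bad))
            Console.WriteLine("Bad: " + m);
    }
}
```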
b.xml
<?xml version="1.0"?>
<zzz>
<aaa>Bye </aaa>
<bbb>
<a3> yes </a3>
<a4>
<a1>no</a1>
<a2> vijay </a2>
</a4>
</bbb>
</zzz>
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
<xs:complexType>
<xs:sequence>
<xs:element name = "aaa"
type="xs:string"/>
<xs:element name = "bbb" type="uuu" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name = "ttt">
<xs:sequence>
<xs:element name = "a1"
type="xs:string"/>
<xs:element name = "a2"
type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name = "uuu">
<xs:sequence>
<xs:element name = "a3"
type="xs:string"/>
<xs:element name = "a4" type="ttt"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
Types can be nested within one another to any depth. The xsd file has the tag
zzz, which consists of the tag aaa with a data type of 'string'. Then comes the
tag bbb, whose type is uuu, which implies that it shall embody whatever uuu
contains. This user-defined type uuu commences with the tag a3 of data type
string, followed by the tag a4, which in turn incorporates the definition of
type ttt. Thus, the tag a4 will contain the tags a1 and a2, since they define
the type ttt.
This substantiates the fact that types are extremely useful and can be reused
as and when required. The salient point to remember is that while creating an
XML schema, the complexType tag has to be used to create types that are meant
for re-use. Further, within the complexType, the element tag must be employed
to create the final data type. A complexType tag cannot be reused if it is not
assigned a name.
XML Schema offers a wide range of simple types, such as string, integer, etc.,
which can be combined to build types that can be used later for representing
real-life business objects.
b.xml
<?xml version="1.0"?>
<zzz>
</zzz>
<yyy>
</yyy>
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name = "zzz">
<xs:complexType>
</xs:complexType>
</xs:element>
<xs:element name = "yyy">
<xs:complexType>
</xs:complexType>
</xs:element>
</xs:schema>
Error
Element cannot contain text or whitespace. Content model is
empty. An error occurred at file:///c:/xmlprg/b.xml(2, 6).
Unhandled Exception: System.Xml.XmlException: There are
multiple root elements. Line 4, position 2.
The above exception erupts since only a single root tag is permitted in an xml
file, whereas b.xml contains two independent root tags, zzz and yyy. The first
error arises because the complexType of zzz is empty, while the zzz tag in the
xml file encloses a line break. Note that the xsd file itself is not at fault:
a schema may declare more than one element directly under the schema tag, and
any one of them can serve as the root tag of an xml file.
Thus, bear in mind that an xml file must always have a single main tag, which
in turn can contain as many tags as it fancies. Also, a complexType that is
meant for re-use should be placed outside the element tag.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz"
type="xs:string"/>
<xs:element name="aaa"
type="xs:string"/>
</xs:schema>
b.xml
<?xml version="1.0"?>
<zzz />
b.xml
<?xml version="1.0"?>
<aaa />
The above xsd file is similar to the earlier one, in that it declares two
elements directly under the schema tag, and as before, it gives no xsd error.
Either of the two xml files shown above validates against it, because any one
of the two elements can be specified as the root tag in an xml file. If both
zzz and aaa are used as root tags in a single xml file, it would result in the
same error as observed earlier.
b.xml
<?xml version="1.0"?>
<zzz aa="hi" />
b.xsd
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xs:element name="zzz">
<xs:complexType>
<xs:attribute name="aa" type="xs:string"
/>
</xs:complexType>
</xs:element>
</xs:schema>
The root tag can also contain attributes. To substantiate this, the zzz tag is
modified to contain an attribute aa that accepts a string. In the xsd file, the
attribute tag is entered with the name attribute set to 'aa' and the type set
to 'xs:string'. This tag is then placed within the complexType tag. It cannot
get any simpler than this!
Thus, the root tag is capable of having multiple elements, as well as myriad
attributes.
b.xml
<?xml version="1.0"?>
<zzz />
In the above XML
file, the attribute aa has been omitted. Despite this, no error message is
displayed, since attributes are optional.
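Should an attribute have to be made compulsory, the attribute tag of XML Schema
accepts a use attribute, which may be set to 'required'. This attribute is not
covered in the text above, so the sketch below is offered merely as an aside.
It validates the tag zzz, with and without the attribute aa, against both
forms of the schema. The names RequiredAttributeDemo, Xsd and Validate are
hypothetical, and the validation uses the newer XmlReaderSettings and
XmlSchemaSet classes rather than those of a.cs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class RequiredAttributeDemo
{
    // Builds the schema with aa either optional (the default) or required.
    public static string Xsd(bool required)
    {
        string use = required ? " use='required'" : "";
        return
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "<xs:element name='zzz'><xs:complexType>" +
            "<xs:attribute name='aa' type='xs:string'" + use + "/>" +
            "</xs:complexType></xs:element>" +
            "</xs:schema>";
    }

    // Hypothetical helper: returns the validation messages for xml.
    public static List<string> Validate(string xsd, string xml)
    {
        var messages = new List<string>();
        var schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        settings.ValidationEventHandler += (s, a) => messages.Add(a.Message);
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return messages;
    }

    public static void Main()
    {
        string[] docs = { "<zzz aa='hi'/>", "<zzz/>" };
        foreach (bool required in new[] { false, true })
            foreach (string doc in docs)
                Console.WriteLine(
                    (required ? "required" : "optional") + ", " + doc + ": " +
                    Validate(Xsd(required), doc).Count + " error(s)");
    }
}
```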
b.xml
<?xml version="1.0"?>
<zzz aa="10000" />
b.xsd
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xs:attribute name="aa">
<xs:simpleType>
<xs:restriction base="xs:integer" />
</xs:simpleType>
</xs:attribute>
<xs:element name="zzz">
<xs:complexType>
<xs:attribute ref="aa" />
</xs:complexType>
</xs:element>
</xs:schema>
The attribute tag within zzz has a 'ref' attribute. This refers to another
attribute that has been declared elsewhere, directly under the schema tag.
Therefore, the properties of the attribute that is referred to now become part
of the attribute within zzz. Thus, the element zzz acquires the attribute aa
along with all its properties.
This user-defined
attribute named aa is enclosed in a simpleType tag. This tag, also known as an
element, defines a simple type that determines the constraints or rules that
apply to the attributes or elements.
In the above case, the restriction tag is used, which establishes the
restrictions that need to be imposed on the simpleType. The base attribute
specifies the name of the data type from which the simpleType is derived.
As per the xsd file, the simpleType derives from the built-in type integer.
Thus, the user-defined type behaves exactly like the built-in type integer,
though restrictions can be added to make it deviate from the base type.
In the above xsd file, aa is created with a simple type that accepts integers.
So, specifying a value of 'hi' in place of 10000 would result in the following
error:
Error
The 'aa' attribute has an invalid value according to its data
type. An error occurred at file:///c:/xmlprg/b.xml(2, 6).
b.xsd
<?xml version="1.0" encoding="utf-8"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema" >
<xs:attribute name="aa">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:maxInclusive value="1000"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:element name="zzz">
<xs:complexType>
<xs:attribute ref="aa" />
</xs:complexType>
</xs:element>
</xs:schema>
The above xsd
file builds a specific error check for the value of the attribute aa. Assigning
the value of 10000 to the attribute generates the following error:
Error
The 'aa' attribute has an invalid value according to its data
type. An error occurred at file:///c:/xmlprg/b.xml(2, 6).
The error that occurs can be attributed to the maxInclusive element. This
element prohibits the attribute from having a value exceeding 1000. Here, since
the value of the attribute aa exceeds 1000, the error is raised. The
maxInclusive element is called a data facet, since it adds a constraint to, or
restricts the values of, the base type.
b.xsd
<xs:restriction base="xs:integer">
<xs:maxInclusive value="1000"/>
<xs:minInclusive value="100"/>
</xs:restriction>
The minInclusive facet stipulates the minimum value that the simpleType can have, whereas the maxInclusive facet determines the maximum value. Both limits are inclusive.
Thus, in the above case, the value of the attribute is confined to the range of 100 to 1000, with 100 and 1000 themselves being permitted.
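For reference, the fragment above can be placed back into the complete schema. The listing below is a sketch that merely combines the two b.xsd listings shown earlier; nothing new is introduced.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- b.xsd in full: aa must lie between 100 and 1000, both limits included -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attribute name="aa">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="100"/>
        <xs:maxInclusive value="1000"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:attribute>
  <xs:element name="zzz">
    <xs:complexType>
      <xs:attribute ref="aa"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema, an instance such as <zzz aa="100"/> or <zzz aa="1000"/> is accepted, whereas <zzz aa="99"/> and <zzz aa="1001"/> fall outside the range and are rejected.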
XML Schemas are used to validate xml documents. The xml files that are being validated, such as b.xml, are termed 'instance documents' by the XML Schema specification.
Put simply, a complex type is one that contains other elements or attributes within it. We shall use the word 'element' instead of 'tag', in order to adhere to the terminology of the XML documentation. A simple type carries neither sub-elements nor attributes. Attributes are always of simple types, because an attribute cannot contain other elements or attributes.
The Schema world has a large number of built-in simple types, which serve as building blocks for user-defined types. Complex types may contain elements and attributes, whereas simple types contain neither. The two kinds of type are also created differently and used differently.
Normally, a complex type definition contains element and attribute declarations, as well as references to elements that have been declared elsewhere. Defining a complex type creates a new type, whereas declaring an element or attribute associates a name with a set of constraints that decide where that name can appear in an instance document. For instance, the sequence element ensures that the elements appear in the order specified in the declaration.
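The role of the sequence element can be sketched as follows. The child element names first and second are hypothetical and serve only as an illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="zzz">
    <xs:complexType>
      <!-- Children must appear in exactly this order -->
      <xs:sequence>
        <xs:element name="first" type="xs:string"/>
        <xs:element name="second" type="xs:integer"/>
      </xs:sequence>
      <!-- Attribute declarations follow the content model -->
      <xs:attribute name="aa" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this schema, <zzz aa="hi"><first>a</first><second>1</second></zzz> is valid, whereas placing second before first is not.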
Any type created within an element is local to that element. If the type is created outside of an element, directly under the schema element, it is called a global type. A global type has a name and can be used through the 'type' attribute of any element or attribute declaration; the 'ref' attribute, by contrast, refers to global elements and attributes. All the elements of the schema vocabulary, such as schema, element and complexType, belong to a namespace.
This namespace is assigned the name http://www.w3.org/2001/XMLSchema, which is a URI. This name is reserved. Thus, it cannot be replaced with any other name. Certain things in life have to be reserved!
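The distinction between local and global types described above can be sketched as follows. The type name smallInt and the attribute bb are hypothetical and are used purely for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global type: a named child of schema, reusable through type="..." -->
  <xs:simpleType name="smallInt">
    <xs:restriction base="xs:integer">
      <xs:maxInclusive value="1000"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="zzz">
    <!-- Local type: anonymous, usable only by the element zzz -->
    <xs:complexType>
      <xs:attribute name="aa" type="smallInt"/>
      <xs:attribute name="bb" type="smallInt"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Since the global type has a name, both attributes can share it, while the anonymous complexType cannot be reused anywhere else.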
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
</xs:schema>
b.xml
<?xml version="1.0"?>
Error
Unhandled Exception: System.Xml.XmlException: The root
element is missing.
The above exception occurs due to the absence of the root element in the xml file. As has been mentioned at the very outset of this chapter, every xml file requires a root element. The file b.xml contains only the XML declaration, so it is not well-formed, and the parser rejects it before any validation against the schema can take place. Note that the xsd file declares no elements either, so it offers no candidate for the root element of an instance document.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz"
type="xs:string">
<xs:complexType>
</xs:complexType>
</xs:element>
</xs:schema>
b.xml
<?xml version="1.0"?>
<zzz/>
Exception
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
The type element cannot be present with either simpleType or complexType. An
error occurred at file:///c:/xmlprg/b.xsd(2, 2).
An element declaration cannot carry a type attribute and also contain a complexType child that creates the type for the element. These are mutually exclusive ways of giving the element a type, so the complexType does not override the string type of the element zzz; the schema is rejected instead. The same rule applies to a simpleType child.
Thus, an element must obtain its type either from the type attribute or from a nested simpleType or complexType element, never from both.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz"
type="xs:string"/>
<xs:complexType>
</xs:complexType>
</xs:schema>
Exception
Unhandled Exception: System.Xml.Schema.XmlSchemaException:
The required attribute 'name' is missing. An error occurred at
file:///c:/xmlprg/b.xsd(2, 2).
Any complexType element that is a direct child of the schema element must have a name attribute. This rule is not applicable if it is the child of an element. The above error results from the fact that an anonymous type has been created directly under the schema element, where nothing could ever refer to it, which would render the type useless.
b.xsd
....
<xs:complexType name="c1">
....
The error disappears when the type is assigned a name. Next, add a root element to the xml file.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz"
type="xs:string"/>
</xs:schema>
b.xml
<?xml version="1.0"?>
<zzz>
hi
</zzz>
The type attribute determines the type of content that can be placed within the tags. Here the type is string, so the text 'hi' is accepted.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz">
<xs:complexType>
</xs:complexType>
</xs:element>
</xs:schema>
Error
Element 'zzz' has invalid child element '#PCDATA'. An error
occurred at file:///c:/xmlprg/b.xml(2, 6).
Element cannot contain text or whitespace. Content model is
empty. An error occurred at file:///C:/XMLBOO~1/try/b.xml(2, 6).
The above error is generic in nature and therefore surfaces very often.
The error stems from the fact that an empty complexType defines an empty content model, so nothing, neither text nor child elements, may be placed between the tags of zzz. Declaring attributes inside the complexType does not change this: if we specify the requisite attributes but do not specify the type of the content that is to be placed between the tags, the same error occurs.
b.xsd
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="zzz">
<xs:complexType>
<xs:attribute name="aa" type="xs:string"
/>
</xs:complexType>
</xs:element>
</xs:schema>
b.xml
<?xml version="1.0"?>
<zzz aa="hi"/>
The above xml file gives no errors, since the element zzz carries the attribute aa of type 'string' and is devoid of all content.
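Should the element need to carry an attribute as well as text, one standard approach in XML Schema is a complexType with simpleContent that extends a simple type. The listing below is a sketch of that approach applied to zzz; it is not one of the original b.xsd listings.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="zzz">
    <xs:complexType>
      <!-- simpleContent: text only, no child elements -->
      <xs:simpleContent>
        <!-- The text is a string; the extension adds the attribute aa -->
        <xs:extension base="xs:string">
          <xs:attribute name="aa" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema, an instance such as <zzz aa="hi">hi</zzz> is valid, since the content model is no longer empty.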