8
The DTD Validations in XML
So far, we have only read an XML file, without catering to the special cases
in which either an entity has been used, or the data has to be validated
against the rules laid down for an element. The XmlTextReader class is the best
choice for reading an XML file, except when data has to be validated, or when
an entity has to be replaced with its value. For such purposes, the
XmlValidatingReader class is better suited. This class is derived from
XmlReader, and it conducts three types of validation: DTD, XDR and XSD schema
validation.
This class is used when the primary task is to validate data, to resolve
general entities, or to provide support for default attributes.
a.cs
using System;
using System.IO;
using System.Xml;
public class zzz
{
    public static void Main()
    {
        XmlValidatingReader r = null;
        XmlParserContext p;
        // The internal DTD subset declares the entity pr with the value '100'.
        p = new XmlParserContext(null, null, "vijay", null, null, "<!ENTITY pr '100'>", "", "", XmlSpace.None);
        r = new XmlValidatingReader("<vijay mukhi='great' price='Rs &pr;'></vijay>", XmlNodeType.Element, p);
        r.ValidationType = ValidationType.None;
        r.MoveToContent();
        // Visit each attribute of the root element in turn.
        while (r.MoveToNextAttribute())
        {
            Console.WriteLine("{0} = {1}", r.Name, r.Value);
        }
        r.Close();
    }
}
Output
mukhi = great
price = Rs 100
To create the object p of type XmlParserContext, the nine-parameter
constructor of the XmlParserContext class is called. The nine parameters are
as follows:
• The first parameter is of type XmlNameTable. It has a value of null.
• The second parameter is of type XmlNamespaceManager. It also has a value of null.
• The third parameter is the DocType name, i.e. the root tag 'vijay'.
• The fourth parameter is the pubid (public identifier) for the external DTD file.
• The fifth parameter is the sysid (system identifier) for the external DTD file.
• The sixth parameter is the internal DTD subset, in which the ENTITY declaration <!ENTITY pr '100'> has been placed. It states that wherever the word 'pr' appears preceded by a '&' and followed by a semi-colon, it must be replaced with the string '100'.
• The seventh parameter in sequence is the location from where the fragment is to be loaded, i.e. the base URI.
• The eighth parameter stands for the xml:lang scope.
• The ninth parameter stands for the xml:space scope.
The parameters to the constructor of the XmlValidatingReader class are similar
to those of the XmlTextReader, which we had encountered earlier. This class is
derived from XmlReader, and it implements the IXmlLineInfo interface.
There are five different values that the ValidationType property can be set to:
1. The first is Auto, which validates only when the DTD or schema information is found.
2. The second is DTD, which validates based on the declarations found in the DTD.
3. The third is None, which creates an XML 1.0 non-validating parser. It reports the default attributes and resolves general entities, but does not validate against the DOCTYPE. Thus, if the root tag is changed from 'vijay' to 'vijay1', no errors will be generated. With the root tag so changed, placing the ValidationType statement within comments will generate the following exception:
"Unhandled Exception: System.Xml.Schema.XmlSchemaException: The root element name must match the DocType name. An error occurred at (1, 2)."
4. The fourth option is XSD (the enumeration member is named Schema), which validates as per the XSD schemas.
5. The fifth option is XDR, which validates as per the XDR schemas.
In our program, we have set this property to a value of None.
Once the required properties are set, the MoveToContent function is used to
move to the first element, 'vijay'. The next function, MoveToNextAttribute,
returns a value of True when there are attributes remaining to be read.
Otherwise, it returns a value of False. When the reader is positioned on an
element, as in our case, the first call behaves like the MoveToFirstAttribute
function.
The while loop repeats twice, since there are two attributes. The Name and
Value properties for the first attribute are displayed as 'mukhi' and 'great'.
This is very similar to what we have observed in the earlier program. The name
for the second attribute is displayed as 'price'. However, its value is not the
same as the text in the source, because it contains the entity reference &pr;.
The XmlValidatingReader replaces the entity pr with the string '100', prior to
displaying the value.
Therefore, the output is displayed as 'price' and 'Rs 100'.
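In later versions of the .NET Framework, XmlValidatingReader is marked obsolete in favour of readers created with XmlReader.Create. The following is only a minimal sketch of the same entity replacement under that assumption. The class name EntitySketch and the method ReadPrice are hypothetical, and the entity is declared in an internal DTD subset of the document itself, rather than through an XmlParserContext.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntitySketch
{
    // Returns the value of the 'price' attribute with the entity &pr; expanded.
    public static string ReadPrice()
    {
        string xml =
            "<!DOCTYPE vijay [<!ENTITY pr '100'>]>" +
            "<vijay mukhi='great' price='Rs &pr;'></vijay>";

        XmlReaderSettings settings = new XmlReaderSettings();
        // DTD processing is prohibited by default, so it is enabled explicitly.
        settings.DtdProcessing = DtdProcessing.Parse;

        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            r.MoveToContent();               // positions the reader on <vijay>
            return r.GetAttribute("price");  // general entities are expanded by the reader
        }
    }

    public static void Main()
    {
        Console.WriteLine("price = {0}", ReadPrice());
    }
}
```

No validation is requested here; the DTD is parsed only so that the entity declaration is known to the reader.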
a.cs
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
class zzz
{
    public static void Main()
    {
        XmlTextReader r = new XmlTextReader("b.xml");
        XmlValidatingReader v = new XmlValidatingReader(r);
        v.ValidationType = ValidationType.DTD;
        // abc is called for every validation error or warning.
        v.ValidationEventHandler += new ValidationEventHandler(abc);
        // Reading the whole file drives the validation.
        while (v.Read());
    }
    public static void abc(object s, ValidationEventArgs a)
    {
        Console.WriteLine("Severity:{0}", a.Severity);
        Console.WriteLine("Message:{0}", a.Message);
    }
}
b.xml
<?xml version="1.0" ?>
<!DOCTYPE vijay1 >
<vijay>
</vijay>
Output
Severity:Error
Message:The root element name must match the DocType name. An error occurred at file:///c:/csharp/b.xml(3, 2).
Severity:Error
Message:The 'vijay' element is not declared. An error occurred at file:///c:/csharp/b.xml(3, 2).
In the above program, to begin with, an object r of type XmlTextReader is
created. It is then passed to the constructor of XmlValidatingReader, to create
the object v. The ValidationType of the object v is set to DTD. The function
abc is attached to the ValidationEventHandler event, so that it gets called
whenever a validation error occurs. The while loop calls the Read function
repeatedly, until the entire XML file has been read and validated, and the
function abc is notified whenever an error is encountered.
In the function abc, the values
contained in the properties - Severity and Message, of the ValidationEventArgs
parameter 'a', are printed. The Severity property reveals whether it is an
error or warning, whereas, the Message property contains the precise text of the
error or warning.
In the above case, the first error is generated because the DOCTYPE expects the
root element to be 'vijay1', whereas, it has been specified as 'vijay'. The
second error arises because the DOCTYPE contains no declaration for the element
'vijay'. When no error message is displayed, it may be inferred that no errors
have been found.
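The same event-driven validation can also be written with XmlReaderSettings, which later versions of the .NET Framework recommend over XmlValidatingReader. What follows is a hedged sketch and not a replacement for the program above: the class name HandlerSketch and the method Validate are hypothetical, and the XML is supplied as a string instead of the file b.xml, so that the example stands on its own.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class HandlerSketch
{
    // Validates the document against its DTD and returns the number of
    // validation events that were raised.
    public static int Validate(string xml)
    {
        int events = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;   // prohibited by default
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, a) =>
        {
            events++;
            Console.WriteLine("Severity:{0}", a.Severity);
            Console.WriteLine("Message:{0}", a.Message);
        };

        using (XmlReader v = XmlReader.Create(new StringReader(xml), settings))
        {
            while (v.Read()) { }   // reading the document drives the validation
        }
        return events;
    }

    public static void Main()
    {
        // The DOCTYPE names 'vijay1', but the root element is 'vijay'.
        Validate("<!DOCTYPE vijay1 ><vijay></vijay>");
    }
}
```

Because a handler is attached, validation errors are reported through it instead of being thrown as exceptions.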
The DTD
Retaining the above C# program, we shall now create our own DTD file. From
here on, we shall modify only the b.xml and b.dtd files.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE vijay SYSTEM "b.dtd" >
<vijay />
b.dtd
<!ELEMENT vijay >
A DTD is generally very lengthy. So, an internal DTD is rarely used. If it is
used, its contents have to be placed within [] brackets, inside the DOCTYPE
declaration. To use an external DTD, we use the keyword SYSTEM followed by the
name of the DTD file, which is b.dtd, in this case.
In b.dtd, an element 'vijay' is
created by inserting the reserved characters '<!', followed by ELEMENT, and
finally by the element name 'vijay'. When we run the C# program 'a', the
following error is generated:
Output
Unhandled Exception: System.Xml.XmlException: This is an invalid content model. Line 1, position 17.
An error in the DTD file has resulted in the generation of an unhandled
exception. The error occurred because the ELEMENT declaration is incomplete: it
names the element, but does not state what the element may contain.
b.dtd
<!ELEMENT vijay EMPTY>
The addition of the word EMPTY
salvages the situation. By specifying the word EMPTY, it is amply clear that
the element named 'vijay' is an empty element.
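The difference between the incomplete and the complete ELEMENT declaration can be tried out without touching any files, by placing the declaration in an internal DTD within [] brackets. The sketch below does so under a few assumptions of ours: it uses XmlReader.Create with XmlReaderSettings in place of XmlValidatingReader, and the names ContentModelSketch and CanParse are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ContentModelSketch
{
    // Returns true if the document, including its DTD, can be read to the end.
    public static bool CanParse(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        // Validity errors are ignored here; only errors in the DTD syntax matter.
        settings.ValidationEventHandler += (sender, a) => { };
        try
        {
            using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
            {
                while (r.Read()) { }
            }
            return true;
        }
        catch (XmlException e)
        {
            // A malformed DTD is reported as an exception, not as a validation event.
            Console.WriteLine("XmlException: {0}", e.Message);
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CanParse("<!DOCTYPE vijay [<!ELEMENT vijay >]><vijay />"));
        Console.WriteLine(CanParse("<!DOCTYPE vijay [<!ELEMENT vijay EMPTY>]><vijay />"));
    }
}
```

The first document carries the incomplete declaration and is expected to be rejected; the second one adds the word EMPTY.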
b.xml
<?xml version="1.0" ?>
<!DOCTYPE vijay SYSTEM "b.dtd" >
<vijay>
</vijay>
Output
Severity:Error
Message:Element 'vijay' has invalid child element '#PCDATA'. An error occurred at file:///c:/csharp/b.xml(3, 8).
The DTD file states, with absolute clarity, that the ELEMENT 'vijay' is EMPTY.
However, an open tag <vijay> and a close tag </vijay> have been placed on
separate lines in the XML file, so the line break between them counts as
content. Therefore, an error message is generated, which rather cryptically
reports this content as '#PCDATA'.
Instead of using tags such as
'vijay', let us consider a DTD that has been implemented in real life. This one
is used for the WML, or the Wireless Markup Language. The rules or syntax of
WML are available as a DTD.
In our book titled 'WML and WMLScript', we have endeavoured to elucidate the
concept of a DTD. You are at liberty to refer to the book. However, we must
caution you that the approach and the explanation used here are entirely at
variance with those used in the earlier book.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
</wml>
b.dtd
<!ELEMENT wml EMPTY>
Output
Severity:Error
Message:Element 'wml' has invalid child element '#PCDATA'. An error occurred at file:///C:/csharp/b.xml(3, 6).
The word 'vijay' has merely been
replaced by the word 'wml'. The error generated is akin to the earlier one. At
this juncture, we introduce a 'card' into the DTD file.
b.dtd
<!ELEMENT wml (card)>
Output
Severity:Error
Message:Element 'wml' has incomplete content. Expected 'card'. An error occurred at file:///c:/csharp/b.xml(4, 3).
Every WML document must commence
with the root tag 'wml'. In the DTD file, we have placed the word 'card' within
round brackets, along with wml. This signifies that the wml tag must contain a
tag or an element called 'card'. Since there is no card in the XML file, an
error is reported, stating that a card is expected, and on account of its
unavailability, the wml element is incomplete.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
</wml>
Output
Severity:Error
Message:The 'card' element is not declared. An error occurred at file:///c:/csharp/b.xml(4, 2)
We add the card tag as a single tag to our XML file, in an endeavour to
eliminate the error. But, as we have not specified 'card' as a valid element in
the DTD file, yet another error message is displayed. Unless 'card' appears as
an ELEMENT in the DTD file, it is not possible to use it in the XML file.
Therefore, we now include 'card' as an EMPTY element in b.dtd.
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card EMPTY>
Now, all the errors just vanish.
In the DTD file, we had affirmed that the element 'card' shall be empty i.e. it
will not have any content.
The XML file depicted below
displays an error, because the 'card' tag is not a single tag any longer.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
</card>
</wml>
Output
Severity:Error
Message:Element 'card' has invalid child element '#PCDATA'. An error occurred at file:///C:/csharp/b.xml(4, 7).
The error message displayed here is very similar to the one seen earlier with
the wml tag, which read: Element 'wml' has invalid child element '#PCDATA'.
A slight modification to the XML
file is desirable, before we endeavour to eliminate the error.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
hi
</card>
</wml>
Output
Severity:Error
Message:Element 'card' has invalid child element 'Text'. An error occurred at file:///c:/csharp/b.xml(4, 7).
Inserting the word 'hi' between the card tags results in a slightly altered
error message. In place of #PCDATA, we get to see Text. With the following
modification to the DTD file, both the error messages can be eliminated.
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card (#PCDATA)>
To eradicate the errors, the word EMPTY is replaced with #PCDATA, enclosed
within round brackets. The word PCDATA is an acronym for Parsed Character Data.
In plain English, it represents text that can be entered from the keyboard.
Thus, we are at liberty to write as many lines of text as we want, within the
card tag. Even if the word 'hi' is removed from within the tags, no error is
generated.
Our DTD expects a root tag or starting tag of wml. Only a card tag can be
inserted within this tag, and the card tag, in turn, is capable of containing
any amount of text. Insertion of anything else in the wml tag is a sure recipe
for disaster.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
</card>
<card>
</card>
</wml>
Output
Severity:Error
Message:Element 'wml' has invalid content. Expected ''. An error occurred at file:///c:/csharp/b.xml(6, 2).
The above error has occurred because the DTD clearly specifies that the root
tag wml must have one, and only one, occurrence of the tag called 'card' within
it. Here, we have created two card tags, thereby causing the error.
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (#PCDATA)>
The * symbol, placed after the round brackets, indicates that the content
within the brackets may occur any number of times, from zero upwards. Thus, the
XML file can now have either zero or countless card elements. If you do not
give credence to this statement of ours, you may either delete all the card
elements from the XML file, or add numerous cards. Either way, no error will be
generated.
b.dtd
<!ELEMENT wml (card)+>
<!ELEMENT card (#PCDATA)>
Replacing the symbol * with a + transforms the meaning from 'zero to infinity'
to 'one to infinity'. The only difference between the * symbol and the + symbol
is that the + sign mandates at least one occurrence of the element, whereas the
* sign makes it optional. Thus, in the above XML file, at least a single card
element is required.
b.dtd
<!ELEMENT wml (card)?>
<!ELEMENT card (#PCDATA)>
The last of the special characters is the symbol ?, which restricts the number
of occurrences to 'zero or one'. Thus, in the XML file, we may have either one
card element or none at all. The presence of two or more cards will generate an
error. You should try out various possible combinations for each of the symbols
*, + and ?.
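One way of trying out these combinations without editing b.xml and b.dtd repeatedly is to generate each document in memory. The sketch below is an illustration only: the class OccurrenceSketch and the method IsValid are hypothetical names, the DTD is written as an internal DTD, the card element is declared EMPTY for brevity, and XmlReader.Create is assumed in place of XmlValidatingReader.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class OccurrenceSketch
{
    // Builds a wml document with the given number of cards and validates it
    // against <!ELEMENT wml (card)X>, where X is "", "*", "+" or "?".
    public static bool IsValid(string indicator, int cards)
    {
        string body = "";
        for (int i = 0; i < cards; i++) body += "<card/>";

        string xml =
            "<!DOCTYPE wml [" +
            "<!ELEMENT wml (card)" + indicator + ">" +
            "<!ELEMENT card EMPTY>]>" +
            "<wml>" + body + "</wml>";

        bool valid = true;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, a) => valid = false;

        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return valid;
    }

    public static void Main()
    {
        string[] indicators = { "", "*", "+", "?" };
        foreach (string indicator in indicators)
            for (int cards = 0; cards <= 2; cards++)
                Console.WriteLine("(card){0} with {1} card(s): {2}",
                    indicator, cards, IsValid(indicator, cards) ? "valid" : "error");
    }
}
```

The loop in Main covers zero, one and two cards for each symbol, which is sufficient to tell the four content models apart.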
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
<p> hi </p>
</card>
</wml>
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (p)>
<!ELEMENT p (#PCDATA)>
No error is generated because, in the DTD file, we have now stated that the
card element must contain a single p tag, which, in turn, can contain any text.
We have, however, done away with the provision of placing any text directly
within the card tag.
Add in a new modification to the
file.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
<p> <b/> </p>
</card>
</wml>
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (p)>
<!ELEMENT p (br | b)>
<!ELEMENT br EMPTY>
<!ELEMENT b EMPTY>
The DTD now appears considerably more complicated. The p tag is now capable of
containing only two tags, br and b. Text is not allowed any more. The | sign
signifies the OR condition, which implies that either tag b or tag br is
allowed. The two aforesaid tags are defined as EMPTY tags. To summarise, our
DTD states that the p tag can contain a single tag of either b or br.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
<p> <b/> <br/></p>
</card>
</wml>
Output
Severity:Error
Message:Element 'p' has invalid content. Expected ''. An error occurred at file:///c:/csharp/b.xml(5, 11).
All is not well, because we are allowed to place either a 'b' or a 'br' at a
time, but not both together. To remedy the situation, we place a * symbol after
the round brackets in the declaration of the p tag, and likewise in that of the
card tag.
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (p)*>
<!ELEMENT p (br | b)*>
<!ELEMENT br EMPTY>
<!ELEMENT b EMPTY>
The above DTD provides us the
flexibility of having multiple p tags within n number of cards. These, in turn,
may have as many b or br tags as desired.
By replacing the b tag with #PCDATA, a p tag is in a position to accommodate
multiple br tags, as well as an indefinite amount of text. In such a mixed
declaration, #PCDATA has to be the first item within the round brackets.
<!ELEMENT p (#PCDATA | br)*>
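In a declaration that mixes text with tags, the XML 1.0 grammar lists #PCDATA first, so the sketch below writes the declaration as (#PCDATA | br)*. It is a small experiment of ours and not part of the original programs: the names MixedSketch and CountErrors are hypothetical, the p tag is used directly as the root tag to keep the document short, and XmlReader.Create is assumed in place of XmlValidatingReader.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class MixedSketch
{
    // Validates <p>...</p> against a mixed declaration of text and br tags,
    // and returns the number of validation errors reported.
    public static int CountErrors(string content)
    {
        string xml =
            "<!DOCTYPE p [" +
            "<!ELEMENT p (#PCDATA | br)*>" +
            "<!ELEMENT br EMPTY>]>" +
            "<p>" + content + "</p>";

        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, a) => errors++;

        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        Console.WriteLine(CountErrors("hi <br/> there <br/>"));  // text and br tags mixed freely
        Console.WriteLine(CountErrors(""));                       // nothing at all
        Console.WriteLine(CountErrors("<b/>"));                   // b is no longer declared
    }
}
```

The first two calls are expected to report no errors, while the third should report at least one, since the b tag is no longer declared.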
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
<head />
</wml>
b.dtd
<!ELEMENT wml (card,head)>
<!ELEMENT card EMPTY>
<!ELEMENT head EMPTY>
The above DTD file permits the
wml tag to contain a card tag, which is then to be strictly followed by a head
tag. The comma signifies that one tag is to be followed by the other. If we
refrain from using the head tag in the XML file, the following error message
will be generated:
Output
Severity:Error
Message:Element 'wml' has incomplete content. Expected 'head'. An error occurred at file:///C:/csharp/b.xml(5, 3).
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<head />
<card />
</wml>
Output
Severity:Error
Message:Element 'wml' has invalid content. Expected 'card'. An error occurred at file:///c:/csharp/b.xml(4, 2).
If the order of the tags is
interchanged, an error is thrown. The card tag must be followed by the head
tag. Besides, there is a restriction imposed that there can be only one
insertion of each tag. If there are multiple insertions, it will result in an
error.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
<card />
<head />
</wml>
b.dtd
<!ELEMENT wml (card+,head?)>
<!ELEMENT card EMPTY>
<!ELEMENT head EMPTY>
When the plus sign is inserted after the card, it allows the use of more than
one card tag in the file, while still demanding at least one. The ? sign
denotes 'zero or one' insertions of the head tag. Thus, we can have more than
one card tag and have either a single head tag or none at all. If the head tag
is present, it must be placed after the card tags, since the order of the tags
is sacrosanct.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
<head />
<card />
</wml>
Output
Severity:Error
Message:Element 'wml' has invalid content. Expected ''. An error occurred at file:///c:/csharp/b.xml(6, 2).
The Draconian restrictions imposed by the DTD file prohibit us from altering
the sequence of the above tags. All the card tags have to come first, followed
by the head tag. A card tag cannot be placed after the head tag.
So, the only solution to this problem is to abide by the stipulated sequence.
b.dtd
<!ELEMENT wml (card+,head?,template*)*>
<!ELEMENT card EMPTY>
<!ELEMENT head EMPTY>
<!ELEMENT template EMPTY>
In the DTD file, we have added a * symbol to the entire set of tags, which make
up the wml element. The set consists of the following individual elements in a
sequential order:
• One or more card tags.
• Zero or one head tag.
• Zero or more template tags.
This set can be repeated any number of times, in numerous permutations and
combinations of the above conditions, in the specified order. Thus, the card
and head can appear together, or the card can appear by itself without the head
tag, or the template tag may not be present at all, and so on. Every occurrence
of the set, however, needs to begin with a card tag.
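A few of these permutations can be checked with a short program. The sketch below is offered with the usual caveats: SequenceSketch and IsValid are hypothetical names, the DTD shown above is embedded as an internal DTD, and XmlReader.Create is assumed in place of XmlValidatingReader.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SequenceSketch
{
    // Validates the given child tags of wml against (card+,head?,template*)*.
    public static bool IsValid(string children)
    {
        string xml =
            "<!DOCTYPE wml [" +
            "<!ELEMENT wml (card+,head?,template*)*>" +
            "<!ELEMENT card EMPTY>" +
            "<!ELEMENT head EMPTY>" +
            "<!ELEMENT template EMPTY>]>" +
            "<wml>" + children + "</wml>";

        bool valid = true;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, a) => valid = false;

        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return valid;
    }

    public static void Main()
    {
        string[] samples =
        {
            "<card/>",
            "<card/><head/>",
            "<card/><head/><card/>",          // two repetitions of the set
            "<card/><template/><template/>",
            "<head/><card/>",                 // the set must begin with a card
            "<template/>"
        };
        foreach (string sample in samples)
            Console.WriteLine("{0} : {1}", sample, IsValid(sample) ? "valid" : "error");
    }
}
```

The last two samples do not begin with a card tag, and are therefore expected to be reported as errors.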
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
</wml>
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card EMPTY>
<!ATTLIST card aa CDATA #IMPLIED>
In the above example, the card
tag has an attribute called aa initialized to 'hi'. To implement an attribute,
we include the word ATTLIST, which is a short form for 'a list of attributes',
in the DTD file. This is followed by the name of the tag that the attribute is
associated with. Then, the actual name of the attribute aa is specified,
followed by the datatype it will hold, which is character data, in our case.
The last parameter, #IMPLIED permits the attribute aa to be optional.
Therefore, even if you remove it, no error will be generated.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card />
</wml>
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card EMPTY>
<!ATTLIST card aa CDATA #IMPLIED bb CDATA #REQUIRED>
Output
Severity:Error
Message:The required attribute 'bb' is missing. An error occurred at file:///c:/csharp/b.xml(4, 2).
The error message clearly mentions that the attribute bb is missing. #REQUIRED
demands the presence of the attribute bb whenever the card tag is used.
Further, the attribute definitions in an ATTLIST are placed one after the
other. However, the order in which the attributes appear in the card tag is not
significant.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card bb="no"/>
</wml>
No errors are generated since the attribute bb, which is mandatory, has been
specified. You can omit aa, since it is declared as #IMPLIED.
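The behaviour of #IMPLIED and #REQUIRED can be confirmed by validating a few variations of the card tag in one go. This is a sketch under our own assumptions: the names AttributeSketch and CountErrors are hypothetical, the ATTLIST shown above is embedded as an internal DTD, and XmlReader.Create is assumed in place of XmlValidatingReader.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class AttributeSketch
{
    // Validates a single card tag, where aa is #IMPLIED and bb is #REQUIRED,
    // and returns the number of validation errors reported.
    public static int CountErrors(string cardTag)
    {
        string xml =
            "<!DOCTYPE wml [" +
            "<!ELEMENT wml (card)>" +
            "<!ELEMENT card EMPTY>" +
            "<!ATTLIST card aa CDATA #IMPLIED bb CDATA #REQUIRED>]>" +
            "<wml>" + cardTag + "</wml>";

        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, a) =>
        {
            errors++;
            Console.WriteLine("Message:{0}", a.Message);
        };

        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        Console.WriteLine(CountErrors("<card />"));                 // bb is missing
        Console.WriteLine(CountErrors("<card bb='no'/>"));          // aa may be left out
        Console.WriteLine(CountErrors("<card bb='no' aa='hi'/>"));  // order is not significant
    }
}
```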
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="no"/>
</wml>
b.dtd
<!ELEMENT wml (card)>
<!ELEMENT card EMPTY>
<!ATTLIST card aa (hi | bye ) "bye">
Output
Severity:Error
Message:'no' is not in the enumeration list. An error occurred at file:///c:/csharp/b.xml(4, 7).
The values assigned to attributes
can be restricted to specific values. This can be achieved by specifying the
values along with ATTLIST in the DTD file and using the OR sign (|) as the separator. The attribute aa can
only be assigned the value of either 'hi' or 'bye'. Specifying any other value
would result in an error.
If the attribute is not
initialized, it assumes the default value of 'bye'.
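The default value can be observed by reading the attribute back from a card tag that does not mention it. The following sketch rests on the assumption that a reader created with XmlReader.Create, with DTD processing enabled, supplies the defaults declared in the DTD; the names DefaultSketch and ReadAa are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class DefaultSketch
{
    // Returns the value of the attribute aa as seen by the reader.
    public static string ReadAa(string cardTag)
    {
        string xml =
            "<!DOCTYPE wml [" +
            "<!ELEMENT wml (card)>" +
            "<!ELEMENT card EMPTY>" +
            "<!ATTLIST card aa (hi | bye) 'bye'>]>" +
            "<wml>" + cardTag + "</wml>";

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, a) =>
            Console.WriteLine("Message:{0}", a.Message);

        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            r.ReadToFollowing("card");    // positions the reader on the card tag
            return r.GetAttribute("aa");  // a declared default is supplied if aa is absent
        }
    }

    public static void Main()
    {
        Console.WriteLine("aa = {0}", ReadAa("<card/>"));
        Console.WriteLine("aa = {0}", ReadAa("<card aa='hi'/>"));
    }
}
```

For the first card tag, the value is expected to come from the DTD; for the second, from the XML itself.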
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
</wml>
The error disappears because the
attribute has been assigned a value of 'hi'.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
</wml>
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card EMPTY>
<!ATTLIST card aa ID #IMPLIED>
We have created an attribute aa,
with a data type of ID. This does not result in any error.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
<card aa="hi"/>
</wml>
Output
Severity:Error
Message:'hi' is already used as an ID. An error occurred at file:///c:/csharp/b.xml(5, 7).
The card tag can be used multiple times, due to the presence of the * sign in
the DTD file. By associating the type of ID with the attribute aa, it is
guaranteed that the same value is never assigned twice to attributes of type ID
within the document. The error message conveys that 'hi' has already been used
as an ID, and hence, it cannot be used again.
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card aa="hi"/>
<card aa="hi1"/>
</wml>
If we assign a different value
to the attribute, the error is dispensed with. Thus, a data type of ID
guarantees that the attribute shall never have a duplicate value.
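The uniqueness check can be seen at work by validating both variations one after the other. As before, this is a hedged sketch: the names IdSketch and CountErrors are hypothetical, the DTD shown above is embedded as an internal DTD, and XmlReader.Create is assumed in place of XmlValidatingReader.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class IdSketch
{
    // Validates two card tags whose aa attributes are of type ID,
    // and returns the number of validation errors reported.
    public static int CountErrors(string first, string second)
    {
        string xml =
            "<!DOCTYPE wml [" +
            "<!ELEMENT wml (card)*>" +
            "<!ELEMENT card EMPTY>" +
            "<!ATTLIST card aa ID #IMPLIED>]>" +
            "<wml><card aa='" + first + "'/><card aa='" + second + "'/></wml>";

        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.ValidationEventHandler += (sender, a) =>
        {
            errors++;
            Console.WriteLine("Message:{0}", a.Message);
        };

        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        Console.WriteLine(CountErrors("hi", "hi"));   // the same ID used twice
        Console.WriteLine(CountErrors("hi", "hi1"));  // two distinct IDs
    }
}
```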
b.xml
<?xml version="1.0" ?>
<!DOCTYPE wml SYSTEM "b.dtd" >
<wml>
<card>
Hi &sonal;
</card>
</wml>
b.dtd
<!ELEMENT wml (card)*>
<!ELEMENT card (#PCDATA)*>
<!ENTITY sonal "hi" >
Entities have been touched upon earlier. Here, the text &sonal; in the XML file will be replaced with 'hi'. This is called an Entity Reference. The DTD file requires the word ENTITY, followed by the name 'sonal' and the value 'hi'.