7
XML Classes
eXtensible Markup Language, i.e. XML,
is a subset of the Standard Generalized Markup Language (SGML), which is an
ISO standard numbered ISO 8879. SGML was perceived to be too large and too
convoluted to be put to any pragmatic use. Thus, a subset of this
language, XML, was developed to work seamlessly with both SGML and HTML. XML
may be considered a restricted form of SGML, since every XML document conforms
to the rules of an SGML document.
Work on XML began in the year 1996
under the auspices of the World Wide Web Consortium (W3C), in a working group
chaired by Jon Bosak. This group spelt out 10 design goals for XML, with
'ease of use' as its fundamental philosophy. From then on, expectations
rose to the point where XML was expected to eradicate world poverty and
generally rid the world of all its tribulations. To be precise, XML was
overvalued, way beyond realistic levels. There are people who appear to be
extremely infatuated with XML, even though they may not have read through a
single rule of its specification.
The specifications of XML, laid
down by its three primary authors, Tim Bray, Jean Paoli and C. M.
Sperberg-McQueen, are accessible at the web site http://www.w3.org/XML.
An XML document consists of entities,
which contain either character data or markup. An XML file is made up of a
myriad components, which shall be unravelled one at a time, after we have
discerned the basic concepts of this language. We commence this chapter by
introducing a program that generates an XML file.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.Flush();
a.Close();
}
}
In this program, we use a class
called XmlTextWriter, which comes from the System.Xml namespace. An instance
'a' of the XmlTextWriter class is created, by passing two parameters to the
constructor:
• The first parameter, b.xml, is a string and represents the name of the file to be created. If the file already exists in the current directory, it is overwritten, starting again from zero bytes.
• The second parameter is null. It represents the Encoding to be used.
Unicode is a standard whereby
each character is assigned a unique number. In the .Net world, a char holds
one 16-bit Unicode (UTF-16) code unit, so the characters of practically all
the languages in the world can be represented. We are also furnished with
classes whose methods facilitate conversion of arrays and strings made up of
Unicode characters, to and from arrays made up of bytes alone.
The System.Text namespace has a
large number of Encoding implementations, such as the following:
• The ASCII Encoding encodes the Unicode characters as 7-bit ASCII.
• The UTF8 Encoding class encodes Unicode characters using UTF-8 encoding.
UTF-8 stands for UCS
Transformation Format 8 bit. It supports all Unicode characters, and it encodes
the letters of the English alphabet in a single byte each, exactly as ASCII
does. It is normally accessed as code page 65001. Here, since we have specified
the second parameter as null, the output is written as UTF-8 and no encoding
attribute is placed in the XML declaration.
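The effect of the encoding parameter can be seen with a small sketch. It is only an illustration: the class name EncodingDemo and the helper Declaration are our own, and a MemoryStream is used in place of a file so that nothing is left on disk.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class EncodingDemo
{
    // Writes only the XML declaration with the given encoding and returns it as text.
    public static string Declaration(Encoding enc)
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter w = new XmlTextWriter(ms, enc);
        w.WriteStartDocument();
        w.Flush();
        // StreamReader removes a byte order mark, if the encoding emitted one.
        using (StreamReader r = new StreamReader(new MemoryStream(ms.ToArray())))
        {
            return r.ReadToEnd();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Declaration(null));           // null: no encoding attribute
        Console.WriteLine(Declaration(Encoding.UTF8));  // explicit UTF-8
        Console.WriteLine(Declaration(Encoding.ASCII)); // explicit ASCII
    }
}
```

When an Encoding object is supplied, the declaration should carry an encoding attribute naming it; with null, the attribute is left out.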
If we execute the program at
this stage, a file by the name of b.xml is created, but it is empty, since we
have not written anything to it as yet. Whatever we do write reaches the file
only when the buffer of the writer is emptied into it. A function named Flush
is called to make this happen.
Each time we ask the class
XmlTextWriter to write to a file, it may not oblige immediately, but may place
the output in a buffer. Only when the buffer becomes full, will it write to the
file. This approach avoids the overhead of accessing the file on the disk
repeatedly, and thus improves efficiency. The Flush function flushes the
buffer to the file stream, but it does not close the file.
The Close function executes
the twin tasks of flushing the buffer to the file, and closing the file. We
call Flush and then Close out of habit, even though Close alone is adequate to
carry out both these tasks.
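A common way to make sure that Close is never forgotten is a using block. The sketch below assumes a version of the framework in which XmlTextWriter can be disposed; the class name CloseDemo and the helper WriteAndRead are our own.

```csharp
using System;
using System.IO;
using System.Xml;

public class CloseDemo
{
    // Writes a tiny document to 'path' and returns what landed on disk.
    public static string WriteAndRead(string path)
    {
        using (XmlTextWriter w = new XmlTextWriter(path, null))
        {
            w.WriteStartDocument();
            w.WriteStartElement("vijay");
        } // leaving the block closes the writer, which also flushes the buffer
        return File.ReadAllText(path);
    }

    public static void Main()
    {
        Console.WriteLine(WriteAndRead("b.xml"));
    }
}
```

No explicit call to Flush or Close appears here, yet the file is complete by the time it is read back.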
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
Here, we have called a function
called WriteStartDocument from the XmlTextWriter class, which does not take any
parameters. It produces the line <?xml version="1.0"?>, in the
file b.xml.
Any line that begins with
<?xml is called an XML declaration. Every item in an XML file is described as a
node. The declaration is optional, but it is good practice for every XML file
to begin with one. There can be only one such node in our XML file, and it must
be the very first thing in the file, with nothing placed before it.
Following <?xml is an attribute called version, which is initialized to a value
of 1.0.
The version attribute is mandatory
in the declaration. The XML 1.0 specification uses the value 1.0 to indicate
conformance to that version of the specification, and any later version is
expected to use a different number. In other words, for the files that we
create here, the only mandatory attribute is version="1.0".
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.WriteDocType("vijay", null, null ,null);
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?><!DOCTYPE vijay>
The next vital declaration is
the DOCTYPE declaration. An XML file may have at most one DOCTYPE declaration,
placed before the root tag. It names the root tag, and it is required only if
the file is to be validated against a Document Type Definition (DTD). In our
case, the root tag would be 'vijay'.
An XML file is made up of tags,
which are words enclosed within angle brackets. The file also contains rules,
which bind the tags. The next three parameters of the function WriteDocType,
namely the public identifier, the system identifier and the internal DTD
subset, are presently specified as null. You may refer to the documentation to
decipher the values that may be used in place of null. If this does not
appeal to you, you may have to hold your horses, till we furnish the
explanation at an appropriate time.
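For the impatient, here is a brief sketch of what the remaining parameters of WriteDocType do. The file name vijay.dtd and the element rule are hypothetical values chosen only for illustration, and a StringWriter is used so that the result can be printed directly.

```csharp
using System;
using System.IO;
using System.Xml;

public class DocTypeDemo
{
    // Returns the DOCTYPE line produced for the given identifiers and subset.
    public static string DocType(string pubid, string sysid, string subset)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteDocType("vijay", pubid, sysid, subset);
        w.Flush();
        return sw.ToString();
    }

    public static void Main()
    {
        // Only the root tag name.
        Console.WriteLine(DocType(null, null, null));
        // A system identifier: the DTD lives in a separate (hypothetical) file.
        Console.WriteLine(DocType(null, "vijay.dtd", null));
        // An internal subset: the rules are written inside the DOCTYPE itself.
        Console.WriteLine(DocType(null, null, "<!ELEMENT vijay (#PCDATA)>"));
    }
}
```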
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
In the earlier example, all the
nodes were displayed on the same line. We would certainly prefer every
node to be displayed on a new line. The property Formatting in XmlTextWriter is
used to accomplish this task. Formatting can be assigned only one of the
following two values: Indented or None. By default, the value assigned is None.
The Indented option indents the
child elements by 2 spaces. The magnitude of the indent may be altered by
stipulating a new value for the Indentation property. In our program, we want
the indent to be 3 spaces deep. Hence, we stipulate the value as 3. As is
evident, not all nodes get indented. For example, the DOCTYPE node is not a
child of any tag, so it does not get indented; instead, it is simply placed on
a new line.
The IndentChar property may be
supplied with the character that is to be employed for indentation. By default,
a space character is used for this purpose.
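As an illustration of IndentChar, the following sketch indents with a single tab instead of spaces. The class name IndentDemo is our own, and the output goes to a StringWriter rather than to b.xml.

```csharp
using System;
using System.IO;
using System.Xml;

public class IndentDemo
{
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.Formatting = Formatting.Indented;
        w.Indentation = 1;     // one indent character per level
        w.IndentChar = '\t';   // and that character is a tab
        w.WriteStartElement("vijay");
        w.WriteElementString("surname", "mukhi");
        w.WriteEndElement();
        w.Flush();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```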
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay />
The function WriteStartElement
accepts a single parameter, which is the tag name to be written to the XML
file. This is an oft-repeated instruction, to be iterated in almost every
program, since an XML file basically consists of tags. A tag normally has a
start point and an end point, and it confines its content within these two
extremities. However, there are tags that have no content. Such empty tags
end with a / symbol. Since we have not closed the tag 'vijay' ourselves, the
Close function does it for us.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString ("wife","sonal");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay wife="sonal" />
The newly added function
WriteAttributeString accepts two parameters, which it writes in the form of a
name-value pair. Thus, along with 'vijay', we see the attribute named 'wife',
having a value of 'sonal'. An attribute is analogous to an adjective of the
English language, in that, it describes the object. In our case, it describes
the tag 'vijay'. It divulges additional information about the properties of a
tag.
XML does not interpret the
contents of these tags. The word 'wife' or the value 'sonal', have no special
significance for XML, which is absolutely unconcerned about the information
provided within the tags.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString ("wife","sonal");
a.WriteElementString("surname", "mukhi");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay wife="sonal">
   <surname>mukhi</surname>
</vijay>
An element is made up of a start
tag, the matching end tag and everything in between. We have a tag surname
containing the value 'mukhi'. We can have multiple tags within the root tag.
We have been reiterating the
fact that we need to adhere to specific rules. To see what happens otherwise,
you may interchange the two newly added functions as follows:
a.WriteElementString("surname", "mukhi");
a.WriteAttributeString ("wife","sonal");
As a fallout of this
interchange, the following exception will be thrown:
Unhandled Exception: System.InvalidOperationException: Token StartAttribute in state Content would result in an invalid XML document.
This exception is triggered off
due to the fact that the attribute must be specified first. Then, and only
then, should the child tags within the tag, be specified.
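The failure can be caught and examined instead of letting the program crash. The sketch below is only an illustration; the class name OrderDemo and its helper are our own, and the exact wording of the message may differ between versions of the framework.

```csharp
using System;
using System.IO;
using System.Xml;

public class OrderDemo
{
    // Returns true if writing an attribute after a child tag is rejected.
    public static bool AttributeAfterChildThrows()
    {
        XmlTextWriter w = new XmlTextWriter(new StringWriter());
        w.WriteStartElement("vijay");
        w.WriteElementString("surname", "mukhi");   // the start tag of vijay is now complete
        try
        {
            w.WriteAttributeString("wife", "sonal"); // too late for an attribute
            return false;
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(AttributeAfterChildThrows());
    }
}
```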
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString ("wife","sonal");
a.WriteAttributeString ("friend","two");
a.WriteElementString("surname", "mukhi");
a.WriteElementString("books", "67");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay wife="sonal" friend="two">
   <surname>mukhi</surname>
   <books>67</books>
</vijay>
To summarize, the WriteDocType function
names the root tag, WriteStartElement opens a tag, WriteAttributeString writes
an attribute for the active tag, and WriteElementString writes a complete tag
within a tag. We can enumerate as many attributes as we desire.
They are all written together inside the start tag. The WriteElementString
function is also capable of creating as many tags as are needed under a tag.
In the file b.xml, we see two
attributes and two tags, under the root tag 'vijay'.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString ("friend","two");
a.WriteStartElement("mukhi");
a.WriteAttributeString ("wife","sonal");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay friend="two">
   <mukhi wife="sonal" />
</vijay>
In the above example, 'vijay' is
the root tag, with the attribute 'friend', which is assigned a value of 'two'.
It also has a child tag 'mukhi' having the attribute 'wife' initialized to
'sonal'. Both the tags, 'vijay' and 'mukhi', are created using the function
WriteStartElement. Unlike the function WriteElementString, which creates a
start and an end tag, WriteStartElement creates only a start tag. The Close
function then closes the tags that are still open.
A child tag too can be endowed with
attributes. The active tag is the one last opened by the WriteStartElement
function. Functions such as WriteAttributeString act on the active tag. Thus,
we notice that the attribute 'wife' belongs to the tag 'mukhi' and not to
'vijay'. Finally, since the tag 'mukhi' is devoid of any contents, it ends
with a / symbol on the same line.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString ("friend","two");
a.WriteStartElement("mukhi");
a.WriteAttributeString ("wife","sonal");
a.WriteFullEndElement();
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay friend="two">
   <mukhi wife="sonal">
   </mukhi>
</vijay>
The function WriteFullEndElement
closes the active tag with a separate ending tag. Therefore, the tag 'mukhi'
does not end with a / symbol on the same line. It has an ending tag instead.
Both these possibilities are equally valid in this case. But, if the tag
embodies any contents, then both the start and the end tags are mandatory. In
such situations, a single empty tag would just not suffice.
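The two ways of ending an empty tag can be placed side by side. In this sketch the tag names 'shorttag' and 'fulltag' are hypothetical, and the function WriteEndElement, which has not been used in the programs so far, closes the active tag in the short form.

```csharp
using System;
using System.IO;
using System.Xml;

public class EndTagDemo
{
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("vijay");
        w.WriteStartElement("shorttag");
        w.WriteEndElement();       // empty tag, written with a / symbol
        w.WriteStartElement("fulltag");
        w.WriteFullEndElement();   // empty tag, written with a separate ending tag
        w.WriteEndElement();       // closes vijay
        w.Flush();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```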
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
//a.WriteComment("comment 1");
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteStartDocument();
a.WriteComment("comment 1");
a.WriteDocType("vijay", null, null ,null);
a.WriteComment("comment 2");
a.WriteStartElement("vijay");
a.WriteAttributeString ("wife","sonal");
a.WriteComment("comment 3");
a.WriteElementString("surname", "mukhi");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!--comment 1-->
<!DOCTYPE vijay>
<!--comment 2-->
<vijay wife="sonal">
   <!--comment 3-->
   <surname>mukhi</surname>
</vijay>
Every programming language
extends the facility of writing comments, even though it may be a seldom used
feature. Programmers insert comments amidst their code to document or explain
the functioning of their programs. At times, comments assist in deciphering the
code from the programmer's perspective. Practically, it may be easier to teach
an elephant how to tap-dance, than to convince a programmer to write comments.
In the XML world, comments begin
with <!-- and end with -->. This is the same as the HTML syntax, since both
languages inherit it from SGML.
Comments are like a liquid,
since they can be moulded to fit in almost anywhere, except before the XML
declaration. The first line in an XML file has to be the declaration. If you
remove the // from the first WriteComment call in the program, which is placed
before WriteStartDocument, an exception will be thrown with the following
message:
Unhandled Exception: System.InvalidOperationException: WriteStartDocument should be the first call.
Thus, functions such as
WriteComment can be used to insert comments anywhere in the XML file after the
declaration, primarily for the purpose of documentation, which would enable
even an alien from outer space to decipher the file better.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteProcessingInstruction ("sonal", "mukhi=no");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay>
   <?sonal mukhi=no?>
</vijay>
A line beginning with <? is
called a Processing Instruction (PI). The XML declaration looks similar, but it
is treated as a node of its own. A PI is inserted using the function
WriteProcessingInstruction, which is passed two parameters:
• the first is the name of the processing instruction.
• the second is the text that is to be inserted for the processing instruction.
A Processing Instruction is used
by an XML file to communicate with other programs during the performance of
certain tasks. XML does not have the wherewithal to execute instructions. It
therefore delegates this task to the program that reads the file. The XML
processor is a program that is able to recognise an XML file. When it
encounters a processing instruction, it hands the instruction over to the
application. If the application is able to understand it, it acts on it. In
cases where it cannot comprehend it, the instruction is simply ignored. This is
the methodology by which XML communicates with external programs.
In our program, the instruction
'sonal' would be ignored, as no program attaches any meaning to it.
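A processing instruction that many programs do understand is xml-stylesheet, which points a browser to a stylesheet. The sketch below writes one; the file name style.xsl is hypothetical and the class name PiDemo is our own.

```csharp
using System;
using System.IO;
using System.Xml;

public class PiDemo
{
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        // First parameter: the name (target) of the instruction.
        // Second parameter: the text handed over to whoever understands it.
        w.WriteProcessingInstruction("xml-stylesheet",
            "type=\"text/xsl\" href=\"style.xsl\"");
        w.WriteStartElement("vijay");
        w.WriteEndElement();
        w.Flush();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

A program that does not know about stylesheets would simply skip this line and read the tag 'vijay' as usual.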
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteString("mukhi");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay>mukhi</vijay>
An XML file mainly consists of
strings and tags. The WriteString function is used very extensively, since
it writes content/strings between tags.
In the above example, the text
'mukhi' is enclosed within the tags of 'vijay'. Even though we have not
explicitly asked the XmlTextWriter class to close the tag, the Close function
closes every tag that is still open. A separate ending tag has been used,
rather than a / symbol, because there exists some content after the opening
tag.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString ("friend","two");
a.WriteString("hi");
//a.WriteAttributeString ("friend","three");
a.WriteStartElement("mukhi");
a.WriteAttributeString ("friend","two");
a.WriteString("bye");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay friend="two">hi<mukhi friend="two">bye</mukhi></vijay>
The function WriteString can be
called at almost any point in the program. The first WriteString function
writes 'hi' between the tags of 'vijay', while the second WriteString function
writes 'bye' between the tags of 'mukhi'. The WriteString function is aware of
the active tag. Therefore, it inserts the text accordingly. Here also, if we
uncomment the line, a.WriteAttributeString("friend","three"), the
following exception will be generated.
Unhandled Exception: System.InvalidOperationException: Token StartAttribute in state Content would result in an invalid XML document.
The XmlTextWriter class is very
strict and meticulous in the sense that, it expects a certain order to be
maintained, or else, it throws an exception. For instance, an element or a tag
has to be created first. Only then, can all the attributes be written; and
finally, the text or content has to be supplied. We are not permitted to write
the text first and enter the attributes later. In the XmlTextWriter class,
there is no going back. It is a one-way path, which only moves in the forward
direction.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteCharEntity ('A');
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay>&#x41;</vijay>
During our exploratory journey
of XML, we shall discuss a number of characters that are 'reserved'. They
have a special significance and cannot always be used literally. One way of
writing such a character is as a character reference, which gives its Unicode
value in hex. The function WriteCharEntity performs this task. It accepts a
char as a parameter and writes its value in hex, prefaced with the &#x symbol
and followed by a semi-colon.
For those who do not understand
hexadecimal and consider it Greek and Latin, 41 hex is equal to decimal 65,
which is the ASCII value for the capital letter A. You can pass different
characters to this function and see their equivalent hex values.
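A short sketch for trying out a few characters follows. The class name CharEntityDemo and the helper Entity are our own; the character is written inside a tag, since content belongs within tags.

```csharp
using System;
using System.IO;
using System.Xml;

public class CharEntityDemo
{
    // Writes the character as a character reference inside the tag vijay.
    public static string Entity(char c)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("vijay");
        w.WriteCharEntity(c);
        w.WriteEndElement();
        w.Flush();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Entity('A'));  // capital A, 41 hex
        Console.WriteLine(Entity('<'));  // a reserved character
        Console.WriteLine(Entity('&'));  // another reserved character
    }
}
```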
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteCData("mukhi & <sonal>");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay><![CDATA[mukhi & <sonal>]]></vijay>
The above program introduces a
new function called WriteCData, which creates a node called CDATA. The
parameter passed to this function is placed as it is, enclosed between the
markers <![CDATA[ and ]]>.
A CDATA section is used whenever
we want to use characters such as <, > and & in their
literal sense, which would otherwise be mistaken for markup characters. Thus, in
the above program, the symbol & within the CDATA section is read
as the literal character &, and not as the start of an entity reference. Also,
<sonal> is not recognized as a tag in this section. A CDATA section
cannot be nested within another CDATA section.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteString("<A>&");
a.WriteCData("<A>&");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay>&lt;A&gt;&amp;
<![CDATA[<A>&]]>
</vijay>
This program illustrates certain
characters that are special to XML. These are the obvious characters, such as
<, > and &, since they are used as markup whilst an XML file is being created.
Thus, whenever the WriteString function comes across the following symbols, it
replaces them with the entity references depicted against each:
• < is replaced with '&lt;'
• > is replaced with '&gt;'
• & is replaced with '&amp;'.
If the same string that contains
the above mentioned special characters is placed within a CDATA section, it
gets written verbatim, without any conversions.
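The contrast can be verified with a small sketch that writes the same string both ways. Formatting is left at its default of None here, and the class name EscapeDemo is our own.

```csharp
using System;
using System.IO;
using System.Xml;

public class EscapeDemo
{
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("vijay");
        w.WriteString("<A>&");   // special characters are converted
        w.WriteCData("<A>&");    // special characters are left untouched
        w.WriteEndElement();
        w.Flush();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```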
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteEntityRef("Hi");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay>&Hi;</vijay>
The entity reference is very
straightforward to understand. The string passed to the function WriteEntityRef
is placed in the XML file, preceded by a '&' sign and followed by a
semi-colon. An entity reference in XML is equivalent to a variable. It is
included to provide flexibility to the document.
Thus in the above code, a
variable called 'Hi' is referred to. The task of stating what 'Hi' signifies is
left to an entity declaration in the DTD. Since our file does not contain one,
a program reading it would report an error.
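One place where such a variable can be given a value is the internal subset of the DOCTYPE declaration, which is the last parameter of WriteDocType. The sketch below is an assumption-laden illustration: the replacement text "Hello" is our own choice, and the class name EntityDemo is hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public class EntityDemo
{
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        // The internal subset declares the entity Hi; "Hello" is an assumed value.
        w.WriteDocType("vijay", null, null, "<!ENTITY Hi \"Hello\">");
        w.WriteStartElement("vijay");
        w.WriteEntityRef("Hi");   // written as a reference, not as its value
        w.WriteEndElement();
        w.Flush();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

The writer does not replace the reference with its value; that is the job of whichever program later reads the file.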
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteRaw("<A>&");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay><A>&</vijay>
The WriteRaw function writes the
characters passed to it, without carrying out any conversions or checks. The
above XML file is obviously erroneous, as no end tag has been specified for the
tag A. Also, no name has been specified after the & sign. The onus of writing
correct XML with WriteRaw thus lies entirely on us.
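That the raw output is unusable can be confirmed by asking a parser to read it back. This sketch uses the XmlDocument class, which has not been introduced in this chapter so far; the class name RawDemo and the helper IsWellFormed are our own.

```csharp
using System;
using System.IO;
using System.Xml;

public class RawDemo
{
    // Writes <vijay> containing the given text, either raw or through WriteString.
    public static string Build(string text, bool raw)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        w.WriteStartElement("vijay");
        if (raw) w.WriteRaw(text); else w.WriteString(text);
        w.WriteEndElement();
        w.Flush();
        return sw.ToString();
    }

    // Returns true if a parser accepts the text as XML.
    public static bool IsWellFormed(string xml)
    {
        try { new XmlDocument().LoadXml(xml); return true; }
        catch (XmlException) { return false; }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed(Build("<A>&", true)));   // raw text
        Console.WriteLine(IsWellFormed(Build("<A>&", false)));  // converted text
    }
}
```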
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
Boolean b = true;
a.WriteElementString("Logical", XmlConvert.ToString(b));
Int32 c = -2147483648;
a.WriteElementString("SmallInt", XmlConvert.ToString(c));
Int64 d = 9223372036854775807;
a.WriteElementString("Largelong", XmlConvert.ToString(d));
Single e = ((Single)22)/((Single)7);
a.WriteElementString("Single", XmlConvert.ToString(e));
Double f = 1.79769313486231570E+308;
a.WriteElementString("Double", XmlConvert.ToString(f));
DateTime h = new DateTime(2001, 07, 08 ,22, 0, 30, 500);
a.WriteElementString("DateTime", XmlConvert.ToString(h));
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay>
   <Logical>true</Logical>
   <SmallInt>-2147483648</SmallInt>
   <Largelong>9223372036854775807</Largelong>
   <Single>3.142857</Single>
   <Double>1.7976931348623157E+308</Double>
   <DateTime>2001-07-08T22:00:30.5000000+05:30</DateTime>
</vijay>
The above example contains a
plethora of data types such as Boolean, Int32, Int64, Single, Double and
DateTime.
The XmlConvert class has a large
number of static functions that help us convert one data type to another. One
such function is the ToString function. For types such as int or long, the
smallest and the largest values are used, in order to check the accuracy of the
ToString function.
The ToString function is overloaded
to handle many more data types than we have shown. The point here is that, it
is possible for us to convert any of these data types into a string and write
it to disk. The strings follow the conventions of XML, such as 'true' in lower
case, and do not depend on the regional settings of the machine. This factor
gains immense importance when data is being received from a database, and
is required to be converted into a string in an XML file.
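The XmlConvert class also works in the reverse direction, which is what a program reading the file would need. The sketch below shows a round trip for a few types and contrasts XmlConvert with the ordinary ToString of a bool; the class name ConvertDemo is our own.

```csharp
using System;
using System.Xml;

public class ConvertDemo
{
    public static void Main()
    {
        // To a string, in the form XML expects.
        string s1 = XmlConvert.ToString(true);
        string s2 = XmlConvert.ToString(-2147483648);
        string s3 = XmlConvert.ToString(1.5);

        // And back again to the original data types.
        bool b = XmlConvert.ToBoolean(s1);
        int i = XmlConvert.ToInt32(s2);
        double d = XmlConvert.ToDouble(s3);

        Console.WriteLine(s1 + " " + s2 + " " + s3);
        Console.WriteLine(b + " " + i + " " + d);

        // The ordinary ToString of a bool uses a capital letter instead.
        Console.WriteLine(true.ToString());
    }
}
```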
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteStartAttribute("hi", "mukhi", "xxx:yyy");
a.WriteString("1-861003-78");
a.WriteEndAttribute();
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay hi:mukhi="1-861003-78" xmlns:hi="xxx:yyy" />
In the above example, we have
introduced the WriteStartAttribute function. As is apparent from its name, it
starts an attribute. The first parameter to this function is 'hi', which is the
namespace prefix of the attribute. The second parameter 'mukhi' is the name of
the attribute.
The names assigned to attributes
and tags may not always be unique. Two programmers may independently
create a tag or an attribute with the same name. How then does a program
decide what the tag denotes?
To help resolve such potential
conflicts, the name of a tag or an attribute may be prefaced with a short name
known as the prefix, which stands for a namespace. This is followed by a colon
sign. Normally, meaningful names are assigned, rather than words like 'hi'. The
prefix xmlns is reserved by XML for declaring namespaces. The concept of
namespaces in XML is similar in purpose to the concept of namespaces in C#.
The third parameter is a Uniform
Resource Identifier (URI). This URI is the actual name of the namespace, and
the prefix 'hi' is merely a shorthand for it within the document. The URI only
has to be unique; nothing needs to exist at that location. In this case it
is xxx:yyy. The attribute xmlns:hi is added to bind the prefix to this URI. As
the WriteStartAttribute function does not specify any value
for the attribute, the WriteString function is employed to assign the value
1-861003-78 to the attribute 'mukhi' with the prefix 'hi'.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString("xmlns", "bk", null, "sonal:wife");
string p = a.LookupPrefix("sonal:wife");
a.WriteStartAttribute(p, "mukhi", "sonal:wife");
a.WriteString("sonal");
a.WriteEndAttribute();
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay xmlns:bk="sonal:wife" bk:mukhi="sonal" />
Here, the function
WriteAttributeString is called with four parameters. The first is the prefix,
i.e. xmlns. The second is the name of the attribute, i.e. bk, which is
appended to the prefix, as xmlns:bk. The third
parameter is the namespace URI. In the earlier program, we had specified the
value of xxx:yyy for the URI. For this program, since xmlns is a
reserved prefix, the URI parameter is specified as null. The last parameter
is the value of the attribute.
As a consequence, the above
function writes the attribute xmlns:bk="sonal:wife", which declares bk as the
prefix for the namespace sonal:wife. The next function, LookupPrefix, accepts
a namespace URI and returns the prefix bound to it. As
the parameter supplied to this function is sonal:wife, the prefix returned is
bk, which is stored in p.
The WriteStartAttribute function then
uses the following:
• 'bk' as the prefix,
• 'mukhi' as the name of the attribute, and
• 'sonal:wife' as the namespace URI.
Thus, the attribute 'mukhi' is prefaced
with the prefix 'bk'. Finally, the WriteString function assigns the value of
'sonal' to the attribute bk:mukhi.
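Tags can carry a prefix in the same manner as attributes. The sketch below uses the three-parameter form of WriteStartElement and then asks LookupPrefix for the prefix. The tag name 'book', the attribute name 'isbn' and the URI urn:example:books are hypothetical values chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public class PrefixDemo
{
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);
        // prefix, tag name, namespace URI
        w.WriteStartElement("bk", "book", "urn:example:books");
        // The prefix is now in scope, so it can be looked up by its URI.
        string p = w.LookupPrefix("urn:example:books");
        w.WriteAttributeString(p, "isbn", "urn:example:books", "1-861003-78");
        w.WriteEndElement();
        w.Flush();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

The writer adds the xmlns:bk attribute on its own, since a prefix cannot be used without being declared.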
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString("xmlns", "bk", null, "sonal:wife");
a.WriteAttributeString("jjj", "bk", "kkk", "sonal:wife");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay xmlns:bk="sonal:wife" jjj:bk="sonal:wife" xmlns:jjj="kkk" />
In this version of the WriteAttributeString
function, the prefix is jjj and the attribute name is bk, with the value
sonal:wife. Thus, the attribute becomes jjj:bk="sonal:wife". The third
parameter to the function is the namespace URI, which is now assigned a value
of kkk, instead of null.
Thus, one more attribute
xmlns:jjj gets added, which binds the prefix jjj to the namespace URI kkk. We
notice that no such attribute gets added for the reserved xmlns prefix. We have
chosen the name 'bk' again, just to demonstrate that the two uses are
unrelated. In xmlns:bk, bk is a prefix that is being declared, whereas in
jjj:bk, it is the name of an attribute in the namespace kkk.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteStartAttribute(null,"sonal", null);
a.WriteQualifiedName("mukhi", "http://vijaymukhi.com");
a.WriteEndAttribute();
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay sonal="n1:mukhi" xmlns:n1="http://vijaymukhi.com" />
In the WriteStartAttribute
function, only the second of the three parameters has a value, 'sonal',
which is the name of the attribute. The first parameter, which is the
prefix, and the third parameter, which is the URI of the namespace, are
both assigned null values.
The next function,
WriteQualifiedName, writes the value of the attribute 'sonal'. This function
takes two parameters, the name 'mukhi' and the namespace URI for this name.
The method WriteQualifiedName first
looks up the prefix that is in scope for the given namespace. Since none has
been declared for http://vijaymukhi.com, the XmlTextWriter class creates a
prefix dynamically, n1 in our output, and prefaces the value 'mukhi' with it.
It also adds the attribute xmlns:n1, which binds the prefix n1 to the URI
specified in the second parameter.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.WriteStartElement("vijay");
a.WriteAttributeString("xmlns","mukhi",null,"xxx:yyy");
a.WriteString("Hi ");
a.WriteQualifiedName("sonal","xxx:yyy");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<vijay xmlns:mukhi="xxx:yyy">Hi mukhi:sonal</vijay>
In this example, we first create
an attribute 'mukhi' in the reserved namespace xmlns. This attribute is
assigned the value xxx:yyy, which declares 'mukhi' as a prefix for the
namespace URI xxx:yyy. The WriteString function writes 'Hi ' as the content
and then, the WriteQualifiedName function writes the string 'sonal'. However,
since 'sonal' is a qualified name, it is prefaced by the prefix 'mukhi' and
not by the URI xxx:yyy, because 'mukhi' is bound to xxx:yyy.
A prefix that is already in scope
for the namespace is given precedence over generating a new one.
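The lookup can also be observed directly. The following is a minimal sketch, assuming that the LookupPrefix function of the XmlTextWriter reports the prefix bound to a URI once the xmlns attribute has been written. The class name PrefixLookupSketch and the use of a StringWriter in place of b.xml are our own choices for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public class PrefixLookupSketch
{
    // Returns the prefix found for xxx:yyy, followed by the XML that was written.
    public static string[] Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter a = new XmlTextWriter(sw);
        a.WriteStartElement("vijay");
        // Declare the prefix 'mukhi' for the namespace URI xxx:yyy.
        a.WriteAttributeString("xmlns", "mukhi", null, "xxx:yyy");
        // Ask the writer which prefix is in scope for that URI.
        string prefix = a.LookupPrefix("xxx:yyy");
        a.WriteString("Hi ");
        // The same lookup happens inside WriteQualifiedName.
        a.WriteQualifiedName("sonal", "xxx:yyy");
        a.WriteEndElement();
        a.Close();
        return new string[] { prefix, sw.ToString() };
    }

    public static void Main()
    {
        string[] result = Build();
        Console.WriteLine(result[0]);
        Console.WriteLine(result[1]);
    }
}
```

The first line displayed should be the prefix mukhi, and the second the element with the content 'Hi mukhi:sonal'.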
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.WriteStartElement("vijay");
a.WriteElementString("vijay","mukhi");
a.WriteElementString("vijay","sonal","mukhi");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<vijay>
<vijay>mukhi</vijay>
<vijay xmlns="sonal">mukhi</vijay>
</vijay>
In the earlier program, the
WriteElementString function had only two parameters, as the first call here
does. However, the second call has three parameters. The first and the third
parameters are the same as before, i.e. the tag name and the value. The newly
inducted second parameter is the namespace URI 'sonal'. The tag in the first
parameter, 'vijay', belongs to the namespace sonal. Thus, the XML file
contains the tag with the attribute xmlns="sonal".
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextWriter a = new XmlTextWriter (Console.Out);
a.WriteStartDocument();
a.WriteStartElement("vijay");
a.Close();
}
}
Output
<?xml version="1.0" encoding="IBM437"?><vijay />
The XmlTextWriter class can
write to different destinations, using the constructor that accepts a single
parameter. The Console class has a static property Out of datatype TextWriter
that represents the console. Thus, the output is now displayed on the console.
The encoding attribute is taken from the encoding of the TextWriter. On our
console, it has the value IBM437.
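The value of the encoding attribute depends on the TextWriter that is handed to the constructor. The sketch below, under the assumption that a StringWriter is an acceptable stand-in for the console, writes the same two nodes into a string. The class name WriterEncodingSketch is hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public class WriterEncodingSketch
{
    // Writes a declaration and one element into a string and returns it.
    public static string Build()
    {
        StringWriter sw = new StringWriter();   // a TextWriter over an in-memory string
        XmlTextWriter a = new XmlTextWriter(sw);
        a.WriteStartDocument();
        a.WriteStartElement("vijay");
        a.Close();                               // closes the open element as well
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

Since a StringWriter reports a UTF-16 encoding, the declaration should now carry encoding="utf-16" in place of IBM437.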
One of the primary reasons for
designing XML was to allow validation of the tags, in order to produce a
well-structured XML file.
There are a few validations that
need to be performed in an XML file, such as:
• It should be ensured that the basic rules of XML as well as our own rules are followed.
• Certain tags should be placed only within specified tags and cannot be used independently.
• The number of times a tag is used can be regulated, since it cannot be used infinite times.
• A check should be placed on the name and the number of times an attribute is used within a tag.
All such rules that need to be
enforced are enunciated in XML parlance and then, placed in a DTD or a Document
Type Definition. The DTD may either be placed in a separate file or may be
made part of the DOCTYPE declaration. In the XML file shown below, the DTD is
internal.
Thus, a DTD stores the grammar
that is permissible in an XML file. The entities that entity references refer
to are also defined in a DTD. XHTML is HTML reformulated as XML, and one of
the reasons it can be validated is that its rules are available in the form
of a DTD.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
String s = "<!ELEMENT vijay (#PCDATA)>";
a.WriteDocType("vijay", null, null, s);
a.WriteStartElement("vijay");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay[<!ELEMENT vijay (#PCDATA)>]>
<vijay />
The WriteDocType function
accepts four parameters. The first parameter is the name of the root tag
'vijay'. Hence, it must contain a value. The last parameter is the internal
subset (as referred to by the documentation), which follows the name 'vijay'.
If you observe the DOCTYPE statement carefully, you will notice that a pair
of square brackets [] has been added around the subset.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.WriteDocType("vijay", null, "a.dtd", null);
a.WriteStartElement("vijay");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay SYSTEM "a.dtd">
<vijay />
The third parameter to the
WriteDocType function specifies the name of the DTD file. In other words, it
states the URI of the DTD. The second parameter is assigned the value of null.
Hence, the word SYSTEM is displayed before the name of the file, in the XML
file.
Whenever a validating parser wishes
to ensure the validity of an XML file, it ascertains the rules from a.dtd. If
both internal and external DTDs are present, both of them are checked.
However, the declarations in the internal DTD are accorded priority.
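To see such rules being enforced, a document can be read back with validation switched on. The sketch below is only an illustration under our own assumptions: it uses the XmlReaderSettings and XmlReader.Create API of later versions of the framework rather than the classes discussed in this chapter, it keeps the DTD internal so that no a.dtd file is needed, and the class name DtdValidationSketch is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class DtdValidationSketch
{
    // Returns the validation messages raised while reading the document.
    public static List<string> Validate(string xml)
    {
        List<string> errors = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;   // allow the DOCTYPE to be read
        settings.ValidationType = ValidationType.DTD;   // check the content against it
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            errors.Add(e.Message);                      // collect instead of throwing
        };
        using (XmlReader r = XmlReader.Create(new StringReader(xml), settings))
        {
            while (r.Read()) { }                        // validation happens while reading
        }
        return errors;
    }

    public static void Main()
    {
        string dtd = "<!DOCTYPE vijay [<!ELEMENT vijay (#PCDATA)>]>";
        // Text only: conforms to the rule for vijay.
        Console.WriteLine(Validate(dtd + "<vijay>hi</vijay>").Count);
        // A child tag aa is not permitted by the rule.
        Console.WriteLine(Validate(dtd + "<vijay><aa /></vijay>").Count);
    }
}
```

The first count should be zero, whereas the second should be greater than zero, since the tag aa is neither declared nor allowed within vijay.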
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.WriteDocType("vijay", "mmm", "a.dtd", null);
a.WriteStartElement("vijay");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay PUBLIC "mmm" "a.dtd">
<vijay />
In the earlier program, SYSTEM
was added in the XML file, since the second parameter had been specified as
null. However, in this program, the second parameter is not null. Hence, the
word PUBLIC gets added. Thereafter, the string or the id specified in the
second parameter is added. And then, the DTD in the third parameter is
specified.
Therefore, it is either the
PUBLIC keyword or the SYSTEM keyword, which would be present. The processor
scanning the XML file may use the PUBLIC identifier to generate an
alternative URI from which the DTD is retrieved. If it is unable to do so, it
falls back upon the SYSTEM literal.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument(false);
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0" standalone="no"?>
The WriteStartDocument function can take
a boolean parameter that adds an attribute which could either be
standalone="yes" or standalone="no", depending upon the value specified. This
attribute tells the processor whether the document depends on declarations
held in an external DTD. If standalone has a value of 'yes', it is suggestive
of the fact that there is no external DTD that affects the document, and
therefore, all the grammatical rules have to be placed within the XML file
itself.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
System.Console.WriteLine(a.WriteState);
a.WriteStartDocument();
System.Console.WriteLine(a.WriteState);
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
System.Console.WriteLine(a.WriteState);
a.WriteStartElement("vijay");
System.Console.WriteLine(a.WriteState);
a.WriteAttributeString ("wife","sonal");
System.Console.WriteLine(a.WriteState);
a.WriteStartAttribute("hi", "mukhi", "xxx:yyy");
System.Console.WriteLine(a.WriteState);
a.WriteString("1-861003-78");
a.WriteElementString("surname", "mukhi");
a.Flush();
System.Console.WriteLine(a.WriteState);
a.Close();
System.Console.WriteLine(a.WriteState);
}
}
Output
Start
Prolog
Prolog
Element
Element
Attribute
Content
Closed
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay wife="sonal" hi:mukhi="1-861003-78" xmlns:hi="xxx:yyy">
<surname>mukhi</surname>
</vijay>
The XmlTextWriter object can be
in any one of six different states. The WriteState property reveals its current
state. When an XmlTextWriter object is created, it is in the Start state, since
no write method has been called so far. After the Close function, the writer
is in the Closed state. When the WriteStartDocument and WriteDocType functions
are called, the writer is in the Prolog state, because the prolog is being
written.
The WriteStartElement function
starts writing the element start tag 'vijay', thereby, morphing to the Element
state. The next function WriteAttributeString does not change the state, since
the start tag of 'vijay' is still open. The WriteStartAttribute function needs
the WriteString function to supply the value of the attribute. Thus, after the
WriteStartAttribute function executes, the writer assumes the Attribute state.
The surname element is written as content of 'vijay'. Hence, the state changes
to Content.
This goes on to prove that the
XmlTextWriter can be in any one of the above six states, depending upon what
has been written to the file. While the writer is in the Attribute state, a
call that writes an element, such as WriteElementString, first closes the open
attribute and the start tag, as b.xml shows. No exception is thrown in this
case.
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.Namespaces = false;
a.WriteStartDocument();
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString("jjj", "bk", "kkk", "sonal:wife");
a.Flush();
a.Close();
}
}
Output
Unhandled Exception: System.ArgumentException: Cannot set the namespace if Namespaces is 'false'.
at System.Xml.XmlTextWriter.WriteStartAttribute(String prefix, String localName, String ns)
at System.Xml.XmlWriter.WriteAttributeString(String prefix, String localName, String ns, String value)
at zzz.Main()
The XmlTextWriter class has a
Namespaces property that is read-write, and it has a default value of true.
Namespace support is turned off by setting this property to false. The above
runtime exception is thrown because we have attempted to introduce the prefix
jjj and the namespace URI kkk in the WriteAttributeString function.
a.cs
using System;
using System.Xml;
public class zzz {
public static void Main() {
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.QuoteChar = '\'';
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteAttributeString("jjj", "bk");
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay jjj='bk' />
Various facets of the output can be
modified. By using the property QuoteChar, we can change the character used to
quote attribute values, from the default double quote to a single quote. Since
a single quote cannot be written as is within a C# character literal, we use
the backslash to escape it. All attribute values are now placed in single
quotes instead of double quotes.
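A natural question is what happens when the attribute value itself contains the quoting character. The sketch below is our own illustration, with the hypothetical class name QuoteCharSketch and a StringWriter in place of b.xml. It writes the value it's with QuoteChar set to a single quote.

```csharp
using System;
using System.IO;
using System.Xml;

public class QuoteCharSketch
{
    // Writes one attribute whose value contains a single quote.
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter a = new XmlTextWriter(sw);
        a.QuoteChar = '\'';                     // quote attribute values with '
        a.WriteStartElement("vijay");
        a.WriteAttributeString("jjj", "it's");  // the value contains the quote character
        a.WriteEndElement();
        a.Close();
        return sw.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

The writer should replace the embedded quote with the entity reference &apos;, so that the attribute value remains well formed.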
a.cs
using System;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextWriter a = new XmlTextWriter ("b.xml", null);
a.WriteStartDocument();
a.Formatting = Formatting.Indented;
a.Indentation = 3;
a.WriteDocType("vijay", null, null ,null);
a.WriteStartElement("vijay");
a.WriteStartAttribute("hi", "mukhi", "xxx:yyy");
a.WriteString("1-861003-78");
a.WriteEndAttribute();
a.WriteEndElement();
a.WriteEndDocument();
a.Flush();
a.Close();
}
}
b.xml
<?xml version="1.0"?>
<!DOCTYPE vijay>
<vijay hi:mukhi="1-861003-78" xmlns:hi="xxx:yyy" />
Good programming style
necessitates every 'open' to have a corresponding 'close'. Thus, the Start
functions for an Element, Attribute and Document have corresponding End
functions too. However, if we do not end them, they are closed by default when
the writer is closed, and no major calamity befalls them. We are using them in
the above program out of abundant caution.
The WriteEndDocument function
puts the XmlTextWriter back in the Start state.
Reading an XML file
b.xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE vijay SYSTEM "a.dtd" [<!ENTITY baby "No">]>
<vijay aa="no">
<!--comment 2--><?sonal mukhi=no?>
Hi&baby;
<![CDATA[,mukhi>]]><aa>bb</aa>
</vijay>
> copy con a.dtd
Enter
^Z
a.cs
using System;
using System.IO;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextReader r;
r = new XmlTextReader("b.xml");
while (r.Read())
{
Console.Write("{0} D={1} L={2} P={3} ", r.NodeType, r.Depth, r.LineNumber, r.LinePosition );
Console.Write(" name={0} value={1} AC={2}",r.Name,r.Value,r.AttributeCount);
Console.WriteLine();
}
}
}
Output
XmlDeclaration D=0 L=1 P=3 name=xml value=version="1.0" standalone="yes" AC=2
Whitespace D=0 L=1 P=39 name= value= AC=0
DocumentType D=0 L=2 P=11 name=vijay value=<!ENTITY baby "No"> AC=1
Whitespace D=0 L=2 P=54 name= value= AC=0
Element D=0 L=3 P=2 name=vijay value= AC=1
Whitespace D=1 L=3 P=16 name= value= AC=0
Comment D=1 L=4 P=5 name= value=comment 2 AC=0
ProcessingInstruction D=1 L=4 P=19 name=sonal value=mukhi=no AC=0
Text D=1 L=4 P=36 name= value=Hi AC=0
EntityReference D=1 L=5 P=4 name=baby value= AC=0
Whitespace D=1 L=5 P=9 name= value= AC=0
CDATA D=1 L=6 P=10 name= value=,mukhi> AC=0
Element D=1 L=6 P=21 name=aa value= AC=0
Text D=2 L=6 P=24 name= value=bb AC=0
EndElement D=1 L=6 P=28 name=aa value= AC=0
Whitespace D=1 L=6 P=31 name= value= AC=0
EndElement D=0 L=7 P=3 name=vijay value= AC=0
In this program, we read an XML
file and display all the nodes contained therein. To avoid any errors from
being displayed, you should create an empty file by the name of a.dtd.
We have a class called
XmlTextReader that accepts a filename as a parameter. We pass the filename
b.xml to it. This file contains most of the node types that can occur in an
XML file. The Read function in this class reads a single node at a time.
It returns true if a node was read, or else, it returns false.
Thus, when there are no more nodes to be read from the file, the while loop
ends. Inside the loop, we display the properties of the node that the Read
function has just made current.
The NodeType property returns
the type of the current node. As an XML file normally starts with a
declaration, the first NodeType displayed is XmlDeclaration. The Write
function uses the ToString function to convert the value into its name.
The Depth property gets
incremented by one for every element within which the current node is nested.
At the declaration, as well as at the root tag, the depth is 0. The nodes
within the root tag have a depth of 1, and so on. At the EndElement, the depth
is the same as that of the matching start tag. Thus, the Depth property reveals
the number of open tags that enclose the current node and it can be used for
indentation.
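The use of Depth for indentation can be sketched as follows. This is a minimal illustration and not part of the original program: the class name DepthIndentSketch, the two spaces per level and the in-memory XML string are our own assumptions.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class DepthIndentSketch
{
    // Returns one line per node, indented by two spaces per level of depth.
    public static string Outline(string xml)
    {
        StringBuilder sb = new StringBuilder();
        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        r.WhitespaceHandling = WhitespaceHandling.None;   // skip Whitespace nodes
        while (r.Read())
        {
            sb.Append(new string(' ', 2 * r.Depth));      // indentation from Depth
            sb.Append(r.NodeType).Append(' ').Append(r.Name).Append('\n');
        }
        r.Close();
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(Outline("<vijay aa=\"no\"><aa>bb</aa></vijay>"));
    }
}
```

The tag aa is indented by two spaces and its text by four, while the start and end tags of vijay begin at the left margin.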
The LineNumber property indicates the
line on which the node is positioned, while the LinePosition property displays
the position on the line at which the node begins. The Name property in
the class reveals the name of the node, which is xml for the declaration. The
output displayed by this property depends upon the active node type. On close
observation, you shall notice that the word xml is not preceded by the symbol
<? in the output.
The Value property, in the case
of an XmlDeclaration, displays the entire gamut of attributes of the node. As
there exist two attributes, version and standalone, the property
AttributeCount displays a value of 2.
If the Enter key is pressed
after the declaration, it is reported as a Whitespace node. Whitespace
characters are separators, such as a newline, a space or a tab. The
LinePosition property specifies the character position as 39.
The XmlDeclaration has to be the
first node in an XML file, and it cannot have any children. The DOCTYPE
declaration, which is known as a DocumentType node, displays the name as vijay,
which is the root tag. The value is displayed as <!ENTITY baby "No">, which
includes everything except the SYSTEM keyword and a.dtd. Thus, in the case of
a DocumentType node, the value is the internal DTD. A DocumentType node can
have the Notation and Entity nodes as child nodes. Its AttributeCount of 1 is
explained in the next program.
We shall encounter the
Whitespace node very frequently. Hence, we shall not discuss it hereinafter.
The next node in sequence is our
very first element or tag 'vijay', which is the same name that was displayed
earlier for the DocumentType node. The Value property for this element is
empty, since tags are devoid of values. Instead, they have attributes.
The AttributeCount property displays a
value of one. At the following Whitespace node, the Depth property gets
incremented by one, because we are now within the root tag. Checking for a
depth of 0 is the way to ascertain whether we are at the root node or not. We
now stumble upon a comment, which has no name. The value displayed is the text
of the comment. And yet again, the <!-- characters are not displayed along
with the value.
Thereafter, a processing
instruction (PI) is encountered. No Whitespace node is displayed between the
comment and the PI, since we have not pressed the Enter key between them.
'sonal' becomes the name of the PI, i.e. its target, which is the program that
the instruction is meant for. The rest turns into the Value property, and
there are no attributes. A Text node is displayed next, because the text 'Hi'
follows in the XML file. This node too is not assigned any name and the value
is depicted as 'Hi'.
What follows the text is an
Entity Reference. It is assigned the name 'baby', devoid of the ampersand sign
and the semicolon. Its value is empty and it does not have any attributes. The
CDATA section has no name. Its value is the content of the CDATA section,
after stripping away the <![CDATA[ and ]]> delimiters.
Within the element aa, the value
of the Depth property is incremented by 1 and becomes 2. The Text node follows
the start tag of the element aa. This node does not have any name and it
displays the value as 'bb'. In the following program, we explore the various
attributes.
a.cs
using System;
using System.IO;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextReader r;
r = new XmlTextReader("b.xml");
r.WhitespaceHandling = WhitespaceHandling.None;
while (r.Read())
{
Console.Write("{0} D={1} L={2} P={3}",r.NodeType,r.Depth,r.LineNumber,r.LinePosition);
Console.Write(" name={0} value={1} AC={2}",r.Name,r.Value,r.AttributeCount);
Console.WriteLine();
if (r.HasAttributes)
{
for ( int i =0; i < r.AttributeCount; i++)
{
r.MoveToAttribute(i);
System.Console.WriteLine("Att {0}={1}",r.Name,r[i]);
}
}
}
}
}
Output
XmlDeclaration D=0 L=1 P=3 name=xml value=version="1.0" standalone="yes" AC=2
Att version=1.0
Att standalone=yes
DocumentType D=0 L=2 P=11 name=vijay value=<!ENTITY baby "No"> AC=1
Att SYSTEM=a.dtd
Element D=0 L=3 P=2 name=vijay value= AC=1
Att aa=no
Comment D=1 L=4 P=5 name= value=comment 2 AC=0
ProcessingInstruction D=1 L=4 P=19 name=sonal value=mukhi=no AC=0
Text D=1 L=4 P=36 name= value=
Hi AC=0
EntityReference D=1 L=5 P=4 name=baby value= AC=0
CDATA D=1 L=6 P=10 name= value=,mukhi> AC=0
Element D=1 L=6 P=21 name=aa value= AC=0
Text D=2 L=6 P=24 name= value=bb AC=0
EndElement D=1 L=6 P=28 name=aa value= AC=0
EndElement D=0 L=7 P=3 name=vijay value= AC=0
A property called
WhitespaceHandling is set to WhitespaceHandling.None, as a result of which,
the Whitespace nodes are not visible in the output.
The XmlTextReader has a member
HasAttributes, which returns True if the node has attributes and False
otherwise. Alternatively, we could also have checked whether the property
AttributeCount is greater than zero.
If the node has attributes, a
'for statement' is used to display all of them. In the loop, we first use the
function MoveToAttribute to make the attribute the current node. This is
achieved by passing the index of the attribute as a parameter to the function.
Bear in mind that the index starts from zero and not one.
Thereafter, the Name property is
used to display the name of the attribute. If the reader is not moved to the
attribute, the Name property displays the name of the node that owns it. This
explains the significance of the MoveToAttribute function.
The XmlTextReader class has an
indexer for the attributes, and like all indexers, it is zero based, i.e. r[0]
accesses the value of the first attribute. This is how we display the details
of all attributes of the node.
For the node DOCTYPE, SYSTEM
becomes the name of the attribute and its value is the name of the DTD
file. For an element, the attributes are specified in name-value pairs.
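The attributes can also be visited without an index. The sketch below is our own variation and not the program above: it relies on the MoveToFirstAttribute, MoveToNextAttribute and MoveToElement functions of the XmlTextReader, and the class name AttributeWalkSketch and the in-memory XML string are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class AttributeWalkSketch
{
    // Returns name=value; for each attribute of the first element,
    // followed by the node type the reader ends up on.
    public static string Walk(string xml)
    {
        StringBuilder sb = new StringBuilder();
        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        r.MoveToContent();                       // position on the first element
        if (r.MoveToFirstAttribute())            // false if there are no attributes
        {
            do
            {
                sb.Append(r.Name).Append('=').Append(r.Value).Append(';');
            }
            while (r.MoveToNextAttribute());
            r.MoveToElement();                   // step back to the owning element
        }
        sb.Append(r.NodeType);
        r.Close();
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Walk("<vijay mukhi=\"no\" sonal=\"yes\" aaa=\"bad\" />"));
    }
}
```

For the element shown, this displays mukhi=no;sonal=yes;aaa=bad;Element. The final word confirms that MoveToElement has brought the reader back from the last attribute to the tag itself.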
a.cs
using System;
using System.IO;
using System.Xml;
public class zzz
{
public static XmlTextReader r;
public static void Main()
{
r = new XmlTextReader("b.xml");
int declaration=0, pi=0, doc=0, comment=0, element=0, attribute=0, text=0, whitespace=0,cdata=0,endelement=0,
entityr=0,entitye=0,entity=0,swhitespace=0,notation=0;
while (r.Read())
{
Console.Write("{0} D={1} L={2} P={3}",r.NodeType,r.Depth,r.LineNumber,r.LinePosition);
Console.Write(" name={0} value={1} AC={2}",r.Name,r.Value,r.AttributeCount);
Console.WriteLine();
if (r.HasAttributes)
{
for ( int i =0; i < r.AttributeCount; i++)
{
r.MoveToAttribute(i);
System.Console.WriteLine("Att {0}={1}",r.Name,r[i]);
}
}
switch (r.NodeType)
{
case XmlNodeType.XmlDeclaration:
declaration++;
break;
case XmlNodeType.ProcessingInstruction:
pi++;
break;
case XmlNodeType.DocumentType:
doc++;
break;
case XmlNodeType.Comment:
comment++;
break;
case XmlNodeType.Element:
element++;
if (r.HasAttributes)
attribute += r.AttributeCount;
break;
case XmlNodeType.Text:
text++;
break;
case XmlNodeType.CDATA:
cdata++;
break;
case XmlNodeType.EndElement:
endelement++;
break;
case XmlNodeType.EntityReference:
entityr++;
break;
case XmlNodeType.EndEntity:
entitye++;
break;
case XmlNodeType.Notation:
notation++;
break;
case XmlNodeType.Entity:
entity++;
break;
case XmlNodeType.SignificantWhitespace:
swhitespace++;
break;
case XmlNodeType.Whitespace:
whitespace++;
break;
}
}
Console.WriteLine ();
Console.WriteLine("XmlDeclaration: {0}",declaration);
Console.WriteLine("ProcessingInstruction: {0}",pi);
Console.WriteLine("DocumentType: {0}",doc);
Console.WriteLine("Comment: {0}",comment);
Console.WriteLine("Element: {0}",element);
Console.WriteLine("Attribute: {0}",attribute);
Console.WriteLine("Text: {0}",text);
Console.WriteLine("Cdata: {0}",cdata);
Console.WriteLine("EndElement: {0}",endelement);
Console.WriteLine("Entity Reference: {0}",entityr);
Console.WriteLine("End Entity: {0}",entitye);
Console.WriteLine("Entity: {0}",entity);
Console.WriteLine("Whitespace: {0}",whitespace);
Console.WriteLine("Notation: {0}",notation);
Console.WriteLine("Significant Whitespace: {0}",swhitespace);
}
}
Output
XmlDeclaration D=0 L=1 P=3 name=xml value=version="1.0" standalone="yes" AC=2
Att version=1.0
Att standalone=yes
Whitespace D=0 L=1 P=39 name= value=
AC=0
DocumentType D=0 L=2 P=11 name=vijay value=<!ENTITY baby "No"> AC=1
Att SYSTEM=a.dtd
Whitespace D=0 L=2 P=54 name= value=
AC=0
Element D=0 L=3 P=2 name=vijay value= AC=1
Att aa=no
Whitespace D=1 L=3 P=16 name= value=
AC=0
Comment D=1 L=4 P=5 name= value=comment 2 AC=0
ProcessingInstruction D=1 L=4 P=19 name=sonal value=mukhi=no AC=0
Text D=1 L=4 P=36 name= value=
Hi AC=0
EntityReference D=1 L=5 P=4 name=baby value= AC=0
Whitespace D=1 L=5 P=9 name= value=
AC=0
CDATA D=1 L=6 P=10 name= value=,mukhi> AC=0
Element D=1 L=6 P=21 name=aa value= AC=0
Text D=2 L=6 P=24 name= value=bb AC=0
EndElement D=1 L=6 P=28 name=aa value= AC=0
Whitespace D=1 L=6 P=31 name= value=
AC=0
EndElement D=0 L=7 P=3 name=vijay value= AC=0
Whitespace D=0 L=7 P=9 name= value=
AC=0
XmlDeclaration: 0
ProcessingInstruction: 1
DocumentType: 0
Comment: 1
Element: 1
Attribute: 0
Text: 2
Cdata: 1
EndElement: 2
Entity Reference: 1
End Entity: 0
Entity: 0
Whitespace: 6
Notation: 0
Significant Whitespace: 0
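The above program is a
continuation from where we left off in the previous program. The initial
portion of the code is identical. A colossal case statement is introduced in
the program to check the NodeType.
For each node type, there is a
corresponding variable, whose value is incremented by 1 whenever the node type
matches. Then, the values contained in these variables are displayed. Notice
that the counts for XmlDeclaration and DocumentType are 0 and that Element
shows 1 instead of 2. The MoveToAttribute call in the 'for statement' leaves
the reader positioned on an attribute. So, for any node that has attributes,
the NodeType seen by the case statement is Attribute, for which there is no
case. Only the element aa, which has no attributes, gets counted. Also, the
NodeType property of the XmlTextReader does not return the following node
types - Document, DocumentFragment, Entity, EndEntity, or Notation.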
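A tally of this kind can be kept accurate by returning to the owning node before the NodeType is inspected. The following is a sketch under our own assumptions and not a rewrite of the program above: it uses a Dictionary in place of the individual variables, reads a small in-memory document without a DTD, and the names NodeTallySketch, Tally and Add are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public class NodeTallySketch
{
    // Adds n to the count kept for the given node type.
    static void Add(Dictionary<XmlNodeType, int> counts, XmlNodeType type, int n)
    {
        int current;
        counts.TryGetValue(type, out current);   // current stays 0 if the key is absent
        counts[type] = current + n;
    }

    public static Dictionary<XmlNodeType, int> Tally(string xml)
    {
        Dictionary<XmlNodeType, int> counts = new Dictionary<XmlNodeType, int>();
        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        r.WhitespaceHandling = WhitespaceHandling.None;
        while (r.Read())
        {
            for (int i = 0; i < r.AttributeCount; i++)
            {
                r.MoveToAttribute(i);            // NodeType is now Attribute
            }
            r.MoveToElement();                   // back to the node that owns them
            Add(counts, r.NodeType, 1);
            if (r.NodeType == XmlNodeType.Element)
                Add(counts, XmlNodeType.Attribute, r.AttributeCount);
        }
        r.Close();
        return counts;
    }

    public static void Main()
    {
        string xml = "<?xml version=\"1.0\"?><vijay aa=\"no\"><!--c--><aa>bb</aa></vijay>";
        foreach (KeyValuePair<XmlNodeType, int> p in Tally(xml))
            Console.WriteLine("{0}: {1}", p.Key, p.Value);
    }
}
```

With the MoveToElement call in place, the declaration and both elements are counted under their own node types, and the attribute count for elements is 1.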
a.cs
using System;
using System.IO;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextReader r = new XmlTextReader("b.xml");
r.WhitespaceHandling = WhitespaceHandling.None;
while (r.Read())
{
if (r.HasValue)
Console.WriteLine("{0} {1}={2}", r.NodeType, r.Name, r.Value);
else
Console.WriteLine("{0} {1}", r.NodeType, r.Name);
}
}
}
Output
XmlDeclaration xml=version="1.0" standalone="yes"
DocumentType vijay=<!ENTITY baby "No">
Element vijay
Comment =comment 2
ProcessingInstruction sonal=mukhi=no
Text =
Hi
EntityReference baby
CDATA =,mukhi>
Element aa
Text =bb
EndElement aa
EndElement vijay
The HasValue property simply
identifies whether a node can contain a value or not. There are nine node
types that can possess values. These are Attribute, CDATA, Comment,
DocumentType, ProcessingInstruction, SignificantWhitespace, Whitespace, Text
and XmlDeclaration. All the above nodes can have a value, but they need not
necessarily have a name.
a.cs
using System;
using System.IO;
using System.Xml;
public class zzz
{
public static void Main()
{
XmlTextReader r = new XmlTextReader("b.xml");
r.MoveToContent();
string s = r["mukhi"];
Console.WriteLine(s);
s = r.GetAttribute("sonal");
Console.WriteLine(s);
s = r[2];
Console.WriteLine(s);
}
}
b.xml
<vijay mukhi="no" sonal="yes" aaa="bad" />
Output
no
yes
bad
The MoveToContent function moves
to the first content node in the XML file, which here is the element vijay.
In this program, we read the attributes
using different methods. In the first approach, the indexer is passed a string,
which is the name of the attribute 'mukhi'. It receives 'no' as the return
value.
In the second approach, the
GetAttribute function is given the string 'sonal' as a parameter, and it
returns the value of the attribute, which is 'yes'.
In the third approach, the
indexer is passed the integer value 2 as a parameter, to access the value of
the third attribute, which is 'bad'. Thus, there are multiple means of
achieving the same objective.
a.cs
using System;
using System.IO;
using System.Xml;
public class zzz
{
public static void Main() {
XmlTextReader r = new XmlTextReader("b.xml");
r.MoveToContent();
string s ;
s = r.GetAttribute("aa:bb");
Console.WriteLine(s);
s = r.GetAttribute("bb");
Console.WriteLine(s);
s = r.GetAttribute("bb","sonal:mukhi");
Console.WriteLine(s);
s = r.GetAttribute("bb","sonal:mukhi");
Console.WriteLine(s);
s = r.GetAttribute("bb","aa");
Console.WriteLine(s);
s = r.GetAttribute("xmlns:aa");
Console.WriteLine(s);
}
}
b.xml
<vijay xmlns:aa="sonal:mukhi" aa:bb="no" />
Output
no
no
no
sonal:mukhi
The MoveToContent function is
used in this program, instead of the Read function. In the file b.xml, we have
an attribute bb with the prefix aa. It is initialized to a value of 'no'. The
prefix aa is bound to the namespace URI sonal:mukhi by the xmlns declaration.
Thus, the full name of the attribute becomes aa:bb i.e. prefix, followed by the
colon, followed by the local name. As a result, specifying aa:bb results in
the display of 'no', but only specifying bb as a parameter to GetAttribute
results in a null value. The WriteLine function prints an empty line for a
null string. These empty lines are not shown in the output above.
The full name of an attribute
includes the prefix too. So, we can use the second form of the GetAttribute
function that takes two parameters, where the second parameter is the
namespace URI and not the prefix. Hence, it is acceptable to call the function
with the URI sonal:mukhi, but if we use the prefix aa, null is returned.
The last GetAttribute utilizes
the full name xmlns:aa to retrieve the URI bound to the prefix aa. Thus, an
attribute in a namespace can be fetched either by its qualified name
prefix:name, or by its local name together with the namespace URI.
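The different parts of a qualified attribute name can be examined by moving the reader onto the attribute. The sketch below is our own illustration, assuming the two-parameter form of MoveToAttribute that takes a local name and a namespace URI. The class name QualifiedAttributeSketch and the in-memory XML string are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public class QualifiedAttributeSketch
{
    // Returns prefix|local name|namespace URI|value of the attribute bb
    // in the namespace sonal:mukhi, or "not found".
    public static string Describe(string xml)
    {
        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        r.MoveToContent();
        string result = "not found";
        // Locate the attribute by local name and namespace URI, not by prefix.
        if (r.MoveToAttribute("bb", "sonal:mukhi"))
            result = r.Prefix + "|" + r.LocalName + "|" + r.NamespaceURI + "|" + r.Value;
        r.Close();
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Describe("<vijay xmlns:aa=\"sonal:mukhi\" aa:bb=\"no\" />"));
    }
}
```

This displays aa|bb|sonal:mukhi|no, which shows the prefix, the local name and the URI as three separate properties of the same attribute.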
a.cs
using System;
using System.IO;
using System.Xml;
public class zzz {
public static void Main() {
XmlTextReader r = new XmlTextReader("b.xml");
r.WhitespaceHandling=WhitespaceHandling.None;
r.MoveToContent();
r.MoveToAttribute("cc");
Console.WriteLine(r.Name + " " + r.Value);
Console.WriteLine(r.ReadAttributeValue());
Console.WriteLine(r.Name + " " + r.Value);
}
}
b.xml
<vijay aa="hi" bb="bye" cc="no" />
Output
cc no
True
 no
In this example, we directly
focus on the attribute that we are interested in, i.e. cc. The Name and Value
properties in XmlTextReader display 'cc' and 'no' respectively. The
ReadAttributeValue function parses the value of the attribute into the Text
or EntityReference nodes that constitute it. As there is a node to be read in
the value of cc, the function returns True.
After the function ReadAttributeValue is called, the reader is positioned on the Text node within the attribute value. Hence, the Name property of the XmlTextReader becomes empty, while the Value property still displays 'no'.
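The ReadAttributeValue function is normally called in a loop, until it returns False. The following is a minimal sketch of such a loop. The class name AttributeValueSketch, the bracketed display format and the in-memory XML string are our own assumptions.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class AttributeValueSketch
{
    // Returns one line per node found within the value of the named attribute.
    public static string ReadValueNodes(string xml, string attribute)
    {
        StringBuilder sb = new StringBuilder();
        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        r.MoveToContent();
        if (r.MoveToAttribute(attribute))
        {
            // Each call moves to the next node within the attribute value.
            while (r.ReadAttributeValue())
            {
                sb.Append(r.NodeType).Append(" [").Append(r.Name).Append("] [");
                sb.Append(r.Value).Append("]\n");
            }
        }
        r.Close();
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(ReadValueNodes("<vijay aa=\"hi\" bb=\"bye\" cc=\"no\" />", "cc"));
    }
}
```

For the attribute cc, a single line Text [] [no] is displayed: one Text node, with an empty name and the value 'no'.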