8
WML Bytecodes
This may prove to be the most interesting chapter that you have read so far. Many programmers today (usually referred to as coders, since all they do is write code without understanding it) are not really aware that "what you write is not what gets stored" (WYWINWGS). It is understandable if you have never heard this term before, because it is used only by the real top-shot programmers, and producing such programmers is the purpose of this book and the ones that follow.
WYWINWGS has been around ever since English-like programming languages became popular. Who had the time to sit and explain to some dumb bunny that ThisNewSuperFunctionThatYouDefined would take too many characters to store, so it was just shortened to a couple of bytes: BAh! And whenever it was used again, the program would look up a chart and figure out that you wanted something called BAh!
What we look at here are the bytecodes of a few of the programs that we have written, and what they actually translate to.
No, you are not expected to write programs this way, but understanding how a program is stored will give you a better understanding of how the program works.
In our first example, we have not tried anything too fancy. All we want to do is introduce you to what identifies the program file. Every saved file has a header that denotes the type of file stored. It identifies the program that created it and hence the type of information that follows the header.
We have referred to the
specifications available on the wapforum site: www.wapforum.org.
w1.wml
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card>
</card>
</wml>
Compile this file and check the size of the .wmlc file. The dir command will display 7 bytes. Using Hexshop, one of the well-known hex editors, we found that the bytes in w1.wmlc were as follows.
01 04 6A 00 7F 27 01
A binary WML document contains elements. Each element may have zero or more attributes, and it can also have content of its own. For example, card has id, title and many more attributes, and it encloses other tags. A p tag can have the align and mode attributes and can contain text.
All wmlc files begin with a version number. Version 1.1 is encoded as 0x01. The version byte holds the major version minus 1 in the upper four bits and the minor version, as is, in the lower four bits.
Ver 1.1 = 0x01
major - 1 = (1-1) = 0 | minor = 1
0 0 0 0 | 0 0 0 1
8 4 2 1 | 8 4 2 1
Ver 2.7 = 0x17
major - 1 = (2-1) = 1 | minor = 7
0 0 0 1 | 0 1 1 1
8 4 2 1 | 8 4 2 1
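The version rule above can be checked with a few lines of Python. This is only a sketch of the rule as described in this chapter; the function names decode_version and encode_version are ours, not part of any WAP toolkit.

```python
def decode_version(byte):
    """Split a version byte into (major, minor)."""
    major = (byte >> 4) + 1   # upper four bits store major - 1
    minor = byte & 0x0F       # lower four bits store the minor version as is
    return major, minor


def encode_version(major, minor):
    """Pack (major, minor) back into a single version byte."""
    return ((major - 1) << 4) | minor


print(decode_version(0x01))
print("%02X" % encode_version(2, 7))
```

Running it on 0x01 and on version 2.7 reproduces the two tables above.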
04
The next byte represents the document public identifier. 4 is the value given to "-//WAPFORUM//DTD WML 1.1//EN".
6A
A binary XML document contains a representation of the XML document's character encoding. The default charset is UTF-8, i.e. 6A. A value of zero indicates an unknown document encoding.
00
A binary XML/WML document must include a string table immediately after the charset. The table starts with its length in bytes, which does not count the length byte itself. If the length is zero, no strings follow it.
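The four header fields described so far can be pulled apart as follows. This is a minimal sketch that assumes every field fits in one byte, which holds for all the files in this chapter; the specification allows some of these fields to span several bytes, and that case is not handled here. The names parse_header, PUBLIC_IDS and CHARSETS are hypothetical, and the lookup tables contain only the values mentioned in the text.

```python
# Only the values discussed in this chapter; the real tables are larger.
PUBLIC_IDS = {0x04: "-//WAPFORUM//DTD WML 1.1//EN"}
CHARSETS = {0x00: "unknown", 0x6A: "UTF-8"}


def parse_header(data):
    """Split a .wmlc file into its header fields and the token body.

    Assumption: version, public id, charset and string table length
    each occupy exactly one byte.
    """
    version, public_id, charset, table_len = data[0], data[1], data[2], data[3]
    body_start = 4 + table_len
    return {
        "version": "%d.%d" % ((version >> 4) + 1, version & 0x0F),
        "public_id": PUBLIC_IDS.get(public_id, hex(public_id)),
        "charset": CHARSETS.get(charset, hex(charset)),
        "string_table": data[4:body_start],
        "body": data[body_start:],
    }


w1 = bytes.fromhex("01 04 6A 00 7F 27 01")
for key, value in parse_header(w1).items():
    print(key, "=", value)
```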
Tags are stored as tokens, and tokens are split into a set of overlapping code spaces. Each code space is further split into a series of 256 code pages.
Within the tag byte:
7th bit indicates whether attributes follow the tag code. If the bit is 0, the tag has no attributes. If it is 1, the tag is followed by one or more attributes.
6th bit indicates whether the element has content. If it is 0, there is no content and no end tag either. If it is 1, the tag is followed by content and is terminated by the end tag.
Bits 5-0 indicate the tag.
Attribute | Content | 5-0 - tags
7 | 6 | 5 4 3 2 1 0
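The bit layout above translates directly into three mask operations. A short sketch in Python; the function name split_tag is ours.

```python
def split_tag(byte):
    """Return (has_attributes, has_content, tag_code) for a tag byte."""
    has_attributes = bool(byte & 0x80)  # 7th bit
    has_content = bool(byte & 0x40)     # 6th bit
    tag_code = byte & 0x3F              # bits 5-0
    return has_attributes, has_content, tag_code


for byte in (0x7F, 0x27, 0xE7):
    attrs, content, code = split_tag(byte)
    print("%02X -> attributes=%s content=%s tag=%02X" % (byte, attrs, content, code))
```

The three sample bytes are the ones that appear in the listings of this chapter: wml with content, an empty card, and a card with attributes and content.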
The bytecodes
01 Version 1.1
04 DTD type
6A UTF-8 charset
00 String table length
7F 3F wml
bits:    0 1 1 1 | 1 1 1 1
weights: 8 4 2 1 | 8 4 2 1
Attr = 0, Cont = 1, tag = 3F
27 27 card - there is no end tag for card as the 6th bit is 0
bits:    0 0 1 0 | 0 1 1 1
weights: 8 4 2 1 | 8 4 2 1
Attr = 0, Cont = 0, tag = 27
01 end of wml
bits:    0 0 0 0 | 0 0 0 1
weights: 8 4 2 1 | 8 4 2 1
Attr = 0, Cont = 0, code = 01
Let's take the next example, where card has an attribute. Card contains <p> as its content, and the p element encloses the text bye.
w2.wml
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card title="hi">
<p>
bye
</p>
</card>
</wml>
01 Version 1.1
04 DTD type
6A Utf8
00 No strings
7F 3F wml
bits:    0 1 1 1 | 1 1 1 1
weights: 8 4 2 1 | 8 4 2 1
Attr = 0, Cont = 1, tag = 3F
E7 27 card has attributes and encloses p.
bits:    1 1 1 0 | 0 1 1 1
weights: 8 4 2 1 | 8 4 2 1
Attr = 1, Cont = 1, tag = 27
36 title
03 string
68 h
69 i
00 0
01 end - title
60 20 p
bits:    0 1 1 0 | 0 0 0 0
weights: 8 4 2 1 | 8 4 2 1
Attr = 0, Cont = 1, tag = 20
03 string
20 space
62 b
79 y
65 e
20 space
00 0
01 end - p
01 end - card
01 end - wml
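Every element that has content is closed by its own end byte, 01. The same byte also closes an attribute list, which is why 01 follows the title string. Also notice the byte 03, which indicates that string data follows. Every string is terminated by a 00 byte; the 20 bytes around bye are the spaces left over from the line breaks in the source.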
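To tie the pieces together, here is a small disassembler that turns the bytes of w1 and w2 back into markup. It is a sketch, not a full decoder: it assumes a four-byte header with an empty string table, knows only the tag and attribute codes that appear in this chapter, and ignores code pages. The names END and STR_I follow the binary XML specification; every other name is ours.

```python
END, STR_I = 0x01, 0x03

# Codes taken from the listings in this chapter only.
TAGS = {0x3F: "wml", 0x27: "card", 0x20: "p", 0x24: "b",
        0x26: "br", 0x2D: "i", 0x3D: "u"}
ATTRS = {0x36: ("title", ""), 0x07: ("align", "center")}


def read_string(data, pos):
    """Read an inline string: everything up to the terminating 00."""
    end = data.index(0x00, pos)
    return data[pos:end].decode("utf-8"), end + 1


def read_attributes(data, pos):
    """Read attribute tokens until the END byte that closes the list."""
    attrs = []
    while data[pos] != END:
        token = data[pos]
        pos += 1
        if token == STR_I:               # string value for the last attribute
            text, pos = read_string(data, pos)
            attrs[-1][1] += text
        else:                            # a new attribute starts
            name, value = ATTRS[token]
            attrs.append([name, value])
    return attrs, pos + 1                # skip the END byte


def disassemble(data):
    pos = 4 + data[3]                    # skip header and string table
    out, open_tags = [], []
    while pos < len(data):
        byte = data[pos]
        pos += 1
        if byte == END:
            out.append("</%s>" % open_tags.pop())
        elif byte == STR_I:
            text, pos = read_string(data, pos)
            out.append(text)
        else:
            name = TAGS[byte & 0x3F]
            attrs = []
            if byte & 0x80:              # attributes follow
                attrs, pos = read_attributes(data, pos)
            attr_text = "".join(' %s="%s"' % (n, v) for n, v in attrs)
            if byte & 0x40:              # content follows, END closes it
                out.append("<%s%s>" % (name, attr_text))
                open_tags.append(name)
            else:
                out.append("<%s%s/>" % (name, attr_text))
    return "".join(out)


W2 = bytes.fromhex("01 04 6A 00 7F E7 36 03 68 69 00 01 "
                   "60 03 20 62 79 65 20 00 01 01 01")
print(disassemble(W2))
```

Note how the same 01 byte is handled in two places: once inside read_attributes, where it closes the attribute list, and once in the main loop, where it closes the innermost open element.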
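Here we have given two strings within <p>. The listing shows that they are stored as a single string, with the line breaks replaced by spaces.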
w3.wml
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card title="hi">
<p>
bye
good
</p>
</card>
</wml>
01 Version 1.1
04 DTD type
6A UTF-8
00 string table
7F 3F wml
E7 27 card
36 title
03 string
68 h
69 i
00 0
01 end title
60 20 p
03 string
20 space
62 b
79 y
65 e
20 space
67 g
6F o
6F o
64 d
20 space
00 null
01 end p
01 end card
01 end wml
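The listing suggests that the compiler turns each run of whitespace in the text into one space before storing it. The sketch below reproduces the string bytes of this example under that assumption. The helper name inline_string is ours, and a real compiler may treat whitespace differently.

```python
import re


def inline_string(text):
    """Encode text as 03, the whitespace-collapsed characters, then 00.

    Assumption: every run of spaces and line breaks becomes one space.
    """
    collapsed = re.sub(r"\s+", " ", text)
    return bytes([0x03]) + collapsed.encode("utf-8") + b"\x00"


content = "\nbye\ngood\n"          # the text between <p> and </p> in w3.wml
encoded = inline_string(content)
print(" ".join("%02X" % b for b in encoded))
```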
w4.wml
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card title="hi">
<p>
<b>bye</b> <br/>
good
</p>
</card>
</wml>
The byte codes for the wml
file are as follows. We have introduced b for the bold tag and br for line
break.
01 version no
04 dtd type
6A utf8
00 string table
7F 3F wml
E7 27 card
36 title
03 string
68 h
69 i
00 0
01 end title
60 20 p
bits:    0 1 1 0 | 0 0 0 0
weights: 8 4 2 1 | 8 4 2 1
Attr = 0, Cont = 1, tag = 20
64 24 b
bits:    0 1 1 0 | 0 1 0 0
weights: 8 4 2 1 | 8 4 2 1
Attr = 0, Cont = 1, tag = 24
03 string
62 b
79 y
65 e
00 0
01 end - b
26 br
03 string
20 space
67 g
6F o
6F o
64 d
20 space
00 0
01 end p
01 end card
01 end wml
In the next program, we
have replaced b with i. This is the only change made here.
w5.wml
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card title="hi">
<p>
<i>bye</i> <br/>
good
</p>
</card>
</wml>
01 version no
04 dtd type
6A utf8
00 string table
7F 3F wml
E7 27 card
36 title
03 string
68 h
69 i
00 0
01 end title
60 20 p
6D 2D i
bits:    0 1 1 0 | 1 1 0 1
weights: 8 4 2 1 | 8 4 2 1
Attr = 0, Cont = 1, tag = 2D
03 string
62 b
79 y
65 e
00 0
01 end i
26 br
03 string
20 space
67 g
6F o
6F o
64 d
20 space
00 0
01 end p
01 end card
01 end wml
The following file shows you the bytecodes for u, which stands for underline.
w6.wml
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card title="hi">
<p>
<u>bye</u> <br/>
good
</p>
</card>
</wml>
01 major , minor version
04 dtd type
6A utf8
00 string table
7F 3F wml
E7 27 card
36 title
03 string
68 h
69 i
00 0
01 end title
60 20 - p
7D 3D - u
bits:    0 1 1 1 | 1 1 0 1
weights: 8 4 2 1 | 8 4 2 1
Attr = 0, Cont = 1, tag = 3D
03 string
62 b
79 y
65 e
00 0
01 end u
26 br
03 string
20 space
67 g
6F o
6F o
64 d
20 space
00 0
01 end p
01 end card
01 end wml
Similarly, if you replace the u tag with other tags, the bytecode changes accordingly.
em 69 - 29
strong 79 - 39
small 78 - 38
The actual code for em is 29 and not 69. As we have seen before, the content bit is set when the tag has content. Hence 29 becomes 69.
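The step from 29 to 69 is a single OR operation. A small Python sketch, using only the three codes listed above; the function name tag_byte is ours.

```python
TAG_CODES = {"em": 0x29, "strong": 0x39, "small": 0x38}


def tag_byte(code, has_attributes=False, has_content=False):
    """Combine a tag code with the attribute and content bits."""
    byte = code & 0x3F
    if has_attributes:
        byte |= 0x80    # 7th bit: attributes follow
    if has_content:
        byte |= 0x40    # 6th bit: content follows
    return byte


for name, code in TAG_CODES.items():
    print("%-6s %02X - %02X" % (name, tag_byte(code, has_content=True), code))
```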
w13.wml
<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card title="hi">
<p align="center">
hi
</p>
</card>
</wml>
01 major, minor version
04 dtd type
6A utf8
00 string table
7F 3F wml
E7 27 card
36 title
03 string
68 h
69 i
00 0
01 end title
E0 20 p
bits:    1 1 1 0 | 0 0 0 0
weights: 8 4 2 1 | 8 4 2 1
Attr = 1, Cont = 1, tag = 20
07 align=center
01 end align
03 string
20 space
68 h
69 i
20 space
00 0
01 end p
01 end card
01 end wml
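Going the other way, the listing above can be rebuilt from its parts. The following Python sketch is not how the actual compiler works internally; it is a hand-written encoder that uses only the codes seen in this chapter and hard-codes the four header bytes. All function and constant names are ours, except END and STR_I, which follow the binary XML specification.

```python
END, STR_I = 0x01, 0x03
HEADER = bytes([0x01, 0x04, 0x6A, 0x00])   # version, dtd, charset, string table
WML, CARD, P = 0x3F, 0x27, 0x20            # tag codes
TITLE, ALIGN_CENTER = 0x36, 0x07           # attribute codes


def text(s):
    """An inline string: 03, the characters, then the terminating 00."""
    return bytes([STR_I]) + s.encode("utf-8") + b"\x00"


def element(code, attributes=b"", content=b""):
    """Encode one element, setting the 7th and 6th bits as needed."""
    byte = code
    if attributes:
        byte |= 0x80
    if content:
        byte |= 0x40
    out = bytes([byte])
    if attributes:
        out += attributes + bytes([END])   # END closes the attribute list
    if content:
        out += content + bytes([END])      # END closes the element
    return out


p = element(P, attributes=bytes([ALIGN_CENTER]), content=text(" hi "))
card = element(CARD, attributes=bytes([TITLE]) + text("hi"), content=p)
w13 = HEADER + element(WML, content=card)
print(" ".join("%02X" % b for b in w13))
```

The printed bytes can be compared one by one with the listing for w13.wml.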
This is the last program in this series and in this section. We could have continued further but decided to stop here. You are now familiar with what Compile does. The Virtual Machine in the micro browser has to interpret these bytes and act accordingly. You can visit the WAP Forum site, www.wapforum.org, and download the technical specification to guide you further.