The IL Disassembler
This book explains the internal workings of a disassembler. The programs given in the book produce output similar to that of the disassembler written by Microsoft, i.e. ildasm. The only difference is that the source code of ildasm is not available. Our main objective in this book is to write a series of programs which ultimately build up an understanding of the disassembler in a simple form. The final program has been tested against 5000 .Net files.
Without getting into any more discussion, let's start with the disassembler right away. The output produced by our program will be compared with that of ildasm at every stage. This is to verify the results and keep us on the right track.
a.cs
public class zzz
{
public static void Main()
{
}
}
>ildasm /all /out:a.txt a.exe
Program a.cs is the smallest C# program which, on compiling, gives the smallest .Net executable, a.exe. If you fail to understand the above C# program or have forgotten how to compile a C# program, we request you to stop reading this book now. This book assumes that you know nothing about a disassembler, but you must have a basic understanding of the C# programming language.
Once the executable is created, proceed to write the first program in the disassembler series.
Program1.cs
using System;
using System.IO;
public class zzz
{
    int[] datadirectoryrva;
    int[] datadirectorysize;
    int subsystem;
    int stackreserve;
    int stackcommit;
    int datad;
    int sectiona;
    int filea;
    int entrypoint;
    int ImageBase;
    FileStream mfilestream;
    BinaryReader mbinaryreader;
    long sectionoffset;
    short sections;
    string filename;
    int[] SVirtualAddress;
    int[] SSizeOfRawData;
    int[] SPointerToRawData;
    public static void Main(string[] args)
    {
        try
        {
            zzz a = new zzz();
            a.abc(args);
        }
        catch (Exception e)
        {
            Console.WriteLine(e.ToString());
        }
    }
    public void abc(string[] args)
    {
        ReadPEStructures(args);
        DisplayPEStructures();
    }
    public void ReadPEStructures(string[] args)
    {
        filename = args[0];
        mfilestream = new FileStream(filename, FileMode.Open);
        mbinaryreader = new BinaryReader(mfilestream);
        mfilestream.Seek(60, SeekOrigin.Begin);
        int startofpeheader = mbinaryreader.ReadInt32();
        mfilestream.Seek(startofpeheader, SeekOrigin.Begin);
        byte sig1, sig2, sig3, sig4;
        sig1 = mbinaryreader.ReadByte();
        sig2 = mbinaryreader.ReadByte();
        sig3 = mbinaryreader.ReadByte();
        sig4 = mbinaryreader.ReadByte();
        // First structure: the standard COFF header
        short machine = mbinaryreader.ReadInt16();
        sections = mbinaryreader.ReadInt16();
        int time = mbinaryreader.ReadInt32();
        int pointer = mbinaryreader.ReadInt32();
        int symbols = mbinaryreader.ReadInt32();
        int headersize = mbinaryreader.ReadInt16();
        int characteristics = mbinaryreader.ReadInt16();
        sectionoffset = mfilestream.Position + headersize;
        // Second structure: the image optional header
        int magic = mbinaryreader.ReadInt16();
        int major = mbinaryreader.ReadByte();
        int minor = mbinaryreader.ReadByte();
        int sizeofcode = mbinaryreader.ReadInt32();
        int sizeofdata = mbinaryreader.ReadInt32();
        int sizeofudata = mbinaryreader.ReadInt32();
        entrypoint = mbinaryreader.ReadInt32();
        int baseofcode = mbinaryreader.ReadInt32();
        int baseofdata = mbinaryreader.ReadInt32();
        ImageBase = mbinaryreader.ReadInt32();
        sectiona = mbinaryreader.ReadInt32();
        filea = mbinaryreader.ReadInt32();
        int majoros = mbinaryreader.ReadInt16();
        int minoros = mbinaryreader.ReadInt16();
        int majorimage = mbinaryreader.ReadInt16();
        int minorimage = mbinaryreader.ReadInt16();
        int majorsubsystem = mbinaryreader.ReadInt16();
        int minorsubsystem = mbinaryreader.ReadInt16();
        int version = mbinaryreader.ReadInt32();
        int imagesize = mbinaryreader.ReadInt32();
        int sizeofheaders = mbinaryreader.ReadInt32();
        int checksum = mbinaryreader.ReadInt32();
        subsystem = mbinaryreader.ReadInt16();
        int dllflags = mbinaryreader.ReadInt16();
        stackreserve = mbinaryreader.ReadInt32();
        stackcommit = mbinaryreader.ReadInt32();
        int heapreserve = mbinaryreader.ReadInt32();
        int heapcommit = mbinaryreader.ReadInt32();
        int loader = mbinaryreader.ReadInt32();
        datad = mbinaryreader.ReadInt32();
        datadirectoryrva = new int[16];
        datadirectorysize = new int[16];
        for (int i = 0; i <= 15; i++)
        {
            datadirectoryrva[i] = mbinaryreader.ReadInt32();
            datadirectorysize[i] = mbinaryreader.ReadInt32();
        }
        if (datadirectorysize[14] == 0)
            throw new System.Exception("Not a valid CLR file");
        mfilestream.Position = sectionoffset;
        SVirtualAddress = new int[sections];
        SSizeOfRawData = new int[sections];
        SPointerToRawData = new int[sections];
        for (int i = 0; i < sections; i++)
        {
            mbinaryreader.ReadBytes(12);
            SVirtualAddress[i] = mbinaryreader.ReadInt32();
            SSizeOfRawData[i] = mbinaryreader.ReadInt32();
            SPointerToRawData[i] = mbinaryreader.ReadInt32();
            mbinaryreader.ReadBytes(16);
        }
    }
    public void DisplayPEStructures()
    {
        Console.WriteLine();
        Console.WriteLine("// Microsoft (R) .NET Framework IL Disassembler. Version 1.0.3328.4");
        Console.WriteLine("// Copyright (C) Microsoft Corporation 1998-2001. All rights reserved.");
        Console.WriteLine();
        Console.WriteLine("// PE Header:");
        Console.WriteLine("// Subsystem: {0}", subsystem.ToString("x8"));
        Console.WriteLine("// Native entry point address: {0}", entrypoint.ToString("x8"));
        Console.WriteLine("// Image base: {0}", ImageBase.ToString("x8"));
        Console.WriteLine("// Section alignment: {0}", sectiona.ToString("x8"));
        Console.WriteLine("// File alignment: {0}", filea.ToString("x8"));
        Console.WriteLine("// Stack reserve size: {0}", stackreserve.ToString("x8"));
        Console.WriteLine("// Stack commit size: {0}", stackcommit.ToString("x8"));
        Console.WriteLine("// Directories: {0}", datad.ToString("x8"));
        DisplayDataDirectory(datadirectoryrva[0], datadirectorysize[0], "Export Directory");
        DisplayDataDirectory(datadirectoryrva[1], datadirectorysize[1], "Import Directory");
        DisplayDataDirectory(datadirectoryrva[2], datadirectorysize[2], "Resource Directory");
        DisplayDataDirectory(datadirectoryrva[3], datadirectorysize[3], "Exception Directory");
        DisplayDataDirectory(datadirectoryrva[4], datadirectorysize[4], "Security Directory");
        DisplayDataDirectory(datadirectoryrva[5], datadirectorysize[5], "Base Relocation Table");
        DisplayDataDirectory(datadirectoryrva[6], datadirectorysize[6], "Debug Directory");
        DisplayDataDirectory(datadirectoryrva[7], datadirectorysize[7], "Architecture Specific");
        DisplayDataDirectory(datadirectoryrva[8], datadirectorysize[8], "Global Pointer");
        DisplayDataDirectory(datadirectoryrva[9], datadirectorysize[9], "TLS Directory");
        DisplayDataDirectory(datadirectoryrva[10], datadirectorysize[10], "Load Config Directory");
        DisplayDataDirectory(datadirectoryrva[11], datadirectorysize[11], "Bound Import Directory");
        DisplayDataDirectory(datadirectoryrva[12], datadirectorysize[12], "Import Address Table");
        DisplayDataDirectory(datadirectoryrva[13], datadirectorysize[13], "Delay Load IAT");
        DisplayDataDirectory(datadirectoryrva[14], datadirectorysize[14], "CLR Header");
        Console.WriteLine();
    }
    public void DisplayDataDirectory(int rva, int size, string ss)
    {
        string sfinal = "";
        sfinal = String.Format("// {0:x}", rva);
        sfinal = sfinal.PadRight(12);
        sfinal = sfinal + String.Format("[{0:x}", size);
        sfinal = sfinal.PadRight(21);
        sfinal = sfinal + String.Format("] address [size] of {0}:", ss);
        if (ss == "CLR Header")
            sfinal = sfinal.PadRight(67);
        else
            sfinal = sfinal.PadRight(68);
        Console.WriteLine(sfinal);
    }
}
On compiling the above
program, program1.exe is generated. Now run the executable as
>Program1 a.exe
This command gives the
following output.
Output
// Microsoft (R) .NET Framework IL Disassembler. Version 1.0.3328.4
// Copyright (C) Microsoft Corporation 1998-2001. All rights reserved.
// PE Header:
// Subsystem: 00000003
// Native entry point address: 0000227e
// Image base: 00400000
// Section alignment: 00002000
// File alignment: 00000200
// Stack reserve size: 00100000
// Stack commit size: 00001000
// Directories: 00000010
// 0        [0       ] address [size] of Export Directory:
// 2228     [53      ] address [size] of Import Directory:
// 4000     [318     ] address [size] of Resource Directory:
// 0        [0       ] address [size] of Exception Directory:
// 0        [0       ] address [size] of Security Directory:
// 6000     [c       ] address [size] of Base Relocation Table:
// 0        [0       ] address [size] of Debug Directory:
// 0        [0       ] address [size] of Architecture Specific:
// 0        [0       ] address [size] of Global Pointer:
// 0        [0       ] address [size] of TLS Directory:
// 0        [0       ] address [size] of Load Config Directory:
// 0        [0       ] address [size] of Bound Import Directory:
// 2000     [8       ] address [size] of Import Address Table:
// 0        [0       ] address [size] of Delay Load IAT:
// 2008     [48      ] address [size] of CLR Header:
Since time immemorial, the first function to be called is Main. In this function, to begin with, an instance of class zzz is created and then a non-static function abc is called on it. The only reason for placing the bulk of our code in the abc function is that the Main function is static. It cannot access instance variables until an instance of its class is created.
We promise that this is the first and last time in this book that we will use names like zzz and a. Henceforth we will abide by meaningful names for variables and objects. Another simple rule that we have adhered to is that if a variable is to be used by another function, it is made an instance variable. A global in the C# world is a no-no, though the C++ world allows it. Therefore at times the names may sound legally wrong, but they are morally right.
The abc function is given an array of strings that holds the arguments passed to the program. In our case, it is the name of the .Net executable that is to be disassembled. While writing code, there are possibilities of making errors. A dialog box pops up each time an unhandled exception is encountered, which at times gets extremely irritating. For this reason, the code in Main is enshrined within a try-catch that simply displays the exceptions.
Now to understand the functioning of abc. The array element args[0] contains the name of the file to be disassembled, which is saved in an instance variable, filename.
The .Net world has a million classes to handle files, of which we have presently used only two. The first one is the FileStream class. The constructor of this class simply takes two parameters, the filename and a FileMode enum. The enum specifies how the file should be opened. It takes values which decide whether the file is to be opened, created or overwritten. In the good old days of C, numbers or strings were used for such discrete values; however, the modern world of today prefers enums instead. If you honestly ask us, we would prefer the old days anytime, but we all have to move ahead with time, embrace the new and forget the old ways.
Since the file is to be opened, the value Open from the enum is used. An exception is thrown if the file does not exist. The handle to the file is stored in an instance variable suitably named mfilestream. The only problem with the FileStream class is that, other than opening a file, it does very little for us. It has a few rudimentary functions that enable reading bytes from a file. However, they are of no use to us since our interest lies in reading a short or an int or a string from the file. Therefore another class, BinaryReader, which permits reading primitive types like shorts, ints and longs from the file, is used. The constructor of this class requires the mfilestream handle. It is the BinaryReader class, and not the FileStream class, that will be used to access the file.
The file format used by any Windows application is called the PE or Portable Executable file format. Before Windows evolved to become the big daddy of operating systems, the earlier king of the hill was DOS. Each and every executable file started with the two bytes M and Z. This is how the DOS operating system would recognize an executable file. The advent of Windows did not in any sense change the mindset of people, and they did not always acknowledge the difference between the two operating systems. Very often a Windows program was executed in the DOS environment.
DOS, being a primitive operating system, checks the first two bytes and, on not seeing the magic numbers M and Z, displays a confusing message 'Bad Command or File Name'. This led to some confusion, so as a conscious decision the makers of the PE file format mandated that every PE file would start with a valid DOS header. This header was then followed by a program that printed a sensible error message if the file was executed in the DOS environment. The DOS box of Windows is a simulation of the original DOS.
The offset of the actual PE header is stored at byte 60 of the file. This location holds an int, thus the four bytes starting there, taken together, indicate the start of the PE header. This offset is not a fixed value, as different compilers decide on the error message for the DOS program and thus change its length. Using the Seek method of the FileStream class, the file pointer is positioned at the 60th byte in the file. The second parameter of the Seek function is an enum that takes three values. These values decide whether the number specified in the first parameter is an absolute offset from the beginning or end of the file or a relative offset from the file pointer.
The file pointer is an imaginary construct that points to the next byte to be read. The offset is stored in a variable startofpeheader, and its value normally is 128. As mentioned earlier, this value can vary depending upon the compiler used. The Seek method is used again to jump to the start of the PE header. The ReadByte method of the BinaryReader class is then used to read each byte. The magic number for a PE header is P and E followed by two zero bytes.
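The two seeks and the signature check can be sketched as a small standalone helper. IsPEFile is our own hypothetical name; the offset 60 and the byte values come from the PE format itself.

```csharp
using System;
using System.IO;

public class PESignatureCheck
{
    public static bool IsPEFile(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (BinaryReader br = new BinaryReader(fs))
        {
            // Offset 60 (0x3c) in the DOS header holds the start of the PE header
            fs.Seek(60, SeekOrigin.Begin);
            int startofpeheader = br.ReadInt32();
            fs.Seek(startofpeheader, SeekOrigin.Begin);
            // A valid PE file has 'P', 'E', 0, 0 at this position
            return br.ReadByte() == (byte)'P' && br.ReadByte() == (byte)'E'
                && br.ReadByte() == 0 && br.ReadByte() == 0;
        }
    }
}
```

Program1 below reads these four bytes but, like ildasm, does not bother to validate them; a defensive version would.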
This magic number is
followed by a structure called the standard COFF header. COFF is the Common
Object File Format. The first two bytes or short is the machine or better still
the CPU type that this executable or image file can run on. An executable can
either run on the specified machine or a system that emulates it. The PE
specifications are available on the Microsoft site which specifies all possible
values that the various structures can have, hence we will not irk you with
these details.
In our case, the hex value read is 0x14c, which stands for an Intel 32 bit machine. This value has not been displayed in the output for the simple reason that ildasm does not display it, and we have decided to follow the ildasm program to a T. The value is stored in a local variable called machine; it is not an instance variable. The method ReadInt16 of the BinaryReader class is used to read a short, or two bytes, from the file. Using the BinaryReader class saves us the hassle of reading individual bytes and then combining them ourselves.
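As a sketch of what ReadInt16 spares us, the two bytes are stored on disk low byte first (little-endian) and would otherwise have to be shifted together by hand. ToInt16 is a hypothetical helper name of our own.

```csharp
using System;

public class LittleEndianDemo
{
    // Combine two bytes into a short, low byte first, as ReadInt16 does internally
    public static short ToInt16(byte low, byte high)
    {
        return (short)(low | (high << 8));
    }
    public static void Main()
    {
        // The bytes 0x4c 0x01 on disk give the machine value 0x14c (Intel 386)
        Console.WriteLine(ToInt16(0x4c, 0x01).ToString("x")); // 14c
    }
}
```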
The second field is the number of sections in the PE file. A PE file contains different types of entities like code, data, resources etc. Each entity or section needs to be stored in a different part of the PE file, and structures are used to keep track of all of them. The next short gives the number of sections, and the value received for our file is three. Some time later, the sections will have to be read into structures, hence the variable sections is an instance variable. This is followed by the date time stamp, which gives information on when this file was created. The method ReadInt32 is used to extract this 4 byte value.
This is followed by a 4 byte entity that is a pointer or offset to the symbol table. The next int is the number of symbols available. The value of the pointer to the symbol table is zero, which means an absence of the symbol table. Symbol tables are present only in obj or object files. In the good old days the compilers created an obj file and linkers created an exe file from obj files. In the .Net world obj files are obsolete, hence these two ints are always zero.
After the first header is another header called the image optional header. This header is never seen in obj files and its size can vary, but so far it has been a constant 224 bytes.
Then comes a field called
characteristics, which specifies the attributes of the file. The value received
is 0x10e.
Bit diagram
Individual bits in a byte carry different pieces of information. The value 0xe, or 14, has a bit pattern wherein the 2nd, 3rd and 4th bits are on.
Bit Diagram
This signifies that the file is a valid executable (bit 2), that there are no COFF line numbers present in the file, or that they have been stripped off (bit 3), and that the symbol table entries are also absent (bit 4). A value of 0x100 signifies that the machine running the executable is based on a 32 bit architecture. This field, which is the last member of the structure, is not displayed by the ildasm utility.
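The bit testing described above can be sketched as follows. The constant names are our own; the numeric values are the characteristics flags of the PE format.

```csharp
using System;

public class CharacteristicsDemo
{
    const int ExecutableImage   = 0x0002; // file is a valid executable
    const int LineNumsStripped  = 0x0004; // COFF line numbers absent
    const int LocalSymsStripped = 0x0008; // symbol table entries absent
    const int Machine32Bit      = 0x0100; // 32 bit architecture
    const int Dll               = 0x2000; // file is a DLL

    public static void Main()
    {
        int characteristics = 0x10e; // value read from a.exe
        Console.WriteLine("Executable:      {0}", (characteristics & ExecutableImage) != 0);
        Console.WriteLine("No line numbers: {0}", (characteristics & LineNumsStripped) != 0);
        Console.WriteLine("No symbol table: {0}", (characteristics & LocalSymsStripped) != 0);
        Console.WriteLine("32 bit machine:  {0}", (characteristics & Machine32Bit) != 0);
        Console.WriteLine("Is a DLL:        {0}", (characteristics & Dll) != 0);
    }
}
```

For 0x10e the first four lines print True and the last prints False.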
The section table begins immediately after the image optional header, i.e. at the start of the optional header plus the size of the optional header. The variable sectionoffset has been used to store this value so that it can be used to jump to the section table as and when required.
The first field of the optional header is a short, which represents the magic number. It can take one of two values: 0x10b if the header follows the PE format, which presently is the case, or 0x20b when the header is of the PE32+ format. The latter value is generally seen when files use 64 bit addresses.
In the optional header, the information is divided into three distinct parts. The first 28 bytes are part of the standard PE header, the next 68 bytes apply to the Windows operating system only, and the final bytes are for the data directories. The second and third fields of the standard header are the major and minor linker version numbers, which presently have values of 6 and 0. This is followed by the size of the code block in the exe file. The sizes of initialized and uninitialized data follow next.
The displayed value of 0x227e is for the next field, called entrypoint. This value is relative to where the program is loaded in memory, the image base. In our case, since the file is an exe file, the instruction at this address becomes the first memory location that gets executed by the Operating System. In the case of a device driver, there is no such specific function to be called, and hence it is the address of the initialization function. A DLL does not have to have an entry point and thus may have a value of 0.
The base of code and base of data fields are similar to the entrypoint field; they reveal where the code and data areas start when loaded in memory, all relative to the image base. The ImageBase field is a logical address that points to the area where the Operating System loads the exe file.
Similar to our likes and dislikes, the OS prefers a value of 0x00400000 as an address for executables; for a DLL it is 0x10000000 and for Windows CE it is 0x00010000. These starting addresses can be changed by supplying an option to the linker, and the address must be a multiple of 64K. Nevertheless, it is not advisable to experiment with different values.
The next value of 0x2000, or 8192, is the section alignment. This value signifies that even when a section has a size of 100 bytes, the OS will allocate a minimum of 8192 bytes for it. The rest of the bytes in the memory area allocated remain unused. The section alignment is normally the page size of the machine and is used for purposes of efficiency. Similar to the section alignment is the file alignment field, which applies to the file stored on disk. The file alignment is displayed as 0x200, or 512 bytes, which implies that each section when stored on disk takes up at least 512 bytes; 512 bytes make up one sector on disk.
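The rounding rule for both alignments can be sketched with a small hypothetical helper. The standard bit trick below assumes the alignment is a power of two, which both 0x200 and 0x2000 are.

```csharp
using System;

public class AlignmentDemo
{
    // Round size up to the next multiple of alignment (alignment must be a power of two)
    public static int RoundUp(int size, int alignment)
    {
        return (size + alignment - 1) & ~(alignment - 1);
    }
    public static void Main()
    {
        Console.WriteLine(RoundUp(100, 0x200));   // 512: a 100 byte section fills one sector on disk
        Console.WriteLine(RoundUp(0x53, 0x2000)); // 8192: the same data takes a full page in memory
    }
}
```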
The next fields are the major and minor version numbers of the Operating System, the image and the subsystem. The next field, called version, is reserved. The following field is the size of all the code plus headers, followed by a field that stores only the size of all the headers, including the section headers. The next field, called checksum, helps the Operating System detect whether the file has been damaged or tampered with before it is loaded into memory.
The next field displayed by ildasm, subsystem, informs the Operating System of the subsystem required by the exe file. A value of 3, as in our case, means a console subsystem, therefore no Graphical User Interface please; a value of 2 would mean a Graphical User Interface system. The field dllflags applies to DLLs, as the name signifies.
Following the field dllflags are two fields that deal with the stack. The stack is an area in memory which is used to pass parameters to functions and create local variables. The stack memory is reused at the end of a function call and hence is short-term memory, whereas the heap area is for longer durations. The first field, stackreserve, is the total amount of memory set aside for the stack; the value seen is 0x100000. The second field, stackcommit, is the amount of stack memory actually allocated to begin with; the value seen is 0x1000 bytes. Thus initially the stack commit is allocated, and once this gets used up, one page at a time is allocated dynamically, till the stack reserve is exhausted. The two fields after the stack fields are not displayed as they deal with the heap area in memory. The documentation is pretty candid that the loader field is obsolete.
The last field of the optional header gives the number of data directories that follow. So far only a value of 16 has been seen. Let's now understand the concept of a data directory.
A data directory is nothing but two fields. The first field is a location, or what is technically called an RVA (Relative Virtual Address), that gives information as to where some data starts in memory. The second field is the size in bytes of the entity. These are stored back to back.
Two arrays of size 16 and data type int are created to store the RVAs and sizes of each data directory entry. If the data directory entry at index 14 has a size of zero, then the executable file was not created by a .Net compiler. In such a case, there is no reason to continue further, so the program throws an exception and gracefully quits. The reasoning will be catered to a couple of paragraphs down the road.
The section headers start
immediately after the data directories. However, we take no chances and use the
Position property of the FileStream class, to give the current position of the imaginary
file pointer. The Position property is read/write thus it not only gives the
details about the imaginary file pointer but also sets it to a new position if
need be.
The Seek method can be used
again, like before to jump to a part of the file, but as variety is the spice
of life, we set the Position property instead. The world of computer
programming lets us skin a cat multiple ways.
All the fields of the section headers are not important except three of them,
so we create three arrays of ints to store the three fields.
The first field of interest is the virtual address or RVA of the section in memory (we remember our promise to explain it); this is followed by the size of the section and finally the location on disk where the section is stored. The size of a section header is 40 bytes. The three fields of our interest start 12 bytes from the start of the header, so using the ReadBytes function, the first 12 bytes are skipped. Then the three fields are read into the array variables. Since the remaining 16 bytes also have no significance, they are skipped too. We could have used the Seek function to jump over the 28 bytes that we are not interested in. Then again, we decided to use the method that is easiest to explain to you. The data directories and the section headers are now saved in arrays.
The next function, DisplayPEStructures, finally displays these values on the console. The only stumbling block here is that the output should match that of ildasm, and just to remind you, ildasm displays its output in a formatted manner. What we have is the Shared Source code, which comes with the source code of a disassembler, not the actual code of ildasm. That code, when executed, does not display output similar to that of ildasm. Thus we had no choice but to spend a lot of time figuring out how many spaces need to be placed at different points in each line.
A byte by byte comparison with the output generated by the original ildasm program can surely indicate our follies. We took this approach because otherwise there is no way of knowing whether the code we have written works or not. To pursue it further, we wrote our own file compare program to check whether the output generated by our disassembler and that of ildasm is the same; however, you have the option of choosing any file compare program to suit your needs.
After displaying a new line, the version number of the disassembler is displayed. In our case the version is 1.0.3328.4; however, yours could be larger or smaller, so please make the appropriate changes. Then the values of eight variables, viz. the subsystem, entry point, image base, sectiona, filea, the two stack variables and the number of data directories, are displayed.
Initially, we have entered the spaces manually for alignment purposes. Numeric variables by default are displayed in decimal, so the ToString function is used with a format string. There are a myriad of formats that can be put to use. The small x is used for the hexadecimal numbering system, with the alpha characters displayed in lowercase and not caps. The number 8 right justifies the number in a width of 8 and fills up the rest with zeroes.
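A minimal sketch of the formatting described above, using the entry point value from our output:

```csharp
using System;

public class FormatDemo
{
    public static void Main()
    {
        int entrypoint = 0x227e;
        // "x8": lowercase hexadecimal, zero padded on the left to a width of 8
        Console.WriteLine(entrypoint.ToString("x8"));           // 0000227e
        // The same format through a composite format string
        Console.WriteLine(String.Format("{0:x8}", entrypoint)); // 0000227e
        // "{0:x}" alone adds no padding, which is what DisplayDataDirectory relies on
        Console.WriteLine(String.Format("{0:x}", entrypoint));  // 227e
    }
}
```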
The data directories are displayed using a function DisplayDataDirectory. This function takes the rva and size of the element in the arrays along with a string to denote the name of the data directory. The prime objective of this function is to format the output and display it in a certain manner.
The string sfinal does not have to be initialized to an empty string. However, we do so out of habit, since C# does not permit using an uninitialized variable on the right hand side of the equals sign or as a return value.
Thereafter, using the static Format function from the String class, the rva of the data directory is formatted. The curly braces are a format option also used by the WriteLine function, and the 0 is the placeholder for the first parameter. The colon following it is used to specify the formatting. The small x is for hexadecimal output.
The opening square bracket [ must be placed 12 columns in, and hence the PadRight function is used to pad the string to 12 characters. The entire line to be displayed is built up in the string sfinal and then given to the WriteLine function to display in one go. Then, using the Format function, the size of the data directory is emitted, after padding to 21 characters to synchronize with the ildasm output. Thereafter, the name of the data directory is appended. Now for some quirks. For some reason the last data directory is not displayed; the second last is the CLR Header.
For this data directory, ildasm pads the line to 67 characters, whereas all the other lines are padded to 68 characters. For this purpose, an if statement that checks the name of the data directory is introduced, which decides on the width that the string is padded to before writing it out. To verify that every byte displayed is identical to the output of ildasm, we had to cater to every space as well. Thus we had no choice but to spend lots of time getting the spaces right. Now that the first program is over, the output can be compared with that of the original disassembler to check that it matches it to a T.
Even though the .Net documentation very clearly specifies that the MS-DOS stub should be exactly 128 bytes large, not all .Net compilers follow the documentation. This documentation also specifies the values that most fields must have.
In the standard PE header, the Machine field must always be 0x14c. The Date Time field is the number of seconds since 1st Jan 1970 00:00:00, and the Pointer to Symbol Table and Number of Symbols must always be 0. The final field, Characteristics, has the bits 0x2, 0x4, 0x8 and 0x100 set and the rest 0. The bit 0x2000 is set for a dll and cleared for an exe file.
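A quick sketch of decoding the Date Time field, using a hypothetical helper that simply adds the stored seconds to the epoch:

```csharp
using System;

public class TimeStampDemo
{
    // Convert a COFF TimeDateStamp (seconds since 1 Jan 1970 00:00:00) to a DateTime
    public static DateTime FromTimeDateStamp(int seconds)
    {
        // The cast to uint treats the field as unsigned, as the format intends
        return new DateTime(1970, 1, 1, 0, 0, 0).AddSeconds((uint)seconds);
    }
    public static void Main()
    {
        Console.WriteLine(FromTimeDateStamp(0));     // the epoch itself
        Console.WriteLine(FromTimeDateStamp(86400)); // one day later
    }
}
```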
The PE standard header fields are now set as follows. The Magic number is 0x10b. The Major and Minor version numbers are 6 and 0. The Code and Data sizes have the same meanings as explained earlier. The entry point RVA must point to the bytes 0xff 0x25 followed by an address relative to the image base of 0x400000, or be 0 for a DLL. The section that it falls in must have the execute and read attributes. The Base of Code is 0x00400000, or 0 for a DLL, and the Base of Data is the data section.
Every
exe file has a starting memory location that contains the first executable
instruction which is called the entry point. Windows 98 for example does not
understand a native .Net executable and hence it is called a non-CLI platform.
The words CLI will be repeated a trillion times and its full form is Common
Language Infrastructure.
For an exe file, the first function to be called is _CorExeMain and for a dll it is _CorDllMain, the code of which resides in the library mscoree.dll. It is this function that understands a .Net executable, and thus we believe that in future this function will reside in the operating system. It is this function that understands concepts like IL and metadata, which we will explain in course of time.
The Windows-specific fields have the following values. The image base, as mentioned earlier, is 0x400000; the section and file alignment are 0x2000 and 0x200 respectively. The OS Major version is 4 and the Minor version is 0. The User Major and Minor versions are 0. The Sub-System Major version is 4 and the Minor version 0. The Reserved field is always 0. The Image Size is the size in bytes of all headers plus padding, and it has to be a multiple of the Section Alignment. The Header Size is the size of three headers, the DOS header, the PE header and the optional PE header; this also includes padding and must be a multiple of the File Alignment value. The Checksum and DLL flags must be zero and the Subsystem can take a value of 2 or 3 only. The Stack Reserve has a value of 1 MB and the Stack Commit 4 KB. The Heap Reserve and Commit have similar values. The Loader flags are 0 and the Number of Data Directories is 16.
Most of the data directories have an RVA value but a size of 0. These are the Export, Resource, Exception, Certificate, Debug, Copyright, Global Ptr, TLS Table, Load Config, Bound Import and Delay Import tables, and the last one, which is reserved. The four directories that may have some size are the Import, Base Relocation, IAT and finally the CLI Header.
The section headers immediately follow the optional header, since there is no entry in the PE headers that points to them. A section header starts with the name of the section, which is 8 bytes large; there is no terminating null when the section name is exactly 8 characters long. Normally section names start with a dot; for example, the section containing code is called .text and that containing data is called .data. The second field is called VirtualSize, and it stores the size of the section when the section is loaded in memory. The fourth field is the SizeOfRawData. If the VirtualSize is greater than the SizeOfRawData, the rest of the section is zero padded in memory.
The third field, VirtualAddress, is an RVA and thus relative to the image base. It determines where the section is loaded in memory. The SizeOfRawData is the fourth field and is the size of the initialized data on disk, thus a multiple of the file alignment. As this field is rounded up to the file alignment, and not the section alignment like the virtual size, it can be greater than the VirtualSize field. If the section contains only uninitialized data, then the value stored in this field is 0. The PointerToRawData field is a file offset to the first page of the section within the PE file and thus is a multiple of the File Alignment.
The next field is the Pointer to Relocations, a file pointer to the relocation
entries for the section; for executable images it is zero. The Pointer to Line
Numbers that follows is zero, and the Number of Relocations is the actual count of
relocations. The second last field is the Number of Line Numbers, which is also
zero. Finally, there is the Characteristics field that determines the attributes
of the section. These flags decide, among other things, whether the section
contains executable code, initialized data or uninitialized data, and whether it
is executable, readable or writable.
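The layout described above can be summarised by reading each field out of a 40-byte buffer at its documented offset. The sketch below fills a synthetic buffer with made-up values for a hypothetical .text section; the offsets are those of the PE section header, while the values are only illustrative:

```csharp
using System;

public class SectionHeaderDemo
{
    // Build a synthetic 40-byte section header with made-up values
    // at the documented field offsets.
    public static byte[] Build()
    {
        byte[] h = new byte[40];
        BitConverter.GetBytes(0x1000).CopyTo(h, 8);      // VirtualSize
        BitConverter.GetBytes(0x2000).CopyTo(h, 12);     // VirtualAddress
        BitConverter.GetBytes(0x0E00).CopyTo(h, 16);     // SizeOfRawData
        BitConverter.GetBytes(0x0200).CopyTo(h, 20);     // PointerToRawData
        BitConverter.GetBytes(0x60000020).CopyTo(h, 36); // Characteristics
        return h;
    }

    public static void Main()
    {
        byte[] h = Build();
        // read each field back from its offset within the header
        Console.WriteLine("VirtualSize      {0:x}", BitConverter.ToInt32(h, 8));
        Console.WriteLine("VirtualAddress   {0:x}", BitConverter.ToInt32(h, 12));
        Console.WriteLine("SizeOfRawData    {0:x}", BitConverter.ToInt32(h, 16));
        Console.WriteLine("PointerToRawData {0:x}", BitConverter.ToInt32(h, 20));
        // 0x20 = contains code, 0x20000000 = executable, 0x40000000 = readable
        Console.WriteLine("Characteristics  {0:x}", BitConverter.ToInt32(h, 36));
    }
}
```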
To stress test our disassembler,
we have looked at other languages also. Thus, if you are conversant with only
one language, you may find it a little difficult to stress-test your program.
Those familiar with the C++ programming language can attempt the
next program in the sequence. About the software, fear not, because if
you have installed Visual Studio.net, the C++ compiler called cl also gets
installed.
a.cpp
main()
{
}
a.cpp is a C++ program that simply contains one
function called main. There are two
dissimilarities between this cpp program and the smallest C# program. Firstly,
in C++, all functions need not be in a class and hence can be made global. This
is one sensible thing in C++ that was amended in C# and Java, mainly due to
political reasons. The second difference is that the m of main is small and not
capital as in C#. This was done after consulting a dozen numerologists.
Compile the above cpp file to an exe file by
running the following command.
cl /clr a.cpp
The /clr option creates a .Net executable. If for
some reason you cannot get the above program to compile, worry not, as we have
gone the C++ way only to get some more output and some more executables.
Our disassembler will finally
run beyond 10,000 lines of code. There is no way in heaven or hell the entire
program can be explained in one go. Even God, and we ourselves, would find it
difficult to understand what we are saying. So please follow the instructions to
the letter.
Program2.csc
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
}
public void ReadandDisplayImportAdressTable()
{
long stratofimports = ConvertRVA(datadirectoryrva[1]);
mfilestream.Position = stratofimports;
Console.WriteLine("// Import Address Table");
int outercount = 0;
while (true)
{
int rvaimportlookuptable = mbinaryreader.ReadInt32();
if ( rvaimportlookuptable == 0)
break;
int datetimestamp = mbinaryreader.ReadInt32();
int forwarderchain = mbinaryreader.ReadInt32();
int name = mbinaryreader.ReadInt32();
int rvaiat = mbinaryreader.ReadInt32();
mfilestream.Position = ConvertRVA (name);
Console.Write("// ");
DisplayStringFromFile ();
Console.WriteLine("// {0} Import Address Table" , rvaiat.ToString("x8"));
Console.WriteLine("// {0} Import Name Table" , name.ToString("x8"));
Console.WriteLine("// {0} time date stamp" , datetimestamp);
Console.WriteLine("// {0} Index of first forwarder reference" , forwarderchain);
Console.WriteLine("//");
long importtable = ConvertRVA(rvaimportlookuptable);
mfilestream.Position = importtable;
int nexttable = mbinaryreader.ReadInt32();
if ( nexttable < 0 )
{
Console.WriteLine("// Failed to read import data.");
Console.WriteLine();
outercount++;
mfilestream.Position = stratofimports + outercount * 20;
continue;
}
int innercount = 0;
while ( true )
{
long pos0 = ConvertRVA(rvaimportlookuptable) + innercount * 4;
mfilestream.Position = pos0 ;
int pos1 = mbinaryreader.ReadInt32();
if ( pos1 == 0)
break;
long pos2 = ConvertRVA(pos1);
mfilestream.Position = pos2 ;
short hint = mbinaryreader.ReadInt16();
Console.Write("// ");
if ( hint.ToString("X").Length == 1)
Console.Write("  {0}" , hint.ToString("x"));
if ( hint.ToString("X").Length == 2)
Console.Write(" {0}" , hint.ToString("x"));
if ( hint.ToString("X").Length == 3)
Console.Write("{0}" , hint.ToString("x"));
Console.Write(" ");
DisplayStringFromFile();
innercount++;
}
Console.WriteLine();
outercount++;
mfilestream.Position = stratofimports + outercount * 20;
}
Console.WriteLine("// Delay Load Import Address Table");
if (datadirectoryrva[13] == 0)
Console.WriteLine("// No data.");
}
public long ConvertRVA (long rva)
{
int i;
for ( i = 0 ; i < sections ; i++)
{
if ( rva >= SVirtualAddress [i] && ( rva < SVirtualAddress[i] + SSizeOfRawData [i] ))
break ;
}
return SPointerToRawData [i] + ( rva - SVirtualAddress[i] );
}
public void DisplayStringFromFile()
{
while ( true )
{
byte filebyte = (byte )mfilestream.ReadByte();
if ( filebyte == 0)
break;
Console.Write("{0}" , (char)filebyte);
}
Console.WriteLine();
}
// Import Address Table
// KERNEL32.dll
// 00006000 Import Address Table
// 000079bc Import Name Table
// 0 time date stamp
// 0 Index of first forwarder reference
//
// 167 GetModuleHandleA
// fd GetCommandLineA
// 1a8 GetSystemInfo
// 35d VirtualQuery
// mscoree.dll
// 000060e4 Import Address Table
// 000079d8 Import Name Table
// 0 time date stamp
// 0 Index of first forwarder reference
//
// 5a _CorExeMain
// Delay Load Import Address Table
// No data.
The program program2.csc is
not shown in full. Only those functions that are new or changed are displayed.
Any instance variables added will also be shown. For example, since we have
introduced a call to the new function ReadandDisplayImportAdressTable in the
abc function above, the abc function is displayed again. The ReadPEStructures
function undergoes no change and hence is not shown at all.
Our disassembler does not aim at winning any prizes in any
competition on speed or efficiency. The main objective of the program is to
help you understand the workings of a disassembler. Once this objective is
achieved, modifications can be made to make it work faster. We have
sacrificed speed at the altar of understanding.
This program displays the import table. In the programming world
we share, share and share. Thus, other programmers write code that is placed
in functions in dll's, and we mortals call those functions in our code.
Microsoft
Windows comes with hundreds of such dll's that contain code, and expects
programmers to use these functions while coding. These dll's have names like
user32.dll, kernel32.dll etc. Every C# program eventually calls code in these
dll's.
Besides,
Microsoft also allows programmers to create their own dll's with their own sets of
functions and have other coders call them. When the linker creates an exe file,
it lists out all the dll's that the exe file calls code from.
Simultaneously, for each of these dll's, there is a list of the functions
that are being called. Thus, before executing any program in memory, the
operating system needs to load the dll's mentioned in the import table into
memory and match each function in the executable with its corresponding
entry in the dll.
In order to
display the contents of the import directory, the rva and the size are required.
The second data directory gives the rva and size of the import directory. The
function ReadandDisplayImportAdressTable then figures out
the dll names and displays them as prescribed by the ildasm program.
An RVA, or relative virtual address, is a number
that represents the location of some entity in memory, relative to where the
image is loaded. This location is where the runtime loader will place the
entity in memory. The file addresses are not significant because the PE file
format is optimized for execution from memory. Thus, given an RVA, we must
figure out where on disk the import directory begins.
The function ConvertRVA comes to our aid, as it converts
an RVA into a physical file address. In the last program, three
section header details were stored in three different arrays, along with the
number of sections in a variable called sections. This function ConvertRVA is
passed a memory location as a long that is to be converted into a disk based
address. As arrays start from 0, the for loop begins at 0 and ends when it
reaches the number of sections minus 1. In the loop, the parameter passed, i.e.
rva, is checked to be greater than or equal to the value of the array member
SVirtualAddress and at the same time less than the same value plus the second
array SSizeOfRawData.
The check is performed because the array
SVirtualAddress stores the starting rva that each section is associated with,
and SSizeOfRawData is the size of the data of that section. Thus, the section
headers report the memory occupied by each section. The third member is
SPointerToRawData, which is the address of the start of the section, but
on disk. This approach helps in deciphering which section the rva belongs to, and
once a match is found, the loop is terminated. The
SPointerToRawData value cannot by itself be the return value, as it is only the
starting position of the section on disk; therefore the starting rva in memory,
SVirtualAddress, is subtracted from the rva parameter. This offset is then added
to the SPointerToRawData value. Bear in mind that this works on the assumption
that a valid rva is given, and hence no error checks are performed.
Thus, in short, the above workings are as follows.
The starting rva of each section and the length of the data of the section are
available. In the for loop, the rva passed is checked to be in the range of
each section. If so, the difference is added to the disk location where the
section starts. In other words, an RVA is the address of an entity after the
loader loads it in memory, minus the address where the image is loaded in
memory, or Image Base. The Image Base must be subtracted because the image can
theoretically be loaded anywhere in memory.
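The arithmetic can be verified with made-up numbers. The sketch below replicates the ConvertRVA logic for a single hypothetical section loaded at RVA 0x2000 and stored on disk at offset 0x200; all three section values are assumptions chosen for illustration:

```csharp
using System;

public class RvaDemo
{
    // One hypothetical section: loaded at RVA 0x2000, stored on disk
    // at file offset 0x200, with 0xE00 bytes of raw data.
    static readonly int[] SVirtualAddress   = { 0x2000 };
    static readonly int[] SSizeOfRawData    = { 0xE00 };
    static readonly int[] SPointerToRawData = { 0x200 };
    const int sections = 1;

    public static long ConvertRva(long rva)
    {
        int i;
        for (i = 0; i < sections; i++)
        {
            if (rva >= SVirtualAddress[i] &&
                rva < SVirtualAddress[i] + SSizeOfRawData[i])
                break;
        }
        // offset into the section, plus where the section starts on disk
        return SPointerToRawData[i] + (rva - SVirtualAddress[i]);
    }

    public static void Main()
    {
        // 0x2074 lies 0x74 bytes into the section, so on disk it is 0x274
        Console.WriteLine(ConvertRva(0x2074).ToString("x")); // 274
    }
}
```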
The method adopted for locating the physical file
location, given a RVA, is taken from the documentation that comes with the Tool
Developers Guide.
The wrongly spelt variable stratofimports tells us
where on disk the import table begins. This value is given to the Position
property. In this case, the variable is not needed, but hey, nobody is charging
us for an extra variable.
Two loops have been written to display the import
table, since there are two different entities to display. The outer loop
selects each dll, one at a time, and the inner loop displays
the names of the functions from the chosen dll. The variable outercount is
initialized to 0 and will be used later in the program.
Every dll has a structure called the Import
Directory Table that represents the details of the functions being
imported, and it is 20 bytes wide. The 20 bytes are divided into 5
fields. The first field is the address of the Import Lookup Table, which gives
the names of the functions being imported from this dll.
If this value is zero, then it signifies that the
Import Directory Table has ended and the outer loop is to be terminated. The
second field is the date time stamp and is always zero on disk; when the exe
file is loaded into memory, the loader sets this field to the date time stamp of
the dll. The third field is the index of the first forwarder reference,
and its value is also zero. The fourth field is the name of the dll; this
address is an RVA, and hence the ConvertRVA function is
used to convert it into a physical file location. The Position
property is set to this value, and then the function
DisplayStringFromFile is used to display the dll name, which is stored in ASCII
format. The last field is an RVA of the Import Address Table. This table is
similar in content to the Import Lookup Table and only changes after
the image is loaded into memory, or bound. This value may be the last field of
the structure, but it is displayed first.
Let's first move to the function DisplayStringFromFile.
The function starts with an indefinite while loop and simply fetches one byte
at a time from the file. It assumes that the file pointer is placed
on the first byte and does not attempt to save the file position.
If the byte picked up is zero, it means that
the end of the name has been reached, and the loop is then terminated.
Otherwise, the byte is displayed as a character using the Write function and a
char cast. Before quitting the function, a new line is written. We could
have instead returned a string, but chose not to for no particular reason.
The values of the structure members, like the Import
Address Table, the Import Name Table, the Date Time Stamp and the forwarder,
are then displayed. Then, using the loop construct, the names of the functions
from the dll are displayed. The
variable rvaimportlookuptable gives the rva of the Import Lookup Table, and
using the ConvertRVA function this rva is converted into a physical location on disk.
As the innercount variable is 0, the
multiplication yields zero. The Position property is set to this value,
thereby positioning the file pointer at the start of the table. The
Import Lookup Table is a set of ints, one for each function being imported.
An int value of zero is an indication that the table is over,
so we quit out of the inner loop.
The 31st bit is the most crucial: if
it is set, i.e. has a value of 1, then the import is by ordinal value or number,
and if it is not set, the import is by name. Our hypothesis is that we
are importing by name, and hence the int read is taken to be an RVA to a Hint/Name
table; otherwise it would be a simple ordinal number.
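Had we wanted to handle the ordinal case anyway, the check is a one-liner on bit 31. The sketch below is our own illustration, not code from the disassembler:

```csharp
using System;

public class ImportEntryDemo
{
    // Decode one Import Lookup Table entry: if bit 31 is set, the low
    // 16 bits are an ordinal; otherwise the entry is an RVA to a
    // Hint/Name table.
    public static string Decode(uint entry)
    {
        if ((entry & 0x80000000) != 0)
            return "ordinal " + (entry & 0xFFFF);
        return "hint/name rva 0x" + entry.ToString("x");
    }

    public static void Main()
    {
        Console.WriteLine(Decode(0x80000005)); // ordinal 5
        Console.WriteLine(Decode(0x000079bc)); // hint/name rva 0x79bc
    }
}
```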
One more reason why the 31st bit is
not checked is that to date we have not encountered a single .Net
executable that imports functions from a dll by ordinal value. Thus, writing
code that checks for imports by ordinal would be baseless, since it could never be
verified for accuracy. Also, the .Net world, unlike the good old C/C++ world,
detests playing around with the internals of sections.
The int picked up is converted into the physical
location of the Hint/Name Table. This table is of variable length, with the
first short being the Hint field. The second field is the name of the function
stored as an ASCII string.
After obtaining the size of the hint, the spacing
is determined. We could have done the formatting using built-in functions but
chose the brute force method, just to tell you that as long as it works, use it.
However, if the options get too many, the above method gets too tedious; then
the string class and the ToString function offer a more elegant solution than our
clumsy way.
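For the record, the elegant alternative alluded to above is a single PadLeft call. The helper below is a hypothetical replacement for the three if statements, right-aligning the hint in a three-character column:

```csharp
using System;

public class HintFormatDemo
{
    // Right-align the hint in a three-character column, replacing the
    // three hand-written if statements with one PadLeft call.
    public static string Format(short hint)
    {
        return hint.ToString("x").PadLeft(3);
    }

    public static void Main()
    {
        Console.WriteLine("// " + Format(0x167) + " GetModuleHandleA");
        Console.WriteLine("// " + Format(0xfd) + " GetCommandLineA");
        Console.WriteLine("// " + Format(0x5a) + " _CorExeMain");
    }
}
```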
Once the name of the function is displayed, the
innercount variable is increased by 1.
On returning to the start of the loop, the next
task is to display the second function name and hint. But the problem is that
the file pointer is currently positioned at the end of the name of the first
function, hence it is not on the right byte. Therefore, there is no alternative
but to jump to the start of the Import Lookup Table and then determine the rva
of the second function.
The
ConvertRVA function gives the start of the table, and as innercount is
one and the size of each field is 4, the rva is now that of the second function. We
are very much aware that this is not an elegant way, but it works. We could have
stored the original file pointer position before moving the file pointer around.
A newline is emitted after moving out of the
inner while loop. Then the variable outercount is increased by 1. This variable
keeps a count of the dll's that have been scrutinized.
Bear in mind that before looping back to the outer
while loop, the file pointer must be positioned at the start of the next
Import Directory Table entry. Thus, the same procedure is adopted again: using
the variable stratofimports, we move to the start of the Import Directory Table
structures, and since each structure is 20 bytes long, we multiply the number of
dll's already processed, stored in the outercount variable, by 20. In this
manner, the import table is completely displayed.
The second table that ildasm displays is the Delay
Load Import Address Table. The rva and size for this are stored in the 14th data
directory. After examining around 5000 .Net executables, we realized that not
one of them had this table. The reason is that in the .Net world it is not
acceptable to write and create our own sections, and hence this table does not
get created. The linker is entrusted with the responsibility of creating this table.
You may be a bit surprised to see the if statement
checking the value of the variable nexttable. There is one file,
system.enterpriseservices.thunk.dll, that gives an error with the dll
oleaut32.dll while displaying the imports. The error check is to print the
error and continue with the program.
After having determined the start of the import
table, the rva of the hint name table is picked up. If this rva for some reason
is less than 0, we display an error message and then go on to the
next dll. An rva represents a memory location and therefore cannot be
negative. This error occurs with only one dll, and only one dll imported within it.
For the Import Address Table, the Date Time Stamp
and the Forwarder Chain fields are zero as per the specification. Remember, a
zero denotes the end of the table. The hint field should also be zero. A point
to be noted is that the name of a function is case sensitive. The names
_CorExeMain and _CorDllMain are decided by the specifications.
---
Program3.csc
int metadatarva;
int corflags;
int entrypointtoken;
int vtablerva;
int vtablesize;
int exportaddressrva;
int exportaddresssize;
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
ReadandDisplayCLRHeader();
}
public void ReadandDisplayCLRHeader()
{
Console.WriteLine("// CLR Header:");
mfilestream.Position = ConvertRVA(datadirectoryrva[14]);
int size = mbinaryreader.ReadInt32();
int majorruntimeversion = mbinaryreader.ReadInt16();
int minorruntimeversion = mbinaryreader.ReadInt16();
metadatarva = mbinaryreader.ReadInt32();
int metadatasize = mbinaryreader.ReadInt32();
corflags = mbinaryreader.ReadInt32();
entrypointtoken = mbinaryreader.ReadInt32();
int resourcesrva = mbinaryreader.ReadInt32();
int resourcessize = mbinaryreader.ReadInt32();
int strongnamesigrva = mbinaryreader.ReadInt32();
int strongnamesigsize = mbinaryreader.ReadInt32();
int codemanagerrva = mbinaryreader.ReadInt32();
int codemanagersize = mbinaryreader.ReadInt32();
vtablerva = mbinaryreader.ReadInt32();
vtablesize = mbinaryreader.ReadInt32();
exportaddressrva = mbinaryreader.ReadInt32();
exportaddresssize = mbinaryreader.ReadInt32();
int managednativeheaderrva = mbinaryreader.ReadInt32();
int managednativeheadersize = mbinaryreader.ReadInt32();
if ( size >= 100)
Console.WriteLine("// {0} Header Size", size);
else
Console.WriteLine("//  {0} Header Size", size);
Console.WriteLine("// {0} Major Runtime Version", majorruntimeversion);
Console.WriteLine("// {0} Minor Runtime Version", minorruntimeversion);
Console.WriteLine("// {0} Flags", corflags.ToString("x"));
string dummy = "// " + entrypointtoken.ToString("x");
dummy = dummy.PadRight(12) + "Entrypoint Token";
Console.WriteLine(dummy);
DisplayDataDirectory(metadatarva , metadatasize , "Metadata Directory");
DisplayDataDirectory(resourcesrva, resourcessize, "Resources Directory");
DisplayDataDirectory(strongnamesigrva, strongnamesigsize, "Strong Name Signature");
DisplayDataDirectory(codemanagerrva, codemanagersize, "CodeManager Table");
DisplayDataDirectory(vtablerva, vtablesize, "VTableFixups Directory");
DisplayDataDirectory(exportaddressrva, exportaddresssize , "Export Address Table");
DisplayDataDirectory(managednativeheaderrva, managednativeheadersize, "Precompile Header");
Console.WriteLine("// Code Manager Table:");
if ( codemanagerrva == 0)
Console.WriteLine("// default");
}
}
// CLR Header:
// 72 Header Size
// 2 Major Runtime Version
// 0 Minor Runtime Version
// 1 Flags
// 6000001 Entrypoint Token
// 2074 [1b4 ] address [size] of Metadata Directory:
// 0 [0 ] address [size] of Resources Directory:
// 0 [0 ] address [size] of Strong Name Signature:
// 0 [0 ] address [size] of CodeManager Table:
// 0 [0 ] address [size] of VTableFixups Directory:
// 0 [0 ] address [size] of Export Address Table:
// 0 [0 ] address [size] of Precompile Header:
// Code Manager Table:
// default
The focus of this program is simply to display the
CLR header. In the computer world, it is extremely difficult to incorporate
changes in a file format, as existing code needs to run in parallel with new code.
Consequently, the file format for the .Net world is a strict extension of the
existing PE file format. The operating system, however, needs some method to
differentiate between a conventional PE file and a .Net executable.
Up to now, all our programs have worked with the
conventional PE file format. Henceforth, we will concentrate on the new
entities the .Net world brings in.
The CLI header is the starting point of all the
entities that make up the .Net world. This header is read-only and hence is
placed in the read-only section. It is the second last data directory entry that
gives the RVA and size of the CLI header.
It should suffice to say that metadata is the
pillar on which the .Net world rests. If the .Net world becomes
renowned, it will be due to the concept of metadata.
The program has seven freshly created instance
variables. In addition, a call to a function named ReadandDisplayCLRHeader has
been added in the abc function. This newly introduced function displays the CLI
header.
The data in the 15th data directory entry (index 14) is
used to position the file pointer at the start of the CLI header on disk. The
header begins with the size of the CLI header in its very first field; the
value shown is 72 and it is in bytes. The second and third fields give the version
of the runtime that is required to run the program. The major version
comes first, presently having the value 2, and the minor version is 0. The
fourth field is the RVA of the metadata. This is then followed by the
flags field that describes the image; it is used by the loader.
Flag diagram
---
The first flag, when set, indicates
that the image is an IL-only image. The second flag, when set, marks the
image as requiring a 32-bit address space. The
fourth flag informs us of a strong name signature. Thereafter comes the entry
point token, the details of the first function to run; this will be explained
later in greater detail.
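The flag values below are taken from CorHdr.h of the .Net Framework SDK; the enum member names themselves are our own. A sketch of decoding the Flags field of our sample executable:

```csharp
using System;

// Flag bits of the CLI header Flags field, per CorHdr.h;
// the member names here are our own choice.
[Flags]
public enum CorFlags
{
    IlOnly           = 0x00000001,
    Requires32Bit    = 0x00000002,
    StrongNameSigned = 0x00000008,
    TrackDebugData   = 0x00010000
}

public class FlagsDemo
{
    public static void Main()
    {
        // The sample executable reported Flags = 1
        CorFlags f = (CorFlags)1;
        Console.WriteLine(f); // IlOnly
    }
}
```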
A series of data directory entries for the
Resources, Strong Name Signature, Code Manager Table, VTable Fixups, the Export
Address Table Jumps and finally the Managed Native Header follow next. The Code
Manager Table, Export Address Table Jumps and Managed Native Header are always
zero.
In the program, a check is performed on the size
being greater than or equal to 100. This helps in formatting the output, since an
extra space is to be added for smaller sizes. As mentioned earlier, the Code
Manager Table always has a value of zero.
Program4.csc
int [] rows;
string [] tablenames = new String[]{
"Module", "TypeRef", "TypeDef", "FieldPtr", "Field",
"MethodPtr", "Method", "ParamPtr", "Param", "InterfaceImpl",
"MemberRef", "Constant", "CustomAttribute", "FieldMarshal", "DeclSecurity",
"ClassLayout", "FieldLayout", "StandAloneSig", "EventMap", "EventPtr",
"Event", "PropertyMap", "PropertyPtr", "Properties", "MethodSemantics",
"MethodImpl", "ModuleRef", "TypeSpec", "ImplMap", "FieldRVA",
"ENCLog", "ENCMap", "Assembly", "AssemblyProcessor", "AssemblyOS",
"AssemblyRef", "AssemblyRefProcessor", "AssemblyRefOS", "File", "ExportedType",
"ManifestResource", "NestedClass", "TypeTyPar", "MethodTyPar"};
long valid ;
byte [] metadata;
bool debug = true;
int tableoffset ;
int offsetstring = 2;
int offsetblob = 2;
int offsetguid = 2;
byte [] blob;
byte [] us;
byte [] guid;
string [] streamnames;
byte [] strings;
int [] ssize ;
int [] offset;
byte [][] names;
long startofmetadata;
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
ReadandDisplayCLRHeader();
ReadStreamsData();
}
public void ReadStreamsData()
{
startofmetadata = ConvertRVA(metadatarva);
if ( debug )
Console.WriteLine("Start of Metadata {0} rva={1}" , metadatarva , startofmetadata );
mfilestream.Position = startofmetadata ;
mfilestream.Seek(4 + 2 + 2 + 4 , SeekOrigin.Current);
int lengthofstring = mbinaryreader.ReadInt32();
if ( debug )
Console.WriteLine("Length of String {0}" , lengthofstring );
mfilestream.Seek(lengthofstring , SeekOrigin.Current);
long padding = mfilestream.Position % 4 ;
if ( debug )
Console.WriteLine("Padding {0}" , padding );
mfilestream.Seek(2 , SeekOrigin.Current);
int streams = mbinaryreader.ReadInt16();
if ( debug )
Console.WriteLine("No of streams {0} Position={1}" , streams , mfilestream.Position);
streamnames = new string[5];
offset = new int[5];
ssize = new int[5];
names = new byte[5][];
names[0] = new byte[10];
names[1] = new byte[10];
names[2] = new byte[10];
names[3] = new byte[10];
names[4] = new byte[10];
int j ;
for ( int i = 0 ; i < streams ; i++)
{
if (debug)
Console.WriteLine("At Start Position={0} {1}" , mfilestream.Position , mfilestream.Position % 4);
offset[i] = mbinaryreader.ReadInt32();
ssize[i] = mbinaryreader.ReadInt32();
if (debug)
Console.WriteLine("offset={0} size={1} Position={2}" , offset[i] , ssize[i] , mfilestream.Position);
j = 0;
byte bb ;
while ( true )
{
bb = mbinaryreader.ReadByte();
if ( bb == 0)
break;
names[i][j] = bb;
j++;
}
names[i][j] = bb;
streamnames[i] = GetStreamNames (names[i]);
/*
To entertain the processor, we have to now write
extra code. As the stream names vary in size, we have to skip bytes until we
reach a four byte boundary. The best way to do it is check the value of the
Position property. If it is divisible by 4, then break out of the loop.
*/
while ( true )
{
if ( mfilestream.Position % 4 == 0 )
break;
byte b = mbinaryreader.ReadByte();
}
if (debug)
Console.WriteLine("At End Position={0} {1}" , mfilestream.Position , mfilestream.Position % 4);
}
for ( int i = 0 ; i < streams ; i++)
{
if ( streamnames[i] == "#~" || streamnames[i] == "#-" )
{
metadata = new byte[ssize[i]];
mfilestream.Seek(startofmetadata + offset[i] , SeekOrigin.Begin);
for ( int k = 0 ; k < ssize[i] ; k ++)
metadata[k] = mbinaryreader.ReadByte();
}
if ( streamnames[i] == "#Strings" )
{
strings = new byte[ssize[i]];
mfilestream.Seek(startofmetadata + offset[i] , SeekOrigin.Begin);
for ( int k = 0 ; k < ssize[i] ; k ++)
strings[k] = mbinaryreader.ReadByte();
}
if ( streamnames[i] == "#US" )
{
us = new byte[ssize[i]];
mfilestream.Seek(startofmetadata + offset[i] , SeekOrigin.Begin);
for ( int k = 0 ; k < ssize[i] ; k ++)
us[k] = mbinaryreader.ReadByte();
}
if ( streamnames[i] == "#GUID" )
{
guid = new byte[ssize[i]];
mfilestream.Seek(startofmetadata + offset[i] , SeekOrigin.Begin);
for ( int k = 0 ; k < ssize[i] ; k ++)
guid[k] = mbinaryreader.ReadByte();
}
if ( streamnames[i] == "#Blob" )
{
blob = new byte[ssize[i]];
mfilestream.Seek(startofmetadata + offset[i] , SeekOrigin.Begin);
for ( int k = 0 ; k < ssize[i] ; k ++)
blob[k] = mbinaryreader.ReadByte();
}
}
if ( debug )
{
for ( int i = 0 ; i < streams ; i++)
{
Console.WriteLine("{0} offset {1} size {2}" , streamnames[i] , offset[i] , ssize[i]);
if ( streamnames[i] == "#~" || streamnames[i] == "#-" )
{
for ( int ii = 0 ; ii <= 9 ; ii++)
Console.Write("{0} " , metadata[ii].ToString("X"));
Console.WriteLine();
}
if ( streamnames[i] == "#Strings")
{
for ( int ii = 0 ; ii <= 9 ; ii++)
Console.Write("{0} " , strings[ii].ToString("X"));
Console.WriteLine();
}
if ( streamnames[i] == "#US")
{
for ( int ii = 0 ; ii <= 9 ; ii++)
Console.Write("{0} " , us[ii].ToString("X"));
Console.WriteLine();
}
if ( streamnames[i] == "#GUID")
{
for ( int ii = 0 ; ii <= 9 ; ii++)
Console.Write("{0} " , guid[ii].ToString("X"));
Console.WriteLine();
}
if ( streamnames[i] == "#Blob")
{
for ( int ii = 0 ; ii <= 9 ; ii++)
Console.Write("{0} " , blob[ii].ToString("X"));
Console.WriteLine();
}
}
}
int heapsizes = metadata[6];
if ( (heapsizes & 0x01) == 0x01)
offsetstring = 4;
if ( (heapsizes & 0x02) == 0x02)
offsetguid = 4;
if ( (heapsizes & 0x04) == 0x04)
offsetblob = 4;
valid = BitConverter.ToInt64 (metadata, 8);
tableoffset = 24;
rows = new int[64];
Array.Clear (rows, 0, rows.Length);
for ( int k = 0 ; k <= 63 ; k++)
{
int tablepresent = (int)(valid >> k ) & 1;
if ( tablepresent == 1)
{
rows[k] = BitConverter.ToInt32(metadata , tableoffset);
tableoffset += 4;
}
}
if ( debug )
{
for ( int k = 62 ; k >= 0 ; k--)
{
int tablepresent = (int)(valid >> k ) & 1;
if ( tablepresent == 1)
{
Console.WriteLine("{0} {1}" , tablenames[k] , rows[k]);
}
}
}
}
public string GetStreamNames(byte [] b)
{
int i = 0;
while (b[i] != 0 )
{
i++;
}
System.Text.Encoding e = System.Text.Encoding.UTF8;
string dummy = e.GetString(b , 0 , i );
return dummy;
}
}
Output
Start of Metadata 8308 rva=628
Length of String 12
Padding 0
No of streams 4 Position=660
At Start Position=660 0
offset=96 size=196 Position=668
At End Position=672 0
At Start Position=672 0
offset=292 size=96 Position=680
At End Position=692 0
At Start Position=692 0
offset=388 size=16 Position=700
At End Position=708 0
At Start Position=708 0
offset=404 size=32 Position=716
At End Position=724 0
#~ offset 96 size 196
0 0 0 0 1 0 0 1 47 14
#Strings offset 292 size 96
0 3C 4D 6F 64 75 6C 65 3E 0
#GUID offset 388 size 16
6F E9 56 DC 49 C3 F1 4E A6 9
#Blob offset 404 size 32
0 8 B7 7A 5C 56 19 34 E0 89
AssemblyRef 1
Assembly 1
CustomAttribute 1
MemberRef 2
Method 2
TypeDef 2
TypeRef 2
Module 1
We start as before by creating a series of instance
variables and calling a function named ReadStreamsData. This function is the
one that delves into the innards of the .Net world. The CLI header has a member
that holds the starting position, or RVA, of the metadata. The ConvertRVA
method is used to position the file pointer at this start of metadata on disk.
Metadata is the crux of the .Net world. We will come to
a formal definition in some time.
The variable startofmetadata is used later in the program
to position at the start of this structure. The point to be
noted here is that most offsets in the .Net world are relative to the metadata
root, since the internal .Net structures commence at this point.
The value stored in the variable metadatarva may or may
not be displayed, since the ildasm utility does not display its value unless it is
executed in debug mode. Therefore the display is enclosed in an if statement
and appears only when the variable debug is set to true. The WriteLine function
dutifully displays the value stored in the variable.
Thus, all output that ildasm does not display, but
that is required for debugging purposes, is placed within an if statement that
checks the value of the debug variable. The data following the starting position
of metadata can easily be placed in a structure. It begins with a magic number
that can be referred to as a signature, viz. 0x424A5342, which appears on disk
as the bytes BSJB.
The team that developed the .Net internals
comprised four of the most brilliant individuals who ever worked at
Microsoft. This will be proved only after you have finished reading this book.
BSJB are the initials of the four heroes who designed the internals of the .Net
world.
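The signature can be checked with a couple of lines. On a little-endian machine, the four bytes of 0x424A5342 appear in file order as the string BSJB:

```csharp
using System;
using System.Text;

public class MagicDemo
{
    public static void Main()
    {
        // The metadata root begins with the magic number 0x424A5342.
        // Written little-endian to disk, the bytes read as "BSJB".
        byte[] b = BitConverter.GetBytes(0x424A5342);
        Console.WriteLine(Encoding.ASCII.GetString(b)); // BSJB
    }
}
```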
Following the signature is the major version
number, 1, and then the minor version number, 0. These two fields are of short
type and can easily be ignored while scrutinizing the internals. The next four
bytes are reserved and will always have a value of zero. Thus, the next step is
to skip these 12 bytes from the start, as the four fields are irrelevant. The Seek
function comes in handy, with 12 as the first parameter, followed by the enum
Current (not Begin).
The fifth field is the length of an ASCII or UTF8
string that follows. This string represents the version number, viz.
v1.0.3328, of the .Net Framework installed on the machine. This number can be
verified against the output displayed by the C# compiler csc when run with no
options. The length of the string here is 12 bytes.
The Seek function is used to proceed further, with
the variable lengthofstring as the first parameter since it holds the length of
the string.
A 32-bit processor works best when fields begin at
a 4-byte boundary. It is for this
reason that the next field starts at a location divisible by 4.
The mod operator % is used to discern the
padding length, whereupon the bytes are eventually skipped within the file.
Nevertheless, the length of the version string is completely divisible by 4,
therefore the padding is zero, leaving no bytes to skip. An
additional Seek call could also have served to skip the padding
bytes.
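The padding arithmetic described above can be sketched as follows; the helper name is ours, not from the program:

```csharp
using System;

public class PaddingDemo
{
    // Bytes needed to move from 'length' up to the next 4-byte boundary.
    // A length already divisible by 4 needs no padding, hence the outer % 4.
    public static int PadTo4(int length)
    {
        return (4 - length % 4) % 4;
    }

    public static void Main()
    {
        Console.WriteLine(PadTo4(12)); // a 12-byte version string: no padding
        Console.WriteLine(PadTo4(9));  // a 9-byte string would need 3 pad bytes
    }
}
```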
The padding is followed by the flags field, which
is a short and for the moment reserved and has a value of 0.
All .Net entities stored on disk are in the form of
streams. It is the second last field that reveals the total count of streams
in the file. The maximum number is five, but in our file, it is one less, i.e.
4.
Thereafter comes the Stream Header, one for
each stream present in the file. Even though there are 4
streams in the file, just to be careful, a maximum of five streams
is catered for while writing the
program; thus an array of five strings called streamnames is created.
The stream headers also give the size and offset
of each stream in the file. Therefore, int arrays called offset and ssize have
been introduced to store these values. The names of the streams are stored in the
stream header as a series of bytes. Thus, to read these bytes into memory, a
two dimensional array of bytes called names has been introduced.
The outer dimension is five, as in all there
are five streams, which the new accomplishes. Each of these five byte
arrays is then given a size. Since the name of each stream does not exceed 10
characters, the individual arrays names[0] onwards are initialized to arrays
10 bytes large.
While reading each stream header, the WriteLine
function is used which gives details on the position in the file. In the next
program, the debug variable will be set to false as the output is not in sync
with that of ildasm as mentioned before.
The stream headers give three pieces of data. Each
starts with two ints: the first gives us an offset from the start of the metadata
root where the data for this stream begins. The second int is the size of the
stream in bytes, always a multiple of 4. As mentioned earlier, a location
divisible by 4 is the most preferred option. Lastly, the final field is the
name of the stream, always null terminated.
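The header layout just described can be sketched as a small reader; the method name and the fabricated bytes are ours, and the alignment here is computed from the array start rather than the metadata root:

```csharp
using System;
using System.Text;

public class StreamHeaderDemo
{
    // A stream header is laid out as: int offset, int size, then a
    // null-terminated ASCII name padded out to a 4-byte boundary.
    // Returns the fields plus the position where the next header begins.
    public static (int offset, int size, string name, int next)
        ReadHeader(byte[] data, int pos)
    {
        int offset = BitConverter.ToInt32(data, pos);
        int size = BitConverter.ToInt32(data, pos + 4);
        int start = pos + 8, end = start;
        while (data[end] != 0)      // scan up to the terminating null
            end++;
        string name = Encoding.ASCII.GetString(data, start, end - start);
        int next = end + 1;         // step past the null itself
        next += (4 - next % 4) % 4; // round up to a 4-byte boundary
        return (offset, size, name, next);
    }

    public static void Main()
    {
        // A fabricated header: offset 0x6C, size 0x128, name "#~"
        byte[] data = { 0x6C, 0, 0, 0, 0x28, 0x01, 0, 0,
                        (byte)'#', (byte)'~', 0, 0 };
        var h = ReadHeader(data, 0);
        Console.WriteLine($"{h.name} offset={h.offset} size={h.size} next={h.next}");
    }
}
```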
Since the stream name varies in length, the stream
header is not fixed in size. Even so, the next stream
header will begin at a 4 byte boundary. After storing the offset and size in an
array, an infinite while loop is placed to read the name of the stream, stored
in bytes, into the array called names.
The variable i, i.e. the loop variable, decides the
outer array dimension and the variable j the inner dimension. If the byte read
is zero, we break out, thus terminating the loop. A null character is
added to the names array thereafter, and hence the last array member is
initialized to the value of bb, a null.
The function GetStreamNames that comes next
converts a byte array ending with a null to a string. The job of this function
is to scan through the byte array until it reaches the terminating null and
increment the variable i. Thus at the end of the loop, variable i stores the
number of bytes that make up the stream name minus the null.
The GetString function then, given a byte array and
a starting point and length, converts the byte array into a string. The string
is stored in a variable called dummy, which is then returned and stored in the
array streamnames. Thus, the stream names as well as their offsets and sizes
are now stored together in the arrays. Reading from a disk is a very slow
process; thus it is a good idea to read the entire stream data into arrays in
one go.
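The null-terminated-bytes-to-string conversion described above can be sketched like this; the method name is ours, not the program's GetStreamNames:

```csharp
using System;
using System.Text;

public class NameDemo
{
    // Mirrors the idea described above: count bytes up to the
    // terminating null, then convert just that slice into a string.
    public static string FromNullTerminated(byte[] bytes)
    {
        int i = 0;
        while (bytes[i] != 0)
            i++;
        return Encoding.ASCII.GetString(bytes, 0, i);
    }

    public static void Main()
    {
        // Trailing garbage after the null is simply ignored.
        byte[] raw = { (byte)'#', (byte)'U', (byte)'S', 0, 0xCC };
        Console.WriteLine(FromNullTerminated(raw)); // prints #US
    }
}
```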
The for loop is brought in again but now with the
if statements that check the names of the streams. If the stream has a certain
name, say #~, then an array called metadata
is created with the length equivalent to the value held in corresponding
ssize array member. The corresponding offset member is then used to jump to the
start of the data for the #~ named stream. Do take into account that the offset
is relative to the start of the metadata root, whose value is stored in the
variable startofmetadata. Thereafter using a simple for loop and the ReadByte
function the entire stream contents are read, one byte at a time.
In a similar fashion, if the name of the stream is
#Strings, the contents are read in the
strings array. So also, if it is #US, then us array is used and for #Guid, the
guid array. Finally the #Blob stream is also stored into the blob array.
In this manner, the contents of the entire stream
are read into corresponding arrays, thus avoiding disk accesses completely.
To verify our acts, the debug variable can be put
to use which will display the first 10 bytes of the five streams using the
arrays that have been just filled. The
size and the offset of each stream is also displayed.
Ignoring the initial bytes for the time being, we
directly jump to the 7th byte of the metadata stream, or the stream called #~.
This byte is examined bitwise, checking the bits with values 1, 2
and 4. If the respective bit is on, i.e. 1, the
variable offsetstring, offsetguid or offsetblob is set to 4 from its
predetermined value of 2.
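The bit tests just described can be sketched as follows; the method name is ours, and the three bits are assumed to widen #Strings, #Guid and #Blob indexes respectively:

```csharp
using System;

public class HeapSizesDemo
{
    // The heapsizes byte: bit 0x01 widens #Strings indexes, 0x02 #Guid,
    // 0x04 #Blob. Each index is 2 bytes unless its bit is on, then 4.
    public static (int offsetstring, int offsetguid, int offsetblob)
        Widths(byte heapsizes)
    {
        int offsetstring = (heapsizes & 0x01) != 0 ? 4 : 2;
        int offsetguid   = (heapsizes & 0x02) != 0 ? 4 : 2;
        int offsetblob   = (heapsizes & 0x04) != 0 ? 4 : 2;
        return (offsetstring, offsetguid, offsetblob);
    }

    public static void Main()
    {
        // 0x05 = bits 1 and 4 on: wide string and blob indexes.
        var w = Widths(0x05);
        Console.WriteLine($"{w.offsetstring} {w.offsetguid} {w.offsetblob}");
    }
}
```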
What these variables do will be
explained in just a second. The stream called #Strings is made up of all the
strings entered in the program. But, these strings are not the text strings
which are given as parameters to functions like WriteLine. For the text based
strings, there is a separate stream called #US.
Names like System, Console, WriteLine etc used in
the program need to be stored in some location. This location is the stream
called #Strings.
Thus the #Strings stream is a series of null
terminated strings, wherein each string begins at a certain or fixed offset
from the onset of the stream. The word System will be replaced with an offset
from the start of the streams data. However, the question that now arises is
whether this offset should be taken as 2 bytes or 4 bytes.
This matters a lot from the viewpoint of efficiency
since a namespace name like System which is referred to around 1000 times in a
program, may get completely misaligned. A wrong choice of width could waste a
large number of bytes.
The designers weighed the alternatives of 2 bytes
and 4: in 2 bytes a stream larger than 64K could not be
accommodated, while a default use of 4 bytes could lead to immense wastage when
the size is small. For this reason, these
size bits are made extremely significant, since they state the stream size.
If the bit is on, the stream is larger than 64K and hence the variable is set
to 4 from the default value of 2. We could also have used the ssize array to
figure out the byte size, but reserving a bit for it is a more elegant option.
It is only the streams #Strings, #Guid and #Blob that have such a size bit. The
stream #US is restricted to a size of 64K, whereas the most important stream #~ has no
such restriction as it is not referred to; on the contrary, the #~ refers to
the other streams.
The next task, to read the 8 bytes that are stored
from the 8th byte position, is performed by using the static
function ToInt64 from the class BitConverter. Functions of this family read
bytes from a byte array and convert them into a short, an int or a long. The first
parameter is the name of the array; just to jog your memory, in case you have
forgotten, the entire stream #~ has been read into the metadata array earlier.
The second parameter is an offset into the array. As a result, the
variable valid now holds the long stored at the 8th byte
position. The variable tableoffset is set to 24 and then an array rows that is
of size 64 is created.
It is a good idea to initialize everything,
including newly created variables; therefore the Clear static function of the Array class is used, which sets
the 64 members, the count passed as the third parameter, to zero, starting
from the index passed as the second parameter. This is however not mandatory, as instance variables
get initialized to zero, but why take chances? What if the programmer writing the
above code was asleep at the wheel?
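The two calls just discussed can be sketched together; the bytes here are fabricated and the TablePresent helper is ours:

```csharp
using System;

public class ValidMaskDemo
{
    // Tests whether table k's bit is on in the 64-bit valid mask.
    public static bool TablePresent(long valid, int k)
    {
        return ((valid >> k) & 1) == 1;
    }

    public static void Main()
    {
        // A fabricated #~ header fragment: the 8 bytes at offset 8 hold
        // the valid bitmask; here bits 0, 1 and 6 are on (0x43).
        byte[] metadata = new byte[24];
        metadata[8] = 0x43;
        long valid = BitConverter.ToInt64(metadata, 8);

        // Zero the 64 row counts: Array.Clear takes the start index as
        // its second parameter and the count as its third.
        int[] rows = new int[64];
        Array.Clear(rows, 0, 64);

        Console.WriteLine(valid);                  // 67
        Console.WriteLine(TablePresent(valid, 6)); // True
        Console.WriteLine(TablePresent(valid, 2)); // False
    }
}
```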
Let us now look at the concept of metadata that we
have been threatening to unfold for a very long time.
Let's take a simple class having three methods. The
names of the three methods as well as the name of the class have to be stored
somewhere. This somewhere is called a table, as there will be multiple
occurrences of the above entities. The guys who designed the .Net world assumed
that the maximum number of such tables would be 64 and thus they gave each table
a number.
For example, Table 0 keeps track of all the modules
the user-written program contains. Also, each .net image comprises one and
only one module. Table 1 is for the types referred to in the program and Table
2 is for all the user-created types.
The requirement of the .Net framework was an
efficient way of representing the tables present in the current image and
storing their information. A long datatype was considered most suitable, as it
is represented as 64 bits wherein each bit signifies whether a certain table
is present or not. Thus if bit 0 is set,
it implies that the Module table is present, whereas when 0 it is
missing. In a similar manner, if a user-defined type is created in the program,
bit 2 of the long will be set. This approach thus is a highly
efficient and elegant way of keeping track of whether 64 entities are present
or not.
The next 8 bytes are a field called sorted, which
has not been used, followed by a series of ints that state the number of records
present in each present table.
Thus if bits 0, 2 and 3 are on, the first int,
starting from byte 24, tells us the number of records of table 0, and the next int
the number of records in table 2, as table 1 has its bit off and is thus not
present. The next int will tell us the number of records of table 3, and so on.
Let's convert this into code and fill up the rows array with the number of rows
that each table has.
Remember, the rows array has been initialized to
zero. We start with a loop where the variable k doubles up as the loop variable
as well as the table number. We first need to know whether the table has its bit
set or not. We right shift the valid member by k, thus throwing off all the
earlier bits and also making the kth
bit the first bit.
We bitwise and with 1, and thus tablepresent is
either 0 or 1. To take a concrete example, if we want to know the
status of table 10, we right shift all the bits by 10. This ensures that
the first bit is what used to be the 10th. We then bitwise and with
1, zeroing out all the bits but the first.
If the answer is one, the table is
present, and we read the int stored at tableoffset, which is given a value of 24 as
this is where the number of rows begins. We use the function ToInt32 to
read an int. The corresponding rows array member is set to the number of rows
in the table, as the loop variable is the table number.
We then increase the tableoffset by 4 as the size
of an int is 4 and we need tableoffset to point to the next int that stands for
the rows of the next present table.
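The loop just walked through can be sketched as a self-contained method; the name FillRows and the fabricated byte array are ours:

```csharp
using System;

public class RowCountDemo
{
    // Fills 'rows' from the valid bitmask exactly as described: for each
    // table whose bit is on, consume one int of row count from 'metadata'.
    public static int[] FillRows(long valid, byte[] metadata, int tableoffset)
    {
        int[] rows = new int[64];
        for (int k = 0; k < 64; k++)
        {
            long tablepresent = (valid >> k) & 1;
            if (tablepresent == 1)
            {
                rows[k] = BitConverter.ToInt32(metadata, tableoffset);
                tableoffset += 4; // advance to the next present table's count
            }
        }
        return rows;
    }

    public static void Main()
    {
        // Bits 0, 2 and 3 on (0x0D); counts 7, 9 and 11 stored from offset 0.
        byte[] metadata = { 7, 0, 0, 0, 9, 0, 0, 0, 11, 0, 0, 0 };
        int[] rows = FillRows(0x0D, metadata, 0);
        Console.WriteLine($"{rows[0]} {rows[1]} {rows[2]} {rows[3]}"); // 7 0 9 11
    }
}
```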
Let's now understand the concepts of metadata in
greater detail. The physical and logical representation of metadata is the
same. Some fields can be ignored while reading. Everything that represents
metadata is stored in streams. The #Strings stream stores identifier strings
and is also called the string heap.
The #Blob heap is the most complex of them all, as
function signatures are stored here in an extremely compressed form. The best
way to get a headache is to try to understand the #Blob stream. The reason why
it is called a blob is that it has no structure at all and every byte stands by
itself.
The other streams have a form and structure. The
other important stream is the #~ as this is what contains the actual metadata
or physical tables. The documentation tells us very clearly that some compilers
do not call the stream #~ but #-. This is an uncompressed, non-optimized representation of the metadata tables.
It also uses some extra tables to store pointer
values. We will not go into details other than tell you the good news, this
stream is not part of the ECMA standard. The documentation is very clear that
the streams will appear only once and we cannot have two #Blob streams for
example.
All the streams do not have to be present, but normally
they are. In our case we have no strings passed as parameters and hence the #US
stream is not there. We have not come across a single .net executable that did not have at least four streams at the
bare minimum.
The #Strings stream contains a series of null
terminated strings that are accessed from the #~ stream. The first entry is
always 0. The #Strings stream may contain garbage but those offsets are not
addressable from tables. Those that are accessible are always valid null
terminated ASCII or UTF8 strings.
The first byte of the blob heap is always a size
byte that tells us how many bytes follow. This is not strictly true as
otherwise the size of the blob data would always be less than 255 bytes. Thus
data in the blob heap is compressed including the size byte. All this will be
explained later on in the book.
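As a preview of the compression explained later, the size byte follows the ECMA-335 compressed unsigned integer scheme: 1 byte for values that fit in 7 bits, 2 bytes (top bits 10) for 14 bits, 4 bytes (top bits 110) for 29 bits. The method name below is ours:

```csharp
using System;

public class BlobLengthDemo
{
    // Decodes an ECMA-335 compressed unsigned integer from 'blob' at
    // 'pos', returning the value and how many bytes encoded it.
    public static (int length, int bytesUsed) ReadLength(byte[] blob, int pos)
    {
        byte b = blob[pos];
        if ((b & 0x80) == 0)
            return (b, 1);                          // 0xxxxxxx: 7-bit value
        if ((b & 0xC0) == 0x80)                     // 10xxxxxx: 14-bit value
            return (((b & 0x3F) << 8) | blob[pos + 1], 2);
        return (((b & 0x1F) << 24) | (blob[pos + 1] << 16)  // 110xxxxx
              | (blob[pos + 2] << 8) | blob[pos + 3], 4);   // 29-bit value
    }

    public static void Main()
    {
        Console.WriteLine(ReadLength(new byte[] { 0x03 }, 0).length);       // 3
        Console.WriteLine(ReadLength(new byte[] { 0x81, 0x00 }, 0).length); // 256
    }
}
```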
This is required as the blob heap has no null
marker. The first entry is again 0. The #GUID stream contains 128 bit Guids
which are nothing but large numbers to uniquely identify something like our
image. The #~ stream is the only stream that has its own structure, parts of
what we have touched upon earlier.
The first four bytes are always 0 as they are
reserved. The next two bytes are the pesky version numbers of the table data
schemata, which will be 1 for the major version and 0 for the minor version. The
fourth field is the heapsizes byte that we did earlier and the fifth field is a
reserved byte whose value is 1.
Some reserved fields have a value of 1 and some 0;
put your thinking cap on to figure out why, we tried and failed. The next 8
bytes are called valid as we explained before, followed by 8 bytes of a field
called sorted and then the number of rows per table. After this is the actual
data for the tables.
The valid field is called the bitvector as each bit
denotes whether a table is present or not. As of now there are only 43, or 0x2B,
tables defined and hence all higher bits will be set to 0. The
actual data for the tables depends on the structure of each table.
Thus if a certain table's row is 20 bytes wide and the table
has 10 rows, 200 bytes will be the data for the table. Once these 200 bytes get
over, the data for the next table will start.
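This row-size arithmetic can be sketched as follows; the method name and the made-up row widths are ours:

```csharp
using System;

public class TableOffsetDemo
{
    // Walks the table data region: each present table occupies
    // rowsize * rowcount bytes, and the next one starts right after.
    public static int[] TableStarts(int[] rows, int[] sizes, int dataStart)
    {
        int[] starts = new int[rows.Length];
        int pos = dataStart;
        for (int k = 0; k < rows.Length; k++)
        {
            starts[k] = pos;
            pos += rows[k] * sizes[k]; // absent tables have 0 rows: no move
        }
        return starts;
    }

    public static void Main()
    {
        int[] rows = { 1, 0, 4 };    // table 1 absent
        int[] sizes = { 10, 6, 20 }; // per-row byte widths (made up)
        int[] starts = TableStarts(rows, sizes, 100);
        Console.WriteLine($"{starts[0]} {starts[1]} {starts[2]}"); // 100 110 110
    }
}
```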
Program5.csc
int [] sizes;
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
ReadandDisplayCLRHeader();
ReadStreamsData();
FillTableSizes();
}
public void FillTableSizes()
{
int modulesize = 2 + offsetstring + offsetguid +
offsetguid + offsetguid ;
int typerefsize
= GetCodedIndexSize ("ResolutionScope") + offsetstring +
offsetstring ;
int typedefsize = 4 + offsetstring + offsetstring +
GetCodedIndexSize("TypeDefOrRef") + GetTableSize("Method")
+ GetTableSize("Field");
int fieldsize = 2 + offsetstring + offsetblob ;
int methodsize = 4 + 2 + 2 + offsetstring +
offsetblob + GetTableSize("Param");
int paramsize = 2 + 2 + offsetstring;
int interfaceimplsize =
GetTableSize("TypeDef") +
GetCodedIndexSize("TypeDefOrRef");
int memberrefsize =
GetCodedIndexSize("MemberRefParent") + offsetstring + offsetblob ;
int constantsize = 2 +
GetCodedIndexSize("HasConst") + offsetblob;
int customattributesize =
GetCodedIndexSize("HasCustomAttribute") +
GetCodedIndexSize("HasCustomAttributeType") + offsetblob;
int fieldmarshallsize =
GetCodedIndexSize("HasFieldMarshal") + offsetblob;
int declsecuritysize = 2 +
GetCodedIndexSize("HasDeclSecurity") + offsetblob;
int classlayoutsize = 2 + 4 +
GetTableSize("TypeDef");
int fieldlayoutsize = 4 +
GetTableSize("Field");
int stanalonssigsize = offsetblob;
int eventmapsize =
GetTableSize("TypeDef") +
GetTableSize("Event");
int eventsize = 2 + offsetstring +
GetCodedIndexSize("TypeDefOrRef");
int propertymapsize =
GetTableSize("Properties") + GetTableSize("TypeDef") ;
int propertysize = 2 + offsetstring + offsetblob;
int methodsemantics = 2 +
GetTableSize("Method") + GetCodedIndexSize("HasSemantics");
int methodimplsize =
GetTableSize("TypeDef") +
GetCodedIndexSize("MethodDefOrRef") +
GetCodedIndexSize("MethodDefOrRef");
int modulerefsize = offsetstring;
int typespecsize = offsetblob;
int implmapsize = 2 +
GetCodedIndexSize("MemberForwarded") + offsetstring +
GetTableSize("ModuleRef");
int fieldrvasize = 4 + GetTableSize("Field");
int assemblysize = 4 + 2 + 2 + 2 + 2 + 4 + offsetblob + offsetstring + offsetstring ;
int assemblyrefsize = 2 + 2 + 2 + 2 + 4 + offsetblob + offsetstring + offsetstring +
offsetblob;
int filesize = 4 + offsetstring + offsetblob;
int exportedtype = 4 + 4 + offsetstring +
offsetstring + GetCodedIndexSize("Implementation");
int manifestresourcesize = 4 + 4 + offsetstring +
GetCodedIndexSize("Implementation");
int nestedclasssize =
GetTableSize("TypeDef") + GetTableSize("TypeDef") ;
sizes = new int[]{ modulesize, typerefsize ,
typedefsize ,2, fieldsize ,2,methodsize ,2,paramsize
,interfaceimplsize,memberrefsize ,constantsize ,customattributesize
,fieldmarshallsize ,declsecuritysize ,classlayoutsize
,fieldlayoutsize,stanalonssigsize ,eventmapsize ,2,eventsize ,propertymapsize
,2,propertysize ,methodsemantics ,methodimplsize ,modulerefsize ,typespecsize
,implmapsize ,fieldrvasize ,2 , 2 , assemblysize ,4,12,assemblyrefsize
,6,14,filesize ,exportedtype ,manifestresourcesize ,nestedclasssize };
}
public int GetCodedIndexSize(string nameoftable)
{
if ( nameoftable == "Implementation")
{
if ( rows[0x26] >= 16384 || rows[0x23] >=
16384 || rows[0x27] >= 16384 )
return 4;
else
return 2;
}
else if ( nameoftable ==
"MemberForwarded")
{
if ( rows[0x04] >= 32768 || rows[0x06] >=
32768)
return 4;
else
return 2;
}
else if ( nameoftable == "MethodDefOrRef")
{
if ( rows[0x06] >= 32768 || rows[0x0A] >=
32768)
return 4;
else
return 2;
}
else if ( nameoftable == "HasSemantics")
{
if ( rows[0x14] >= 32768 || rows[0x17] >=
32768)
return 4;
else
return 2;
}
else if ( nameoftable ==
"HasDeclSecurity")
{
if ( rows[0x02] >= 16384 || rows[0x06] >=
16384 || rows[0x20] >= 16384)
return 4;
else
return 2;
}
else if ( nameoftable ==
"HasFieldMarshal")
{
if ( rows[0x04] >= 32768|| rows[0x08] >=
32768)
return 4;
else
return 2;
}
else if ( nameoftable == "TypeDefOrRef")
{
if ( rows[0x02] >= 16384 || rows[0x01] >=
16384 || rows[0x1B] >= 16384 )
return 4;
else
return 2;
}
else if ( nameoftable ==
"ResolutionScope")
{
if ( rows[0x00] >= 16384 || rows[0x1a] >=
16384 || rows[0x23] >= 16384 || rows[0x01] >= 16384 )
return 4;
else
return 2;
}
else if ( nameoftable == "HasConst")
{
if ( rows[4] >= 16384 || rows[8] >= 16384 ||
rows[0x17] >= 16384 )
return 4;
else
return 2;
}
else if ( nameoftable ==
"MemberRefParent")
{
if ( rows[0x08] >= 8192 || rows[0x04] >= 8192
|| rows[0x17] >= 8192 )
return 4;
else
return 2;
}
else if ( nameoftable ==
"HasCustomAttribute")
{
if ( rows[0x06] >= 2048 || rows[0x04] >= 2048
|| rows[0x01] >= 2048 || rows[0x02] >= 2048 || rows[0x08] >= 2048 ||
rows[0x09] >= 2048 || rows[0x0a] >= 2048 || rows[0x00] >= 2048 ||
rows[0x0e] >= 2048 || rows[0x17] >= 2048 || rows[0x14] >= 2048 ||
rows[0x11] >= 2048 || rows[0x1a] >= 2048 || rows[0x1b] >= 2048 ||
rows[0x20] >= 2048 || rows[0x23] >= 2048 || rows[0x26] >= 2048 || rows[0x27]
>= 2048 || rows[0x28] >= 2048 )
return 4;
else
return 2;
}
else if ( nameoftable ==
"HasCustomAttributeType")
{
if ( rows[2] >= 8192 || rows[1] >= 8192 ||
rows[6] >= 8192 || rows[0x0a] >= 8192 )
return 4;
else
return 2;
}
else
return 2;
}
public int GetTableSize(string tablename)
{
int ii;
for ( ii = 0 ; ii < tablenames.Length ; ii++)
{
if ( tablename == tablenames[ii] )
break;
}
if ( rows[ii] >=65535)
return 4;
else
return 2;
}
}
The above example is one of the most difficult
examples to understand and you may want to read the explanation n number of
times where n can at times exceed infinity. Or better still, skip this program
and read the next one and then maybe come back to it once again. We have only
created one array of ints called sizes.
The abc function has one more function call
FillTableSizes. All that we have done in this function is initialize variables
like modulesize, typerefsize etc and while creating the array sizes, used these
variables to initialize the array called sizes.
Thus the zeroth member of the sizes array is filled
up by the modulesize variable, the first member by typerefsize etc. If you
remember the module table is known by a number 0, the typeref table by number 1
and so on. The problem is that the metadata documentation does not tell us the
size of any table.
If we do not know the size of a table, how do we
read the metadata present in the #~ stream? The docs do not
specify the size of a table for reasons of efficiency. The size of the data
decides the size of the table.
The smaller the table, the less space it
occupies and the faster the data associated with it can be accessed.
Let's look at the module table and figure out its size. The first field,
called Generation, is reserved and is always 2 bytes.
We will come across lots of fields like the one
above that have a fixed size come heaven or hell. The second field of the
module table is the name of the module that is an offset into the strings
stream. This is where we have a problem as the specifications cannot tell us
whether the index should be 2 or 4 bytes long.
This is decided by the size of the stream #Strings.
This is where we use our variable offsetstring to figure out whether the size
is 2 or 4 bytes. In the same vein, the next three fields are offsets into the
guid stream and therefore we use the offsetguid variable.
Thus the value of the variable modulesize cannot be
figured out in advance and it depends upon the sizes of the two streams Strings
and Guid. This is how we dynamically determine the size of each table and place
that value in the sizes array. The second table TypeRef creates a bigger
problem.
Thus lets skip that for a moment and move on to the
third table TypeDef. The last field of this table is an offset into the Field
table. We cannot assume a size of 2 for this field as then we are restricting
the field table to 65536 records.
Thus we use a function called GetTableSize to
figure out whether the index should be 2 or 4 bytes. We first need to convert
the table name into its number and we scan through the tablenames array and
break out when we meet a match. At this time the variable ii is the table
number.
We then check the corresponding member of the rows
array which if you remember contains the number of rows present in each table.
If the rows member is greater than 65535 we return 4, else we return 2.
Normally tables do not have so many records and we could get away with assuming
an index size of 2.
Lets look at the fourth field of the TypeDef table.
This is also an offset into a table, but with a slight twist. The offset can be
to one of three tables, TypeDef, TypeRef or TypeSpec. This is a very common
construct in the metadata world where a field can refer to one of many tables
and it is called a coded index.
Thus we set aside 2 bits to store the table that the index
belongs to. In this specific case, a value of 0 in the first two bits makes
the remaining 14 bits point to a record in the TypeDef table, 1 signifies
the TypeRef table and 2, the TypeSpec table. We can codify 4 tables within 2
bits.
Thus we have, as mentioned before, 14 bits to store
the record number. Thus if any of the three tables has a record count of 16384
or more, the size of the field becomes 4 bytes. The key point is that any one of the
three tables reaching a row count of 16384 is enough.
Two of the three tables may have 0 rows, but if one
of them reaches 16384 rows, the index size is 4. There are a dozen such coded indexes and the
function GetCodedIndexSize first checks the name of the coded index passed as a
parameter.
It then checks if any of the tables exceed the
number of rows that is decided by the number of bits used to code the tables.
This function finally returns 2 or 4.
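The rule behind all those if statements can be sketched generically; the method name and the generic tagBits parameter are ours, as the program hard-codes each coded index instead:

```csharp
using System;

public class CodedIndexDemo
{
    // A coded index steals 'tagBits' bits from a 16-bit value to say which
    // table is meant, leaving 16 - tagBits bits for the row number. If any
    // candidate table has that many rows or more, the index grows to 4 bytes.
    public static int CodedIndexSize(int tagBits, params int[] rowCounts)
    {
        int limit = 1 << (16 - tagBits);
        foreach (int count in rowCounts)
            if (count >= limit)
                return 4;
        return 2;
    }

    public static void Main()
    {
        // TypeDefOrRef: 2 tag bits, so the limit is 16384 rows.
        Console.WriteLine(CodedIndexSize(2, 100, 200, 300)); // 2
        Console.WriteLine(CodedIndexSize(2, 0, 16384, 0));   // 4
    }
}
```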
Thus, at the end of the day, all that we do is simply initialize the sizes
array with the actual row size of each table.
Program6.csc
public struct FieldPtrTable
{
public int index;
}
public struct MethodPtrTable
{
public int index;
}
public struct ExportedTypeTable
{
public int flags ;
public int typedefindex ;
public int name ;
public int nspace ;
public int coded ;
}
public struct NestedClassTable
{
public int nestedclass;
public int enclosingclass;
}
public struct MethodImpTable
{
public int classindex;
public int codedbody;
public int codeddef;
}
public struct ClassLayoutTable
{
public short packingsize ;
public int classsize ;
public int parent ;
}
public struct ManifestResourceTable
{
public int offset;
public int flags;
public int name;
public int coded;
}
public struct ModuleRefTable
{
public int name;
}
public struct FileTable
{
public int flags;
public int name;
public int index;
}
public struct EventTable
{
public short attr;
public int name;
public int coded;
}
public struct EventMapTable
{
public int index;
public int eindex;
}
public struct MethodSemanticsTable
{
public short methodsemanticsattributes;
public int methodindex;
public int association;
}
public struct PropertyMapTable
{
public int parent;
public int propertylist;
}
public struct PropertyTable
{
public int flags;
public int name;
public int type;
}
public struct ConstantsTable
{
public short dtype;
public int parent;
public int value ;
}
public struct FieldLayoutTable
{
public int offset;
public int fieldindex;
}
public struct FieldRVATable
{
public int rva ;
public int fieldi;
}
public struct FieldMarshalTable
{
public int coded;
public int index;
}
public struct FieldTable
{
public int flags;
public int name;
public int sig;
}
public struct ParamTable
{
public short pattr;
public int sequence;
public int name;
}
public struct TypeSpecTable
{
public int signature;
}
public struct MemberRefTable
{
public int clas;
public int name;
public int sig;
}
public struct StandAloneSigTable
{
public int index;
}
public struct InterfaceImplTable
{
public int classindex;
public int interfaceindex;
}
public struct TypeDefTable
{
public int flags;
public int name;
public int nspace;
public int cindex;
public int findex;
public int mindex;
}
public struct CustomAttributeTable
{
public int parent;
public int type;
public int value;
}
public struct AssemblyRefTable
{
public short major,minor,build,revision;
public int flags ;
public int publickey ;
public int name ;
public int culture ;
public int hashvalue ;
}
public struct AssemblyTable
{
public int HashAlgId;
public int major, minor,build,revision ;
public int flags ;
public int publickey ;
public int name ;
public int culture ;
}
public struct ModuleTable
{
public int Generation;
public int Name;
public int Mvid;
public int EncId;
public int EncBaseId;
}
public struct TypeRefTable
{
public int resolutionscope;
public int name;
public int nspace;
}
public struct MethodTable
{
public int rva;
public int impflags;
public int flags;
public int name;
public int signature;
public int param;
}
public struct DeclSecurityTable
{
public int action;
public int coded;
public int bindex;
}
public struct ImplMapTable
{
public short attr;
public int cindex;
public int name;
public int scope;
}
public AssemblyTable [] AssemblyStruct;
public AssemblyRefTable [] AssemblyRefStruct ;
public CustomAttributeTable []
CustomAttributeStruct;
public ModuleTable[] ModuleStruct;
public TypeDefTable [] TypeDefStruct;
public TypeRefTable [] TypeRefStruct;
public InterfaceImplTable [] InterfaceImplStruct;
public FieldPtrTable [] FieldPtrStruct;
public MethodPtrTable [] MethodPtrStruct;
public MethodTable [] MethodStruct;
public StandAloneSigTable [] StandAloneSigStruct;
public MemberRefTable [] MemberRefStruct;
public TypeSpecTable [] TypeSpecStruct;
public ParamTable [] ParamStruct;
public FieldTable [] FieldStruct;
public FieldMarshalTable [] FieldMarshalStruct;
public FieldRVATable [] FieldRVAStruct;
public FieldLayoutTable [] FieldLayoutStruct;
public ConstantsTable [] ConstantsStruct;
public PropertyMapTable [] PropertyMapStruct;
public PropertyTable [] PropertyStruct;
public MethodSemanticsTable []
MethodSemanticsStruct;
public EventTable [] EventStruct;
public EventMapTable [] EventMapStruct;
public FileTable [] FileStruct;
public ModuleRefTable [] ModuleRefStruct;
public ManifestResourceTable []
ManifestResourceStruct;
public ClassLayoutTable [] ClassLayoutStruct;
public MethodImpTable [] MethodImpStruct;
public NestedClassTable [] NestedClassStruct;
public ExportedTypeTable [] ExportedTypeStruct;
public DeclSecurityTable [] DeclSecurityStruct;
public ImplMapTable [] ImplMapStruct;
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
ReadandDisplayCLRHeader();
ReadStreamsData();
FillTableSizes();
ReadTablesIntoStructures();
DisplayTablesForDebugging();
}
public void ReadTablesIntoStructures()
{
//Module
int old = tableoffset;
bool tablehasrows
= tablepresent(0);
int offs = tableoffset;
if ( debug )
Console.WriteLine("Module Table Offset {0}
Size {1}" , offs , sizes[0]);
tableoffset = old;
if ( tablehasrows
)
{
ModuleStruct = new ModuleTable[rows[0] + 1];
for ( int k = 1 ; k <= rows[0] ; k++)
{
ModuleStruct[k].Generation = BitConverter.ToUInt16
(metadata, offs);
offs += 2;
ModuleStruct[k].Name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
ModuleStruct[k].Mvid = ReadGuidIndex(metadata,
offs);
offs += offsetguid;
ModuleStruct[k].EncId = ReadGuidIndex(metadata,
offs);
offs += offsetguid;
ModuleStruct[k].EncBaseId = ReadGuidIndex(metadata,
offs);
offs += offsetguid;
}
}
//TypeRef
old = tableoffset;
tablehasrows = tablepresent(1);
offs = tableoffset;
if ( debug )
Console.WriteLine("TypeRef Table Offset {0} Size
{1}" , offs , sizes[1]);
tableoffset = old;
if ( tablehasrows )
{
TypeRefStruct = new TypeRefTable[rows[1] + 1];
for ( int k = 1 ; k <=rows[1] ; k++)
{
TypeRefStruct[k].resolutionscope =
ReadCodedIndex(metadata , offs , "ResolutionScope");
offs = offs +
GetCodedIndexSize("ResolutionScope");
TypeRefStruct[k].name = ReadStringIndex(metadata ,
offs);
offs = offs + offsetstring;
TypeRefStruct[k].nspace = ReadStringIndex(metadata
, offs);
offs = offs + offsetstring;
}
}
//TypeDef
old = tableoffset;
tablehasrows
= tablepresent(2);
offs = tableoffset;
if ( debug )
Console.WriteLine("TypeDef Table Offset {0}
Size {1}" , offs , sizes[2]);
tableoffset = old;
if ( tablehasrows )
{
TypeDefStruct = new TypeDefTable[rows[2] + 1];
for ( int k = 1 ; k <= rows[2] ; k++)
{
TypeDefStruct[k].flags = BitConverter.ToInt32
(metadata, offs);
offs += 4;
TypeDefStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
TypeDefStruct[k].nspace = ReadStringIndex(metadata,
offs);
offs += offsetstring;
TypeDefStruct[k].cindex = ReadCodedIndex (metadata
, offs , "TypeDefOrRef");
offs +=
GetCodedIndexSize("TypeDefOrRef");
TypeDefStruct[k].findex = ReadTableIndex(metadata,
offs , "Field");
offs += GetTableSize("Field");
TypeDefStruct[k].mindex = ReadTableIndex(metadata,
offs , "Method");
offs += GetTableSize("Method");
}
}
//FieldPtr
old = tableoffset;
tablehasrows
= tablepresent(3);
offs = tableoffset;
if ( debug )
Console.WriteLine("FieldPtr Table Offset {0}
Size {1}" , offs , sizes[3]);
tableoffset = old;
if ( tablehasrows )
{
FieldPtrStruct = new FieldPtrTable[rows[3] + 1];
for ( int k = 1 ; k <= rows[3] ; k++)
{
FieldPtrStruct[k].index =
BitConverter.ToInt16(metadata, offs);
offs += 2;
}
}
//Field
old = tableoffset;
tablehasrows
= tablepresent(4);
offs = tableoffset;
if ( debug )
Console.WriteLine("Field Table Offset {0} Size
{1}" , offs , sizes[4]);
tableoffset = old;
if ( tablehasrows )
{
FieldStruct = new FieldTable[rows[4] + 1];
for ( int k = 1 ; k <= rows[4] ; k++)
{
FieldStruct[k].flags = BitConverter.ToInt16
(metadata, offs);
offs += 2;
FieldStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
FieldStruct[k].sig = ReadBlobIndex(metadata, offs);
offs += offsetblob;
}
}
//MethodPtr
old = tableoffset;
tablehasrows
= tablepresent(5);
offs = tableoffset;
if ( debug )
Console.WriteLine("Method Table Offset {0}
Size {1}" , offs , sizes[5]);
tableoffset = old;
if ( tablehasrows )
{
MethodPtrStruct = new MethodPtrTable[rows[5] + 1];
for ( int k = 1 ; k <= rows[5] ; k++)
{
MethodPtrStruct[k].index =
BitConverter.ToInt16(metadata, offs);
offs += 2;
}
}
//Method
old = tableoffset;
tablehasrows
= tablepresent(6);
offs = tableoffset;
if ( debug )
Console.WriteLine("Method Table Offset {0}
Size {1}" , offs , sizes[6]);
tableoffset = old;
if ( tablehasrows )
{
MethodStruct = new MethodTable[rows[6] + 1];
for ( int k = 1 ; k <= rows[6] ; k++)
{
MethodStruct[k].rva = BitConverter.ToInt32
(metadata, offs);
offs += 4;
MethodStruct[k].impflags = BitConverter.ToInt16
(metadata, offs);
offs += 2;
MethodStruct[k].flags = (int)BitConverter.ToInt16
(metadata, offs);
offs += 2;
MethodStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
MethodStruct[k].signature = ReadBlobIndex(metadata,
offs);
offs += offsetblob;
MethodStruct[k].param = ReadTableIndex(metadata,
offs , "Param");
offs += GetTableSize("Param");
}
}
//Param
old = tableoffset;
tablehasrows
= tablepresent(8);
offs = tableoffset;
if ( debug )
Console.WriteLine("Param Table Offset {0} Size
{1}" , offs , sizes[8]);
tableoffset = old;
if ( tablehasrows )
{
ParamStruct = new ParamTable[rows[8] + 1];
for ( int k = 1 ; k <= rows[8] ; k++)
{
ParamStruct[k].pattr = BitConverter.ToInt16
(metadata, offs);
offs += 2;
ParamStruct[k].sequence = BitConverter.ToInt16 (metadata,
offs);
offs += 2;
ParamStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
}
}
//InterfaceImpl
old = tableoffset;
tablehasrows
= tablepresent(9);
offs = tableoffset;
if ( debug )
Console.WriteLine("InterfaceImpl Table Offset {0}
Size {1}" , offs , sizes[9]);
tableoffset = old;
if ( tablehasrows )
{
InterfaceImplStruct = new
InterfaceImplTable[rows[9] + 1];
for ( int k = 1 ; k <= rows[9] ; k++)
{
InterfaceImplStruct[k].classindex =
ReadCodedIndex(metadata , offs , "TypeDefOrRef");
offs +=
GetCodedIndexSize("TypeDefOrRef");
InterfaceImplStruct[k].interfaceindex =
ReadTableIndex(metadata, offs , "TypeDef");
offs += GetTableSize("TypeDef");
}
}
//MemberRef
old = tableoffset;
tablehasrows
= tablepresent(10);
offs = tableoffset;
if ( debug )
Console.WriteLine("MemberRef Table Offset {0}
Size {1}" , offs, sizes[10]);
tableoffset = old;
if ( tablehasrows )
{
MemberRefStruct = new MemberRefTable[rows[10] + 1];
for ( int k = 1 ; k <= rows[10] ; k++)
{
MemberRefStruct[k].clas = ReadCodedIndex(metadata ,
offs , "MemberRefParent");
offs +=
GetCodedIndexSize("MemberRefParent");
MemberRefStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
MemberRefStruct[k].sig = ReadBlobIndex(metadata, offs);
offs += offsetblob;
}
}
//Constants
old = tableoffset;
tablehasrows
= tablepresent(11);
offs = tableoffset;
if ( debug )
Console.WriteLine("Constant Table Offset {0}
Size {1}" , offs, sizes[11]);
tableoffset = old;
if ( tablehasrows )
{
ConstantsStruct = new ConstantsTable[rows[11] + 1];
for ( int k = 1 ; k <= rows[11] ; k++)
{
ConstantsStruct[k].dtype = metadata[offs];
offs += 2;
ConstantsStruct[k].parent = ReadCodedIndex(metadata
, offs , "HasConst");
offs += GetCodedIndexSize("HasConst");
ConstantsStruct[k].value = ReadBlobIndex(metadata,
offs);
offs += offsetblob;
}
}
//CustomAttribute
old = tableoffset;
tablehasrows
= tablepresent(12);
offs = tableoffset;
if ( debug )
Console.WriteLine("CustomAttribute Table
Offset {0} Size {1}" , offs , sizes[12]);
tableoffset = old;
if ( tablehasrows )
{
CustomAttributeStruct = new
CustomAttributeTable[rows[12] + 1];
for ( int k = 1 ; k <= rows[12] ; k++)
{
CustomAttributeStruct[k].parent =
ReadCodedIndex(metadata , offs , "HasCustomAttribute");
offs +=
GetCodedIndexSize("HasCustomAttribute");
CustomAttributeStruct[k].type =
ReadCodedIndex(metadata , offs , "HasCustomAttributeType");
offs +=
GetCodedIndexSize("HasCustomAttributeType");
CustomAttributeStruct[k].value =
ReadBlobIndex(metadata, offs);
offs += offsetblob;
}
}
//FieldMarshal
old = tableoffset;
tablehasrows
= tablepresent(13);
offs = tableoffset;
if ( debug )
Console.WriteLine("FieldMarshal Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
FieldMarshalStruct = new FieldMarshalTable[rows[13]
+ 1];
for ( int k = 1 ; k <= rows[13] ; k++)
{
FieldMarshalStruct[k].coded =
ReadCodedIndex(metadata , offs , "HasFieldMarshal");
offs +=
GetCodedIndexSize("HasFieldMarshal");
FieldMarshalStruct[k].index =
ReadBlobIndex(metadata, offs);
offs += offsetblob;
}
}
//DeclSecurity
old = tableoffset;
tablehasrows
= tablepresent(14);
offs = tableoffset;
if ( debug )
Console.WriteLine("DeclSecurity Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
DeclSecurityStruct = new DeclSecurityTable[rows[14]
+ 1];
for ( int k = 1 ; k <= rows[14] ; k++)
{
DeclSecurityStruct[k].action = BitConverter.ToInt16
(metadata, offs);
offs += 2;
DeclSecurityStruct[k].coded =
ReadCodedIndex(metadata , offs , "HasDeclSecurity");
offs +=
GetCodedIndexSize("HasDeclSecurity");
DeclSecurityStruct[k].bindex =
ReadBlobIndex(metadata, offs);
offs += offsetblob;
}
}
//ClassLayout
old = tableoffset;
tablehasrows
= tablepresent(15);
offs = tableoffset;
if ( debug )
Console.WriteLine("ClassLayout Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
ClassLayoutStruct = new ClassLayoutTable[rows[15] +
1];
for ( int k = 1 ; k <= rows[15] ; k++)
{
ClassLayoutStruct[k].packingsize =
BitConverter.ToInt16 (metadata, offs);
offs += 2;
ClassLayoutStruct[k].classsize =
BitConverter.ToInt32 (metadata, offs);
offs += 4;
ClassLayoutStruct[k].parent =
ReadTableIndex(metadata, offs , "TypeDef");
offs += GetTableSize("TypeDef");
}
}
//FieldLayout
old = tableoffset;
tablehasrows
= tablepresent(16);
offs = tableoffset;
if ( debug )
Console.WriteLine("FieldLayout Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
FieldLayoutStruct = new FieldLayoutTable[rows[16] +
1];
for ( int k = 1 ; k <= rows[16] ; k++)
{
FieldLayoutStruct[k].offset = BitConverter.ToInt32
(metadata, offs);
offs += 4;
FieldLayoutStruct[k].fieldindex =
ReadTableIndex(metadata, offs , "Field");
offs += GetTableSize("Field");
}
}
//StandAloneSig
old = tableoffset;
tablehasrows
= tablepresent(17);
offs = tableoffset;
if ( debug )
Console.WriteLine("StandAloneSig Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
StandAloneSigStruct = new
StandAloneSigTable[rows[17] + 1];
for ( int k = 1 ; k <= rows[17] ; k++)
{
StandAloneSigStruct[k].index =
ReadBlobIndex(metadata, offs);
offs += offsetblob;
}
}
//EventMap
old = tableoffset ;
tablehasrows
= tablepresent(18);
offs = tableoffset;
if ( debug )
Console.WriteLine("EventMap Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
EventMapStruct = new EventMapTable [rows[18] + 1];
for ( int k = 1 ; k <= rows[18] ; k++)
{
EventMapStruct[k].index = ReadTableIndex(metadata,
offs , "TypeDef");
offs += GetTableSize("TypeDef");
EventMapStruct[k].eindex = ReadTableIndex(metadata,
offs , "Event");
offs += GetTableSize("Event");
}
}
//Event
old = tableoffset;
tablehasrows
= tablepresent(20);
offs = tableoffset;
if ( debug )
Console.WriteLine("Event Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
EventStruct = new EventTable[rows[20] + 1];
for ( int k = 1 ; k <= rows[20] ; k++)
{
EventStruct[k].attr = BitConverter.ToInt16
(metadata, offs);
offs += 2;
EventStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
EventStruct[k].coded = ReadCodedIndex(metadata ,
offs , "TypeDefOrRef");
offs +=
GetCodedIndexSize("TypeDefOrRef");
}
}
//PropertyMap
old = tableoffset;
tablehasrows
= tablepresent(21);
offs = tableoffset;
if ( debug )
Console.WriteLine("PropertyMap Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
PropertyMapStruct = new PropertyMapTable[rows[21] +
1];
for ( int k = 1 ; k <= rows[21] ; k++)
{
PropertyMapStruct[k].parent =
ReadTableIndex(metadata, offs , "TypeDef");
offs += GetTableSize("TypeDef");
PropertyMapStruct[k].propertylist =
ReadTableIndex(metadata, offs , "Properties");
offs += GetTableSize("Properties");
}
}
//Property
old = tableoffset;
tablehasrows
= tablepresent(23);
offs = tableoffset;
if ( debug )
Console.WriteLine("Property Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
PropertyStruct = new PropertyTable[rows[23] + 1];
for ( int k = 1 ; k <= rows[23] ; k++)
{
PropertyStruct[k].flags = BitConverter.ToInt16
(metadata, offs);
offs += 2;
PropertyStruct[k].name= ReadStringIndex(metadata,
offs);
offs += offsetstring;
PropertyStruct[k].type = ReadBlobIndex(metadata,
offs);
offs += offsetblob;
}
}
//MethodSemantics
old = tableoffset ;
tablehasrows
= tablepresent(24);
offs = tableoffset;
if ( debug )
Console.WriteLine("MethodSemantics Table
Offset {0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
MethodSemanticsStruct = new
MethodSemanticsTable[rows[24] + 1];
for ( int k = 1 ; k <= rows[24] ; k++)
{
MethodSemanticsStruct[k].methodsemanticsattributes = BitConverter.ToInt16 (metadata, offs);
offs += 2;
MethodSemanticsStruct[k].methodindex =
ReadTableIndex(metadata, offs , "Method");
offs += GetTableSize("Method");
MethodSemanticsStruct[k].association =
ReadCodedIndex(metadata , offs , "HasSemantics");
offs +=
GetCodedIndexSize("HasSemantics");
}
}
//MethodImpl
old = tableoffset;
tablehasrows
= tablepresent(25);
offs = tableoffset;
if ( debug )
Console.WriteLine("MethodImpl Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
MethodImpStruct = new MethodImpTable[rows[25] + 1];
for ( int k = 1 ; k <= rows[25] ; k++)
{
MethodImpStruct[k].classindex =
ReadTableIndex(metadata, offs , "TypeDef");
offs += GetTableSize("TypeDef");
MethodImpStruct[k].codedbody =
ReadCodedIndex(metadata , offs , "MethodDefOrRef");
offs +=
GetCodedIndexSize("MethodDefOrRef");
MethodImpStruct[k].codeddef =
ReadCodedIndex(metadata , offs , "MethodDefOrRef");
offs +=
GetCodedIndexSize("MethodDefOrRef");
}
}
//ModuleRef
old = tableoffset;
tablehasrows
= tablepresent(26);
offs = tableoffset;
if ( debug )
Console.WriteLine("ModuleRef Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
ModuleRefStruct = new ModuleRefTable[rows[26] + 1];
for ( int k = 1 ; k <= rows[26] ; k++)
{
ModuleRefStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
}
}
//TypeSpec
old = tableoffset;
tablehasrows
= tablepresent(27);
offs = tableoffset;
if ( debug )
Console.WriteLine("TypeSpec Table Offset {0}
size={1}" , offs , rows[27]);
tableoffset = old;
if ( tablehasrows )
{
TypeSpecStruct = new TypeSpecTable[rows[27] + 1];
for ( int k = 1 ; k <= rows[27] ; k++)
{
//if ( debug )
//Console.WriteLine("TypeSpec Table Offset
offs={0} k={1} Length={2}" , offs , k , metadata.Length);
TypeSpecStruct[k].signature =
ReadBlobIndex(metadata, offs);
offs += offsetblob;
}
}
//ImplMap
old = tableoffset;
tablehasrows
= tablepresent(28);
offs = tableoffset;
if ( debug )
Console.WriteLine("ImplMap Table Offset
offs={0} rows={1} len={2}" , offs , rows[28] , metadata.Length);
tableoffset = old;
if ( tablehasrows )
{
ImplMapStruct = new ImplMapTable[rows[28] + 1];
for ( int k = 1 ; k <= rows[28] ; k++)
{
ImplMapStruct[k].attr = BitConverter.ToInt16
(metadata, offs);
offs += 2;
ImplMapStruct[k].cindex = ReadCodedIndex(metadata ,
offs , "MemberForwarded");
offs +=
GetCodedIndexSize("MemberForwarded");
ImplMapStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
ImplMapStruct[k].scope = ReadTableIndex(metadata,
offs , "ModuleRef");
offs += GetTableSize("ModuleRef");
}
}
//FieldRVA
old = tableoffset;
tablehasrows
= tablepresent(29);
offs = tableoffset;
if ( debug )
Console.WriteLine("FieldRVA Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
FieldRVAStruct = new FieldRVATable[rows[29] + 1];
for ( int k = 1 ; k <= rows[29] ; k++)
{
FieldRVAStruct[k].rva = BitConverter.ToInt32
(metadata, offs);
offs += 4;
FieldRVAStruct[k].fieldi = ReadTableIndex(metadata,
offs , "Field");
offs += GetTableSize("Field");
}
}
//Assembly
old = tableoffset;
tablehasrows
= tablepresent(32);
offs = tableoffset;
if ( debug )
Console.WriteLine("Assembly Table Offset
{0}" , offs);
tableoffset = old;
AssemblyStruct = new AssemblyTable[rows[32] + 1];
if ( tablehasrows )
{
for ( int k = 1 ; k <= rows[32] ; k++)
{
AssemblyStruct[k].HashAlgId = BitConverter.ToInt32
(metadata, offs);
offs += 4;
AssemblyStruct[k].major = BitConverter.ToInt16
(metadata, offs);
offs += 2;
AssemblyStruct[k].minor = BitConverter.ToInt16
(metadata, offs);
offs += 2;
AssemblyStruct[k].build= BitConverter.ToInt16
(metadata, offs);
offs += 2;
AssemblyStruct[k].revision = BitConverter.ToInt16
(metadata, offs);
offs += 2;
AssemblyStruct[k].flags = BitConverter.ToInt32
(metadata, offs);
offs += 4;
AssemblyStruct[k].publickey = ReadBlobIndex(metadata,
offs);
offs += offsetblob;
AssemblyStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
AssemblyStruct[k].culture =
ReadStringIndex(metadata, offs);
offs += offsetstring;
}
}
//AssemblyRef
old = tableoffset;
tablehasrows
= tablepresent(35);
offs = tableoffset;
if ( debug )
Console.WriteLine("AssembleyRef Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
AssemblyRefStruct = new AssemblyRefTable[rows[35] +
1];
for ( int k = 1 ; k <= rows[35]; k++)
{
AssemblyRefStruct[k].major = BitConverter.ToInt16
(metadata, offs);
offs += 2;
AssemblyRefStruct[k].minor = BitConverter.ToInt16
(metadata, offs);
offs += 2;
AssemblyRefStruct[k].build= BitConverter.ToInt16
(metadata, offs);
offs += 2;
AssemblyRefStruct[k].revision =
BitConverter.ToInt16 (metadata, offs);
offs += 2;
AssemblyRefStruct[k].flags = BitConverter.ToInt32
(metadata, offs);
offs += 4;
AssemblyRefStruct[k].publickey =
ReadBlobIndex(metadata, offs);
offs += offsetblob;
AssemblyRefStruct[k].name =
ReadStringIndex(metadata, offs);
offs += offsetstring;
AssemblyRefStruct[k].culture =
ReadStringIndex(metadata, offs);
offs += offsetstring;
AssemblyRefStruct[k].hashvalue =
ReadBlobIndex(metadata, offs);
offs += offsetblob;
}
}
//File
old = tableoffset;
tablehasrows
= tablepresent(38);
offs = tableoffset;
if ( debug )
Console.WriteLine("File Table Offset {0}"
, offs);
tableoffset = old;
if ( tablehasrows )
{
FileStruct = new FileTable[rows[38] + 1];
for ( int k = 1 ; k <= rows[38] ; k++)
{
FileStruct[k].flags = BitConverter.ToInt32
(metadata, offs);
offs += 4;
FileStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
FileStruct[k].index = ReadBlobIndex(metadata,
offs);
offs += offsetblob;
}
}
//ExportedType
old = tableoffset;
tablehasrows
= tablepresent(39);
offs = tableoffset;
if ( debug )
Console.WriteLine("ExportedType Table Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
ExportedTypeStruct = new ExportedTypeTable[rows[39]
+ 1];
for ( int k = 1 ; k <= rows[39] ; k++)
{
ExportedTypeStruct[k].flags = BitConverter.ToInt32
(metadata, offs);
offs += 4;
ExportedTypeStruct[k].typedefindex =
BitConverter.ToInt32 (metadata, offs);
offs += 4;
ExportedTypeStruct[k].name =
ReadStringIndex(metadata, offs);
offs += offsetstring;
ExportedTypeStruct[k].nspace =
ReadStringIndex(metadata, offs);
offs += offsetstring;
ExportedTypeStruct[k].coded = ReadCodedIndex (
metadata, offs , "Implementation");
offs +=
GetCodedIndexSize("Implementation");
}
}
//ManifestResource
old = tableoffset;
tablehasrows
= tablepresent(40);
offs = tableoffset;
if ( debug )
Console.WriteLine("ManifestResource Table
Offset {0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
ManifestResourceStruct = new
ManifestResourceTable[rows[40] + 1];
for ( int k = 1 ; k <= rows[40] ; k++)
{
ManifestResourceStruct[k].offset =
BitConverter.ToInt32 (metadata, offs);
offs += 4;
ManifestResourceStruct[k].flags =
BitConverter.ToInt32 (metadata, offs);
offs += 4;
ManifestResourceStruct[k].name = ReadStringIndex(metadata,
offs);
offs += offsetstring;
ManifestResourceStruct[k].coded =
ReadCodedIndex(metadata , offs , "Implementation");
offs += GetCodedIndexSize("");
}
}
//Nested Classes
old = tableoffset;
tablehasrows
= tablepresent(41);
offs = tableoffset;
if ( debug )
Console.WriteLine("Nested Classes Offset
{0}" , offs);
tableoffset = old;
if ( tablehasrows )
{
NestedClassStruct = new NestedClassTable[rows[41] +
1];
for ( int k = 1 ; k <= rows[41] ; k++)
{
NestedClassStruct[k].nestedclass=
ReadTableIndex(metadata, offs , "TypeDef");
offs += GetTableSize("TypeDef");
NestedClassStruct[k].enclosingclass=
ReadTableIndex(metadata, offs , "TypeDef");
offs += GetTableSize("TypeDef");
}
}
}
public bool tablepresent(byte tableindex)
{
int tablebit = (int)(valid >> tableindex)
& 1;
for ( int j = 0 ; j < tableindex ; j++)
{
int o = sizes[j] * rows[j];
tableoffset = tableoffset + o;
}
if ( tablebit == 1)
return true;
else
return false;
}
public int ReadCodedIndex(byte [] metadataarray ,
int offset , string nameoftable)
{
int returnindex = 0;
int codedindexsize =
GetCodedIndexSize(nameoftable);
if ( codedindexsize == 2)
returnindex = BitConverter.ToUInt16 (metadataarray
, offset );
if ( codedindexsize == 4)
returnindex = (int)BitConverter.ToUInt32
(metadataarray , offset );
return returnindex;
}
public int ReadTableIndex(byte [] metadataarray ,
int arrayoffset , string tablename)
{
int returnindex = 0;
int tablesize = GetTableSize(tablename);
if ( tablesize == 2)
returnindex = BitConverter.ToUInt16 (metadataarray ,
arrayoffset );
if ( tablesize == 4)
returnindex = (int)BitConverter.ToUInt32
(metadataarray , arrayoffset );
return returnindex;
}
public int ReadStringIndex(byte [] metadataarray ,
int arrayoffset)
{
int returnindex = 0;
if ( offsetstring == 2)
returnindex = BitConverter.ToUInt16 (metadataarray
, arrayoffset );
if ( offsetstring == 4)
returnindex = (int)BitConverter.ToUInt32
(metadataarray , arrayoffset );
return returnindex;
}
public int ReadBlobIndex (byte [] metadataarray ,
int arrayoffset)
{
int returnindex = 0;
if ( offsetblob == 2)
returnindex = BitConverter.ToUInt16 (metadataarray
, arrayoffset );
if ( offsetblob == 4)
returnindex = (int)BitConverter.ToUInt32
(metadataarray , arrayoffset );
return returnindex;
}
public int ReadGuidIndex (byte [] metadataarray ,
int arrayoffset)
{
int returnindex = 0;
if ( offsetguid == 2)
returnindex = BitConverter.ToUInt16 (metadataarray
, arrayoffset );
if ( offsetguid == 4)
returnindex = (int)BitConverter.ToUInt32
(metadataarray , arrayoffset );
return returnindex;
}
public void DisplayTablesForDebugging()
{
Console.WriteLine("Strings Table:{0}" ,
strings.Length);
for ( int o = 0 ; o < strings.Length ; o++)
{
if ( strings[o] == 0)
{
Console.WriteLine();
Console.Write("{0}:" , o+1);
}
else
Console.Write("{0}" , (char)strings[o]);
}
Console.WriteLine();
Console.WriteLine("Module Table: Records
{0}" , ModuleStruct.Length);
Console.WriteLine("Name={0} {1}",
GetString(ModuleStruct[1].Name) ,
ModuleStruct[1].Name.ToString("X"));
Console.WriteLine("Generation={0} Mvid={1}
EncId={2} EncBaseId={3}" , ModuleStruct[1].Generation ,
ModuleStruct[1].Mvid , ModuleStruct[1].EncId , ModuleStruct[1].EncBaseId);
Console.WriteLine();
Console.WriteLine("TypeRef Table: Records
{0}" , TypeRefStruct.Length );
for ( int o = 1 ; o < TypeRefStruct.Length ;
o++)
{
Console.WriteLine("Type {0}", o );
Console.WriteLine("Resolution Scope={0}
{1}" , GetResolutionScopeTable(TypeRefStruct[o].resolutionscope),
GetResolutionScopeValue(TypeRefStruct[o].resolutionscope));
Console.WriteLine("NameSpace={0} {1}" ,
GetString(TypeRefStruct[o].nspace) ,
TypeRefStruct[o].nspace.ToString("X"));
Console.WriteLine("Name={0} {1}" ,
GetString(TypeRefStruct[o].name),
TypeRefStruct[o].name.ToString("X"));
}
Console.WriteLine();
Console.WriteLine("TypeDef Table: Records {0}" , TypeDefStruct.Length );
for ( int o = 1 ; o < TypeDefStruct.Length ;
o++)
{
Console.WriteLine("Type {0} ", o) ;
Console.WriteLine("Name={0} {1}" ,
GetString(TypeDefStruct[o].name), TypeDefStruct[o].name.ToString("X"));
Console.WriteLine("NameSpace={0} {1}" ,
GetString(TypeDefStruct[o].nspace) ,
TypeDefStruct[o].nspace.ToString("X"));
Console.WriteLine("Field[{0}]",
TypeDefStruct[o].findex);
Console.WriteLine("Method[{0}]",
TypeDefStruct[o].mindex);
}
}
public string GetString (int starting)
{
int ending = starting;
if ( starting < 0)
return "";
if ( starting >= strings.Length )
return "";
while (strings[ending] != 0 )
{
ending++;
}
System.Text.Encoding e = System.Text.Encoding.UTF8;
string
returnstring = e.GetString(strings, starting , ending - starting );
if ( returnstring.Length == 0)
return "";
else
return returnstring;
}
public string GetResolutionScopeTable (int rvalue)
{
string returnstring = "";
int tag = rvalue & 0x03;
if ( tag == 0 )
returnstring = returnstring +
"Module" ;
if ( tag == 1 )
returnstring = returnstring +
"ModuleRef" ;
if ( tag == 2 )
returnstring = returnstring +
"AssemblyRef" ;
if ( tag == 3 )
returnstring = returnstring +
"TypeRef" ;
return returnstring;
}
public int GetResolutionScopeValue (int rvalue)
{
return rvalue >> 2;
}
}
Strings Table:96
1:<Module>
10:a.exe
16:mscorlib
25:System
32:Object
39:zzz
43:Main
48:.ctor
54:System.Diagnostics
73:DebuggableAttribute
93:a
95:
96:
Module Table: Records 2
Name=a.exe A
Generation=0 Mvid=1 EncId=0
EncBaseId=0
TypeRef Table: Records 3
Type 1
Resolution
Scope=AssemblyRef 1
NameSpace=System 19
Name=Object 20
Type 2
Resolution
Scope=AssemblyRef 1
NameSpace=System.Diagnostics
36
Name=DebuggableAttribute 49
TypeDef Table: Records 3
Type 1
Name=<Module> 1
NameSpace= 0
Field[1]
Method[1]
Type 2
Name=zzz 27
NameSpace= 0
Field[1]
Method[1]
The program above is one of the largest to date and
is the last program before we start writing our disassembler. We have added a
function ReadTablesIntoStructures that reads all the table data into
structures. All that we have done so far is read the stream called #~ into an
array metadata.
Starting at the 24th byte of this stream is a
series of ints that tell us the number of rows in each table. As we do not know
in advance which tables contain rows, the start of the actual table data is unknown
to us. The record size of each table is stored in the sizes array. Let's see how we
read the table data by looking specifically at the Module and TypeRef tables.
We start by initializing the variable old to
tableoffset, which points to the first byte of the table data.
In the function ReadStreamsData we kept increasing its value by 4 so that at
the end of the function it pointed to the start of the table data. We then call
a function called tablepresent.
This function returns true or false depending
upon whether the table number passed as a parameter has rows. The first
thing we do is find the bit representing the table by shifting that many
bits to the right and bitwise ANDing with one. None of this is new, as we
explained it earlier.
Depending on whether this bit is 1 or 0, we return
true or false. The assumption is that if the table is present, the table has
rows. Another way would be to check the value of the corresponding rows
entry: if it is zero, return false, else return true.
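The bit test described above can be sketched on its own. A minimal sketch follows; the mask values used in it are invented for illustration, not read from a real executable:

```csharp
using System;

public static class TableBits
{
    // True when table number `tableindex` (0..63) has its bit set in the
    // 64-bit valid mask read from the #~ stream header.
    public static bool IsPresent(ulong valid, byte tableindex)
    {
        return ((valid >> tableindex) & 1UL) == 1UL;
    }
}
```

For instance, a mask of 1 marks only the Module table (table 0) as present; setting bit 35 as well would mark the AssemblyRef table.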
The most important thing this function does is
change the value of the variable tableoffset so that it now points to the
start of the data for the table number passed as a parameter, and not the start
of the table data. We use a for loop to iterate over all the tables before the one
passed as a parameter.
We know that the number of records is stored in the
rows array and the size in the sizes array. We multiply the two to give us the
number of bytes occupied by this table and add this value stored in variable o to
the tableoffset variable.
This way, at the end of the loop, the tableoffset
variable contains the position of the first byte of data for this table
number. We then store the value of the tableoffset variable in the offs
variable and use this variable from now on. The WriteLine call is there to
display debugging information.
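The offset computation the loop performs can be sketched as a standalone helper; the rows and sizes values in the example are made up for illustration:

```csharp
public static class TableOffsets
{
    // Byte offset of table `tableindex`'s first record, relative to the
    // start of the table data, obtained by summing rows * record size for
    // every table that precedes it -- the same loop tablepresent runs.
    public static int OffsetOf(int[] rows, int[] sizes, int tableindex)
    {
        int offset = 0;
        for (int j = 0; j < tableindex; j++)
            offset += rows[j] * sizes[j];
        return offset;
    }
}
```

With 1 row of 10 bytes in table 0 and 3 rows of 6 bytes in table 1, table 2's data begins 28 bytes in.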
We can only read table data if the table has
records, so the rest of the code runs only when the tablehasrows variable is
true. The tableoffset variable is then reset to its original value stored in
old, the start of the table data. This is why we need the variable offs to
read the data of each table.
The problem with
arrays is that they start at 0 and not 1. This makes the program a little
counter-intuitive. We prefer languages that start arrays at one and not zero.
We need to store the module table data in a structure. Thus we created a
structure called ModuleTable that has five members or fields.
The second field, as explained before, is an index into
the strings heap. This index can be either 2 or 4 bytes. As we do not know its size in
advance, we take the larger, and hence the name field is an int. The first
field is the Generation field, and even though we know it is a short, we have
still made its data type an int, as we are not concerned with questions of
efficiency.
All that will happen is that our programs run a
little slower, but at least you understand what is happening. Remember, our job
is to help you write a disassembler, not a fast, efficient disassembler. We then
declare an array of ModuleTable structures called ModuleStruct as an instance
variable.
In this function we actually create the ModuleTable
array, sizing it to the number of rows stored in the rows array plus one.
The reason we add one is that we will store all the data starting at member
ModuleStruct[1] and not ModuleStruct[0]. Thus we have created a global
structure for the module table and an array of structures as an instance
variable.
We do the same for each table. Then, in a for loop, we read
that many records. We start the count at 1 and not 0, as the first array member
we initialize is at index 1. Thus ModuleStruct[0] is never
filled in. The Generation field is always a short and thus we use the ToUInt16
function to set its value.
The name field could be either 2 or 4 bytes large.
Thus we use a function ReadStringIndex to fetch its value. This function simply
looks at the value of the variable offsetstring and, if it is 2, uses the ToUInt16
function to read off the array, otherwise the ToUInt32 function.
The offs variable needs to be advanced to the next field
each time: for the Generation field the advance is always 2, and for the name field we
use the value of the offsetstring variable. The Guid and Blob indexes are read
using the same concept, and the functions ReadGuidIndex and ReadBlobIndex do the
job.
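The 2-or-4-byte index reads all share one pattern, which can be sketched as a single helper; the width parameter here stands in for offsetstring, offsetguid or offsetblob, and the byte values in the example are invented:

```csharp
using System;

public static class HeapIndex
{
    // Reads a heap index that is either 2 or 4 bytes wide -- the pattern
    // behind ReadStringIndex, ReadGuidIndex and ReadBlobIndex.
    public static int Read(byte[] data, int offset, int width)
    {
        if (width == 2)
            return BitConverter.ToUInt16(data, offset);
        return (int)BitConverter.ToUInt32(data, offset);
    }
}
```

The same bytes yield different indexes depending on the declared width, which is why the heap sizes must be worked out before any table is read.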
The TypeDef table poses a problem for the field
called cindex, as it is a coded index field. Thus we have a function
ReadCodedIndex that takes an extra parameter, the name of the coded
index. This function uses another function, GetCodedIndexSize, that we wrote earlier
to figure out the size of the coded index.
We use the return value to read either 2 or 4 bytes
from the array. The offs variable is increased by 2 or 4 depending upon the
return value of the GetCodedIndexSize function. The next field carries an
index into the Field table. We pass this table name as the last parameter
to the function ReadTableIndex.
We reuse the function GetTableSize that we wrote
earlier and, once again depending upon the value returned, read 2 or 4 bytes.
Ditto for the offs variable: it is advanced by the return value of this
function. Thus we have never assumed the size of a field unless the
specification very clearly states a constant number of bytes.
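A coded index packs a table tag into its low bits. For ResolutionScope the tag is 2 bits wide, matching the GetResolutionScopeTable and GetResolutionScopeValue functions in the listing; the raw value in the example below is constructed by hand:

```csharp
public static class CodedIndex
{
    // Splits a ResolutionScope coded index: the low 2 bits select the
    // table (0 Module, 1 ModuleRef, 2 AssemblyRef, 3 TypeRef) and the
    // remaining bits give the row number within that table.
    public static int Tag(int value) { return value & 0x03; }
    public static int Row(int value) { return value >> 2; }
}
```

A raw value of 6, that is (1 &lt;&lt; 2) | 2, decodes to tag 2 (AssemblyRef) and row 1 -- exactly the "Resolution Scope=AssemblyRef 1" line in the debugging output.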
Thus we have now filled up a series of arrays of
structures with table data from the metadata array, bearing in mind that the
first record is array dimension 1 and not 0.
Let's now actually display some table data using the second function we
have added, DisplayTablesForDebugging. In succeeding programs we simply comment
out this function.
We would first like to display the stream called
#Strings and then the first three tables. The array called strings contains the
#Strings stream, and we start by displaying its length. We then display each
byte as a char by moving through the strings array one byte at a time.
We know that each string is null terminated, and
hence when an array member is 0, we know that it is the end of a string. We
write out a new line and then display the corresponding offset of the string in
the array. This table is very useful for manually reading the string offsets
that are stored in the tables.
The first table we display is the Module table. We start
by displaying the length of the array object ModuleStruct and we get two in
spite of the fact that the module table can only have one record. If you have
forgotten, we have added one to the size of the array.
Thus to display the Name field, we use the member
ModuleStruct [1].Name and not ModuleStruct[0].Name. The problem is that we see
a hex number that is an offset into the strings heap. Thus we have written a
function GetString that fetches the string at the offset specified by the Name
field.
This function first needs to figure out the length
of the string. We have the string as a series of bytes in an array and need to
convert these bytes into a string. Thus we need to know where the last byte of
the string is. But first, some error checks.
If the value of the parameter passed is, for some
reason, negative or larger than the size of the strings array, we return an
empty string. We then scan through the strings array starting from the beginning
of the string: the variable ending, which we use as the array index, is
initialized to the start offset of the string held in the parameter
starting.
When we exit the while loop on reaching the null
character, the ending variable tells us where the string ends. We then use
the same GetString function that we used earlier. The only difference is
that the second parameter is the start of the string in the strings array,
which is the value of the variable starting.
The length of the string is the difference between
the end of the string and beginning and thus is the difference between the two
values ending and starting. Once again if the Length of the string is zero we
return an empty string. For the module table we know that we have only one
record and hence we used no loop.
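The scan just described is straightforward; here is a small Python sketch of the same idea (the book's code is C#, and the helper name here is ours). It applies the same error checks: a negative or out-of-range offset, or a zero-length string, yields an empty result.

```python
def get_string(strings, starting):
    """Fetch the null-terminated string beginning at offset `starting`
    in the #Strings heap."""
    if starting < 0 or starting >= len(strings):
        return ""                         # offset out of range
    ending = starting
    while strings[ending] != 0:           # scan forward to the terminating null
        ending += 1
    if ending - starting == 0:            # zero-length string
        return ""
    return bytes(strings[starting:ending]).decode("ascii")

heap = b"\x00Module\x00<Module>\x00"      # a tiny sample #Strings heap
```

Calling get_string(heap, 1) fetches "Module", while offset 0 or any bad offset returns the empty string.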
The TypeRef and TypeDef tables can have any number
of records. The first field is called the ResolutionScope field and
is a coded index. This coded index can take one of 4 values. Displayed raw,
the coded index makes no sense to the reader, hence we have two functions:
GetResolutionScopeTable that returns the table name, and
GetResolutionScopeValue that gives us the row in the table.
This coded index uses two bits to store the table
number, and thus we first bitwise AND with 3, as two bits are used for coding the
table. Then we check the value of the variable and return the appropriate table
name. In our case, as the second bit is on, we get a value of AssemblyRef.
The GetResolutionScopeValue function is easier: as
we know that two bits are used to store the table number, we simply right shift
the int by 2, throwing away the table-number bits, and return this value. The
bits that move in from the left are 0 and thus do not affect the answer.
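The two functions boil down to a mask and a shift. Here is the same logic as a Python sketch (illustrative names, not the book's C#), with the four ResolutionScope target tables in their tag order:

```python
# The four tables a ResolutionScope coded index can point into,
# indexed by the 2-bit tag in the low bits.
RESOLUTION_SCOPE_TABLES = ["Module", "ModuleRef", "AssemblyRef", "TypeRef"]

def resolution_scope_table(coded):
    return RESOLUTION_SCOPE_TABLES[coded & 3]   # low two bits name the table

def resolution_scope_value(coded):
    return coded >> 2                           # remaining bits are the row number
```

A coded index of 6 has its low bits set to 10 binary, giving AssemblyRef, and 6 >> 2 gives row 1, exactly the case described above.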
You could display the data from all the other
tables stored in structures in the same way if you like.
Program7.csc
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
ReadandDisplayCLRHeader();
ReadStreamsData();
FillTableSizes();
ReadTablesIntoStructures();
DisplayTablesForDebugging();
ReadandDisplayVTableFixup();
}
public void ReadandDisplayVTableFixup ()
{
if ( vtablerva != 0)
{
long save ;
long position = ConvertRVA(vtablerva) ;
if ( position == -1)
return;
mfilestream.Position = position;
Console.WriteLine("// VTableFixup Directory:");
int count1 = vtablesize/8;
for ( int ii = 0 ; ii < count1 ; ii++)
{
int fixuprva = mbinaryreader.ReadInt32();
Console.WriteLine("// IMAGE_COR_VTABLEFIXUP[{0}]:" , ii);
Console.WriteLine("// RVA: {0}" , fixuprva.ToString("x8"));
short count = mbinaryreader.ReadInt16();
Console.WriteLine("// Count: {0}", count.ToString("x4"));
short type = mbinaryreader.ReadInt16();
Console.WriteLine("// Type: {0}", type.ToString("x4"));
save = mfilestream.Position;
mfilestream.Position = ConvertRVA(fixuprva) ;
int i1 ;
long val = 0 ;
for ( i1 = 0 ; i1 < count ; i1++)
{
if ( (type&0x01) == 0x01)
val = mbinaryreader.ReadInt32();
if ( (type&0x02) == 0x02)
val = mbinaryreader.ReadInt64();
if ( (type&0x01) == 0x01)
Console.WriteLine("// [{0}] ({1})" , i1.ToString("x4") , val.ToString("X8"));
if ( (type&0x02) == 0x02)
Console.WriteLine("// [{0}] ( {1})" , i1.ToString("x4") , (val&0xffffffff).ToString("X"));
}
mfilestream.Position = save;
}
Console.WriteLine();
}
}
}
e.il
.class public a11111
{
.method public static void adf() cil managed
{
.entrypoint
}
.method public int64 a1() cil managed
{
}
.method public int64 a2() cil managed
{
}
.method public int64 a3() cil managed
{
}
.method public int64 a4() cil managed
{
}
.method public int64 a5() cil managed
{
}
.method public int64 a6() cil managed
{
}
.method public int64 a7() cil managed
{
}
}
.vtfixup [1] int32 at D_00008010
.vtfixup [1] int32 fromunmanaged at D_00008020
.vtfixup [1] int64 at D_00008030
.vtfixup [1] int64 fromunmanaged at D_00008040
.vtfixup [0] int64 at D_00008050
.vtfixup [2] int64 int64 at D_00008060
.data D_00008010 = bytearray ( 01 00 00 06)
.data D_00008020 = bytearray ( 02 00 00 06)
.data D_00008030 = bytearray ( 03 00 00 06)
.data D_00008040 = bytearray ( 04 00 00 06)
.data D_00008050 = bytearray ( 05 00 00 06)
.data D_00008060 = bytearray ( 06 00 00 06 00 00 00
00 07 00 00 06 00 00 00 00)
Output
// VTableFixup Directory:
// IMAGE_COR_VTABLEFIXUP[0]:
// RVA: 00004000
// Count: 0001
// Type: 0001
// [0000] (06000001)
// IMAGE_COR_VTABLEFIXUP[1]:
// RVA: 00004004
// Count: 0001
// Type: 0005
// [0000] (06000002)
// IMAGE_COR_VTABLEFIXUP[2]:
// RVA: 00004008
// Count: 0001
// Type: 0002
// [0000] ( 6000003)
// IMAGE_COR_VTABLEFIXUP[3]:
// RVA: 0000400c
// Count: 0001
// Type: 0006
// [0000] ( 6000004)
// IMAGE_COR_VTABLEFIXUP[4]:
// RVA: 00004010
// Count: 0000
// Type: 0002
// IMAGE_COR_VTABLEFIXUP[5]:
// RVA: 00004014
// Count: 0002
// Type: 0002
// [0000] ( 6000006)
// [0001] ( 6000007)
We start this program by letting you in on the bad
news. If you really want to write the world's best disassembler, the only way to
do it is by learning IL. IL is the new machine language of the .Net world:
all code, whether in C#, managed C++ or VB.Net, gets converted into IL and then
into machine language.
What bytecode is to Java, IL is to the .Net world.
Thus from now on all our code will be in IL and we will explain IL as we go
along. The disassembler output is also in IL. Let's start with the file e.il
first. IL is a lot like any programming language, except that instead of writing class
we write .class.
This is followed by the access modifier and the
name of the class. A class carries methods and the .method defines a method,
followed by the access modifiers, then the static keyword if it is a static
function. Then we have the return value and the name of the function.
Then we end with the optional pair of keywords cil managed,
which means that the code is written by us using IL and that it is managed, not
unmanaged. Code that is managed follows all the rules of the .Net world and is
safe and trusted. In C# there was one function that was the first one to get
executed, called Main.
In IL the name of the entry function can be whatever we
like, but it must carry the directive .entrypoint. In the same vein we create seven
more functions, a1 to a7. Then we have the directive that our program
focuses on, namely .vtfixup. Let's first understand what this directive is all
about.
If we write a C++ program, we may never write such
a directive ourselves; the compiler that generates IL code is what writes out these
directives. The main point to note is that if IL does not have a directive
or instruction to accomplish something, then no programming language targeting
it can have such a feature either.
All code written in any .Net language is always converted
into IL. The reason why the .vtfixup directive was introduced was because we
wanted to call from unmanaged code into managed code. As mentioned earlier
unmanaged code was written in languages like C, C++ where no error checks were
done by the system.
The language could read and write any memory
location, and it was very easy to introduce a virus. Managed code has tons of
checks built in, and it is next to impossible, or so says Microsoft, to add
arbitrary code and have it executed. One of the reasons why .Net code is slower
than older conventional code is that lots of checks happen before your code
is stamped ready to run.
Some operating systems may need to follow a series
of steps to transition from unmanaged code to managed code. The other reason is
that the data types of the parameters may need to be converted from one type to
another for the call. This conversion is called data marshalling.
We specify all this using the .vtfixup directive.
In case we forgot to mention it, in any machine language anything that begins with a
dot is called a directive. This directive may appear a million times but has to
appear at the top level of an IL file. Top level means that it cannot
appear inside another directive like a class or namespace directive.
After the directive .vtfixup we have, within square
brackets, a number that tells us how many entries are present in the vtable. We
will come to the concept of a vtable a little later. Coming back to our IL file,
we have six .vtfixup directives. Four of them have 1 in square brackets, which
tells us that they have only one vtable entry; the fifth has none and the last
has two.
This number is followed by the width of each slot
in the vtable. The attributes or values that can be used are int32 and int64
and they are mutually exclusive. If we do not specify one, int32 is the default.
A slot is made up of a 32 bit metadata token. This token starts with the number
of the table called MethodDef, that is 6, and then a 24 bit number specifying the
row within the table.
This is how we use 32 bits to represent a table and
a row within that table. These entities are called metadata tokens. If the slot is 64
bits, the remaining 32 bits are zeroed out. It is the job of the system to
figure out a pointer to the function
that is represented by the metadata token and convert it into the same width as
the slot.
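The token layout just described, one byte of table number followed by a 24-bit row number, is easy to pull apart with shifts and masks. A Python sketch (illustrative only; the book's code is C#):

```python
def decode_token(token):
    """Split a 32-bit metadata token into (table, row): the high byte is
    the table number, the low 24 bits the row within that table."""
    return token >> 24, token & 0x00FFFFFF

def encode_token(table, row):
    """Build a metadata token from a table number and a row number."""
    return (table << 24) | (row & 0x00FFFFFF)
```

The first bytearray in e.il, 01 00 00 06, is the little-endian token 0x06000001: table 6 (MethodDef), row 1, which is exactly what the disassembler prints for the first slot.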
The only other attribute that can be used along
with int32 and int64 is fromunmanaged. This attribute tells the CLI to first
generate a piece of code called the thunk that will convert the unmanaged
method call to a managed call. Then it will call the actual method and return
the result of the managed call to the unmanaged code.
The thunk will obviously convert the parameter types,
that is, do the data marshalling, and also follow the calling conventions that
the platform it is running on requires. This is followed by the block address
that contains the metadata token. Thus we specify a block address of D_00008010
for the first directive.
We now have to create such a block and use the
.data directive to do so and specify some data to be present in this block
using the bytearray keyword. The value we specify is the metadata token. As the
last vtfixup directive is 64 bits, the metadata token has 4 bytes of zeroes.
The ilasm syntax however does not specify a
specific way of creating these tables of tokens and simply recommends that we
use the data directive as we have. Once again to summarize, the .vtfixup
directive tells us that if we go to a memory location we shall find a table.
This table contains metadata tokens that represent method names that will be
converted into method pointers.
It is the job of the system or CLI to do the above
conversion automatically when the loader loads the file into memory as it is being
executed. All that we specify is the number of entries in the table or vtable,
the kind of method pointer, the width of each slot and finally
where the table is located in memory.
The programming language C++ introduced a concept
of virtual functions where the object calling the function could be of the base
type but the function called would be of the derived type. This was possible
only if the base class type object was initialized to the derived type object.
To achieve this the compiler and not the linker or
runtime created a table in memory with the address of the virtual functions
called the vtable. This also meant that languages like C++ chose not to follow
a common type system runtime model. Finding the correct v-table slot and
calling the code represented by that slot was handled by the compiler.
The compiler simply placed the latest address of
the function in the v-table. It is the vtfixup directive that contains the location
and size of the vtable fixups and they have to be present in a read write part
of the PE file. These vtable slots are all contiguous and must be of the same
size.
It is at run time that the Loader will convert
every table entry into a pointer to machine code for the CPU. This code can
also be called directly. This structure has 3 members: first an int that
contains the RVA of the table, followed by two shorts that tell us the
count or number of vtable entries and finally the type of entry, which can be
int32 or int64, optionally combined with fromunmanaged.
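The three-field, 8-byte record just described can be parsed with a couple of lines. This Python sketch (a hypothetical helper, not the book's C#) unpacks back-to-back records the same way ReadandDisplayVTableFixup walks them:

```python
import struct

def read_vtable_fixups(data, size):
    """Parse back-to-back 8-byte vtfixup records: an int32 RVA,
    an int16 slot count and an int16 type."""
    entries = []
    for off in range(0, (size // 8) * 8, 8):
        rva, count, kind = struct.unpack_from("<ihh", data, off)  # little-endian
        entries.append((rva, count, kind))
    return entries

# Two sample records, built the same way they sit in the file.
raw = struct.pack("<ihh", 0x8010, 1, 0x01) + struct.pack("<ihh", 0x8020, 1, 0x05)
```

Dividing the directory size by 8, as the count1 variable does, gives the number of records to read.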
There is a fourth type, called call-most-derived,
that we were not able to simulate as it deals with virtual methods. Let's now
look at the actual code that lets us display the vtfixup directive as ildasm does.
We start by adding a function ReadandDisplayVTableFixup to the
abc function.
In the CLR header there was a data directory entry
that specified the RVA and size of the vtable fixups table and we used the
instance variables vtablerva and vtablesize to store the rva and size
respectively. We can only display the fixups if the variable vtablerva is not
zero.
We then use the ConvertRVA function to convert the
rva to a physical offset on disk and then set the Position property of the
FileStream object to its value. We however first build in an error check to make
sure that the position variable does not have a value of -1.
Some time ago we mentioned that the vtable fixups
structure is 8 bytes and contains three fields. Thus the count1 variable tells
us how many vtfixup entries there are. These structures are stored back to back
and thus in the for loop we iterate count1 number of times. We first read the
rva of where the table is loaded.
Then we have the number of vtable slots or entries
that we store in the variable count and finally the type of the entries. In the
square brackets we simply display the value of the loop variable ii. We then
save the file pointer position in the variable save and then jump to the actual
vtable location that is stored in the first field fixuprva and using the
ConvertRVA function.
Here are stored the actual metadata tokens, whose
count is the second field. As we are moving the file pointer we save it in the
variable save, and at the end of the loop we restore the file pointer again as we
need to read the next vtfixup structure. The actual vtable entries are stored
in a different place than the vtfixup structures.
In the for loop we iterate over the vtable
slots and, depending upon the type of the entries, we either read an int or a long.
Then we display the loop variable as well as the metadata token. Here we need
to format the display depending upon whether it is a 32 bit or 64 bit slot. We
finally end with a new line.
Program8.csc
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
ReadandDisplayCLRHeader();
ReadStreamsData();
FillTableSizes();
ReadTablesIntoStructures();
DisplayTablesForDebugging();
ReadandDisplayVTableFixup();
ReadandDisplayExportAddressTableJumps();
DisplayModuleRefs();
}
public void ReadandDisplayExportAddressTableJumps ()
{
Console.WriteLine("// Export Address Table Jumps:");
if ( exportaddressrva == 0)
{
Console.WriteLine("// No data.");
Console.WriteLine();
}
}
public void DisplayModuleRefs()
{
if (ModuleRefStruct == null )
return;
for ( int ii = 1 ; ii < ModuleRefStruct.Length ; ii++)
{
string dummy = GetString(ModuleRefStruct[ii].name);
Console.WriteLine(".module extern {0} /*1A{1}*/" , dummy , ii.ToString("X6"));
}
}
e.il
.module extern kernel32.dll
.module extern dd.dll
.class zzz
{
.method public static void adf() cil managed
{
.entrypoint
}
}
Output
// Export Address Table Jumps:
// No data.
.module extern kernel32.dll /*1A000001*/
.module extern dd.dll /*1A000002*/
An extremely short program, but we have added two
function calls to the abc function. The first is
ReadandDisplayExportAddressTableJumps, which reads the export address table. This
data directory entry is always zero as mandated by the specifications. The
second displays all the modules that contain functions that we reference from
our code.
The .module extern directive is extremely
unintelligent and simply fills up the ModuleRef table, which has table number 0x1A,
with a record for each directive. This table is the smallest possible table,
with only one field: the module name, an index into the String heap.
The index field into the String heap follows some
basic rules that we will not repeat again. The name must point to a non null
string in the heap. The system does not bother to check at assemble time
whether the dll actually exists or not. The name has a restriction depending
upon MAX_PATH_NAME.
As it stands for the name of a file, its format is
filename.extension. There are no drive letters, directory names, colons or
any form of slashes. There should be no duplicate rows; this is however a
warning only. The name that we find in this table must also be present in
the File table that we will come to later.
This File table entry is used to locate the actual
file that holds or contains the module. If the ModuleRef table has no records,
ModuleRefStruct has a value of null. If a variable holds such a value, it
means that we have not instantiated the object and hence are not allowed to
reference any members of this array.
In the for loop the count variable starts from 1
and goes up to the length of the array using the Length field that every array
contains. If the array variable ModuleRefStruct were not initialized, an
exception would be thrown.
Thus as a policy at the start of every function, we
check whether the instance array variables we are using in the function are
instantiated or not. If not, it only means that the corresponding table is empty
and therefore we bail out of the function.
All that we do in the for loop is display the
string represented by the name index using the GetString function. We also
display the record number and the table name within comments.
Program9.csc
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
ReadandDisplayCLRHeader();
ReadStreamsData();
FillTableSizes();
ReadTablesIntoStructures();
DisplayTablesForDebugging();
ReadandDisplayVTableFixup();
ReadandDisplayExportAddressTableJumps();
DisplayModuleRefs();
DisplayAssembleyRefs();
}
public void DisplayAssembleyRefs ()
{
if (AssemblyRefStruct == null)
return;
for ( int i = 1 ; i < AssemblyRefStruct.Length ;
i++)
{
Console.WriteLine(".assembly extern /*23{0}*/ {1}", i.ToString("X6") , GetString(AssemblyRefStruct[i].name));
Console.WriteLine("{");
if (AssemblyRefStruct[i].publickey != 0)
{
Console.Write(" .publickeytoken = (");
DisplayFormattedColumns(AssemblyRefStruct[i].publickey , 22 , false);
}
if ( AssemblyRefStruct[i].hashvalue != 0 )
{
Console.Write(" .hash = (");
DisplayFormattedColumns(AssemblyRefStruct[i].hashvalue , 11 , false);
}
int rev = AssemblyRefStruct[i].revision;
if ( rev < 0)
rev = 65536 + rev;
Console.WriteLine(" .ver {0}:{1}:{2}:{3}" , AssemblyRefStruct[i].major , AssemblyRefStruct[i].minor , AssemblyRefStruct[i].build , rev);
The hash directive is similar to the publickeytoken
directive. We next come to the ver directive. The fields major, minor, build
are fields in the AssemblyRef table. The revision field needs some special
handling.
if ( AssemblyRefStruct[i].culture != 0 )
{
Console.Write(" .locale = (");
int index = AssemblyRefStruct[i].culture;
int cnt = 0;
while ( strings[index] != 0)
{
Console.Write("{0} 00 " , strings[index].ToString("X"));
cnt++;
index++;
}
int nos = 64 - (14 + 6 + (6*cnt+1));
Console.Write("00 00");
Console.Write(" ){0}// " , CreateSpaces(nos));
index = AssemblyRefStruct[i].culture;
while ( strings[index] != 0)
{
Console.Write("{0}." , (char)strings[index]);
index++;
}
Console.WriteLine("..");
}
Console.WriteLine("}");
}
}
public void DisplayFormattedColumns ( int index , int startingspaces , bool putonespace)
{
int howmanybytes,uncompressedbyte ;
howmanybytes = CorSigUncompressData (blob , index , out uncompressedbyte);
index = index + howmanybytes;
byte [] blobarray = new byte[uncompressedbyte];
Array.Copy(blob , index , blobarray , 0 , uncompressedbyte);
int maincounter = 0;
while (maincounter < uncompressedbyte)
{
string firststring = "";
string secondstring = "";
int counterforascii = 0;
int ii ;
bool noascii = false;
for (counterforascii = 0 ; counterforascii < 16 ; counterforascii++)
{
if ( maincounter == uncompressedbyte )
break;
if ( blobarray[maincounter] >= 0x20 && blobarray[maincounter] <= 0x7e)
noascii = true;
maincounter ++;
}
maincounter -= counterforascii;
for ( ii = 0 ; ii < 16 ; ii++)
{
if ( maincounter == uncompressedbyte)
break;
firststring = firststring + blobarray[maincounter].ToString("X2") + " ";
if ( blobarray[maincounter] >= 0x20 && blobarray[maincounter] <= 0x7e)
secondstring = secondstring + (char)blobarray[maincounter];
else
secondstring = secondstring + ".";
maincounter ++;
}
if (maincounter == uncompressedbyte)
{
int leftovers = maincounter% 16;
if ( leftovers != 0)
{
leftovers = 15 - leftovers;
firststring = firststring + ")" ;
int space = leftovers*3 + 3;
if (noascii)
firststring = firststring + CreateSpaces(space);
else
firststring = firststring + " ";
}
else
{
firststring = firststring + ")";
if ( putonespace && uncompressedbyte <= 16)
firststring = firststring + " ";
firststring = firststring + CreateSpaces(0);
}
}
if ( maincounter == uncompressedbyte )
{
if ( noascii )
firststring = firststring + " // " + secondstring;
}
else
if (noascii)
firststring = firststring + " // " + secondstring;
Console.WriteLine(firststring);
if ( maincounter != uncompressedbyte )
Console.Write(CreateSpaces(startingspaces));
}
}
public string CreateSpaces (int howmanyspaces)
{
string returnstring = "";
for ( int j = 1 ; j <= howmanyspaces ; j++)
returnstring = returnstring + " ";
return returnstring ;
}
public int CorSigUncompressData ( byte [] blobarray , int index , out int answer)
{
int howmanybytes = 0;
answer = 0;
if ( (blobarray[index] & 0x80) == 0x00)
{
howmanybytes = 1;
answer = blobarray[index];
}
if ( (blobarray[index] & 0xC0) == 0x80)
{
howmanybytes = 2;
answer = ((blobarray[index] & 0x3f) << 8) | blobarray[index+1];
}
if ( (blobarray[index] & 0xE0) == 0xC0)
{
howmanybytes = 4;
answer = ((blobarray[index] & 0x1f) << 24) | (blobarray[index+1] << 16) | (blobarray[index+2] << 8) | blobarray[index+3];
}
return howmanybytes;
}
}
e.il
.assembly extern zzz
{
.ver 1:2:3:4
.hash = ( 1 2 3 a b)
.locale "vijay "
.publickeytoken = ( 01 2 3 4 5 6 7 8 9 10 11 12 13
14 15 16)
.publickey = ( 01 2 3 4 5 6 7 8 9 10 11)
}
.class zzz
{
.method public static void adf() cil managed
{
.entrypoint
}
}
Output
.assembly extern /*23000001*/ zzz
{
.publickeytoken = (01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 )
.hash = (01 02 03 0A 0B )
.ver 1:2:3:4
.locale = (76 00 69 00 6A 00 61 00 79 00 20 00 00 00 ) // v.i.j.a.y. ...
}
.assembly extern /*23000002*/ mscorlib
{
.ver 0:0:0:0
}
In this program we now display all the assembly
refs. We will come to the difference between an assembly and a module a little
later. An assembly contains lots of files. We use the assembly to mediate
access to these files from other assemblies.
Any code that we access from an assembly requires
an assembly extern directive stating that these assemblies will be
referenced by the code that follows. This directive is also a top level directive.
The as clause is optional and all that it does is
provide an alias to the assembly name, needed if and only if there are two assembly
extern directives that have the same name. If there is no clash, as in our
case, the as clause is ignored.
The assemblies that have the same name must differ
in version, culture etc that we will explain in a short while. There are a
million things that we have not implemented in our disassembler and one of them
is the as clause.
The reason we did not is that if we have more than
one assembly ref with the same name, the alias name is not stored in the
AssemblyRef table. All that happens is that the name of the assembly gets an
underscore and a number 1 or 2 added to it.
Thus we have to first sort the records in the
Assembly table and if there are duplicates we call them in our case zzz_1,
zzz_2 etc. We have left the above as an exercise for you, the reader. If we
added everything that ildasm does, you would not be reading this book, we would
yet be writing the disassembler.
The name of the assembly ref is called the dotted
name, and it obviously must match the name of the assembly as specified in the
.assembly directive in that assembly. However, for some reason ilasm does not
bother to check whether the assembly zzz actually exists or not.
But as per the specs, the name should match in a
case sensitive way even if the operating system or file system, like DOS,
is case insensitive. If you look at the output, there is a reference to the
assembly mscorlib. We have not referred to this assembly in any way in our IL
file.
What we forgot was that every class in the IL world
is derived from object, whether we specify it or not. As the class zzz is
derived from nothing, it gets derived from object, which like all basic types is
present in mscorlib.dll. Thus the IL assembler always adds mscorlib as an
assembly reference.
Lets look at the individual directives present
within the assembly ref directive which are similar to the assembly directive.
The assembly directive creates the manifest so that the assembly extern can
refer to it. The first directive we touch on is the ver directive or the
version number of the assembly.
This is a series of four 32 bit
integers separated by colons. This version number is decided at compile time
and is used to bind this assembly to all parts of the compiled module that need
to refer to it. The specs however do not specify how the version numbers should
be used.
The implementation may decide to ignore the version
numbers or may check that the version numbers are the same or match precisely
when referring to or binding to the assembly. Normally or by convention the
first number is the main version number.
Thus if the assembly extern has a major version number of
1 and the assembly with the same name has a major version number of 2, they are to
be considered different assemblies and are not interchangeable. This makes
sense: when we change the major version number we are saying that this is a
major upgrade.
Also by convention larger version numbers should be
compatible with smaller version numbers so that existing code does not break
and backward compatibility is maintained.
The other way around is not true.
The second number is the minor version number and
assemblies that differ only in the second number are said to refer to the same
assembly. This means that the assembly has changed in a significant way but
backward compatibility has been maintained. This is also called a point
release.
The third number is the build number. If it is the
only number that has changed, the source code has remained the same but a
recompilation was carried out: the processor changed, some compiler option was
changed, or the code was compiled for a new operating system. The last number
is the revision number, and assemblies that differ only in the revision number
are meant to be fully compatible; the revision number changes to fix some bugs
or a major security hole. Thus the major version number at one extreme means
that lots of changes have taken place in the source code, while the build
number at the other extreme means no changes took place in the source at all.
The above rules are only a convention that may or
may not be followed. After all, who is to decide whether the changes made are big
enough to change the major, minor or revision number? At times the
toss of a coin is the best way to decide: heads change the major version, tails
the minor, and if the coin gets lost, as is normally the case, the revision
number.
The next field we will look at is what most of the
world, including us, lacks: culture. The specs say that instead of using the
directive locale, the Microsoft implementations will use culture instead. The
culture directive takes a string that denotes the culture, and such a string is
called a QString.
Culture, locale etc. have to do with the language
used for storing strings and are part of the internationalization of software.
This is also called i18n as there are 18 characters between the i and n of
internationalization. By specifying a culture we are saying that this assembly
has been customized for or uses a certain culture.
The specs have a doc file called Partition IV that
specifies the culture strings that can be used. The metadata specs are part of
what is called Partition II. The class System.Globalization.CultureInfo is
the class responsible for understanding different cultures.
The culture strings are meant to be case insensitive.
For some reason the Microsoft implementation uses
.locale rather than .culture. These culture names follow an internet standard,
an RFC or Request For Comments, in this case RFC 1766. The format of
the culture names should be language-country/region, and not what we have used.
The language should be the two character code
that every language has been given by the international standard ISO 639-1.
The country/region code is an uppercase two letter code specified by the ISO
standard 3166; this is the same codification that country domain names use on
the internet.
The third directive is the hash value. Hashing is a
well known concept in the computer world, and the bytes after the
.hash directive specify the hash value of the referenced file. The VES or Virtual
Execution System will first calculate the hash value and only then access the
file. If the hash value does not match, an exception will be thrown.
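The check the VES performs can be sketched in a couple of lines. This is a Python illustration with a hypothetical helper name; we assume SHA-1 here, since that is the usual algorithm, but the Assembly table records which algorithm was actually chosen:

```python
import hashlib

def verify_file_hash(file_bytes, expected, algorithm="sha1"):
    """Recompute the hash of a referenced file and compare it with the
    value stored after the .hash directive. A mismatch means the file
    has changed since the reference was created."""
    actual = hashlib.new(algorithm, file_bytes).digest()
    return actual == expected
```

If this returns False for a referenced file, loading it would raise the exception mentioned above.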
When we created the assembly, we used the .hash
algorithm directive where we specified the hashing method to be used. We will come back
to it when we do the Assembly table. The publickeytoken is the last directive
that can be placed in the assembly ref directive.
This directive is used to store the lower 8 bytes
of the SHA1 hash of the originator's public key. The assembly reference can
store either the full public key or only the lower 8 bytes, as the
publickeytoken directive does. We can use either of them.
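The derivation of a token from a full key can be sketched as follows. This is an illustrative Python helper, not the book's code; the token is taken from the low 8 bytes of the SHA-1 hash of the key and, by CLR convention, stored in reverse byte order:

```python
import hashlib

def public_key_token(public_key):
    """Derive the 8-byte publickeytoken from a full public key:
    the last 8 bytes of its SHA-1 hash, reversed."""
    digest = hashlib.sha1(public_key).digest()
    return digest[-8:][::-1]
```

This is why the token is only 8 bytes long no matter how large the public key is: it is a digest of the key, not the key itself.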
We need the above to validate that the same private
key that was used to sign the assembly at the time of compilation is now being
used to sign it at runtime. Neither of the two directives .publickey or
.publickeytoken is mandatory, and if both are present, only one is
useful, not both.
To conform to the rules of .Net, this validation is
not a must. But if the system does validate and the validation fails, it may not
load the assembly. As an added caveat, the system may refuse access to an assembly
unless the assembly ref specifies the correct publickey or publickeytoken
directive.
The full key is always cryptographically safe but
the converse is that it requires more storage space in all assembly references
as it is larger.
We add just one more function call, DisplayAssembleyRefs,
to the abc function. It is this function that
displays all the assembly refs or assembly extern directives. Every
assembly extern directive that we write gets one row in the AssemblyRef table.
Thus the first thing we check is whether we have any records in the
AssemblyRefStruct array.
As always, we iterate over the relevant array, starting
the loop counter from one and going up to the Length of the array. For most of
these tables we display the table number, followed by the name of the assembly
stored in the name field, using the GetString function.
The order of directives is important, and the first
one to be seen is the publickeytoken directive. This directive is not mandatory
and hence we check whether the field has a value. If it does, we display the
string publickeytoken and then use a function DisplayFormattedColumns that
displays its value.
We will come to this function in a short while, as
other directives also use it. We first store the field revision in a variable
called rev. If the value of rev is less than 0, we add 65536 or 2^16 to
it and then display it. The last field we need to display is the culture or
locale field.
The culture field is an index into the strings heap
that we store in a variable called index. We know that any string in the
strings heap is null terminated, and this is used as the condition in the
while loop. We do three things in the while loop; we first display the ascii
value of the character in hex.
Unfortunately the string bytes must be displayed as
Unicode, where every character is 16 bits or two bytes wide. Thus we have to
add an extra 00 after each ASCII character. We also need a count of the bytes
we are displaying and hence we increase the cnt variable by 1. We also have to
increase the index variable by 1, as the while loop uses it in its condition.
We next write a null at the end, which in Unicode is
a pair of zeroes. We then close the bracket and now need to display the Unicode
characters as readable text, but within comments. We thus need to place a
series of spaces before we can display the characters. We learnt earlier that
the comments always start at column 64.
The assembly extern directive takes 20 characters,
and each Unicode character's hex display takes 6 characters; the number of
Unicode characters is stored in the cnt variable. We hope you understand why we
need the extra space. This way we calculate the number of spaces and use a
simple function CreateSpaces to add that many spaces.
Throughout this book we have had to focus on how
many spaces to add at different points. The above example is one of
the easiest that you will find in this book. All that the CreateSpaces
function does is, in a for loop, keep adding a space to the string
returnstring, which at the end of the loop is made up of howmanyspaces spaces.
We once again initialize the index variable to the
culture field and then iterate the while loop again. This time, however, we
display the character by casting the string heap byte to a char and putting a
dot after it, each dot signifying the null byte. We end with two dots
for the terminating null and finally close the brace of the assembly extern
directive.
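The .locale display just described can be collected into a standalone helper. This is an illustration of the formatting only; FormatLocale is our own made-up name, and we pad each hex byte to two digits, while the book's actual code appears in Program10.csc.

```csharp
using System;
using System.Text;

public class LocaleDemo
{
    // Formats a culture name the way the .locale directive shows it: each
    // ASCII byte followed by 00 (UTF-16), a terminating Unicode null, then
    // a comment where every null byte appears as a dot.
    public static string FormatLocale(string culture)
    {
        var hex = new StringBuilder();
        var text = new StringBuilder();
        foreach (char c in culture)
        {
            hex.AppendFormat("{0:X2} 00 ", (int)c); // ASCII byte plus the 00 high byte
            text.Append(c).Append('.');             // readable char, dot for the 00
        }
        hex.Append("00 00");  // terminating Unicode null
        text.Append("..");    // its two bytes shown as dots
        return ".locale = (" + hex + " ) // " + text;
    }
    public static void Main()
    {
        Console.WriteLine(FormatLocale("en-US"));
    }
}
```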
Before we start explaining the function
DisplayFormattedColumns, let's speak the truth. Our program is littered with
functions like these that simply take care of formatting. All that the above
function does is display some bytes in a table, formatted the way ildasm does.
The problem with code that formats things in a pretty
way is that when we write it, we have to keep a dentist's knife in mind, not a
butcher's knife. We have to write very precise code, as a space here or there
makes a lot of difference. The first parameter is an offset into the blob heap.
The blob heap is a unique entity as it has no
apparent structure. There is no end-of-string marker as in the string heap or
us heap, nor is it made up of 16-byte entities like the guid heap. In the blob
heap, the first byte is the important one, as it tells us the length of the
bytes owned by the field.
If the designers had chosen only one length byte, we
would have a problem, as the maximum length represented would then be 255
bytes. If we keep the length bytes at 2, the maximum length represented will be
65535 bytes, but for entities shorter than 255 bytes a byte in the blob heap
would be wasted, making the blob heap larger and less efficient in size.
Thus the designers created a unique manner of
representing the length of a series of bytes. We have a function
CorSigUncompressData that accepts three parameters and gives two answers.
The first parameter is obviously the array of bytes, in our case the blob
array, and the second is the index of the starting byte in the array.
We are returned the number of bytes that make up
the length or count field, and the third parameter, uncompressedbyte, will
contain the actual value of the count or length. Let's first move on to
this function and understand the compression method used. The name of this
function has an interesting story.
Before we write a line of code ourselves, we look
at others who have written on similar subjects. The .Net samples have a C++
program that displays metadata. That program comes with source code and uses
the function CorSigUncompressData to decode the count or length byte. All that
we did was rewrite the C++ code in C#.
For some reason, code that does lots of useful
stuff, like a debugger, is not available as C# samples. Maybe Microsoft is
telling us something and we are turning a deaf ear to it. The Microsoft
disassembler ildasm is also a C++ program. The function
CorSigUncompressData first checks the top bit of the first byte.
If it is zero, then the remaining seven bits tell us
the count or length. We set the variable howmanybytes to 1, as the count
or length occupies one byte, and the out parameter answer to the value of the
byte whose index is the parameter index.
Thus a count of up to 127 bytes can be represented
by 1 byte, and normally an index into the blob will not own more than 127
bytes. We then check the top two bits of the first byte. If they are 1 and
0, i.e. the top bit is 1 and the second 0, the next 14 bits store the size.
This means that a length from 128 to 16383 can be
represented by only 2 bytes. Thus we set the variable howmanybytes to 2, and
for the answer parameter we first knock off the top two bits, then left shift
by 8 for the multiplication and place the next 8 bits using a simple or.
Finally, if the top three bits are 1, 1 and 0, then
the next 29 bits stand for the length. The howmanybytes variable is set to 4,
as this value is what we return. The answer variable is a little more complex,
as now we have to strip off the top three bits using a bitwise and and then
left shift by 24.
The remaining three bytes are bitwise ored in
after shifting them to the left by 16, 8 and 0. If the above is not clear,
another way is to actually read in the individual bytes into variables, do the
multiplications and then add them up.
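The three cases just described can be collected into a small sketch of the decoder. The names mirror the book's CorSigUncompressData, but this is a simplified standalone version written for illustration:

```csharp
using System;

public class Compress
{
    // Decode a compressed length from the blob heap. Returns the number of
    // bytes the length field occupies; the decoded value comes back through
    // the out parameter, mirroring CorSigUncompressData.
    public static int UncompressData(byte[] blob, int index, out int value)
    {
        byte b = blob[index];
        if ((b & 0x80) == 0)            // top bit 0: 7-bit length, one byte
        {
            value = b;
            return 1;
        }
        if ((b & 0xC0) == 0x80)         // top bits 10: 14-bit length, two bytes
        {
            value = ((b & 0x3F) << 8) | blob[index + 1];
            return 2;
        }
        // top bits 110: 29-bit length, four bytes
        value = ((b & 0x1F) << 24) | (blob[index + 1] << 16) |
                (blob[index + 2] << 8) | blob[index + 3];
        return 4;
    }
    public static void Main()
    {
        int v;
        Console.WriteLine(UncompressData(new byte[] { 0x03 }, 0, out v) + " " + v);       // 1 3
        Console.WriteLine(UncompressData(new byte[] { 0x80, 0x80 }, 0, out v) + " " + v); // 2 128
    }
}
```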
Let's now move back to the earlier function
DisplayFormattedColumns, after the call to CorSigUncompressData that we
have just explained. The howmanybytes variable tells us the number of bytes
occupied by the length field in the blob array.
We then add this value to the index variable so
that the variable points to the first byte of data. We cannot assume that the
variable howmanybytes will have a value of 1. Most of the time that assumption
would not be wrong, but we will show you cases where the length field is 2
bytes.
The variable uncompressedbyte now contains the
actual length of the data. We would now like to copy this sequence of bytes
from the offset in the blob array into a separate array, just for our
convenience. Thus we create a new array blobarray whose size is
uncompressedbyte, the length of the data in the blob array.
We then use the static function Copy to copy from
the source array blob, the first parameter, starting at position index within
the array, the second parameter. The third parameter is the destination array
blobarray, and the fourth and fifth are the starting point in the destination
array and the number of bytes to copy.
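The copy just described can be seen in miniature; the heap bytes below are made up for illustration, with a length byte of 3 at offset 2:

```csharp
using System;

public class CopyDemo
{
    public static void Main()
    {
        byte[] blob = { 9, 9, 3, 10, 20, 30, 9 }; // made-up heap: length byte 3 at offset 2
        int index = 3, length = 3;                 // data starts after the length byte
        byte[] blobarray = new byte[length];
        // Array.Copy(source, sourceIndex, destination, destinationIndex, count)
        Array.Copy(blob, index, blobarray, 0, length);
        Console.WriteLine(string.Join(" ", blobarray)); // 10 20 30
    }
}
```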
Throughout this book you will come across code like
the above, as it is more convenient to refer to bytes starting at zero than at
position index within an array. We run through each byte in the array
blobarray using the variable maincounter, stopping when we have looped over
uncompressedbyte bytes.
In the main while loop we first initialize two
strings, firststring and secondstring, to an empty string. We enter a for
statement that will loop 16 times; the variable counterforascii is used only
as the loop variable. This for statement will be executed many times, and if
the outer loop variable maincounter ever becomes equal to uncompressedbyte, we
quit the for statement.
The main part of the for is an if statement that
checks whether we have any printable ascii characters as part of the 16
bytes we are checking. If we do, we take the Boolean variable noascii, which
starts out false, and set its value to true.
Thus even if only one of the 16 bytes is a
printable ascii character, which we define as hex 20 to 7e, the variable
noascii will be true. Why we do this will become clearer a couple of lines
down the road. We also increment maincounter by one, as we use it as an index
into the blobarray variable.
When we exit the for loop, we set the
maincounter variable back to the value it had before we entered the loop. The
difference could be 16, but if the first if statement was true, the value of
the counterforascii variable will be far less than 16.
We now have another for loop that uses a loop
variable ii and runs from 0 to 16 like the earlier one. We again make sure
that the maincounter variable does not exceed the number of bytes stored in
the uncompressedbyte variable. The variable firststring is nothing but a
series of bytes in hex that we concatenate together.
This is because the hash directive etc. show us a
series of bytes. The second if statement checks whether the byte is a
printable ascii byte or not. If it is, it gets concatenated as its character
value using a cast as before. If not, the disassembler shows it as a dot.
Normally most tables displayed by ildasm have the 16
bytes in the first column and then the printable ascii characters, or dots,
following as comments. As this sort of display is very common, we have created
a function to handle it. We increment maincounter as before, but when we leave
this for loop we do not set it back as we did earlier.
After this for loop we have an if and else, where
the if is true when we have run out of bytes to display. This happens only
once, when the two variables maincounter and uncompressedbyte are equal and we
have left the for statement due to the first if being true. Thus let's focus
on the else statement first.
All that we do in the else is first look at the
value of the noascii variable. If it is true, we simply add to firststring a
space, followed by the comment sign and the value of the variable
secondstring. The ildasm program displays the comments and the printable ascii
characters if and only if there is at least one printable ascii char among the
ones it is trying to display.
If all of them are non-printable, it displays none.
If you recall, the variable noascii starts out false and becomes true only if
it meets at least one printable ascii character. This is why we needed a
special for loop just to tell us whether to display the variable secondstring
along with the contents.
The guy who wrote ildasm felt that a display with
all dots does not look pretty. Remember, this applies only to a single line
being displayed. We then use the WriteLine function to display the variable
firststring. We have reached the end of our tether and also the end of the
main while.
Thus we have to write out a series of spaces so
that the next time we come to the above WriteLine, we display everything one
below the other, neatly laid out. The number of spaces we need to write out is
the startingspaces parameter, which we pass to the CreateSpaces function. The
column where the table starts is for the moment a constant, as the name of the
directive does not change, nor are there any extra spaces before the directive
name.
This only needs to be written out for all lines but
the last, hence only when the two variables maincounter and uncompressedbyte
are not equal. Let's come back to what happens when we reach the last line to
be displayed.
Let us reiterate. We are here in the if statement
because the last line has to be displayed. We first need to know how many
characters will be written on the last line. The variable leftovers tells
us how many bytes will be displayed on this, the last line.
As an example if we have to display 18 bytes in
all, the first line will display 16 bytes and as leftovers will be 2, the last
line will have 2 characters. We first take the normal case where the number of
characters to be displayed on the last line is not 16.
We first subtract the count from 15 to get the number
of characters that we have not displayed on this last line. We then add a
close bracket to the variable firststring.
We need this new number in leftovers as it lets us
calculate how many spaces we need to reach the end of the line, so that we can
display the comment sign below the earlier comment signs. Each number
displayed takes up 3 characters, plus one for the road, which is how we
calculate the number of spaces to be placed.
We also look at the noascii variable, which if true
means that we have to place the spaces, followed by the string. If false, i.e.
no printable ascii chars, we simply place a single space. Now let's look at
the else, which kicks in if the number of characters on the last line is
exactly 16.
We concatenate the close bracket as before and now
decide whether to concatenate a space or not. The third parameter is a true or
a false. If it is true, which is not so in our case, and the number of bytes
to be written out is less than 16, i.e. one line, at times we have to add an
extra space.
The last line of the else does nothing and can be
ignored. Even we forgot why it was placed there. Finally, the same if
statement appears again; we placed it outside the earlier if for reasons of
clarity.
We once again check the noascii variable, and if
true we concatenate the comment sign and the contents of the variable
secondstring as we did earlier. If you look hard enough, we really do not need
this if statement, as it duplicates what the else does.
But hey, we never said the code we wrote would win a
competition. This is the end of a large program where we spent a lot of time
simply displaying data in a table.
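To summarize the formatting logic before moving on, here is a much-simplified single-line version of what DisplayFormattedColumns does. DumpLine is our own made-up name; it handles only one row of up to 16 bytes and ignores the bracket, spacing and last-line handling discussed above.

```csharp
using System;
using System.Text;

public class DumpDemo
{
    // A stripped-down version of the row formatting described above:
    // the bytes in hex, followed by an ASCII comment only when at least
    // one byte in the row is printable (hex 20 to 7e).
    public static string DumpLine(byte[] bytes)
    {
        var hex = new StringBuilder();
        var ascii = new StringBuilder();
        bool anyPrintable = false;            // plays the role of noascii
        foreach (byte b in bytes)
        {
            hex.AppendFormat("{0:X2} ", b);
            bool printable = b >= 0x20 && b <= 0x7e;
            anyPrintable |= printable;
            ascii.Append(printable ? (char)b : '.');
        }
        // only show the comment column if something in it is readable
        return anyPrintable ? hex + " // " + ascii : hex.ToString();
    }
    public static void Main()
    {
        Console.WriteLine(DumpLine(new byte[] { 0x01, 0x02, 0x41, 0x42 }));
        Console.WriteLine(DumpLine(new byte[] { 0x01, 0x02 }));
    }
}
```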
Program10.csc
int spacesforrest = 2;
int spacesfornested;
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
ReadandDisplayCLRHeader();
ReadStreamsData();
FillTableSizes();
ReadTablesIntoStructures();
DisplayTablesForDebugging();
ReadandDisplayVTableFixup();
ReadandDisplayExportAddressTableJumps();
DisplayModuleRefs();
DisplayAssembleyRefs();
DisplayAssembley();
}
public void DisplayAssembley()
{
if (AssemblyStruct.Length == 1)
return;
Console.WriteLine(".assembly /*20000001*/ {0}" , GetString(AssemblyStruct[1].name));
Console.WriteLine("{");
DisplayAllSecurity( 2 , 1);
if ( AssemblyStruct[1].publickey != 0)
{
Console.Write(" .publickey = (");
DisplayFormattedColumns(AssemblyStruct[1].publickey
,16 , true);
}
if ( AssemblyStruct[1].HashAlgId != 0)
Console.WriteLine(" .hash algorithm 0x{0}",AssemblyStruct[1].HashAlgId.ToString("x8"));
int rev = AssemblyStruct[1].revision;
if ( rev < 0)
rev = 65536 + rev;
Console.WriteLine(" .ver {0}:{1}:{2}:{3}" , AssemblyStruct[1].major,
AssemblyStruct[1].minor,AssemblyStruct[1].build,rev);
if ( AssemblyStruct[1].culture != 0 )
{
Console.Write(" .locale = (");
int index = AssemblyStruct[1].culture;
int cnt = 0;
while ( strings[index] != 0)
{
Console.Write("{0} 00 " ,
strings[index].ToString("X"));
index++;
cnt++;
}
Console.Write("00 00");
Console.Write(" )");
int nos =
64 - (15 + cnt*6 + 6);
Console.Write(CreateSpaces(nos));
Console.Write("// ");
index = AssemblyStruct[1].culture;
while ( strings[index] != 0)
{
Console.Write("{0}." ,
(char)strings[index]);
index++;
}
Console.WriteLine("..");
}
Console.WriteLine("}");
}
public void DisplayAllSecurity (int tabletype , int
tableindex)
{
if (DeclSecurityStruct == null)
return;
for ( int ii = 1 ; ii < DeclSecurityStruct.Length
; ii++)
{
int coded = DeclSecurityStruct[ii].coded;
int table = coded & 0x03;
int row = coded >> 2;
if ( (table == tabletype && row ==
tableindex ) )
{
string returnstring;
if ( tabletype == 2)
returnstring = CreateSpaces(2 );
else if ( tabletype == 0)
returnstring = CreateSpaces(spacesforrest +
spacesfornested);
else
returnstring = CreateSpaces(spacesforrest +2 +
spacesfornested);
returnstring = returnstring + ".permissionset " ;
string actionname = GetActionSecurity (DeclSecurityStruct[ii].action);
returnstring = returnstring + actionname ;
returnstring = returnstring + " = (" ;
Console.Write(returnstring);
int index = DeclSecurityStruct[ii].bindex;
if ( index == 0)
Console.WriteLine(")");
else
DisplayFormattedColumns(index ,returnstring.Length,
false );
}
}
}
public string GetActionSecurity (int actionbyte)
{
string returnstring = "";
if ( actionbyte == 1)
returnstring = "request";
if ( actionbyte == 2)
returnstring = "demand";
if ( actionbyte == 3)
returnstring = "assert";
if ( actionbyte == 4)
returnstring = "deny";
if ( actionbyte == 5)
returnstring = "permitonly";
if ( actionbyte == 6)
returnstring = "linkcheck";
if ( actionbyte == 7)
returnstring = "inheritcheck";
if ( actionbyte == 8)
returnstring = "reqmin";
if ( actionbyte == 9)
returnstring = "reqopt";
if ( actionbyte == 10)
returnstring = "reqrefuse";
if ( actionbyte == 11)
returnstring = "prejitgrant";
if ( actionbyte == 12)
returnstring = "prejitdeny";
if ( actionbyte == 13)
returnstring = "noncasdemand";
if ( actionbyte == 14)
returnstring = "noncaslinkdemand";
if ( actionbyte == 15)
returnstring = "noncasinheritance";
return returnstring;
}
}
e.il
.assembly
zzz
{
.ver 1:2:3:4
.hash algorithm 32773
.permissionset assert = ( 1 2 3 65 65 67 68 69 70
71 72 73 20 21 22 23 24 25)
}
.class zzz
{
.method public static void adf() cil managed
{
.entrypoint
}
}
Output
.assembly /*20000001*/ zzz
{
.permissionset assert = (01 02 03 65 65 67 68 69 70 71 72 73 20
21 22 23 // ...eeghipqrs !"#
24 25 ) // $%
.hash algorithm 0x00008005
.ver 1:2:3:4
}
Let us now display the actual Assembly table by
calling the function DisplayAssembley in the abc function. We are following
the same order that ildasm follows. As always, we check whether the
AssemblyStruct array has been created or not and then decide whether to exit.
From now on we will ignore this family of error checks.
The first directive that we discuss in the assembly
directive is all about security. We use a function DisplayAllSecurity to
display the security directives and pass the function two parameters 2 and 1.
The security directive called permissionset can be placed in lots of other
places and hence we call this function many times.
But before we look at what it does, note that most
of what we did with the assembly ref or assembly extern directive is repeated
here. The only difference is that the third parameter to
DisplayFormattedColumns is true and not false as before.
The first is the publickey directive, which is the
same, along with the ver directive. There is a hash algorithm directive that
simply takes a number representing the algorithm that we use. An assembly is
made up of many files, and all the files must use the same hashing algorithm
to calculate the hash value.
There is only one value that can be specified by a
conforming implementation, the algorithm SHA1, which has a value of 32772. It
makes sense to specify a single value, but we used a different value and the
system did not bother to check.
The culture code is handled the same way as in the
assembly extern case, and we were too tired to convert it into a function. The
problem with compilers is that they allow lazy practices like ours to
continue. We would have preferred the compiler to refuse to compile our code
until we placed the above code in a function. It's like my car refusing to
move if the seat belts are not worn.
The key function in the above code is the one that
displays the security, the function DisplayAllSecurity. This function has two
parameters: the tabletype, which has nothing to do with the metadata table
numbers, and the table index. All the security attributes are stored in a
table called DeclSecurity.
This table has a field called coded that, as the
name suggests, is a coded index field; the specs call this field parent. The
directive permissionset can appear in one of three places: a type def, a
method or an assembly. The first two bits are assigned to the table and the
remaining bits to the row within the table.
This coded index is called the HasDeclSecurity
coded index. By shifting the bits two to the right, we have in the variable
row the actual row number. When we called the function DisplayAllSecurity, the
tabletype was 2, as this is the value of the coded index for the assembly
table, and the tableindex had a value of 1, as this is a row number and the
assembly table can have only one row.
For the method table, it will stand for the index
of the method that should carry the security directive, i.e. the method in
which we wrote the permissionset directive. Thus the if statement checks for
the right table and row to figure out whether the permissionset directive
should be placed here or not.
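The decoding just described takes only two lines; in this sketch the value 6, binary 110, encodes row 1 of the assembly table:

```csharp
using System;

public class CodedIndexDemo
{
    // Splitting a HasDeclSecurity coded index: the low two bits select the
    // table (0 = TypeDef, 1 = Method, 2 = Assembly), the rest is the row.
    public static void Main()
    {
        int coded = 6;               // binary 110: table 2, row 1
        int table = coded & 0x03;    // low two bits
        int row = coded >> 2;        // remaining bits
        Console.WriteLine(table + " " + row); // 2 1
    }
}
```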
The tabletype parameter has another role to play.
Depending upon its value, which represents a coded index, we decide how many
spaces the string returnstring should have. The assembly directive has no
indentation, and hence its permissionset directive is always indented by two
spaces.
When the tabletype is 0, the value of the coded
index is the type def table and here the indentation is decided by the
namespace and whether the type is a nested type. The third table is the method
table which has a coded index value of 1. Here the .method directive has to be
indented by 2.
The variables spacesforrest and spacesfornested
store the number of indentations for us. This we will explain as we move
along. To reiterate, we need to find the row in the DeclSecurity table whose
coded index matches the Assembly, TypeDef or Method table.
We pass as the second parameter the row which may
contain the permissionset directive. After writing the directive
permissionset, we must figure out which action the directive pertains to. We
pass the action number stored in the field action to the function
GetActionSecurity, which returns the action as a name.
This function simply checks whether the parameter
actionbyte has a value from 1 to 15. These are the only values the action
field can have, and depending upon the value we return a certain string. You
cannot get an easier function to write. The bindex field is similar to the
fields in the assembly extern directive that point to an index in the blob
heap.
We check whether the index into the blob heap is
zero, which would mean no security, and in that case simply display the close
bracket. Otherwise we use our good old DisplayFormattedColumns function to
display the security in a tabular format.
Two questions come to mind: what is the
permissionset attribute all about, and where did we get the action field
values from? The documentation as of the moment does not give us all the
action field values.
What we did was write the permissionset directive
in an il file, use ilasm to give us an exe file, and then run both the
original ildasm program and our disassembler. Our disassembler displays the
value of the action field, which gives us a number, and the ildasm program
gives us a name.
Tedious, but we had no choice, as the specs do not
give us all the name and number combinations. We ran our disassembler through
the 5000 files that we keep boasting about, but yet we must have missed
certain action field values. So we did the unthinkable. We ran a file spying
program that we downloaded from the site sysinternals.com.
This program told us all the files that ildasm.exe
opens. We scanned these files through a hex editor and realized that the
disassembler ildasm.exe itself had loads of strings in it. We wrote out these
strings to a separate file.
We then searched for the string assert, and this is
how we figured out that the action field has 15 possible values. Ildasm.exe is
a program written in C/C++, and all the output the program generates must use
strings stored in arrays. The contents of these arrays are what tell us all
possible options for every directive and/or instruction. Tell us if you know
of a better way.
The .permissionset directive is a way to attach
declarative security attributes to a typedef, an assembly or a method. The
bytes displayed after the action value are an XML-based representation. They
are meant to specify a serialized version of the security settings.
Program11.csc
public string NameReserved (string name)
{
if ( name.Length == 0)
return name;
if ( (byte)name[0] == 7 )
return "'\\a'";
if ( (byte)name[0] == 8 )
return "'\\b'";
if ( (byte)name[0] == 9)
return "'\\t'";
if ( (byte)name[0] == 10)
return "'\\n'";
if ( (byte)name[0] == 11 )
return "'\\v'";
if ( (byte)name[0] == 12 )
return "'\\f'";
if ( (byte)name[0] == 13 )
return "'\\r'";
if ( (byte)name[0] == 32 )
return "' '";
if ( name == "'")
return "'\\''";
if ( name == "\"")
return "'\\\"'";
if ( name.Length == 2 && (byte)name[1] ==
7 )
return "'" + name[0] + "\\a'";
if ( name.Length == 2 && (byte)name[1] ==
8 )
return "'" + name[0] + "\\b'";
if ( name.Length == 2 && (byte)name[1] ==
'\t' )
return "'" + name[0] + "\\t'";
if ( name.Length == 2 && (byte)name[1] ==
'\n' )
return "'" + name[0] + "\\n'";
if ( name.Length == 2 && (byte)name[1] ==
'\v' )
return "'" + name[0] + "\\v'";
if ( name.Length == 2 && (byte)name[1] ==
'\f' )
return "'" + name[0] + "\\f'";
if ( name.Length == 2 && (byte)name[1] ==
'\r' )
return "'" + name[0] + "\\r'";
if ( name.Length == 2 && (byte)name[1] ==
'"' )
return "'" + name[0] +
"\\\"'";
if ( name.Length == 2 && (byte)name[1] ==
'\'' )
return "'" + name[0] + "\\\''";
if ( name.Length >= 1 && name[0] == '\''
&& name[name.Length-1] == '\'' )
return name;
int i = 0;
while ( i < name.Length )
{
if ( (name[i] >= '0' && name[i] <= '9') && i == 0 )
return "'" + name + "'";
if ( name[i] >= 1 && name[i] <= 31 )
return "'" + name + "'";
if ( name[i] >= 127 || name[i] == '<' || name[i] == '+' || name[i] == '-' ||
name[i] == ' ' || name[i] == '\'' )
{
int ind = name.IndexOf("'");
if ( ind != -1)
name = name.Insert(ind , "\\");
return "'" + name + "'";
}
i++;
}
string [] namesarray =
{"value","blob","object","method"
,"init",".init","array","policy.2.0.myasm","policy.2.0.myasm.dll","add",
"assembly", "serializable", "lcid" ,
"stream" , "filter" , "handler" , "record",
"request", "opt", "clsid", "hresult",
"cf", "custom" , "to", "ret",
"import", "field", "sub", "any",
"pop", "final", "rem" , "storage",
"error", "nested", "il" , "instance",
"date", "iunknown", "literal",
"implements", "unused", "not" , "alignment",
"unicode", "bstr", "auto", "retval",
"variant", "or", "family", "arglist",
"br", "wrapper", "demand", "fault", "call"
, "algorithm",
"native",
"fixed",
"string", "char"
, "decimal", "float" , "int", "void",
"pinned",
"div",
"true" ,
"false",
"default",
"abstract" , "^" , "`", "{",
"|" , "}", "~" , "!" , "#" , "(",
"%", "-" ,
")", ":" , ";", "=", ">",
"assert", "synchronized", "runtime",
"with", "class",
"newarr", "ldobj",
"ldloc", "stobj", "stloc", "starg",
"refany",
"ldelema",
"ldarga",
"ldarg",
"initobj" ,
"box"
,"demand" ,
"ldfld", "ldflda", "ldsfld",
"ldsflda", "vector",
"in" , "out", "and", "int8",
"xor" , "as", "at", "struct",
"finally" , "interface" , ".CPmain",
"enum", "vararg", "marshal",
"policy.1.0.MathLibrary","policy.1.0.MathLibrary.dll","final",
"System.Windows.Forms.Design.256_2.bmp" ,
"System.Windows.Forms.Design.256_1.bmp"};
for ( int ii = 0 ; ii < namesarray.Length ;
ii++)
{
if ( name == namesarray[ii])
{
int ind = name.IndexOf("'");
if ( ind != -1)
name = name.Insert(ind , "\\");
return "'" + name + "'";
}
}
return name;
}
string dummy =
NameReserved(GetString(ModuleRefStruct[ii].name));
Console.WriteLine(".module extern {0} /*1A{1}*/" , dummy , ii.ToString("X6"));
Console.WriteLine(".assembly extern /*23{0}*/ {1}", i.ToString("X6") , NameReserved(GetString(AssemblyRefStruct[i].name)));
Console.WriteLine(".assembly /*20000001*/ {0}" , NameReserved(GetString(AssemblyStruct[1].name)));
e.il
.assembly
'$$vijay-1'
{
}
.class '??'
{
.method public static void adf() cil managed
{
.entrypoint
}
}
.assembly /*20000001*/ '$$vijay-1'
{
.ver
0:0:0:0
}
Shakespeare once said, what's in a name, a rose by
any other name would smell as sweet. For us, however, a name is very important
for a simple reason. In C# the word ldarg may not be a reserved word, but it
is in IL. The Perl programming language may not treat the word class as a
reserved word.
At the end of the day, it is immaterial whether a
programming language treats a word as reserved or not; what is material is
whether IL treats it as a reserved word. We are allowed to use a reserved word
as a valid identifier in IL, but with one proviso: if and only if it is a
reserved word, it has to be enclosed in single inverted commas.
We have been using names already in three different
places: for the assembly, assembly extern and module extern directives. If the
name is reserved by IL standards, we have to place it, as said before, in
single and not double inverted commas.
Thus a name that has a minus sign in it is reserved
in IL, and hence the name of the assembly has to be in single inverted commas
or the assembler gives us an error. This is one type of reserved name: names
that contain certain disallowed characters. The second are words that form
part of the IL vocabulary, like ldarg, class etc.
The third case is where we use unprintable ASCII
chars, like character number 7, in a name. If we use the WriteLine function,
we will not see legible output, and character number 10 will instead give us a
new line. These special characters have to be handled separately.
Why anyone in their right mind would use such
characters in a name is beyond us, but we came across scores of files that for
some reason chose to use the unprintable ASCII chars with values up to 31.
Thus we first check if the string has a length of 0, and if true, we simply
return the empty string.
Then we check for some of the unprintable ascii
chars and return, say, a \n instead of the unprintable character when the
value is 10. However, the \n would be read as an enter by all and sundry, so
we need to escape the backslash by putting one more. All this must be
protected by placing it in single inverted commas.
All this assumes that the first array member is an
unprintable character. At times the second member is, and hence the next
series of if statements takes care of just this eventuality. We should instead
have used a loop to take care of the general case, but in our minds we asked
why anyone in the world would use such unprintable characters for names.
If they insist, let them use someone else's
disassembler. In this series of if statements we are checking the second
member of the array, name[1]. If the string were only one char or byte large,
an exception would be thrown, as we would be moving beyond the bounds of the
array. Hence the check for the length of the string being equal to 2.
We first check if the name is a single or double
inverted comma. Once again we ask why somebody would choose such a name. It
seems that there are such intelligent people in the world, and hence we have
to write such code. For a single inverted comma we simply use two backslashes,
but the double is more complex.
Here we have to use three backslashes. The first
two become a single backslash and the third escapes the double inverted comma,
as we are inside double inverted commas. The next if statement checks whether
the entire string is already enclosed within single quotes. This check is
needed in case we call this function NameReserved twice by mistake.
Thus we first check the first and last array
members. If the length of the string is 1, then the first and last members
are the same, which is the single inverted comma. If you have been paying
attention, we had already checked for a single inverted comma earlier. Thus to be
politically correct we should check only for strings with a length of 2 or more.
But hey, we know that this book will win no prize
for the best code. What matters is that it works and that you understand why. Now
comes the meat. There are certain special cases where, if we find some special
characters in the string, we need to quote the string.
Thus we check the string character by character
and use the variable i for the array offset. If the first character is a numeric
digit, then the string needs to be quoted, or else an error results. Remember,
this applies only to the first character of the string.
The second rule is that if the string contains a
non-printable ASCII char, we quote the string. Once again, the special ones
that require special handling were taken care of earlier using dedicated if
statements. We could have used the same if statements here as well.
Finally, from the printable ASCII set, chars like the
less than sign, minus, plus, space etc. are reserved and hence need to be
quoted. Also, if the string contains chars beyond 127, the entire
string has to be quoted. This check again is for individual chars. There is one
small problem.
If ever the string that is to be single quoted also
contains a single quote, we have a problem, as this single quote has to be
escaped using the backslash. So before we single quote the string, we check
whether it has a single quote using the IndexOf function, which returns -1 if
the search string is not present.
Else it returns the position of the search string in the
actual string. Knowing this number, we use the Insert function to actually put a
backslash at that point. The backslash also has to be escaped, and the Insert
function will insert this backslash where the single inverted comma was earlier
and move the rest of the string to the right.
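The quoting and escaping rules described above can be sketched as a small standalone program. This is only our own sketch, not the book's actual code; the helper names NeedsQuoting and QuoteName are hypothetical, and the reserved-char list is deliberately tiny.

```csharp
using System;

public class QuoteSketch
{
// Returns true when the quoting rules above apply: a leading digit,
// an unprintable char, a char beyond 127, or a reserved printable char.
public static bool NeedsQuoting(string name)
{
if ( name.Length > 0 && Char.IsDigit(name[0]))
return true;
foreach ( char c in name )
{
if ( c < 32 || c > 127 )
return true;
if ( "<>-+ ".IndexOf(c) != -1 )
return true;
}
return false;
}
// Escapes an embedded single quote with a backslash using IndexOf and
// Insert, then wraps the whole name in single inverted commas.
public static string QuoteName(string name)
{
int pos = name.IndexOf("'");
if ( pos != -1 )
name = name.Insert(pos , "\\");
return "'" + name + "'";
}
public static void Main()
{
Console.WriteLine(QuoteName("9abc")); // '9abc'
Console.WriteLine(NeedsQuoting("abc"));
}
}
```

Note that, like the book's code, this handles only one embedded single quote; a loop over IndexOf would be needed for the general case.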
This will increase the size of the string by one.
Finally, we have a huge array called namesarray that we fill up with a list of
reserved names from the IL specifications. In a for loop we check
whether any of the array members is equal to the string called name passed as a
parameter.
If it is a match, we check for the single quote
problem mentioned earlier. Yes, you are right, we should have had a function to
do the check, as we are repeating the same code. To be honest, we did not write the
code again, we simply block copied it. We finally quote the string if a match
is found.
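In code, the reserved-name check boils down to a linear scan, roughly like the sketch below. The array here holds only a handful of the keywords from the IL specification; the real namesarray runs into hundreds of entries, and the class name is our own.

```csharp
using System;

public class ReservedSketch
{
// A tiny subset of the reserved names; the actual list from the IL
// specification is far larger.
public static string [] namesarray = { "add" , "call" , "ret" , "value" , "method" };
// Quote the name when it matches any reserved word, else return it as is.
public static string NameReserved(string name)
{
for ( int i = 0 ; i < namesarray.Length ; i++)
{
if ( namesarray[i] == name )
return "'" + name + "'";
}
return name;
}
public static void Main()
{
Console.WriteLine(NameReserved("ret")); // 'ret'
Console.WriteLine(NameReserved("zzz")); // zzz
}
}
```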
We learned some interesting things along the way. A
dot or two dots or full stops is not always reserved. Having an extension of
bmp is not reserved, and also having an
underscore and a dot and a bmp extension is not reserved. But a name like the
last two is reserved. Go figure it out, and remember what Shakespeare said about
names.
Program12.csc
public void DisplayFileTable ()
{
if ( FileStruct == null )
return;
for ( int ii = 1 ; ii < FileStruct.Length ; ii++)
{
Console.WriteLine(".file /*26{0}*/ {1}{2}" , ii.ToString("X6") , GetFileAttributes(FileStruct[ii].flags) , NameReserved(GetString(FileStruct[ii].name)));
int table = entrypointtoken >> 24;
if ( table == 0x26 )
{
int row = entrypointtoken & 0x00ffffff;
if ( row == (ii) )
Console.WriteLine(" .entrypoint");
}
if ( FileStruct[ii].index != 0)
{
Console.Write(" .hash = (");
int index = FileStruct[ii].index ;
DisplayFormattedColumns(index ,13 , false);
}
}
}
public string GetFileAttributes(int fileflags)
{
string returnstring = "";
if ( fileflags == 0x00)
returnstring= "";
if ( fileflags == 0x01)
returnstring= "nometadata ";
return returnstring;
}
e.il
.file aa.exe
.file nometadata bb.exe
.file nometadata cc.exe .hash = ( 1 2 3)
.file dd.exe .hash = ( 1 2 3) .entrypoint
.class zzz
{
}
Output
.file /*26000001*/ aa.exe
.file /*26000002*/ nometadata bb.exe
.file /*26000003*/ nometadata cc.exe
.hash =
(01 02 03 )
.file /*26000004*/ dd.exe
.entrypoint
.hash =
(01 02 03 )
The above program displays the directive .file. In
the .Net world, we start with the concept of a module. Every file has to have
exactly one entry in the module table. An assembly is the next level of
aggregation, and many modules can make up an assembly. Thus there is a one to
many relationship between assemblies and modules.
However, an assembly can be made up of a number of
other files like text files, documentation files or what have you. There must
be some directive that tells the assembly about the existence of these files. In
the shipping world a manifest is a list of the containers in a ship, and thus the
file directive is used to add the file name to the assembly manifest.
Every file that is added to the assembly manifest
does not have to be a .net executable and have metadata. Thus the nometadata
option indicates that this file has no metadata and its format is not known to
us. It may be considered a pure data file. The point is that no one checks
whether the file name exists or not at the time of assembly.
Thus the attribute nometadata tells us that the file
name following is not a module and does not adhere to the .Net specifications.
Whenever we create the smallest .Net executable, the directive .entrypoint has
to be present in a function. This is mandatory as otherwise there is no way of
knowing which is the first function to be executed.
Thus the .entrypoint directive tells us that this
file is part of a multi module assembly and also contains the starting
function. When we move to this file, we must see a static function with the
entrypoint directive.
The assembler does not go out and do the check, but
it prevents you from creating a function in this module with the .entrypoint
directive. The hash option is optional, and if we do not specify it, the
assembly linker, a program called al, will automatically calculate it
during creation of the assembly.
Thus it may be optional at assemble time for ilasm,
but at runtime this hash value must be present; if we do not create one, it
gets created for us. We have a simple philosophy: if someone else
does the work for us, why should we? The designers of the .Net world have
created a table just to store the .file directive.
We loop through all the table entries and realize
that this table has only three fields. The first is a bitmask of type
FileAttributes that can have only two values in spite of being four bytes
large. A bitmask is an entity where the
value of the byte is unimportant; it is the individual bits that carry
information.
The file directive has only one option, nometadata.
A value of zero implies that the file carries metadata, and hence we use the
function GetFileAttributes to return an empty string. A value of 1 means that the option
nometadata was used, and hence we return the string nometadata.
The second field is called name and is an offset
into the strings heap, and the last value is an index into the blob heap for the
hash value. We first check whether this field is non-zero and if in the
affirmative, we use the DisplayFormattedColumns function as before to display
the hash value.
The name field cannot be null and it cannot be an
absolute path name. Its length is restricted by the value stored in
MAX_PATH_NAME. There are also some reserved names like con, aux, lpt, null and
com that cannot be used, as they represent device names. Also, we cannot use
numbers or special characters like a $ or a colon.
But ilasm does not do these error checks for some
reason. Like always, duplicate rows are theoretically an error that nobody
checks. Also, if the name of the file is in the Assembly table, it means that
the module holds the manifest; the same name cannot appear in the file table
also.
The file table being empty means that we have a
single file assembly. This also means that the Exported Type table should
be empty. A long time back we came across an instance variable called
entrypointtoken whose value was set by the CLR header.
This value is like and unlike a coded index, where
the last and not the first 8 bits give us the table that contains the function
for the entrypoint token. We right shift this variable by 24 bits, and if we get
a value of 0x26, it means that the entrypoint function lies in some other file.
We then take the first 24 bits, which contain the row
in the file table that holds the entrypoint token. If this value matches the
loop index, we display the directive .entrypoint.
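The decoding of the entrypoint token takes only a couple of lines. The token value below is made up purely for illustration; the high byte selects the table and the low 24 bits the row, exactly as DisplayFileTable does it.

```csharp
using System;

public class TokenSketch
{
public static void Main()
{
int entrypointtoken = 0x26000004; // a hypothetical token for illustration
int table = entrypointtoken >> 24; // 0x26 means the File table
int row = entrypointtoken & 0x00ffffff; // row 4 of that table
Console.WriteLine("table={0:X2} row={1}" , table , row); // table=26 row=4
}
}
```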
Tutorial on Assemblies, Modules, Files and other
such aliens from outer space.
p.il
.class zzz
{
.method public static void abc()
{
.entrypoint
ret
}
}
Let’s start at the very beginning. We create a file
called p.il and assemble it using the program ilasm to get an executable p.exe.
We have done this in the past, is your immediate response. There is no way we
could disagree. When we run the program, we get an error that says that the
file p.exe was expected to contain an assembly manifest.
This is another way of saying that we have no
directive called assembly. Thus every executable file has to have an assembly
directive, otherwise we cannot execute it. There is no physical entity called a
manifest stored anywhere in the file. The concept of a manifest is logical and
is created using rows from different metadata tables.
Thus if we have no row at all in the Assembly
table, we get the above error. A manifest is an entity created by the runtime
loader that specifies all the files an assembly is made up of. We must
reiterate that there is no physical table that stores the manifest; it is
computed at run time and is made up of a list of files.
.assembly aa
{
}
.class zzz
{
.method public static void abc()
{
.entrypoint
ret
}
}
In the above program we have added an assembly
called aa and left the declaration blank, as all the options are optional. After
reassembling the program and running it, we get no error, as we have a manifest or
assembly directive called aa. In practical terms, you should always have the
assembly name and the file name the same, as we will later demonstrate.
The reason being, the assembly directive has no file
option that specifies the name of the physical file. Thus it is assumed that
the file name will be the name of the assembly with a dll or exe extension
added. Therefore the assembly or manifest name should never have a file
extension.
.assembly aa
{
}
.assembly bb
{
}
.class zzz
{
.method public static void abc()
{
.entrypoint
ret
}
}
Of the assembly directive we can have one and
only one; we cannot have zero or two. This makes sense, as the assembly name is
what we refer to when we want to access code from other assemblies. More on
this later.
A missing module directive is not a problem, as
mentioned before, since the assembler adds one automatically for us with the same
name as the name of the file and the letters DLL in caps as the extension. Like
the assembly directive, we cannot have two module directives. You will agree with us a
little later why an assembly directive does not get added on automatically.
p1.il
.assembly p1
{
}
.class public yyy
{
.method public static void pqr()
{
ret
}
}
p.il
.assembly aa
{
}
.assembly extern p1
{
}
.class zzz
{
.method public static void abc()
{
.entrypoint
call void [p1]yyy::pqr()
ret
}
}
We have now created two files, p.il and p1.il. We
first assemble p1.il to a DLL and not an exe file as ilasm /Dll p1.il. It does
not really matter whether we make it into a DLL or an exe file; we chose the dll
option just to be different and have a change of scene. In this file we have a
class yyy that in turn has a static function called pqr.
We would like the world to call this function. We
have also called the assembly p1, which is the same name as the name of the
physical file. We then move on to the file p.il, which will be compiled to an
executable file and hence has an assembly directive that does not have the same
name as the physical file.
To call a function, we use the call instruction and
first specify the return type, which is void in this case. Then we specify the
function to be called. This gets a little complex. In the round brackets we
specify not the parameters themselves but the data types of the parameters.
It is our job to place the parameters on the stack, as
we shall see how to accomplish in the next program. We then state the function
name, i.e. pqr, preceded by the class/namespace name. For some reason the delimiter
for class/namespace is a dot but for class/function it is a double colon.
Don’t ask us why, as we believe the guy was
allergic to too many dots. Finally, in square brackets we place the name of the
assembly that contains the code of the class yyy. This is assembly p1, and hence
we have to use the assembly extern directive. If we do not, we get an assemble
error that tells us that we have an undefined assembly ref.
We are only allowed to put the name of an assembly
in the square brackets and not a file or a module or any such entity. Some
basic questions come to our minds. We change the function name from pqr to
pqr1, and even though the function name is not present in the class yyy, ilasm
does not actually check the file p1.dll.
When we run the program p.exe, an exception gets
thrown at us that tells us that the method yyy::pqr1 is not found. Thus it is
our responsibility to check that the function is available. All that ilasm
checks is that whatever we use in square brackets is backed up by an assembly
extern directive.
Ilasm does not even check for the existence of such
a file. We delete the file p1.dll and when we now reassemble, we do not get an
error. We now run the program p and have a file-not-found exception thrown
at us. The error message clearly tells us that the loader first checks for
p1.dll and then p1.dll again in a sub-directory p1.
It then checks for p1.exe and finally p1.exe in the
sub-directory p1. For some reason the loader prefers looking for a dll over an
exe. Now you know why the name of the assembly should match the name of the
physical file. We used the assembly name in our call instruction, and as there is
no mapping from an assembly ref to a physical file, please keep the two the
same.
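The probing order we just observed can be mirrored in a few lines. This is only our sketch of the four paths the loader tried for the assembly ref p1, not the loader's real algorithm, and the helper name Probes is our own.

```csharp
using System;
using System.IO;

public class ProbeSketch
{
// Builds the four candidate paths in the order the loader tried them:
// a dll first, in the current directory and a like-named sub-directory,
// then the same two locations for an exe.
public static string [] Probes(string name)
{
return new string []
{
name + ".dll" ,
Path.Combine(name , name + ".dll") ,
name + ".exe" ,
Path.Combine(name , name + ".exe")
};
}
public static void Main()
{
// The loader stops at the first path that exists; a dll wins over an exe.
foreach ( string p in Probes("p1"))
Console.WriteLine(p);
}
}
```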
p.il
.assembly aa
{
}
.assembly extern mscorlib
{
}
.class zzz
{
.method public static void abc()
{
.entrypoint
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(string)
ret
}
}
In the above program we simply call the static function
WriteLine from the class Console in the System namespace. As the WriteLine
function in this case takes one string as a parameter, we place the string type
within the round brackets. To put a string on the stack we use the ldstr
instruction, and there are similar instructions for other data types.
We first put the parameters on the stack and then
call the function. This function is in the assembly mscorlib and thus we have a
assembly reference to it. The code for this class is obviously in the file
mscorlib.dll.
When we run the program, it displays hi. In C#, the
compiler actually checks whether the file mscorlib.dll has a function WriteLine
that accepts a string as a parameter and is present in the Console class. Here,
all these checks have to be done by you and me. If we do not, as said above, we
get an exception thrown at runtime.
p.il
.assembly aa
{
}
.assembly extern p1
{
}
.class zzz
{
.method public static void abc()
{
.entrypoint
call void [p1]yyy::pqr()
ret
}
}
p1.il
.assembly p1
{
}
.class public yyy
{
.method public static void pqr()
{
.entrypoint
ldstr "p1.exe"
call void [mscorlib]System.Console::WriteLine(string)
ret
}
}
Before we run the above set of programs, delete the
file p1.dll first. All that the p.il file does is call the function pqr from the
assembly p1. We next compile p1.il to an exe and we get p1.exe. When we run p,
it actually calls the above pqr function, as the assembly name specified is p1.
We then replace the above p1.il with the one below.
p1.il
.assembly p1
{
}
.class public yyy
{
.method public static void pqr()
{
ldstr "p1.dll"
call void [mscorlib]System.Console::WriteLine(string)
ret
}
}
The two changes made are that we have removed the
entrypoint directive and changed the string to be displayed. When we assemble
to a dll and then run p, the loader first checks for the dll and, as it finds
it, executes pqr from the dll and ignores the exe file.
p1.il
.assembly p11
{
}
We change the assembly name in p1.il to p11, and the
assembler gives us no errors, but the loader first loads the file p1.dll and
then immediately looks in the Assembly table for the single row to contain the
name p1. In our case it is p11, and hence an exception gets thrown. So we hope
that from now on you will always keep the assembly name the same as the
physical file name minus the extension, please.
p.il
.assembly aa
{
}
.assembly extern p1
{
}
.class zzz
{
.method public static void abc()
{
.entrypoint
call void [p1]yyy::pqr()
call void [p1]xxx::xyz()
ret
}
}
p1.il
.assembly p1
{
}
.file p2.dll
.class extern xxx
{
.file p2.dll
}
.class public yyy
{
.method public static void pqr()
{
ret
}
}
p2.il
.class public xxx
{
.method public static void xyz()
{
ret
}
}
The above example takes our understanding of assemblies
one level further. Let's start with p.il, the exe file. Here we are calling two
functions, pqr and xyz, from the classes yyy and xxx respectively. Also, these
classes exist in an assembly p1. So what, you say, nothing new; read on.
In file p1.il, we have an assembly declaration with
the same name p1. The only problem is that we have a single class yyy with the
static function pqr. There is no code for the class xxx anywhere in this
assembly. Now turn your eyes to file p2.il. Here we have a function xyz in a
class xxx.
We will compile both files to a dll, and in some way
we need to relate the two. If you look at the code in p2.il, there is no
linkage with the file p1.il. The way we do it is to place all the linkages in the
assembly file or manifest. We start with the class extern directive that only
declares a class called xxx.
The file directive inside it tells us which file
this class xxx will be found in. We cannot stop here, as there has to be a
corresponding global file directive at the same time. These two file directives
tell the manifest that this assembly now comprises two files, p1.dll and
p2.dll.
The file p2.dll contains the code of the class xxx.
Thus when the assembly p1.dll is loaded, the file p2.dll will also be loaded.
This is how we can have multiple files comprise an assembly. There is no
explicit way of saying that this file belongs to this assembly only. A file can
belong to as many assemblies as it wants.
Thus we use the class extern directive along with the file
directive twice: once to specify which files are part of the assembly, and a
second time to specify which classes belong to which file.
The .class extern needs the file directive because
if we do not specify one, the assembler does not know the name of the file that
contains the code of the class xxx. This file directive does not actually check
for the existence of the file at assemble time, like always. If we remove this
line, as in:
p1.il
.class extern xxx
{
}
The assembler does not give us an error but a
warning that says that the xxx class is not added to the list of classes that
the assembly is made up of. Obviously we will get a runtime error, as the class
xxx is not around.
If we remove the .file directive from the file
p1.il, the assembler gives us an error saying that there is no top level file
directive.
p1.il
.assembly p1
{
}
.file p2.dll
.class extern xxx
{
.file p2.dll
}
.class extern xxx
{
.file p3.dll
}
.class public yyy
{
.method public static void pqr()
{
ret
}
}
p2.il
.class public xxx
{
.method public static void xyz()
{
ldstr "p2.dll"
call void [mscorlib]System.Console::WriteLine(string)
ret
}
}
p3.il
.class public xxx
{
.method public static void xyz()
{
ldstr "p3.dll"
call void [mscorlib]System.Console::WriteLine(string)
ret
}
}