The above example illustrates a simple point: the assembler does not perform extensive error checks. We have two files, p2.il and p3.il, that contain the same class xxx and function xyz. In the file p1.il we write the same .class extern twice and get no error. The second class extern refers to a file p3.dll that has no file directive at the top level.

There are no errors at all at assemble time, which means that the assembler ignores our indiscretions. At runtime, the xyz method gets called from p2.dll. Change the order of the class externs and watch the assembler complain. The old adage "buyer beware" applies here too: the assembler offers no helping hand and no hand-holding.

An assembly and a module both let us group constructs together, and each plays a different role in the .NET world. A set or group of files is what we call an assembly. The manifest is the abstract entity that keeps track of all the files present.

The Assembly table only stores the version, name, culture and security requirements, which we have not touched upon yet. The assembly manifest must keep track of not only the files but also the cryptographic hash of each file. The manifest is computed from the metadata, as mentioned earlier.

The runtime also needs to know which types are defined by other files and can be exported out of this assembly. This is achieved using the class extern directive.

Obviously, if the type is defined in the same file that has the assembly directive, its attributes such as public or private decide whether it can be exported from this assembly or not. We could also use a digital signature and a public key, computed along with the manifest.

The problem with digital signatures is that, for some reason, they did not set the Thames on fire. The major difference between assemblies and modules is that assemblies comprise modules. A module is a single file that adheres to the rules of the .NET world. It can be either a dll or an exe file, and it must contain executable code.

One file in the list of files carries the assembly directive, and it is this file that gives us the list of other modules that make up the assembly. The file that carries the assembly directive is also a module. If the assembly is a dll, we do not need a function that has the entrypoint directive.

However, if it is an exe file, then the file that contains the entrypoint directive should also contain the assembly directive, or manifest. The concept of namespaces may exist in programming languages, but the CLI does not understand it.

Type names are always specified using the full name, which is relative to the assembly that we created them in. If a file, say a text file, a bmp file or a video file, does not contain metadata, it will not have the module directive. It is not common practice for many assemblies to refer to the same module, but nothing stops us from doing so.

The advantage is that once a dll is loaded into memory, the next assembly that references the dll will not cause it to be loaded into memory again.
Program13.csc
public void abc(string[] args)
{
    ReadPEStructures(args);
    DisplayPEStructures();
    ReadandDisplayImportAdressTable();
    ReadandDisplayCLRHeader();
    ReadStreamsData();
    FillTableSizes();
    ReadTablesIntoStructures();
    DisplayTablesForDebugging();
    ReadandDisplayVTableFixup();
    ReadandDisplayExportAddressTableJumps();
    DisplayModuleRefs();
    DisplayAssembleyRefs();
    DisplayAssembley();
    DisplayFileTable();
    DisplayClassExtern();
}
public void DisplayClassExtern()
{
    if (ExportedTypeStruct == null)
        return;
    for (int ii = 1; ii < ExportedTypeStruct.Length; ii++)
    {
        string ss1 = GetTypeAttributeFlagsForClassExtern(ExportedTypeStruct[ii].flags);
        // Build the dotted name: namespace, a dot if needed, then the type name.
        string ss = GetString(ExportedTypeStruct[ii].nspace);
        if (ss.Length != 0)
            ss = ss + ".";
        ss = ss + NameReserved(GetString(ExportedTypeStruct[ii].name));
        // Coded index: the low two bits select the table, the rest is the row.
        int table = ExportedTypeStruct[ii].coded & 0x03;
        int index = ExportedTypeStruct[ii].coded >> 2;
        if (index != 0)
        {
            Console.Write(".class extern /*27{0}*/ {1}", ii.ToString("X6"), ss1);
            Console.WriteLine(ss);
            Console.WriteLine("{");
            if (table == 0)
            {
                Console.WriteLine(" .file {0}/*26{1}*/ ", NameReserved(GetString(FileStruct[index].name)), index.ToString("X6"));
            }
            if (table == 2)
            {
                Console.WriteLine(" .comtype '{0}' /*27{1}*/ ", NameReserved(GetString(ExportedTypeStruct[index].name)), index.ToString("X6"));
            }
            if (ExportedTypeStruct[ii].typedefindex != 0)
                Console.WriteLine(" .class 0x{0}", ExportedTypeStruct[ii].typedefindex.ToString("X8"));
            Console.WriteLine("}");
        }
    }
}
public string GetTypeAttributeFlagsForClassExtern(int typeattributeflags)
{
    // Only the low three bits carry the visibility.
    typeattributeflags = typeattributeflags & 0x07;
    string visibiltymaskstring = "";
    if (typeattributeflags == 1)
        visibiltymaskstring = "public ";
    if (typeattributeflags == 2)
        visibiltymaskstring = "nested public ";
    return visibiltymaskstring;
}
e.il
.file mscorlib.dll
.file jj.dll
.assembly e
{
}
.file aa.dll
.file bb
.class extern o1
{
.file jj.dll
}
.file jj.dll
.class extern public jjj
{
.file aa.dll
}
.class extern nested public kkk
{
.file aa.dll
}
.class extern public lll
{
.class extern kkk
}
.class extern public ooo
{
.class extern nnn
}
.class extern public ppp
{
.class extern jjj
}
.class extern public mmm
{
.class extern System.Object
}
.class extern public System.Object
{
.file mscorlib.dll
}
.class extern public nnn
{
.file jj.dll
}
.class extern a1
{
.file jj.dll
.class 56
}
.class zzz
{
.method static void abc()
{
.entrypoint
ret
}
}
Output
.assembly extern /*23000001*/ mscorlib
{
.ver 0:0:0:0
}
.assembly /*20000001*/ e
{
.ver 0:0:0:0
}
.file /*26000001*/ mscorlib.dll
.file /*26000002*/ jj.dll
.file /*26000003*/ aa.dll
.file /*26000004*/ bb
.class extern /*27000001*/ o1
{
.file jj.dll/*26000002*/
}
.class extern /*27000002*/ public jjj
{
.file aa.dll/*26000003*/
}
.class extern /*27000003*/ nested public kkk
{
.file aa.dll/*26000003*/
}
.class extern /*27000004*/ public lll
{
.comtype 'kkk' /*27000003*/
}
.class extern /*27000006*/ public ppp
{
.comtype 'jjj' /*27000002*/
}
.class extern /*27000008*/ public System.Object
{
.file mscorlib.dll/*26000001*/
}
.class extern /*27000009*/ public nnn
{
.file jj.dll/*26000002*/
}
.class extern /*2700000A*/ a1
{
.file jj.dll/*26000002*/
.class 0x00000038
}
In this program we simply display the class extern directive. As always, we first add a call to a new function, DisplayClassExtern, to the abc function. We then iterate in a loop through all the rows in the ExportedType table. The first thing we need to figure out is the export attribute, which can have one of two values, public or nested public.

These specify the visibility, that is, who can see and thus use this type. The function GetTypeAttributeFlagsForClassExtern does the job and returns the string for us. The visibility attributes are stored in the first three bits, so we bitwise AND the parameter with 7, as the other bits specify other attributes.

We then check whether the value is 1 or 2, which tells us whether the public or the nested public attribute is on. We then need the name, or more precisely the dotted name, of the extern type. This is stored in two fields, name and nspace.

We first get the name of the namespace and, if the namespace is present, add a dot and then the name of the class. The next field, coded, is the most important, as it tells us which directives are used within the braces. If you remember, we could use .file and .class extern.
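
The two steps just described can be sketched in isolation. This is a minimal illustration, not the program's own code: the class ExportedTypeName and its method names are invented here, and only the mask 7 and the values 1 and 2 come from the listing above.

```csharp
using System;

static class ExportedTypeName
{
    // Only the low three bits of the flags hold the visibility.
    const int VisibilityMask = 0x07;

    public static string Visibility(int flags)
    {
        switch (flags & VisibilityMask)
        {
            case 1: return "public ";
            case 2: return "nested public ";
            default: return "";   // nothing is printed for any other value
        }
    }

    // Namespace, then a dot only if the namespace is present, then the name.
    public static string DottedName(string nspace, string name)
    {
        return string.IsNullOrEmpty(nspace) ? name : nspace + "." + name;
    }

    static void Main()
    {
        // 0x100001 has a higher attribute bit set; the mask removes it.
        Console.WriteLine(Visibility(0x100001) + DottedName("System", "Object"));
        Console.WriteLine(Visibility(2) + DottedName("", "kkk"));
    }
}
```

Keeping the mask in one place makes it harder to compare the raw flags value against 1 or 2 by accident.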

The coded field tells us the table and the row in that table. The row must be non-zero for the class extern directive to be displayed. Why this is so will be explained shortly with a practical example.

Thus the entire directive is not displayed if the variable index is zero, even though the dotted name has a non-null value. Once we know the index is non-zero, we check which table it refers to. A value of 0 refers to the File table, and 2 refers to the ExportedType table.

A value of 1 means the AssemblyRef table, and for some reason we have found no way to simulate this value. We now take the tables that make up the coded index one at a time. The first is the File table, with a value of 0. Here we simply use the index variable as an index into the FileStruct array and display the name of the file stored there.
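
To make the split of the coded field concrete, here is a small sketch. The class name ImplementationIndex and its helpers are hypothetical; the two-bit tag, the shift by two and the table numbers 0, 1 and 2 are the ones described above.

```csharp
using System;

static class ImplementationIndex
{
    // The low two bits name the table.
    public static int Tag(int coded) { return coded & 0x03; }

    // The remaining bits are the row number in that table.
    public static int Row(int coded) { return coded >> 2; }

    public static string TableName(int coded)
    {
        switch (Tag(coded))
        {
            case 0: return "File";
            case 1: return "AssemblyRef";
            case 2: return "ExportedType";
            default: return "unknown";
        }
    }

    static void Main()
    {
        // Class lll in e.il refers to row 3 of the ExportedType table (kkk).
        int lll = (3 << 2) | 2;
        // Class o1 refers to row 2 of the File table (jj.dll).
        int o1 = (2 << 2) | 0;
        Console.WriteLine("{0} row {1}", TableName(lll), Row(lll));
        Console.WriteLine("{0} row {1}", TableName(o1), Row(o1));
    }
}
```

The two sample values are reconstructed from the rows shown in the output, not read from a real file.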

We also need to display the row number in the File table, which is available to us in the index variable. A table number of 2 means that we have used the class extern directive inside the braces; let's understand it with specific reference to the class lll in the file e.il. Here we wrote the class extern directive with the name of the class kkk.

The class kkk happens to be another class extern with a file directive. This gets converted to a comtype directive, whose explanation is nowhere to be found. ilasm does not understand such a directive either. All we know is that we have used the name of a class that is defined somewhere else, as it was itself declared with the class extern directive.

The class name kkk is stored in the ExportedType table, and we use the index variable to fetch its name. We are not allowed to use a type name like zzz in the class extern, as it is a type we are creating in the current file. The last if statement checks the value of typedefindex, which is always zero unless we add the .class directive. All that this directive does is fill the typedefindex field with the row number in the TypeDef table of the other module. As we will explain much later, this is only a hint, and nobody checks the value we write after the class directive.

Looking at the file e.il, the first three class externs, o1, jjj and kkk, simply use the file directive. The difference lies in the visibility attributes. The class ooo is not present in the output at all, even though it is there in the e.il file.

Let's investigate. For this class we have used the class extern directive with the class name nnn. This class in turn is stated to be in the file jj.dll. Since class ooo does nothing but refer to class nnn, there is no use for its independent existence, and ilasm, being a smart cookie, leaves its implementation index at zero. The row itself still exists, which is why the row numbers in the output jump from 27000004 to 27000006, but our program skips it.

This makes sense, as the class is simply an alias for another class. The class ppp is similar in concept to the class lll, which uses the undocumented comtype directive. The class mmm is also not found, as it is an alias for the class System.Object; the output again skips a row number, 27000007. Note that lll and ppp refer to classes declared earlier in the file, whereas ooo and mmm refer to classes declared later, which may be what actually leaves their index at zero. There is no point cluttering up the output with classes that do nothing.

Whenever we use the class extern directive we should also use the assembly directive, but we will not get an error if we do not. Internally there is only one manifest module, and all exported types should be present in this manifest.

The rationale for this table is that a tool that reads it can figure out all the types that others can use from this assembly. This manifest module will therefore contain all the types exported from all the modules that make up the assembly. Unfortunately, this manifest module is also called the assembly.

Thus each time we create a type in any module of an assembly, this table of ours gets one row added. In other words, every type created in another module has an entry in that module's TypeDef table, and this row number is meant to be placed in the ExportedType table. The TypeDefId, or typedefindex field in our case, is always zero, as mentioned earlier, unless of course we use the class directive.

The point we are making is that this is the first time we refer to a row in another module; such references are called foreign tokens. The fact of the matter is that the assembler is too lazy to go to the other modules and figure out what the TypeDef row indexes are.
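
The hexadecimal comments in our listings, such as /*27000004*/, already show the shape of a token: the table number sits in the top byte and the row number in the remaining three. The sketch below composes and splits such a value. The class MetadataToken is invented for illustration; the table numbers 0x27 (ExportedType) and 0x02 (TypeDef) are the prefixes our own display code prints.

```csharp
using System;

static class MetadataToken
{
    // Table number in the top byte, row number in the low three bytes.
    public static int Make(int table, int row) { return (table << 24) | row; }

    public static int Table(int token) { return (token >> 24) & 0xFF; }

    public static int Row(int token) { return token & 0x00FFFFFF; }

    static void Main()
    {
        // Row 4 of the ExportedType table, as printed for class lll.
        Console.WriteLine("/*{0}*/", Make(0x27, 4).ToString("X8"));

        // ".class 56" in e.il stores 56 (0x38); read as a TypeDef token of
        // the other module, that row would look like this.
        Console.WriteLine("/*{0}*/", Make(0x02, 56).ToString("X8"));
    }
}
```

A foreign token has exactly the same layout; the only difference is that its row belongs to a table in another module, so nothing in the current file can confirm that it is valid.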

The type name is stored in the name and namespace fields, and there was a plan to use the TypeDefId field as a fallback if the search by name failed. The specs say at one place that the Implementation coded index can contain only the File and ExportedType tables, and at another place add the AssemblyRef table as well.

For the File table the visibility mask has to be public and not nested public, but we do not get an error if we make such a mistake. The specs also say that if the table is ExportedType, then the visibility mask has to be nested public. No error again if we break the rule. Why have rules when no one checks them?
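
Since nobody checks these two rules for us, a reader could check them by hand. The following sketch is only that: the class ExportedTypeRules and its messages are made up, and the rules are the two stated in the paragraph above, not a complete validation of the table.

```csharp
using System;

static class ExportedTypeRules
{
    // Returns an empty string when the row obeys both rules.
    public static string Check(int flags, int coded)
    {
        int visibility = flags & 0x07;   // 1 = public, 2 = nested public
        int table = coded & 0x03;        // 0 = File, 2 = ExportedType

        if (table == 0 && visibility != 1)
            return "File implementation expects public";
        if (table == 2 && visibility != 2)
            return "ExportedType implementation expects nested public";
        return "";
    }

    static void Main()
    {
        // kkk: nested public, implemented by File row 3. Breaks the first rule.
        Console.WriteLine("kkk: " + Check(2, (3 << 2) | 0));
        // lll: public, implemented by ExportedType row 3. Breaks the second rule.
        Console.WriteLine("lll: " + Check(1, (3 << 2) | 2));
        // jjj: public, implemented by File row 3. Obeys the rules.
        Console.WriteLine("jjj: " + Check(1, (3 << 2) | 0));
    }
}
```

Run against the rows of our e.il, both kkk and lll would be flagged, which matches the observation that ilasm accepted them without complaint.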
Program14.csc
public void abc(string[] args)
{
    ReadPEStructures(args);
    DisplayPEStructures();
    ReadandDisplayImportAdressTable();
    ReadandDisplayCLRHeader();
    ReadStreamsData();
    FillTableSizes();
    ReadTablesIntoStructures();
    DisplayTablesForDebugging();
    ReadandDisplayVTableFixup();
    ReadandDisplayExportAddressTableJumps();
    DisplayModuleRefs();
    DisplayAssembleyRefs();
    DisplayAssembley();
    DisplayFileTable();
    DisplayClassExtern();
    DisplayResources();
}
public void DisplayResources()
{
    if (ManifestResourceStruct == null)
        return;
    for (int ii = 1; ii < ManifestResourceStruct.Length; ii++)
    {
        string flags = GetManifestResourceAttributes(ManifestResourceStruct[ii].flags);
        Console.WriteLine(".mresource /*28{0}*/ {1}{2}", ii.ToString("X6"), flags, NameReserved(GetString(ManifestResourceStruct[ii].name)));
        Console.WriteLine("{");
        string table = GetManifestResourceTable(ManifestResourceStruct[ii].coded);
        int index = GetManifestResourceValue(ManifestResourceStruct[ii].coded);
        if (table == "AssemblyRef")
            Console.WriteLine(" .assembly extern {0} /*23{1}*/ ", NameReserved(GetString(AssemblyRefStruct[index].name)), index.ToString("X6"));
        else if (table == "File" && index > 0)
            Console.WriteLine(" .file {0}/*26{1}*/ at 0x{2}", NameReserved(GetString(FileStruct[index].name)), index.ToString("X6"), ManifestResourceStruct[ii].offset.ToString("X8"));
        else
            Console.WriteLine(" // WARNING: managed resource file {0} created", NameReserved(GetString(ManifestResourceStruct[ii].name)));
        Console.WriteLine("}");
    }
}
public int GetManifestResourceValue(int manifiestvalue)
{
    return manifiestvalue >> 2;
}
public string GetManifestResourceTable(int manifiestvalue)
{
    string returnstring = "";
    short tag = (short)(manifiestvalue & (short)0x03);
    if (tag == 0)
        returnstring = returnstring + "File";
    if (tag == 1)
        returnstring = returnstring + "AssemblyRef";
    return returnstring;
}
public string GetManifestResourceAttributes(int manifiestvalue)
{
    string returnstring = "";
    if ((manifiestvalue & 0x001) == 0x001)
        returnstring = returnstring + "public ";
    if ((manifiestvalue & 0x002) == 0x002)
        returnstring = returnstring + "private ";
    return returnstring;
}
e.il
.assembly e
{
}
.assembly extern a1
{
}
.file aa.dll
.mresource r1
{
}
.mresource public r1
{
}
.mresource private r2
{
.assembly extern a1
}
.mresource public r3
{
.file aa.dll at 12
}
.mresource public r4
{
.file aa.dll at 12
.assembly extern a1
}
.class zzz
{
.method static void abc()
{
.entrypoint
ret
}
}
Output
.assembly extern /*23000001*/ a1
{
.ver 0:0:0:0
}
.assembly extern /*23000002*/ mscorlib
{
.ver 0:0:0:0
}
.assembly /*20000001*/ e
{
.ver 0:0:0:0
}
.file /*26000001*/ aa.dll
.mresource /*28000001*/ r1
{
// WARNING: managed resource file r1 created
}
.mresource /*28000002*/ public r1
{
// WARNING: managed resource file r1 created
}
.mresource /*28000003*/ private r2
{
.assembly extern a1 /*23000001*/
}
.mresource /*28000004*/ public r3
{
.file aa.dll/*26000001*/ at 0x0000000C
}
.mresource /*28000005*/ public r4
{
.assembly extern a1 /*23000001*/
}
The above program displays all the resources that we have. We have added a function, DisplayResources, that displays all the rows of the ManifestResource table. This table gets filled up by the mresource directive. The first field is the flags field, which gives us a visibility mask similar to that of the class extern directive.

The only difference is that the two valid values are public and private. We use the function GetManifestResourceAttributes to return one of these values. Neither of these visibility attributes is mandatory. We display the visibility attributes along with the mresource directive and the row number in the ManifestResource table.

Nearly every table has a coded index, and the ManifestResource table does not lag behind. In this case the coded index points either to the File table, if the tag is 0, or to the AssemblyRef table, if the tag is 1. The specs call this the Implementation coded index, which also includes the ExportedType table, but for resources only the above two apply.

We use the GetManifestResourceTable function to return the table name and the function GetManifestResourceValue to return the row number, obtained by right shifting the coded index by two bits. In the mresource directive we can write the assembly extern directive, the file directive, both, or neither.

If we use the assembly extern directive, then the index variable is the row number in the AssemblyRef table and we display the name of the assembly. If we had used the file directive and the index is a row number greater than or equal to 1, then we display the file name along with the position in the file where the resource starts.

The index variable is the row number in the File table, and the offset within the file is stored in the offset field of the ManifestResource table. Finally, if the table is the File table and the index variable is zero, then we display a warning within comments stating that a file with the same name as the mresource directive has been created.
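
The three-way decision described above can be condensed into one function. This is a sketch under our own naming: ResourceImplementation and the strings it returns are illustrative, while the tag values, the shift and the "index greater than zero" test follow DisplayResources.

```csharp
using System;

static class ResourceImplementation
{
    public static string Describe(int coded, int offset)
    {
        int tag = coded & 0x03;   // 0 = File, 1 = AssemblyRef
        int row = coded >> 2;     // row number in that table

        if (tag == 1)
            return "AssemblyRef row " + row;
        if (tag == 0 && row > 0)
            return "File row " + row + " at 0x" + offset.ToString("X8");
        // Everything else falls through to the warning case.
        return "stored in the current file";
    }

    static void Main()
    {
        Console.WriteLine(Describe((1 << 2) | 1, 0));   // like r2: .assembly extern a1
        Console.WriteLine(Describe((1 << 2) | 0, 12));  // like r3: .file aa.dll at 12
        Console.WriteLine(Describe(0, 0));              // like r1: no directive at all
    }
}
```

The coded values in Main are reconstructed from the output listing rather than read from a file, so treat them as examples only.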

If we specify no directive, we get no error and the above warning is displayed. If we use both, the assembly extern directive is used and not the file directive. As always, there are no error checks for the file directive or the offset. An assembly can have lots of different data items associated with it.

If we ever want to name an item of data, we use the manifest resource to do so. If we do not have an assembly directive, it is perfectly legal to use the mresource directive, but we will not be able to execute the assembly.

The reason we specify public or private for a manifest resource is so that the assembly knows whether this item can be exported, that is, seen outside this assembly, or should remain visible only within the assembly if it is flagged private.

If the resource is stored in a file that is not a module, a text file for example, then we need a separate file directive declaring that file. In this case the byte offset will be zero.

If the resource is defined in another assembly, we need an assembly extern directive at the top level before we can use the assembly extern directive within the mresource directive. The offset field is normally a valid offset relative to the resource data directory entry in the COR header.

But as said earlier, this check is not done at present. If the index is zero, it means that the resource is stored in the current file, hence the warning.
Program15.csc
public void abc(string[] args)
{
    ReadPEStructures(args);
    DisplayPEStructures();
    ReadandDisplayImportAdressTable();
    ReadandDisplayCLRHeader();
    ReadStreamsData();
    FillTableSizes();
    ReadTablesIntoStructures();
    DisplayTablesForDebugging();
    ReadandDisplayVTableFixup();
    ReadandDisplayExportAddressTableJumps();
    DisplayModuleRefs();
    DisplayAssembleyRefs();
    DisplayAssembley();
    DisplayFileTable();
    DisplayClassExtern();
    DisplayResources();
    DisplayModuleAndMore();
}
public void DisplayModuleAndMore()
{
    Console.WriteLine(".module {0}", NameReserved(GetString(ModuleStruct[1].Name)));
    Console.Write("// MVID: ");
    DisplayGuid(ModuleStruct[1].Mvid);
    Console.WriteLine();
    Console.WriteLine(".imagebase 0x{0}", ImageBase.ToString("x8"));
    Console.WriteLine(".subsystem 0x{0}", subsystem.ToString("X8"));
    Console.WriteLine(".file alignment {0}", filea);
    Console.WriteLine(".corflags 0x{0}", corflags.ToString("x8"));
    Console.WriteLine("// Image base: 0x03000000");
}
public void DisplayGuid(int guidindex)
{
    Console.Write("{");
    // The first three groups are stored least significant byte first.
    Console.Write("{0}{1}{2}{3}", guid[guidindex+2].ToString("X2"), guid[guidindex+1].ToString("X2"), guid[guidindex].ToString("X2"), guid[guidindex-1].ToString("X2"));
    Console.Write("-{0}{1}-", guid[guidindex+4].ToString("X2"), guid[guidindex+3].ToString("X2"));
    Console.Write("{0}{1}-", guid[guidindex+6].ToString("X2"), guid[guidindex+5].ToString("X2"));
    // The remaining eight bytes are displayed in storage order.
    Console.Write("{0}{1}-", guid[guidindex+7].ToString("X2"), guid[guidindex+8].ToString("X2"));
    Console.Write("{0}{1}{2}{3}{4}{5}", guid[guidindex+9].ToString("X2"), guid[guidindex+10].ToString("X2"), guid[guidindex+11].ToString("X2"), guid[guidindex+12].ToString("X2"), guid[guidindex+13].ToString("X2"), guid[guidindex+14].ToString("X2"));
    Console.Write("}");
}
e.il
.assembly e
{
}
.module aaaa
.class zzz
{
.method static void abc()
{
.entrypoint
ret
}
}
Output
.module aaaa
// MVID: {EDBE9E84-F6DE-468C-B8CB-0CB099FD1EA4}
.imagebase 0x00400000
.subsystem 0x00000003
.file alignment 512
.corflags 0x00000001
// Image base: 0x03000000
This is a small program; all it does is add one more function call, DisplayModuleAndMore, to the abc function. The .module directive is optional, and it adds one record to the Module table. As mentioned earlier, we can have only one module directive, and if we write none, one gets added for us automatically.

We simply display the module directive and follow it with the word MVID. We then call a function DisplayGuid that displays a GUID for us. Every module needs to be uniquely identified, and the assembler gives it a unique 128-bit number stored in the Mvid field. Each time we regenerate our assembly, this number changes.

The reason it is a 128-bit number is that such a number is unique across time and space. The catch with the GUID is that it is displayed in a particular byte order, and the function DisplayGuid simply displays the bytes in that order, starting from the offset into the Guid stream passed as a parameter.

For example, the byte at guidindex+2 is displayed first, followed by the one at guidindex+1, and so on. This GUID is calculated using the ISO/IEC standard 11578:1996. GUID stands for Globally Unique IDentifier. It is a concept used by CORBA and OLE in the past.
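
The byte order that DisplayGuid spells out by hand is the same one the framework's System.Guid type applies when it is built from 16 raw bytes, so the result can be cross-checked with a few lines. The class MvidFormat is our own invention; like DisplayGuid, the sketch assumes that guidindex - 1 is the offset of the first byte, and the sample bytes are simply the MVID from the output above written in storage order.

```csharp
using System;

static class MvidFormat
{
    public static string Format(byte[] heap, int guidindex)
    {
        // Copy the 16 bytes that start at guidindex - 1.
        byte[] raw = new byte[16];
        Array.Copy(heap, guidindex - 1, raw, 0, 16);
        // Guid(byte[]) reads the first three groups least significant byte first.
        return new Guid(raw).ToString("B").ToUpperInvariant();
    }

    static void Main()
    {
        byte[] heap =
        {
            0x84, 0x9E, 0xBE, 0xED,   // displayed as EDBE9E84
            0xDE, 0xF6,               // displayed as F6DE
            0x8C, 0x46,               // displayed as 468C
            0xB8, 0xCB,               // displayed as B8CB
            0x0C, 0xB0, 0x99, 0xFD, 0x1E, 0xA4
        };
        Console.WriteLine(Format(heap, 1));
    }
}
```

For this input the sketch prints the same string as our program, {EDBE9E84-F6DE-468C-B8CB-0CB099FD1EA4}.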

The VES (Virtual Execution System, or runtime) does not make any use of the GUID, but debuggers should use this number to uniquely identify the module. The name of the module is not the physical file name but the logical name stored in the metadata.

The Module table is the first one the designers of the metadata thought of, as they gave it the number 0. The Generation field is reserved and has a value of zero. EncId and EncBaseId are also reserved; they are indexes into the Guid heap and also have a value of 0. Both the Mvid and the name fields must have a non-null value.

If you remember, we had earlier read some instance variables such as file alignment and subsystem. We are simply displaying these values here. The imagebase, subsystem and file alignment values we have already displayed earlier. The image base is displayed once again within comments, with a constant value.

We will now explain some points that we missed about these values. The corflags directive was not written by us, and the CLI expects the value to be 1. For backwards compatibility the three least significant bits are reserved. The values from 8 to 65535 will be used by future versions.

Those who create experimental or non-standard versions have the blessings of the .NET standard to use values larger than 65535. The subsystem directive is used only when we execute the assembly, and thus dlls have no use for it. It may be a 32-bit number, but it can have only two possible values.

A value of 2 means that the program should be run using whatever conventions are fit for a GUI application. A value of 3 is for a console application. At this point in time there is no third environment for an application to execute in. The file alignment can only be a multiple of 512 bytes.
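
The constraints just listed are easy to express as checks. The sketch below does so under our own names (ModuleHeaderValues and its methods are hypothetical); it encodes only what the text states, namely subsystem 2 or 3, a file alignment that is a multiple of 512, and a corflags value of 1.

```csharp
using System;

static class ModuleHeaderValues
{
    public static string SubsystemName(int subsystem)
    {
        switch (subsystem)
        {
            case 2: return "GUI";
            case 3: return "console";
            default: return "unknown";
        }
    }

    // The text only requires a positive multiple of 512.
    public static bool IsValidFileAlignment(int alignment)
    {
        return alignment > 0 && alignment % 512 == 0;
    }

    public static bool IsExpectedCorFlags(int corflags)
    {
        return corflags == 1;
    }

    static void Main()
    {
        // Values taken from the output of this program.
        Console.WriteLine(SubsystemName(0x00000003));
        Console.WriteLine(IsValidFileAlignment(512));
        Console.WriteLine(IsExpectedCorFlags(0x00000001));
    }
}
```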
Program16.csc
string[] vtfixuparray;
public void abc(string[] args)
{
    ReadPEStructures(args);
    DisplayPEStructures();
    ReadandDisplayImportAdressTable();
    ReadandDisplayCLRHeader();
    ReadStreamsData();
    FillTableSizes();
    ReadTablesIntoStructures();
    DisplayTablesForDebugging();
    ReadandDisplayVTableFixup();
    ReadandDisplayExportAddressTableJumps();
    DisplayModuleRefs();
    DisplayAssembleyRefs();
    DisplayAssembley();
    DisplayFileTable();
    DisplayClassExtern();
    DisplayResources();
    DisplayModuleAndMore();
    DispalyVtFixup();
}
public void ReadandDisplayVTableFixup()
{
    if (MethodStruct == null)
        return;
    if (vtablerva != 0)
    {
        long save;
        long position = ConvertRVA(vtablerva);
        if (position == -1)
            return;
        mfilestream.Position = position;
        Console.WriteLine("// VTableFixup Directory:");
        int count1 = vtablesize / 8;
        vtfixuparray = new string[count1];
        for (int ii = 0; ii < count1; ii++)
        {
            vtfixuparray[ii] = ".vtfixup ";
            int fixuprva = mbinaryreader.ReadInt32();
            Console.WriteLine("// IMAGE_COR_VTABLEFIXUP[{0}]:", ii);
            Console.WriteLine("// RVA: {0}", fixuprva.ToString("x8"));
            short count = mbinaryreader.ReadInt16();
            Console.WriteLine("// Count: {0}", count.ToString("x4"));
            short type = mbinaryreader.ReadInt16();
            Console.WriteLine("// Type: {0}", type.ToString("x4"));
            save = mfilestream.Position;
            mfilestream.Position = ConvertRVA(fixuprva);
            int i1;
            long[] val = new long[count];
            for (i1 = 0; i1 < count; i1++)
            {
                if ((type & 0x01) == 0x01)
                    val[i1] = mbinaryreader.ReadInt32();
                if ((type & 0x02) == 0x02)
                    val[i1] = mbinaryreader.ReadInt64();
                if ((type & 0x01) == 0x01)
                    Console.WriteLine("// [{0}] ({1})", i1.ToString("x4"), val[i1].ToString("X8"));
                if ((type & 0x02) == 0x02)
                    Console.WriteLine("// [{0}] ( {1})", i1.ToString("x4"), (val[i1] & 0xffffffff).ToString("X"));
            }
            mfilestream.Position = save;
            vtfixuparray[ii] = vtfixuparray[ii] + "[" + (i1).ToString("X") + "] ";
            if ((type & 0x01) == 0x01)
                vtfixuparray[ii] = vtfixuparray[ii] + "int32 ";
            if ((type & 0x02) == 0x02)
                vtfixuparray[ii] = vtfixuparray[ii] + "int64 ";
            if ((type & 0x04) == 0x04)
                vtfixuparray[ii] = vtfixuparray[ii] + "fromunmanaged ";
            vtfixuparray[ii] = vtfixuparray[ii] + "at D_" + fixuprva.ToString("X8");
            vtfixuparray[ii] = vtfixuparray[ii] + " //";
            for (i1 = 0; i1 < count; i1++)
            {
                if ((type & 0x01) == 0x01)
                    vtfixuparray[ii] = vtfixuparray[ii] + " " + val[i1].ToString("X8");
                if ((type & 0x02) == 0x02)
                    vtfixuparray[ii] = vtfixuparray[ii] + " " + val[i1].ToString("X16");
            }
        }
        Console.WriteLine();
    }
}
public void DispalyVtFixup()
{
    if (vtfixuparray == null)
        return;
    for (int ii = 0; ii < vtfixuparray.Length; ii++)
        Console.WriteLine(vtfixuparray[ii]);
}
e.il
.class public a11111
{
.method public static void adf() cil managed
{
.entrypoint
}
.method public int64 a1() cil managed
{
}
.method public int64 a2() cil managed
{
}
.method public int64 a3() cil managed
{
}
.method public int64 a4() cil managed
{
}
.method public int64 a5() cil managed
{
}
.method public int64 a6() cil managed
{
}
.method public int64 a7() cil managed
{
}
}
.vtfixup [1] int32 at D_00008010
.vtfixup [1] int32 fromunmanaged at D_00008020
.vtfixup [1] int64 at D_00008030
.vtfixup [1] int64 fromunmanaged at D_00008040
.vtfixup [0] int64 at D_00008050
.vtfixup [2] int64 int64 at D_00008060
.data D_00008010 = bytearray ( 01 00 00 06)
.data D_00008020 = bytearray ( 02 00 00 06)
.data D_00008030 = bytearray ( 03 00 00 06)
.data D_00008040 = bytearray ( 04 00 00 06)
.data D_00008050 = bytearray ( 05 00 00 06)
.data D_00008060 = bytearray ( 06 00 00 06 00 00 00 00 07 00 00 06 00 00 00 00)
Output
.vtfixup [1] int32 at D_00004000 // 06000001
.vtfixup [1] int32 fromunmanaged at D_00004004 // 06000002
.vtfixup [1] int64 at D_00004008 // 0600000406000003
.vtfixup [1] int64 fromunmanaged at D_0000400C // 0600000506000004
.vtfixup [0] int64 at D_00004010 //
.vtfixup [2] int64 at D_00004014 // 0000000006000006 0000000006000007
If you look at program7.csc carefully, we displayed the vtfixup information there. If you looked at the output of that program very carefully, you would have realized that it was all in comments. We did not display the actual vtfixup directive at that time.

We have added an instance array, vtfixuparray, and we also call a function DispalyVtFixup from the function abc to display the directive. We have also added some more code to the ReadandDisplayVTableFixup function that populates the array vtfixuparray so that it can be displayed later, at the end.

As explained before, the count1 variable is a count of the entries, and we use it to create an array of the desired size. Creating an array of size zero is not an error; we simply get an empty array. For each entry we use the count field read from the file to create the array val that stores the individual values.

Depending upon the type being int32 or int64, we read either 4 or 8 bytes into the val array. We then build the entire string in vtfixuparray and use the type variable to determine the width of the slots. The fixuprva gives us the data address, which we prefix with D_.

Then we place the comment characters, and after them we need the values that we wrote in the bytearray. How many there are depends upon the count value we specified in the square brackets. Thus we loop again and concatenate the values saved in val, formatted as either 4 or 8 bytes depending, as we said, on the width of the slots.
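
To see the layout of a single fixup entry without the rest of the program, the sketch below decodes one 8-byte record from a byte array: a 4-byte RVA, a 2-byte count and a 2-byte type, just as ReadandDisplayVTableFixup reads them. The class VtFixupEntry is hypothetical, the sample bytes are made up to resemble the second entry of the output, and the slot values themselves are left out.

```csharp
using System;
using System.IO;

static class VtFixupEntry
{
    public static string Describe(byte[] entry)
    {
        using (var reader = new BinaryReader(new MemoryStream(entry)))
        {
            int rva = reader.ReadInt32();      // where the slots live
            short count = reader.ReadInt16();  // number of slots
            short type = reader.ReadInt16();   // slot width and flags

            string text = ".vtfixup [" + count.ToString("X") + "] ";
            if ((type & 0x01) != 0) text += "int32 ";
            if ((type & 0x02) != 0) text += "int64 ";
            if ((type & 0x04) != 0) text += "fromunmanaged ";
            return text + "at D_" + rva.ToString("X8");
        }
    }

    static void Main()
    {
        // RVA 0x00004004, count 1, type 0x0005 (int32 + fromunmanaged),
        // all stored least significant byte first.
        byte[] entry = { 0x04, 0x40, 0x00, 0x00, 0x01, 0x00, 0x05, 0x00 };
        Console.WriteLine(Describe(entry));
    }
}
```

BinaryReader reads little-endian values, which is why the bytes 04 40 00 00 come back as 0x00004004.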

The DispalyVtFixup function simply displays the members of vtfixuparray.
Program17.csc
public void abc(string[] args)
{
    ReadPEStructures(args);
    DisplayPEStructures();
    ReadandDisplayImportAdressTable();
    ReadandDisplayCLRHeader();
    ReadStreamsData();
    FillTableSizes();
    ReadTablesIntoStructures();
    DisplayTablesForDebugging();
    ReadandDisplayVTableFixup();
    ReadandDisplayExportAddressTableJumps();
    DisplayModuleRefs();
    DisplayAssembleyRefs();
    DisplayAssembley();
    DisplayFileTable();
    DisplayClassExtern();
    DisplayResources();
    DisplayModuleAndMore();
    DispalyVtFixup();
    DisplayTypeDefs();
}
public void DisplayTypeDefs()
{
    if (TypeDefStruct.Length != 2)
    {
        Console.WriteLine("//");
        Console.WriteLine("// ============== CLASS STRUCTURE DECLARATION ==================");
        Console.WriteLine("//");
        writenamespace = true;
        for (int i = 2; i < TypeDefStruct.Length; i++)
        {
            if (GetString(TypeDefStruct[i].name) == "_Deleted" && streamnames[0] == "#-")
            {
                continue;
            }
            if (!IsTypeNested(i))
            {
                DisplayOneTypePrototype(i);
            }
        }
    }
}
public bool IsTypeNested(int typeindex)
{
    if (NestedClassStruct == null)
        return false;
    for (int ii = 1; ii < NestedClassStruct.Length; ii++)
    {
        if (NestedClassStruct[ii].nestedclass == typeindex)
            return true;
    }
    return false;
}
public void DisplayOneTypePrototype(int typedefindex)
{
    DisplayOneTypeDefStart(typedefindex);
    DisplayNestedTypesPrototypes(typedefindex);
    DisplayOneTypeDefEnd(typedefindex);
}
public void DisplayOneTypeDefStart(int typerow)
{
    string namespacename = NameReserved(GetString(TypeDefStruct[typerow].nspace));
    if (namespacename != "")
    {
        if (writenamespace)
        {
            Console.WriteLine(".namespace {0}", namespacename);
            Console.WriteLine("{");
            spacefornamespace = 2;
            spacesforrest = 4;
        }
    }
    string typestring = "";
    if (IsTypeNested(typerow))
        typestring = typestring + CreateSpaces(spacesfornested);
    typestring = typestring + CreateSpaces(spacefornamespace);
    typestring = typestring + ".class /*02" + typerow.ToString("X6") + "*/ ";
    string attributeflags = GetTypeAttributeFlags(TypeDefStruct[typerow].flags, typerow);
    Console.WriteLine("{0}{1}{2}", typestring, attributeflags, NameReserved(GetString(TypeDefStruct[typerow].name)));
    string tablename = GetTypeDefOrRefTable(TypeDefStruct[typerow].cindex);
    int index = GetTypeDefOrRefValue(TypeDefStruct[typerow].cindex);
    string typeextends = "";
    if (tablename == "TypeRef")
    {
        typeextends = DisplayTypeRefExtends(index);
    }
    if (tablename == "TypeDef")
    {
        typeextends = GetNestedTypeAsString(index) + DisplayTypeDefExtends(index);
    }
    if (typeextends.Length != 0)
    {
        typestring = "";
        if (IsTypeNested(typerow))
            typestring = typestring + CreateSpaces(spacesfornested);
        typestring = typestring + CreateSpaces(spacefornamespace);
        typestring = typestring + " extends " + typeextends;
        Console.WriteLine(typestring);
    }
    string interfacestring = DisplayAllInterfaces(typerow);
    if (interfacestring.Length != 0)
    {
        typestring = "";
        if (IsTypeNested(typerow))
            typestring = typestring + CreateSpaces(spacesfornested);
        typestring = typestring + CreateSpaces(spacefornamespace);
        typestring = typestring + " implements " + interfacestring;
        Console.Write(typestring);
    }
    typestring = "";
    if (IsTypeNested(typerow))
        typestring = typestring + CreateSpaces(spacesfornested);
    typestring = typestring + CreateSpaces(spacefornamespace);
    typestring = typestring + "{";
    Console.WriteLine(typestring);
}
public string GetTypeAttributeFlags(int typeattributeflags, int typeindex)
{
    string returnstring = "";
    int visibiltymask = typeattributeflags & 0x07;
    string visibiltymaskstring = "";
    if (visibiltymask == 0)
        visibiltymaskstring = "private ";
    if (visibiltymask == 1)
        visibiltymaskstring = "public ";
    if (visibiltymask == 2)
        visibiltymaskstring = "nested public ";
    if (visibiltymask == 3)
        visibiltymaskstring = "nested private ";
    if (visibiltymask == 4)
        visibiltymaskstring = "nested family ";
    if (visibiltymask == 5)
        visibiltymaskstring = "nested assembly ";
    if (visibiltymask == 6)
        visibiltymaskstring = "nested famandassem ";
    if (visibiltymask == 7)
        visibiltymaskstring = "nested famorassem ";
    int classlayoutmask = typeattributeflags & 0x18;
    string classlayoutstring = "";
    if (classlayoutmask == 0)
        classlayoutstring = "auto ";
    if (classlayoutmask == 0x08)
        classlayoutstring = "sequential ";
    if (classlayoutmask == 0x10)
        classlayoutstring = "explicit ";
    string interfacestring = "";
    if ((typeattributeflags & 0x20) == 0x20)
        interfacestring = "interface ";
    string abstractstring = "";
    if ((typeattributeflags & 0x80) == 0x80)
        abstractstring = "abstract ";
    string sealedstring = "";
    if ((typeattributeflags & 0x100) == 0x100)
        sealedstring = "sealed ";
    string specialnamestring = "";
    if ((typeattributeflags & 0x400) == 0x400)
        specialnamestring = "specialname ";
    string importstring = "";
    if ((typeattributeflags & 0x1000) == 0x1000)
        importstring = "import ";
string serializablestring =
"";
if ( (typeattributeflags
& 0x2000) == 0x2000)
serializablestring =
"serializable ";
int stringformatmask =
typeattributeflags & 0x30000;
string stringformastring =
"";
if ( stringformatmask == 0)
stringformastring =
"ansi ";
if ( stringformatmask ==
0x10000)
stringformastring =
"unicode ";
if ( stringformatmask ==
0x20000)
stringformastring =
"autochar ";
string beforefieldinitstring
= "";
if ( (typeattributeflags
& 0x00100000) == 0x00100000)
beforefieldinitstring =
"beforefieldinit ";
//string rtspecialnamestring = "";
//if ( (typeattributeflags & 0x800) == 0x800)
//rtspecialnamestring = "rtspecialname ";
if ( IsTypeNested(typeindex)
)
returnstring =
interfacestring + abstractstring + classlayoutstring + stringformastring + serializablestring + sealedstring +
importstring + visibiltymaskstring +
beforefieldinitstring;
else
returnstring =
interfacestring + visibiltymaskstring + abstractstring + classlayoutstring +
stringformastring + importstring + serializablestring + sealedstring +
specialnamestring + beforefieldinitstring ;
return returnstring;
}
public string
DisplayTypeDefExtends (int typedefindex)
{
if ( typedefindex == 0)
return "";
string name =
NameReserved(GetString(TypeDefStruct[typedefindex].name));
string returnstring =
NameReserved(GetString(TypeDefStruct[typedefindex].nspace));
if ( returnstring.Length !=
0)
returnstring = returnstring
+ ".";
returnstring = returnstring
+ name + "/* 02" +
typedefindex.ToString("X6") + " */";
return returnstring;
}
public string
GetNestedTypeAsString(int rowindex)
{
string netsedtypestring =
"";
string
namespaceandnameparent2 = "";
string
namespaceandnameparent3= "";
if ( IsTypeNested(rowindex)
)
{
int rowindexparent =
GetParentForNestedType(rowindex);
if (
IsTypeNested(rowindexparent) )
{
int rowindexparentparent =
GetParentForNestedType(rowindexparent);
if (
IsTypeNested(rowindexparentparent) )
{
int rowindexp3 =
GetParentForNestedType(rowindexparentparent);
string nameparent3 =
NameReserved(GetString(TypeDefStruct[rowindexp3].name));
namespaceandnameparent3=
NameReserved(GetString(TypeDefStruct[rowindexp3].nspace));
if (
namespaceandnameparent3.Length != 0)
namespaceandnameparent3 =
namespaceandnameparent3 + ".";
namespaceandnameparent3=
namespaceandnameparent3 + nameparent3 + "/* 02" +
rowindexp3.ToString("X6") + " *//";
}
string nameparent2 =
NameReserved(GetString(TypeDefStruct[rowindexparentparent].name));
namespaceandnameparent2 =
NameReserved(GetString(TypeDefStruct[rowindexparentparent].nspace));
if (
namespaceandnameparent2.Length != 0)
namespaceandnameparent2 =
namespaceandnameparent2 + ".";
namespaceandnameparent2 =
namespaceandnameparent3 + namespaceandnameparent2 + nameparent2 +
"/* 02" + rowindexparentparent.ToString("X6") + " *//";
}
string nameparent1 =
NameReserved(GetString(TypeDefStruct[rowindexparent].name));
netsedtypestring =
NameReserved(GetString(TypeDefStruct[rowindexparent].nspace));
if ( netsedtypestring.Length
!= 0)
netsedtypestring =
netsedtypestring + ".";
netsedtypestring =
namespaceandnameparent2 + netsedtypestring + nameparent1 + "/* 02" +
rowindexparent.ToString("X6") + " *//";
}
return netsedtypestring;
}
public int
GetParentForNestedType (int typeindex)
{
int ii = 0;
if ( NestedClassStruct ==
null)
return 0;
for ( ii = 0 ; ii <
NestedClassStruct.Length - 1 ; ii++)
{
if ( typeindex ==
NestedClassStruct[ii].nestedclass )
break;
}
return
NestedClassStruct[ii].enclosingclass;
}
public string
DisplayTypeRefExtends (int typerefindex)
{
string returnstring =
"";
int resolutionscope =
TypeRefStruct[typerefindex].resolutionscope;
string resolutionscopetable
= GetResolutionScopeTable(resolutionscope);
int resolutionscopeindex =
GetResolutionScopeValue(resolutionscope);
string dummy = "";
if ( resolutionscopetable ==
"Module")
{
}
if ( resolutionscopetable ==
"AssemblyRef")
{
returnstring = "["
+ NameReserved(GetString(AssemblyRefStruct[resolutionscopeindex].name)) ;
returnstring = returnstring
+ "/* 23" + resolutionscopeindex.ToString("X6") + " */]";
}
if ( resolutionscopetable ==
"ModuleRef")
{
returnstring =
"[.module " +
NameReserved(GetString(ModuleRefStruct[resolutionscopeindex].name)) ;
returnstring = returnstring
+ "/* 1A" + resolutionscopeindex.ToString("X6") + " */]";
}
if ( resolutionscopetable ==
"TypeRef")
{
int resolutionscopeindex1 =
GetResolutionScopeValue(TypeRefStruct[resolutionscopeindex].resolutionscope );
string resolutionscopetable1
= GetResolutionScopeTable(TypeRefStruct[resolutionscopeindex].resolutionscope
);
if ( resolutionscopetable1
== "AssemblyRef")
{
dummy = "[" +
NameReserved(GetString(AssemblyRefStruct[resolutionscopeindex1].name)) +
"/* 23" + resolutionscopeindex1.ToString("X6") + " */]";
string nspace1 =
NameReserved(GetString(TypeRefStruct[resolutionscopeindex].nspace));
if ( nspace1 !=
"")
nspace1 = nspace1 +
".";
dummy = dummy + nspace1 +
NameReserved(GetString(TypeRefStruct[resolutionscopeindex].name)) +
"/* 01" + resolutionscopeindex.ToString("X6") + " *//";
}
}
int namespaceindex =
TypeRefStruct[typerefindex].nspace;
string nspace =
NameReserved(GetString(namespaceindex));
returnstring = returnstring
+ nspace ;
if ( nspace.Length != 0)
returnstring = returnstring
+ ".";
int nameindex =
TypeRefStruct[typerefindex].name;
returnstring = dummy +
returnstring + NameReserved(GetString(nameindex)) + "/* 01" +
typerefindex.ToString("X6") + " */";
return returnstring;
}
public string
DisplayAllInterfaces (int typeindex)
{
string returnstring =
"";
if ( InterfaceImplStruct ==
null || InterfaceImplStruct.Length == 1)
return "";
for ( int i = 1 ; i <
InterfaceImplStruct.Length ; i++)
{
if ( typeindex ==
InterfaceImplStruct[i].classindex )
{
string codedtablename =
GetTypeDefOrRefTable(InterfaceImplStruct[i].interfaceindex);
int interfaceindex =
GetTypeDefOrRefValue(InterfaceImplStruct[i].interfaceindex);
string interfacename =
"";
if ( codedtablename ==
"TypeRef" )
interfacename = DisplayTypeRefExtends(interfaceindex);
if ( codedtablename ==
"TypeDef" )
interfacename = GetNestedTypeAsString(interfaceindex) +
DisplayTypeDefExtends(interfaceindex);
returnstring = returnstring
+ interfacename;
bool nextclassindex ;
if ( i ==
(InterfaceImplStruct.Length - 1))
nextclassindex = false;
else if ( typeindex !=
InterfaceImplStruct[i+1].classindex )
nextclassindex = false;
else
nextclassindex = true;
if ( nextclassindex )
returnstring = returnstring
+ ",\r\n " +
CreateSpaces(spacefornamespace+spacesfornested);
else
returnstring = returnstring
+ "\r\n";
}
}
return returnstring;
}
public string
GetTypeDefOrRefTable (int codedvalue)
{
string returnstring =
"";
short tag =
(short)(codedvalue & (short)0x03);
if ( tag == 0)
returnstring = returnstring
+ "TypeDef";
if ( tag == 1)
returnstring = returnstring
+ "TypeRef";
if ( tag == 2)
returnstring = returnstring
+ "TypeSpec";
return returnstring;
}
public int
GetTypeDefOrRefValue(int codedvalue)
{
return codedvalue >>
2;
}
public void
DisplayNestedTypesPrototypes (int typedefindex)
{
if (NestedClassStruct ==
null)
return ;
for ( int ii = 1 ; ii <
NestedClassStruct.Length ; ii++)
{
if
(NestedClassStruct[ii].enclosingclass == typedefindex)
{
spacesfornested += 2;
DisplayOneTypePrototype(NestedClassStruct[ii].nestedclass );
spacesfornested -= 2;
}
}
}
public void
DisplayOneTypeDefEnd (int typeindex )
{
string dummy = "";
if ( IsTypeNested(typeindex)
)
dummy = dummy +
CreateSpaces(spacesfornested);
dummy = dummy +
CreateSpaces(spacefornamespace);
dummy = dummy + "} // end of class ";
string classname =
NameReserved(GetString(TypeDefStruct[typeindex].name));
dummy = dummy + classname ;
Console.WriteLine(dummy);
string namespacename =
NameReserved(GetString(TypeDefStruct[typeindex].nspace));
Console.WriteLine();
if ( namespacename != "")
{
string nspace1 =
NameReserved(GetString(TypeDefStruct[typeindex].nspace));
int ii;
for ( ii = typeindex + 1 ;
ii < TypeDefStruct.Length - 1 ; ii++)
{
if ( IsTypeNested(ii) )
continue;
break;
}
string nspace2 =
"";
if ( ii !=
TypeDefStruct.Length )
nspace2 =
NameReserved(GetString(TypeDefStruct[ii].nspace));
if ( nspace1 != nspace2 )
{
if ( lasttypedisplayed ==
typeindex && notprototype )
{
Console.WriteLine();
Console.WriteLine("// =============================================================");
Console.WriteLine();
placedend = true;
}
Console.Write("}");
Console.WriteLine(" // end of namespace {0}", namespacename);
spacefornamespace = 0;
spacesforrest = 2;
writenamespace = true;
Console.WriteLine();
}
else
writenamespace = false;
}
}
e.il
.class public abstract
beforefieldinit a1
{
.method public static void
Main()
{
.entrypoint
}
}
.class private sequential
unicode explicit sealed specialname rtspecialname interface _Deleted
{
}
.namespace n1
{
.namespace n2
{
.class public autochar a2
{
.class public a3
{
.class a33
{
.class a333
{
.class a3333
{
}
}
}
}
.class a4
{
}
}
}
}
.class a5
{
}
.class a6 extends a5
{
}
.class a7 extends
n1.n2.a2/a3/a33/a333
{
}
.class a71 extends
n1.n2.a2/a3
{
}
.class a72 extends
n1.n2.a2/a3/a33
{
}
.class a8 extends
[mscorlib]aaa
{
}
.class a9 extends
[mscorlib]ppp.aaa/aa
{
}
.module extern bb
.module e.exe
.class a10 extends [.module
bb]a11
{
}
.class a12 extends [.module
e.exe]a13
{
}
.assembly ee
{
}
.class a14 extends [ee]a13
{
}
.class interface a15
{
}
.class interface a16
{
}
.class a17 implements
a15,a16,[mscorlib]aa
{
}
Output
.module extern bb
/*1A000001*/
.assembly extern
/*23000001*/ mscorlib
{
.ver 0:0:0:0
}
.assembly extern
/*23000002*/ ee
{
.ver 0:0:0:0
}
.assembly /*20000001*/ ee
{
.ver 0:0:0:0
}
.module e.exe
// MVID: {DB25BB73-A933-4B68-9124-8F5135AC4D55}
.imagebase 0x00400000
.subsystem 0x00000003
.file alignment 512
.corflags 0x00000001
// Image base: 0x03000000
//
// ============== CLASS STRUCTURE DECLARATION ==================
//
.class /*02000002*/ public
abstract auto ansi beforefieldinit a1
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
} // end of class a1
.class /*02000003*/
interface private abstract explicit unicode sealed specialname _Deleted
{
} // end of class _Deleted
.namespace n1.n2
{
.class /*02000004*/ public auto autochar a2
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
.class /*02000005*/ auto ansi nested public a3
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
.class /*02000006*/ auto ansi nested private a33
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
.class /*02000007*/ auto ansi nested private a333
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
.class /*02000008*/ auto ansi nested private a3333
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
} // end of class a3333
} // end of class a333
} // end of class a33
} // end of class a3
.class /*02000009*/ auto ansi nested private a4
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
} // end of class a4
} // end of class a2
} // end of namespace n1.n2
.class /*0200000A*/ private
auto ansi a5
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
} // end of class a5
.class /*0200000B*/ private
auto ansi a6
extends a5/* 0200000A */
{
} // end of class a6
.class /*0200000C*/ private
auto ansi a7
extends n1.n2.a2/* 02000004 *//a3/* 02000005 *//a33/* 02000006 *//a333/* 02000007 */
{
} // end of class a7
.class /*0200000D*/ private
auto ansi a71
extends n1.n2.a2/* 02000004 *//a3/* 02000005 */
{
} // end of class a71
.class /*0200000E*/ private
auto ansi a72
extends n1.n2.a2/* 02000004 *//a3/* 02000005 *//a33/* 02000006 */
{
} // end of class a72
.class /*0200000F*/ private
auto ansi a8
extends [mscorlib/* 23000001 */]aaa/* 01000002 */
{
} // end of class a8
.class /*02000010*/ private
auto ansi a9
extends [mscorlib/* 23000001 */]ppp.aaa/* 01000003 *//aa/* 01000004 */
{
} // end of class a9
.class /*02000011*/ private
auto ansi a10
extends [.module bb/* 1A000001 */]a11/* 01000005 */
{
} // end of class a10
.class /*02000012*/ private
auto ansi a12
extends a13/* 01000006 */
{
} // end of class a12
.class /*02000013*/ private
auto ansi a14
extends [ee/* 23000002 */]a13/* 01000007 */
{
} // end of class a14
.class /*02000014*/
interface private abstract auto ansi a15
{
} // end of class a15
.class /*02000015*/
interface private abstract auto ansi a16
{
} // end of class a16
.class /*02000016*/ private
auto ansi a17
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
implements a15/* 02000014 */,
a16/* 02000015 */,
[mscorlib/* 23000001 */]aa/* 01000008 */
{
} // end of class a17
We now add the second-to-last function call, to the function DisplayTypeDefs, in the abc function. This function simply displays the prototypes of all the classes or types that we have created. In C# we may have types like enums, interfaces or structures, but these are alien concepts in the IL world.
These types are all represented by the class directive. An enum in the IL world is a sealed class that extends the System.Enum class. A struct is the same but instead extends the System.ValueType class. Finally, an interface carries the interface attribute. Thus in the IL world we use the class directive to represent all types.
In the IL world we are allowed to have methods that are global, and thus, unlike in C#, we are permitted to create a valid il file with no type defined. Each time we assemble an il file, a type called <Module> within a null namespace gets automatically created. Thus our TypeDef table will contain at least one type, come what may.
If you remember, our array has one more member than the table has rows, and thus we display types only if the length of the array that stores them is larger than 2. There are valid cases where the il file has only global functions and thus only the type called <Module>, which does not have to be displayed.
We first display the heading CLASS STRUCTURE DECLARATION and then initialize a variable writenamespace to true. This instance variable is used to determine whether the namespace directive should be displayed or not.
We then iterate through the for loop, but the loop variable here starts from 2 and not 1, as the dummy class <Module> is disliked by ildasm, which does not display it. If the name of the class is _Deleted and the stream name is #- and not #~, ildasm does not display such a type. If you remember, the older assemblers called the main stream #- and not #~.
This is what happens when you stress test the code to catch all exceptions. Yet we are not sure whether we have a working disassembler. A class within a class is a nested class, and the function IsTypeNested checks whether a class is nested. It returns true if it is.
Let's first take a peek at what this function is all about. If you take a quick look at the e.il file, you will realize that the class a2 has a class a3 defined within it. This means that class a3 is nested within class a2. It does not stop there, however, as class a33 is also created within class a3.
This means that class a33 is nested within class a3. To make matters worse, class a333 is nested within class a33. The class a2 has one more nested class, a4, which has no classes nested within it. Thus we are allowed to nest as many classes within each other as our heart desires.
Each type we create is stored in the TypeDef table. The NestedClass table has only two fields, the nested class field and the enclosing class field, both of which are indexes into the TypeDef table. A nested class is defined as a class that is placed inside, or within the text of, the enclosing class.
Thus the nested class field identifies the nested type, and the enclosing class field identifies the class within which the nested class resides. As an example, the class a2 is the enclosing class and the class a3 the nested class.
However, there will also be a record where the class a3 is the enclosing or parent class and the class a33 the nested class. All that we do in this function is loop through each record of the NestedClass table, and if the type passed as the parameter matches the nestedclass field we return true.
If we fall out of the loop we know that no match has been found, and we return false. The two fields must index valid rows in the TypeDef table. Neither field can index the TypeRef table, where we store all the types that we refer to but that exist somewhere else. Obviously duplicate rows are not allowed.
A nested type can have only one parent or enclosing type. Thus the NestedClass table cannot have two rows with the same value for the nestedclass field but differing enclosing class field values, as that would mean the type is nested within two classes, which is not possible.
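The lookup and the one-parent rule can be sketched in a few lines. This is only an illustration: the NestedClassRow struct and the hard-coded rows are assumptions standing in for the NestedClassStruct array of the listing, with row numbers taken from the tokens in the output above (a2 is row 4, a3 row 5, a33 row 6, a4 row 9).

```csharp
using System;
using System.Collections.Generic;

public static class NestedClassDemo
{
    // Hypothetical stand-in for one NestedClass row; both fields are TypeDef row numbers.
    public struct NestedClassRow
    {
        public int nestedclass;
        public int enclosingclass;
        public NestedClassRow(int nested, int enclosing)
        {
            nestedclass = nested;
            enclosingclass = enclosing;
        }
    }

    // Row 0 is an unused placeholder so that row numbers match array indexes.
    public static readonly NestedClassRow[] Rows =
    {
        new NestedClassRow(0, 0),
        new NestedClassRow(5, 4), // a3 inside a2
        new NestedClassRow(6, 5), // a33 inside a3
        new NestedClassRow(9, 4), // a4 inside a2
    };

    // Same idea as IsTypeNested in the listing: scan the nestedclass column.
    public static bool IsTypeNested(int typeindex)
    {
        for (int i = 1; i < Rows.Length; i++)
            if (Rows[i].nestedclass == typeindex)
                return true;
        return false;
    }

    // True when no type appears in the nestedclass column more than once.
    public static bool EachNestedTypeHasOneParent()
    {
        var seen = new HashSet<int>();
        for (int i = 1; i < Rows.Length; i++)
            if (!seen.Add(Rows[i].nestedclass))
                return false;
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsTypeNested(5));               // a3 is nested
        Console.WriteLine(IsTypeNested(4));               // a2 is top level
        Console.WriteLine(EachNestedTypeHasOneParent());
    }
}
```

Note that row 4 (a2) shows up twice in the enclosingclass column, which is allowed; only the nestedclass column must be unique.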
The class a2 will appear twice in the NestedClass table as the value of the enclosing class field, as it contains two nested classes, a3 and a4. The nestedclass fields have different values, however. The display of the class prototype is carried out by the function DisplayOneTypePrototype.
We only call this function if the class is not a nested class. Nested classes do not have an independent existence and are displayed within their enclosing class. Nested classes are displayed by their parent class, but the TypeDef table does not distinguish between nested and non-nested classes; each is given one row.
Thus there is no column in the TypeDef table that tells us that we have a nested class on our hands. The only way we know that a class is nested is by looking at the nestedclass field of the NestedClass table.
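To see how a parent ends up displaying its own nested classes, here is a minimal sketch of the recursion. It is not the code of the listing: the Names dictionary, the Nested pairs and the Render function are illustrative assumptions, and the real functions also deal with namespaces, attributes and tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NestedDisplayDemo
{
    // Hypothetical data: type names keyed by TypeDef row.
    static readonly Dictionary<int, string> Names = new Dictionary<int, string>
    {
        { 4, "a2" }, { 5, "a3" }, { 6, "a33" }, { 9, "a4" }
    };

    // Hypothetical NestedClass rows as { nestedclass, enclosingclass } pairs.
    static readonly int[][] Nested =
    {
        new[] { 5, 4 }, new[] { 6, 5 }, new[] { 9, 4 }
    };

    // Start of the type, then every type nested in it (two spaces deeper), then the end.
    public static string Render(int typeindex, int indent)
    {
        var sb = new StringBuilder();
        string pad = new string(' ', indent);
        sb.Append(pad).Append(".class ").Append(Names[typeindex]).Append('\n');
        sb.Append(pad).Append("{\n");
        foreach (int[] row in Nested)
            if (row[1] == typeindex)
                sb.Append(Render(row[0], indent + 2));
        sb.Append(pad).Append("} // end of class ").Append(Names[typeindex]).Append('\n');
        return sb.ToString();
    }

    public static void Main()
    {
        // Only the top level type is rendered directly; a3, a33 and a4 follow from it.
        Console.Write(Render(4, 0));
    }
}
```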
Function call after function call: are we out of our minds, or are we going to get an award for the maximum use of functions in our code? The answer is, as usual, somewhere in the middle. The point is that we are going to display the type prototypes twice, once now, and once again when we display the types with all sorts of things like fields, methods etc. within them.
There is no point in writing the same code twice. Thus the function DisplayOneTypePrototype calls three other functions to do the job. The first, DisplayOneTypeDefStart, displays the class name and the attributes. Before we display the type we need to figure out whether it is part of a namespace or not.
In the il file we have placed the type a2 within two namespace directives, but ilasm very smartly converts them into a single namespace with a dot within it. We use the string namespacename to store the namespace name, or an empty string if one does not exist. We then check that the value of this variable is not empty and write the namespace directive.
As this is a top level directive, it needs no spaces in front of it. We then place the open brace and initialize two very crucial variables.
The spacefornamespace variable is either 2 or its default 0. If the type is within a namespace, then we have to add two spaces for indentation, and if there is no namespace, the space for a namespace is zero.
The spacesforrest variable will be explained in a short while. The variable writenamespace is true and hence the inner if statement is true. We will explain the use of this variable later.
Most of the time we will not call the WriteLine function for every fragment but use a string to store what we want to write and then use a single WriteLine to write the entire string out for us. This is where we use typestring to store the entire string for the class directive.
We first need to know whether our class is a nested class or not. If it is, we add to the typestring variable the number of spaces stored in the spacesfornested variable. This variable has a value of 2 at present, but we will show you later that each time we come across a nested class we increase this variable by 2.
We then add the number of spaces, zero or two, depending on whether the type has a namespace or not. Remember that the variable spacefornamespace has a value of zero or two only. Now that we have written out the initial spaces, we write out the word .class and then, within comments, the table number plus the row number, or what is called a token.
This token only gets written out if the /ALL option is specified to the disassembler and is very useful, as it tells us the table and row number the entity belongs to. Everything in the metadata is stored in tables and we are simply displaying what is there in these tables. Each type has lots of attributes, and we use the function GetTypeAttributeFlags to return all the attributes as a string.
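The token written in the comment is simply the table number in two hex digits followed by the row in six. As a small sketch, with helper names (FormatToken, SplitToken) that are ours and not part of the listing:

```csharp
using System;

public static class TokenDemo
{
    // Table number of TypeDef as it appears in the tokens of the output (0x02).
    public const int TypeDefTable = 0x02;

    // Builds the eight hex digits shown inside the comment, e.g. 0200000A.
    public static string FormatToken(int table, int row)
    {
        return table.ToString("X2") + row.ToString("X6");
    }

    // Splits a token back into its table number (top byte) and row (low three bytes).
    public static void SplitToken(uint token, out int table, out int row)
    {
        table = (int)(token >> 24);
        row = (int)(token & 0x00FFFFFF);
    }

    public static void Main()
    {
        // Row 10 of TypeDef is the class a5 in the output above.
        Console.WriteLine(".class /*" + FormatToken(TypeDefTable, 10) + "*/ private auto ansi a5");

        int table, row;
        SplitToken(0x0200000A, out table, out row);
        Console.WriteLine("table 0x{0:X2}, row {1}", table, row);
    }
}
```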
The flags member of the TypeDef table is a bit mask of all the attributes, and we also pass the row of the type. Let's move on to this function to figure out how the bit mask is decoded.
The designers of the type attributes were very meticulous. Not only did they specify which bits signify what attributes, they also created logical groups for them. Thus the first three bits specify the visibility attributes. We first bitwise AND with 7 to cull out the first three bits and then check which bits are on or off.
Thus we have 8 visibility values, each mutually exclusive of the others. Thus private and public will never be on together. Strictly speaking, this set of bits is called visibility and accessibility. A type that is not nested within another type will be either public or private, and it will have no accessibility.
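Because the three bits form one number from 0 to 7, the chain of if statements in the listing can also be read as a table lookup. The sketch below assumes nothing beyond the values used in GetTypeAttributeFlags; the class and method names are ours.

```csharp
using System;

public static class VisibilityDemo
{
    // Indexed by the low three bits of the flags, in the order used by the listing.
    static readonly string[] Names =
    {
        "private", "public", "nested public", "nested private",
        "nested family", "nested assembly", "nested famandassem", "nested famorassem"
    };

    public static string Visibility(int flags)
    {
        return Names[flags & 0x07];
    }

    // Values 0 and 1 belong to top level types, 2 to 7 to nested types.
    public static bool IsNestedVisibility(int flags)
    {
        return (flags & 0x07) >= 2;
    }

    public static void Main()
    {
        int flags = 0x100081; // public | abstract | beforefieldinit, as for class a1
        Console.WriteLine(Visibility(flags));
        Console.WriteLine(IsNestedVisibility(flags));
    }
}
```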
A programming language may introduce any visibility attributes it likes, but in IL a top level type can only be private or public. Nested types instead have no visibility but have one of six accessibility attributes. These are nested assembly, nested family and assembly (famandassem), nested family or assembly (famorassem), nested family, nested public and nested private.
Family is protected in C#, where the derived classes have access, and assembly restricts access to classes in the same assembly. As always in life there are defaults: private for non-nested types, and nested private for nested types.
Along with visibility and accessibility we have one more concept called hiding or, better still, method name hiding. Hiding controls which method names inherited from a base type are available to the compiler for compile time name binding.
A nested type can have no visibility attributes, as this privilege is reserved for top level types only. Here we have only two possibilities: private, visible to types within the same assembly, or public, visible to types anywhere in the world regardless of the assembly they reside in.
Nested types are different, as the accessibility further refines which set of methods can access the type. The visibility is decided by the parent or enclosing or top level type. This means that the nested type cannot be more visible than the enclosing type. This makes sense, as it would be meaningless for the enclosing type to be private and the nested type public.
If the enclosing type is visible only within its assembly, then even though the nested type is public, the enclosing type decides and the nested type is only available within the assembly. However, if the enclosing type is public, to be seen everywhere, but the nested type is private, the wishes of the nested type hold and it is not accessible outside its enclosing type.
The same logical model is used to describe both top level and nested types. The word class is what we have been using all our lives, and we should be using type instead. Interfaces, structures and value types are not classes, and hence we should use the broader term type instead of class. Old habits die hard.
Hiding does not apply to types but to the methods of a type, and is a compile time phenomenon, not a runtime one. The Common Type System offers us one bit to distinguish between the two mechanisms of hiding. The first is hide by name, where specifying a method of a certain name hides all inherited methods with that name.
The more complex one is hide by name and signature, where the data types of the parameters are treated as part of the name of the method. Thus two methods having the same name but different parameters, in type or number, are treated as different methods. There is no runtime support for hiding.
The CLI treats all methods as if they used the name and signature method of hiding. The hidebysig method attribute is what marks hide by name and signature; a method without it hides by name only. The next set of attributes are the type layout attributes. We do what we did earlier: mask off the bits that represent this logical family and then check which bits are on.
The type layout attributes take any of three values: auto, explicit and sequential. If these bits are zero, then the auto attribute is on. The only way we can determine this is by masking off all the other bits but those that represent the layout attributes. We cannot simply test the whole flags value against zero, as other groups share the same field, and in the visibility group private also has a value of zero.
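A short sketch makes the point about masking concrete. It uses the same mask (0x18) and values as the listing; the Layout function name is ours, and the combination 0x18 is left undecoded because the listing does not handle it either.

```csharp
using System;

public static class LayoutDemo
{
    public static string Layout(int flags)
    {
        // Keep only the two layout bits; every other group is ignored.
        switch (flags & 0x18)
        {
            case 0x00: return "auto";
            case 0x08: return "sequential";
            case 0x10: return "explicit";
            default: return ""; // 0x18 is not decoded by the listing
        }
    }

    public static void Main()
    {
        Console.WriteLine(Layout(0x000001)); // public, layout bits zero
        Console.WriteLine(Layout(0x000009)); // public with the sequential bit set
        Console.WriteLine(Layout(0x100081)); // the flags are far from zero, yet layout is auto
    }
}
```

The last call is the interesting one: the flags value as a whole is non-zero, but once the other groups are masked away the layout bits are zero, so the type is auto.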
The type layout decides who is in charge of arranging the fields of an instance of a type. The type can have only one of these layout attributes. If we specify no attribute, the default is auto. This auto attribute tells the CLI that the programmer does not take the responsibility of laying out the fields in memory.
Let the CLI place the fields wherever it wants; the user will not lay down any conditions at all. The only problem with this approach is that we lack flexibility, and at times we want to decide how things are laid out at the dinner table or in memory. At these times we use the explicit attribute, where we decide where the fields will be laid out in memory.
The last option is sequential, where we let the CLI lay the fields out with one small condition. The fields should be placed one after the other in memory, and the metadata tables decide which fields come in what order. Normally it is in our interest to let the CLI decide how to lay out fields in memory.
We should use sequential for languages like C/C++, as we get the best of both worlds. We get verifiable output and we also follow the rules of these languages. With explicit, as we said, we are the masters of the memory layout. As we told you earlier, a type can be a class, a value type or an interface.
The type semantics attribute tells us just that, and it is a single bit which, if on, tells us that we have an interface. If the bit is off and the type is derived from System.ValueType either directly or indirectly, it is a value type, and if none of the above is true, a simple class. The size of a value type at runtime is limited to 1 MB or 0x100000 bytes; the implementation we are using takes us up to 0x3f0000, and this value may be reduced.
Honestly, why people want such large value types is beyond our understanding. A value type is a class that becomes a value type for reasons of efficiency. The basic C# types like int and char are all value types. We now have two inheritance attributes, abstract and sealed.
These attributes are not mutually exclusive and thus we have two separate strings to hold their values. An abstract class cannot be used directly, i.e. we cannot instantiate it using new. This is because it contains functions without code, and unless we implement these in a derived class we cannot use the abstract class.
We use abstract classes when we want the user to implement some methods before using our class. Thus an abstract type normally contains abstract methods that the derived class has to supply code for. Sealed classes are a different kettle of fish. These classes cannot be derived from or have subclasses.
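Since abstract and sealed are independent bits rather than values of one group, each is tested on its own. A minimal sketch, using the masks 0x80 and 0x100 from the listing and a function name of our choosing:

```csharp
using System;

public static class InheritanceDemo
{
    // Each bit is checked separately, so none, one or both words may come out.
    public static string Inheritance(int flags)
    {
        string result = "";
        if ((flags & 0x80) == 0x80)
            result = result + "abstract ";
        if ((flags & 0x100) == 0x100)
            result = result + "sealed ";
        return result;
    }

    public static void Main()
    {
        Console.WriteLine("[" + Inheritance(0x001) + "]"); // plain public class
        Console.WriteLine("[" + Inheritance(0x081) + "]"); // abstract only
        Console.WriteLine("[" + Inheritance(0x181) + "]"); // both bits set
    }
}
```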
One of the reasons for doing this is efficiency, as the compiler knows that no one can derive from these classes and can thus generate more efficient code. Also, at times I do not want my classes to be tampered with, and thus by using the sealed attribute I let no programmer modify my code. Use as is or don't.
Virtual functions are used so that derived classes can override them, and in a sealed class they become common instance methods as there is no derived class to override them. The big problem with sealed classes is that they stop the user from extending the class hierarchy.
One reason to use sealed is when a class that implements several interfaces comes to depend on implementation details that would not be visible to subclasses. A type that is both abstract and sealed should have only static members, as abstract means the class cannot be instantiated and sealed does not let us derive from it.
This is also what is called a namespace in some languages. We now come to the three interoperation attributes: ansi, autochar and unicode. We are running in what is called managed code, and the earlier programs we wrote in C/C++ before the advent of .Net ran as unmanaged code.
We would like to call unmanaged code from managed code, and these attributes tell the system how to deal with strings. Thus if the return type or a parameter is of type string, they say whether any specific conversion needs to be done. This conversion is called marshalling. Once again these values are mutually exclusive and the default is ansi.
This means that strings will be marshalled in either direction as ansi strings. The unicode attribute specifies that strings on both sides will be in unicode. Finally, the best is autochar, which will use either unicode or ansi depending upon the platform we are running on. All decent platforms today support unicode, as computer people have realized that the world does not speak English, forget about good English.
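The string format group sits higher up in the flags, in bits 16 and 17, so after masking with 0x30000 we can shift the value down and use it as an index. This is only a sketch of the same decoding the listing does with if statements; the names are ours, and the fourth entry is empty because the listing does not decode the value 0x30000.

```csharp
using System;

public static class StringFormatDemo
{
    // Index 0 = ansi (the default), 1 = unicode, 2 = autochar, 3 = not decoded here.
    static readonly string[] Names = { "ansi", "unicode", "autochar", "" };

    public static string StringFormat(int flags)
    {
        return Names[(flags & 0x30000) >> 16];
    }

    public static void Main()
    {
        Console.WriteLine(StringFormat(0x00001)); // public, nothing specified
        Console.WriteLine(StringFormat(0x10001)); // public unicode
        Console.WriteLine(StringFormat(0x20001)); // public autochar, like class a2
    }
}
```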
Finally we come to the special handling attributes, which are four in number and can be combined in any manner that we desire. These attributes are also meant for the tools, because if the CLI treats an item in a different or special way, why hide it from the tools?
The specialname attribute means that the item name is special not only to the CLI but also to tools. One example is the name .ctor, which stands for a non-static constructor. The attribute rtspecialname is like specialname but with a small difference. It very clearly means that the CLI understands this item.
There are no types as of today that will be marked with this attribute, as it is reserved for future use. Any item that is marked rtspecialname will also be marked specialname. We have placed this code in comments because if we use this attribute, the assembler ignores it.
When a static method is
called from a type we do not have to create an instance of the type. By using
the beforefieldinit attribute we are telling the CLI that it need not initialize
the type when a static method is called. The default is that it does initialize
the type. Serialization is the art of writing data to disk or a data stream.
By specifying the
serializable attribute we are allowing the CLI serializer to write the type to
a data stream. Finally the import attribute tells the CLI that this type is
imported from a COM type library. Before .Net came along, the mantra at
Microsoft was ActiveX, which was based on COM or the Component Object Model.
Now that we have individual
strings from each logical family, we need to display them in a certain order.
This order specifies that the interface attribute comes first and the
beforefieldinit attribute last. The attributes are not displayed in the order
that we wrote them. Also the nested types follow a different order. It took us
a long time to figure out this order.
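As an illustration of collecting the special handling keywords, the following sketch tests each bit and joins the resulting strings. It is an assumption laden example and not our actual function: the helper name is invented, the bit values come from System.Reflection.TypeAttributes, and the only ordering taken from the text is that beforefieldinit comes last. The order of the other keywords here is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

static class SpecialFlagsSketch
{
    // Hypothetical helper: returns the special handling keywords for a flags value.
    static string SpecialKeywords(TypeAttributes flags)
    {
        var words = new List<string>();
        if ((flags & TypeAttributes.Import) != 0) words.Add("import");                   // 0x00001000
        if ((flags & TypeAttributes.Serializable) != 0) words.Add("serializable");       // 0x00002000
        if ((flags & TypeAttributes.SpecialName) != 0) words.Add("specialname");         // 0x00000400
        if ((flags & TypeAttributes.RTSpecialName) != 0) words.Add("rtspecialname");     // 0x00000800
        if ((flags & TypeAttributes.BeforeFieldInit) != 0) words.Add("beforefieldinit"); // 0x00100000, written last
        return string.Join(" ", words);
    }

    static void Main()
    {
        TypeAttributes flags = TypeAttributes.Public
                             | TypeAttributes.Serializable
                             | TypeAttributes.BeforeFieldInit;
        Console.WriteLine(SpecialKeywords(flags)); // prints "serializable beforefieldinit"
    }
}
```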
Now that we have all the
attributes, we write out the name of the type along with the attributes. Every
type derives from a type and if we do not specify one, the type derives from
System.Object in the assembly mscorlib. Also a type can implement any number of
interfaces but it can derive from a single type only.
We use the extends keyword
to specify the base type. In the same vein the implements keyword is used for
interfaces. These words are what are also used in Java, the arch enemy of the
.net world. If the interface has say five functions, the type implementing this
interface has to implement all the five functions before it can be
instantiated.
Thus the type has to fulfill
a contract of implementing all the methods specified in the interface before the
type can be used. The field extends, or what we call cindex, tells us which
class this type extends from. An
extends specifies only one type whereas the implements clause can specify many
interfaces.
These two bits of
information are kept separately in the metadata. The cindex field specifies a
TypeDefOrRef Coded index. This coded index specifies one of three tables,
TypeDef, TypeRef or the TypeSpec table. In this case it cannot specify the
TypeSpec table as that table has a single field and there is no name attached
to the type.
Thus we cannot use these
types found in the TypeSpec table for the extends clause. We use the function
GetTypeDefOrRefTable to figure out which table the coded index points to. If the
coded index refers to the TypeRef table, it means that the type used is defined
outside the current module, most often in another assembly. Had it been the
TypeDef table, the type would be defined in the module we are in.
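A TypeDefOrRef coded index keeps the table in its low two bits (0 for TypeDef, 1 for TypeRef, 2 for TypeSpec) and the row number in the remaining bits. The sketch below shows the decoding under that layout. The names TableOf and RowOf are hypothetical stand-ins chosen for this example and are not the functions of our program.

```csharp
using System;

static class TypeDefOrRefSketch
{
    // Tag values of the TypeDefOrRef coded index, in tag order.
    static readonly string[] Tables = { "TypeDef", "TypeRef", "TypeSpec" };

    // Hypothetical helper: the table that the coded index points into.
    static string TableOf(int codedIndex)
    {
        int tag = codedIndex & 0x03;
        return tag < Tables.Length ? Tables[tag] : "Invalid";
    }

    // Hypothetical helper: the row number inside that table.
    static int RowOf(int codedIndex)
    {
        return codedIndex >> 2;
    }

    static void Main()
    {
        int coded = (5 << 2) | 1; // row 5 of the TypeRef table
        Console.WriteLine("{0} row {1}", TableOf(coded), RowOf(coded)); // prints "TypeRef row 5"
    }
}
```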
We use the functions
DisplayTypeRefExtends and DisplayTypeDefExtends to figure out the name of the
class. Over to the DisplayTypeRefExtends function first.
If you have not noticed,
every coded index function comes in a pair: the name ending with Table gives us
the coded index table, and the name ending with Value gives us the coded index
table row number. The function DisplayTypeRefExtends is pretty large and thus
we advise you to have a big hot cup of coffee before you start reading.
You have been warned. This
function is passed the type ref row number as whenever we use a type defined
somewhere else, the type ref table gets a row added. This type ref row has a
coded index called the resolution scope.
This coded index points to
four tables, Module, ModuleRef, AssemblyRef and itself, TypeRef. In spite of it
pointing to four tables we will explain why we have only checked for three
tables, ModuleRef, TypeRef and AssemblyRef. The first value we check is
AssemblyRef as this should be the most common.
When you look at class a8 we
are extending it from the class aaa in the mscorlib assembly. The
resolutionscopeindex variable now points to a row number in the AssemblyRef
table and we display the name of the assembly ref struct and the row number in
square brackets. The class a9 also meets the same fate but this class is a
nested class.
The classes a10 and a12 are slightly similar. In the case of class
a10 we are specifying that the external class a11 is in a module called bb.
This module is defined by using the module extern directive. Thus in the square
brackets we can either specify an assembly ref or an external module.
The class can be
present in the same assembly but in a separate module. In this case we write
out the words .module and use the variable resolutionscopeindex as an index into
the ModuleRef table. We also write out the table number as 1A. Thus the above
two coded index tables are similar in concept; only the table changes.
We come to class a12 where
we specify the module directive again but use the name of the module as the
name of the current module e.exe. The assembler is smart enough to know that we
are referring to the same module that we are in and ignores our module
directive.
Thus the resolutionscope
index can have a coded index of Module but we have to write no code to handle
it. Remember if something is in the same module we do not have to qualify it.
We only need to specify where an entity is if it is somewhere else. This
somewhere else can be a different module than the one we are in or another
assembly.
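The three cases discussed so far can be summed up in a short sketch. It assumes the ResolutionScope coded index uses two tag bits with the values 0 for Module, 1 for ModuleRef, 2 for AssemblyRef and 3 for TypeRef. The arrays of names are invented sample data and the function name ScopePrefix is hypothetical; our program reads the names from the metadata strings instead, and it also writes out the token in comments, which is left out here.

```csharp
using System;

static class ResolutionScopeSketch
{
    // Sample data only. Row 0 is unused so that indexes match metadata row numbers.
    static readonly string[] AssemblyRefNames = { null, "mscorlib" };
    static readonly string[] ModuleRefNames = { null, "bb" };

    // Hypothetical helper: text to place before the class name for a given scope.
    static string ScopePrefix(int codedIndex)
    {
        int table = codedIndex & 0x03;
        int row = codedIndex >> 2;
        switch (table)
        {
            case 0: return "";                                      // Module: same module, nothing to qualify
            case 1: return "[.module " + ModuleRefNames[row] + "]"; // ModuleRef: another module of this assembly
            case 2: return "[" + AssemblyRefNames[row] + "]";       // AssemblyRef: another assembly
            default: return "";                                     // TypeRef: nested class, not handled in this sketch
        }
    }

    static void Main()
    {
        Console.WriteLine(ScopePrefix((1 << 2) | 2) + "ppp.aaa"); // prints "[mscorlib]ppp.aaa"
        Console.WriteLine(ScopePrefix((1 << 2) | 1) + "a11");     // prints "[.module bb]a11"
        Console.WriteLine(ScopePrefix((1 << 2) | 0) + "a11");     // prints "a11"
    }
}
```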
Now let's look at the fourth
table TypeRef, which means that the coded index is pointing to a row in the same
table. This happens in the case of class a9 which extends from a nested class in
the assembly ref mscorlib. The class a8 also extends from the class aaa in the
assembly mscorlib but as it is not nested, the coded index table is
straightforward.
Now that we have a nested
class the coded index table will be a typeref. When we go to that row in the
TypeRef table, we will once again pick up the coded index and in this specific
case it will be the Assembly Ref table as our nested class is in the mscorlib
assembly.
We first write out the
assembly name like before and then figure out whether the class aaa belongs to
a namespace or not. The field nspace tells us the name of the namespace but we
use the earlier coded index row variable resolutionscopeindex and not the
second one. Thus variable nspace1 will contain the value ppp.
We then add a dot as the
class namespace separator only if the class aaa belongs to a namespace. We next
display the row number of the class aaa that happens to be 3 in this case. Then
we place a / because the class that follows is the nested class and the
separator between nested classes is a single /.
The problem with writing
code is that there are too many possibilities to take care of. If the first
coded index table is TypeRef, then we have to check for the other tables also.
If we place a nested class with module, then the second coded index table will
have the values of Module or ModuleRef.
We have not written code for
these cases as we have left it to you to implement. Thus a class statement as
.class a99 extends [.module bb]ppp.aaa/aa {} will not work as the second coded
index will now be ModuleRef and we have not handled such a case.
The mistake we made is that
we should have had a function that handles a resolution coded index. Then we
could call this function each time. The reason we do not is that it would make
the program more difficult to understand.
Finally if we write a line
like .class a99 extends [mscorlib]ppp.aaa/aa/bb we will also get an error as
the second coded index table will be a TypeRef. Thus each class nested within a
nested class will keep giving us a row in the TypeRef table.
What we are saying is that
each nested class within a nested class is a separate row in the TypeRef table
and we would need a concept called recursion to handle this. Finally we come
to the end where we need to display the namespace and the name of the class.
We use the parameter
typerefindex to get at the name of the class and namespace and decide whether
to place the dot separator. As this is a common thing we always do, we will not
explain it again. The point to understand is that the class name is written in
the reverse order for a nested class.
The enclosing class is
specified first and then the nested classes. Thus for class a9, it extends
nested class aa that has a row number of 4 in the TypeRef table. This class is
nested within class aaa which is row number 3 and is in a namespace ppp.
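The recursive approach that we did not take can be sketched in a few lines. This is an illustration only and not part of our program: the row structure, the sample rows and the function FullName are all invented for the example, rows 1 and 2 are placeholders, and only the AssemblyRef and TypeRef scopes are handled. The tag values assumed are 2 for AssemblyRef and 3 for TypeRef.

```csharp
using System;

static class TypeRefNameSketch
{
    struct TypeRefRow
    {
        public int Scope;        // ResolutionScope coded index: (row << 2) | tag
        public string Namespace;
        public string Name;
    }

    // Sample data only. Row 0 is unused so that indexes match metadata row numbers.
    static readonly string[] AssemblyRefNames = { null, "mscorlib" };
    static readonly TypeRefRow[] TypeRefs =
    {
        new TypeRefRow(),
        new TypeRefRow { Scope = (1 << 2) | 2, Namespace = "System", Name = "Object" }, // placeholder row
        new TypeRefRow { Scope = (1 << 2) | 2, Namespace = "System", Name = "Object" }, // placeholder row
        new TypeRefRow { Scope = (1 << 2) | 2, Namespace = "ppp", Name = "aaa" },       // row 3, in mscorlib
        new TypeRefRow { Scope = (3 << 2) | 3, Namespace = "", Name = "aa" },           // row 4, nested in row 3
    };

    // Hypothetical helper: builds the name from the outermost class inwards.
    static string FullName(int row)
    {
        TypeRefRow r = TypeRefs[row];
        string own = string.IsNullOrEmpty(r.Namespace) ? r.Name : r.Namespace + "." + r.Name;
        int tag = r.Scope & 0x03;
        int scopeRow = r.Scope >> 2;
        if (tag == 3)
            return FullName(scopeRow) + "/" + own;            // enclosing class first, then the nested class
        if (tag == 2)
            return "[" + AssemblyRefNames[scopeRow] + "]" + own;
        return own;                                           // Module and ModuleRef are left out of this sketch
    }

    static void Main()
    {
        Console.WriteLine(FullName(4)); // prints "[mscorlib]ppp.aaa/aa"
    }
}
```

Since each level simply asks its enclosing row for its own name, the same function would cope with aa/bb or any deeper nesting without extra code.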
Our program like all
programs is not complete because the following line is in error. .class a99
extends [mscorlib]ppp.aaa/qqq.aa. The assembler passes it in spite of a flaw.
The nested class aa cannot have a namespace qqq as a namespace is a top level
construct.
The disassembler is smart enough to understand and it removes
the namespace. We thus need to write code that says if the first coded index
table is TypeRef then the nested classes cannot have a namespace. We did not as
the assembler did not.
Whew. Now we come back to
the function DisplayOneTypeDefStart and
check for the second table of the TypeDefOrRef coded index, TypeDef. This
happens when we extend from a type that is created in the same module. This
type will obviously have a row in the TypeDef table.
We simply have to display
the contents of the TypeDef table and this is what the function
DisplayTypeDefExtends does. If ever the row number of any table is 0, this
means that this is not a valid row number. This is a simple function as the name and nspace fields give us the name
and namespace and we add the token within comments and return this string.
The problem is not with
simple types but with nested types. Thus we have a function
GetNestedTypeAsString that first figures out the nested type. The problem is
with class a7 which extends nested class a333 that is nested within classes a33
and a3. If you are not clear about the row numbers, look at the token which is
displayed immediately after the class name.
The class a2 has a row
number of 4 which you can verify by seeing the class 4 definition. The
parameter rowindex is that of class a333 and its value is row 7. We first check
if this class is nested using the function IsTypeNested. Now that we have
nested class, we would want to figure out the parent of this nested class a333.
We use a function
GetParentForNestedType that returns the enclosing class for us. This function
is a no brainer. We told you eons ago that the Nested Class table has two
fields, one that was the type index of the nested class and other the enclosing class.
Thus we loop through the
entire array NestedClassStruct and break when we meet a row where the parameter
typeindex equals the field nestedclass, the row number of the nested class. We
return the enclosingclass field as this is the row number of the enclosing
class.
We build no error check as
both fields have to be valid indexes in the TypeDef table. We then check whether
row 6 or class a33 is a nested class, and if it is we again determine the
parent of the nested class a33 which is a3 or row number 5. Finally we come to
the parent of class a3 that is class a2 or row number 4.
The strings nameparent3 and
namespaceandnameparent3 will display the name and namespace for the top most
class a2, i.e. row 4, as they use the variable rowindexp3. This is responsible
for the namespace n1.n2. We then place the nested class separator and then use
the variable rowindexparentparent to give us the name of class a3.
Then we use variable
rowindexparent to give us the next nested class a33 and the namespace if any
and finally the variable rowindex for the actual name of the class a333.
The second namespace is used in the case of class a72. Thus depending upon the
number of levels of nested classes we use the relevant namespace field.
In the case of class a71,
the netsedtypestring variable gives us the namespace n1.n2. This is why we need
to place the namespace twice in our code. Obviously all is not right in our
program as we have assumed a certain level of nesting.
Thus if we introduce a class
.class a72 extends n1.n2.a2/a3/a33/a333/a3333 we will get an error as we have
one more level of nesting. A better way would be to use recursion as we
mentioned earlier. To solve the above problem we could add one more level of
nesting but it would still be an imperfect solution.
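For the curious, this is roughly what the recursive version could look like. It is a sketch under stated assumptions and not the code of our program: the arrays hold invented sample rows that mirror the classes a2, a3, a33 and a333 (rows 2 and 3 are placeholders), and the functions ParentOf and NestedName are hypothetical names.

```csharp
using System;

static class NestedTypeDefSketch
{
    // Sample TypeDef data. Row 0 is unused so that indexes match metadata row numbers.
    static readonly string[] Names      = { null, "<Module>", "?", "?", "a2",    "a3", "a33", "a333" };
    static readonly string[] Namespaces = { null, "",         "",  "",  "n1.n2", "",   "",    ""     };

    // Sample Nested Class table: { nested class row, enclosing class row }.
    static readonly int[,] NestedClasses = { { 0, 0 }, { 5, 4 }, { 6, 5 }, { 7, 6 } };

    // Returns the enclosing class row, or 0 (never a valid row) if the type is not nested.
    static int ParentOf(int typeIndex)
    {
        for (int i = 1; i < NestedClasses.GetLength(0); i++)
        {
            if (NestedClasses[i, 0] == typeIndex)
                return NestedClasses[i, 1];
        }
        return 0;
    }

    // Builds the full name by asking the enclosing class for its name first.
    static string NestedName(int typeIndex)
    {
        string own = Namespaces[typeIndex].Length == 0
            ? Names[typeIndex]
            : Namespaces[typeIndex] + "." + Names[typeIndex];
        int parent = ParentOf(typeIndex);
        return parent == 0 ? own : NestedName(parent) + "/" + own;
    }

    static void Main()
    {
        Console.WriteLine(NestedName(7)); // prints "n1.n2.a2/a3/a33/a333"
    }
}
```

The function never needs to know how deep the nesting goes, so adding a class a3333 to the sample data requires one more row in each table and no new code.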
A similar problem arises
with the TypeRef table, where we have only handled one level of nested classes.
Thus the class .class a91 extends [mscorlib]ppp.aaa/aa/bb will give us a
problem as we have not handled a case where there are two levels of nesting.
The coded index table will have a value of TypeRef in the second case.
A point to understand is the
difference in how nested classes are handled by a TypeRef and a TypeDef. Each
nested class exists in the TypeRef table as a separate row and the coded index
tells us that there are more types nested by a value of TypeRef. In the case of
the TypeDef table, the Nested Class table tells us the levels of
nesting.
The typeextends variable by
now contains the class name that we write after the word extends. The problem
is that if we do not specify a class name, ilasm adds the class name Object
automatically. The problem comes in when we have an interface. Interfaces do not
automatically derive from the class Object.
Thus if and only if the
typeextends variable is non null do we write out the keyword extends along with
the type. Before we write the words extends on a new line we need to first
write spaces for nested classes followed by the space taken if the class is
within a namespace.
The last part of the class
directive is the implements clause that carries the list of interfaces, and not
just one interface, that we implement. The function DisplayAllInterfaces is what
we use. Each time a class implements an interface, a row is stored in the
InterfaceImpl table. We loop through all the records in this table and check the
field called classindex.
An interface is also a class
and has a valid row number in the TypeDef table. The classindex field, however,
holds the TypeDef row number of the class that implements the interface. If a
class implements more than one interface, the classindex value will repeat. Thus
if a class implements three interfaces, this table will have three rows with the
same value of the classindex field.
The extends and the
implements classes are stored in different tables as the implements shares a
one to many relationship. The field interfaceindex is the TypeDefOrRef coded
index and from now on the same code we used earlier to retrieve the class name
is also used here. The last problem in this function is one of indentation.
Each interface needs a line
of its own to be displayed. The problem on hand is how do we know that this is
the last interface to be displayed. We take a Boolean variable nextclassindex
to tell us whether we should display a new line and spaces after the interface
name. We first check if it is the last row; if so there cannot be any more
interfaces and hence we set the variable nextclassindex to false.
Also if the classindex of
the next row is not the same as the parameter typeindex we set nextclassindex
to false. This assumes that for the same type, all the interfaces are stored
one after another. If none of the above are true, we set variable
nextclassindex to true. We then use this variable to write out a new line with
spaces or not.
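The last interface check can be sketched as follows. This is a simplified illustration and not our DisplayAllInterfaces function: the row structure stores the interface name directly instead of a TypeDefOrRef coded index, the sample rows are invented, and the variable more plays the role of nextclassindex. Like the original, it assumes that all rows for one class sit next to each other.

```csharp
using System;
using System.Text;

static class ImplementsSketch
{
    struct InterfaceImplRow
    {
        public int ClassIndex;       // TypeDef row of the implementing class
        public string InterfaceName; // simplification: the real field is a coded index

        public InterfaceImplRow(int classIndex, string interfaceName)
        {
            ClassIndex = classIndex;
            InterfaceName = interfaceName;
        }
    }

    // Hypothetical helper: builds the implements clause for one type.
    static string ImplementsClause(int typeIndex, InterfaceImplRow[] rows, string indent)
    {
        var text = new StringBuilder();
        for (int i = 1; i < rows.Length; i++)
        {
            if (rows[i].ClassIndex != typeIndex)
                continue;
            // False on the last row or when the next row belongs to another class.
            bool more = i + 1 < rows.Length && rows[i + 1].ClassIndex == typeIndex;
            text.Append(text.Length == 0 ? "implements " : indent);
            text.Append(rows[i].InterfaceName);
            if (more)
                text.Append(",\n");
        }
        return text.ToString();
    }

    static void Main()
    {
        var rows = new[]
        {
            new InterfaceImplRow(),          // row 0 is unused
            new InterfaceImplRow(2, "i1"),
            new InterfaceImplRow(2, "i2"),
            new InterfaceImplRow(3, "i1"),
        };
        Console.WriteLine(ImplementsClause(2, rows, "           "));
    }
}
```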
The last thing we do is
display the { with the right number of spaces. Within the class prototype we
display nothing else but prototypes of nested classes.
Coming back to the function
DisplayOneTypePrototype, we now call the function DisplayNestedTypesPrototypes.
We will use this function to display all the nested types for which the typeindex
is an enclosing class.
The important thing to
understand here is that nested classes are to be displayed in the same way as
the enclosing class but only need to be indented by 2 each time. Thus the code
that we wrote in the function DisplayOneTypeDefStart can be reused.
Also unlike the way we have
been writing code so far, we cannot assume that the level of nesting will be 3
or 4 or 5. This is the first time we will use the concept of recursion. The
basic premise of recursion is that we write the code once and call it again.
This makes recursion sound like a function call. True, but with a slight
difference.
We will call the same
function within the same function. Complex? Let's explain with an actual
example. In the function
DisplayNestedTypesPrototypes we have no idea how many classes are nested
within this class or whether it has any nested classes at all.
There is one table, the
Nested Class table, that has the details of the nested classes stored. We iterate
through this table and check whether the parent or enclosing class field has the
typeindex passed as a parameter. If yes, then this class of ours has nested
classes.
We then call the function
DisplayOneTypePrototype that displays the prototype of a type. We pass the
nestedclass field as that field contains the nested class to be displayed. A
class can have 6 nested classes in it and this if statement will be true six
times.
The major point is that each
nested class in turn may have a million levels of nesting. In the function
DisplayOneTypePrototype we are first calling the function
DisplayOneTypeDefStart which displays the class directive.
Before calling this function
we increase the instance variable spacesfornested by 2 and thus this nested
class will indent by 2 as its value will be 4 and not 2 as earlier. After the {
is displayed, the function DisplayNestedTypesPrototypes will be called again.
Remember the first DisplayNestedTypesPrototypes is not yet over.
This is the flavor of
recursion, calling the same function again in spite of the fact that the
earlier call is not over and lies suspended in memory waiting for the second
call to be over. Now in the for loop the typeindex value is of the nested class
and now it is the enclosing class.
If the nested class has any
further nests, then the second DisplayNestedTypesPrototypes is suspended and
the variable spacesfornested increases by 2. At some point in time the for loop
will not match and the function DisplayOneTypePrototype will not get called.
This will result in the function DisplayOneTypeDefEnd being called, which
displays the closing brace of the class.
We will then move out of the
function DisplayOneTypePrototype as there are no more functions left to be
called. This will result in the variable spacesfornested having a value of two
less. The problem with recursion is that some take a million tries before they
get it but no one gets it in one or two.
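If the description is still hazy, running a stripped down version may help. The sketch below is not our program: the class names and the Nested Class rows are invented sample data, and the two functions only print a class line and braces. What it does share with the real code is the shape of the recursion and the indentation variable that grows by 2 on the way in and shrinks by 2 on the way out.

```csharp
using System;

static class NestedDisplaySketch
{
    // Sample data. Row 0 is unused so that indexes match metadata row numbers.
    static readonly string[] Names = { null, "outer", "middle", "inner", "sibling" };

    // Sample Nested Class table: { nested class row, enclosing class row }.
    static readonly int[,] NestedClasses = { { 0, 0 }, { 2, 1 }, { 3, 2 }, { 4, 1 } };

    static int spacesForNested = 0;

    static void DisplayOneType(int typeIndex)
    {
        string pad = new string(' ', spacesForNested);
        Console.WriteLine(pad + ".class " + Names[typeIndex]);
        Console.WriteLine(pad + "{");
        DisplayNestedTypes(typeIndex); // may call DisplayOneType again before this call is over
        Console.WriteLine(pad + "} // end of class " + Names[typeIndex]);
    }

    static void DisplayNestedTypes(int typeIndex)
    {
        for (int i = 1; i < NestedClasses.GetLength(0); i++)
        {
            if (NestedClasses[i, 1] != typeIndex)
                continue;
            spacesForNested += 2;                 // one level deeper
            DisplayOneType(NestedClasses[i, 0]);
            spacesForNested -= 2;                 // back to the level of the caller
        }
    }

    static void Main()
    {
        DisplayOneType(1);
    }
}
```

The class middle is printed inside outer, inner inside middle, and sibling again at the level of middle, with no code anywhere that counts the levels.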
So read again and then come
back to the last function DisplayOneTypeDefEnd.
This function simply has to
display the close brace, end of class and the end of namespace if any. The
writenamespace variable starts with a value of true and if and only if its
value is true do we display the namespace directive in the function
DisplayOneTypeDefStart.
We first check whether the
namespace variable contains a valid namespace name. If it is null we set the
variable writenamespace to false so that the directive namespace is not
displayed. We then need to find out the next class that will be displayed after
this class.
We cannot assume that it
will be the next row as that could be a nested type within the current type.
Thus we need to scan from the row after the type we have just displayed and
keep going till we reach the end. We check each succeeding type and loop back
if it is a nested type under the type we have just displayed.
If the type is not nested we
break. We are assuming that the enclosing types and nested types are placed one
after the other. Now that we have left the for loop we are on a type that will
be displayed after this type. We add a check that we are not at the end of the
table and find out the namespace of this type.
We write out the closing
brace of the namespace and also its name. As this is the end of a namespace we
reset the variable spacefornamespace to 0 as there is no indentation for the
namespace and the spacesforrest is reset to 2 as this variable gives us the
indentation for the succeeding entity to be placed as we shall soon see.
As the writenamespace is set
to true, the next time we come across a class that has a namespace the
namespace directive will be displayed. The point is that if two classes fall
under the same namespace, the namespace directive is not repeated at all.
We display the first
namespace directive and then only display it if we have written out the end of
namespace as the namespace directive cannot be nested. Hence the writenamespace
variable is set to null in the else of the first if.
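The search for the next type and the decision to close the namespace can be sketched as below. Treat it as a guess at the logic and not as our function: the arrays are invented sample data, the helper names are hypothetical, any nested type is skipped without checking which type encloses it, and the rule that the namespace is closed when the next top level type has a different namespace is our reading of the description above.

```csharp
using System;

static class NamespaceEndSketch
{
    // Sample data: row 1 is <Module>, row 2 is aa.a1, row 3 is nested in row 2, row 4 is a2.
    // Row 0 is unused so that indexes match metadata row numbers.
    static readonly string[] Namespaces = { null, "", "aa", "", "" };
    static readonly bool[] IsNested = { false, false, false, true, false };

    // Row of the next type that is not nested, or the table length when we reach the end.
    static int NextTopLevelType(int typeIndex)
    {
        int i = typeIndex + 1;
        while (i < IsNested.Length && IsNested[i])
            i++;
        return i;
    }

    // Assumption: close the namespace at the end of the table or when the
    // next top level type lives in a different namespace.
    static bool MustCloseNamespace(int typeIndex)
    {
        if (Namespaces[typeIndex].Length == 0)
            return false; // the type is not in a namespace at all
        int next = NextTopLevelType(typeIndex);
        return next == Namespaces.Length || Namespaces[next] != Namespaces[typeIndex];
    }

    static void Main()
    {
        Console.WriteLine(MustCloseNamespace(2)); // prints "True": a2 in row 4 has no namespace
    }
}
```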
The TypeRef table contains
only three fields and the ResolutionScope coded index as per the specs can take
five possible values. Of these we could demonstrate four, i.e. the TypeRef,
Module, ModuleRef and AssemblyRef
tables. The fifth is when the value is null, which means that the type is to be
found in the ExportedType table. A type in this table is not a valid type after
the extends keyword. We cannot have two types in this table where the name and
namespace fields are the same.
If we did not mention this
before, the InterfaceImpl table has only two fields: class, which is the index
into the TypeDef table, and interface, which is a TypeDefOrRef coded
index. Each row in this table means that the type denoted by the
class field implements a certain interface, so 10 rows record 10 such pairs.
Obviously the class field
must be non null and if it is by mistake null, we should assume that this row
does not exist at all. It happens when a class is deleted and the metadata is not
updated and rewritten when the compiler incrementally compiles. A time saver.
The interface field indexes into the TypeDef or TypeRef table and not TypeSpec
as TypeSpecs do not have a name.
Also the type that the interface field refers to must have the
interface flag on and cannot be a formal class or Value Type. The class and
interface values together cannot be duplicated, but the class value by itself
can appear more than once as a class can implement lots of interfaces. Vice
versa, the interface value can repeat as one interface can be implemented by
multiple classes.
Program18.csc
public void abc(string[] args)
{
    ReadPEStructures(args);
    DisplayPEStructures();
    ReadandDisplayImportAdressTable();
    ReadandDisplayCLRHeader();
    ReadStreamsData();
    FillTableSizes();
    ReadTablesIntoStructures();
    DisplayTablesForDebugging();
    ReadandDisplayVTableFixup();
    ReadandDisplayExportAddressTableJumps();
    DisplayModuleRefs();
    DisplayAssembleyRefs();
    DisplayAssembley();
    DisplayFileTable();
    DisplayClassExtern();
    DisplayResources();
    DisplayModuleAndMore();
    DispalyVtFixup();
    DisplayTypeDefs();
    DisplayTypeDefsAndMethods();
}
public void DisplayTypeDefsAndMethods()
{
    notprototype = true;
    if (TypeDefStruct.Length != 2)
    {
        Console.WriteLine();
        Console.WriteLine("// =============================================================");
        Console.WriteLine();
    }
    Console.WriteLine();
    Console.WriteLine("// =============== GLOBAL FIELDS AND METHODS ===================");
    Console.WriteLine();
    //DisplayGlobalFields();
    //DisplayGlobalMethods();
    if (TypeDefStruct.Length != 2)
    {
        Console.WriteLine();
        Console.WriteLine("// =============================================================");
        Console.WriteLine();
        Console.WriteLine();
        Console.WriteLine("// =============== CLASS MEMBERS DECLARATION ===================");
        Console.WriteLine("// note that class flags, 'extends' and 'implements' clauses");
        Console.WriteLine("// are provided here for information only");
        Console.WriteLine();
        int kk = TypeDefStruct.Length;
        // Row 1 is the dummy module class, so the real types start at row 2.
        for (int i = 2; i < kk; i++)
        {
            if (GetString(TypeDefStruct[i].name) == "_Deleted" && streamnames[0] == "#-")
                continue;
            // Nested types are displayed from within their enclosing type.
            if (!IsTypeNested(i))
            {
                DisplayOneType(i);
            }
        }
    }
    DisplayEnd();
}
public void DisplayOneType(int typedefindex)
{
    DisplayOneTypeDefStart(typedefindex);
    DisplayNestedTypes(typedefindex);
    DisplayOneTypeDefEnd(typedefindex);
}
public void DisplayNestedTypes(int typedefindex)
{
    if (NestedClassStruct == null)
        return;
    for (int ii = 1; ii < NestedClassStruct.Length; ii++)
    {
        if (NestedClassStruct[ii].enclosingclass == typedefindex)
        {
            spacesfornested += 2;
            DisplayOneType(NestedClassStruct[ii].nestedclass);
            spacesfornested -= 2;
        }
    }
}
public void DisplayEnd()
{
    string nspace = NameReserved(GetString(TypeDefStruct[TypeDefStruct.Length-1].nspace));
    if (!placedend)
    {
        Console.WriteLine();
        Console.WriteLine("// =============================================================");
        Console.WriteLine();
        placedend = true;
    }
    Console.WriteLine("//*********** DISASSEMBLY COMPLETE ***********************");
    if (datadirectoryrva[2] != 0)
        Console.WriteLine("// WARNING: Created Win32 resource file a.res");
}
public void ReadTablesIntoStructures()
{
    int ii;
    for (ii = 1; ii <= TypeDefStruct.Length - 1; ii++)
    {
        //Console.WriteLine("........{0} {1} {2}" , TypeDefStruct.Length , IsTypeNested(ii) , ii);
        if (!IsTypeNested(ii))
            lasttypedisplayed = ii;
    }
}
e.il
.namespace aa
{
  .class a1
  {
    .class a33
    {
    }
  }
}
.class a2
{
}
Output
// =============================================================
// =============== GLOBAL FIELDS AND METHODS ===================
// =============================================================
// =============== CLASS MEMBERS DECLARATION ===================
// note that class flags, 'extends' and 'implements' clauses
// are provided here for information only
.namespace aa
{
  .class /*02000002*/ private auto ansi a1
         extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
  {
    .class /*02000003*/ auto ansi nested private a33
           extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
    {
    } // end of class a33
  } // end of class a1
} // end of namespace aa
.class /*02000004*/ private auto ansi a2
       extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
} // end of class a2
// =============================================================
//*********** DISASSEMBLY COMPLETE ***********************
After a long time program18
gives us an output that matches the output displayed by the original
disassembler. We have a namespace aa that has a class a1 which in turn has a
nested class a33. The class a2 is not enclosed in a namespace. The above il
file has no methods and thus we need to assemble it into a dll using the /DLL
option.
Now let's look at the code that
we have added to get the above magic. The excitement now starts and step by step
we will keep adding more and more to the il file and write code that matches
the output of the disassembler. We call the function DisplayTypeDefsAndMethods
in the function abc to display the contents of the types for us.
In this function we first
set the variable notprototype to true which, if you remember, was set to false
earlier, and thus some code which we did not explain but will explain now did
not get called. We also told you some time back that a dummy class called
<Module> always gets created even though you do not have a single class defined.
Thus we first check that the
user has created at least one class and then only display the = signs in
comments. We then write out the comments for global Fields and Methods. These
global entities are allowed in languages like IL, C++ but not in C#.
We use two functions to
display these global fields and methods but comment them out for the moment as
we will explain them later. We have more pressing things to do at this moment.
Once again if there are classes to be displayed we write out the words class
members declaration with the required comments for extends and implements.
A waste of code and space if
you ask us. We then place a blank line and come to the core of this function,
a for loop that starts at 2 and not 1 and runs up to the number of rows in the
array TypeDefStruct. As before we take care of classes called _Deleted and let
the function DisplayOneType take care of displaying one type.
This is like the earlier
Type Prototype display function and thus we will not explain the if statement
checking for nested classes. Finally when we leave the for loop the function
DisplayEnd will display the last lines of the output. After this it is back
to bed.
No more function calls.
Looking at the DisplayOneType function, all that we do is call two functions we
have called before DisplayOneTypeDefStart and DisplayOneTypeDefEnd. This is how
we reuse code.
We also call the
DisplayNestedTypes function to display nested types. The only difference
from its prototype cousin is that it calls the DisplayOneType function instead
of the prototype version. Let's explain the role of a variable lasttypedisplayed
that we initialize in the function ReadTablesIntoStructures.
At the end of this function
we have a for loop that loops through
the type table. We would like to know the last type we are displaying. We
cannot assume that it is the last physical type as that type could be a nested
type that will display within another type.
Thus we set the variable
lasttypedisplayed to the loop variable ii as long as the type is not nested. We
do this once and then do not change the value of this variable at all. To
understand better we would like to flip lots of pages and move over to the
DisplayOneTypeDefEnd function of the earlier program. Let's take a different
e.il for this case.
e.il
.namespace aa
{
  .class a1
  {
  }
}
We have to write out a
series of equal to signs with a blank line before and after. If the last type to
be displayed falls within a namespace we have to write out the equal to signs
before we close the namespace.
Thus if we are displaying
the last type, which happens when the variable lasttypedisplayed equals
typeindex, then we write out the many equal to signs. As this function gets
called twice, once for prototypes if you have not forgotten, the notprototype
variable has to be true.
The placedend variable is
set to true as we have written out the equal to signs. The above il file calls
the first if statement as the type to be displayed a1 is also the last type. We
must remember that the two namespaces nspace1 and nspace2 must not be
null. Finally we come to the function
DisplayEnd which as said before is the last function to be called.
If we have an il file with no
namespace or better still the last class is not within a namespace the earlier
code will not place the = signs. Thus we first check the value of the placedend
variable. If it is false, it means that we have to write out the closing equal
to signs.
We do not have to set the
placedend variable to true here as no code gets called after this, but even if
we do, it only means that we are lousy programmers. If the data directory entry
at index 2, the resource table, is non zero, it means that the disassembler has
created a resource file for us and we need to display the name of this file,
which is always called a.res.
Thus we have now written a
program that matches the output generated by the disassembler but do not start
the celebrations as we have miles and miles to go.
Program19.csc
public void DisplayOneType(int typedefindex)
{
    DisplayOneTypeDefStart(typedefindex);
    DisplaySizeAndPack(typedefindex);
    DisplayNestedTypes(typedefindex);
    DisplayOneTypeDefEnd(typedefindex);
}
public void DisplaySizeAndPack(int typeindex)
{
    if (ClassLayoutStruct == null)
        return;
    for (int ii = 1; ii < ClassLayoutStruct.Length; ii++)
    {
        // Only the Class Layout row that belongs to this type is displayed.
        if (ClassLayoutStruct[ii].parent == typeindex)
        {
            Console.Write(CreateSpaces(spacesfornested + spacesforrest));
            Console.WriteLine(".pack {0}" , ClassLayoutStruct[ii].packingsize);
            Console.Write(CreateSpaces(spacesfornested + spacesforrest));
            Console.WriteLine(".size {0}" , ClassLayoutStruct[ii].classsize);
        }
    }
}
e.il
.class a1
{
  .pack 2
}
.class a2
{
  .size 2
}
.class a3
{
  .size 2
  .pack 2
}
Output
.class /*02000002*/ private auto ansi a1
       extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
  .pack 2
  .size 0
} // end of class a1
.class /*02000003*/ private auto ansi a2
       extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
  .pack 1
  .size 2
} // end of class a2
.class /*02000004*/ private auto ansi a3
       extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
  .pack 2
  .size 2
} // end of class a3
We have added a function
DisplaySizeAndPack that displays the size and pack directives that we have added
in our class. A size or pack directive fills up a row in the ClassLayout table.
This table has three fields, two to store the size and pack values and the
third to tell us which type carries these directives.
The reason we do not store
the pack and size values in the TypeDef table itself is that they are
optional. The packing size field is a short whereas the class size is larger at
four bytes. We use the ClassLayout table to tell the compiler how the fields,
i.e. the instance variables, of a type in IL should be laid out or arranged in
memory.
These directives only apply
to a class or value type and not to an interface. Normally the CLI is free to
place the fields wherever it wants in memory and leave as many gaps as it likes.
If it so pleases it is also allowed to move the fields in memory.
The managed world of .net
has to give you most of the features available with the unmanaged world of
C/C++. In these languages we had the freedom of placing our fields or
structures the way we liked. By allowing us the same flexibility, we can now
access unmanaged code structures in the same way using managed code.
There are, if you have been
reading this book sequentially, three types of layout attributes: auto,
sequential and explicit. The default is auto if we do not specify a layout.
Coming to our program, the first class a1 has a pack size of 2 and no size
directive, and the output shows us the default size of 0.
A value of 0 does not mean
that the class size is zero, it means that the CLI will figure it out. The
second class a2 has a size and no pack and the default pack size is 1 and not
0. In our function DisplaySizeAndPack all that we do is scan the class layout
table and check for that single record that has the parent field being equal to
the parameter typeindex.
If a record does not match
the type, we do not display these directives. Note that the listing writes out
both values whenever a row matches, even when one of them was never specified,
which is why a1 shows a .size of 0 and a2 shows a .pack of 1 in the output.
We write out the indentation,
where spacesforrest takes care of the indentation other than that for the
nested classes. We then write out the pack and size directives. Let's now
understand what the pack directive is all about. If we have a pack size of say
16, this means that every field in memory at runtime should start at an address
which is a multiple of either 16 or the natural alignment of the field type.
Unlike life, the CLI chooses
whichever is smaller and not larger.
Thus if we specify a pack of 2, then a 32 bit field will begin at an
address that is a multiple of 2 and not of 4, which is what would happen
naturally if there was no pack. The pack can only have the values 0, 1, 2, 4,
8, 16, 32, 64 or 128.
A value of zero does not
mean no pack but the pack size used should be decided by the platform we are
running on. Obviously the pack directive and the explicit attribute where we
are being explicit cannot be used together.
The size directive is easier
to understand as it specifies the size of the memory allocated to the fields of
the class and not to the methods. This value should be larger than or at the
very least equal to the calculated size of the class. The size of the class is
the sum of the individual fields and the extra gaps due to the pack directive.
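The pack rule and the size calculation described above can be tried out on paper with a small sketch. This is only our illustration and not what the CLI actually runs: the class PackSketch, the method Layout and the assumption that the natural alignment of a field equals its size are all ours.

```csharp
using System;

static class PackSketch
{
    // Alignment used for one field: the smaller of the pack size and the
    // natural alignment. A pack of 0 is treated here as "no limit".
    static int Alignment(int pack, int naturalAlignment)
    {
        return pack == 0 ? naturalAlignment : Math.Min(pack, naturalAlignment);
    }

    // Returns the offset of every field. totalSize receives the sum of the
    // field sizes plus the gaps that the alignment introduced.
    public static int[] Layout(int pack, int[] fieldSizes, out int totalSize)
    {
        int[] offsets = new int[fieldSizes.Length];
        int offset = 0;
        for (int i = 0; i < fieldSizes.Length; i++)
        {
            // Assumption: the natural alignment of a field is its size.
            int align = Alignment(pack, fieldSizes[i]);
            offset = (offset + align - 1) / align * align; // round up
            offsets[i] = offset;
            offset += fieldSizes[i];
        }
        totalSize = offset;
        return offsets;
    }

    static void Main()
    {
        int size;
        int[] fields = { 1, 4, 2 }; // an int8, an int32 and an int16
        foreach (int pack in new[] { 1, 2, 4 })
        {
            int[] offsets = Layout(pack, fields, out size);
            Console.WriteLine(".pack {0}: offsets {1}, size {2}",
                pack, string.Join(",", offsets), size);
        }
    }
}
```

With a pack of 2 the 32 bit field lands at offset 2, a multiple of 2 and not of 4, and the three fields occupy 8 bytes in this model. A .size directive for such a class would have to be 8 or more.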
The pack and size directives
are not hints and the system better obey our values or else … . The class
layout table may be empty and normally is. Obviously the class containing these
two directives must not have the auto layout as ours have, but we get a warning
from the assembler and not an error.
If the class size is larger
than the actual size, padding is provided at the end of the class by the
compiler. A class size of zero specifies that the system can figure out the
size of the class as it normally does. Even though we use the explicit layout
attribute, we can still have a verifiable type if our type does not have a
union.
A union is an entity that
allows different fields to start at the same location in memory. For an explicit
layout attribute the packing size is 0 as we are explicitly specifying each
offset. If you have forgotten, all classes derive from System.Object and value
types from System.ValueType.
A layout has to start from
the first class that derives from class Object and it cannot start from any
other point in the inheritance hierarchy. We can stop the layout anywhere in
the chain but from then on no class can have layout. We cannot stop the layout
and two classes later start again.
Thus no holes are allowed in
the layout of classes. Thus the two rules we have specified are no holes and
also that the layout starts from the highest class.
Program20.csc
public void DisplayOneType (int typedefindex)
{
    DisplayOneTypeDefStart(typedefindex);
    DisplaySizeAndPack(typedefindex);
    DisplayNestedTypes(typedefindex);
    DisplayAllMethods(typedefindex);
    DisplayOneTypeDefEnd(typedefindex);
}
public void DisplayAllMethods (int typerow)
{
    if ( TypeDefStruct == null)
        return;
    if ( MethodStruct == null)
        return;
    int start , startofnext=0;
    start = TypeDefStruct[typerow].mindex ;
    if ( typerow == (TypeDefStruct.Length -1) )
    {
        startofnext= MethodStruct.Length;
    }
    else
        startofnext = TypeDefStruct[typerow+1].mindex ;
    for ( int methodindex = start ; methodindex < startofnext ; methodindex++)
    {
        string methodstring = CreateSpaces(spacesforrest);
        if ( IsTypeNested(typerow))
            methodstring = methodstring + CreateSpaces(spacesfornested);
        methodstring = methodstring + ".method ";
        methodstring = methodstring + "/*06" + methodindex.ToString("X6") + "*/ " ;
        string methodattribute = GetMethodAttribute(MethodStruct[methodindex].flags , methodindex);
        Console.WriteLine(methodstring + methodattribute);
    }
}
public string GetMethodAttribute (int methodflags , int methodrow)
{
    string returnstring = "";
    methodaccessattribute = "";
    methodhidebysigattribute = "";
    methodpinvokestring = "";
    methodunmanagedexpattribute = "";
    methodreqsecobjattribute = "";
    methodstaticinstanceattr = "";
    methodnewslotattr = "";
    methodspecialnameattr = "";
    methodrtspecialnameattr = "";
    methodpinvokeimplattr = "";
    methodfinalattr = "";
    methodvirtualattr = "";
    methodabstractattr = "";
    if ( (methodflags & 0x0006) == 0x0006)
        returnstring = "public ";
    else if ( (methodflags & 0x0005) == 0x0005)
        returnstring = "famorassem ";
    else if ( (methodflags & 0x0003) == 0x0003)
        returnstring = "assembly ";
    else if ( (methodflags & 0x0004) == 0x0004)
        returnstring = "family ";
    else if ( (methodflags & 0x0001) == 0x0001)
        returnstring = "private ";
    else if ( (methodflags & 0x0002) == 0x0002)
        returnstring = "famandassem ";
    else
        returnstring = "privatescope ";
    methodaccessattribute = returnstring;
    if ( (methodflags & 0x0080) == 0x0080)
    {
        methodhidebysigattribute = "hidebysig " + methodstaticinstanceattr;
        returnstring = returnstring + "hidebysig ";
    }
    if ( (methodflags & 0x0100) == 0x0100)
    {
        methodnewslotattr = "newslot " ;
        returnstring = returnstring + "newslot ";
    }
    if ( (methodflags & 0x0800) == 0x0800 || (methodflags & 0x0200) == 0x0200 )
    {
        methodspecialnameattr = "specialname ";
        returnstring = returnstring + "specialname ";
    }
    if ( (methodflags & 0x1000) == 0x1000)
    {
        methodrtspecialnameattr = "rtspecialname " ;
        returnstring = returnstring + "rtspecialname ";
    }
    if ( (methodflags & 0x0010) == 0x0010)
    {
        methodstaticinstanceattr = "static " + methodstaticinstanceattr ;
        returnstring = returnstring + "static ";
    }
    else
    {
        methodstaticinstanceattr = "instance " + methodstaticinstanceattr;
        returnstring = returnstring + "instance ";
    }
    if ( (methodflags & 0x0020) == 0x0020)
    {
        methodfinalattr = "final " ;
        returnstring = returnstring + "final ";
    }
    if ( (methodflags & 0x0040) == 0x0040)
    {
        methodvirtualattr = "virtual " ;
        returnstring = returnstring + "virtual ";
    }
    if ( (methodflags & 0x0400) == 0x0400)
    {
        methodabstractattr = "abstract " ;
        returnstring = returnstring + "abstract ";
    }
    if ( (methodflags & 0x2000) == 0x2000)
    {
        methodpinvokeimplattr = "pinvokeimpl " ;
        returnstring = returnstring + "pinvokeimpl(";
        int ii;
        if ( ImplMapStruct == null)
        {
            returnstring = returnstring + "/* No map */) ";
            return returnstring;
        }
        else
        {
            for ( ii=1; ii < ImplMapStruct.Length ; ii++)
            {
                int index = ImplMapStruct[ii].cindex;
                index = index >> 1;
                if ( index == methodrow )
                    break;
            }
            if ( ii == ImplMapStruct.Length )
            {
                returnstring = returnstring + "/* No map */) ";
                return returnstring;
            }
            string methodname = NameReserved(GetString(MethodStruct[methodrow].name));
            string name = NameReserved(GetString(ImplMapStruct[ii].name));
            int scope = ImplMapStruct[ii].scope;
            string modulename = NameReserved(GetString(ModuleRefStruct[scope].name));
            modulename = modulename.Replace("\\" , "\\\\");
            returnstring = returnstring + "\"" + modulename + "\"" ;
            if ( String.Compare(methodname , name) != 0)
                returnstring = returnstring + " as \"" + name + "\"";
            string pinvokeattribute1;
            string pinvokeattribute = GetPinvokeAttributes(ImplMapStruct[ii].attr , out pinvokeattribute1);
            returnstring = returnstring + pinvokeattribute1;
            if (pinvokeattribute.IndexOf("stdcall") == -1)
                returnstring = returnstring + " " + pinvokeattribute;
            returnstring = returnstring + ") ";
            int index1 = returnstring.IndexOf("pinvok") ;
            methodpinvokestring = returnstring.Remove(0, index1);
        }
    }
    if ( (methodflags & 0x08) == 0x08)
    {
        methodunmanagedexpattribute = "unmanagedexp ";
        returnstring = returnstring + "unmanagedexp ";
    }
    if ( (methodflags & 0xffff8000) == 0xffff8000)
    {
        methodreqsecobjattribute = "reqsecobj ";
        returnstring = returnstring + "reqsecobj ";
    }
    return returnstring;
}
public string GetPinvokeAttributes (int attribute , out string returnattribute)
{
    returnattribute = "";
    if ( (attribute & 0x001) == 0x0001)
        returnattribute = " nomangle";
    if ( (attribute & 0x006) == 0x0006)
        returnattribute = returnattribute + " autochar";
    else if ( (attribute & 0x002) == 0x0002)
        returnattribute = returnattribute + " ansi";
    else if ( (attribute & 0x004) == 0x0004)
        returnattribute = returnattribute + " unicode";
    if ( (attribute & 0x040) == 0x0040)
        returnattribute = returnattribute + " lasterr";
    string returnstring = "";
    if ( (attribute & 0x0500) == 0x0500)
        returnstring = returnstring + "fastcall";
    else if ( (attribute & 0x0300) == 0x0300)
        returnstring = returnstring + "stdcall";
    else if ( (attribute & 0x0100) == 0x0100)
        returnstring = returnstring + "winapi";
    else if ( (attribute & 0x0200) == 0x0200)
        returnstring = returnstring + "cdecl";
    else if ( (attribute & 0x0400) == 0x0400)
        returnstring = returnstring + "thiscall";
    return returnstring;
}
e.il
.class a1
{
.method public pinvokeimpl("Ole32.dll" as "CoCreateInstance" autochar winapi) int32 CoCreateInstance2()
{
}
.method public pinvokeimpl("Ole322.dll" as "CoCreateInstance" autochar winapi) int32 CoCreateInstance3()
{
}
.method public pinvokeimpl("Ole322.dll" as "CoCreateInstance" autochar winapi) int32 CoCreateInstance()
{
}
.method public pinvokeimpl("Ole322.dll" as "CoCreateInstance" autochar stdcall) int32 CoCreateInstance4()
{
}
}
Output
.module extern Ole322.dll /*1A000001*/
.module extern Ole32.dll /*1A000002*/
.class /*02000002*/ private auto ansi a1
       extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
  .method /*06000001*/ public instance pinvokeimpl("Ole32.dll" as "CoCreateInstance" autochar winapi)
  .method /*06000002*/ public instance pinvokeimpl("Ole322.dll" as "CoCreateInstance" autochar winapi)
  .method /*06000003*/ public instance pinvokeimpl("Ole322.dll" autochar winapi)
  .method /*06000004*/ public instance pinvokeimpl("Ole322.dll" as "CoCreateInstance" autochar)
} // end of class a1
From this program onwards we
display the method directive, by means of which we define a method, or function
as we knew it earlier. The problem with the method directive is that it is
really complex, and if we did all of the method directive at once the program
would run into 1500 lines.
Thus we break up this
directive by having smaller programs deal with each individual feature and then
like Humpty Dumpty put them together again. Let's start with displaying the
attributes that a method can have. First things first.
In the method DisplayOneType
we add a call to the method DisplayAllMethods, passing it the row number of the
type that carries the methods. This function will display all the methods owned
by a type and, a little later, all the global methods also, which belong to
type row 1. Each method directive adds one row to the Method table.
The question in your mind is
how do we represent all the methods a type contains. The simplest way in our
mind is to have two fields, one field for the starting method number in
the Method table and the second field for the last method number in the Method
table.
The problem with our way is
that it is too space consuming, and where we can use one field why use two. The
field mindex tells us the first method number in the Method table but there is
no field for the last method. We infer this by looking at the mindex field
of the next type.
We thus have the starting
method numbers owned by two succeeding types, which gives us two row numbers in
the Method table. The first is the starting point and the second minus 1
is the last method that this type owns. Thus the variable start is the
first method row number and the variable startofnext is the first method row of
the succeeding type.
If the type we are displaying
methods for is the last type, there is no next type, so startofnext is set to
the length of the Method table array and the last method owned by this type is
the last row. In the for loop, methodindex is the loop variable; we start at
the value of start and stop one short of the value of startofnext.
The other reason that we do
not have two fields is that a type can have no methods at all. In this case
this type and the next type will have the same value for the field mindex. Why
use two when one can do the job.
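The single-field trick is easier to see with numbers. The sketch below is ours and uses made-up data: an array mindex with slot 0 unused, as in our own structures, and a hypothetical helper RowsOwned that returns the Method rows a type owns.

```csharp
using System;

static class MethodRangeSketch
{
    // mindex[t] is the first Method row owned by type row t (slot 0 unused).
    // methodTableLength is the array length, i.e. number of rows plus one.
    public static int[] RowsOwned(int[] mindex, int methodTableLength, int typeRow)
    {
        int start = mindex[typeRow];
        // The last type has no successor, so its range runs to the end.
        int startOfNext = (typeRow == mindex.Length - 1)
            ? methodTableLength
            : mindex[typeRow + 1];
        int[] rows = new int[startOfNext - start];
        for (int i = 0; i < rows.Length; i++)
            rows[i] = start + i;
        return rows;
    }

    static void Main()
    {
        // Hypothetical data: type 1 owns rows 1-2, type 2 owns nothing,
        // type 3 owns rows 3-5.
        int[] mindex = { 0, 1, 3, 3 };
        int methodTableLength = 6; // five rows plus the unused slot 0
        for (int t = 1; t < mindex.Length; t++)
            Console.WriteLine("type {0}: {1} method(s)",
                t, RowsOwned(mindex, methodTableLength, t).Length);
    }
}
```

Types 2 and 3 share the value 3 for mindex, and that alone tells us that type 2 owns no methods.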
If there is one variable we
have used the most it is the variable spacesforrest. This variable has
basically three different values: 0, 2 or 4. The value of 0 is used when we are
displaying global functions, as there is no indentation needed then. The value
of 2 is the most common as there are 2 spaces between a type and a method.
Finally, if we are in a namespace
the indentation is 4. The variable spacesfornested stores the spaces needed if
there is a nested class. Thus whenever we write something out these two
variables will always be used for the initial spaces to be written out. We
write out the method directive and then its row number, and we then use the
function GetMethodAttribute to give us the method attributes, which are like
the class attributes.
We write out the method
directive and the attributes and nothing else. Thus the output given by the
disassembler will not match ours as we are, as we said before, only displaying
parts. These parts will however match the same parts displayed by the
disassembler. Let's move on to the
method GetMethodAttribute.
The Method table has a
member called flags that tells us which attributes this method carries. Like in
the case of the type, this field has to be read bitwise and not bytewise.
The problem that we will come across later is the order of display, for example
whether the abstract attribute gets displayed before the newslot attribute.
Thus we work within this
function in two ways: we use a common variable returnstring to accumulate all
the attributes, and also a series of variables like
methodaccessattribute that store the individual attribute values. We will use
these variables to write out the attributes in a certain order.
The return string also
contains all the attributes but in the wrong order. Some programs later we will
tell you what the right order of writing out the attributes is. These
attributes are divided into families and the first is the accessibility
attributes.
These are attributes that
decide who can access these methods and even though there are seven of them, the
seventh, compilercontrolled, gives us an ilasm error.
This is because ilasm does not support this attribute in its first release and
we are advised by the specs to use privatescope instead as the effect is the
same. The method is hidden and cannot be referenced.
The other six, assembly,
famandassem, family, famorassem, private and public, have been touched upon
earlier. The assembly attribute has the first and second bits on and thus we
bitwise and with 3. The trick here is that the first bit on its own means
private and the second bit on its own means famandassem.
Only if both bits are on
does it mean assembly. This means that the order of the if statements is
important because if we check for private or famandassem first, they will match
when the attribute is actually assembly. Thus we first check for the
combinations of bits and then the single bits.
That is why the checks for
attributes that have more than one bit on come at the beginning and the single
bit checks are carried out at the end. Another way of handling it would be like
the way we did earlier: extract the bits for accessibility and then check the
absolute values and not the individual bits. To each his own.
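The other way, extracting the accessibility bits and comparing whole values, could look like the sketch below. The class AccessSketch and the method Access are names we made up, and the listing in Program20 does not work this way. For the six defined values and for zero it gives the same text as the chain of if statements; the undefined value 7 falls to the default here, whereas the if chain would call it public.

```csharp
using System;

static class AccessSketch
{
    public static string Access(int methodflags)
    {
        // Keep only the three accessibility bits and compare whole values,
        // so the order of the cases no longer matters.
        switch (methodflags & 0x0007)
        {
            case 0x0001: return "private ";
            case 0x0002: return "famandassem ";
            case 0x0003: return "assembly ";
            case 0x0004: return "family ";
            case 0x0005: return "famorassem ";
            case 0x0006: return "public ";
            default:     return "privatescope ";
        }
    }

    static void Main()
    {
        // 0x0096 = public (6) + static (0x10) + hidebysig (0x80)
        Console.WriteLine(Access(0x0096));
        Console.WriteLine(Access(0x0003));
    }
}
```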
The above attributes are
mutually exclusive and cannot be combined. This is because a method can have
only one accessibility attribute. The next attribute, newslot, controls how a
method takes part in overriding and is only used by virtual functions. We will
delve into newslot and virtual methods some time later.
Abstract classes, as we
explained earlier, were incomplete and could not be used. This is because they
had abstract functions. The abstract attribute can only be used with virtual
functions and these must not be marked final, as abstract methods must be
overridden; it is in their genes.
An abstract method says that there
is no code or implementation for this method and that the class that derives
from the class containing the abstract method will supply the body of the code.
Obviously abstract methods must be present only in abstract types.
One important attribute that
we will do later is the pinvokeimpl attribute, and a major part of this
function deals with it. The next family is the method contract
attributes, which has four members: final, hidebysig, static and virtual. These
attributes can be combined keeping some conditions in mind.
A method cannot be both static
and virtual, and only if it is virtual can it be final. A final method is a
method that a subclass is not allowed to override. It is a way of saying use
the function as is; you can derive from the class but not override this method.
We explained earlier that hidebysig is ignored by the VES, or system, but is
used by tools to do whatever they like with the name.
This attribute specifies
that this method will hide all the methods from the inheritance hierarchy that
have the same method signature. If this attribute is not present, the hiding
still takes place but the signature is not taken into account and all methods
having the same name are hidden.
A method is identified by its name
plus signature and this is what gives the object oriented world a big advantage
over plain C, which hides methods only by name. Finally we have
the special handling attributes, which are rtspecialname and specialname. A
constructor is called .ctor and a static constructor .cctor.
These names are recognized
by the runtime and treated in a special way. This is the meaning of the
rtspecialname attribute; specialname is for use by tools and not the
runtime. There are two more special attributes brought in by Microsoft, which
are unmanagedexp and reqsecobj.
The unmanagedexp attribute says that
the method is exported to unmanaged code using COM interop, which we will
explain in a short while. Reqsecobj says that this method calls another method
which has security attributes. Finally some final conditions on applying
attributes: static cannot be used with final, virtual or newslot.
This makes sense as static
functions belong to a type or class and not to an instance. Abstract functions
have no code and hence cannot be used with pinvokeimpl or final.
Compilercontrolled, when implemented, cannot be used with virtual, final,
specialname or rtspecialname.
A small bit of information:
if we use the attribute specialname, the flags field has a value of 0x800 as
the specs tell us. In our file isymwrapper.dll, the specialname attribute was
written out by ildasm even though the flags field had a value of 0x200.
On this value of 0x200 the
specs are silent, so we put two and two together and assumed that a value of
either 0x200 or 0x800 stands for the attribute specialname. We have also not
checked for a value of 0x4000, which stands for the security attributes, as we
did not know the name of the attribute.
A little later we will
trouble you with what is called the custom attribute, and it is this directive
that adds the security attribute. The problem is that ildasm ignores it and so
do we. Now let's move on to the pinvokeimpl attribute that we have been
promising we would do.
We first check whether we
have the attribute pinvokeimpl, which has a value of 0x2000. We write out
the name of the attribute and then remind ourselves that each time we have such
an attribute, an extra row gets added to the ImplMap table. Let's start with a
simple error check first.
Assume that we have only one
method that uses the pinvokeimpl attribute. This attribute requires some
values within brackets. If we do not supply these values, the ImplMap table
has no rows and hence we need to return the words No map within brackets. If
the ImplMap table has records we iterate through each and every one of
them.
But first a small note on
the use of the pinvokeimpl attribute. For years programmers have written code
that runs under Windows and this code has passed the test of time. It makes no
sense for this tried and tested code to go to waste just because something new
has come about.
Thus Microsoft invested lots
of time to come up with a method by means of which managed code, i.e. .net code,
can call the earlier unmanaged code. This method is called PInvoke or platform
invoke. Thus code written in the past
and present in DLLs can now be executed from the managed world of .net.
Thus PInvoke switches from
managed to unmanaged code, makes the function call and then switches back to
managed code. The .net world may have a data type that needs to be transformed
to another data type in the unmanaged world, and the return value in the
unmanaged world may need to be transformed to another data type as we re-enter
the managed world.
This transformation is
called data marshalling. These functions, which really do not exist in the
managed world as they carry no code but are a gateway to the unmanaged world,
are marked as pinvokeimpl. A little later we will talk of another set of
attributes called implementation attributes that come after the function
parameters.
A pinvokeimpl method must
have the implementation attributes native and unmanaged. If you look closely
at the il file, after the pinvokeimpl attribute name we have a set of
parameters in brackets. This is the only attribute that accepts parameters. The
first is the name of the dll that carries the code that will get executed.
This is a quoted string or
what the docs call a QSTRING, a string in double and not single inverted
commas. In our case we are specifying that the code of the function
CoCreateInstance2 actually lies in this dll. The second string is optional and
specifies the real name of the function as it exists on that platform.
Let's take you back in time.
In the days of programming in C a function name was simply its name. Then
came C++ and the name of a function became the name plus the data types of its
parameters. A function abc that took two ints as parameters was renamed by the
C++ compiler from Borland to abc$qii.
In the same vein the same
function abc with a single int was renamed to abc$qi. Thus the Borland compiler
added a $q and then the data types of each parameter. This changed the name of
the function beyond recognition and is called name mangling.
The Microsoft compiler
Visual C++ would however use a different set of symbols to represent the
parameter types. There is no standard that describes name mangling.
Thus in our case we are
saying that the function name in the .net world is CoCreateInstance2 but in
Ole32.dll it will be seen as CoCreateInstance instead due to name mangling. The
.net law says that if a method is to be marked with the pinvokeimpl attribute
then it should be a global method, i.e. outside a type, and must be static.
Fortunately for us the
assembler does not seem to care. Also pinvokeimpl methods, as they are a stand
in for methods in unmanaged code, need not contain any code as any code they
carry is never called. We now scan our ImplMap table and move our eyes to
the cindex field.
This field, as the name
suggests, is a coded index called the MemberForwarded coded index that
can point to one of two tables, Method or Field. For some reason a field will
never have a pinvokeimpl attribute and thus the coded index always represents a
row in the Method table.
We right shift the coded
index by 1 and then check whether it is the row number of our method. Each
time a method has the pinvokeimpl attribute, the field cindex will carry the
method number. We break out of the for loop if a match is found. We then need
to check whether our method row number actually exists in this table.
It will not exist if the
pinvokeimpl attribute is present on the method but has no parameters in
brackets. This is the same condition that we checked for earlier but here the
check is a little different. There we had only one method and this method did
not have pinvokeimpl parameters.
In this case we have at
least one method that has the pinvokeimpl parameters, but the current method
has the pinvokeimpl attribute and no parameters. Once past this check, we pick
up the name of the method as stored in the Method table and store it in the
methodname variable.
The field name in the
ImplMapStruct stores the actual name of the function we wrote in the as clause.
The first thing we wrote was the name of the dll that carries the code of this
function, and there is no field that gives us this dll name directly. The
pinvokeimpl attribute is a lot more complex.
For every dll name that we
write, a row gets added to the ModuleRef table. This row number is the value of
the scope field. Thus if we have two pinvokeimpl attributes with different dll
names, Ole32.dll and Ole322.dll in our case, the output carries two module
extern directives.
If the module name carries a
backslash, we need to replace it with two backslashes. We then compare the name
of the method as stored in the Method table with the name of the actual
unmanaged method. If they are different, we put the as clause and the
name of the unmanaged method.
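The two steps just described, doubling the backslashes and adding the as clause only when the names differ, can be pulled out into a few lines. This is our sketch with made-up names (PinvokeTextSketch, Build); it leaves out the attributes that follow the as clause, and it uses an ordinal comparison where the listing calls String.Compare.

```csharp
using System;

static class PinvokeTextSketch
{
    public static string Build(string moduleName, string methodName, string importName)
    {
        // A backslash in the dll name is written out as two backslashes.
        string text = "pinvokeimpl(\"" + moduleName.Replace("\\", "\\\\") + "\"";
        // The as clause appears only when the unmanaged name differs from
        // the name of the method in the Method table.
        if (string.CompareOrdinal(methodName, importName) != 0)
            text += " as \"" + importName + "\"";
        return text + ")";
    }

    static void Main()
    {
        Console.WriteLine(Build("Ole32.dll", "CoCreateInstance2", "CoCreateInstance"));
        Console.WriteLine(Build("Ole322.dll", "CoCreateInstance", "CoCreateInstance"));
    }
}
```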
Thus in our il file, the
third method has the name in the as clause the same as the method name and thus
the as clause is omitted from the pinvokeimpl attribute.
There are some more attributes that we write out after the as clause and
we use the function GetPinvokeAttributes to get at these extra attributes
stored in the attr field.
In this function we are
returning one set of attributes as the return value and another set through the
second parameter, the out parameter returnattribute. The out parameter is built
from five attributes: nomangle, the autochar, ansi and unicode attributes that
we did earlier, and lasterr.
The second set of attributes that we actually return have to do with the
calling convention and they are fastcall, stdcall, winapi, cdecl and thiscall.
We write out the out
parameter first and then the calling convention. The only problem with the
calling convention stdcall is that even if we write it, like for the function
CoCreateInstance4, the assembler does not spit it out for us. Thus the if
statement makes sure that all calling conventions but stdcall will get
written out.
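For those who prefer masks to a ladder of if statements, here is the same decoding in one function. It is a sketch of ours and not the code of Program20: the name Decode is made up, the two sets of attributes are returned as one string, and stdcall is dropped inside the function so that the result mirrors the output shown earlier. The bit values are the ones used in GetPinvokeAttributes.

```csharp
using System;

static class PinvokeFlagsSketch
{
    public static string Decode(int attr)
    {
        string text = "";
        if ((attr & 0x0001) != 0)
            text += " nomangle";
        switch (attr & 0x0006)          // character set bits
        {
            case 0x0002: text += " ansi"; break;
            case 0x0004: text += " unicode"; break;
            case 0x0006: text += " autochar"; break;
        }
        if ((attr & 0x0040) != 0)
            text += " lasterr";
        switch (attr & 0x0700)          // calling convention bits
        {
            case 0x0100: text += " winapi"; break;
            case 0x0200: text += " cdecl"; break;
            case 0x0300: break;         // stdcall: not written out
            case 0x0400: text += " thiscall"; break;
            case 0x0500: text += " fastcall"; break;
        }
        return text.TrimStart();
    }

    static void Main()
    {
        Console.WriteLine(Decode(0x0106)); // autochar + winapi
        Console.WriteLine(Decode(0x0306)); // autochar + stdcall
    }
}
```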
We then write out the close
bracket and then find out the index of the letters pinvok in returnstring. If
you have not forgotten so far, the word pinvokeimpl is preceded by the
other attributes. We need to knock off all the other attributes before the
pinvokeimpl attribute and thus use the Remove function of the string class.
The first parameter is the
starting point in the string and the second is the number of characters to
remove. We store this pinvokeimpl attribute in the variable
methodpinvokestring and this variable will be used later to actually write out
the method attributes. We finally check for the last two attributes and then
return the value of returnstring that we finally display.
One of the things that we
have explained above is the ability of the CLI to call pre-existing native code,
which we call unmanaged code, from the platform. The platform will decide what
the rules are and hence they are specific to an operating system.
What this entails is
deciding a file format so that function pointers to managed code can be called
from unmanaged code. What we have seen so far is a way of specifying methods
that are implemented in unmanaged code.
We also need a way of
marking call sites, i.e. the instructions through which functions are called,
to indicate that the function to be called is actually in unmanaged code. The
call and calli instructions will be explained in detail later.
What the specification
finally specifies is a set of predefined data types that can be marshaled
across irrespective of where the CLI has been implemented. We can however
extend this small number of data types using custom attributes and modifiers.
These extensions are
specific to each platform and are not guaranteed to work across platforms. Let's
take up the attributes ansi, autochar and unicode that we have dealt with
earlier. First, like before, these attributes are mutually exclusive, and we
know that we are repeating ourselves in a manner of speaking.
These attributes decide how
strings will be passed across or marshaled to the other side. A value of ansi
means that the native or unmanaged code will receive or return the string
as an ansi string. This normally is the way C/C++ stores strings. Unicode
specifies that the string is in unicode, the international standard which
everyone follows today.
The safest is autochar, which
chooses whatever is natural for that platform, ansi for Win 95, unicode for Win
2000. The calling conventions specify issues like how the parameters are seen
on the stack and there are a large number of them.
The oldest is cdecl, which is
the standard originally followed by the
C programming language. Windows programming in C brought in the stdcall
calling convention. The this pointer will be explained later and this
introduced thiscall. There are variations of the C calling conventions like
fastcall.
To get out of the mess we
have platformapi that says, like autochar, use whatever is appropriate for the
platform. Once again we have to use
winapi instead of platformapi. These calling conventions are for native code
and not for managed code, where we have no choice and take whatever Microsoft
gives us.
Like always, there are two
attributes specific to Windows, lasterr and nomangle. In the good old days of C
programming we would use lasterr to get at the last error. When Windows first
came about, there was a function
called MessageBox amongst others that gave us what the function name says, a
message box.
All was well until unicode
arrived on the scene. This created a problem as strings had to be passed to this
function. The solution was to have two functions, MessageBoxA for the
ascii or ansi version and MessageBoxW for the unicode version.
I would write my code using
the function name MessageBox and depending upon the platform I run on, the
appropriate A (ascii) or W (widechar)
would be added to the function name. Thus the programmer was insulated from
knowing anything about ansi or unicode. Remember there is no function called
MessageBox in Windows any more.
We know all this because we
have used computers for a very long time. Thus the attribute nomangle indicates
that the name of the function should be used as we have written it, without
adding the ending A or W that normally would happen. We can also call unmanaged
functions, as briefly mentioned above, using function pointers.
The way we call functions using function pointers is the same for managed and
unmanaged functions. The little extra we do is tag the unmanaged function with
the pinvokeimpl attribute.
There is only one table, ImplMap, that stores the information about unmanaged
functions that can be called from managed functions using the PInvoke
dispatch. To sum up again, each row in the ImplMap table tells us the method
row number in the Method table and the name of the method in the dll, whose
name is specified in the ModuleRef table.
Thus each time a call is made to any method, the CLI will first look at this
table and, if the coded index MemberForwarded matches the method number, it
will call the function specified by the field ImportName that resides in the
external module specified by the ImportScope field.
Finally there is the MappingFlags field. In the Microsoft world, the calling
convention attribute can only have the values winapi, cdecl and stdcall. The
values fastcall and thiscall are not allowed. This is for information purposes
only, as most of the time ilasm does not like to follow the specs we are
reading.
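As a rough illustration of what the MappingFlags field holds, the sketch below
turns a flags value into the keywords discussed above. The bit values are the
ones the ECMA-335 specification lists for PInvokeAttributes; the class
MappingFlagsSketch, the method Decode and the order in which the keywords are
written out are our own assumptions and need not match what ildasm prints.

```csharp
using System;
using System.Collections.Generic;

static class MappingFlagsSketch
{
    const int NoMangle = 0x0001;
    const int CharSetMask = 0x0006;
    const int SupportsLastError = 0x0040;
    const int CallConvMask = 0x0700;

    // Hypothetical helper: decodes a MappingFlags value into keywords.
    public static string Decode(int flags)
    {
        List<string> parts = new List<string>();
        if ((flags & NoMangle) != 0)
            parts.Add("nomangle");
        switch (flags & CharSetMask)
        {
            case 0x0002: parts.Add("ansi"); break;
            case 0x0004: parts.Add("unicode"); break;
            case 0x0006: parts.Add("autochar"); break;
        }
        if ((flags & SupportsLastError) != 0)
            parts.Add("lasterr");
        switch (flags & CallConvMask)
        {
            case 0x0100: parts.Add("winapi"); break;
            case 0x0200: parts.Add("cdecl"); break;
            case 0x0300: parts.Add("stdcall"); break;
            case 0x0400: parts.Add("thiscall"); break;
            case 0x0500: parts.Add("fastcall"); break;
        }
        return string.Join(" ", parts.ToArray());
    }

    static void Main()
    {
        Console.WriteLine(Decode(0x0141));  // nomangle lasterr winapi
        Console.WriteLine(Decode(0x0204));  // unicode cdecl
    }
}
```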
Program21.csc
public void DisplayAllMethods (int typerow)
{
methodstring = methodstring + "/*06" + methodindex.ToString("X6") + "*/ " ;
string parammarshalstring = "";
parammarshalstring = GetParamAttrforMethodMarshal(methodindex , 0);
Console.WriteLine(methodstring + parammarshalstring);
}
}
public string GetParamAttrforMethodMarshal (int methodindex , int seq )
{
string returnstring = "";
if (ParamStruct == null)
return returnstring;
int end;
int start = MethodStruct[methodindex].param;
if ( methodindex == (MethodStruct.Length - 1) )
end = ParamStruct.Length + 1;
else
end = MethodStruct[methodindex+1].param;
if ( start == ParamStruct.Length)
return returnstring;
if ( start == end)
return returnstring;
if ( seq == 0 && ParamStruct[start].sequence != 0)
return "";
int pattr = ParamStruct[start].pattr;
returnstring = DecodeParamAttributes(pattr , 1 , start , 0x2000);
if ( returnstring != "" && returnstring[0] == 32)
returnstring = returnstring.Remove(0 , 1);
return returnstring;
}
public string DecodeParamAttributes(int pattr , int tabletype , int start , int bytemask)
{
string returnstring = "";
if ( (pattr & bytemask) == bytemask)
{
int ii ;
for ( ii = 1 ; ii <= FieldMarshalStruct.Length ; ii++)
{
int coded = FieldMarshalStruct[ii].coded;
int table = FieldMarshalStruct[ii].coded & 0x01;
coded = coded >> 1;
if ( coded == start && tabletype == table)
break;
}
int blobindex = FieldMarshalStruct[ii].index;
int length , howmanybytes;
howmanybytes = CorSigUncompressData(blob , blobindex, out length);
//Console.WriteLine("{0} {1} {2} {3}" ,blob[blobindex].ToString("X") , blob[blobindex+1].ToString("X"),blob[blobindex+2].ToString("X"),blob[blobindex+3].ToString("X") );
int blobvalue = blob[blobindex+howmanybytes];
string ss1 = GetMarshallType(blob[blobindex+howmanybytes] , howmanybytes , blobindex);
if ( ss1 == "[]" || ss1.IndexOf("[ + ") != -1 || ss1 == "" || ( ss1.Length >= 2 && ss1[0] == '[' && ss1[ss1.Length-1] == ']' ))
returnstring = " marshal(" + ss1;
else
returnstring = " marshal( " + ss1;
returnstring = returnstring + ")";
}
if ( returnstring != "")
returnstring = returnstring + " ";
return returnstring ;
}
public string GetMarshallType (byte marshalflags , int howmanybytes , int blobindex)
{
//Console.WriteLine("...{0} {1} {2} {3} {4}" ,blob[blobindex] , blob[blobindex+1].ToString("X"), blob[blobindex+2].ToString("X"), blob[blobindex+3].ToString("X") , blob[blobindex+4].ToString("X") );
if ( blob[blobindex] == 0)
return "";
if ( marshalflags == 0x01)
return "void";
if ( marshalflags == 0x02)
return "bool";
if ( marshalflags == 0x03)
return "int8";
if ( marshalflags == 0x04)
return "unsigned int8";
if ( marshalflags == 0x05)
return "int16";
if ( marshalflags == 0x06)
return "unsigned int16";
if ( marshalflags == 0x07)
return "int32";
if ( marshalflags == 0x08)
return "unsigned int32";
if ( marshalflags == 0x09)
return "int64";
if ( marshalflags == 0x0a)
return "unsigned int64";
if ( marshalflags == 0x0b)
return "float32";
if ( marshalflags == 0x0c)
return "float64";
if ( marshalflags == 0x0D)
return "syschar";
if ( marshalflags == 0x0e)
return "variant";
if ( marshalflags == 0x0f)
return "currency";
if ( marshalflags == 0x10)
return "*";
if ( marshalflags == 0x11)
return "decimal";
if ( marshalflags == 0x12)
return "date";
if ( marshalflags == 0x13)
return "bstr";
if ( marshalflags == 0x14)
return "lpstr";
if ( marshalflags == 0x15)
return "lpwstr";
if ( marshalflags == 0x16)
return "lptstr";
if ( marshalflags == 0x17)
{
int uncompressedbyte;
CorSigUncompressData(blob , blobindex+howmanybytes+1 , out uncompressedbyte);
return "fixed sysstring [" + uncompressedbyte.ToString() + "]";
}
if ( marshalflags == 0x18)
return "objectref";
if ( marshalflags == 0x19)
return "iunknown";
if ( marshalflags == 0x1a)
return "idispatch";
if ( marshalflags == 0x1b)
return "struct";
if ( marshalflags == 0x1c)
return "interface";
if ( marshalflags == 0x1d)
{
string returnstring = "safearray";
if ( blob[blobindex] > 1)
{
string dummy = GetSafeArrayType(blob[blobindex+howmanybytes+1]);
if ( dummy != "")
returnstring = returnstring + " " + dummy;
}
int len = blob[blobindex] - 3;
if ( len > 0)
{
returnstring = returnstring + ", \"" ;
for ( int iii = 0 ; iii < len ; iii++)
returnstring = returnstring + (char)blob[blobindex+iii+howmanybytes+3] ;
returnstring = returnstring + "\"" ;
}
return returnstring;
}
if ( marshalflags == 0x1e)
{
int uncompressedbyte;
CorSigUncompressData(blob , blobindex+howmanybytes+1 , out uncompressedbyte);
return "fixed array [" + uncompressedbyte.ToString() + "]";
}
if ( marshalflags == 0x1f)
return "int";
if ( marshalflags == 0x20)
return "unsigned int";
if ( marshalflags == 0x21)
return "nested struct";
if ( marshalflags == 0x22)
return "byvalstr";
if ( marshalflags == 0x23)
return "ansi bstr";
if ( marshalflags == 0x24)
return "tbstr";
if ( marshalflags == 0x25)
return "variant bool";
if ( marshalflags == 0x26)
return "method";
if ( marshalflags == 0x27)
return "";
if ( marshalflags == 0x28)
return "as any";
if ( marshalflags == 0x29)
return "";
if ( marshalflags == 0x2a)
{
/*
for ( int i = 0 ; i <= blob[blobindex] ; i++)
Console.Write("{0} " , blob[blobindex+i].ToString("X"));
Console.WriteLine();
*/
string returnstring = "";
string arrays = "[]";
string dummy1 = "";
if ( blob[blobindex] == 3)
{
dummy1 = " ";
arrays = "[ + " + blob[blobindex+2+howmanybytes].ToString() + "]";
}
if ( blob[blobindex] == 4)
{
dummy1 = "";
if ( blob[blobindex+2+howmanybytes] != 0)
arrays = "[" + blob[blobindex+3+howmanybytes].ToString() + " + " + blob[blobindex+2+howmanybytes].ToString() + "]";
else
arrays = "[" + blob[blobindex+3+howmanybytes].ToString() + "]";
}
if ( blob[blobindex] >= 7)
{
int howmanytypes = blob[blobindex]/3;
returnstring = GetMarshallType(blob[blobindex+howmanybytes+howmanytypes] ,howmanybytes , blobindex);
if ( blob[blobindex+1+howmanybytes+howmanytypes] != 0)
arrays = "[" + blob[blobindex+2+howmanybytes+howmanytypes].ToString() + " + " + blob[blobindex+1+howmanybytes+howmanytypes].ToString() + "]";
else
arrays = "[" + blob[blobindex+2+howmanybytes+howmanytypes].ToString() + "]";
returnstring = returnstring + arrays ;
for ( int i = 1 ; i < howmanytypes ; i++)
{
if ( blob[blobindex+howmanybytes+howmanytypes+i*2+2] == 0)
returnstring = returnstring + " " + GetMarshallType(blob[blobindex+howmanybytes+howmanytypes+i*2+1] ,howmanybytes , blobindex);
else
returnstring = returnstring + " " + GetMarshallType(blob[blobindex+howmanybytes+howmanytypes+i*2+2] ,howmanybytes , blobindex);
}
return returnstring;
}
if ( blob[blobindex+howmanybytes+1] == 0x50)
returnstring = arrays;
else
returnstring = dummy1 + GetMarshallType(blob[blobindex+howmanybytes+1] ,howmanybytes , blobindex) + arrays;
return returnstring;
}
if ( marshalflags == 0x2b)
return "lpstruct";
if ( marshalflags == 0x2c)
{
int len = 0;
int howmanybytes1 = 0;
howmanybytes1 = CorSigUncompressData(blob , blobindex + howmanybytes+3 , out len );
string returnstring = "custom (\"";
for ( int ii1 = 0 ; ii1 < len ; ii1++)
returnstring = returnstring + (char)blob[blobindex+3+ii1+howmanybytes+howmanybytes1];
returnstring = returnstring + "\"" + "," ;
int len1 = len;
int bytes = 1;
if ( len1 >= 128)
bytes = 2;
howmanybytes1 = CorSigUncompressData(blob , blobindex + howmanybytes+3+len1+bytes , out len );
returnstring = returnstring + "\"" ;
for ( int ii1 = 1 ; ii1 <= len ; ii1++)
returnstring = returnstring + (char)blob[blobindex+3+len1+ii1+howmanybytes+howmanybytes1];
returnstring = returnstring + "\")";
return returnstring;
}
if ( marshalflags == 0x2d)
return "error";
return "Unknown";
}
public string GetSafeArrayType (byte safearraytype)
{
string returnstring = "";
if (safearraytype == 0)
returnstring = "";
if (safearraytype == 1)
returnstring = "null";
if (safearraytype == 2)
returnstring = "int16";
if (safearraytype == 3)
returnstring = "int32";
if (safearraytype == 4)
returnstring = "float32";
if (safearraytype == 5)
returnstring = "float64";
if (safearraytype == 6)
returnstring = "currency";
if (safearraytype == 7)
returnstring = "date";
if (safearraytype == 8)
returnstring = "bstr";
if (safearraytype == 9)
returnstring = "idispatch";
if (safearraytype == 0x0a)
returnstring = "error";
if (safearraytype == 0x0b)
returnstring = "bool";
if (safearraytype == 0x0c)
returnstring = "variant";
if (safearraytype == 0x0d)
returnstring = "iunknown";
if (safearraytype == 0x0e)
returnstring = "decimal";
if (safearraytype == 0x0f)
returnstring = "illegal";
if (safearraytype == 0x10)
returnstring = "int8";
if (safearraytype == 0x11)
returnstring = "unsigned int8";
if (safearraytype == 0x12)
returnstring = "unsigned int16";
if (safearraytype == 0x13)
returnstring = "unsigned int32";
if (safearraytype == 0x14)
returnstring = "int64";
if (safearraytype == 0x15)
returnstring = "unsigned int64";
if (safearraytype == 0x16)
returnstring = "int";
if (safearraytype == 0x17)
returnstring = "unsigned int";
if (safearraytype == 0x18)
returnstring = "void";
if (safearraytype == 0x19)
returnstring = "hresult";
if (safearraytype == 0x1a)
returnstring = "*";
if (safearraytype == 0x1b)
returnstring = "safearray";
if (safearraytype == 0x1c)
returnstring = "carray";
if (safearraytype == 0x1d)
returnstring = "userdefined";
if (safearraytype == 0x1e)
returnstring = "lpstr";
if (safearraytype == 0x1f)
returnstring = "lpwstr";
if (safearraytype == 0x20)
returnstring = "illegal";
if (safearraytype == 0x21)
returnstring = "illegal";
if (safearraytype == 0x22)
returnstring = "illegal";
if (safearraytype == 0x23)
returnstring = "illegal";
if (safearraytype == 0x24)
returnstring = "record";
if (safearraytype >= 0x25)
returnstring = "illegal";
return returnstring;
}
e.il
.class a1
{
.method void marshal () a1()
{
}
.method void marshal ( int8) a2()
{
}
.method void marshal ( fixed sysstring [12]) a3()
{
}
.method void marshal ( safearray) a4()
{
}
.method void marshal ( safearray int8) a5()
{
}
.method void marshal ( safearray int8 , "hi") a6()
{
}
.method void marshal ( int16 [+4] ) a7()
{
}
.method void marshal ( int16 [] ) a8()
{
}
.method void marshal ( int8 [0] ) a9()
{
}
.method void marshal ([7] ) a10()
{
}
.method void marshal (custom ("AB" , "CDEF") ) a11()
{
}
.method void marshal (custom ("AB" , "") ) a11()
{
}
.method void marshal (int8 [4 + 5] ) a12()
{
}
}
Output
.method /*06000001*/ marshal()
.method /*06000002*/ marshal( int8)
.method /*06000003*/ marshal( fixed sysstring [12])
.method /*06000004*/ marshal( safearray)
.method /*06000005*/ marshal( safearray int8)
.method /*06000006*/ marshal( safearray int8, "hi")
.method /*06000007*/ marshal( int16[ + 4])
.method /*06000008*/ marshal( int16[])
.method /*06000009*/ marshal( int8[0])
.method /*0600000A*/ marshal([7])
.method /*0600000B*/ marshal( custom ("AB","CDEF"))
.method /*0600000C*/ marshal( custom ("AB",""))
.method /*0600000D*/ marshal( int8[4 + 5])
e.il
.class a1
{
.method void marshal (int16 [4+5] [2] ) a122()
{
}
.method void marshal (int32 [4] [2][3] ) a123()
{
}
.method void marshal (unsigned int8 [4+5] [2][3][4] ) a124()
{
}
.method void marshal (unsigned int16 [+5] [2][3][4][5] ) a125()
{
}
}
Output
.method /*06000001*/ marshal( int16[4 + 5] bool)
.method /*06000002*/ marshal( int32[4] bool int8)
.method /*06000003*/ marshal( unsigned int8[4 + 5] bool int8 unsigned int8)
.method /*06000004*/ marshal( unsigned int16[0 + 5] bool int8 unsigned int8 int16)
In this program we display the marshal keyword. As explained before, we have
broken up the various parts that make up a method declaration and are handling
each one separately. We are not displaying the entire function
DisplayAllMethods as most of the code is repetitive.
All that we would like you to do is remove the last two lines at the end,
after the methodstring variable, and replace them with what we have above. We
are calling a function GetParamAttrforMethodMarshal that will return the
marshal attribute. But wait, we are moving ahead; first let's be clear on what
this marshal is all about.
It would be ideal if the CLI ran on its own and did not need to be hosted on
top of another operating system. We are running .Net or the CLI under an
operating system, Windows 2000. Under such operating systems some data types
have a certain specific meaning or perform certain functions.
Thus we need a way to convert the built-in or our user-defined data types to
the native data types of that operating system. This marshalling information
is specified using the keyword marshal. Every function may have a return type,
which is the value returned by the native code in that operating system, and
we can use the marshal keyword to convert it into a CLI data type.
Thus the marshal keyword tells us the original data type in the operating
system, and it is used to convert that value into the CLI data type that we
have specified. If we did not have the marshal keyword, how would our CLI know
what the return value is? It would assume that the return value of the native
code is equivalent to our function's return type.
The marshal keyword comes in to tell the CLI that the native function will
return a certain data type and that we need to convert this data type to the
one that our IL function requires. The same holds good for parameters to
functions, in a slightly different way. Here we need to specify what the CLI
data type needs to be converted to, as the marshal keyword specifies the data
type that the native function expects.
This is just the reverse of what we explained earlier. It means that in our
function DisplayAllMethods, the GetParamAttrforMethodMarshal function gets
called more than once, hence the different parameters, and the complexity of
our code increases. The first parameter is the method row number, as this is
what identifies each method uniquely in the metadata.
The second parameter is what we call the sequence number. This value is either
0 or 1. At this point in time its value is 0 and we will see its use a little
later. A little while ago we explained how a type can have more than one
method and how a single field can tell us which method is owned by which type.
This is a one-to-many relationship, which also applies to methods and
parameters as one method can have many parameters. The param field stores, for
each method, the starting parameter index in the Param table. When you have a
good idea it is nice to use it everywhere. Thus we get two variables, start
and end, that tell us the first Param row owned by this method and the row at
which the parameters of the next method begin.
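The start and end calculation can be tried out on its own with a small,
made-up table. In the sketch below the array paramList stands in for the param
field of each method row; the row numbers are 1-based with index 0 unused,
which is an assumption about the layout on our part, and the names
ParamRangeSketch and GetParamRange are hypothetical.

```csharp
using System;

static class ParamRangeSketch
{
    // paramList[m] is the first Param row owned by method row m (1-based).
    // paramRows is the number of rows in the Param table.
    // On return the method owns rows start .. end-1; start == end means none.
    public static void GetParamRange(int[] paramList, int paramRows, int method,
                                     out int start, out int end)
    {
        start = paramList[method];
        if (method == paramList.Length - 1)
            end = paramRows + 1;          // the last method owns all remaining rows
        else
            end = paramList[method + 1];  // the next method's first row ends our run
    }

    static void Main()
    {
        // Three methods: the first owns rows 1 and 2, the second none, the third row 3.
        int[] paramList = { 0, 1, 3, 3 };
        for (int m = 1; m <= 3; m++)
        {
            int start, end;
            GetParamRange(paramList, 3, m, out start, out end);
            Console.WriteLine("method {0}: start {1} end {2} count {3}",
                              m, start, end, end - start);
        }
    }
}
```

The second method shows the case where start and end are equal, which is the
check the real code uses to return early.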
We use these values for error checks only. If the value of start points to the
end of the Param table, it means that some methods do have parameters, as the
ParamStruct table is not empty, but this method and the ones following it have
no parameters at all and also no marshal keyword. The second error check is
when both the start and end variables have the same value.
This means that the method has no parameters at all. The question uppermost in
your mind is what parameters have to do with the marshal keyword. At this
point we are talking not about the marshalling of parameters but about that of
the return value. It does not matter. If the return value has a marshal
keyword, a row gets added to the Param table.
But this method does not have a parameter at all, and thus the sequence field
of the Param table, which otherwise tells us the parameter number, is now
zero. We have one last error check, which catches those cases where we have
parameters but no marshal keyword for the return type.
This check says that if seq is 0, we are checking for the return type being
marshaled and thus the sequence field must also be 0. We now need to decode
the marshal keyword and we use the method DecodeParamAttributes to do the job
for us. As both methods and fields can be marshaled, this function gets called
more than once with different parameters.
It is this function that actually gives us the marshal keyword, and we first
check whether the first character of the return string is a space using the
read-only indexer. If it is, we remove this space. We do this because at times
we do not need the first space that we write out before the keyword marshal.
If you look closely at the function DecodeParamAttributes, the marshal keyword
starts with a space, and for methods we remove this space. The return string
may be empty and hence, before accessing its first character, we need to make
sure that it is not empty. Let us now move on to the function
DecodeParamAttributes that does the bulk of the work in figuring out the
marshal keyword.
The first and last parameters are used together. The Param table has a flags
field that tells us all about the attributes on a parameter. The return value
is also taken to be a parameter, with a sequence number of 0. A bit mask of
0x2000 means that this parameter has a marshal attribute.
Therefore the last parameter also has a value of 0x2000. If a field has a
marshal attribute associated with it, the bit mask is different. Thus, for any
parameter that has a marshal attribute, the pattr field of the Param table
will have the 0x2000 bit set.
This is why we start the function by checking whether the pattr parameter has
the 0x2000 bit set, that is, whether there is a marshal attribute associated
with the parameter or, in this case, the return value. If the answer is in the
affirmative, we scan the FieldMarshal table, which has one row for every
marshal keyword used.
This table uses a coded index in which a single bit says whether the row
points to the Field or the Param table. Looking at our code, a value of 1
means the Param table and 0 means the Field table. The second parameter to
this function is this table type, and the third is the row number in the Param
table of this parameter or return value.
Thus to find a matching row in the FieldMarshal table we need to check the
param or field row number as well as the table type at the same time. If we
find a match, we use the loop variable ii to get at the index field, which is
nothing but the offset of the marshal signature in the blob stream.
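The splitting of the coded index into a table bit and a row number is shown
below in isolation. It mirrors the two lines inside the loop of
DecodeParamAttributes; the class FieldMarshalParentSketch and the method Split
are names invented for this sketch.

```csharp
using System;

static class FieldMarshalParentSketch
{
    // The lowest bit selects the table (0 = Field, 1 = Param);
    // the remaining bits are the row number in that table.
    public static void Split(int coded, out int table, out int row)
    {
        table = coded & 0x01;
        row = coded >> 1;
    }

    static void Main()
    {
        int table, row;
        Split(0x07, out table, out row);   // binary 111
        Console.WriteLine("{0} row {1}", table == 1 ? "Param" : "Field", row);  // Param row 3
        Split(0x04, out table, out row);   // binary 100
        Console.WriteLine("{0} row {1}", table == 1 ? "Param" : "Field", row);  // Field row 2
    }
}
```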
The variable blobindex now holds an offset into the blob stream, where every
entry as always starts with the length in bytes of the data that follows. This
is because a blob is something that has no defined or known structure. We
first need to know two things: the number of bytes that the length itself
takes up, and its value.
We use our good old trusty method CorSigUncompressData which, in case you
forgot, returns the number of bytes that make up the compressed length and
places the actual length in the last out parameter. Normally the marshal blob
signature will not be larger than 127 bytes, so the length fits in one byte,
and in the following code at times we have directly read the length byte and
not used the CorSigUncompressData method.
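The body of CorSigUncompressData is not repeated in this section, so here is a
stand-in that follows the compressed unsigned integer encoding of the ECMA-335
specification. Treat it as a sketch of how such a helper could look, not as
the listing used by our program; the names CompressedLengthSketch and
Uncompress are hypothetical.

```csharp
using System;

static class CompressedLengthSketch
{
    // Decodes the compressed unsigned integer starting at data[index].
    // Returns how many bytes the encoding occupies; value receives the number.
    public static int Uncompress(byte[] data, int index, out int value)
    {
        byte first = data[index];
        if ((first & 0x80) == 0)        // 0xxxxxxx : one byte, values 0 to 127
        {
            value = first;
            return 1;
        }
        if ((first & 0xC0) == 0x80)     // 10xxxxxx : two bytes
        {
            value = ((first & 0x3F) << 8) | data[index + 1];
            return 2;
        }
        // otherwise assumed to be the four byte form 110xxxxx
        value = ((first & 0x1F) << 24) | (data[index + 1] << 16)
              | (data[index + 2] << 8) | data[index + 3];
        return 4;
    }

    static void Main()
    {
        int value;
        int used = Uncompress(new byte[] { 0x04, 0x2A, 0x03, 0x05, 0x04 }, 0, out value);
        Console.WriteLine("{0} byte(s), value {1}", used, value);   // 1 byte(s), value 4
        used = Uncompress(new byte[] { 0x80, 0x80 }, 0, out value);
        Console.WriteLine("{0} byte(s), value {1}", used, value);   // 2 byte(s), value 128
    }
}
```

This is why reading the first byte directly works only as long as the length
stays below 128.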
We can, like you, get sloppy at times. That is why we ask you to also make
changes in the code and treat it like a joint venture between you and us. We
then have a comment that displays the first four bytes of the marshal
signature. Where we use this will become clear in a short while.
Stay with us, as you will then appreciate how we get things working. We call a
function GetMarshallType that takes three parameters. It is actually this
function that decodes the blob signature. The marshal signature starts with
the length byte and then a byte that describes the actual data type which will
be marshaled to at the other end.
It is this byte that we pass to the GetMarshallType method, followed by the
number of bytes taken up by the length and the starting blob index. The string
it returns is placed after the word marshal, which starts with a space that we
may remove in the earlier function, if you remember.
If the marshal signature is the empty array brackets [], or contains an array
and a plus sign, the powers that be decided that the space after the open
bracket that follows the word marshal should not be there. Thus the if
statement removes a pesky space. We have spent a lot of if statements removing
a space or adding a space. What a life.
We then close the bracket and, if there is a valid marshal signature, we add a
space at the end of it. See, one more if statement. Now let's move on to the
GetMarshallType function. The first thing we do in this function is check the
length of the blob signature.
If it is zero, then we have no signature and bail out. This means that we have
the marshal keyword but no data type within brackets, as in method a1 in the
il file. This, for some reason, is a valid marshal keyword. Normally we use
the marshal keyword as in function a2, where we specify a data type like int8.
Thus we now have a series of if statements that check the value of the first
parameter, marshalflags, against a predefined set of values. A value of 0x17
stands for the data type fixed sysstring, which has to be given a number in
square brackets, as in method a3. While most of the data types are simple, a
value of 0x1d is a little more complex as it stands for safearray.
The safearray data type may or may not be followed by a data type, as in
methods a4 and a5. We first call a method GetSafeArrayType, to which we pass
the byte following the length and the marshal type, i.e. the third byte of the
blob signature. This function once again checks the byte against a predefined
set of values and returns the safe array data type.
There is nothing in this function that will surprise us; it is a long series
of if statements. We only add an extra space if the GetSafeArrayType method
returns a data type. The if statement is important, as we have a safearray
data type only if the marshal signature, counting its length byte, is larger
than 2 bytes; the code tests for a length byte larger than 1. We then tell
ourselves that the safearray data type can have a comma followed by a string
in double inverted commas.
This is unique to the safearray data type and will occur only if the length is
larger than 3. The 3 bytes are taken by the length, the safearray data type
and the data type following the safe array. We thus reduce the length byte by
3 and, if the result is larger than 0, we print out the string in double
quotes after a comma. The value 0x1e is for a fixed array, which is similar to
the fixed sysstring.
A value of 0x2a is, in our opinion, the most complex as it deals with an
array. An array starts with a data type and then may have a number in
brackets. The data type, however, is optional. Thus an array signature with a
length of three would mean that the first byte is the length, which we do not
count, the second the array data type value 0x2a, the third the data type
before the array and the last the number in brackets with the plus sign.
This is represented by the method a7.
However, for the method a8 the length is 2: the length byte is followed by the
byte 0x2a and then the array data type. This is because the brackets have no
value. For function a9 we have a length of four, as the first two bytes after
the length are the same as above and they are followed by two zeroes. The
method a10 is special as we have no data type.
Here the first two bytes are the same, followed by a 0x50 which means no data
type, then a zero and then the number in the brackets. In our code the
variable dummy1 simply contains a space or an empty string depending upon the
length of the blob signature. A value of three means a space, 4 no space. The
string arrays holds the number in array brackets, with or without the plus
sign and the space before it.
The default is the empty array brackets. This number is either the second or
third byte after the 0x2a, depending upon the length of the signature. The
first byte after the marshal data type is always the data type of the array,
and thus we use the GetMarshallType function to return this value to us.
Once again this is a use of recursion, as we are in a function and calling the
same function. Let's take function a12 to demonstrate all the bytes of a blob
signature for an array. We have added two numbers, 4 and 5, together. The blob
signature reads as 4 2A 3 5 4. The first byte as always is the length of the
blob signature, not counting the length byte itself.
This is followed by the array type 0x2a. Then comes the data type of the
array, int8 or 3. This is followed by the two numbers in brackets, 5 and 4.
Let's now turn our attention to the second il file and understand the
complexities here and why we have broken our array samples into two separate
il files.
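The byte layouts described so far for methods a7 to a12 can be checked with a
cut-down decoder that handles only signature lengths of 2, 3 and 4. It is a
sketch and not the listing from our program: it assumes the length fits in one
byte, knows just the element types used in the sample, and the names
SimpleArraySketch, ElementName and Decode are hypothetical.

```csharp
using System;

static class SimpleArraySketch
{
    // Only the element types needed by the sample; 0x50 means no element type.
    static string ElementName(byte b)
    {
        switch (b)
        {
            case 0x03: return "int8";
            case 0x05: return "int16";
            case 0x07: return "int32";
            case 0x50: return "";
            default: return "?";
        }
    }

    // sig[0] is the length byte, sig[1] is 0x2A and sig[2] the element type.
    public static string Decode(byte[] sig)
    {
        string brackets = "[]";
        if (sig[0] == 3)
            brackets = "[ + " + sig[3] + "]";
        else if (sig[0] == 4)
            brackets = sig[3] != 0
                ? "[" + sig[4] + " + " + sig[3] + "]"
                : "[" + sig[4] + "]";
        return ElementName(sig[2]) + brackets;
    }

    static void Main()
    {
        Console.WriteLine(Decode(new byte[] { 3, 0x2A, 5, 4 }));        // int16[ + 4]
        Console.WriteLine(Decode(new byte[] { 2, 0x2A, 5 }));           // int16[]
        Console.WriteLine(Decode(new byte[] { 4, 0x2A, 3, 0, 0 }));     // int8[0]
        Console.WriteLine(Decode(new byte[] { 4, 0x2A, 0x50, 0, 7 }));  // [7]
        Console.WriteLine(Decode(new byte[] { 4, 0x2A, 3, 5, 4 }));     // int8[4 + 5]
    }
}
```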
We start our array if statement with a for loop that simply displays the array
signature, using the first byte as the length. In our program we have
commented it out, but this is the way we have looked at the blob signature.
For the first function, a122, the marshal keyword is marshal (int16 [4+5] [2]
).
We have added a 2 in square brackets after the first array gets over. This
results in the blob signature looking like 7 2A 2A 5 5 4 0 2. The length,
which is normally 4, gets increased by 3. The three bytes are made up of an
extra 0x2a as well as a 0 and the number 2 which we wrote.
If you now look at function a123, which has the marshal keyword marshal (int32
[4] [2][3] ), we have added two array dimensions to it. The bytes it creates
are A 2A 2A 2A 7 0 4 0 2 0 3. There is a further increase of 3 bytes, with a
0x2a in the beginning and a 0 and the number we wrote, 3, at the end.
Thus each extra dimension adds a 0x2a at the start as well as two bytes
representing the dimension at the very end. We now need to handle this special
case. As mentioned earlier, the length is normally 3 or 4. If it is 7 or more,
we kick off a new if statement. We start by figuring out how many dimensions
we have.
This number can be obtained by dividing the length by 3, as each new dimension
increases the length by 3. We store this value in the variable howmanytypes.
The main data type of the array is stored after all the 0x2a's get over, and
thus we now need the offset at which the array data type starts.
We start at blobindex, then add the number of bytes the length takes and then
the number of 0x2a's that we have. We do not add a further 1, as there is
always one 0x2a and it is already part of this count. We use the
GetMarshallType method to return this data type as a string and then use the
same technique as earlier to get at the first array dimension, with or without
the plus sign.
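The divide-by-three rule and the offset of the element type can be verified
against the two byte sequences quoted above for a122 and a123. The sketch
assumes a one-byte length, and the names ArrayDimensionSketch and
CountDimensions are made up for the illustration.

```csharp
using System;

static class ArrayDimensionSketch
{
    // sig[0] is the length byte. Returns the number of 0x2A markers and
    // sets elementIndex to the position of the element type byte.
    public static int CountDimensions(byte[] sig, out int elementIndex)
    {
        int dims = sig[0] / 3;     // 4 -> 1, 7 -> 2, 10 -> 3
        elementIndex = 1 + dims;   // skip the length byte and every 0x2A
        return dims;
    }

    static void Main()
    {
        byte[] a122 = { 7, 0x2A, 0x2A, 5, 5, 4, 0, 2 };
        byte[] a123 = { 0x0A, 0x2A, 0x2A, 0x2A, 7, 0, 4, 0, 2, 0, 3 };
        int index;
        int dims = CountDimensions(a122, out index);
        Console.WriteLine("{0} dimensions, element type {1}", dims, a122[index]);  // 2 dimensions, element type 5
        dims = CountDimensions(a123, out index);
        Console.WriteLine("{0} dimensions, element type {1}", dims, a123[index]);  // 3 dimensions, element type 7
    }
}
```

The element type values 5 and 7 are int16 and int32, which agrees with the two
marshal keywords we wrote in the il file.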
We then need to add the final data types. The variable howmanytypes gives us
one more than the number of these trailing data types, and hence the for loop
runs one less. We use the variable howmanytypes to get at the end of the
normal signature and then we hit a roadblock.
The length of the array signature can be 3 or 4 and hence, if we hit a 0, we
know that the length is 3 and we step ahead by 1 and not 2. We display these
dimensions as data types. We now move back to the original file e.il.
If the first data type is 0x50, this means that we do not have a data type and
hence the returnstring is only the square brackets. The last value we discuss
is the custom data type, which is nothing but the word custom followed by two
strings. The format is the length of the signature, followed by 0x2c and then
two zeros.
This is followed by the length of the first string, then the actual string.
When this string gets over, we have the length of the second string and its
contents. Thus we first figure out the length of the first string, which is 3
bytes from the data type, and in a for loop print out the string.
We then jump to the length of the second string, knowing that the length of
the first is stored in the variable len1; we need to go one more and hence we
add the value stored in the variable bytes. This can be 1 or 2 depending upon
the length of the first string.
If that length is 128 or more, then its length prefix takes 2 bytes and not
one, and hence if we assumed a value of 1, we would be reading the last byte
of the first string as the length byte. The rest of the code remains the same.
There is a minimum set of data types that have to be supported by the CLI, and
these are int8, int16, unsigned int8, bool, char and all the native integer
data types. Remember, under Windows all code is written in the C programming
language.
The list of data types goes on, and it includes enums, which are glorified
constants, and the floating point data types float32 and float64. Even though
C and C++ do not support a string data type, it is common enough to be
included in the list of mandatory data types.
Obviously pointers to the above data types are also included, along with
one-dimensional arrays that start counting from zero. These conversions are
from managed to unmanaged and need not be supported from unmanaged to managed,
i.e. for the return types. Delegates and pointers to functions are not the
same, and thus a delegate cannot be used in unmanaged code.
The marshal keyword is the only interoperability keyword available and lets us
work closely with older legacy code. It is platform specific and will not work
across the board. This means that the Windows implementation will never ever
work on, say, Linux. Once again, the marshal keyword specifies the data type
that the managed data will be converted to when it goes to unmanaged code.
The system, however, has a lot of default rules that govern what happens when
we do not use this keyword. The problem arises with the use of user-defined
types or classes, and the CLI does not require this marshalling from all its
conforming implementations.
Each implementation decides how to marshal user-defined types and the system
imposes no restrictions. This guarantees that the code generated will not be
portable, but it is a price to pay as user-defined types are too generic for
anyone to impose rules. The FieldMarshal table has only two columns, both of
which we have used earlier.
It is obvious that this table is only used by code that calls into unmanaged
code. Once the code calls unmanaged code, we are outside the regime of the CLI
and thus we are assuming that the code we call does not break any rules.
The question uppermost in your mind is how we figured out all the data types
that we could use with marshal. Elementary, you might say, peek into the
specs. We did just that and realized that there were huge gaps in the data
types specified in the docs. Thus we were in a quandary.
How do we figure out all of them? Even though we went through 5000 files, we
were still not sure whether we had it all sewn up. Thus we first displayed the
marshal signature bytes and then searched for them in a hex editor like
UltraEdit, which can be downloaded free from the net.
These bytes will always be after the BSJB signature. Now that we have found
the bytes, we change the byte that contains, say, the data type and then save
the file. We next call the ildasm program, which now tells us what data type
that value stands for. Simple, is it not? This is how we could figure out what
the specs did not contain.
We are telling you all this as this is the best way to learn. Change the bytes
in the table itself and see what the disassembler has to say about the change.
This is why you will find us displaying the bytes everywhere.
Program22.csc
public void DisplayAllMethods (int typerow)
{
methodstring = methodstring + "/*06" + methodindex.ToString("X6") + "*/ " ;
string paramattrstring = "";
paramattrstring = GetParamAttrforMethodCalling (methodindex);
Console.WriteLine(methodstring + paramattrstring);
}
}
public string GetParamAttrforMethodCalling (int methodindex)
{
string returnstring = "";
if (ParamStruct == null)
return returnstring;
int end;
int start = MethodStruct[methodindex].param;
if ( methodindex == (MethodStruct.Length -1) )
end = ParamStruct.Length + 1;
else
end = MethodStruct[methodindex+1].param;
if ( start == ParamStruct.Length)
return returnstring;
if ( start == end)
return returnstring;
if (ParamStruct[start].sequence != 0)
return "";
int pattr = ParamStruct[start].pattr;
if ( (pattr & 0x01) == 0x01)
returnstring = returnstring + "[in]" ;
if ( (pattr & 0x02) == 0x02)
returnstring = returnstring + "[out]" ;
if ( (pattr & 0x10) == 0x10)
returnstring = returnstring + "[opt]" ;
if ( returnstring != "")
returnstring = returnstring + " ";
return returnstring ;
}
e.il
.class a1
{
.method [in] void a1()
{
}
.method [out][opt] void a2()
{
}
}
Output
.method /*06000001*/ [in]
.method /*06000002*/ [out][opt]
Parameters and return values
can carry a parameter attribute of in, out or opt. These are part of the
parameter definition and not part of the method signature. The above attributes
are associated with parameters and not really with return values. The in and
out attributes apply to pointers of either managed or unmanaged types.
All that they say is whether
the parameter supplies a value to the function (in), the function fills it up
with a value (out), or both. The default is in. However, the CLI does not check
whether this contract is actually honoured.
This helps the CLI with
optimizations for distributed computing. For an in parameter, the
value needs to be sent across to another computer if the called function resides
there, and we do not worry about what comes back. For an out parameter it is the
reverse: we do not send a value across, but the value that comes back is meaningful.
The opt value means that
from the programmer's point of view, the
value is optional. A little later we will deal with the .param keyword that
supplies opt parameters with values.
The method DisplayAllMethods now calls a method
GetParamAttrforMethodCalling to figure out these values.
In this method we start with
the mandatory error checks and then check whether we have an attribute or not.
If the sequence field is not zero we return an empty string, as this means our
return value does not have any param attributes.
We then bitwise AND with 1,
2 or 0x10 to check which of the three param attributes we have. The only thing to
remember is that these attributes are not mutually exclusive. A small program
after a long time. Do not expect such small mercies for a long time.
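Since the three attributes are independent bits, they can also be modelled as a flags enum. The sketch below is only an alternative way of writing the same three tests; the names ParamAttr and ParamAttrDemo are hypothetical, and the bit values 0x01, 0x02 and 0x10 are the ones used in the program above.

```csharp
using System;

[Flags]
public enum ParamAttr
{
    None = 0x00,
    In = 0x01,
    Out = 0x02,
    Opt = 0x10
}

public static class ParamAttrDemo
{
    // Builds the bracketed attribute text for one pattr value.
    public static string Describe(int pattr)
    {
        ParamAttr flags = (ParamAttr)pattr;
        string text = "";
        if ((flags & ParamAttr.In) != 0)
            text += "[in]";
        if ((flags & ParamAttr.Out) != 0)
            text += "[out]";
        if ((flags & ParamAttr.Opt) != 0)
            text += "[opt]";
        return text;
    }
}
```

A value of 0x12 has both the out and opt bits set, so Describe(0x12) gives [out][opt], which is what method a2 produces.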
Program23.csc
string [] methoddefreturnarray;
string [] methoddeftypearray;
int [] methoddefparamcount;
public void abc(string [] args)
{
    ReadPEStructures(args);
    DisplayPEStructures();
    ReadandDisplayImportAdressTable();
    ReadandDisplayCLRHeader();
    ReadStreamsData();
    FillTableSizes();
    ReadTablesIntoStructures();
    DisplayTablesForDebugging();
    ReadandDisplayVTableFixup();
    ReadandDisplayExportAddressTableJumps();
    DisplayModuleRefs();
    DisplayAssembleyRefs();
    CreateSignatures();
    DisplayAssembley();
    DisplayFileTable();
    DisplayClassExtern();
    DisplayResources();
    DisplayModuleAndMore();
    DispalyVtFixup();
    DisplayTypeDefs();
    DisplayTypeDefsAndMethods();
}
public void CreateSignatures ()
{
    if (MethodStruct != null)
    {
        methoddefreturnarray = new string[MethodStruct.Length];
        methoddeftypearray = new string[MethodStruct.Length];
        methoddefparamcount = new int[MethodStruct.Length];
        // l starts at 1, so entry 0 of each array is left unset.
        for ( int l = 1 ; l < MethodStruct.Length ; l++)
        {
            CreateSignatureForEachType (1 , MethodStruct[l].signature, l);
        }
    }
}
public void CreateSignatureForEachType (byte type , int index , int row)
{
    //Console.WriteLine(".......type={0} row={1} index={2} blob.Length={3} {4}" , type , row.ToString("X") , (ushort)index , blob.Length , (uint)index);
    int uncompressedbyte , count , howmanybytes;
    // The blob entry starts with its own length.
    howmanybytes = CorSigUncompressData (blob , index , out uncompressedbyte);
    count = uncompressedbyte;
    byte [] blob1 = new byte[count];
    // Copy only the signature bytes, skipping the length prefix.
    Array.Copy(blob , index + howmanybytes , blob1 , 0 , count);
    if ( type == 1)
        CreateMethodDefSignature (blob1 , row);
}
public void DisplayAllMethods (int typerow)
{
    methodstring = methodstring + "/*06" + methodindex.ToString("X6") + "*/ " ;
    string s = methodstring + " " + methoddefreturnarray[methodindex]+ " " + methoddeftypearray[methodindex] ;
    Console.WriteLine(s);
}
}
public void CreateMethodDefSignature (byte [] blobarray , int row)
{
    //Console.WriteLine("CreateMethodDefSignature Array Length={0} method row={1} name={2}" , blobarray.Length , row , GetString(MethodStruct[row].name));
    // Debugging aid: set aa to a row number to dump that signature.
    int aa = -1;
    if ( row == aa)
    {
        Console.WriteLine(GetString(MethodStruct[row].name));
        for ( int l = 0 ; l < blobarray.Length ; l++)
            Console.Write("{0} " , blobarray[l].ToString("X"));
        Console.WriteLine();
        Console.WriteLine("Length of array is {0}" , blobarray.Length);
    }
    int howmanybytes,uncompressedbyte , count , index;
    index = 0;
    // First field: hasthis, explicitthis and the calling convention.
    howmanybytes = CorSigUncompressData (blobarray , index , out uncompressedbyte);
    methoddeftypearray [row] = DecodeFirstByteofMethodSignature (uncompressedbyte , row);
    index = index + howmanybytes;
    // Second field: the number of parameters.
    howmanybytes = CorSigUncompressData (blobarray , index , out uncompressedbyte);
    count = uncompressedbyte;
    methoddefparamcount[row] = count;
    index = index + howmanybytes;
    // Third field: the return type.
    string returntypestring = "";
    returntypestring = GetElementType(index , blobarray , out howmanybytes );
    methoddefreturnarray [row] = returntypestring;
}
public string GetElementType ( int index , byte [] blobarray , out int howmanybytes)
{
    howmanybytes = 0;
    string returnstring = "";
    byte type = blobarray[index];
    if ( type >= 0x01 && type <= 0x0e )
    {
        returnstring = GetType(type);
        howmanybytes = 1;
    }
    return returnstring;
}
public string DecodeFirstByteofMethodSignature (int firstbyte , int methodrow)
{
    string returnstring = "";
    if ( (firstbyte& 0x20 ) == 0x20 )
        returnstring = "instance ";
    // Overwrites "instance " when the explicitthis bit is set as well.
    if ( (firstbyte & 0x40 ) == 0x40 )
        returnstring = "explicit instance ";
    // The low four bits hold the calling convention.
    int firstbits = firstbyte & 0xf;
    if ( firstbits == 0x02 )
        returnstring = returnstring + "unmanaged stdcall ";
    else if ( firstbits == 0x03 )
        returnstring = returnstring + "unmanaged thiscall ";
    else if ( firstbits == 0x05 )
        returnstring = returnstring + "vararg ";
    else if ( firstbits == 0x01 )
        returnstring = returnstring + "unmanaged cdecl ";
    else if ( firstbits == 0x04 )
        returnstring = returnstring + "unmanaged fastcall ";
    return returnstring;
}
public string GetType (int typebyte)
{
    if ( typebyte == 0x01)
        return "void";
    if ( typebyte == 0x02)
        return "bool";
    if ( typebyte == 0x03)
        return "char";
    if ( typebyte == 0x04)
        return "int8";
    if ( typebyte == 0x05)
        return "unsigned int8";
    if ( typebyte == 0x06)
        return "int16";
    if ( typebyte == 0x07)
        return "unsigned int16";
    if ( typebyte == 0x08)
        return "int32";
    if ( typebyte == 0x09)
        return "unsigned int32";
    if ( typebyte == 0x0a)
        return "int64";
    if ( typebyte == 0x0b)
        return "unsigned int64";
    if ( typebyte == 0x0c)
        return "float32";
    if ( typebyte == 0x0d)
        return "float64";
    if ( typebyte == 0x0e)
        return "string";
    return "unknown";
}
e.il
.class a1
{
.method explicit instance bool a1()
{
}
.method instance int16 a2()
{
}
.method vararg void a3()
{
}
.method default int8 a4()
{
}
.method unmanaged stdcall int8 a5()
{
}
.method unmanaged thiscall int8 a6()
{
}
.method unmanaged cdecl int8 a7()
{
}
.method unmanaged fastcall int8 a8()
{
}
}
Output
.method /*06000001*/ bool explicit instance
.method /*06000002*/ int16 instance
.method /*06000003*/ void instance vararg
.method /*06000004*/ int8 instance
.method /*06000005*/ int8 instance unmanaged stdcall
.method /*06000006*/ int8 instance unmanaged thiscall
.method /*06000007*/ int8 instance unmanaged cdecl
.method /*06000008*/ int8 instance unmanaged fastcall
In this program we display
some more details about a method, such as its calling convention and the data
type of the return value. The problem with the data type is that handling it fills up
hundreds of pages, and hence we have broken up the data types across dozens of programs.
Thus the next couple of
programs focus only on the data types that a return value can carry. We have
three instance arrays: methoddeftypearray carries the calling convention,
methoddefreturnarray the data type of the return value, and methoddefparamcount
the number of parameters that the method has.
These variables are arrays
as we will have scores of functions in our il code. If you look at the new abc
method, we have added a method CreateSignatures that simply creates all the
method signatures in one go and populates our arrays. Thus in our code later on
we simply display the relevant array members.
In the CreateSignatures
method we first make sure that we have at least one method in our code, as there
is no point in computing signatures if we have no methods to deal with. It is here
that we first create the three arrays of the desired sizes, using the length of
the method table as the size of each array.
We then use a for loop to call
another method CreateSignatureForEachType that does the actual work. We have
broken up our code into different functions as we need to calculate different
types of signatures. These include those for local variables, fields etc.
Thus the first parameter is
1, as we have used this number to denote a method def signature. A method def is
a method that we have defined; a method ref is a method we are calling that is
defined somewhere else. The second parameter is the signature field, which is an
offset into the blob heap where the signature of the method is located.
The third is the row number
of each method. We now move on to the function CreateSignatureForEachType where we
do some work. All signatures have a common rule to start with. The first
byte is the length of the signature, as we are dealing with the blob stream.
We use our function
CorSigUncompressData to get at the length, as the signature may cross 127 bytes,
in which case the length itself takes more than one byte.
Now that we know the length of the signature, we create an array blob1 that is
of the same size. We copy the signature bytes minus the length byte into this
array using the static Copy method of the Array class.
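The body of CorSigUncompressData is not repeated in this section, so here is a minimal sketch of how such a decoder can work. It follows the compressed unsigned integer rules of the CLI metadata format: values up to 127 take one byte, a first byte of the form 10xxxxxx starts a two byte value and 110xxxxx a four byte value. The names CompressedInt and DecodeUnsigned are our own; treat this as an illustration and not as the exact function used by our programs.

```csharp
using System;

public static class CompressedInt
{
    // Decodes one compressed unsigned integer starting at data[index].
    // Returns the number of bytes consumed (1, 2 or 4) and places the
    // decoded number in value.
    public static int DecodeUnsigned(byte[] data, int index, out int value)
    {
        byte b0 = data[index];
        if ((b0 & 0x80) == 0)
        {
            // 0xxxxxxx: a single byte holding 0 to 127
            value = b0;
            return 1;
        }
        if ((b0 & 0xC0) == 0x80)
        {
            // 10xxxxxx: 14 bits spread over two bytes
            value = ((b0 & 0x3F) << 8) | data[index + 1];
            return 2;
        }
        if ((b0 & 0xE0) == 0xC0)
        {
            // 110xxxxx: 29 bits spread over four bytes
            value = ((b0 & 0x1F) << 24) | (data[index + 1] << 16)
                  | (data[index + 2] << 8) | data[index + 3];
            return 4;
        }
        throw new ArgumentException("not a valid compressed integer");
    }
}
```

The return value plays the role of howmanybytes and the out parameter that of uncompressedbyte. For the short signatures in this chapter the first branch is the only one that ever runs.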
This method's first parameter
is the source array that contains the original data, blob; then we have the
starting point in the source array from where we want to start the copy.
The index variable tells us where the length starts, and howmanybytes is the number
of bytes the length takes up, so their sum is the first byte of the signature proper.
The third parameter is the
destination array blob1 and the fourth the starting point in the destination
array. We use 0 as we want to start the
copy from the beginning. Finally we have the number of bytes to
copy, which is the count variable. Thus all that we have done is create an array
blob1 that contains only the signature.
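The five arguments of Array.Copy are easier to follow on a tiny blob. The sketch below makes one simplifying assumption that the real program does not: the length prefix is taken to be a single byte, which holds for any signature shorter than 128 bytes. BlobSlice and ExtractSignature are hypothetical names used only for this illustration.

```csharp
using System;

public static class BlobSlice
{
    // Returns the signature stored at blob[index], without its length prefix.
    // Assumption: the prefix is one byte (signature shorter than 128 bytes).
    public static byte[] ExtractSignature(byte[] blob, int index)
    {
        int count = blob[index];   // the length prefix
        int howmanybytes = 1;      // bytes taken up by the prefix itself
        byte[] blob1 = new byte[count];
        // source, source start, destination, destination start, bytes to copy
        Array.Copy(blob, index + howmanybytes, blob1, 0, count);
        return blob1;
    }
}
```

With a blob of 00 03 20 00 02 FF and an index of 1, the length is 3 and blob1 receives the three bytes 20 00 02.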
As the type parameter is 1
we now call the method CreateMethodDefSignature with the newly created array
and the row number. It is this function that does the actual grunt work.
The idea is that it is
easier to work with an array that contains only the method signature than an
array that contains the same signature but at an offset. You do have to agree
with us on this one. Let's take a short detour and first move on to the function
DisplayAllMethods.
All that we do here is
display the contents of two of the three arrays that
CreateMethodDefSignature populates. In the method CreateMethodDefSignature we
start with code that displays the entire signature bytes of the function. The row
number can never be -1 and hence the display code never gets called.
We change the aa variable to
the row number whose signature we want to look at. The first byte of the method
def signature includes two things. In a method we use either the words
explicit instance, as in method a1, or only instance, as in method a2.
Then comes either the variable
number of arguments, vararg, as in method a3, or the default, which is spelled
default as in method a4. The hasthis and explicitthis bits are ORed with one of the
two calling convention values, default or vararg. Even though we know that this
value will always fit in one byte, we still use CorSigUncompressData to extract it.
To figure out which bits
stand for what, we use DecodeFirstByteofMethodSignature to populate the
methoddeftypearray array. This method is like what we have done before, and we
hope you understand why the order of the if statements matters: the explicit
instance test must come after the instance test, as it overwrites the string.
Even though the method def
rules only use two calling conventions, we have included four more as the
other signatures use them. Also, if we tag our methods with these calling
conventions, like unmanaged stdcall, ildasm does not complain. Also, using the
default keyword is not an error, but it does not show up in the disassembled
output.
The second byte is the
number of parameters that we have. We may have more than 127 parameters and
hence we use CorSigUncompressData to get at this value. Before calling this
function we need to add to the index variable the number of bytes taken up by
the earlier field, which in this case is 1.
We will keep adding the
variable howmanybytes to the index variable. We store the number of parameters
in the array methoddefparamcount for future use and now call a very important
method, GetElementType, that will give us the data type of the return value of
the function.
This is the next bit of
information stored in the signature. We pass the GetElementType method three
parameters. The first is the starting byte of the return data type, stored in
the index variable; then comes the array blobarray, and finally an out parameter
that receives the number of bytes of the signature the return data type takes up.
This number could be in the
hundreds, as you will soon see. The return value of this function, the actual
data type, is stored in the array methoddefreturnarray. The first byte of the
data type signature tells us about the rest of the bytes.
If its value is between 1
and 14, then the data type is a very simple one like bool or string or
int8. We thus use an if statement to tell us whether it is a simple data type
and use the GetType method to return this type. The GetType method is simply a
series of 14 if statements.
We return the value stored
in the variable returnstring and set the out variable howmanybytes to 1, as our
data type, being simple, takes only one byte in the signature. This is how we
take care of the elementary data types. The next example takes on more complex
data types, and unless we finish all of them we will not proceed to do anything
else.
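To see the three fields side by side, here is a deliberately tiny sketch that reads them from a signature in which every field fits in one byte. It only knows the instance bit, the vararg convention and three return types, so it is a toy and not a replacement for the functions above. The names MethodSigHeader and Describe are hypothetical.

```csharp
using System;

public static class MethodSigHeader
{
    // Reads the first three fields of a method def signature.
    // Assumption: each field occupies exactly one byte.
    public static string Describe(byte[] sig)
    {
        int flags = sig[0];        // hasthis, explicitthis and calling convention
        int paramCount = sig[1];   // number of parameters
        int returnType = sig[2];   // element type of the return value

        string text = "";
        if ((flags & 0x20) != 0)
            text += "instance ";
        if ((flags & 0x0F) == 0x05)
            text += "vararg ";

        string ret = returnType == 0x01 ? "void"
                   : returnType == 0x02 ? "bool"
                   : returnType == 0x08 ? "int32"
                   : "unknown";
        return text + ret + " (" + paramCount + " params)";
    }
}
```

Feeding it the bytes 25 00 01 gives instance vararg void (0 params): 0x25 is the instance bit 0x20 ORed with the vararg value 0x05, the 00 is the parameter count and 01 is void.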
Program24.csc
public void DisplayAllMethods (int typerow)
{
    string s = methodstring + " " + methoddefreturnarray[methodindex] ;
    Console.WriteLine(s);
}
public string GetElementType ( int index , byte [] blobarray , out int howmanybytes)
{
    howmanybytes = 0;
    string returnstring = "";
    byte type = blobarray[index];
    if ( type >= 0x01 && type <= 0x0e )
    {
        returnstring = GetType(type);
        howmanybytes = 1;
    }
    // The ! type is followed by a one byte number.
    if ( type == 0x13 )
    {
        returnstring = "!" + blobarray[index+1].ToString();
        howmanybytes = 2;
    }
    if ( type == 0x15 || type == 0x17 || type == 0x1e || type == 0x21 )
    {
        returnstring = "/* UNKNOWN TYPE (0x" + type.ToString("X") + ")*/";
        howmanybytes = 1;
    }
    if ( type == 0x16)
    {
        returnstring = "typedref";
        howmanybytes = 1;
    }
    if ( type == 0x18)
    {
        returnstring = "native int";
        howmanybytes = 1;
    }
    if ( type == 0x19)
    {
        returnstring = "native unsigned int";
        howmanybytes = 1;
    }
    if ( type == 0x1a)
    {
        returnstring = "native float";
        howmanybytes = 1;
    }
    if ( type == 0x1c)
    {
        returnstring = "object";
        howmanybytes = 1;
    }
    // pinned wraps the type that follows it, hence the recursive call.
    if ( type == 0x45 )
    {
        int howmanybytes2 ;
        returnstring = GetElementType( index + 1 , blobarray , out howmanybytes2) + " pinned";
        howmanybytes = howmanybytes2 + 1;
    }
    return returnstring;
}
e.il
.class a1
{
.method !20 a1()
{
}
.method typedref a2()
{
}
.method native int a3()
{
}
.method native unsigned int a4()
{
}
.method native float a5()
{
}
.method object a6()
{
}
.method int8 pinned a7()
{
}
}
Output
.method /*06000001*/ !20
.method /*06000002*/ typedref
.method /*06000003*/ native int
.method /*06000004*/ native unsigned int
.method /*06000005*/ native float
.method /*06000006*/ object
.method /*06000007*/ int8 pinned
In this example, as in
the earlier one, we have simply handled some of the easier types. All the code
remains the same except for some if statements that we have added to the GetElementType
method. The first function a1 uses a deprecated type ! that stands for the var
type.
The specifications do not
specify this type, and we followed the advice we gave you some time ago. We went
to the third byte of the signature and put 0x13 there. We then ran the
disassembler, which came up with the ! type. This type is followed by a number
and hence the howmanybytes variable should be 2, as this is the length of the
type.
The values 0x15, 0x17, 0x1e and 0x21
are unknown. The others, except the last, are simple types. The last one, 0x45,
is a type followed by the word pinned. The byte 0x45 is followed by the actual
type and hence we call the GetElementType method with one added to the index
variable. This method returns the type and we simply add the word pinned to
it.
The howmanybytes variable is
what the GetElementType method returns plus 1.
Program25.csc
public string GetElementType ( int index , byte [] blobarray , out int howmanybytes)
{
    howmanybytes = 0;
    string returnstring = "";
    byte type = blobarray[index];
    if ( type == 0x1d)
    {
        returnstring = GetSzArray(index , blobarray , out howmanybytes);
    }
    return returnstring;
}
public string GetSzArray (int index , byte [] blobarray , out int howmanybytes)
{
    string returnstring = "";
    int i = 1;
    returnstring = "[]";
    // Every further 0x1d adds one more pair of brackets.
    while ( true )
    {
        byte next = blobarray[index+i];
        if ( next != 0x1d )
            break;
        returnstring = returnstring + "[]";
        i = i +1 ;
    }
    int howmanybytes2;
    // The element type sits after the last 0x1d.
    returnstring = GetElementType(index + i , blobarray , out howmanybytes2) + returnstring;
    howmanybytes = i + howmanybytes2;
    return returnstring;
}
e.il
.class a1
{
.method int8 [][][] a1()
{
}
}
Output
.method /*06000001*/ int8[][][]
In this example and the next
we deal with arrays. In the method GetElementType we add one more if statement
that checks if the type byte, the first byte, is 0x1d. This type is for a
simple array that has no specific dimension values. We can have as many
dimensions as we like.
If we see the signature we
find the following: 1D 1D 1D 04. Thus each dimension we add brings in an extra
0x1d. At the end of the type signature is the actual data type. The fact that
GetSzArray gets called simply means
that we have at the very least one dimension, which is a must, but can have more.
We initialize the
returnstring variable to an empty pair of array brackets and then set out on a loop that
is indefinite, as we do not know how many dimensions the array will have. At
the beginning of the loop we check whether the following byte is not a 0x1d. If
the answer is yes, we know that all is over and we exit from the loop.
If the next byte is a 0x1d,
we add an extra pair of array brackets, an extra dimension, to the variable
returnstring. When we finally leave the indefinite while loop,
returnstring contains the right number of array brackets and the
variable i tells us how many 0x1d's were there.
We now call the method
GetElementType to get the type stored after the last 0x1d. This is why we add
the value of i to the index variable. The bytes to be returned are the number
of bytes taken up by the type itself plus the variable i, which as we told you
earlier gives us the number of 0x1d bytes, the array dimensions.
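The same bytes can also be decoded without the while loop, by letting the function call itself once for every 0x1d. The sketch below is an alternative formulation and not the code our program uses; SzArrayDemo and Decode are hypothetical names, and only three one byte element types are recognised to keep it short.

```csharp
using System;

public static class SzArrayDemo
{
    // Decodes a type made of 0x1d bytes followed by a one byte element type.
    // used receives the number of signature bytes consumed.
    public static string Decode(byte[] sig, int index, out int used)
    {
        byte type = sig[index];
        if (type == 0x1d)
        {
            // One pair of brackets for this 0x1d, the rest comes from the recursion.
            int inner;
            string element = Decode(sig, index + 1, out inner);
            used = inner + 1;
            return element + "[]";
        }
        used = 1;
        switch (type)
        {
            case 0x04: return "int8";
            case 0x08: return "int32";
            case 0x0e: return "string";
            default: return "unknown";
        }
    }
}
```

For the bytes 1D 1D 1D 04 the function recurses three times, returns int8[][][] and reports 4 bytes used, the same numbers that GetSzArray arrives at with i and howmanybytes2.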
Program26.csc
public string GetElementType ( int index , byte [] blobarray , out int howmanybytes)
{
    howmanybytes = 0;
    string returnstring = "";
    byte type = blobarray[index];
    if ( type == 0x14 )
    {
        int howmanybytes2;
        returnstring = GetArrayType( blobarray , index , out howmanybytes2);
        howmanybytes = howmanybytes2 + 1;
    }
    return returnstring;
}
public string GetArrayType (byte [] blobarray , int index , out int howmanybytes)
{
    string returnstring ;
    int total = 1;
    int uncompressedbyte;
    int rank;
    int numsizes;
    int howmanybytes1;
    // The element type follows the 0x14 byte.
    returnstring = GetElementType(index +1 , blobarray , out howmanybytes);
    total = total + howmanybytes;
    returnstring = returnstring + "[";
    // Rank.
    howmanybytes1 = CorSigUncompressData(blobarray , index + total, out uncompressedbyte);
    total = total + howmanybytes1;
    rank = uncompressedbyte;
    // Number of sizes, followed by the sizes themselves.
    howmanybytes1 = CorSigUncompressData(blobarray , index + total, out uncompressedbyte);
    total = total + howmanybytes1;
    numsizes = uncompressedbyte;
    int [] sizearray = new int[numsizes];
    for ( int l = 1 ; l <= numsizes ; l++)
    {
        howmanybytes1 = CorSigUncompressData(blobarray , index + total, out uncompressedbyte);
        total = total + howmanybytes1;
        sizearray[l-1] = uncompressedbyte;
    }
    // Number of lower bounds.
    howmanybytes1 = CorSigUncompressData(blobarray , index + total, out uncompressedbyte);
    total = total + howmanybytes1;
    int bounds = uncompressedbyte;
    int [] boundsarray = new int[bounds];
    //Console.WriteLine(".....rank={0} numsizes={1} bounds={2} " , rank, numsizes,bounds);
    // Special case: only a rank, as in int32[,,,].
    if ( rank != 0 && bounds == 0 && numsizes == 0)
    {
        for ( int i = 1 ; i < rank ; i++)
            returnstring = returnstring + ",";
        returnstring = returnstring + "]";
        // Report the bytes consumed on this path as well.
        howmanybytes = total-1;
        return returnstring;
    }
    int dots = 0;
    // Read the lower bounds; the lowest bit is the sign, the rest the value.
    for ( int l = 1 ; l <= bounds ; l++)
    {
        howmanybytes1 = CorSigUncompressData(blobarray , index + total, out uncompressedbyte);
        total = total + howmanybytes1;
        int ulSigned = uncompressedbyte & 0x1;
        uncompressedbyte = uncompressedbyte >> 1;
        boundsarray[l-1] = uncompressedbyte ;
    }
    if ( numsizes == 0)
    {
        for ( int l = 0 ; l < bounds ; l++)
        {
            returnstring = returnstring + boundsarray[l] + "..." ;
            if ( l != (bounds-1) )
                returnstring = returnstring + ",";
        }
    }
    else
    {
        for ( int l = 0 ; l < bounds ; l++)
        {
            if ( l < numsizes )
            {
                int upper = boundsarray[l] + sizearray[l] - 1 ;
                if ( boundsarray[l] == 0 && sizearray[l] != 0 )
                    returnstring = returnstring + sizearray[l] ;
                if (boundsarray[l] == 0 && sizearray[l] == 0)
                    returnstring = returnstring + "0" ;
                else if (boundsarray[l] != 0 && sizearray[l] != 0)
                    returnstring = returnstring + boundsarray[l] + "..." + upper.ToString() ;
                else if (boundsarray[l] != 0 && sizearray[l] == 0)
                    returnstring = returnstring + boundsarray[l] + "..." ;
            }
            else
            {
                dots++;
                returnstring = returnstring + boundsarray[l] + "..." ;
            }
            if ( l != bounds - 1 )
                returnstring = returnstring + ",";
        }
    }
    if ( numsizes != 0) // method a6
    {
        // Commas for the empty dimensions at the very end.
        int leftover = rank - numsizes - dots ;
        for ( int l = 1 ; l <= leftover ; l++)
            returnstring = returnstring + ",";
    }
    returnstring = returnstring + "]";
    howmanybytes = total-1;
    return returnstring;
}
e.il
.class a1
{
.method int8 [4] a1()
{
}
.method int16 [5] a2()
{
}
.method int32 [5,7,12] a3()
{
}
.method int32 [7,,,] a4()
{
}
.method int32 [0...3, 3...8 , 10...14] a5()
{
}
.method int32 [3...] a6()
{
}
.method int32 [6...9,1,13] a7()
{
}
.method int32 [,,,] a8()
{
}
.method int32 [6,,13] a9()
{
}
.method int32 [, 3...8 , 4... , 8...] a10()
{
}
.method int32 [, 3...8 , 4... , , 8... ,,,] a11()
{
}
.method int32 [, , 4... , , 8...] a12()
{
}
.method int32 [,,6...9,1,13] a13()
{
}
.method int32 [8... , 4 , 5] a14()
{
}
}
Output
.method /*06000001*/ int8[4]
.method /*06000002*/ int16[5]
.method /*06000003*/ int32[5,7,12]
.method /*06000004*/ int32[7,,,]
.method /*06000005*/ int32[4,3...8,10...14]
.method /*06000006*/ int32[3...]
.method /*06000007*/ int32[6...9,1,13]
.method /*06000008*/ int32[,,,]
.method /*06000009*/ int32[6,0,13]
.method /*0600000A*/ int32[0,3...8,4...,8...]
.method /*0600000B*/ int32[0,3...8,4...,0...,8...,,,]
.method /*0600000C*/ int32[0...,0...,4...,0...,8...]
.method /*0600000D*/ int32[0,0,6...9,1,13]
.method /*0600000E*/ int32[8...,4,5]
In this example we deal with
how the array data type is handled. This is different from the earlier one where we
did not specify a dimension along with the array. All that we have done is add
an if statement that checks for type 0x14, which represents an array.
We then call a method
GetArrayType that understands arrays. Let's first look at the method a1 that has
the simplest array, int8 [4]. We will be extremely practical as arrays can
be a pain in the neck. Thus each time we will look at the signature as well. In this
case it is 14 04 01, and this is only part of the signature, the part we are
trying to explain to you.
An array signature starts
with the number 0x14, and then follows the data type of the array. A value of 4
stands for int8. Thus the first thing we do in our GetArrayType function is
call the GetElementType method, passing index+1 as index points to the array
type 0x14.
This is how we figure out
the data type of the array, and we increase the variable total by howmanybytes
as the array data type can be as complex as we please. The second function a2
has the type int16 [5] and its signature
starts 14 06 01, as the data type for an int16 is 06. The byte following the data
type is called the rank.
This specifies the number of dimensions, which has to be 1
or more. We use the method CorSigUncompressData to pick it up for us as it may
be larger than 127. Look at the method a3, whose array type is int32[5,7,12]
and whose signature starts 14 08 03. The rank here is 3 as we have three dimensions in
the array.
This is different from an
array of arrays, which has multiple []. We will consider them later.
The rank is stored in our program in a variable called rank, the
returnstring variable will contain the actual array signature that we return,
and each time we increase total by the return value of the CorSigUncompressData
method.
After the rank is the number
of sizes member. This gives us the number of dimensions that have a size. Let's
start with method a8: its type is int32[,,,] and its signature starts 14 08 04 00.
The numsizes is zero as no dimension has any size at all. Now take method a9,
type int32 [6,,13], signature 14 08 03 03.
The numsizes is 3 and not 2,
as the dimension that has no size is in the middle. Method a4, type int32[7,,,],
signature 14 08 04 01, says it better. The numsizes is 1 as only one dimension
has a value and it is the first. Thus numsizes counts every dimension up to and
including the last one that has a size. An empty dimension before that one is
counted with a size of 0, and the ones after it are left out.
The next series of bytes
tell us the size of each dimension. If the numsizes field is 3, the next three
bytes tell us the size of each dimension. Let's look at method a11, type int32
[, 3...8 , 4... , , 8... ,,,] and signature 14 08 08 02 00 06. As the numsizes
is only 2, the next two bytes tell us the size of each dimension.
The rank however is 8. The
first dimension is empty and hence its size is zero. The second dimension starts at
3 and ends at 8. Thus its size is 6, as we count both 3 and 8. The ones that have
an undefined upper limit, like 3..., have no size as we have not specified an upper
bound.
Take another example,
method a7, type
int32[6...9,1,13], signature 14 08 03 03 04 01 0D. We have 3 sizes: the first is
from 6 to 9 and hence 4, the second is 1 and the third 13. We do not know in
advance how many sizes we have, and hence we create an array sizearray that is
numsizes large.
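The rule for numsizes can be stated as a three line loop. The sketch below is our reading of the signatures shown above and not something lifted from the assembler; NumSizesDemo and CountNumSizes are hypothetical names, and a size of -1 is our own convention for a dimension that is empty or has no upper bound.

```csharp
using System;

public static class NumSizesDemo
{
    // sizes[i] is the size of dimension i, or -1 when the dimension is
    // empty or has no upper bound and therefore has no size.
    public static int CountNumSizes(int[] sizes)
    {
        int numsizes = 0;
        for (int i = 0; i < sizes.Length; i++)
        {
            // Remember the position just past the last sized dimension.
            if (sizes[i] >= 0)
                numsizes = i + 1;
        }
        return numsizes;
    }
}
```

For int32[6,,13] the array is 6, -1, 13 and the answer is 3, for int32[7,,,] it is 1 and for int32[,,,] it is 0, matching the three signatures we just looked at.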
In a for loop we read each
size and store it into the array sizearray. We also increase total by the
number of bytes each size takes up. We now handle a special case where we
have a rank, which is a must, but no numsizes or bounds. This means that the
array has no dimension that has a size and also no dimension that has a lower
bound.
This can only happen in
a case like method a8, type int32[,,,], signature 14 08 04 00 00. Here we have a rank
of 4 and the next two numbers are 0. We loop depending upon the number of
dimensions and keep adding a comma, one less than the rank in all, and then
return the string. A special case,
and now let's move to the last field.
When we leave the sizes loop we
are at the next field, which tells us how many dimensions have a lower bound, as
the upper bound is optional. Look at method a7, type int32[6...9,1,13],
bytes 14 08 03 03 04 01 0D 03 0C 00 00. We have to move to the 8th byte, which is
a three. We have three lower bounds, and the last two are zero as they are a
single value and not a range.
Only the three dots come
into the picture here. The lower bound is 6 but we see the value 0x0c. Why the
discrepancy? This is because the lower bound is stored as a signed number in a
compressed form. Here is how it works. We first take the stored value and bitwise
AND it with 1 to check whether the lowest bit is on or not. This bit carries the
sign; if it is set, the lower bound is negative. In all our cases the lower
bounds are positive.
Thus the lowest bit is not
used to store the magnitude of the lower bound and is always zero here. We then
right shift the value by 1. Thus 12 becomes 6, as by right shifting we are dividing by 2. Finally
let's take method a5, type
int32[4,3...8,10...14], signature 20 00 14 08 03 03 04 06 05 03 00 06 14.
We take the fourth last
number, which tells us that we have three lower bounds. The first dimension is
not a range and hence its value is 0; the second is 6, which we divide by 2 to
get 3, the lower bound; and the last is 0x14 or 20, which divided by 2 is 10.
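Our program only shifts the value right, which is enough for positive lower bounds. For completeness, here is a sketch of a full decoder for the compressed signed form, where the lowest bit is the sign and a negative number is stored in two's complement within the width of the encoding. The names LowerBound and DecodeSigned are hypothetical, and the unsigned step is repeated inside so that the block stands on its own.

```csharp
using System;

public static class LowerBound
{
    // Decodes one compressed signed integer starting at data[index].
    // Returns the bytes consumed; the decoded number goes into value.
    public static int DecodeSigned(byte[] data, int index, out int value)
    {
        int raw, used;
        byte b0 = data[index];
        if ((b0 & 0x80) == 0)
        {
            raw = b0;
            used = 1;
        }
        else if ((b0 & 0xC0) == 0x80)
        {
            raw = ((b0 & 0x3F) << 8) | data[index + 1];
            used = 2;
        }
        else if ((b0 & 0xE0) == 0xC0)
        {
            raw = ((b0 & 0x1F) << 24) | (data[index + 1] << 16)
                | (data[index + 2] << 8) | data[index + 3];
            used = 4;
        }
        else
        {
            throw new ArgumentException("not a valid compressed integer");
        }

        bool negative = (raw & 1) != 0;   // the lowest bit is the sign
        value = raw >> 1;                 // the remaining bits are the number
        if (negative)
        {
            // Undo the two's complement for the width that was used.
            if (used == 1) value -= 0x40;
            else if (used == 2) value -= 0x2000;
            else value -= 0x10000000;
        }
        return used;
    }
}
```

The byte 0C decodes to 6 and 14 to 10, as in methods a7 and a5, while a byte of 7B would stand for a lower bound of -3.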
Finally method a11 sums it
up: type int32[0,3...8,4...,0...,8...,,,], signature 14 08 08 02 00 06 05 00 06
08 00 10. We start at the beginning as the signature is complex, and we see an
array type 0x14. The first 08 is the data type for int32 and the
second 08 is the rank, as we have 8 dimensions. Count if you do not
believe us.
Then we have 2 sizes, as only
the second dimension has a size and the first, whose size is 0, gets in because of the
second. Then we have two bytes for the sizes of the dimensions, 0 and 6. This is
followed by the number of lower bounds, which is 5.
The first has a lower bound
of zero, and the second and third have lower bounds of 3 and 4
that show up doubled as 6 and 8. The next has a lower bound of 0 and hence it is zero,
and this is followed by a lower bound of 8 that doubles to 16 or 0x10.
The last three dimensions
have no values, and hence, to save on signature space by not specifying an endless
run of zeroes, they are left out. This causes trouble for us as we now have
to account for all these optimizations in our code. We store the decoded
value of each lower bound in an array boundsarray, like we did for the sizes.
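Before we build the string, it helps to see the whole shape read in one go. The sketch below parses the rank, the sizes and the lower bounds under two simplifying assumptions that our real code does not make: every number fits in one byte and no lower bound is negative. ArrayShape and Parse are hypothetical names.

```csharp
using System;

public sealed class ArrayShape
{
    public int Rank;
    public int[] Sizes;
    public int[] LowerBounds;

    // pos must point at the rank byte, just past the element type.
    // Assumptions: one byte per number, no negative lower bounds.
    public static ArrayShape Parse(byte[] sig, int pos)
    {
        ArrayShape shape = new ArrayShape();
        shape.Rank = sig[pos++];

        shape.Sizes = new int[sig[pos++]];
        for (int i = 0; i < shape.Sizes.Length; i++)
            shape.Sizes[i] = sig[pos++];

        shape.LowerBounds = new int[sig[pos++]];
        for (int i = 0; i < shape.LowerBounds.Length; i++)
            shape.LowerBounds[i] = sig[pos++] >> 1;   // drop the sign bit

        return shape;
    }
}
```

For method a7, bytes 14 08 03 03 04 01 0D 03 0C 00 00 with pos set to 2, we get a rank of 3, sizes of 4, 1 and 13 and lower bounds of 6, 0 and 0. These are exactly the contents of sizearray and boundsarray at this point in our program.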
Now is the time for actually
creating the array signature. For the moment we have simply filled up two
arrays. Lets look at function a12 type int32 [, , 4... , , 8...] , ildasm shows us int32[0...,0...,4...,0...,8...] signature 14 08 05 00 05 00 00 08
00 10.
As none of the dimension has
a size as they are either empty or do not have a upper bound, the numsizes
field is zero and hence the first if statement gets called. Thus we start with
a for loop and simply take what is there in the boundsarray and add a … to it.
The bounds and rank members
will be the same and if the dimension is 0 or empty, a zero gets displayed
instead. At the end if it is the last dimension, we do not place the comma and the if statement handles it. Now lets
move on to the else statement that has some complex code.
Like in the if, in the else
we also iterate in the for loop using the number of bounds as the index
variable. We do this as the rank is the theoretical number of dimensions. The
bounds are those that have a lower dimension which is a must. The difference
between rank and bounds are the last empty dimensions.
If there are no empty
dimensions both rank and bounds will be the same value. The numsizes have no
significance. We have a if statement as the sizearray array will be less than
the bounds array as every dimension does not have a size. This happens for two
reasons, it is empty or the upper bound is not specified.
However as specified before,
the empty ones fall into the purview of the numsizes if they are before a sized
dimension. Thus the if statement makes sure that we are not accessing a
sizearray member that does not exist.
When the if statement is
false, it means that all the valid sizes are over and the dimensions that
follow are either empty or have no upper bound. This happens with method a11,
type int32 [, 3...8 , 4... , , 8... ,,,], signature 14 08 08 02 00 06 05 00 06
08 00 10, where the actual answer is int32[0,3...8,4...,0...,8...,,,].
Here the rank is 8, numsizes
is 02 and the number of bounds is 05. Thus for the values 2, 3 and 4 of l, the
else gets called. For these dimensions we have no upper bounds at all, and the
empty dimension among them (l = 3) also gets displayed as a range starting at 0.
We also increase the variable dots by one, so that it tells us how
many times the else gets called. These dimensions, remember, come after all the
dimensions that have sizes. If the if statement is true, which means we have
a size as well as a lower bound, the upper variable stores the upper
bound for us.
This is calculated as the
lower bound plus the size minus 1. We now need to figure out whether we place
the three dots or whether it is a single dimension value. This is achieved by
the next series of three if statements. Let's take a method a13 with type
int32 [,,6...9,1,13] and signature 14 08 05 05 00 00 04 01 0D 05 00 00 0C 00
00.
The rank, numsizes and
bounds are all 5. If the lower bound is zero and the size is non-zero, this is
a single dimension value. Thus the if statement gets called for the last two
dimensions, where the size array has the values 1 and 13. These are sizes that
do not have a range and hence the lower bound is zero.
If there is a single value,
it is stored in the sizearray. We also check whether the comma needs to be
placed at the end. We then check whether both the bounds and size array members
are zero. This happens in the first two dimensions and we need to place a 0.
Finally we check whether both the size and bounds are non-zero,
which means that it is a range like the middle case, and here we place the
three dots. We should place the comma only if it is not at the end and that
explains the final if statement. We finally need to place the final empty
commas, if any.
We first need to find out if
there are any empty dimensions at the very end. This we do by subtracting the
number of sizes from the rank. We also need to subtract dots, as this variable
contains the number of range dimensions at the end that do not have an upper
bound. In this for loop we only fill up the returnstring with that number of
commas.
We place this code within an
if statement so that a method like a6, which has a rank of 1 and numsizes of 0,
does not activate this code. We finally come to method a14, which has the type
int32 [8... , 4 , 5], signature 14 08 03 03 00 04 05 03 10 00 00, and the
answer by ildasm is int32[8...7,4,5].
Thus we get the right answer
and ildasm the wrong one, and yes, we are gloating. This is because we have one
more if statement that checks whether the size array member is zero, which it
is in this case, and the bounds array member is not zero, which means that we
have a range dimension with no upper bound.
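
The rules described above can be condensed into a short sketch. This is not the code of our program, only one possible reading of the rules, and the names ArrayShapeSketch and FormatShape are hypothetical. It assumes that the sizes and lower bounds have already been uncompressed into int arrays, and that every dimension that has a size also has a lower bound, as is the case in methods a11, a13 and a14.

```csharp
using System;

static class ArrayShapeSketch
{
    // Hypothetical helper: renders one array shape from its rank,
    // its size list and its lower bound list.
    public static string FormatShape(int rank, int[] sizes, int[] lowerBounds)
    {
        string[] dims = new string[rank];
        for (int l = 0; l < rank; l++)
        {
            if (l >= lowerBounds.Length)
            {
                dims[l] = "";                    // empty dimension at the very end
                continue;
            }
            int lower = lowerBounds[l];
            if (l >= sizes.Length)
            {
                dims[l] = lower + "...";         // lower bound only, no size stored
                continue;
            }
            int size = sizes[l];
            if (size == 0 && lower == 0)
                dims[l] = "0";                   // empty dimension before a sized one
            else if (size == 0)
                dims[l] = lower + "...";         // range with no upper bound (a14)
            else if (lower == 0)
                dims[l] = size.ToString();       // single value
            else
                dims[l] = lower + "..." + (lower + size - 1);   // full range
        }
        // Joining rank entries yields rank - 1 commas, trailing ones included.
        return "[" + string.Join(",", dims) + "]";
    }

    static void Main()
    {
        // a11: prints int32[0,3...8,4...,0...,8...,,,]
        Console.WriteLine("int32" + FormatShape(8, new[] { 0, 6 }, new[] { 0, 3, 4, 0, 8 }));
        // a13: prints int32[0,0,6...9,1,13]
        Console.WriteLine("int32" + FormatShape(5, new[] { 0, 0, 4, 1, 13 }, new[] { 0, 0, 6, 0, 0 }));
        // a14: prints int32[8...,4,5]
        Console.WriteLine("int32" + FormatShape(3, new[] { 0, 4, 5 }, new[] { 8, 0, 0 }));
    }
}
```

Building one string per dimension and joining them at the end avoids the separate comma bookkeeping, at the cost of not mirroring the dots counter described in the text.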
Finally we have to ask
ourselves whether our code can handle double dimension arrays, that is arrays
of arrays, like int32[5][4]. The signature for the above is pretty large: 14 14
08 01 01 05 01 00 01 01 04 01 00. The first 14 is the array type and the data
type for the array is another array.
We read this array using the
GetElementType method, and its 14 is followed by the 8 which specifies an
int32. The rank and number of sizes are 1 and the 5 is the array size. Thus the
first array is the inner one and the array with dimension 4 is the outer array.
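
To see the recursion at work on this signature, here is a minimal sketch of a reader that handles only int32 (0x08) and the array type (0x14). The names NestedArraySketch, ReadType and Decode are hypothetical, and the sketch assumes that every compressed number fits in a single byte, which holds for the signatures shown in this chapter but not in general.

```csharp
using System;

static class NestedArraySketch
{
    // Reads the type that starts at sig[pos] and advances pos past it.
    public static string ReadType(byte[] sig, ref int pos)
    {
        byte type = sig[pos++];
        if (type == 0x08)
            return "int32";
        if (type != 0x14)
            throw new NotSupportedException("element type 0x" + type.ToString("X2"));

        // The element type comes first and may itself be an array.
        string element = ReadType(sig, ref pos);
        int rank = sig[pos++];
        int[] sizes = new int[sig[pos++]];
        for (int i = 0; i < sizes.Length; i++)
            sizes[i] = sig[pos++];
        int[] lows = new int[sig[pos++]];
        for (int i = 0; i < lows.Length; i++)
        {
            // Lower bounds are signed: the lowest bit is the sign, the rest the value.
            int raw = sig[pos++];
            lows[i] = (raw & 1) == 0 ? raw >> 1 : (raw >> 1) - 64;
        }

        string[] dims = new string[rank];
        for (int l = 0; l < rank; l++)
        {
            if (l >= lows.Length) { dims[l] = ""; continue; }
            int size = l < sizes.Length ? sizes[l] : 0;
            bool open = l >= sizes.Length || (size == 0 && lows[l] != 0);
            if (open) dims[l] = lows[l] + "...";
            else if (lows[l] == 0) dims[l] = size.ToString();
            else dims[l] = lows[l] + "..." + (lows[l] + size - 1);
        }
        return element + "[" + string.Join(",", dims) + "]";
    }

    public static string Decode(byte[] sig)
    {
        int pos = 0;
        return ReadType(sig, ref pos);
    }

    static void Main()
    {
        byte[] sig = { 0x14, 0x14, 0x08, 0x01, 0x01, 0x05, 0x01, 0x00,
                       0x01, 0x01, 0x04, 0x01, 0x00 };
        Console.WriteLine(Decode(sig));   // prints int32[5][4]
    }
}
```

The inner call returns int32[5] before the outer call has read its own rank, which is why the inner array is printed first.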
Program27.csc.txt
public void abc(string [] args)
{
ReadPEStructures(args);
DisplayPEStructures();
ReadandDisplayImportAdressTable();
ReadandDisplayCLRHeader();
ReadStreamsData();
FillTableSizes();
ReadTablesIntoStructures();
DisplayTablesForDebugging();
ReadandDisplayVTableFixup();
ReadandDisplayExportAddressTableJumps();
FillArray();
DisplayModuleRefs();
DisplayAssembleyRefs();
CreateSignatures();
DisplayAssembley();
DisplayFileTable();
DisplayClassExtern();
DisplayResources();
DisplayModuleAndMore();
DispalyVtFixup();
DisplayTypeDefs();
DisplayTypeDefsAndMethods();
}
public string GetElementType ( int index , byte [] blobarray , out int howmanybytes)
{
howmanybytes = 0;
string returnstring = "";
byte type = blobarray[index];
// 0x12 marks a class type; a type token follows it
if ( type == 0x12 )
{
int howmanybytes2;
returnstring = GetTokenType(blobarray , index , out howmanybytes2);
// add one for the 0x12 byte itself
howmanybytes = howmanybytes2 + 1;
}
return returnstring;
}
public string GetTokenType (byte [] blobarray , int index , out int howmanybytes)
{
string returnstring = "";
int uncompressedbyte;
int howmanybytes1 = 0;
// the token that follows the type byte may be compressed
howmanybytes1 = howmanybytes1 + CorSigUncompressData(blobarray , index + 1 , out uncompressedbyte);
string dummy1 = DecodeToken(uncompressedbyte , blobarray[index]);
returnstring = "class " + dummy1;
howmanybytes = howmanybytes1;
return returnstring;
}
public string DecodeToken (int token , int type)
{
// lowest two bits: table number, remaining bits: row number
byte tabletype = (byte)(token & 0x03);
int tableindex = token >> 2;
string returnstring = "";
if ( tabletype == 0)
returnstring = typedefnames[tableindex];
return returnstring;
}
string [] typedefnames;
public void FillArray ()
{
int old = tableoffset;
bool tablehasrows = tablepresent(2);
int offs = tableoffset;
tableoffset = old;
if ( tablehasrows )
{
// row numbers start at 1, so index 0 stays unused
typedefnames = new string[rows[2]+1];
for ( int k = 1 ; k <= rows[2] ; k++)
{
int name = TypeDefStruct[k].name;
offs += offsetstring;
int nspace = TypeDefStruct[k].nspace;
offs += offsetstring;
string nestedtypestring = "";
nestedtypestring = GetNestedTypeAsString(k);
string namestring = GetString(name);
string namespacestring = NameReserved(GetString(nspace));
if ( namespacestring.Length != 0)
namespacestring = namespacestring + ".";
namestring = NameReserved(namestring );
typedefnames[k] = nestedtypestring + namespacestring + namestring + "/* 02" + k.ToString("X6") + " */";
}
}
}
e.il
.class yyy
{
}
.class zzz
{
.method class zzz a1()
{
}
.method class yyy a2()
{
}
}
Output
.class /*02000002*/ private auto ansi yyy
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
} // end of class yyy
.class /*02000003*/ private auto ansi zzz
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
.method /*06000001*/ class zzz/* 02000003 */
.method /*06000002*/ class yyy/* 02000002 */
} // end of class zzz
In this program we show you
how to work with return values that are an instance of a type that we create
ourselves in our code. If you take a close look at the il file, method a1
returns the type zzz and method a2 returns the type yyy. If we look at
the signatures, both start with a value 0x12 and then some number that we will
explain soon.
If you look closely at the
abc function, we have called a new method FillArray. This is the last method we
will add to the abc function and we will explain this method very soon. In the
GetElementType method we have simply added an if statement that checks for the
type being 0x12.
If it is, we call a function
called GetTokenType to figure out the type for us, which we simply return. The
GetTokenType method takes an out parameter that tells us how many bytes the
numbers that represent the type take up. We add one to this value to account
for the 0x12 type byte.
What we are saying is that
the minute we find a 0x12, this signifies some type. The method GetTokenType is
passed the blob array as well as the index in the array of the 0x12. We first
call the CorSigUncompressData method, as the bytes that follow may be
compressed.
We then take the
uncompressed value, which in our case occupies a single byte as the value of
howmanybytes1 will confirm. We pass this value to the method DecodeToken along
with the value 0x12 that is the start of the type signature. To the string
returned we simply add the word class in front and return it.
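
For readers who want to see what such an uncompress routine does, here is a minimal sketch of the compressed unsigned integer scheme used in signature blobs: one byte if the top bit is 0, two bytes if the top bits are 10, four bytes if they are 110. The names CompressedIntSketch and UncompressData are hypothetical; this is not the body of our CorSigUncompressData, only an illustration with the same calling shape.

```csharp
using System;

static class CompressedIntSketch
{
    // Decodes one compressed unsigned integer starting at blob[index].
    // Returns the number of bytes consumed; the number itself goes into value.
    public static int UncompressData(byte[] blob, int index, out int value)
    {
        byte first = blob[index];
        if ((first & 0x80) == 0)              // 0xxxxxxx: one byte
        {
            value = first;
            return 1;
        }
        if ((first & 0xC0) == 0x80)           // 10xxxxxx: two bytes
        {
            value = ((first & 0x3F) << 8) | blob[index + 1];
            return 2;
        }
        // 110xxxxx: four bytes
        value = ((first & 0x1F) << 24) | (blob[index + 1] << 16)
              | (blob[index + 2] << 8) | blob[index + 3];
        return 4;
    }

    static void Main()
    {
        int value;
        // The byte after 0x12 in the signature of method a1.
        int used = UncompressData(new byte[] { 0x12, 0x0C }, 1, out value);
        Console.WriteLine(value + " in " + used + " byte(s)");   // prints 12 in 1 byte(s)
    }
}
```

For the small tokens in e.il the one byte branch is always taken, which is why howmanybytes1 ends up as 1.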
The number of bytes taken up
is simply the return value of the CorSigUncompressData function. Thus all the
action now moves to the method DecodeToken. What is stored after the byte 0x12
is what the specs call a token. A token is an efficient way of storing a table
and row number together.
Thus the lowest two bits are
the table number and the remaining bits the row number. We thus bitwise and
with 0x3 to extract the table number and right shift by 2 to get the row
number. If the tabletype is 0, the row number points into the type def table.
Thus the value of 0xc denotes table 0 and, dividing by 4, we get a row number
of 3.
The class zzz has a row
number of 3. The second token value is 8, which divided by 4 gives us 2, and
class yyy is row 2 in the type def table.
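
The arithmetic above is easy to check with a few lines of code. The sketch below splits a token into its table tag and row number. The names TokenSketch, Split and Describe are hypothetical; the mapping of tags 1 and 2 to the TypeRef and TypeSpec tables comes from the specifications, whereas our DecodeToken so far handles only tag 0.

```csharp
using System;

static class TokenSketch
{
    // Tag values used by a type token in a signature.
    static readonly string[] TableNames = { "TypeDef", "TypeRef", "TypeSpec" };

    public static void Split(int codedToken, out int table, out int row)
    {
        table = codedToken & 0x03;   // lowest two bits
        row = codedToken >> 2;       // everything else
    }

    public static string Describe(int codedToken)
    {
        int table, row;
        Split(codedToken, out table, out row);
        if (table > 2)
            throw new ArgumentException("tag 3 is not a valid table tag");
        return TableNames[table] + " row " + row;
    }

    static void Main()
    {
        Console.WriteLine(Describe(0x0C));   // prints TypeDef row 3
        Console.WriteLine(Describe(0x08));   // prints TypeDef row 2
    }
}
```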
As we simply have to read
the name and namespace from the type def table, why not create an array
typedefnames that simply stores the name of each type as a string, so that we
can read a type name by using the appropriate index into the array? This is
what the FillArray method does.
We start by defining an
instance array of strings, typedefnames. We then use the same old and
tableoffset variables to position us at the starting point of the type def
table, which is known as table number 2.
We will have at least one
row, bearing in mind the global type that gets automatically created, and we
create an array typedefnames that is one larger than the number of rows so that
row numbers, which start at 1, can be used directly as indexes. We now
iterate in a for loop depending upon the number of rows we have. We store the
name and namespace fields for later use to get at the name and namespace names
as strings.
We have to concatenate the
namespace and the name and place a dot between them if and only if there
exists a non-empty namespace name. This we check by adding a dot to the
namespace name only if it has a non-zero length. We could have wrapped the
NameReserved function around the GetString function for the name too, as we do
for the namespace, but chose to break it up over two lines.
A type may also be nested and
hence we use our trusted method GetNestedTypeAsString, passing it the type so
that it returns the names of the types this type is nested in. Program17 is
where we first introduced this method.
We now fill up the
typedefnames array by first starting with the nested type, then the
namespace name with or without the dot, followed by the name of the type and
its number in comments.
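
The string building in FillArray can be tried out in isolation. In the sketch below the names TypeNameSketch and BuildTypeName, the namespace ns1 and the type name vvv are all hypothetical; the nested prefix is passed in as a plain string because the output of GetNestedTypeAsString is not shown here.

```csharp
using System;

static class TypeNameSketch
{
    // Hypothetical stand-in for the last few lines of the loop in FillArray.
    public static string BuildTypeName(string nestedPrefix, string nameSpace, string name, int row)
    {
        if (nameSpace.Length != 0)
            nameSpace = nameSpace + ".";          // dot only for a non-empty namespace
        // 02 is the type def table number, X6 pads the row to six hex digits.
        return nestedPrefix + nameSpace + name + "/* 02" + row.ToString("X6") + " */";
    }

    static void Main()
    {
        // Index 0 is unused because row numbers start at 1.
        string[] names = new string[4];
        names[2] = BuildTypeName("", "", "yyy", 2);
        names[3] = BuildTypeName("", "", "zzz", 3);
        Console.WriteLine(names[3]);                            // prints zzz/* 02000003 */
        Console.WriteLine(BuildTypeName("", "ns1", "vvv", 10)); // prints ns1.vvv/* 0200000A */
    }
}
```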
Program28.csc.txt
public string GetElementType ( int index , byte [] blobarray , out int howmanybytes)
{
howmanybytes = 0;
string returnstring = "";
byte type = blobarray[index];
// 0x12 is a class, 0x11 a value type; a type token follows either one
if ( type == 0x12 || type == 0x11 )
{
int howmanybytes2;
returnstring = GetTokenType(blobarray , index , out howmanybytes2);
howmanybytes = howmanybytes2 + 1;
}
return returnstring;
}
public string GetTokenType (byte [] blobarray , int index , out int howmanybytes)
{
string returnstring = "";
int uncompressedbyte;
int howmanybytes1 = 0;
howmanybytes1 = howmanybytes1 + CorSigUncompressData(blobarray , index + 1 , out uncompressedbyte);
string dummy1 = DecodeToken(uncompressedbyte , blobarray[index]);
// the type byte decides which keyword goes in front of the name
if ( blobarray[index] == 0x12)
returnstring = "class " + dummy1;
else if ( blobarray[index] == 0x11)
returnstring = "valuetype " + dummy1;
howmanybytes = howmanybytes1;
return returnstring;
}
e.il
.class yyy
{
}
.class zzz
{
.method valuetype zzz a1()
{
}
.method valuetype yyy a2()
{
}
}
Output
.class /*02000002*/ private auto ansi yyy
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
} // end of class yyy
.class /*02000003*/ private auto ansi zzz
extends [mscorlib/* 23000001 */]System.Object/* 01000001 */
{
.method /*06000001*/ valuetype zzz/* 02000003 */
.method /*06000002*/ valuetype yyy/* 02000002 */
} // end of class zzz
This example is a slight
variation of the earlier one. In the il file, instead of using the word class
we use the word valuetype. There are two basic kinds of data types in the il
world. Those that are simple and are created on the stack are called value
types. The others are represented by the word class.
A class denotes an object
whose actual value is not passed around, only a reference to it. Thus each time
we want to access a class/object we have to dereference the value that is
pointed at by the reference. A value type instead stores the actual value and
hence is faster. A value type is derived from the ValueType class.
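
The difference is easiest to see from C#, where a struct is a value type and a class is a reference type. The following sketch is not part of our disassembler and the names PointValue, PointRef and ValueVersusReferenceSketch are hypothetical; it only shows that assigning a value type copies the value while assigning a class copies the reference.

```csharp
using System;

struct PointValue { public int X; }   // value type, derived from System.ValueType
class PointRef { public int X; }      // reference type

static class ValueVersusReferenceSketch
{
    public static int CopyValue()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;   // b is a separate copy of the value
        b.X = 2;
        return a.X;         // a is untouched
    }

    public static int CopyReference()
    {
        PointRef a = new PointRef();
        a.X = 1;
        PointRef b = a;     // b refers to the same object as a
        b.X = 2;
        return a.X;         // the change is visible through a
    }

    static void Main()
    {
        Console.WriteLine(CopyValue());       // prints 1
        Console.WriteLine(CopyReference());   // prints 2
    }
}
```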
These data types are simpler
and faster to access and there is a check to make sure that the data type is
derived from the ValueType class. There is only one way to create a data type
and that is using the class directive. In the GetElementType method we simply
add the check for a type 0x11.
Thus both the class and
value type are followed by a type token.
In the method GetTokenType we use an if statement to figure out whether we
add the word class or valuetype. This is why we pass the first byte, the type
byte, to this function.
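
Put together, the type byte picks the keyword and the token picks the name. The sketch below shows this on the two signatures of e.il. The names TokenTypeSketch and Describe are hypothetical, and the sketch assumes a token that fits in one byte and points into the type def table, which is all that this example needs.

```csharp
using System;

static class TokenTypeSketch
{
    // typeDefNames is indexed by row number, so index 0 is unused.
    public static string Describe(byte[] blob, int index, string[] typeDefNames, out int howManyBytes)
    {
        byte type = blob[index];
        int token = blob[index + 1];          // assumption: one byte token
        if ((token & 0x03) != 0)
            throw new NotSupportedException("only type def tokens are handled");
        string name = typeDefNames[token >> 2];
        howManyBytes = 2;                     // the type byte plus the token byte
        if (type == 0x12) return "class " + name;
        if (type == 0x11) return "valuetype " + name;
        throw new NotSupportedException("type byte 0x" + type.ToString("X2"));
    }

    static void Main()
    {
        string[] names = { null, "<Module>", "yyy", "zzz" };
        int used;
        Console.WriteLine(Describe(new byte[] { 0x11, 0x0C }, 0, names, out used));   // prints valuetype zzz
        Console.WriteLine(Describe(new byte[] { 0x12, 0x08 }, 0, names, out used));   // prints class yyy
    }
}
```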
Program29.csc
public string GetElementType ( int index , byte [] blobarray , out int howmanybytes)
{
howmanybytes = 0;
string returnstring = "";
byte type = blobarray[index];
// 0x10 means a & follows the type that comes next
if ( type == 0x10)
returnstring = GetByrefToken(index, blobarray , out howmanybytes);
return returnstring;
}
public string GetByrefToken (int index , byte [] blobarray , out int howmanybytes)
{
string returnstring = "";
int howmanybytes2;
// decode the type after the 0x10 and append the &
returnstring = GetElementType (index+1 , blobarray , out howmanybytes2) + "&";
// add one for the 0x10 byte itself
howmanybytes = howmanybytes2 + 1;
return returnstring;
}
e.il
.class zzz
{
.method int32 & a1()
{
}
.method class zzz & a2()
{
}
.method int32 [12][3,5] & a3()
{
}
}
Output
.method /*06000001*/ int32&
.method /*06000002*/ class zzz/* 02000002 */&
.method /*06000003*/ int32[12][3,5]&
If you look at the il file, we
have a & following the data type. This makes the data type a managed
pointer, also called a by-reference type. We will study the difference between
managed and unmanaged pointers in greater detail later. All that we would like
to say is that all programmers who do not work with pointers need to go back to
school.
The only difference we see
on adding a & is that the first type byte is 0x10. Then we have the same
type signature that we have worked with before. In method a3 an array follows,
and thus we have a 0x14 following the 0x10. In the GetElementType method we
call the method GetByrefToken, which
does the grunt work.
We first call the
GetElementType method to figure out the type for us and then return the same
type followed by a &. We also set the howmanybytes variable to one
larger than the value set by the GetElementType method, as the GetElementType
method does not count the extra 0x10 byte.
Everything else remains the
same and you can now see how we use recursion to call the same code over and
over again.
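
The recursion is easier to follow in a cut-down form. The sketch below knows only two type bytes, 0x08 for int32 and 0x10 for the & prefix; the names ByRefSketch and Decode are hypothetical, and the real GetElementType of course handles many more types.

```csharp
using System;

static class ByRefSketch
{
    // Decodes the type at blob[index] and reports how many bytes it took.
    public static string Decode(byte[] blob, int index, out int howManyBytes)
    {
        byte type = blob[index];
        if (type == 0x08)
        {
            howManyBytes = 1;
            return "int32";
        }
        if (type == 0x10)
        {
            int inner;
            // Recurse on the byte after the 0x10, then append the &.
            string element = Decode(blob, index + 1, out inner);
            howManyBytes = inner + 1;   // one extra for the 0x10 itself
            return element + "&";
        }
        throw new NotSupportedException("type byte 0x" + type.ToString("X2"));
    }

    static void Main()
    {
        int used;
        // The return type of method a1.
        Console.WriteLine(Decode(new byte[] { 0x10, 0x08 }, 0, out used));   // prints int32&
        Console.WriteLine(used);                                             // prints 2
    }
}
```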