-4-
Keywords and Operators
Code that is placed after the
return statement never gets executed. In the first program given below, you
will notice that there is a WriteLine function call in C# that is not visible in
our IL code. This is because the compiler is aware that statements placed after
return are never executed, and hence it serves no purpose to convert them into IL.
a.cs
class zzz
{
public static void Main()
{
return;
System.Console.WriteLine("hi");
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
br.s IL_0002
IL_0002: ret
}
}
The compiler does not waste time
compiling code that will never get executed; instead, it generates a warning when
it encounters such a situation.
a.cs
class zzz
{
public static void Main()
{
}
zzz( int i)
{
System.Console.WriteLine("hi");
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
.method private hidebysig specialname rtspecialname instance
void .ctor(int32 i) il managed {
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class
System.String)
ret
}
}
If a constructor is not present
in the source code, a constructor with no parameters gets generated. If any
constructor is present, the one with no parameters is no longer generated.
Unless the constructor names another one with base(...) or this(...), the base
class constructor with no parameters gets called, and it gets called first. The
above IL code proves this fact.
a.cs
namespace vijay
{
namespace mukhi
{
class zzz
{
public static void Main()
{
}
}
}
}
a.il
.assembly mukhi {}
.namespace vijay.mukhi
{
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
}
We may write a namespace within
a namespace, but the compiler converts it all into one namespace in the IL
file. Thus, the two namespaces vijay and mukhi in the C# file get merged into a
single namespace vijay.mukhi in the IL file.
a.il
.assembly mukhi {}
.namespace vijay
{
.namespace mukhi
{
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
}
}
In C#, one namespace can be
present within another namespace, but the C# compiler prefers using only a
single namespace; hence the IL output displays only one namespace. The
.namespace directive in IL is similar in concept to the namespace keyword in
C#. C# did not invent the idea of a namespace; the runtime supports it
directly in IL.
a.cs
namespace mukhi
{
class zzz
{
public static void Main()
{
}
}
}
namespace mukhi
{
class pqr
{
}
}
a.il
.assembly mukhi {}
.namespace mukhi
{
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
.class private auto ansi pqr extends [mscorlib]System.Object
{
}
}
We may have two namespaces
called mukhi in the C# file, but they become one large namespace in the IL file
and their contents get merged. This facility of merging namespaces is offered
by the C# compiler.
Had the designers deemed it fit,
they could have flagged the above program as an error instead.
a.cs
class zzz
{
public static void Main()
{
int i = 6;
zzz a = new zzz();
a.abc(ref i);
System.Console.WriteLine(i);
}
public void abc(ref int i)
{
i = 10;
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,class zzz V_1)
ldc.i4.6
stloc.0
newobj instance void zzz::.ctor()
stloc.1
ldloc.1
ldloca.s V_0
call instance void zzz::abc(int32&)
ldloc.0
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method public hidebysig instance void abc(int32& i) il managed
{
ldarg.1
ldc.i4.s 10
stind.i4
ret
}
}
Output
10
We will now explain how IL
implements passing by reference. Unlike C#, it is very convenient to work with
pointers in IL. It has three types of pointers.
When the function abc is called,
the variable i is passed to it as a reference parameter. In IL, the instruction
ldloca.s is used, which places the address of the variable on the stack.
Had the instruction been ldloc instead, the value of the variable would have
been placed on the stack.
In the function call, we add the
symbol & at the end of the type name to indicate the address of a variable.
& suffixed to a data type indicates the memory location of a variable, and
not the value contained in it.
In the function itself, ldarg.1
is used to place the address of parameter 1 on the stack. Then, we place the
number that we want to initialise it with, on the stack. In the above example,
we have first placed the address of the variable i on the stack, followed by
the value that we want to initialize it with i.e. 10.
The instruction stind places the
value that is present on top of the stack i.e. 10 in the variable whose address
is stored as the second item on the stack. In this case, as we have passed the
address of the variable i on the stack, the variable i is assigned the value
10.
The instruction stind is used
when an address is given on the stack. It fills up that memory location with
the specified value.
If the word ref is replaced with
the word out, the IL instructions remain the same because, in either case, the
address of a variable is being put on the stack. Thus, the difference between
ref and out is enforced by the C# compiler and has no effect on the
instructions that get executed.
The instructions alone give no way of
knowing whether the original program used ref or out. The only trace that can
survive is a flag on the parameter: current C# compilers mark an out parameter
with the [out] attribute, so that its signature reads [out] int32& i. Where
that flag is absent, this information is lost on conversion from C# code into
IL code.
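One way to check what survives of ref and out after compilation is to ask the metadata through reflection. The following is only a minimal sketch, assuming a current C# compiler; the class RefOutDemo and its functions WithRef, WithOut and Describe are hypothetical names that we have made up for this illustration.

```csharp
using System;
using System.Reflection;

public class RefOutDemo
{
    // Hypothetical sample functions: identical bodies, one ref and one out parameter.
    public void WithRef(ref int i) { i = 10; }
    public void WithOut(out int i) { i = 10; }

    // Describes the first parameter of the named function using reflection.
    public static string Describe(string methodName)
    {
        ParameterInfo p = typeof(RefOutDemo).GetMethod(methodName).GetParameters()[0];
        return p.ParameterType + " IsByRef=" + p.ParameterType.IsByRef + " IsOut=" + p.IsOut;
    }

    public static void Main()
    {
        Console.WriteLine(Describe("WithRef"));
        Console.WriteLine(Describe("WithOut"));
    }
}
```

Both parameters report the same by-reference type, which corresponds to int32& in IL. Only the IsOut flag tells them apart.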
a.cs
class zzz
{
public static void Main()
{
string s = "hi" + "bye";
System.Console.WriteLine(s);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String V_0)
ldstr "hibye"
stloc.0
ldloc.0
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
Output
hibye
The next focus is on
concatenating two strings. When both strings are constants, the C# compiler does
this by converting them into one string. This occurs due to the compiler's zest
to optimise constants. The value is stored in a local variable and then placed
on the stack. Thus, at compile time, the C# compiler optimises the code as far
as possible.
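The folding of constant strings can also be observed from within C#. The sketch below is ours and not part of the original programs; the class FoldDemo and the function SameObject are hypothetical names. It relies on the fact that identical string constants within a program refer to one and the same string object.

```csharp
using System;

public class FoldDemo
{
    // Returns true when the folded constant and the literal are the same object.
    public static bool SameObject()
    {
        string s = "hi" + "bye";   // folded by the compiler into the constant "hibye"
        string t = "hibye";
        return object.ReferenceEquals(s, t);
    }

    public static void Main()
    {
        Console.WriteLine(SameObject());   // True
    }
}
```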
a.cs
class zzz
{
public static void Main()
{
string s = "hi" ;
string t = s + "bye";
System.Console.WriteLine(t);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String V_0,class System.String V_1)
ldstr "hi"
stloc.0
ldloc.0
ldstr "bye"
call class System.String [mscorlib]System.String::Concat(class
System.String,class System.String)
stloc.1
ldloc.1
call void [mscorlib]System.Console::WriteLine(class
System.String)
ret
}
}
Output
hibye
Whenever the compiler deals with
variables, it is ignorant of their values at compile time. The following steps
are executed in the above program:
• Two variables s and t are converted
into the local variables V_0 and V_1 respectively.
• The local variable V_0 is assigned the string "hi".
• This variable is then pushed onto the
stack.
• Next, the constant string
"bye" is put on the stack.
• Thereafter, the + operator is converted
into a static function Concat, which belongs to the String class.
• This function concatenates the two
strings and creates a new string on the stack.
• This concatenated string is stored in
the variable V_1.
• The concatenated string is finally
printed out.
There are two PLUS (+) operators
in C#:
• One handles strings. This operator gets
converted into the function Concat from the String class in IL.
• The other one handles numbers. This
operator gets converted to the add instruction in IL.
Thus, the String class and its
functions are built into the C# compiler. We can therefore conclude that, C#
can understand and handle String operations.
a.cs
class zzz
{
public static void Main()
{
string a = "bye";
string b = "bye";
System.Console.WriteLine(a == b);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String V_0,class System.String V_1)
ldstr "bye"
stloc.0
ldstr "bye"
stloc.1
ldloc.0
ldloc.1
call bool [mscorlib]System.String::Equals(class
System.String,class System.String)
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
}
Output
True
Like the + operator, when the ==
operator is used with strings, the compiler converts it into the function
Equals.
From the above examples, we can
deduce that the C# compiler is totally at ease with strings. The next version
will introduce many more of such classes which the compiler shall understand
intuitively.
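The special treatment of strings by the == operator depends on the compile time type of the operands. The following is a small sketch of our own; the class EqualsDemo and its two functions are hypothetical names. When both operands are typed as string, the values are compared. When they are typed as object, only the references are compared.

```csharp
using System;

public class EqualsDemo
{
    // Both operands are strings: == compares the characters.
    public static bool ValueCompare()
    {
        string a = "bye";
        string b = new string(new char[] { 'b', 'y', 'e' });
        return a == b;
    }

    // Both operands are objects: == compares the references only.
    public static bool ReferenceCompare()
    {
        object a = "bye";
        object b = new string(new char[] { 'b', 'y', 'e' });
        return a == b;
    }

    public static void Main()
    {
        Console.WriteLine(ValueCompare());       // True
        Console.WriteLine(ReferenceCompare());   // False
    }
}
```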
a.cs
class zzz
{
public static void Main()
{
System.Console.WriteLine((char)65);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 65
call void [mscorlib]System.Console::WriteLine(wchar)
ret
}
}
Output
A
Whenever we cast a constant,
like a numeric value to a character value, internally, the program merely calls
the function with the data type of the cast. A cast does not modify the
original value. What actually happens is that, instead of the WriteLine function
being called with an int, it gets called with a wchar. Thus a cast such as this
one does not incur any run-time overhead.
a.cs
class zzz
{
public static void Main()
{
char i = 'a';
System.Console.WriteLine((char)i);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (wchar V_0)
ldc.i4.s 97
stloc.0
ldloc.0
call void [mscorlib]System.Console::WriteLine(wchar)
ret
}
}
Output
a
The char data type of C# has a
size of 16 bits, i.e. 2 bytes. It is converted into a wchar on conversion to IL.
The character 'a' gets converted into the ASCII number 97. This is placed on the
stack and the variable V_0 is initialised to this value. Thereafter, the program
displays the value 'a' on the screen.
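The size of a char and the number behind a character can be verified with a few lines of C#. This is merely a sketch; the class CharDemo and its functions are hypothetical names introduced for the purpose.

```csharp
using System;

public class CharDemo
{
    // Size of a C# char in bytes.
    public static int CharSizeInBytes() { return sizeof(char); }

    // The number that represents a character.
    public static int CodeOf(char c) { return (int)c; }

    public static void Main()
    {
        Console.WriteLine(CharSizeInBytes());   // 2
        Console.WriteLine(CodeOf('a'));         // 97
        Console.WriteLine((char)65);            // A
    }
}
```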
a.cs
class zzz
{
public static void Main()
{
System.Console.WriteLine('\u0041');
System.Console.WriteLine(0x41);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 65
call void [mscorlib]System.Console::WriteLine(wchar)
ldc.i4.s 65
call void [mscorlib]System.Console::WriteLine(int32)
ret
ret
}
}
Output
A
65
The IL output carries no trace of
the UNICODE escape or of the HEXADECIMAL notation. Both values appear as the
plain and simple decimal number 65. The \u escape sequence is provided as a
convenience to C# programmers, to enhance their productivity.
You may have noticed that, even
though the above program has two ret instructions, no error is generated. The
criterion is that at least one ret instruction should be present.
a.cs
class zzz
{
public static void Main()
{
int @int;
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0)
ret
}
}
Variables created on the stack
in C# are not given the same names on conversion to IL. So, the situation where
a reserved word of C# could create a problem in IL, does not arise.
a.cs
class zzz
{
int @int;
public static void Main()
{
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private int32 'int'
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
In the above program, the
variable @int becomes a field named int and the int datatype is changed to
int32, which is a reserved word in IL. As the word int is reserved in IL too,
the compiler writes the field name in single inverted commas. On conversion to
IL, the @ sign simply disappears from the name of the variable.
a.cs
// hi this is comment
class zzz {
public static void Main() // allowed here
{
/*
A comment over
two lines
*/
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
When you see the above code, you
will realize why programmers the world over have an aversion to writing
comments. All comments in C# are stripped off when the IL file is generated. Not
a single comment is copied over into the IL code.
The compiler has scant respect
for comments, and it throws all of them away. There is little wonder that
programmers consider writing comments as an exercise in futility, and their
frustration is well founded.
a.cs
class zzz
{
public static void Main()
{
System.Console.WriteLine("hi \nBye\tNo");
System.Console.WriteLine("\\");
System.Console.WriteLine(@"hi \nBye\tNo");
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldstr "hi \nBye\tNo"
call void
[mscorlib]System.Console::WriteLine(class System.String)
ldstr "\\"
call void
[mscorlib]System.Console::WriteLine(class System.String)
ldstr "hi \\nBye\\tNo"
call void [mscorlib]System.Console::WriteLine(class
System.String)
ret
}
}
Output
hi
Bye No
\
hi \nBye\tNo
The String handling capabilities
of C# have been inherited from IL. The escape sequences like \n have been
simply copied over.
The two backslashes (\\) result
in a single backslash when displayed.
If a string is prefaced with an
@ sign, the special meaning of the escape sequences in the string is ignored
and they are displayed verbatim, as shown in the program above.
If IL had not provided support
for string formatting, it would have been vexed with the predicament of
handling most of the modern programming languages.
a.cs
#define vijay
class zzz {
public static void Main()
{
#if vijay
System.Console.WriteLine("1");
#else
System.Console.WriteLine("2");
#endif
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed {
.entrypoint
ldstr "1"
call void [mscorlib]System.Console::WriteLine(class
System.String)
ret
ret
}
}
Output
1
The next series of programs
deals with the pre-processor directives. C# has no separate pre-processor; the
compiler handles these directives itself while reading the source, and none of
them reach the IL code.
In the above .cs program, the
#define directive creates a word called "vijay". The compiler knows
that the #if statement is TRUE and therefore, it ignores the #else statement.
Thus, the IL file that is generated contains only the WriteLine function that
has the parameter '1' and not the one that has the parameter '2'.
This is the power of compile
time knowledge. A large amount of the code that is never going to be used is
simply eliminated while the directives are processed, prior to converting the
rest into IL.
a.cs
#define vijay
#undef vijay
#undef vijay
class zzz {
public static void Main() {
#if vijay
System.Console.WriteLine("1");
#endif
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
We can use as many #undef
statements as we like. The compiler knows that the word 'vijay' has been
undefined and therefore, it ignores the code in the #if statement.
There is no way the original pre-processor directives can be
recovered on re-conversion of code from IL to C#.
a.cs
#warning We have a code red
class zzz
{
public static void Main()
{
}
}
The pre-processor directive
#warning in C# is used to display warnings for the benefit of the programmer
who runs the compiler.
The pre-processor directives
#line and #error also do not produce any executable output. They are used
merely for providing information.
Inheritance
a.cs
class zzz
{
public static void Main()
{
xxx a = new xxx();
a.abc();
}
}
class yyy
{
public void abc()
{
System.Console.WriteLine("yyy abc");
}
}
class xxx : yyy
{
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class xxx V_0)
newobj instance void
xxx::.ctor()
stloc.0
ldloc.0
call instance void yyy::abc()
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig instance void abc() il managed
{
ldstr "yyy abc"
call void [mscorlib]System.Console::WriteLine(class
System.String)
ret
}
}
.class private auto ansi xxx extends yyy
{
}
Output
yyy abc
The concept of inheritance is
much the same in all programming languages that support it. The word extends is
used by IL and Java; C# uses a colon instead.
When we write a.abc(), the
compiler decides on the abc function to call based on the following criteria:
• If the class xxx has a function abc,
then the call in function vijay will have the prefix xxx.
• If the class yyy has a function abc,
then the call in function vijay will have the prefix yyy.
Therefore, the intelligence that
decides as to which function abc is to be called, resides in the compiler and
not in the generated IL code.
a.cs
class zzz {
public static void Main()
{
yyy a = new xxx();
a.abc();
}
}
class yyy
{
public virtual void abc()
{
System.Console.WriteLine("yyy abc");
}
}
class xxx : yyy
{
public new void abc()
{
System.Console.WriteLine("xxx abc");
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void xxx::.ctor()
stloc.0
ldloc.0
callvirt instance void yyy::abc()
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig newslot virtual instance void abc() il
managed
{
ldstr "yyy abc"
call void
[mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class private auto ansi xxx extends yyy
{
.method public hidebysig instance void abc() il managed
{
ldstr "xxx abc"
call void
[mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
Output
yyy abc
In the context of the above
program, a small explanation would not be out of place for the benefit of C#
neophytes.
We can equate an object a of a
base class yyy to a derived class xxx. We have called the function a.abc(). The
question that comes to the fore is: which of the following two versions of the
function abc will be called ?
• The function abc present in the base
class yyy, to which the calling object belongs.
OR
• The function abc present in the class
xxx, which is the type that it has been initialised to.
In other words, is the compile
time type significant or the runtime type ?
The base class function has a
modifier called virtual, implying that the derived classes can override this
function. The derived class, by adding the modifier new, informs the compiler
that this function abc has nothing to do with the function abc of the base
class. It is to treat them as separate entities.
First, the this pointer is put
on the stack using ldloc.0. Then, in place of a call instruction, there is a
callvirt. This is because the function abc is virtual. Other than this, there
exists no difference. The function abc in class yyy is declared virtual and is
also tagged with newslot. This signifies that it is a new virtual function. The
word new is placed in the derived class in C#.
IL also uses a mechanism similar
to that of C#, to figure out as to which version of abc is to be called.
a.cs
class zzz
{
public static void Main()
{
yyy a = new xxx();
a.abc();
}
}
class yyy
{
public virtual void abc()
{
System.Console.WriteLine("yyy abc");
}
}
class xxx : yyy
{
public override void abc()
{
System.Console.WriteLine("xxx abc");
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void
xxx::.ctor()
stloc.0
ldloc.0
callvirt instance void
yyy::abc()
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig newslot virtual instance void abc() il
managed
{
ldstr "yyy abc"
call void
[mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class private auto ansi xxx extends yyy
{
.method public hidebysig virtual instance void abc() il managed
{
ldstr "xxx abc"
call void
[mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void yyy::.ctor()
ret
}
}
Output
xxx abc
If the base constructor of class
xxx is not called, no output is displayed in the output window. As a rule, we
have not included the free constructor code in our IL programs.
In absence of the keywords new
or override, the default keyword used is new. In the above function abc, in
class xxx, we have used the override keyword, which implies that this function
abc overrides the function of the base class.
By default, the callvirt
instruction names the function in the class which the object looks like, i.e.
the compile time type. In this case, it is yyy.
The first change that occurs
with override in the derived class is the addition of the word virtual to the
function prototype. This was not supplied earlier with new because a new
function got created altogether which isolated itself from the base class.
The use of override effectively
results in the overriding of the base class function. This makes the function
abc a virtual function in the class xxx. In other words, override becomes
virtual whereas, new becomes nothing.
As there is a newslot modifier
in the base class and a virtual function of the same name in the derived class,
the function of the derived class gets called.
In a virtual function, the run
time type of the object gets preference. The instruction callvirt resolves this
issue at run-time and not at compile time.
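The difference between new and override can be seen side by side in a short C# program. This is only a sketch of the behaviour described above; the class names Parent, Hider, Overrider and DispatchDemo are hypothetical and do not appear in the earlier programs.

```csharp
using System;

public class Parent
{
    public virtual string Abc() { return "Parent abc"; }
}

public class Hider : Parent
{
    // new: a separate function that has no link with Parent.Abc.
    public new string Abc() { return "Hider abc"; }
}

public class Overrider : Parent
{
    // override: replaces Parent.Abc for objects of this class.
    public override string Abc() { return "Overrider abc"; }
}

public class DispatchDemo
{
    // The compile time type of b is Parent; the run time type may differ.
    public static string CallThroughParent(Parent b) { return b.Abc(); }

    public static void Main()
    {
        Console.WriteLine(CallThroughParent(new Hider()));       // Parent abc
        Console.WriteLine(CallThroughParent(new Overrider()));   // Overrider abc
    }
}
```

In both calls the compile time type is Parent, so the same call is generated. Only the class that used override changes the result.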
a.cs
class zzz
{
public static void Main()
{
yyy a = new xxx();
a.abc();
}
}
class yyy
{
public virtual void abc()
{
System.Console.WriteLine("yyy abc");
}
}
class xxx : yyy
{
public override void abc()
{
base.abc();
System.Console.WriteLine("xxx abc");
}
}
a.il
.method public hidebysig virtual instance void abc() il managed
{
ldarg.0
call instance void
yyy::abc()
ldstr "xxx abc"
call void
[mscorlib]System.Console::WriteLine(class System.String)
ret
}
Only the code of the function
abc in class xxx has been shown above. The rest of the IL code has been
omitted. base.abc() calls the function abc from the base class, i.e. class yyy.
The keyword base refers to the same object in memory, viewed as its base class.
This keyword of C# is not understood by IL as it is a compile time issue; the
compiler simply emits a call, and not a callvirt, to yyy::abc. Thus, base does
not care whether the function is virtual or not.
Whenever we make a function
virtual for the first time, it is a good idea to mark it as newslot, solely to
signify a break from all the functions with the same name present in the
superclasses.
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
newobj instance void yyy::.ctor()
callvirt instance void iii::pqr()
ret
}
}
.class interface iii
{
.method public virtual abstract void pqr() il managed
{
}
}
.class public yyy implements iii
{
.override iii::pqr with instance void yyy::abc()
.method public virtual hidebysig newslot instance void abc() il
managed
{
ldstr "yyy abc"
call void System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ret
}
}
Output
yyy abc
We have created an interface iii
with just one function called pqr. Then, the class yyy implements the
interface iii but does not contain a function called pqr. Instead, it adds a
function called abc. In the entrypoint function vijay, function pqr is called
off the interface iii.
The reason we get no errors is
due to the presence of the override directive. This directive informs the
assembler to redirect any call made to the function pqr off interface iii, to
the class yyy function abc. The assembler is very serious about the override
directive. This can be gauged from the fact that without the implements iii in
the definition of class yyy we are given the following exception:
Output
Exception occurred: System.TypeLoadException: Class yyy tried
to override method pqr but does not implement or inherit that methods.
at zzz.vijay()
Destructors
a.cs
class zzz
{
public static void Main()
{
}
~zzz()
{
System.Console.WriteLine("hi");
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
.method family hidebysig virtual instance void Finalize() il
managed
{
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class
System.String)
ldarg.0
call instance void
[mscorlib]System.Object::Finalize()
ret
}
}
No output
A destructor gets converted into
a function called Finalize. This piece of information is also laid down in the
C# documentation. The Finalize function calls the original from Object. The
text "hi" does not get displayed because the function is called as
and when the runtime decides. All we know is that it gets called when the
object dies. There is no way of explicitly destroying a .NET object.
a.cs
class zzz
{
public zzz()
{
}
public zzz(int i)
{
}
public static void Main()
{
}
~zzz()
{
System.Console.WriteLine("hi");
}
}
class yyy : zzz
{
}
a.il
.class private auto ansi yyy extends zzz
{
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void zzz::.ctor()
ret
}
}
In the above code, we’ve
displayed only the yyy class. Even though the class zzz has 2 constructors and
1 destructor, the class yyy only receives the free constructor with no
parameters. Thus, derived classes do not inherit constructors or destructors of
the base class.
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
call void yyy::abc()
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Array
{
.method public hidebysig static void abc() il managed
{
ldstr "hi"
call void [mscorlib]System.Console::WriteLine(class
System.String)
ret
}
}
Output
hi
In C#, we are not allowed to
derive a class from certain classes like System.Array. However, in IL there is
no such restriction. Thus, the above code does not generate any error.
We can safely conclude that the
C# compiler has added the above restrictions and that IL is less restrictive.
The rules of a language are decided by the compiler at compile time.
For your information, the other
classes that we cannot derive from, in C#, are Delegate, Enum and ValueType.
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class aa V_0)
newobj instance void aa::.ctor()
stloc.0
ret
}
}
.class public auto ansi aa extends bb
{
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void
bb::.ctor()
ldstr "aa"
call void [mscorlib]System.Console::WriteLine(class
System.String)
ret
}
}
.class public auto ansi bb extends cc
{
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void
cc::.ctor()
ldstr "bb"
call void
[mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class public auto ansi cc extends aa
{
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void aa::.ctor()
ldstr "cc"
call void
[mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
Error
Exception occurred: System.TypeLoadException: Could not load
class 'aa' because the format is bad (too long?)
at zzz.vijay()
We are forbidden to have a
circular reference in C#. The compiler checks for it and if found, reports an
error. The IL assembler, however, does not check for a circular reference
because Microsoft does not expect all programmers to use pure IL. The problem
surfaces only at runtime, when the class is loaded.
Here, class aa extends bb,
class bb extends cc and finally class cc extends aa. This completes the
circular reference. The exception that is
thrown at runtime does not give any indication of a circular reference. Thus,
if we had not unravelled this mystery for you here, the exception would have
most probably left you baffled. We do not intend to disclose the fact that we
have understood IL deeply, but there is no harm in giving oneself a pat on the
back, once in a while.
a.cs
internal class zzz
{
public static void Main()
{
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
The keyword internal leaves no
mark of its own in IL. A top-level class is internal by default in C#, and with
or without the keyword it is marked private in IL. The keyword internal
signifies that the particular class can only be accessed from within the
assembly in which it is present.
Thus, by mastering IL, we are in
a position to differentiate between the
core belongings of .NET and features existing in the realms of C#.
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ret
}
}
.class public auto ansi yyy extends xxx
{
}
.class private auto ansi xxx extends [mscorlib]System.Object
{
}
In C#, there is a rule: the
base class has to be at least as accessible as the derived class. This rule is
not adhered to in IL. Thus even though the base class xxx is private and the
derived class yyy is public, no error is generated in IL.
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig
static void vijay() il managed
{
.entrypoint
ret
}
}
A function in C# can never be
effectively more accessible than the class within which it resides. The
function vijay is public, whereas the class that it is located in is private.
Thus, the class is more restrictive than the function contained in it. IL does
not object to such a declaration.
a.cs
class zzz
{
public static void Main()
{
yyy a = new yyy();
xxx b = new xxx();
a = b;
b = (xxx) a;
}
}
class yyy
{
}
class xxx : yyy
{
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig
static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0,class xxx V_1)
newobj instance void
yyy::.ctor()
stloc.0
newobj instance void
xxx::.ctor()
stloc.1
ldloc.1
stloc.0
ldloc.0
castclass xxx
stloc.1
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
}
.class private auto ansi xxx extends yyy
{
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void
yyy::.ctor()
ret
}
}
Without a constructor in xxx,
the following exception is thrown:
Output
Exception occurred: System.InvalidCastException: An exception
of type System.InvalidCastException was thrown.
at zzz.vijay()
In the above example, we are
creating two objects a and b, that are instances of classes yyy and xxx
respectively. The class xxx is the derived class and yyy is the base class. We
can write a = b but, if we assign a base class object to a derived class
variable, as in b = a, an error is generated. Thus, a cast operator is required.
A cast in C# gets converted to
the instruction castclass, followed by the name of the derived class that the
object has to be cast into. If it cannot be cast, the above mentioned
exception will be raised.
When the constructor is removed
from the above code, the exception is generated.
Thus, IL has a number of higher
level primitives that deal with objects and classes.
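To see the castclass check from the C# side, here is a small sketch of our own. It is not one of the listings above, and the class name CastDemo and the helper TryCast are hypothetical names introduced only for illustration. It casts one base class reference that really points to an xxx object and one that does not.

```csharp
using System;

public class yyy { }
public class xxx : yyy { }

public class CastDemo
{
    // Hypothetical helper: returns true if the cast (xxx)a succeeds.
    public static bool TryCast(yyy a)
    {
        try
        {
            xxx b = (xxx)a;      // the C# cast that becomes castclass xxx
            return b != null;
        }
        catch (InvalidCastException)
        {
            return false;        // the object was not an xxx
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryCast(new xxx()));  // True
        Console.WriteLine(TryCast(new yyy()));  // False
    }
}
```

The first call succeeds because the object really is an xxx. The second fails at run time with InvalidCastException, which the helper catches.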
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig
static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0,class xxx V_1)
newobj instance void
yyy::.ctor()
stloc.0
newobj instance void
xxx::.ctor()
stloc.1
ldloc.1
stloc.0
ldloc.0
castclass xxx
stloc.1
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
}
.class private auto ansi xxx extends [mscorlib]System.Object
{
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void System.Object::.ctor()
ret
}
}
In the above case, the class xxx
does not derive from class yyy anymore. Both extend the Object class.
Yet, we are allowed to cast an object of class yyy to class xxx. No error is
generated with a constructor in the class xxx, but on removal of the
constructor, an exception is generated. IL too has its own strange way of working.
a.il
.assembly mukhi {}
.class private auto ansi sealed zzz extends
[mscorlib]System.Object
{
.method public hidebysig
static void vijay() il managed
{
.entrypoint
ret
}
}
.class private auto ansi yyy extends zzz
{
}
The documentation states very
clearly that a sealed class cannot be extended or sub-classed any further. In
this case, an error was expected but none was generated. We must remind you
that we are working on a beta copy. The next version may generate an error.
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig
static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void
yyy::.ctor()
stloc.0
ret
}
}
.class private auto ansi abstract yyy
{
}
An abstract class cannot be used
directly. It can only be derived from. The above code should have generated an
error, but it does not.
a.cs
public class zzz
{
const int i = 10;
public static void Main()
{
System.Console.WriteLine(i);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig
static void vijay() il managed
{
.entrypoint
ldc.i4.s 10
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
10
A constant is an entity that
exists only at compile time. It is not visible at run time. This proves that
the compiler removes all traces of compile time objects. On conversion to IL,
all occurrences of the constant i in the C# code get replaced by the number 10.
a.cs
public class zzz
{
const int i = j + 4;
const int j = k - 1;
const int k = 3;
public static void Main() {
System.Console.WriteLine(k);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private static literal int32 i = int32(0x00000006)
.field private static literal int32 j = int32(0x00000002)
.field private static literal int32 k = int32(0x00000003)
.method public hidebysig
static void vijay() il managed
{
.entrypoint
ldc.i4.3
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
3
All the constants are evaluated
by the compiler and, even though they may refer to other constants, they are
given absolute values. The runtime does not allocate any memory for literal
fields. This falls in the realm of metadata, which we shall explain later.
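The literal fields can also be inspected from C# through reflection. The sketch below is our own illustration. The helper Field is a hypothetical name, and the lookup flags assume the constants are private, as in the listing above.

```csharp
using System;
using System.Reflection;

public class zzz
{
    const int i = j + 4;
    const int j = k - 1;
    const int k = 3;

    // Hypothetical helper: looks up a private static field of zzz by name.
    public static FieldInfo Field(string name)
    {
        return typeof(zzz).GetField(name, BindingFlags.NonPublic | BindingFlags.Static);
    }

    public static void Main()
    {
        foreach (string name in new string[] { "i", "j", "k" })
        {
            FieldInfo f = Field(name);
            // IsLiteral is true for fields marked literal in IL;
            // GetRawConstantValue reads the value stored in the metadata.
            Console.WriteLine(name + " literal=" + f.IsLiteral
                + " value=" + f.GetRawConstantValue());
        }
    }
}
```

Each constant reports literal=True, with the values 6, 2 and 3 that the compiler worked out.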
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field private static literal int32 i = int32(0x00000006)
.method public hidebysig
static void vijay() il managed
{
.entrypoint
ldc.i4.6
stsfld int32 zzz::i
ret
}
}
Output
Exception occurred: System.MissingFieldException: zzz.i
at zzz.vijay()
A literal field represents a
constant value. In IL, we are not allowed to access any literal field. The
assembler does not generate any error at the time of assembling, but an
exception is thrown at run time. We expected a compile time error, since we have
used a literal field in the instruction stsfld.
a.cs
public class zzz
{
public static readonly int i = 10;
public static void Main()
{
System.Console.WriteLine(i);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field public static initonly int32 i
.method public hidebysig
static void vijay() il managed
{
.entrypoint
ldsfld int32 zzz::i
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method public hidebysig specialname rtspecialname static void
.cctor() il managed
{
ldc.i4.s 10
stsfld int32 zzz::i
ret
}
}
Output
10
A readonly field cannot be
modified. In IL, we have a modifier called initonly which implements the same
concept.
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.field public static initonly int32 i
.method public hidebysig
static void vijay() il managed
{
.entrypoint
ldc.i4.s 10
stsfld int32 zzz::i
ldsfld int32 zzz::i
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
The documentation very clearly
states that initonly fields can only be changed in the constructor, but the CLR
(Common Language Runtime) does not strictly check this. Maybe the next
version will guard against such occurrences.
Thus, the entire series of
restrictions on readonly has to be enforced by the programming language that
converts the source code to IL. We are not trying to run down IL, but IL
expects someone else to do the error checking in this situation.
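The link between readonly and initonly can be confirmed from C# with reflection. This is a minimal sketch of our own, using the same field as the readonly listing above. The helper Info is a hypothetical name.

```csharp
using System;
using System.Reflection;

public class zzz
{
    public static readonly int i = 10;

    // Hypothetical helper: returns the metadata for the public field i.
    public static FieldInfo Info()
    {
        return typeof(zzz).GetField("i");
    }

    public static void Main()
    {
        FieldInfo f = Info();
        Console.WriteLine(f.IsInitOnly);  // True: readonly in C# is initonly in IL
        Console.WriteLine(f.IsLiteral);   // False: unlike a const, it is a real field
        Console.WriteLine(i);             // 10
    }
}
```

The C# compiler refuses an assignment to i anywhere outside the static constructor, which is the language-level enforcement described above.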
a.cs
public class zzz
{
public static void Main()
{
zzz a = new zzz();
pqr();
a.abc();
}
public static void pqr()
{
}
public void abc()
{
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig
static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0)
newobj instance void zzz::.ctor()
stloc.0
call void zzz::pqr()
ldloc.0
call instance void zzz::abc()
ret
}
.method public hidebysig static void pqr() il managed
{
ret
}
.method public hidebysig instance void abc() il managed
{
ret
}
}
This example serves as a
refresher. The static function pqr is not passed the this pointer on the
stack, whereas the non-static function abc is passed the this pointer, a
reference to where its variables are stored in memory.
Thus, before the call to
function abc, the instruction ldloc.0 pushes the reference to the zzz object
onto the stack.
a.cs
public class zzz
{
public static void Main()
{
pqr(10,20);
}
public static void pqr(int i , int j)
{
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig
static void vijay() il managed
{
.entrypoint
ldc.i4.s 10
ldc.i4.s 20
call void zzz::pqr(int32,int32)
ret
}
.method public hidebysig static void pqr(int32 i,int32 j) il managed
{
ret
}
}
The calling convention indicates
the order in which the parameters should be pushed onto the stack. The default
sequence in IL is the order in which they were written. Thus, the number 10
goes onto the stack first, followed by the number 20.
Microsoft's native calling
conventions, such as cdecl and stdcall, implement the reverse order. There, 20
goes on the stack first, followed by 10. We cannot reason out this
idiosyncrasy.
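The left to right order can be observed from C# as well, since the arguments are evaluated in the order in which they are written. The sketch below is our own. The helper note and the list order are hypothetical names used to record when each argument is evaluated.

```csharp
using System;
using System.Collections.Generic;

public class zzz
{
    public static List<int> order = new List<int>();

    // Hypothetical helper: records the value and passes it through.
    public static int note(int n)
    {
        order.Add(n);
        return n;
    }

    public static void pqr(int i, int j)
    {
    }

    public static void Main()
    {
        pqr(note(10), note(20));
        Console.WriteLine(string.Join(",", order));  // 10,20
    }
}
```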
a.cs
public class zzz
{
public static void Main()
{
bb a = new bb();
}
}
public class aa
{
public aa()
{
System.Console.WriteLine("in const aa");
}
public aa(int i)
{
System.Console.WriteLine("in const aa" + i);
}
}
public class bb : aa
{
public bb() : this(20)
{
System.Console.WriteLine("in const bb");
}
public bb(int i) : base(i)
{
System.Console.WriteLine("in const bb" + i);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig
static void vijay() il managed
{
.entrypoint
.locals (class bb V_0)
newobj instance void bb::.ctor()
stloc.0
ret
}
}
.class public auto ansi aa extends [mscorlib]System.Object
{
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "in const aa"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance
void .ctor(int32 i) il managed
{
ldarg.0
call instance void [mscorlib]System.Object::.ctor()
ldstr "in const aa"
ldarga.s i
box [mscorlib]System.Int32
call class System.String [mscorlib]System.String::Concat(class System.Object,
class System.Object)
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
.class public auto ansi bb extends aa
{
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
ldc.i4.s 20
call instance void bb::.ctor(int32)
ldstr "in const bb"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
.method public hidebysig specialname rtspecialname instance
void .ctor(int32 i) il managed
{
ldarg.0
ldarg.1
call instance void aa::.ctor(int32)
ldstr "in const bb"
ldarga.s i
box [mscorlib]System.Int32
call class System.String [mscorlib]System.String::Concat(class System.Object,
class System.Object)
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
Output
in const aa20
in const bb20
in const bb
We have created only one object,
which is an instance of the class bb. Instead of two constructors, one from the
base class and one from the derived class, three constructors are called.
• In IL, at first, a call is made to
the constructor of bb with no parameters.
• Then, on reaching this constructor of bb,
a call is made to another constructor of the same class, with a parameter
value of 20. this(20) gets converted into an actual call to the constructor
with one parameter.
• Now, we move on to the one-parameter
constructor of bb. Here, a call to the one-parameter constructor of aa is made
first, as the base class constructor needs to be called first.
Luckily, the base class
constructor of aa does not take us on another wild goose chase. After it
finishes execution, the strings are displayed in turn and, finally, the
remaining code in the constructor of bb that has no parameters gets executed.
Thus, base and this do not exist
in IL and are compile time artefacts that get hard coded into the IL code.
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object {
.method public hidebysig
static void vijay() il managed {
.entrypoint
.locals (class aa V_0)
newobj instance void aa::.ctor()
ret
}
}
.class public auto ansi aa extends [mscorlib]System.Object {
.method private hidebysig specialname rtspecialname instance
void .ctor() il managed
{
ret
}
}
Output
Exception occurred: System.MethodAccessException: aa..ctor()
at zzz.vijay()
We cannot access a private
member from outside the class. Thus, as we have made the only constructor
in the class aa private, we are not allowed to create any object that looks
like class aa. In C#, the same rules apply to the access modifiers.
a.cs
public class zzz
{
public static void Main()
{
yyy a = new yyy();
}
}
class yyy
{
public int i;
public bool j;
public yyy()
{
System.Console.WriteLine(i);
System.Console.WriteLine(j);
}
}
a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class yyy V_0)
newobj instance void
yyy::.ctor()
stloc.0
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.field public int32 i
.field public bool j
.method public hidebysig specialname rtspecialname instance void
.ctor() il managed
{
ldarg.0
call instance void
[mscorlib]System.Object::.ctor()
ldarg.0
ldfld int32 yyy::i
call void
[mscorlib]System.Console::WriteLine(int32)
ldarg.0
ldfld bool yyy::j
call void [mscorlib]System.Console::WriteLine(bool)
ret
}
}
Output
0
False
Here, the variables i and j are
not initialized explicitly. Thus, the compiler emits no initialization code
for these fields in the constructor of class yyy. Before any code in class yyy
gets called, these variables are assigned their default values, which depend
upon their data type. The runtime zeroes the memory of the object when newobj
allocates it, which gives 0 for the int and False for the bool.
a.cs
class zzz
{
public static void Main()
{
int i = 10;
string j;
j = i >= 20 ? "hi" : "bye";
System.Console.WriteLine(j);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed {
.entrypoint
.locals (int32 V_0,class System.String V_1)
ldc.i4.s 10
stloc.0
ldloc.0
ldc.i4.s 20
bge.s IL_000f
ldstr "bye"
br.s IL_0014
IL_000f: ldstr "hi"
IL_0014: stloc.1
ldloc.1
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
Output
bye
The ternary operator is a
glorified if statement compressed into a single line. The variables i and j in
C# become V_0 and V_1 on conversion to IL. We first initialize variable V_0 to
10 and then place its value on the stack, followed by the condition value 20.
The instruction bge.s is based
on the instructions clt and brfalse.
• If the condition is TRUE, bge.s
executes a jump to the label IL_000f, where the string hi is placed on the
stack.
• If the condition is FALSE, the program
falls through to the next instruction, places the string bye on the stack and
jumps to the label IL_0014.
At the label IL_0014, the string
is stored in V_1. Then, the program proceeds to the WriteLine function and
prints the appropriate text.
From the resultant IL code,
there is no way of deciphering whether the original C# code had used an if
statement or a ?: operator. A large number of operators in C#, such as the
ternary operator, have been borrowed from the C programming language.
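The two forms can be placed side by side in C#. In the sketch below, which is our own illustration with the hypothetical function names withTernary and withIf, both functions make the same decision. The exact IL generated for each may differ in small details from one compiler version to another.

```csharp
using System;

public class zzz
{
    // The ?: form used in the listing above.
    public static string withTernary(int i)
    {
        return i >= 20 ? "hi" : "bye";
    }

    // The same decision written as an if statement.
    public static string withIf(int i)
    {
        string j;
        if (i >= 20)
            j = "hi";
        else
            j = "bye";
        return j;
    }

    public static void Main()
    {
        Console.WriteLine(withTernary(10) + " " + withIf(10));  // bye bye
        Console.WriteLine(withTernary(20) + " " + withIf(20));  // hi hi
    }
}
```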
a.cs
class zzz
{
public static void Main()
{
int i = 1, j= 2;
if ( i >= 4 & j > 1)
System.Console.WriteLine("& true");
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,int32 V_1)
ldc.i4.1
stloc.0
ldc.i4.2
stloc.1
ldloc.0
ldc.i4.4
clt
ldc.i4.0
ceq
ldloc.1
ldc.i4.1
cgt
and
brfalse.s IL_001c
ldstr "& true"
call void
[mscorlib]System.Console::WriteLine(class System.String)
IL_001c: ret
}
}
The & operator in C# makes
the if statement more complex. It only returns TRUE if both the conditions are
TRUE. Otherwise, it returns FALSE. The & operator itself maps to the and
instruction in IL, but there is no IL instruction that compares for >=. Thus,
the condition is implemented in a roundabout way as follows:
• First we use the ldc instruction to
place a constant value on the stack.
• Next, the instruction stloc initializes
variables i and j i.e. V_0 and V_1.
• Then, the value of V_0 is placed on the
stack.
• Thereafter, the condition value 4 is
placed on the stack.
• Then, the instruction clt is used to
check if the first value is less than the second. If it is, as is
the case in the above example, then the value 1 (TRUE) is put on the stack.
• The original expression in C# is i
>= 4. In IL, a check for < or clt is made and its result is then inverted.
• To invert it, we place zero on the stack
and check for equality i.e. == using ceq. Since 1 is not equal to 0, this
results in a FALSE.
• Then we follow the same rules for j
> 1. Here, we use cgt instead of clt. The result of the cgt instruction is
TRUE.
• This result of TRUE is ANDED with the
previous result of FALSE to finally give a FALSE value.
Note that the and instruction
will return a 1 if, and only if, both the conditions are TRUE. In all other
cases, it will return 0 or FALSE.
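The sequence of instructions can be traced by hand in C#. The sketch below is our own. The functions clt, cgt, ceq and evaluate are hypothetical helpers that mimic the IL instructions of the same names by returning 1 for TRUE and 0 for FALSE.

```csharp
using System;

public class zzz
{
    // Hypothetical helpers that mimic the IL comparison instructions.
    public static int clt(int a, int b) { return a < b ? 1 : 0; }
    public static int cgt(int a, int b) { return a > b ? 1 : 0; }
    public static int ceq(int a, int b) { return a == b ? 1 : 0; }

    // Follows the IL sequence for: i >= 4 & j > 1
    public static int evaluate(int i, int j)
    {
        int first = ceq(clt(i, 4), 0);  // i >= 4 written as (i < 4) == 0
        int second = cgt(j, 1);         // j > 1
        return first & second;          // the and instruction
    }

    public static void Main()
    {
        Console.WriteLine(evaluate(1, 2));  // 0, as in the example above
        Console.WriteLine(evaluate(5, 2));  // 1
    }
}
```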
a.cs
class zzz
{
public static void Main()
{
int i = 1, j= 2;
if ( i >= 4 && j > 1)
System.Console.WriteLine("&& true");
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32 V_0,int32 V_1)
ldc.i4.1
stloc.0
ldc.i4.2
stloc.1
ldloc.0
ldc.i4.4
blt.s IL_0016
ldloc.1
ldc.i4.1
ble.s IL_0016
ldstr "&& true"
call void
[mscorlib]System.Console::WriteLine(class System.String)
IL_0016: ret
}
}
Operators like the &&
operator are called short circuit operators as they evaluate the second
condition only if the first condition is TRUE. The IL code starts out the same
as earlier, but now the first condition is checked by the instruction blt.s, a
combination of the clt and brtrue instructions.
If the first condition is FALSE,
i.e. i is less than 4, a jump is made to the ret instruction at label IL_0016.
Only if the condition is TRUE do we proceed further and check the second
condition. For this, we use the instruction ble.s, which is a combination of
cgt and brfalse. If the second condition is FALSE, we jump to the ret
instruction as before, and for TRUE we execute the WriteLine function.
The && operator can execute
faster than the & operator because it evaluates the second condition only if
the first one results in TRUE. When the first condition is FALSE, it alone
decides the final outcome.
The | and || operators also
behave in a similar manner.
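The difference becomes visible when the second condition has a side effect. The sketch below is our own illustration. The counter calls and the functions second, callsWithAnd and callsWithAndAnd are hypothetical names used to count how often the second condition is evaluated.

```csharp
using System;

public class zzz
{
    public static int calls = 0;

    // Hypothetical second condition that counts its own evaluations.
    public static bool second()
    {
        calls++;
        return true;
    }

    // Evaluates i >= 4 & second() and reports how often second() ran.
    public static int callsWithAnd(int i)
    {
        calls = 0;
        bool result = i >= 4 & second();
        return calls;
    }

    // Evaluates i >= 4 && second() and reports how often second() ran.
    public static int callsWithAndAnd(int i)
    {
        calls = 0;
        bool result = i >= 4 && second();
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(callsWithAnd(1));     // 1: & always evaluates both sides
        Console.WriteLine(callsWithAndAnd(1));  // 0: && skips the second condition
        Console.WriteLine(callsWithAndAnd(5));  // 1: first condition TRUE, so second runs
    }
}
```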
a.cs
class zzz {
public static void Main()
{
bool x,y;
x = true;
y = false;
System.Console.WriteLine( x ^ y);
x = false;
System.Console.WriteLine( x ^ y); }
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object {
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (bool V_0,bool V_1)
ldc.i4.1
stloc.0
ldc.i4.0
stloc.1
ldloc.0
ldloc.1
xor
call void
[mscorlib]System.Console::WriteLine(bool)
ldc.i4.0
stloc.0
ldloc.0
ldloc.1
xor
call void
[mscorlib]System.Console::WriteLine(bool)
ret
}
}
Output
True
False
The ^ sign is called the XOR
operator. XOR is like OR, but with a difference: OR returns TRUE if any of
its operands is TRUE, whereas XOR returns TRUE if, and only if, one of its
operands is TRUE and the other one is FALSE. If both operands are TRUE, it
returns FALSE. xor is an IL instruction.
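The complete truth table for the operator can be printed with a short loop. This is a sketch of our own, and the helper xor is a hypothetical wrapper around the ^ operator.

```csharp
using System;

public class zzz
{
    // Hypothetical wrapper around the ^ operator.
    public static bool xor(bool x, bool y)
    {
        return x ^ y;
    }

    public static void Main()
    {
        bool[] values = { true, false };
        foreach (bool x in values)
            foreach (bool y in values)
                Console.WriteLine(x + " ^ " + y + " = " + xor(x, y));
    }
}
```

Only the two rows in which the operands differ give True.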
The != operator gets converted
into the normal set of IL instructions i.e. a comparison is done and the
program branches accordingly.
a.cs
class zzz
{
public static void Main()
{
bool x = true;
System.Console.WriteLine(!x);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (bool V_0)
ldc.i4.1
stloc.0
ldloc.0
ldc.i4.0
ceq
call void
[mscorlib]System.Console::WriteLine(bool)
ret
}
}
Output
False
The ! operator in C# converts a
TRUE to a FALSE and vice versa. In IL, the instruction used is ceq. This
instruction compares the top two values on the stack. If they are the same,
it returns TRUE, otherwise it returns FALSE.
Since the variable x is TRUE, it gets initialized to 1. It is thereafter checked for equality with the value 0. As they are not equal, the final result is 0 or FALSE. This result is put on the stack. The same logic would apply had x been FALSE: 0 would have been put on the stack and checked for equality with the other 0. Since they match, the final answer would be TRUE.