-12-
Arrays
An array is a contiguous block
of memory that stores values of the same type as an indexed collection. The
runtime has built-in support for arrays. A vector is an array that has only one
dimension and whose index starts at zero. The element type of an array can be
any type derived from System.Object. This covers practically everything except
pointers, which are not allowed as element types in this version of the CLR;
later versions may differ. Every array is a subtype of System.Array, which
gives us plenty of leeway in working with arrays. The newarr instruction is
used only for single dimensional arrays.
a.cs
class zzz
{
public static void Main()
{
int[] a;
a= new int[3];
a[1]= 10;
System.Console.WriteLine(a[1]);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[] V_0)
ldc.i4.3
newarr [mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.1
ldc.i4.s 10
stelem.i4
ldloc.0
ldc.i4.1
ldelem.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
10
IL recognises the array data
type. Thus, in the locals directive, we see an array of int32 called V_0. This
mirrors C#, where we first declare an array variable and then create the actual
array by stating its size. In IL, the size is placed on the stack and newarr,
which is similar to newobj, creates the array in memory. In C#, the single
keyword new serves for arrays as well as for other reference types. The element
type of the array is passed to newarr as its operand. Like newobj, newarr
places a reference to the new array on the stack. stloc.0 then stores this
reference in V_0, and ldloc.0 pushes it back on the stack for the next
instruction.
We will now explain the IL code
generated for the statement a[1] = 10. After the array reference, the array
index (here 1) and then the value to be stored (here 10) are pushed on the
stack. So, there are 3 items on our stack: at the bottom, the array reference,
then the array index and finally the new value of the array member.
These three items are required by
the instruction stelem.i4 to initialize an array member. To read the value of
an array member, the array reference is loaded on the stack, followed by the
index. The instruction ldelem.i4 does the reverse of stelem.i4: it retrieves
the value of an array member and places it on the stack. As mentioned earlier,
i4 stands for a 4 byte integer. Many instructions carry such a data type suffix
at the end of their name.
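The store and load sequence described above can be sketched in C#, with comments naming the IL instructions each statement is expected to produce. The names VectorDemo and StoreAndLoad are made up for this illustration, and the instruction mapping follows the listing above rather than any promise about a particular compiler version.

```csharp
using System;

static class VectorDemo
{
    // Creates a vector of the given size, stores value at index and reads it back.
    public static int StoreAndLoad(int size, int index, int value)
    {
        int[] a = new int[size];   // size on the stack, newarr, then stloc
        a[index] = value;          // array reference, index, value, then stelem.i4
        return a[index];           // array reference, index, then ldelem.i4
    }

    static void Main()
    {
        Console.WriteLine(StoreAndLoad(3, 1, 10));
    }
}
```

Calling StoreAndLoad(3, 1, 10) repeats the a[1] = 10 example and gives back 10.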
a.cs
class zzz
{
public static void Main(string[] a)
{
System.Console.WriteLine(a.Length);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay(class System.String[] a) il managed
{
.entrypoint
ldarg.0
ldlen
conv.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
> a one two
Output
2
The array class has a member
called Length. In C#, this Length member gets converted to the IL instruction
ldlen, which requires an array reference on the stack and returns the length.
Since ldlen returns the length as a native unsigned integer, conv.i4 converts
it to a 4 byte integer before WriteLine is called. Array handling is very
powerful in .NET because IL has an intrinsic ability to understand arrays. In
IL, the array has been made a first class member.
a.cs
class zzz
{
public static void Main()
{
int[] a;
a= new int[2];
a[0]= 12; a[1]= 10;
foreach (int i in a)
System.Console.WriteLine(i);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[] V_0,int32 V_1,int32[] V_2,int32 V_3,int32 V_4)
ldc.i4.2
newarr [mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.s 12
stelem.i4
ldloc.0
ldc.i4.1
ldc.i4.s 10
stelem.i4
ldloc.0
stloc.2
ldloc.2
ldlen
conv.i4
stloc.3
ldc.i4.0
stloc.s V_4
br.s IL_002d
IL_001c: ldloc.2
ldloc.s V_4
ldelem.i4
stloc.1
ldloc.1
call void [mscorlib]System.Console::WriteLine(int32)
ldloc.s V_4
ldc.i4.1
add
stloc.s V_4
IL_002d: ldloc.s V_4
ldloc.3
blt.s IL_001c
ret
}
}
Output
12
10
Here we have a small C# program
that has been transformed to a large IL program. To begin, we have 5
locals instead of 1. Two of them, V_0 and V_2, are arrays and the rest are mere
ints. The two stelem.i4 instructions initialize the 2 array members as seen in
the above programs.
Now let us understand how IL
deals with a foreach statement. ldloc.0 loads the reference of the array on
the stack. The instruction stloc.2 stores it in local V_2, so that V_2 refers
to the same array as V_0. Then the array reference in V_2 is loaded on the
stack. Finally, the instruction ldlen determines the length of the array.
The number 2 is now present on the
stack. This represents the length of the array. conv.i4 converts it to a 4 byte
integer and stloc.3 stores it in local V_3. The number 0 is then placed on the
stack using the ldc instruction. stloc pops this value 0 into local V_4, and
br.s branches to label IL_002d, where the value of variable V_4, i.e. 0, is
loaded. The value of local V_3, which stores the length of the array, i.e. 2,
is loaded on the stack as well.
Since 0 is less than 2, blt.s jumps to the code
at label IL_001c. This loads the array reference on the stack, then
loads local V_4, which is the index.
Finally, ldelem.i4 fetches the value of member a[0].
The local V_4, to which 1 is added on each pass,
serves a dual purpose: one, to index the array for ldelem.i4 and the other, to
stop the loop as soon as it reaches the length of the array stored in local V_3.
This is how a foreach statement is converted, step by step, into IL code.
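The IL above reads much like a hand-written for loop. A minimal sketch of that loop in C# follows, assuming the compiler lowers foreach over a vector in the way the listing shows. The names ForeachDemo, SumLowered and SumForeach are illustrative, and the comments tie each variable to the local it plays in the listing.

```csharp
using System;

static class ForeachDemo
{
    // Hand-written equivalent of: foreach (int i in a) total += i;
    public static int SumLowered(int[] a)
    {
        int total = 0;
        int[] copy = a;             // V_2: a second reference to the same array
        int length = copy.Length;   // V_3: ldlen followed by conv.i4
        for (int index = 0; index < length; index++)   // V_4 is the index
        {
            int i = copy[index];    // V_1: fetched with ldelem.i4
            total += i;
        }
        return total;
    }

    // The same loop written with foreach, for comparison.
    public static int SumForeach(int[] a)
    {
        int total = 0;
        foreach (int i in a) total += i;
        return total;
    }

    static void Main()
    {
        int[] a = { 12, 10 };
        Console.WriteLine(SumLowered(a));
        Console.WriteLine(SumForeach(a));
    }
}
```

Both functions visit the members in the same order, so they return the same total for any array.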
a.cs
public class zzz
{
public static void Main()
{
zzz z = new zzz();
z.abc("hi","bye");
}
void abc(params string [] b)
{
System.Console.WriteLine(b[0]);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0,class System.String[] V_1)
newobj instance void
zzz::.ctor()
stloc.0
ldloc.0
ldc.i4.2
newarr
[mscorlib]System.String
stloc.1
ldloc.1
ldc.i4.0
ldstr "hi"
stelem.ref
ldloc.1
ldc.i4.1
ldstr
"bye"
stelem.ref
ldloc.1
call instance void zzz::abc(class System.String[])
ret
}
.method private hidebysig instance void abc(class
System.String[] b) il managed
{
.param [1]
.custom instance void [mscorlib]System.ParamArrayAttribute::.ctor()
= ( 01 00 00 00 )
ldarg.1
ldc.i4.0
ldelem.ref
call void
[mscorlib]System.Console::WriteLine(class System.String)
ret
}
}
Output
hi
A function with a params parameter
accepts a variable number of parameters.
How does the compiler handle it?
As usual, we see the local V_0,
which is an instance of class zzz. Along with it is an array of strings V_1,
which we have not created ourselves. The object reference is placed on the
stack first, since abc is an instance function. As the two parameters, i.e. the
strings "hi" and "bye", are to be passed, IL places the number 2 on the stack
and creates an array of size 2 using newarr. This
array reference is stored in V_1 and pushed onto the stack.
Using ldc.i4.0, index 0 is
pushed on the stack, followed by the string "hi". Thereafter the instruction
stelem, suffixed with the type, stores the string. Here, ref stands for an
object reference. Thus, the zeroth member of the temp array V_1 gets the
value "hi" and the same process is repeated for the second array
member. Thus, for a params parameter, all the parameters are collected into one
array and the function abc is called with this array on the stack.
In the function abc, the first
change is that the function accepts an array with the same name as in C#. The
.param directive with the number 1 selects the first parameter in the function
prototype. Here 0 stands for the return value and 1 stands for the first
parameter, that is our array. The .custom directive that follows attaches the
attribute System.ParamArrayAttribute to this parameter in the metadata. The
bytes ( 01 00 00 00 ) are the standard encoding of an attribute constructor
call that takes no arguments. They do not hold the strings "hi" and "bye";
those are placed in the array by the caller.
We will explore the custom
directive in detail later. The rest of
the IL code loads the first member of the array, b[0], on the stack using
ldelem.ref. This is similar in concept to stelem.ref. Thus, the compiler does a
lot of hard work for implementing the params modifier. To sum up, it converts
all the individual parameters into one array, and this array is placed on the
stack. IL itself does not understand the params modifier; it sees only an array
parameter marked with an attribute. C# requires the params parameter to be the
last entry in the parameter list. The ref suffix is used to denote an element
that is an object reference.
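To see that a params call and a call with a ready-made array reach the function in the same form, consider the following sketch. ParamsDemo, First and HasParamArrayAttribute are hypothetical names. The reflection call only checks that the parameter carries System.ParamArrayAttribute, which is what the .custom directive in the listing records.

```csharp
using System;
using System.Reflection;

static class ParamsDemo
{
    // Returns the first argument, mirroring abc in the listing above.
    public static string First(params string[] b)
    {
        return b[0];
    }

    // Reports whether the first parameter of First carries ParamArrayAttribute.
    public static bool HasParamArrayAttribute()
    {
        MethodInfo m = typeof(ParamsDemo).GetMethod("First");
        ParameterInfo p = m.GetParameters()[0];
        return p.IsDefined(typeof(ParamArrayAttribute), false);
    }

    static void Main()
    {
        Console.WriteLine(First("hi", "bye"));                    // compiler builds the array
        Console.WriteLine(First(new string[] { "hi", "bye" }));   // array built by hand
        Console.WriteLine(HasParamArrayAttribute());
    }
}
```

Both calls to First hand the function a two member string array, so both return "hi".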
a.cs
class zzz
{
public static void Main()
{
zzz a = new zzz();
a.abc();
}
unsafe public void pqr( int *b)
{
System.Console.WriteLine(b[1]);
b[1] = 16;
}
unsafe public void abc()
{
int [] a = new int[2];
a[0] = 10; a[1] = 2;
fixed ( int *i = a) pqr(i);
System.Console.WriteLine(a[1]);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class zzz V_0)
newobj instance void
zzz::.ctor()
stloc.0
ldloc.0
call instance void zzz::abc()
ret
}
.method public hidebysig instance void abc() il managed
{
.locals (int32[] V_0,int32&
pinned V_1)
ldc.i4.2
newarr
[mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.s 10
stelem.i4
ldloc.0
ldc.i4.1
ldc.i4.2
stelem.i4
ldloc.0
ldc.i4.0
ldelema
[mscorlib]System.Int32
stloc.1
ldarg.0
ldloc.1
conv.i
call instance void
zzz::pqr(int32*)
ldc.i4.0
conv.u
stloc.1
ldloc.0
ldc.i4.1
ldelem.i4
call void
[mscorlib]System.Console::WriteLine(int32)
ret
}
.method public hidebysig instance void pqr(int32* b) il managed
{
ldarg.1
ldc.i4.4
ldc.i4.1
mul
add
ldind.i4
call void [mscorlib]System.Console::WriteLine(int32)
ldarg.1
ldc.i4.4
ldc.i4.1
mul
add
ldc.i4.s 16
stind.i4
ret
}
}
Output
2
16
Here, we will explain certain
features of pointer handling in C# and IL. In the C# program we have created an
array of size 2 in the function abc and the array members are initialised. The
keyword fixed fixes the array in memory. For the purpose of
efficiency, the garbage collector can move objects around in memory. By fixing
the array, we prevent the garbage collector from moving it while we hold a
pointer to it.
The address of the array is stored
in a pointer to an int and the function pqr is called. This function displays
the value of the second member of the array, b[1], and then changes it. The
change is reflected in the original array also. In the locals, we define our
int array as usual, but we have another variable V_1, that is a managed
pointer, written with a & and not a *. This pointer is also
pinned, which means that the garbage collector will not move the object it
points to. If the object were moved in memory, the raw pointer passed to pqr
would no longer refer to it. Thus, a fixed becomes a
pinned location.
Using ldelema, with the array and the
index 0 on the stack, the address of the first array member is obtained. V_1 is
initialized to this value and function
pqr is called. In the function pqr,
b[1] is converted into a memory location. First, the pointer b is loaded on the
stack. Then, the numbers 4 and 1 are placed on the stack because an int size is
4 and the array index is 1. They are multiplied, and the product, 4, is added
to the pointer to get the address of the member. ldind.i4 then reads the value
at this address so that it can be displayed. The same address computation,
followed by stind.i4, is used to change the value. Whether b[1] or *(b+1) is
used, the above program remains the same.
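The offset arithmetic in pqr can be tried out without compiling unsafe code. The sketch below swaps the fixed statement for a pinned GCHandle and uses the Marshal class to read and write at a byte offset. This is a substitute technique chosen for illustration and not what the C# compiler generates for fixed; PinDemo and ReadThenWrite are made-up names.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinDemo
{
    // Reads member `index` of a pinned int array through its raw address,
    // then overwrites it with newValue. Returns the value that was read.
    public static int ReadThenWrite(int[] a, int index, int newValue)
    {
        GCHandle handle = GCHandle.Alloc(a, GCHandleType.Pinned);  // plays the role of fixed
        try
        {
            IntPtr start = handle.AddrOfPinnedObject();   // address of a[0]
            int offset = index * sizeof(int);             // the ldc.i4.4, ldc.i4.1, mul step
            int old = Marshal.ReadInt32(start, offset);   // comparable to ldind.i4
            Marshal.WriteInt32(start, offset, newValue);  // comparable to stind.i4
            return old;
        }
        finally
        {
            handle.Free();   // unpin, so the collector may move the array again
        }
    }

    static void Main()
    {
        int[] a = { 10, 2 };
        Console.WriteLine(ReadThenWrite(a, 1, 16));
        Console.WriteLine(a[1]);
    }
}
```

With the array { 10, 2 } and index 1, the function reads 2 at byte offset 4 and leaves 16 in a[1], matching the output of the program above.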
a.cs
public class zzz
{
public static void Main()
{
string [] s = new string[3];
object [] t = s;
t[0] = null;
t[1] = "hi";
t[2] = new yyy();
}
}
class yyy
{
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String[] V_0,class System.Object[] V_1)
ldc.i4.3
newarr [mscorlib]System.String
stloc.0
ldloc.0
stloc.1
ldloc.1
ldc.i4.0
ldnull
stelem.ref
ldloc.1
ldc.i4.1
ldstr "hi"
stelem.ref
ldloc.1
ldc.i4.2
newobj instance void
yyy::.ctor()
stelem.ref
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
}
Output
Exception occurred: System.ArrayTypeMismatchException: An
exception of type System.ArrayTypeMismatchException was thrown.
at zzz.vijay()
The array s is an array of three
strings. We have declared an array of objects but initialised it to an array of
strings, which is perfectly legal in C#. We then initialised the members of t
to a null, a string and a yyy object respectively. The runtime knows that even
though t is declared as an array of objects, it refers to an array of strings.
Its members can only be strings or null.
The IL code is very
straightforward. It uses newarr to create an array of strings. Then it uses
stloc.1 to initialize V_1 or array t. Thereafter, stelem.ref is used to
initialize the individual array members. However, the last stelem.ref checks
the runtime type of the value against the element type of the array, finds that
a yyy object is not a string and throws an exception. The code that throws the
exception is not present in the array class at all. It is in
stelem.ref and we are not privy to this code.
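The runtime check made by stelem.ref can be observed from C# by catching the exception. In this sketch, CovarianceDemo and TryStore are illustrative names, and a plain object stands in for the yyy instance of the listing.

```csharp
using System;

static class CovarianceDemo
{
    // Tries to store value into member 0 of a string[] viewed as object[].
    // Returns true if the store succeeded, false if the runtime rejected it.
    public static bool TryStore(object value)
    {
        string[] s = new string[1];
        object[] t = s;            // legal: t now refers to the string array
        try
        {
            t[0] = value;          // stelem.ref checks the runtime type of value
            return true;
        }
        catch (ArrayTypeMismatchException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(TryStore(null));
        Console.WriteLine(TryStore("hi"));
        Console.WriteLine(TryStore(new object()));
    }
}
```

A null and a string are accepted, while any object that is not a string is rejected at the moment of the store.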
a.cs
public class zzz
{
public static void Main()
{
string [] s = new string[3];
object [] t = s;
t[0] = (string)new yyy();
System.Console.WriteLine(t[0]);
t[1] = new yyy();
System.Console.WriteLine(t[1]);
}
}
class yyy
{
public static implicit operator string ( yyy a)
{
return "hi";
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.String[] V_0,class System.Object[] V_1)
ldc.i4.3
newarr
[mscorlib]System.String
stloc.0
ldloc.0
stloc.1
ldloc.1
ldc.i4.0
newobj instance void
yyy::.ctor()
call class
System.String yyy::op_Implicit(class yyy)
stelem.ref
ldloc.1
ldc.i4.0
ldelem.ref
call void
[mscorlib]System.Console::WriteLine(class System.Object)
ldloc.1
ldc.i4.1
newobj instance void
yyy::.ctor()
stelem.ref
ldloc.1
ldc.i4.1
ldelem.ref
call void [mscorlib]System.Console::WriteLine(class
System.Object)
ret
}
}
.class private auto ansi yyy extends [mscorlib]System.Object
{
.method public hidebysig specialname static class
System.String op_Implicit(class yyy a)
il managed
{
.locals (class System.String V_0)
ldstr "hi"
stloc.0
ldloc.0
ret
}
}
Output
hi
Exception occurred: System.ArrayTypeMismatchException: An
exception of type System.ArrayTypeMismatchException was thrown.
at zzz.vijay()
Had the compiler been a little
more concerned about exceptions, it could have prevented the above program from
throwing one at runtime, by spotting the error at compile time itself. We have
the same situation as before. The array t is an array of objects, but
initialized to an array of strings. The member t[0] is initialized to a yyy
object, but now with a cast. This cast calls the string operator, i.e. the
op_Implicit function, which returns a string.
As no cast is stated
in the second case, and a yyy object can be assigned to an object as it is, the
compiler sees no need to call op_Implicit to convert the yyy object into a
String. The compiler could have noticed the problem at compile time and
reported an error. But it only looks at the declared type of t, which is an
array of objects, so the mistake surfaces at run time. Sometimes compilers do
not behave as intelligently as expected.
a.cs
class zzz
{
static void F(params object[] b)
{
object o = b[0];
System.Console.WriteLine(o.GetType().FullName );
System.Console.WriteLine(b.Length);
}
static void Main()
{
object[] a = {1, "Hello", 123};
object o = a;
F(a);
F((object)a);
F(o);
F((object[])o);
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method private hidebysig static void F(class System.Object[] b)
il managed
{
.param [1]
.custom instance void
[mscorlib]System.ParamArrayAttribute::.ctor() = ( 01 00 00 00 )
.locals (class System.Object V_0)
ldarg.0
ldc.i4.0
ldelem.ref
stloc.0
ldloc.0
call instance class
[mscorlib]System.Type [mscorlib]System.Object::GetType()
callvirt instance class
System.String [mscorlib]System.Type::get_FullName()
call void
[mscorlib]System.Console::WriteLine(class System.String)
ldarg.0
ldlen
conv.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
.method private hidebysig static void vijay() il managed
{
.entrypoint
.locals (class System.Object[] V_0,class System.Object V_1,class
System.Object[] V_2,int32 V_3)
ldc.i4.3
newarr
[mscorlib]System.Object
stloc.2
ldloc.2
ldc.i4.0
ldc.i4.1
stloc.3
ldloca.s V_3
box
[mscorlib]System.Int32
stelem.ref
ldloc.2
ldc.i4.1
ldstr
"Hello"
stelem.ref
ldloc.2
ldc.i4.2
ldc.i4.s 123
stloc.3
ldloca.s V_3
box
[mscorlib]System.Int32
stelem.ref
ldloc.2
stloc.0
ldloc.0
stloc.1
ldloc.0
call void zzz::F(class
System.Object[])
ldc.i4.1
newarr
[mscorlib]System.Object
stloc.2
ldloc.2
ldc.i4.0
ldloc.0
stelem.ref
ldloc.2
call void
zzz::F(class System.Object[])
ldc.i4.1
newarr
[mscorlib]System.Object
stloc.2
ldloc.2
ldc.i4.0
ldloc.1
stelem.ref
ldloc.2
call void
zzz::F(class System.Object[])
ldloc.1
castclass class
System.Object[]
call void
zzz::F(class System.Object[])
ret
}
}
Output
System.Int32
3
System.Object[]
1
System.Object[]
1
System.Int32
3
This is quite a huge program.
The explanation is slightly complicated but, without understanding IL code, it
is next to impossible to understand the nitty-gritty of C#.
Let us tread one step at a
time. This example demonstrates some basic concepts of C# programming. We first
create an array of objects called a, of size 3 and initialize it to two
numbers and one string. Remember that everything in the .NET world is an
object. Then we have another object o that is initialized to a. We do not get
an error, but you need to bear in mind that a is an array and o is an object,
that now stores a reference to an array.
We call the function F four
times:
• first with the object a, which is an array.
• then with the same object cast to an object.
• then with the object o.
• finally with the object o cast to an array of objects.
The function F accepts the
parameter in an array of objects called b. The first member b[0] is stored in
an object called o. The fullname of this object and the length of the array are
printed using the WriteLine function.
In the first case, the array of 3
objects is placed on the stack as it is. The type name of its first member is
System.Int32 and the size of the array is 3.
In the second case, as the array
is cast to an Object, it is treated as a single parameter. It becomes the only
member of a new array, so the type name is System.Object[] and the size is 1.
The third case has an object
placed on the stack, which is again wrapped in a new array of objects. The size
is displayed as 1 since the new array has only one member.
In the last case, the cast tells C#
that o refers to an array of objects, so the original array is passed as it is
and the size is 3.
Up to the stelem.ref
statement,the 3 array members are merely being initialized to the value of 1,
Hello and 123. The local V_0 is array a
and local V_1 refers to object o. As it is an array of objects, the string does
not pose any problems, but since the numbers are value types, they have to be
first converted to a reference type using the box instruction.
The first call simply places the
array stored in local V_0 on the stack. The second call places 1 on the stack
and then creates a new array of size 1 using newarr. It stores this new array
in local V_2 and then loads the value of local V_2 on the
stack. Then, it loads a 0 and the first main array containing 3 members, on the
stack. stelem.ref is used to initialize member 0 of V_2 to this value. The
local V_2 is then placed on the stack. See what a simple cast does.
Similarly, in the third case we
create an array of size 1, store it in local V_2 and then place it on the
stack. Then, we place 0 and the local V_1 on the stack and store V_1 as member
0 of the new array for the function. The last call simply places the object V_1
on the stack and calls castclass. Function F is straightforward while
performing its job. Ask yourself whether it was the C# code that enabled you to
grasp the program or was it the IL code?
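The four calls can be reduced to a smaller experiment that looks only at the length of the array the callee receives. ParamsObjectDemo and Count are illustrative names; the comments state what the compiler is expected to do for each call, as described above.

```csharp
using System;

static class ParamsObjectDemo
{
    // Returns the number of members the callee actually receives.
    public static int Count(params object[] b)
    {
        return b.Length;
    }

    static void Main()
    {
        object[] a = { 1, "Hello", 123 };
        object o = a;
        Console.WriteLine(Count(a));             // array passed as it is
        Console.WriteLine(Count((object)a));     // wrapped in a one member array
        Console.WriteLine(Count(o));             // wrapped in a one member array
        Console.WriteLine(Count((object[])o));   // cast back, passed as it is
    }
}
```

The four calls give 3, 1, 1 and 3, the same lengths that the program above prints.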
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed {
.entrypoint
.locals (int32[] V_0)
ldc.i4.3
newarr
[mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.6
ldc.i4.s 10
stelem.i4
ret
}
}
Output
Exception occurred: System.IndexOutOfRangeException: An
exception of type System.IndexOutOfRangeException was thrown.
at zzz.vijay()
Our array above has only 3
members, whereas we tried to store a value in the seventh member. Whenever we
exceed the bounds of an array, we will get an IndexOutOfRangeException at
runtime. Thus, be careful in dealing with arrays. Do not cross the picket line.
We store values in an array and index them, so that we can retrieve a single
item by position.
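The same range check can be seen from C# by catching the exception rather than letting the program end. BoundsDemo and TryStore are hypothetical names used only for this sketch.

```csharp
using System;

static class BoundsDemo
{
    // Returns true if storing at index stays inside the array, false otherwise.
    public static bool TryStore(int[] a, int index, int value)
    {
        try
        {
            a[index] = value;   // the range check happens as part of the store
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            return false;
        }
    }

    static void Main()
    {
        int[] a = new int[3];
        Console.WriteLine(TryStore(a, 1, 10));
        Console.WriteLine(TryStore(a, 6, 10));
    }
}
```

For an array of 3 members, index 1 is accepted and index 6 is refused.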
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[] V_0)
ldc.i4.3
newarr
[mscorlib]System.Int32
stloc.0
ldloc.0
ldc.i4.1
ldelema System.Int32
ldc.i4.2
stobj System.Int32
ldloc.0
ldc.i4.1
ldelema System.Int32
ldobj System.Int32
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
2
We have different instructions
for dealing with value types and arrays. Arrays are nothing but a number of
variables stored together in memory. The ldelema instruction takes two
parameters on the stack. The first is the array reference, that is V_0, and the
second is the index of the variable whose memory location is desired.
After running the instruction we
have on the stack, the address of the variable at the specified array index. The
instruction ldelema requires the data type of the array, because the offset of
the members of the array is decided by the data type. The instruction stobj
stores the value in the memory location thereby initializing the second member
of the array, a[1], to 2.
To display this member, its
address is placed on the stack again and
ldobj is used to retrieve the value. The instructions ldobj and stobj
have nothing to do with arrays as such. They deal with reading a value type
from a memory location and placing it on the stack
and vice versa. Here they are given the address of a member of an array of a
value type.
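C# has no direct counterpart of this listing, but a ref local comes close: it holds the address of an array member and reads or writes through it. The sketch below assumes a compiler that supports ref locals (C# 7.0 or later), and a compiler may well choose stind.i4 and ldind.i4 where the listing uses stobj and ldobj. ElementAddressDemo and WriteThroughRef are made-up names.

```csharp
using System;

static class ElementAddressDemo
{
    // Writes value through a reference to a[index], then reads it back the same way.
    public static int WriteThroughRef(int[] a, int index, int value)
    {
        ref int slot = ref a[index];   // address of the member, as ldelema produces
        slot = value;                  // store through the address
        return slot;                   // load through the address
    }

    static void Main()
    {
        int[] a = new int[3];
        Console.WriteLine(WriteThroughRef(a, 1, 2));
        Console.WriteLine(a[1]);
    }
}
```

The write through slot changes the array itself, so a[1] holds 2 afterwards.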
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldnull
ldc.i4.1
ldc.i4.s 10
stelem.i4
ret
}
}
Output
Exception occurred: System.NullReferenceException: Attempted
to dereference a null object reference.
at zzz.vijay()
Since we placed a null array
reference on the stack, we get a NullReferenceException. We are
basically simulating some of the exceptions
that arrays can throw at us.
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.3
newarr [mscorlib]System.Int32
call instance int32 [mscorlib]System.Array::get_Length()
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
3
Just as we used the ldlen
instruction earlier, we could have instead called the get_Length function,
which is the get accessor of the Length property of the Array class. The choice
is yours, but as we demonstrated earlier, the Length property is converted to
the ldlen instruction by the C# compiler, as it is far more efficient. At the
end of the day, the get_Length function does the same thing. IL does not have
instructions that can handle arrays other than vectors. Thus, multi-dimensional
arrays, also called general arrays, are created using array functions.
a.cs
class zzz
{
public static void Main()
{
int [,] a = new int[1,2];
a[0,0] = 10;
a[0,1] = 20;
System.Console.WriteLine(a[0,1]);
}
}
a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.field private int32 x
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[0...,0...] V_0)
ldc.i4.1
ldc.i4.2
newobj instance void int32[0...,0...]::.ctor(int32,int32)
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.0
ldc.i4.s 10
call instance void int32[0...,0...]::Set(int32,int32,int32)
ldloc.0
ldc.i4.0
ldc.i4.1
ldc.i4.s 20
call instance void int32[0...,0...]::Set(int32,int32,int32)
ldloc.0
ldc.i4.0
ldc.i4.1
call instance int32 int32[0...,0...]::Get(int32,int32)
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
20
One area where C# excels is
array handling. This is only because IL understands arrays internally. Let us
now find out how IL handles two dimensional arrays.
A two dimensional array is
declared in the same way that a normal array is declared, and the dimensions
are stated in the new instruction. The array index starts from 0 and not from
1. In IL, to create a two dimensional array, there is a special syntax, i.e. a
0 followed by 3 dots, twice in the locals directive. The two array dimensions
are placed on the stack and newobj is called. It is not newarr. Newobj calls
the constructor of the two dimensional array class that takes two parameters.
The return value is then stored in local V_0.
To store a value in a two
dimensional array, the reference to the array stored
in V_0 is loaded on the stack, followed by the two indexes, using ldc.
Thereafter the value is placed on the stack to initialize the
array member. The function Set of the same int array class is called with these
four parameters on the stack.
Conversely, to fetch a value,
the function Get is called with the 3 parameters on the stack, the array
reference and the 2 index values. Thus, multi-dimensional arrays are built
using array class functions, and not IL instructions, which are used to build
single dimensional arrays. The rank of an array is defined as the number of
dimensions of the array. The runtime expects at least a rank of 1.
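A short C# sketch shows the same ideas from the language side: creation with both dimensions, element access with two indexes, and the rank. GridDemo and Build are illustrative names, and the comments point to the array functions that appear in the listing above.

```csharp
using System;

static class GridDemo
{
    // Builds a rows x cols array and fills each cell with row * 10 + col.
    public static int[,] Build(int rows, int cols)
    {
        int[,] a = new int[rows, cols];       // newobj on the constructor of the array type
        for (int r = 0; r < a.GetLength(0); r++)
            for (int c = 0; c < a.GetLength(1); c++)
                a[r, c] = r * 10 + c;         // a call to Set with two indexes and a value
        return a;
    }

    static void Main()
    {
        int[,] a = Build(1, 2);
        Console.WriteLine(a.Rank);
        Console.WriteLine(a.Length);
        Console.WriteLine(a[0, 1]);           // a call to Get with two indexes
    }
}
```

Rank reports the number of dimensions, while Length counts all the members across every dimension.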
a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.field private int32 x
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[1...3,5...] V_0)
ldc.i4.2
ldc.i4.6
newobj instance void int32[1...3,5...]::.ctor(int32,int32)
pop
ret
}
}
A general purpose array has an
upper bound and a lower bound for each dimension. Unfortunately, as of now, the
runtime does not check the bounds stated in the array type. Here, the first
dimension has a lower bound of 1 and an upper bound of 3, and the second has a
lower bound of 5 and no upper bound. You can choose the bounds you desire.
a.cs
class zzz
{
public static void Main()
{
int [,,] a;
a = new int[2,3,4];
a[1,2,3] = 10;
System.Console.WriteLine(a[1,2,3]);
}
}
a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[0...,0...,0...] V_0)
ldc.i4.2
ldc.i4.3
ldc.i4.4
newobj instance void
int32[0...,0...,0...]::.ctor(int32,int32,int32)
stloc.0
ldloc.0
ldc.i4.1
ldc.i4.2
ldc.i4.3
ldc.i4.s 10
call instance void
int32[0...,0...,0...]::Set(int32,int32,int32,int32)
ldloc.0
ldc.i4.1
ldc.i4.2
ldc.i4.3
call instance int32
int32[0...,0...,0...]::Get(int32,int32,int32)
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
10
An array can have any rank. The
above array is a three dimensional one
and has a rank of 3. So, we have to use the array handling functions to work
with it. The rank of an array is declared by using commas between the square
brackets. The number of commas plus one is the rank of an array. If no specific
bounds are supplied, the default is 0 for the lower bound and no limit for the
upper bound.
You can specify none, one or
both bounds. The CLR, in this version, ignores all the bounds information you
provide in the array type, and only pays heed to the numbers placed on the
stack at the time of creation of the array. Here, you have to supply all the
information. Only those arrays that have a lower bound of 0 in all their
dimensions are CLS compliant.
In the above example, the sizes of the three
dimensions are placed on the stack and the array constructor is called with
three values. We are not allowed to use newarr, as the above array is not a
vector. Now to set a member to a value, the three index values are placed on
the stack in order, followed by the value. The same Set function
is called, but this time with four parameters. The same rules are relevant for
the Get function also. The point that we want to make is that the magnitude of
the rank has no effect on the way the array is handled. No substantial changes
are required.
There are two array constructors
that can be used. The first takes the same number of parameters as the rank of
the array, one length per dimension. The second constructor takes twice the
number of parameters as the rank of the array. In the second type of
constructor, the first two parameters specify the lower bound and the length of
the first dimension, the next two parameters specify the lower bound and the
length of the second dimension and so on. The first constructor always assumes
the lower bound to be zero.
a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[5...10,3...7] V_0)
ldc.i4.5
ldc.i4.6
ldc.i4.3
ldc.i4.5
newobj instance void int32[5...10,3...7]::.ctor(int32,
int32,int32,int32)
stloc.0
ldloc.0
ldc.i4.6
ldc.i4.5
ldc.i4.s 10
call instance void int32[0...,0...]::Set(int32,int32,int32)
ldloc.0
ldc.i4.6
ldc.i4.5
call instance int32
int32[0...,0...]::Get(int32,int32)
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
10
ldc.i4.6
ldc.i4.5
We then change the above two
lines, which supply the index values to Set and Get, to
ldc.i4.1
ldc.i4.2
and we see the following
exception thrown at us.
Exception occurred: System.IndexOutOfRangeException: An
exception of type System.IndexOutOfRangeException was thrown.
at zzz.vijay()
An array with a lower and upper
bound, having a rank of 2 is created. The first dimension starts at
5 and ends at 10. Thus, on the stack is placed first the lower bound i.e. 5,
and then, the length of the dimension. The upper bound is not placed on the
stack. As the dimension starts at 5 and ends at 10, the length is calculated as
follows: 10 - 5 + 1 = 6 (i.e. the upper bound - lower bound + 1). The same rule
holds true for the next dimension, which starts at 3 and ends at 7, giving a
length of 7 - 3 + 1 = 5.
The rest of the code remains the
same. When the index values 6, 5 are changed to 1, 2, an
exception is thrown. This is because
the array bounds for the first dimension are 5 to 10 and for the second
dimension are 3 to 7. Any attempt to cross the array bounds in any direction
generates an exception.
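C# has no syntax for non-zero lower bounds, but the library function Array.CreateInstance accepts a list of lengths and a list of lower bounds, which is the same information the second constructor takes. The sketch below uses it to repeat the experiment with bounds 5 to 10 and 3 to 7. BoundedArrayDemo, Create and TrySet are hypothetical names.

```csharp
using System;

static class BoundedArrayDemo
{
    // Creates a two dimensional int array from (lower bound, upper bound) pairs.
    public static Array Create(int lower0, int upper0, int lower1, int upper1)
    {
        // length = upper bound - lower bound + 1, as worked out above
        int[] lengths = { upper0 - lower0 + 1, upper1 - lower1 + 1 };
        int[] lowerBounds = { lower0, lower1 };
        return Array.CreateInstance(typeof(int), lengths, lowerBounds);
    }

    // Returns true if (i, j) is a valid index pair for the array.
    public static bool TrySet(Array a, int i, int j, int value)
    {
        try
        {
            a.SetValue(value, i, j);
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            return false;
        }
    }

    static void Main()
    {
        Array a = Create(5, 10, 3, 7);
        Console.WriteLine(a.GetLength(0));        // 10 - 5 + 1
        Console.WriteLine(TrySet(a, 6, 5, 10));   // inside both ranges
        Console.WriteLine(a.GetValue(6, 5));
        Console.WriteLine(TrySet(a, 1, 2, 10));   // below both lower bounds
    }
}
```

The index pair 6, 5 lies inside both ranges and is accepted, whereas 1, 2 lies below both lower bounds and is refused.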
Array of Arrays
a.cs
class zzz
{
public static void Main()
{
int [][] a = new int[2][];
a[0] = new int[1];
a[1] = new int[10];
System.Console.WriteLine(a.Length) ;
System.Console.WriteLine(a[0].Length) ;
}
}
a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.field private int32 x
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[][] V_0)
ldc.i4.2
newarr int32[]
stloc.0
ldloc.0
ldc.i4.0
ldc.i4.1
newarr
[mscorlib]System.Int32
stelem.ref
ldloc.0
ldc.i4.1
ldc.i4.s 10
newarr
[mscorlib]System.Int32
stelem.ref
ldloc.0
ldlen
conv.i4
call void
[mscorlib]System.Console::WriteLine(int32)
ldloc.0
ldc.i4.0
ldelem.ref
ldlen
conv.i4
call void
[mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
2
1
Let us explore jagged arrays
where an array member can contain another array of a different length. We are
creating an array that has an irregular shape. In C#, an
array of arrays is declared with two pairs of square brackets [][]. We first
create the outer array, stating only the first dimension. This is done by using
newarr with an array data type, int32[], as its operand. We then initialize V_0
with this array reference.
Now, since we have to create two
separate one dimensional arrays, we first place the array reference on the
stack. Then we place the index of the array member we want to initialise
followed by the size of the new array. Finally, we call newarr to create an
array of ints and place the reference on the stack. stelem.ref is used to initialize the array member with this array
reference. The same is repeated for the second member a[1] also.
The instruction ldlen pushes the length of an array on the stack as a native
unsigned integer, which conv.i4 then converts to an int32 for WriteLine. For
the outer array, ldloc.0 places its reference on the stack before ldlen. For
the second length, ldelem.ref first fetches the reference of the inner array
from the member a[0], and then ldlen obtains its length.
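The steps described above can be retraced in C#. This is only a minimal
sketch: the class name JaggedLengths and the helper Build are made up for
illustration, and the comments name the IL instructions that the listing above
uses for the corresponding statements.

```csharp
using System;

class JaggedLengths
{
    // Hypothetical helper: builds the same jagged array as the listing,
    // an outer array of 2 members holding inner arrays of sizes 1 and 10.
    public static int[][] Build()
    {
        int[][] a = new int[2][];   // outer array only; both members are still null
        a[0] = new int[1];          // newarr for the inner array, then stelem.ref
        a[1] = new int[10];         // same steps for the second member
        return a;
    }

    static void Main()
    {
        int[][] a = Build();
        Console.WriteLine(a.Length);      // ldlen on the outer array: 2
        Console.WriteLine(a[0].Length);   // ldelem.ref, then ldlen: 1
        Console.WriteLine(a[1].Length);   // 10
    }
}
```

Each inner array carries its own length, which is why the second ldlen needs a
fresh array reference fetched with ldelem.ref.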
a.cs
class zzz
{
public static void Main()
{
int[][] a = new int[2][] { new int[] {2,3}, new int[] {5,6,7} };
System.Console.WriteLine(a[0][1]) ;
System.Console.WriteLine(a[1][2]) ;
}
}
a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.field private int32 x
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[][] V_0,int32[][] V_1,int32[] V_2)
ldc.i4.2
newarr int32[]
stloc.1
ldloc.1
ldc.i4.0
ldc.i4.2
newarr [mscorlib]System.Int32
stloc.2
ldloc.2
ldc.i4.0
ldc.i4.2
stelem.i4
ldloc.2
ldc.i4.1
ldc.i4.3
stelem.i4
ldloc.2
stelem.ref
ldloc.1
ldc.i4.1
ldc.i4.3
newarr [mscorlib]System.Int32
stloc.2
ldloc.2
ldc.i4.0
ldc.i4.5
stelem.i4
ldloc.2
ldc.i4.1
ldc.i4.6
stelem.i4
ldloc.2
ldc.i4.2
ldc.i4.7
stelem.i4
ldloc.2
stelem.ref
ldloc.1
stloc.0
ldloc.0
ldc.i4.0
ldelem.ref
ldc.i4.1
ldelem.i4
call void [mscorlib]System.Console::WriteLine(int32)
ldloc.0
ldc.i4.1
ldelem.ref
ldc.i4.2
ldelem.i4
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
3
7
The above example is similar to its predecessor, though it is more elaborate
and complete. A jagged array is created that is made of two arrays of sizes 2
and 3 respectively. In C#, they can be initialized in one stroke. IL does it
the hard way: it creates each inner array in the temporary V_2, fills it one
element at a time with stelem.i4, and then stores it in the outer array with
stelem.ref. To fetch the value of a[1][2], it places the reference of the
outer array on the stack. Then it places 1, the first array index, on the
stack. Thereafter, ldelem.ref is used to obtain the reference of the inner
array.
Thus, the reference of the inner array is now on the stack. Then 2, the second
index, is placed on the stack, and ldelem.i4 is used to read the third member
of this inner array. A jagged array is treated as an array whose members
contain other independent arrays.
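The two-step access can also be written out by hand in C#. The sketch below
assumes a made-up class TwoStepIndex with a helper Read; it splits a[row][col]
into the same two fetches that the IL performs.

```csharp
using System;

class TwoStepIndex
{
    // Hypothetical helper: reads a[row][col] in the two steps the IL takes.
    public static int Read(int[][] a, int row, int col)
    {
        int[] inner = a[row];   // step 1: ldelem.ref yields the inner array reference
        return inner[col];      // step 2: ldelem.i4 reads the int32 out of it
    }

    static void Main()
    {
        int[][] a = { new int[] { 2, 3 }, new int[] { 5, 6, 7 } };
        Console.WriteLine(Read(a, 0, 1));              // 3
        Console.WriteLine(Read(a, 1, 2));              // 7
        Console.WriteLine(Read(a, 1, 2) == a[1][2]);   // True
    }
}
```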
An array of arrays is different from a multi dimensional array. A multi
dimensional array forms one memory block, whereas an array of arrays holds
references to other arrays in memory. Thus, an array of arrays needs an extra
indirection to reach the final element, which can make element access slower.
We can also use pointers with arrays. The salient feature of an array of
arrays is that the outer array merely stores the references of other arrays.
The disadvantage of a multi dimensional array is that it is rectangular:
every row along a given dimension has to be of the same length.
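The difference in shape can be observed from C# through the members of
System.Array. The sketch below is illustrative only; the class
RectVersusJagged and its two helpers are made-up names, and the sizes are
arbitrary.

```csharp
using System;

class RectVersusJagged
{
    // Hypothetical helpers that build one array of each kind.
    public static int[,] MakeRect() { return new int[2, 3]; }
    public static int[][] MakeJagged() { return new int[][] { new int[1], new int[10] }; }

    static void Main()
    {
        int[,] rect = MakeRect();
        int[][] jag = MakeJagged();

        Console.WriteLine(rect.Rank);          // 2: one object with two dimensions
        Console.WriteLine(rect.Length);        // 6: total number of elements, 2 * 3
        Console.WriteLine(rect.GetLength(1));  // 3: every row has this many columns

        Console.WriteLine(jag.Rank);           // 1: the outer array is a plain vector
        Console.WriteLine(jag.Length);         // 2: it holds two references
        Console.WriteLine(jag[0].Length + " " + jag[1].Length);  // 1 10
    }
}
```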
a.il
.assembly mukhi {}
.class public auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
.locals (int32[][][] a)
ldc.i4.5
newobj instance void int32[][][]::.ctor(int32)
stloc a
ldloc a
ldc.i4.0
ldc.i4.3
newobj instance void int32[][]::.ctor(int32)
call instance void int32[][][]::Set(int32, int32[][])
ldloc a
ldc.i4.0
call instance int32[][] int32[][][]::Get(int32)
ldc.i4.1
ldc.i4 10
newobj instance void int32[]::.ctor(int32)
call instance void int32[][]::Set(int32, int32[])
ldloc a
ldc.i4.0
call instance int32[][] int32[][][]::Get(int32)
ldc.i4.1
call instance int32[] int32[][]::Get(int32)
ldc.i4.5
ldc.i4 100
call instance void int32[]::Set(int32, int32)
ldloc a
ldc.i4.0
call instance int32[][] int32[][][]::Get(int32)
ldc.i4.1
call instance int32[] int32[][]::Get(int32)
ldc.i4.5
call instance int32 int32[]::Get(int32)
call void [mscorlib]System.Console::WriteLine(int32)
ret
}
}
Output
100
Here, we shall see how to create an array a of type int32[][][]. We first
create a local a of type array of array of array. Thus, we have two levels of
indirection. We want the first or main array to have a size of 5 i.e. it
should be able to store the references of 5 arrays in memory. The instruction
ldc.i4.5 places the size 5 of this array on the stack. Thereafter, newobj is
used to create the first dimension of this array. The instruction stloc a
stores the reference of this array in the local a, and ldloc a puts the
reference back on the stack.
Subsequently, two values are placed on the stack. One is the index of the
first member, a[0], and the other is the size of the array that this member
should point to i.e. 3. newobj creates an array of type int32[][]. To store it
in a[0], the Set function is used. This function requires the index of the
member as its first parameter. Hence, 0 is placed on the stack before the
size, even though newobj does not require it. This ordering simplifies the
call of the Set function.
The next thing required is an int32[] to store in our int32[][]. So, the array
a is placed again on the stack and the index 0 is used to obtain the reference
of the array that has just been created. The Get function does the job of
retrieving values. Then, as before, 1 is placed on the stack followed by the
size of the new array i.e. 10. Finally, newobj creates a simple array int32[]
and places its reference on the stack, which is then stored using the Set
function.
Remember that the index 1 was placed on the stack before newobj was called, so
Set finds its index and its value in the right order. To execute the operation
a[0][1][5] = 100, the member a[0] is required. So, the array reference a is
placed on the stack followed by 0 and the Get function is called.
To access a[0][1], as the reference of a[0] is already on the stack, all that
is required is placing 1 on the stack and calling Get again. Now, to store the
value in the member a[0][1][5], the index 5 is loaded on the stack followed by
the value 100, and Set is called. To fetch the value of the member a[0][1][5],
the same procedure as before is followed.
That is
• load the array reference on the stack.
• obtain the member 0 by using Get.
• obtain the member 1 of this array.
• finally obtain the member 5 of this array.
The logic is the same as
described earlier.
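No C# source accompanies the IL above, so the following is only an assumed
equivalent, not the program the IL was generated from. The class name
ThreeLevels and the helper Build are made up. A C# compiler would normally
emit newarr, stelem and ldelem for these statements rather than newobj with
Get and Set, but the order of the steps is the same.

```csharp
using System;

class ThreeLevels
{
    // Hypothetical helper: repeats the four stores performed by the IL listing.
    public static int[][][] Build()
    {
        int[][][] a = new int[5][][];  // main array: room for 5 references to int[][]
        a[0] = new int[3][];           // a[0] refers to an array of 3 int[] references
        a[0][1] = new int[10];         // a[0][1] refers to an array of 10 ints
        a[0][1][5] = 100;              // finally an int is stored
        return a;
    }

    static void Main()
    {
        int[][][] a = Build();
        int[][] level1 = a[0];          // load the array reference, obtain member 0
        int[] level2 = level1[1];       // obtain member 1 of that array
        Console.WriteLine(level2[5]);   // member 5 of the innermost array: 100
    }
}
```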
a.cs
using System;
public class zzz
{
public static void abc(int i, __arglist)
{
ArgIterator a = new ArgIterator(__arglist);
while (a.GetRemainingCount() > 0)
Console.WriteLine(__refvalue(a.GetNextArg(), int));
}
public static void Main()
{
abc(20, __arglist(1, 2, 3));
}
}
a.il
.assembly mukhi {}
.class private auto ansi zzz extends [mscorlib]System.Object
{
.method public hidebysig static void vijay() il managed
{
.entrypoint
ldc.i4.s 20
ldc.i4.1
ldc.i4.2
ldc.i4.3
call vararg void zzz::abc(int32,...,int32,int32,int32)
ret
}
.method public hidebysig static vararg void abc(int32 i) il managed
{
.locals (value class [mscorlib]System.ArgIterator V_0)
ldloca.s V_0
arglist
call instance void [mscorlib]System.ArgIterator::.ctor(value class [mscorlib]System.RuntimeArgumentHandle)
br.s IL_001d
IL_000b: ldloca.s V_0
call instance typedref [mscorlib]System.ArgIterator::GetNextArg()
refanyval [mscorlib]System.Int32
ldind.i4
call void [mscorlib]System.Console::WriteLine(int32)
IL_001d: ldloca.s V_0
call instance int32 [mscorlib]System.ArgIterator::GetRemainingCount()
ldc.i4.0
bgt.s IL_000b
ret
}
}
Output
1
2
3
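This example builds upon the earlier example, which has a function that
accepts a variable number of arguments. In C#, __arglist enables us to
implement a function that accepts a variable number of arguments.
Internally, in IL, the function is marked with the vararg modifier, and the
call lists the types of the extra arguments after the ellipsis. The arglist
instruction pushes a RuntimeArgumentHandle, from which the ArgIterator value
type is constructed. In the loop, GetNextArg returns a typed reference,
refanyval obtains the address of the int32 inside it, and ldind.i4 loads the
value that is displayed. The loop runs as long as GetRemainingCount returns a
value greater than zero.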