Arrays - C# to IL

-12-

Arrays

An array is a contiguous block of memory that stores values of the same type as an indexed collection. The runtime has built-in support for arrays. A vector is an array that has only one dimension and whose index starts at zero. The element type of an array can be any type derived from System.Object. This includes everything under the sun except pointers, which are not allowed in this version of the CLR. Nobody knows about the next version. Every array is a subtype of System.Array, and we are given plenty of leeway in working with arrays. The newarr instruction is used only for single dimensional arrays.

a.cs

class zzz

{

public static void Main()

{

int[] a;

a= new int[3];

a[1]= 10;

System.Console.WriteLine(a[1]);

}

}

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[] V_0)

ldc.i4.3

newarr [mscorlib]System.Int32

stloc.0

ldloc.0

ldc.i4.1

ldc.i4.s 10

stelem.i4

ldloc.0

ldc.i4.1

ldelem.i4

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

Output

10

IL recognises the array data type. Thus, in the locals directive, we see an array of int32 called V_0. This mirrors C#, where we first declare an array variable and then create the actual array by stating its size. In IL, the size is placed on the stack. IL uses newarr, which is similar to newobj, to create the array in memory, whereas C# uses new for arrays as well as for other reference types. The element type of the array to be created is passed to the newarr instruction. Like newobj, newarr places a reference to the new array on the stack. Thereafter, stloc.0 stores this reference in V_0 and ldloc.0 pushes it back on the stack.

We will now explain the IL code generated for the statement a[1] = 10. The array index, in this case the value 1, is pushed on the stack, followed by the value that the array member is to be initialized to, i.e. 10. So, there are 3 items on our stack: at the bottom, the array reference, then the array index and finally the new value of the array member.

These parameters are required by the instruction stelem.i4 to initialize an array member. To read the value of an array member, the array reference is loaded on the stack, followed by the index. The instruction ldelem.i4 then does the reverse of stelem.i4: it retrieves the value of the array member and places it on the stack. As mentioned earlier, i4 stands for a 4 byte integer. Most instructions carry such a data type suffix.
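The stack discipline described above can be mimicked in C#. This is only a sketch: the Stack<object> stands in for the evaluation stack, and the helper names StElem and LdElem are made up for illustration. The real instructions are implemented inside the runtime.

```csharp
using System;
using System.Collections.Generic;

static class StackSketch
{
    // Hypothetical stand-in for stelem.i4: pops the value, the index and
    // the array reference (in that order) and stores the value.
    public static void StElem(Stack<object> stack)
    {
        int value = (int)stack.Pop();
        int index = (int)stack.Pop();
        int[] array = (int[])stack.Pop();
        array[index] = value;
    }

    // Hypothetical stand-in for ldelem.i4: pops the index and the array
    // reference, then pushes the element found at that index.
    public static void LdElem(Stack<object> stack)
    {
        int index = (int)stack.Pop();
        int[] array = (int[])stack.Pop();
        stack.Push(array[index]);
    }

    public static int Run()
    {
        var stack = new Stack<object>();
        int[] a = new int[3];      // ldc.i4.3, newarr, stloc.0

        stack.Push(a);             // ldloc.0
        stack.Push(1);             // ldc.i4.1
        stack.Push(10);            // ldc.i4.s 10
        StElem(stack);             // stelem.i4

        stack.Push(a);             // ldloc.0
        stack.Push(1);             // ldc.i4.1
        LdElem(stack);             // ldelem.i4
        return (int)stack.Pop();
    }

    static void Main()
    {
        Console.WriteLine(Run());  // prints 10
    }
}
```

Note that stelem.i4 leaves nothing on the stack, whereas ldelem.i4 leaves exactly one item, the element value.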

a.cs

class zzz

{

public static void Main(string[] a)

{

System.Console.WriteLine(a.Length);

}

}

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay(class System.String[] a) il managed

{

.entrypoint

ldarg.0

ldlen

conv.i4

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

> a one two

Output

2

The array class has a member called Length. This Length member in C# gets converted to the IL instruction ldlen, which requires an array reference on the stack and returns the length. Since ldlen returns the length as a native unsigned integer, conv.i4 converts it to a 4 byte int. Array handling is very powerful in .NET because IL has an intrinsic ability to understand arrays. In IL, the array has been made a first class citizen.

a.cs

class zzz

{

public static void Main()

{

int[] a;

a= new int[2];

a[0]= 12; a[1]= 10;

foreach (int i in a)

System.Console.WriteLine(i);

}

}

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[] V_0,int32 V_1,int32[] V_2,int32 V_3,int32 V_4)

ldc.i4.2

newarr [mscorlib]System.Int32

stloc.0

ldloc.0

ldc.i4.0

ldc.i4.s 12

stelem.i4

ldloc.0

ldc.i4.1

ldc.i4.s 10

stelem.i4

ldloc.0

stloc.2

ldloc.2

ldlen

conv.i4

stloc.3

ldc.i4.0

stloc.s V_4

br.s IL_002d

IL_001c: ldloc.2

ldloc.s V_4

ldelem.i4

stloc.1

ldloc.1

call void [mscorlib]System.Console::WriteLine(int32)

ldloc.s V_4

ldc.i4.1

add

stloc.s V_4

IL_002d: ldloc.s V_4

ldloc.3

blt.s IL_001c

ret

}

}

Output

12

10

Here we have a small C# program that has been transformed into a large IL program. To begin, we have 5 locals instead of 1. Two of them, V_0 and V_2, are arrays and the rest are mere ints. The two stelem.i4 instructions initialize the 2 array members, as seen in the above programs.

Now let us understand how IL deals with a foreach statement. ldloc.0 loads the reference of the array on the stack. The instruction stloc.2 stores it in local V_2, so that V_2 holds the same array reference as V_0. Then V_2 is loaded back on the stack. Finally, using the instruction ldlen, the length of the array is determined.

The number 2 is now present on the stack. This represents the length of the array. conv.i4 changes it to occupy 4 bytes on the stack and stloc.3 stores it in local V_3. The number 0 is then placed on the stack using the ldc instruction. stloc pops this value 0 into local V_4 and br branches to label IL_002d, where the value of local V_4, i.e. 0, is loaded. The value of local V_3, which stores the length of the array, i.e. 2, is also loaded on the stack.

Since 0 is less than 2, blt.s jumps to the code at label IL_001c. This loads the array reference on the stack, then loads local V_4, which is the index. Finally, ldelem.i4 fetches the value of member a[0], which is stored in local V_1 and printed.

Adding 1 to the local V_4 serves a dual purpose: one, to index the array for ldelem.i4 and the other, to stop the loop once we reach the length of the array stored in local V_3. This is how a foreach statement is converted, step by step, into IL code.
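The loop that the compiler builds can also be written by hand in C#. The sketch below follows the locals of the IL listing; the variable names copy, length and index are our own, and the method adds the members up instead of printing them so that the result is easy to check.

```csharp
using System;

static class ForeachSketch
{
    // Hand-written equivalent of "foreach (int i in a)", mirroring the
    // locals of the IL listing (the names are illustrative).
    public static int SumByHand(int[] a)
    {
        int sum = 0;
        int[] copy = a;            // V_2: second reference to the same array
        int length = copy.Length;  // V_3: ldlen, conv.i4
        int index = 0;             // V_4: loop counter
        while (index < length)     // blt.s IL_001c
        {
            int i = copy[index];   // V_1: ldelem.i4
            sum += i;
            index = index + 1;     // ldc.i4.1, add
        }
        return sum;
    }

    static void Main()
    {
        int[] a = new int[2];
        a[0] = 12; a[1] = 10;
        Console.WriteLine(SumByHand(a));  // prints 22
    }
}
```

The length is read once, before the loop starts, exactly as the IL stores it in V_3.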

a.cs

public class zzz

{

public static void Main()

{

zzz z = new zzz();

z.abc("hi","bye");

}

void abc(params string [] b)

{

System.Console.WriteLine(b[0]);

}

}

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (class zzz V_0,class System.String[] V_1)

newobj instance void zzz::.ctor()

stloc.0

ldloc.0

ldc.i4.2

newarr [mscorlib]System.String

stloc.1

ldloc.1

ldc.i4.0

ldstr "hi"

stelem.ref

ldloc.1

ldc.i4.1

ldstr "bye"

stelem.ref

ldloc.1

call instance void zzz::abc(class System.String[])

ret

}

.method private hidebysig instance void abc(class System.String[] b) il managed

{

.param [1]

.custom instance void [mscorlib]System.ParamArrayAttribute::.ctor() = ( 01 00 00 00 )

ldarg.1

ldc.i4.0

ldelem.ref

call void [mscorlib]System.Console::WriteLine(class System.String)

ret

}

}

Output

hi

A function with a params parameter accepts a variable number of parameters. How does the compiler handle it?

As usual, we see the local V_0, which is an instance of class zzz. Along with it is an array of strings V_1, which we have not created ourselves. As the two parameters, i.e. the strings "hi" and "bye", are to be passed as a params array, IL first places the number 2 on the stack and uses newarr to create an array of size 2. The reference to this array is stored in V_1 and then pushed onto the stack.

Using ldc.i4.0, index 0 is pushed on the stack, followed by the string "hi". Thereafter comes the instruction stelem, suffixed with the type. Here, ref stands for an object reference. Thus, the zeroth member of the temp array V_1 gets the value "hi", and the same process is repeated for the second array member. Thus, for a params parameter, all the individual parameters are placed in one array and the function abc is called with this array on the stack.

In the function abc, the first change is that the function accepts an array with the same name as in C#. The .param directive with the number 1 selects the first parameter in the function prototype. Here 0 stands for the return value and 1 stands for the first parameter, that is, our array. The .custom directive that follows attaches the attribute System.ParamArrayAttribute to this parameter in the metadata. This attribute is how the compiler remembers that b was declared with the params modifier. The strings "hi" and "bye" are not stored here; they reach abc through the array built by the caller.

We will explore the custom directive in detail later. The rest of the IL code loads the first member of the array on the stack using ldelem.ref. This is similar in concept to stelem.ref. Thus, the compiler does a lot of hard work to implement the params modifier. To sum up, it converts all the individual parameters into one array, and this array is placed on the stack. IL itself does not understand the params modifier; it only sees an array parameter that carries an attribute. Thus the params parameter has to be the last entry in the parameter list. The ref suffix is used to denote a reference element.
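Since the called function only ever receives an array, a params function can be called in either form. The short sketch below, with a made-up function Count, shows that the expanded call and the explicit array call hand over the same thing.

```csharp
using System;

static class ParamsSketch
{
    // Illustrative function: returns how many members the params array has.
    public static int Count(params string[] b)
    {
        return b.Length;
    }

    static void Main()
    {
        // Expanded form: the compiler builds a two member array for us.
        Console.WriteLine(Count("hi", "bye"));                   // prints 2
        // Normal form: we build the array ourselves.
        Console.WriteLine(Count(new string[] { "hi", "bye" }));  // prints 2
        // No parameters at all: an empty array is passed, not null.
        Console.WriteLine(Count());                              // prints 0
    }
}
```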

a.cs

class zzz

{

public static void Main()

{

zzz a = new zzz();

a.abc();

}

unsafe public void pqr( int *b)

{

System.Console.WriteLine(b[1]);

b[1] = 16;

}

unsafe public void abc()

{

int [] a = new int[2];

a[0] = 10; a[1] = 2;

fixed ( int *i = a) pqr(i);

System.Console.WriteLine(a[1]);

}

}

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (class zzz V_0)

newobj instance void zzz::.ctor()

stloc.0

ldloc.0

call instance void zzz::abc()

ret

}

.method public hidebysig instance void abc() il managed

{

.locals (int32[] V_0,int32& pinned V_1)

ldc.i4.2

newarr [mscorlib]System.Int32

stloc.0

ldloc.0

ldc.i4.0

ldc.i4.s 10

stelem.i4

ldloc.0

ldc.i4.1

ldc.i4.2

stelem.i4

ldloc.0

ldc.i4.0

ldelema [mscorlib]System.Int32

stloc.1

ldarg.0

ldloc.1

conv.i

call instance void zzz::pqr(int32*)

ldc.i4.0

conv.u

stloc.1

ldloc.0

ldc.i4.1

ldelem.i4

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

.method public hidebysig instance void pqr(int32* b) il managed

{

ldarg.1

ldc.i4.4

ldc.i4.1

mul

add

ldind.i4

call void [mscorlib]System.Console::WriteLine(int32)

ldarg.1

ldc.i4.4

ldc.i4.1

mul

add

ldc.i4.s 16

stind.i4

ret

}

}

Output

2

16

Here, we will explain certain features of pointer handling in C# and IL. In the C# program we have created an array of size 2 in the function abc and the array members are initialised. The keyword fixed pins the array in memory. For the purpose of efficiency, the garbage collector can move objects around in memory. By pinning the array, we prevent the garbage collector from moving it while the pointer is in use.

The address of the first array member is stored in a pointer to an int and the function pqr is called. This function displays the value of the second member of the array, b[1], and then changes it. The change is reflected in the original array also. In the locals, we define our int array as usual, but we have another variable V_1 that is also a pointer, but with a & and not a *. This is a managed pointer. It is also marked pinned, which means that the garbage collector will not move the object it points to. If the object were moved in memory, the pointer passed to pqr would no longer point to it. Thus, a fixed in C# becomes a pinned local in IL.

Using ldelema, with the array reference and the index 0 on the stack, the address of the first array member is obtained. V_1 is initialized to this value, which is converted to a native int with conv.i, and the function pqr is called. In the function pqr, the [] is converted into pointer arithmetic. First, the pointer b is loaded on the stack. Then, the numbers 4 and 1 are placed on the stack because the size of an int is 4 and the array index is 1. After multiplying them, the product 4 is added to the pointer to get the address of b[1]. ldind.i4 then reads the int at that address and it is displayed. The same address calculation, followed by stind.i4, is used to change the value. Whether b[1] or *(b+1) is used, the IL generated remains the same.
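The offset arithmetic can be tried out without unsafe code. The sketch below copies the bytes of the array into a byte buffer and then reads a member at index * sizeof(int), which is the same calculation that pqr performs on the pointer. The helper ReadAt is our own invention, and since it works on a copy of the bytes, it cannot change the original array the way pqr does.

```csharp
using System;

static class OffsetSketch
{
    // Reads member 'index' of an int array from its raw bytes, computing
    // the byte offset the way the IL does: index * sizeof(int).
    public static int ReadAt(byte[] raw, int index)
    {
        int offset = index * sizeof(int);          // ldc.i4.4, ldc.i4.1, mul
        return BitConverter.ToInt32(raw, offset);  // add, ldind.i4
    }

    static void Main()
    {
        int[] a = { 10, 2 };
        byte[] raw = new byte[a.Length * sizeof(int)];
        Buffer.BlockCopy(a, 0, raw, 0, raw.Length);  // copy of the array's bytes
        Console.WriteLine(ReadAt(raw, 0));  // prints 10
        Console.WriteLine(ReadAt(raw, 1));  // prints 2
    }
}
```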

a.cs

public class zzz

{

public static void Main()

{

string [] s = new string[3];

object [] t = s;

t[0] = null;

t[1] = "hi";

t[2] = new yyy();

}

}

class yyy

{

}

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (class System.String[] V_0,class System.Object[] V_1)

ldc.i4.3

newarr [mscorlib]System.String

stloc.0

ldloc.0

stloc.1

ldloc.1

ldc.i4.0

ldnull

stelem.ref

ldloc.1

ldc.i4.1

ldstr "hi"

stelem.ref

ldloc.1

ldc.i4.2

newobj instance void yyy::.ctor()

stelem.ref

ret

}

}

.class private auto ansi yyy extends [mscorlib]System.Object

{

}

Output

Exception occurred: System.ArrayTypeMismatchException: An exception of type System.ArrayTypeMismatchException was thrown.

at zzz.vijay()

The array s is an array of three strings. We have declared t as an array of objects but initialised it to an array of strings, which is perfectly legal in C#. We then initialise the members of t to a null, a string and a yyy object respectively. The runtime knows that even though t is declared as an array of objects, it refers to an array of strings. Its members can only be strings or null.

The IL code is very straightforward. It uses newarr to create an array of strings. Then it uses stloc.1 to initialize V_1, i.e. array t. Thereafter, stelem.ref is used to initialize the individual array members. However, the last stelem.ref checks the runtime type of the value against the actual element type of the array, finds that a yyy is not a string and throws an exception. The code that throws the exception is not present in the array class at all. It is in stelem.ref and we are not privy to this code.
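The check made by stelem.ref can be observed from C# by catching the exception. In the sketch below, the helper StoreFails is our own name; it simply reports whether a store into member 0 was rejected.

```csharp
using System;

static class CovarianceSketch
{
    // Returns true if storing 'value' into the array raises
    // ArrayTypeMismatchException.
    public static bool StoreFails(object[] target, object value)
    {
        try
        {
            target[0] = value;     // the store that is checked at run time
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    static void Main()
    {
        object[] t = new string[3];   // legal: an object[] variable, a string[] array
        Console.WriteLine(StoreFails(t, null));          // prints False
        Console.WriteLine(StoreFails(t, "hi"));          // prints False
        Console.WriteLine(StoreFails(t, new object()));  // prints True
    }
}
```

The same stores into a genuine array of objects all succeed, which shows that the check depends on the array that was created and not on the type of the variable.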

a.cs

public class zzz

{

public static void Main()

{

string [] s = new string[3];

object [] t = s;

t[0] = (string)new yyy();

System.Console.WriteLine(t[0]);

t[1] = new yyy();

System.Console.WriteLine(t[1]);

}

}

class yyy

{

public static implicit operator string ( yyy a)

{

return "hi";

}

}

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (class System.String[] V_0,class System.Object[] V_1)

ldc.i4.3

newarr [mscorlib]System.String

stloc.0

ldloc.0

stloc.1

ldloc.1

ldc.i4.0

newobj instance void yyy::.ctor()

call class System.String yyy::op_Implicit(class yyy)

stelem.ref

ldloc.1

ldc.i4.0

ldelem.ref

call void [mscorlib]System.Console::WriteLine(class System.Object)

ldloc.1

ldc.i4.1

newobj instance void yyy::.ctor()

stelem.ref

ldloc.1

ldc.i4.1

ldelem.ref

call void [mscorlib]System.Console::WriteLine(class System.Object)

ret

}

}

.class private auto ansi yyy extends [mscorlib]System.Object

{

.method public hidebysig specialname static class System.String op_Implicit(class yyy a) il managed

{

.locals (class System.String V_0)

ldstr "hi"

stloc.0

ldloc.0

ret

}

}

Output

Exception occurred: System.ArrayTypeMismatchException: An exception of type System.ArrayTypeMismatchException was thrown.

at zzz.vijay()

Had the compiler been a little more concerned about exceptions, it would have prevented the above program from throwing one at run time, by spotting the error at compile time itself. We have the same situation as before. The array t is declared as an array of objects, but initialized to an array of strings. The member t[0] is initialized to a yyy object, but now with a cast. This cast calls the string conversion operator, i.e. the function op_Implicit, which returns a string.

As no cast is stated in the second case, and a yyy object can be stored in an object without any conversion, the function op_Implicit is not called and the yyy object is not converted into a string. The compiler should have noticed this at compile time and flagged an error. But it ignores this completely, and the exception is thrown at run time. Sometimes compilers do not behave as intelligently as expected.
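The difference between the two assignments can be isolated in a few lines. In this sketch the class Yyy mirrors yyy from the listing, and the helpers AsString and AsObject are our own names. The conversion operator runs only when the target type is string.

```csharp
using System;

class Yyy
{
    public static implicit operator string(Yyy a)
    {
        return "hi";
    }
}

static class ConversionSketch
{
    public static string AsString()
    {
        return (string)new Yyy();   // calls op_Implicit
    }

    public static object AsObject()
    {
        return new Yyy();           // plain reference conversion, no op_Implicit
    }

    static void Main()
    {
        Console.WriteLine(AsString());            // prints hi
        Console.WriteLine(AsObject() is string);  // prints False
        Console.WriteLine(AsObject() is Yyy);     // prints True
    }
}
```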

a.cs

class zzz

{

static void F(params object[] b)

{

object o = b[0];

System.Console.WriteLine(o.GetType().FullName );

System.Console.WriteLine(b.Length);

}

static void Main()

{

object[] a = {1, "Hello", 123};

object o = a;

F(a);

F((object)a);

F(o);

F((object[])o);

}

}

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method private hidebysig static void F(class System.Object[] b) il managed

{

.param [1]

.custom instance void [mscorlib]System.ParamArrayAttribute::.ctor() = ( 01 00 00 00 )

.locals (class System.Object V_0)

ldarg.0

ldc.i4.0

ldelem.ref

stloc.0

ldloc.0

call instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()

callvirt instance class System.String [mscorlib]System.Type::get_FullName()

call void [mscorlib]System.Console::WriteLine(class System.String)

ldarg.0

ldlen

conv.i4

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

.method private hidebysig static void vijay() il managed

{

.entrypoint

.locals (class System.Object[] V_0,class System.Object V_1,class System.Object[] V_2,int32 V_3)

ldc.i4.3

newarr [mscorlib]System.Object

stloc.2

ldloc.2

ldc.i4.0

ldc.i4.1

stloc.3

ldloca.s V_3

box [mscorlib]System.Int32

stelem.ref

ldloc.2

ldc.i4.1

ldstr "Hello"

stelem.ref

ldloc.2

ldc.i4.2

ldc.i4.s 123

stloc.3

ldloca.s V_3

box [mscorlib]System.Int32

stelem.ref

ldloc.2

stloc.0

ldloc.0

stloc.1

ldloc.0

call void zzz::F(class System.Object[])

ldc.i4.1

newarr [mscorlib]System.Object

stloc.2

ldloc.2

ldc.i4.0

ldloc.0

stelem.ref

ldloc.2

call void zzz::F(class System.Object[])

ldc.i4.1

newarr [mscorlib]System.Object

stloc.2

ldloc.2

ldc.i4.0

ldloc.1

stelem.ref

ldloc.2

call void zzz::F(class System.Object[])

ldloc.1

castclass class System.Object[]

call void zzz::F(class System.Object[])

ret

}

}

Output

System.Int32

3

System.Object[]

1

System.Object[]

1

System.Int32

3

This is quite a huge program. The explanation is slightly complicated but, without understanding IL code, it is next to impossible to understand the nitty-gritty of C#.

Let us tread one step at a time. This example demonstrates some basic concepts of C# programming. We first create an array of objects called a, of size 3, and initialize its members to two numbers and one string. Remember that everything in the .NET world is an object. Then we have another object o that is initialized to a. We do not get an error, but you need to bear in mind that a is an array and o is an object that now stores a reference to an array.

We call the function F four times:

• first with the object a, which is an array.

• then with the same array cast to an object.

• then with the object o.

• finally with the object o cast to an array of objects.

The function F accepts the parameter in an array of objects called b. The first member b[0] is stored in an object called o. The fullname of this object and the length of the array are printed using the WriteLine function.

In the first case, the array of 3 objects is passed as it is. The type name of the first member is System.Int32 and the size of the array is 3.

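In the second case, as the array is cast to an object, it is treated as a single parameter. It becomes the only member of a new array, so the type name of the first member is System.Object[] and the size is 1.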

The third case has an object placed on the stack, which is wrapped in a new array of objects in the same way. The size is displayed as 1, since the new array has only one member.

In the last case, the cast tells C# that o refers to an array of objects, so the original array is passed as it is and the size is 3.

Up to the third stelem.ref instruction, the 3 array members are merely being initialized to the values 1, Hello and 123. The array is built in the temporary local V_2 and then copied to local V_0, which is array a. Local V_1 refers to object o. As it is an array of objects, the string does not pose any problems, but since the numbers are value types, they have to be first converted to a reference type using the box instruction.

The first call simply places the array stored in local V_0 on the stack. The second call places 1 on the stack and then creates a new array of size 1 using newarr. It stores this new array in local V_2 and then loads the value of local V_2 back on the stack. Then, it loads a 0 and V_0, the main array containing 3 members, on the stack. stelem.ref is used to initialize member 0 of V_2 to this value. The local V_2 is then placed on the stack for the call. See what a simple cast does.

Similarly, in the third case we create an array of size 1, store it in local V_2 and then place it on the stack. Then, we place 0 and the local V_1 on the stack and initialize member 0 of V_2 to it for the function. The last call simply places the object V_1 on the stack and calls castclass. Function F is straightforward while performing its job. Ask yourself whether it was the C# code that enabled you to grasp the program or was it the IL code?

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed {

.entrypoint

.locals (int32[] V_0)

ldc.i4.3

newarr [mscorlib]System.Int32

stloc.0

ldloc.0

ldc.i4.6

ldc.i4.s 10

stelem.i4

ret

}

}

Output

Exception occurred: System.IndexOutOfRangeException: An exception of type System.IndexOutOfRangeException was thrown.

at zzz.vijay()

Our array above has only 3 members, whereas we tried to store a value in the seventh member. Whenever we exceed the bounds of an array, we get an IndexOutOfRangeException at run time. Thus, be careful in dealing with arrays. Do not cross the picket line. We store values in an array and index them, so that we can retrieve a single item by position.
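One way to stay on the right side of the picket line is to test the index before using it. The sketch below is only an illustration; the helper TryStore is our own name and not part of any library. It then performs the same out of range store as the IL above and catches the exception.

```csharp
using System;

static class BoundsSketch
{
    // Stores 'value' at 'index' only if the index is inside the array.
    public static bool TryStore(int[] a, int index, int value)
    {
        if (index < 0 || index >= a.Length)
            return false;
        a[index] = value;
        return true;
    }

    static void Main()
    {
        int[] a = new int[3];
        Console.WriteLine(TryStore(a, 1, 10));  // prints True
        Console.WriteLine(TryStore(a, 6, 10));  // prints False
        try
        {
            a[6] = 10;                          // same store as the IL above
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("index 6 is out of range");
        }
    }
}
```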

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[] V_0)

ldc.i4.3

newarr [mscorlib]System.Int32

stloc.0

ldloc.0

ldc.i4.1

ldelema System.Int32

ldc.i4.2

stobj System.Int32

ldloc.0

ldc.i4.1

ldelema System.Int32

ldobj System.Int32

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

Output

2

We have different instructions for dealing with value types and arrays. Arrays are nothing but a number of variables stored together in memory. The ldelema instruction takes two parameters on the stack. The first is the array reference, i.e. V_0, and the second is the index of the member whose memory location is desired.

After running the instruction, we have on the stack the address of the member at the specified array index. The instruction ldelema requires the data type of the array, because the offset of the members of the array is decided by the data type. The instruction stobj stores the value in this memory location, thereby initializing the member at index 1 to 2.

To display this member, its address is placed on the stack again and ldobj is used to retrieve the value. The instructions ldobj and stobj have nothing to do with arrays. They deal with reading a memory location and placing the value found on the stack, and vice versa. This is why they are used here with an array of value types.

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

ldnull

ldc.i4.1

ldc.i4.s 10

stelem.i4

ret

}

}

Output

Exception occurred: System.NullReferenceException: Attempted to dereference a null object reference.

at zzz.vijay()

Since we placed a null array reference on the stack, we get a NullReferenceException. We are basically simulating some of the exceptions that arrays can throw at us.

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

ldc.i4.3

newarr [mscorlib]System.Int32

call instance int32 [mscorlib]System.Array::get_Length()

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

Output

3

Just as we used the ldlen instruction earlier, we could have instead used the get_Length function, which is the accessor of the Length property of the Array class. The choice is yours, but as we demonstrated earlier, the Length property is converted to the ldlen instruction by the C# compiler, as it is far more efficient. At the end of the day, the get_Length function does the same thing. IL does not have instructions that can handle arrays other than vectors. Thus, multi-dimensional arrays, also called general arrays, are created using array functions.

a.cs

class zzz

{

public static void Main()

{

int [,] a = new int[1,2];

a[0,0] = 10;

a[0,1] = 20;

System.Console.WriteLine(a[0,1]);

}

}

a.il

.assembly mukhi {}

.class public auto ansi zzz extends [mscorlib]System.Object

{

.field private int32 x

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[0...,0...] V_0)

ldc.i4.1

ldc.i4.2

newobj instance void int32[0...,0...]::.ctor(int32,int32)

stloc.0

ldloc.0

ldc.i4.0

ldc.i4.0

ldc.i4.s 10

call instance void int32[0...,0...]::Set(int32,int32,int32)

ldloc.0

ldc.i4.0

ldc.i4.1

ldc.i4.s 20

call instance void int32[0...,0...]::Set(int32,int32,int32)

ldloc.0

ldc.i4.0

ldc.i4.1

call instance int32 int32[0...,0...]::Get(int32,int32)

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

Output

20

One area where C# excels is array handling. This is only because IL understands arrays internally. Let us now find out how IL handles two dimensional arrays.

A two dimensional array is declared in the same way that a normal array is declared, and the dimensions are stated in the new expression. The array index starts from 0 and not from 1. In IL, to declare a two dimensional array, there is a special syntax in the locals directive, i.e. a 0 followed by 3 dots, written twice. The two array dimensions are placed on the stack and newobj is called. It is not newarr. newobj calls the constructor of the two dimensional array class that takes two parameters. The resulting array reference is then stored in local V_0.

To set a value in a two dimensional array, the array reference stored in V_0 is loaded on the stack, followed by the two indexes, using ldc. Thereafter, the value that the array member is to be initialized to is placed on the stack. The function Set of the same int array class is called with four items on the stack: the array reference, the 2 index values and the new value.

Conversely, to fetch a value, the function Get is called with the 3 parameters on the stack, the array reference and the 2 index values. Thus, multi-dimensional arrays are built using array class functions, and not IL instructions, which are used to build single dimensional arrays. The rank of an array is defined as the number of dimensions of the array. The runtime expects at least a rank of 1.
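The rank and the size of each dimension can be read back in C# through the Array class. A small sketch follows; the helper Cells is our own and simply multiplies the lengths of all the dimensions.

```csharp
using System;

static class RankSketch
{
    // Total number of members: the product of the lengths of all dimensions.
    public static int Cells(Array a)
    {
        int total = 1;
        for (int d = 0; d < a.Rank; d++)
            total *= a.GetLength(d);
        return total;
    }

    static void Main()
    {
        int[,] a = new int[1, 2];
        a[0, 0] = 10;
        a[0, 1] = 20;
        Console.WriteLine(a.Rank);          // prints 2
        Console.WriteLine(a.GetLength(0));  // prints 1
        Console.WriteLine(a.GetLength(1));  // prints 2
        Console.WriteLine(Cells(a));        // prints 2
        Console.WriteLine(a[0, 1]);         // prints 20
    }
}
```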

a.il

.assembly mukhi {}

.class public auto ansi zzz extends [mscorlib]System.Object

{

.field private int32 x

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[1...3,5...] V_0)

ldc.i4.2

ldc.i4.6

newobj instance void int32[1...3,5...]::.ctor(int32,int32)

pop

ret

}

}

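A general purpose array has an upper bound and a lower bound. Unfortunately, as of now, the runtime does not check the bounds stated in the declaration against the sizes passed to the constructor. Here, the first dimension is declared with a lower bound of 1 and an upper bound of 3, yet a size of 2 is placed on the stack. You can choose the bounds you desire.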

a.cs

class zzz

{

public static void Main()

{

int [,,] a;

a = new int[2,3,4];

a[1,2,3] = 10;

System.Console.WriteLine(a[1,2,3]);

}

}

a.il

.assembly mukhi {}

.class public auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[0...,0...,0...] V_0)

ldc.i4.2

ldc.i4.3

ldc.i4.4

newobj instance void int32[0...,0...,0...]::.ctor(int32,int32,int32)

stloc.0

ldloc.0

ldc.i4.1

ldc.i4.2

ldc.i4.3

ldc.i4.s 10

call instance void int32[0...,0...,0...]::Set(int32,int32,int32,int32)

ldloc.0

ldc.i4.1

ldc.i4.2

ldc.i4.3

call instance int32 int32[0...,0...,0...]::Get(int32,int32,int32)

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

Output

10

An array can have any rank. The above array is a three dimensional one and has a rank of 3. So, we have to use the array handling functions to work with them. The rank of an array is declared by using a comma between the square brackets. The number of commas plus one is the rank of an array. If no specific bounds are supplied, the default is 0 for the lower bound and infinity for the upper bound.

You can specify none, one or both bounds. The CLR, in this version, ignores all the bounds information you provide in the declaration, and only pays heed to the numbers placed on the stack at the time of creation of the array. Here, you have to supply all the information. Only those arrays that have a lower bound of 0 in all their dimensions are CLS compliant.

In the above example, three bound values are placed on the stack and the array constructor is called with three values. We are not allowed to use newarr, as the above array is not a vector. Now to set it to a value, the three index values are placed on the stack in a specific order. The same Set Function is called, but this time with four parameters. The same rules are relevant for the Get function also. The point that we want to make is that the magnitude of the rank has no effect on the way the array is handled. No substantial changes are required.

There are two array constructors that can be used. The first takes the same number of parameters as the rank of the array, one size per dimension. The second constructor takes twice the number of parameters as the rank of the array. In the second type of constructor, the first two parameters specify the lower bound and the length of the first dimension, the next two parameters specify the lower bound and the length of the second dimension, and so on. The first constructor always assumes the lower bound to be zero.

a.il

.assembly mukhi {}

.class public auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[5...10,3...7] V_0)

ldc.i4.5

ldc.i4.6

ldc.i4.3

ldc.i4.5

newobj instance void int32[5...10,3...7]::.ctor(int32, int32,int32,int32)

stloc.0

ldloc.0

ldc.i4.6

ldc.i4.5

ldc.i4.s 10

call instance void int32[0...,0...]::Set(int32,int32,int32)

ldloc.0

ldc.i4.6

ldc.i4.5

call instance int32 int32[0...,0...]::Get(int32,int32)

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

Output

10

We then change the two index lines

ldc.i4.6

ldc.i4.5

to

ldc.i4.1

ldc.i4.2

and we see the following exception thrown at us.

Exception occurred: System.IndexOutOfRangeException: An exception of type System.IndexOutOfRangeException was thrown.

at zzz.vijay()

An array of rank 2, with a lower and an upper bound in each dimension, is created. The first dimension starts at 5 and ends at 10. Thus, on the stack is placed first the lower bound, i.e. 5, and then the length of the dimension. The upper bound itself is never placed on the stack. As the dimension starts at 5 and ends at 10, the length is calculated as follows: 10 - 5 + 1 = 6 (i.e. the upper bound - lower bound + 1). The same rule holds true for the next dimension, which runs from 3 to 7 and so has a length of 5.

The rest of the code remains the same. When the index values 6, 5 are changed to 1, 2, an exception is thrown. This is because the array bounds for the first dimension are 5 to 10 and for the second dimension are 3 to 7. Any attempt to cross the array bounds in any direction generates an exception.
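C# has no syntax for a non-zero lower bound, but the same kind of array can be built through the Array class. The sketch below assumes the overload of Array.CreateInstance that takes an array of lengths and an array of lower bounds; the helper LengthOf is our own and repeats the upper bound - lower bound + 1 calculation.

```csharp
using System;

static class LowerBoundSketch
{
    // Length of a dimension running from 'lower' to 'upper' inclusive.
    public static int LengthOf(int lower, int upper)
    {
        return upper - lower + 1;
    }

    // Builds an int array with bounds 5 to 10 and 3 to 7.
    public static Array Create()
    {
        int[] lengths = { LengthOf(5, 10), LengthOf(3, 7) };  // 6 and 5
        int[] lowerBounds = { 5, 3 };
        return Array.CreateInstance(typeof(int), lengths, lowerBounds);
    }

    static void Main()
    {
        Array a = Create();
        a.SetValue(10, 6, 5);                    // member [6,5] = 10
        Console.WriteLine(a.GetValue(6, 5));     // prints 10
        Console.WriteLine(a.GetLowerBound(0));   // prints 5
        Console.WriteLine(a.GetUpperBound(0));   // prints 10
        try
        {
            a.SetValue(10, 1, 2);                // below both lower bounds
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("index (1,2) is out of range");
        }
    }
}
```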

Array of Arrays

a.cs

class zzz

{

public static void Main()

{

int [][] a = new int[2][];

a[0] = new int[1];

a[1] = new int[10];

System.Console.WriteLine(a.Length) ;

System.Console.WriteLine(a[0].Length) ;

}

}

a.il

.assembly mukhi {}

.class public auto ansi zzz extends [mscorlib]System.Object

{

.field private int32 x

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[][] V_0)

ldc.i4.2

newarr int32[]

stloc.0

ldloc.0

ldc.i4.0

ldc.i4.1

newarr [mscorlib]System.Int32

stelem.ref

ldloc.0

ldc.i4.1

ldc.i4.s 10

newarr [mscorlib]System.Int32

stelem.ref

ldloc.0

ldlen

conv.i4

call void [mscorlib]System.Console::WriteLine(int32)

ldloc.0

ldc.i4.0

ldelem.ref

ldlen

conv.i4

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

Output

2

1

Let us explore jagged arrays, where each array member can contain another array of a different length. We are creating an array that has an irregular shape. In C#, the syntax to create an array of arrays consists of two pairs of square brackets, [][]. We first create the array using only the first dimension. This is done by using newarr and stating an array data type as a parameter. We then initialize V_0 with this array reference.

Now, since we have to create two separate one dimensional arrays, we first place the array reference on the stack. Then we place the index of the array member we want to initialise followed by the size of the new array. Finally, we call newarr to create an array of ints and place the reference on the stack. stelem.ref is used to initialize the array member with this array reference. The same is repeated for the second member a[1] also.

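The instruction ldlen places the length of an array on the stack. The length is a native unsigned integer, which is why conv.i4 follows it to convert the value to an int32 before WriteLine is called. For the main array, its reference is placed on the stack using ldloc.0. For the second length, ldelem.ref is used to first fetch the reference of the array out of the first array member a[0], and then ldlen is used to obtain the length.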

a.cs

class zzz

{

public static void Main()

{

int[][] a = new int[2][] { new int[] {2,3}, new int[] {5,6,7} };

System.Console.WriteLine(a[0][1]) ;

System.Console.WriteLine(a[1][2]) ;

}

}

a.il

.assembly mukhi {}

.class public auto ansi zzz extends [mscorlib]System.Object

{

.field private int32 x

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[][] V_0,int32[][] V_1,int32[] V_2)

ldc.i4.2

newarr int32[]

stloc.1

ldloc.1

ldc.i4.0

ldc.i4.2

newarr [mscorlib]System.Int32

stloc.2

ldloc.2

ldc.i4.0

ldc.i4.2

stelem.i4

ldloc.2

ldc.i4.1

ldc.i4.3

stelem.i4

ldloc.2

stelem.ref

ldloc.1

ldc.i4.1

ldc.i4.3

newarr [mscorlib]System.Int32

stloc.2

ldloc.2

ldc.i4.0

ldc.i4.5

stelem.i4

ldloc.2

ldc.i4.1

ldc.i4.6

stelem.i4

ldloc.2

ldc.i4.2

ldc.i4.7

stelem.i4

ldloc.2

stelem.ref

ldloc.1

stloc.0

ldloc.0

ldc.i4.0

ldelem.ref

ldc.i4.1

ldelem.i4

call void [mscorlib]System.Console::WriteLine(int32)

ldloc.0

ldc.i4.1

ldelem.ref

ldc.i4.2

ldelem.i4

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

Output

3

7

The above example is similar to its predecessor, though it is more elaborate and complete. A jagged array is created that is made of two arrays of sizes 2 and 3 respectively. In C#, they can be initialized in one stroke. IL does it the hard way: each inner array is created in the temporary V_2, its members are filled one at a time with stelem.i4, and only then is its reference stored in the outer array with stelem.ref. To fetch the value of a[1][2], it places the reference of the array on the stack. Then it places 1, the first array index, on the stack. Thereafter, ldelem.ref is used to obtain an array reference.

Thus, at first an array reference is pushed on the stack. Then 2 is placed on the stack, and ldelem.i4 is used to get the member at index 2, i.e. the third member, of this new array. A jagged array is treated as an array whose members contain other independent arrays.

An array of arrays is different from a multi dimensional array. A multi dimensional array forms one memory block, whereas an array of arrays holds references to other arrays in memory. Thus, an array of arrays needs an extra indirection to reach the final element, which can make it slower in execution.

We can also use pointers with arrays. The salient feature of an array of arrays is that the first array merely stores the addresses of other arrays. The disadvantage of a multi dimensional array is that every row along a given dimension has to be of the same size, so it cannot have an irregular shape.
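
The difference in shape can be seen from C#. This is a small sketch, not taken from the original listings; the class ShapeDemo and the helper RowLengths are hypothetical names used only for the illustration.

```csharp
using System;

public static class ShapeDemo
{
    // Collects the length of every inner array of an array of arrays.
    public static int[] RowLengths(int[][] a)
    {
        int[] lengths = new int[a.Length];
        for (int i = 0; i < a.Length; i++)
            lengths[i] = a[i].Length;
        return lengths;
    }

    public static void Main()
    {
        // Array of arrays: the outer array holds two references,
        // and the arrays they refer to may differ in length.
        int[][] jagged = new int[2][];
        jagged[0] = new int[1];
        jagged[1] = new int[10];

        // Multi dimensional array: a single block in which every row has 10 members.
        int[,] grid = new int[2, 10];

        Console.WriteLine(jagged.Rank);                         // 1, a vector of references
        Console.WriteLine(grid.Rank);                           // 2
        Console.WriteLine(string.Join(",", RowLengths(jagged))); // 1,10
        Console.WriteLine(grid.GetLength(1));                   // 10 for every row
        Console.WriteLine(grid.Length);                         // 20 members in all
    }
}
```

The array of arrays reports a rank of 1 because, as far as the runtime is concerned, it is a one dimensional array whose members happen to be array references.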

a.il

.assembly mukhi {}

.class public auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

.locals (int32[][][] a)

ldc.i4.5

newobj instance void int32[][][]::.ctor(int32)

stloc a

ldloc a

ldc.i4.0

ldc.i4.3

newobj instance void int32[][]::.ctor(int32)

call instance void int32[][][]::Set(int32, int32[][])

ldloc a

ldc.i4.0

call instance int32[][] int32[][][]::Get(int32)

ldc.i4.1

ldc.i4 10

newobj instance void int32[]::.ctor(int32)

call instance void int32[][]::Set(int32, int32[])

ldloc a

ldc.i4.0

call instance int32[][] int32[][][]::Get(int32)

ldc.i4.1

call instance int32[] int32[][]::Get(int32)

ldc.i4.5

ldc.i4 100

call instance void int32[]::Set(int32, int32)

ldloc a

ldc.i4.0

call instance int32[][] int32[][][]::Get(int32)

ldc.i4.1

call instance int32[] int32[][]::Get(int32)

ldc.i4.5

call instance int32 int32[]::Get(int32)

call void [mscorlib]System.Console::WriteLine(int32)

ret

}

}

Output

100

Here, we shall see how to create an array a of type int32[][][]. We first create a local a of type array of array of array. Thus, we have two levels of indirection. We want the first or main array to have a size of 5 i.e. it should be able to store the references of 5 arrays in memory. The instruction ldc.i4.5 places the size 5 of this array on the stack. Thereafter, newobj is used to create the first dimension of this array. The instruction stloc a stores the array reference in the local a, and ldloc a puts this reference back on the stack.

Subsequently, two values are placed on the stack. One is the index of the first member a[0] and the other is the size of the array that this member should point to i.e. 3. newobj creates an array of type int32[][]. To store it in a[0], the Set function is used. This function requires the index of the array member as its first parameter. Hence, 0 is placed on the stack before the size, even though newobj does not require it. This simplifies the call of the Set function.

The next thing required is an int32[] to store in our int32[][]. So, the array a is placed on the stack again and the index 0 is used to obtain the array that has just been created. The Get function does the job of retrieving values. Then, as before, 1 is placed on the stack, followed by the size of the new array i.e. 10. Finally, newobj creates a simple array int32[] and places its reference on the stack, which is then stored using the Set function.

Remember that the value 1 had already been placed on the stack before newobj was called, so Set finds the index as its first parameter. To execute the operation a[0][1][5] = 100, the member a[0] is required. So, the array reference a is placed on the stack followed by 0, and the Get function is called.

To access a[0][1], as the reference to a[0] is already on the stack, all that is required is placing 1 on the stack and calling Get again. Now, to store the value in the member a[0][1][5], the index 5 is loaded on the stack, followed by the value 100, and Set is called. To fetch the value of the member a[0][1][5], the same procedure as before is followed. That is:

• load the array reference on the stack.

• obtain the member 0 by using Get.

• obtain the member 1 of this array.

• finally, obtain the member 5 of this array.

The logic is the same as described earlier.
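
No a.cs accompanies this program, so the following is only our reading of what the IL does, written in C#. The class NestedDemo and the method Build are hypothetical names. A C# compiler would be expected to emit newarr, stelem and ldelem for this source, as in the earlier examples, rather than the newobj, Set and Get calls of the hand written IL.

```csharp
using System;

public static class NestedDemo
{
    // Builds the same structure, one level at a time.
    public static int[][][] Build()
    {
        int[][][] a = new int[5][][]; // main array: room for 5 references
        a[0] = new int[3][];          // a[0] refers to an array of 3 references
        a[0][1] = new int[10];        // a[0][1] refers to an array of 10 ints
        a[0][1][5] = 100;             // finally, a plain int is stored
        return a;
    }

    public static void Main()
    {
        int[][][] a = Build();
        Console.WriteLine(a[0][1][5]); // 100
    }
}
```

Each pair of square brackets on the left of an assignment corresponds to one Get call in the IL, except the last pair, which corresponds to the Set call.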

a.cs

using System;

public class zzz

{

public static void abc(int i, __arglist)

{

ArgIterator a = new ArgIterator(__arglist);

while (a.GetRemainingCount() > 0)

Console.WriteLine(__refvalue(a.GetNextArg(), int));

}

public static void Main()

{

abc(20, __arglist(1, 2, 3));

}

}

a.il

.assembly mukhi {}

.class private auto ansi zzz extends [mscorlib]System.Object

{

.method public hidebysig static void vijay() il managed

{

.entrypoint

ldc.i4.s 20

ldc.i4.1

ldc.i4.2

ldc.i4.3

call vararg void zzz::abc(int32,...,int32,int32,int32)

ret

}

.method public hidebysig static vararg void abc(int32 i) il managed

{

.locals (value class [mscorlib]System.ArgIterator V_0)

ldloca.s V_0

arglist

call instance void [mscorlib]System.ArgIterator::.ctor(value class [mscorlib]System.RuntimeArgumentHandle)

br.s IL_001d

IL_000b: ldloca.s V_0

call instance typedref [mscorlib]System.ArgIterator::GetNextArg()

refanyval [mscorlib]System.Int32

ldind.i4

call void [mscorlib]System.Console::WriteLine(int32)

IL_001d: ldloca.s V_0

call instance int32 [mscorlib]System.ArgIterator::GetRemainingCount()

ldc.i4.0

bgt.s IL_000b

ret

}

}

Output

1

2

3

This example builds upon the earlier example, which has a function that accepts a variable number of arguments. In C#, __arglist enables us to implement a function that accepts a variable number of arguments.

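Internally, in IL, the function is marked with the vararg modifier, and at the call site the ellipsis separates the fixed parameter from the three extra int32 values. Within the function, the arglist instruction places a handle to the variable arguments on the stack, from which an ArgIterator is constructed. In a loop, GetNextArg returns each argument as a typed reference, refanyval obtains the address of the int32 it refers to, and ldind.i4 loads the value so that it can be displayed.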
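
For comparison, C# also offers the params keyword, which is a different technique from __arglist: the compiler packs the extra arguments into an ordinary array, so no vararg modifier or ArgIterator is involved. The sketch below is not part of the original program; the class ParamsDemo and the method Abc are hypothetical counterparts of zzz and abc.

```csharp
using System;

public static class ParamsDemo
{
    // The extra arguments arrive as an int[]; the method returns how many there were.
    public static int Abc(int i, params int[] rest)
    {
        foreach (int value in rest)
            Console.WriteLine(value);
        return rest.Length;
    }

    public static void Main()
    {
        Abc(20, 1, 2, 3); // the compiler builds new int[] { 1, 2, 3 } for rest
    }
}
```

Since the extra values travel in an array, all of them must share the declared element type, whereas a vararg function can receive arguments of differing types.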