MD5 (Message Digest Algorithm )

 

main()
{
int a,b;
b = 0x30000003;
a = (b << 3) | (b >> 29);
printf("%x\n",a);
printf("%x\n",b<<3);
printf("%x\n",b>>29);
}
 
80000019
80000018
1

When we right shift a byte by 3, the three leftmost bits get thrown off the byte. The same occurs when we left shift a byte, the top bits disappear into thin air. What we would like to do for various reasons is to rotate bits in a byte but the bits that lie on the edge do not disappear into thin air but actually rotate back into the byte.

In our example the 3 lets us have the first 2 bits on. When we shift them to the left, we should get an answer of 18 and not 19 that we see.  This is because bits 1 and 2 now become bits 4 and 5. If we simply left shift b by three we get 80000018. Lets move on to the second part.

By shifting the bits to the left will throw off the top three bits so we now also right shift all the bits to the right by 29, which throws off all but the last bits which become the first three. From the end we have two of the last four bits on or in other word the last three and four bits are on. By shifting 29 to right the third  last bit becomes 1 and the fourth last bit disappears.

When we bit wise or the two, the effect is off a rotation. What we are planning to do in the first few programs is to break up the MD5 program into manageable smaller programs. The next program explains how.

#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))
main()
{
int a,b;
b = 0x30000003;
a = ROTATE_LEFT (b,3);
printf("%x\n",a);
}
 
80000019

The MD5 code written by Ron Rivest who created for RSA labs, he is the R of RSA wrote the MD5 algorithm. RFC 1321 describes this algorithm and also gives us a reference implementation. What we plan to do is explain is the reference code written by the man himself.

Obviously we can not go wrong if we understand the code written by the master himself. Nobody can do a better job on the MD5 algorithm than the person who designed the algorithm. What we will do is break up his code into manageable chunks.

#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))
main()
{
int a,b;
b = 0x30000003;
a = ROTATE_LEFT (b,3);
printf("%x\n",a);
}
 

The reference implementation uses a macro ROTATE_LEFT that rotates a value a certain number of times. When we write macros, we use round brackets a lot even when not necessary.

To sum up, the macro ROTATE_LEFT  first shifts the bytes say three to the left. This results in the first three bits being zeroes and the last three bits being thrown off. Then we rotate the bits by 29 resulting in the last three bits becoming the first three bits. We now bit wise or the above two to get a actual rotation and not a shift.

#include <math.h>
char * printbc(char i)
{
unsigned int j,l,m,n;
unsigned char *a,*b;
n = 0;
a = (char *)malloc(9);
b = (char *)malloc(10);
a[8] = 0;
b[9] = 0;
for( j = 0 ; j <= 7 ; j++)
{
double k = pow(2,j);
l = k;
m = i & l;
m = m >> j;
a[7-j] = m + 48;
}
for(i = 0,j = 0 ; j <= 7; i++,j++)
{
b[i] = a[j];
if ( j == 3)
{
b[i+1] = 32;
i++;
}
}
return b;
}
char * printbs(short i)
{
unsigned int j,l,m;
unsigned char *a,*b;
a = (char *)malloc(17);
b = (char *)malloc(20);
a[16] = 0;
b[19] = 0;
for( j = 0 ; j <= 15 ; j++)
{
double k = pow(2,j);
l = k;
m = i & l;
m = m >> j;
a[15-j] = m + 48;
}
for(i = 0,j = 0 ; j <= 15; i++,j++)
{
b[i] = a[j];
if ( j == 3 || j == 7 || j == 11)
{
b[i+1] = 32;
i++;
}
}
return b;
}
char * printbi(unsigned int i)
{
unsigned int j,l,m;
unsigned char *a,*b;
a = (char *)malloc(33);
b = (char *)malloc(40);
a[32] = 0;
b[39] = 0;
for( j = 0 ; j <= 31 ; j++)
{
double k = pow(2,j);
l = k;
m = i & l;
m = m >> j;
a[31-j] = m + 48;
}
for(i = 0,j = 0 ; j <= 31; i++,j++)
{
b[i] = a[j];
if ( j == 3 || j == 7 || j == 11 || j == 15 || j == 19 || j == 23 || j == 27)
{
b[i+1] = 32;
i++;
}
}
return b;
}
main()
{
unsigned int i = 0xfedc4321;
unsigned char j = 0xfe;
unsigned short k = 0xfedc;
printf("%s %02x\n",printbc(j),j);
printf("%s %04x\n",printbs(k),k);
printf("%s %08x\n",printbi(i),i);
}
 
1111 1110 fe
1111 1110 1101 1100 fedc
1111 1110 1101 1100 0100 0011 0010 0001 fedc4321

The function printf has a zillion options and we bet no one can remember all its myriad options. The printf function allows you to display different types of data but it does not display data the way we want, and that is in binary. It displays the values of variables in hex, but there is no way we can see the actual bit pattern. We thus had no choice but o write our own series of print functions that will display the bit pattern of a number.

We need three such functions as the numbers can be 8 bit, char or 16 bit short or finally 32 bits or int or long. The code remains nearly the same so lets start with a char or 8 bits using the printbc function. We first allocate 9 bytes for an array of chars to store the entire bit pattern, the last extra byte is for the zero.

After storing the bit pattern we would like to have a space after every bits displayed and thus create another array b which is one larger as we need to introduce only one space. We null terminate each array writing a zero to the last member of the array. We then come across a for loop that iterates 8 times as there are 8 bits in a byte.

Each time in the loop we would like to extract one bit at a time. We thus need to bit wise and the parameter I with 0x1, 0x2, 0x4 etc, i.e. powers of 2 so that only one bit is 1 at one iteration of the loop. We use the pow function that raises the first parameter 2 to the power of the second which is 2. 

The pow function returns a double that we cast to a int l. We then bit wise and I and l and store this value into m. The value of m will be such that it will have all the bits as 0 and may have only one bit as zero or 1. This bit will be in the first position for the first iteration of the loop, second position when j is 1 and third position when j is 2.

We would need to make this bit the first bit n the byte and thus we right shift m that many bytes to the right using the loop variable j. Thus the first time in the loop we need no shifting, the second time as the second bit is 0 or 1, we right shift only once.

We cannot display the bit in the first iteration as this is the first bit and it needs to be displayed last. Thus we store all the bits first into an array a and as m is 0 or 1, we need to add 48 to this to store a ASCII 0 or 1 in the array. Also we need to store this value at the end of the array at byte 7 to be precise.

Thus the array offset is 7 –j so that at the second iteration, the value is stored at index offset 6 as j is 1. Now that we have our string we now need to copy it into the array b, but with a space. We have 8 bits from the array a to be copied, but we need a extra space, so that the number of bits to be copied is 9 but the for loop will yet be executed 8 times.

The value of j goes from 0 to 7 and we use this value as an offset for the array a. The variable I is used to index array b and till the value of j is 3, both I and j have the same values. The minute j touches 3 which means that we have written 4 bits, we add a space to the array b at position I+1 as at position I we have just written a byte.

We then increase I by 1 and from now on I will be larger than j by 1. This is how we add a space to the array b and then w return this array. The printbs function displays the 16 bits of a short. The array b is now 3 larger as we have 3 spaces to display.

The for loop goes on 16 times instead of 8. In the last for loop we add a space when j is 3 or 7 or 11. The printbi function displays the 32 bits of a long. Here we need to add 7 more spaces and thus we make the array b 40 larger. The for loop now iterates 32 times and we then finally add the 7 spaces.

main()

{

int i,j,k;

i = 5;

j = 10;

printf("%d %d\n",i+j, i | j);

i = 3;

j = 6;

printf("%d %d\n",i+j, i | j);

}
 
15 15
9 7
 

One question that always bothers us is when to use bit oring and when to add. In the first printf both give us the same answer, whereas in the second case we get a different answer. Thus the rule is that whenever two numbers do not share any bit positions whose value is 1, it does not matter whether we bit wise or or add.

Thus as 5 and 10 have no 1 in the same position we can use any of the two. In the second case the second is on in both and hence bitwise oring and plus give us different answer. In the above function F, x and not x will never share a 1 in the same position and the bitwise or could be replaced by a plus.

#include <math.h>
main()
{
double f = sin(3);
long i;
printf("sin=%f\n",f);
i = abs(f *4294967296);
printf("i=%08x\n",i);
}
 
sin=0.141120
i=242070db

The above example uses the sin function that takes a value in radians. We multiply this returned value by a constant number 4294967296 and then remove the negative sign if any using the abs function. In the MD5 algorithm we need to do something 64 times. Each time we need to add a number which is calculated using the above formula. The only change is that the parameter to the sin function will vary from 1 to 64.

#include <math.h>
char * printbc(char i)
{
unsigned int j,l,m,n;
unsigned char *a,*b;
n = 0;
a = (char *)malloc(9);
b = (char *)malloc(10);
a[8] = 0;
b[9] = 0;
for( j = 0 ; j <= 7 ; j++)
{
double k = pow(2,j);
l = k;
m = i & l;
m = m >> j;
a[7-j] = m + 48;
}
for(i = 0,j = 0 ; j <= 7; i++,j++)
{
b[i] = a[j];
if ( j == 3)
{
b[i+1] = 32;
i++;
}
}
return b;
}
#define F(x, y, z) (((x) & (y)) | ((~x) & (z)))
main()
{
unsigned int a,b,c,f;
a = 0xf0;
b = 0x22;
c = 0x35;
f = F(a,b,c);
printf("%s ",printbc(a));
printf("%s ",printbc(b));
printf("%s\n",printbc(c));
printf("%s\n",printbc(f));
}
 
1111 0000 0010 0010 0011 0101
0010 0101

Most of the above program has been done before, it is the printbc function. In the function main, we have three variables a,b and c. The variable a has a value 0xf0 which makes the first four bits as zero and the last four bits are 1. We then call a macro F which we have stolen from the MD5 code.

All that this macro does is first bitwise and the first two parameters and then bitwise and the not of x and z. Finally it bitwise or’s them together. The net result of this exercise is that the result of this macro is a bitwise if statement. If the first bit of x is 1 or true , then the first bit of y is the answer, if 0 or false then the first bit of z is used.

This process carries on for all 8 bits. As we have chosen the value of a or x as the first 4 bits 0 and the last four bits 1, the output is the first four bits of z and the last four bits of y. In mathematics this is called selection or bit wise anding.

#include <math.h>
char * printbc(char i)
{
unsigned int j,l,m,n;
unsigned char *a,*b;
n = 0;
a = (char *)malloc(9);
b = (char *)malloc(10);
a[8] = 0;
b[9] = 0;
for( j = 0 ; j <= 7 ; j++)
{
double k = pow(2,j);
l = k;
m = i & l;
m = m >> j;
a[7-j] = m + 48;
}
for(i = 0,j = 0 ; j <= 7; i++,j++)
{
b[i] = a[j];
if ( j == 3)
{
b[i+1] = 32;
i++;
}
}
return b;
}
#define G(x, y, z) (((x) & (z)) | ((y) & (~z)))
main()
{
unsigned int a,b,c,f;
a = 0xf0;
b = 0xf0;
c = 0x35;
f = G(a,b,c);
printf("%s ",printbc(a));
printf("%s ",printbc(b));
printf("%s\n",printbc(c));
printf("%s\n",printbc(f));
}
 
1111 0000 1111 0000 0011 0101
1111 0000

We now use another macro G which acts as a majority function. The final bits depend upon a simple majority vote. Lets start with the lowest bit. We have two zeroes and a 1. The majority is a 0 and hence the output bit is 0. Lets take the highest bit of each byte. The first two bytes have a value of 1, the third a value of 0, majority wins and the bit value is 1.

0001 0000 0001 0100 0001 0110
0001 0010

The above case uses a macro H to simulate the parity function. The first bit of each byte is 0 and hence the output bit is 0. We then take the second bit of the last byte as 1 and as there is only a single bit as 1, the output is 1 and this is called odd parity. The third bit has two ones in the second and third byte and thus the third bit is 0. Finally all the bytes have the fifth bit as 1 and hence the fifth bit is 1.

#include <math.h>
char * printbc(char i)
{
unsigned int j,l,m,n;
unsigned char *a,*b;
n = 0;
a = (char *)malloc(9);
b = (char *)malloc(10);
a[8] = 0;
b[9] = 0;
for( j = 0 ; j <= 7 ; j++)
{
double k = pow(2,j);
l = k;
m = i & l;
m = m >> j;
a[7-j] = m + 48;
}
for(i = 0,j = 0 ; j <= 7; i++,j++)
{
b[i] = a[j];
if ( j == 3)
{
b[i+1] = 32;
i++;
}
}
return b;
}
#define I(x, y, z) ((y) ^ ((x) | (~z))) 
main()
{
unsigned int a,b,c,f;
a = 0x02;
b = 0x12;
c = 0x16;
f = I(a,b,c);
printf("%s ",printbc(a));
printf("%s ",printbc(b));
printf("%s\n",printbc(c));
printf("%s\n",printbc(f));
}
 
0000 0010 0001 0010 0001 0110
1111 1001
 
0 0 0   1
1 1 1   0 
0 0 1   0
0 1 1   1
 
Explanation later
 
#include <stdio.h>
#define F(x, y, z) (((x) & (y)) | ((~x) & (z)))
#define G(x, y, z) (((x) & (z)) | ((y) & (~z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define I(x, y, z) ((y) ^ ((x) | (~z))) 
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))
#define FF(a, b, c, d, x, s, ac) \
  {(a) += F ((b), (c), (d)) + (x) + (unsigned int)(ac); \
   (a) = ROTATE_LEFT ((a), (s)); \
   (a) += (b); \
  }
#define GG(a, b, c, d, x, s, ac) \
  {(a) += G ((b), (c), (d)) + (x) + (unsigned int)(ac); \
   (a) = ROTATE_LEFT ((a), (s)); \
   (a) += (b); \
  }
#define HH(a, b, c, d, x, s, ac) \
  {(a) += H ((b), (c), (d)) + (x) + (unsigned int)(ac); \
   (a) = ROTATE_LEFT ((a), (s)); \
   (a) += (b); \
  }
#define II(a, b, c, d, x, s, ac) \
  {(a) += I ((b), (c), (d)) + (x) + (unsigned int)(ac); \
   (a) = ROTATE_LEFT ((a), (s)); \
   (a) += (b); \
  }
unsigned int i,bytes,in[16],a,b,c,d;
FILE *fp;
unsigned char *p;
void main ()
{
fp = fopen("a.txt", "rb");
a = 0x67452301;
b = 0xefcdab89;
c = 0x98badcfe;
d = 0x10325476;
bytes=fread (in, 1, 1024, fp);
p = (char *)in;
p[bytes] = 0x80;
in[14] = bytes << 3;
in[15] = bytes >> 29;
a += ((b & c) | (~b & d))  + in[0] + 3614090360;
a = ROTATE_LEFT(a, 7); 
a += b;
//FF( a, b, c, d, in[ 0], 7, 3614090360);
FF( d, a, b, c, in[ 1], 12, 3905402710);
FF( c, d, a, b, in[ 2], 17,  606105819);
FF( b, c, d, a, in[ 3], 22, 3250441966);
FF( a, b, c, d, in[ 4], 7, 4118548399);
FF( d, a, b, c, in[ 5], 12, 1200080426);
FF( c, d, a, b, in[ 6], 17, 2821735955);
FF( b, c, d, a, in[ 7], 22, 4249261313);
FF( a, b, c, d, in[ 8], 7, 1770035416);
FF( d, a, b, c, in[ 9], 12, 2336552879);
FF( c, d, a, b, in[10], 17, 4294925233);
FF( b, c, d, a, in[11], 22, 2304563134);
FF( a, b, c, d, in[12], 7, 1804603682);
FF( d, a, b, c, in[13], 12, 4254626195);
FF( c, d, a, b, in[14], 17, 2792965006);
FF( b, c, d, a, in[15], 22, 1236535329);
 
a += ((b & d) | (c & ~d)) + in[1] + 4129170786; 
a = ROTATE_LEFT (a,5);
a += b;
 
//GG( a, b, c, d, in[ 1], 5, 4129170786);
GG( d, a, b, c, in[ 6], 9, 3225465664);
GG( c, d, a, b, in[11], 14,  643717713);
GG( b, c, d, a, in[ 0], 20, 3921069994);
GG( a, b, c, d, in[ 5], 5, 3593408605);
GG( d, a, b, c, in[10], 9,   38016083);
GG( c, d, a, b, in[15], 14, 3634488961);
GG( b, c, d, a, in[ 4], 20, 3889429448);
GG( a, b, c, d, in[ 9], 5,  568446438);
GG( d, a, b, c, in[14], 9, 3275163606);
GG( c, d, a, b, in[ 3], 14, 4107603335);
GG( b, c, d, a, in[ 8], 20, 1163531501);
GG( a, b, c, d, in[13], 5, 2850285829);
GG( d, a, b, c, in[ 2], 9, 4243563512);
GG( c, d, a, b, in[ 7], 14, 1735328473);
GG( b, c, d, a, in[12], 20, 2368359562);
 
a += (b ^ c ^ d) + in[5]  + 4294588738;
a = ROTATE_LEFT(a,4); 
a += b;
 
//HH( a, b, c, d, in[ 5], 4, 4294588738);
HH( d, a, b, c, in[ 8], 11, 2272392833);
HH( c, d, a, b, in[11], 16, 1839030562);
HH( b, c, d, a, in[14], 23, 4259657740);
HH( a, b, c, d, in[ 1], 4, 2763975236);
HH( d, a, b, c, in[ 4], 11, 1272893353);
HH( c, d, a, b, in[ 7], 16, 4139469664);
HH( b, c, d, a, in[10], 23, 3200236656);
HH( a, b, c, d, in[13], 4,  681279174);
HH( d, a, b, c, in[ 0], 11, 3936430074);
HH( c, d, a, b, in[ 3], 16, 3572445317);
HH( b, c, d, a, in[ 6], 23,   76029189);
HH( a, b, c, d, in[ 9], 4, 3654602809);
HH( d, a, b, c, in[12], 11, 3873151461);
HH( c, d, a, b, in[15], 16,  530742520);
HH( b, c, d, a, in[ 2], 23, 3299628645);
 
a += (c ^ (b | ~d))  + in[0]  + 4096336452;
a = ROTATE_LEFT (a,6);
a += (b);
 
//II( a, b, c, d, in[ 0], 6, 4096336452);
II( d, a, b, c, in[ 7], 10, 1126891415);
II( c, d, a, b, in[14], 15, 2878612391);
II( b, c, d, a, in[ 5], 21, 4237533241);
II( a, b, c, d, in[12], 6, 1700485571);
II( d, a, b, c, in[ 3], 10, 2399980690);
II( c, d, a, b, in[10], 15, 4293915773);
II( b, c, d, a, in[ 1], 21, 2240044497);
II( a, b, c, d, in[ 8], 6, 1873313359);
II( d, a, b, c, in[15], 10, 4264355552);
II( c, d, a, b, in[ 6], 15, 2734768916);
II( b, c, d, a, in[13], 21, 1309151649);
II( a, b, c, d, in[ 4], 6, 4149444226);
II( d, a, b, c, in[11], 10, 3174756917);
II( c, d, a, b, in[ 2], 15,  718787259);
II( b, c, d, a, in[ 9], 21, 3951481745);
a = a + 0x67452301;
b = b + 0xefcdab89;
c = c + 0x98badcfe;
d = d + 0x10325476;
p = (char *)&a;
printf("%02x%02x%02x%02x",p[0],p[1],p[2],p[3]);
p = (char *)&b;
printf("%02x%02x%02x%02x",p[0],p[1],p[2],p[3]);
p = (char *)&c;
printf("%02x%02x%02x%02x",p[0],p[1],p[2],p[3]);
p = (char *)&d;
printf("%02x%02x%02x%02x",p[0],p[1],p[2],p[3]);
}
 
0dece865043d2105e03126dc2c979e6e

The above program is a simplified version of the MD5 algorithm. The MD5 algorithm takes whatever input we give it and outputs a 128 bit hash value which represents the input we gave it. There is way that any two inputs will give us the same hash value. This is called a collisions are is not mathematically feasible. The reverse is also not possible, where given a hash value we can get at the original input.

The Govt of USA used another message digest algorithm called SHA-1 which spits out a 160 bit hash value. These algorithms are called one way hashes as given the hash we cannot get back to the input. The passwords that we give databases or operating systems are not stored internally as clear texts because of security reasons. If stored as clear text say in a database, anyone can get access to them if they can access the database.

Thus store the one way hash using MD5 and throw away the actual password. Each time you want to check the password you calculate the one way hash and then check with the one way hash stored in the database. This method is more secure because the minute the system has the user type in the password, it is immediately converted to a one way hash and the password does not remain in the system.

We first open the file a.txt for reading. We then take 4 variables a, b c and d and initialize them to arbitrary values. These values actually form a pattern as they start from 0 and to up to F and then back to 0. It does not look like this at first stroke as the nibbles are reversed. We then read the bytes in a.txt using the fread function. The MD5 algorithm says whatever the size of text, we need to apply some 64 rounds to it.

Thus we break up the input in chunks of 512 bits or 64 bytes or 16 ints and apply the transformations. After we do this we will come to the end where our data left may not be a multiple of 512 bits. Thus at the end we need to pad our data with some bits. Theses pad bits start with 0x80 and then are a set of zeroes. These pad bits go on until the size becomes 448 bits which is 64 bits less than 512.

We store the size of the original input minus the padding as a 64 bit number. If the size of the data is greater than 64 bits, then we store only the lower 64 bits. We get a pointer to a char p which we equate to the buffer in which we read the data. The variable bytes tells us the size of the text. We use this variable as the index into the in array and place a 0x80 there. 

The array in is an array of ints and as we want to write to a single byte we need a char * pointer. Finally at array index values of 14 and 15, that is the last 64 bits we write out the length of the file. The variables a,b,c and d will store the 128 bit hash value for us. We will now need to carry out 64 rounds for each 512 bits of data. We first use a series of macros, four in number to compute the hash for us.

The first parameter will be one that will change its value and each time we call the macro we change the order of the a,b,c and d. The FF macro calls the macro F which behaved like the bitwise if. To this value we add the input data  taking four bytes together as a number. Initially we take the in array members one after the other, after words we jumble the index values.

This is how we factor in the current data in computing the one way hash. We then add another number which is the multiplication of a constant and the sin value as shown above. After doing all this we rotate the bits not shift them a certain random number of bytes. Finally we add the second parameter to the first. We do this 16 times and then move on to round two.

Here we use the macro GG and the G macro which we call implements the majority function. Then in the HH macro we call the J macro which is the parity function. Both get called 16 times. Finally we end the last round with the II macro which calls the I macro. Once these 64 rounds are over, we once add the constants to a,b,c and d and then display the 128 bit hash value in hex. We first display a and then b, c and d.

The low byte are displayed first and hence we set a pointer to char to each variable and then display the low byte first. The printf with the x modifier will display the high byte first. This is how we can calculate the MD5 algorithm.

#include <stdio.h>
#include <time.h>
#include <string.h>
typedef unsigned char *POINTER;
typedef unsigned short int UINT2;
typedef unsigned long int UINT4;
#define PROTO_LIST(list) list
typedef struct {
  UINT4 state[4];                                   /* state (ABCD) */
  UINT4 count[2];        /* number of bits, modulo 2^64 (lsb first) */
  unsigned char buffer[64];                         /* input buffer */
} MD5_CTX;
void MD5Init PROTO_LIST ((MD5_CTX *));
void MD5Update PROTO_LIST ((MD5_CTX *, unsigned char *, unsigned int));
void MD5Final PROTO_LIST ((unsigned char [16], MD5_CTX *));
#define S11 7
#define S12 12
#define S13 17
#define S14 22
#define S21 5
#define S22 9
#define S23 14
#define S24 20
#define S31 4
#define S32 11
#define S33 16
#define S34 23
#define S41 6
#define S42 10
#define S43 15
#define S44 21
static void MD5Transform PROTO_LIST ((UINT4 [4], unsigned char [64]));
static void Encode PROTO_LIST ((unsigned char *, UINT4 *, unsigned int));
static void Decode PROTO_LIST   ((UINT4 *, unsigned char *, unsigned int));
static void MD5_memcpy PROTO_LIST ((POINTER, POINTER, unsigned int));
static void MD5_memset PROTO_LIST ((POINTER, int, unsigned int));
 
static unsigned char PADDING[64] = {
  0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
#define F(x, y, z) (((x) & (y)) | ((~x) & (z)))
#define G(x, y, z) (((x) & (z)) | ((y) & (~z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define I(x, y, z) ((y) ^ ((x) | (~z)))
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))
 
#define FF(a, b, c, d, x, s, ac) { \
 (a) += F ((b), (c), (d)) + (x) + (UINT4)(ac); \
 (a) = ROTATE_LEFT ((a), (s)); \
 (a) += (b); \
  }
#define GG(a, b, c, d, x, s, ac) { \
 (a) += G ((b), (c), (d)) + (x) + (UINT4)(ac); \
 (a) = ROTATE_LEFT ((a), (s)); \
 (a) += (b); \
  }
#define HH(a, b, c, d, x, s, ac) { \
 (a) += H ((b), (c), (d)) + (x) + (UINT4)(ac); \
 (a) = ROTATE_LEFT ((a), (s)); \
 (a) += (b); \
  }
#define II(a, b, c, d, x, s, ac) { \
 (a) += I ((b), (c), (d)) + (x) + (UINT4)(ac); \
 (a) = ROTATE_LEFT ((a), (s)); \
 (a) += (b); \
  }
 
void MD5Init (context)
MD5_CTX *context;                                        /* context */
{
  context->count[0] = context->count[1] = 0;
  context->state[0] = 0x67452301;
  context->state[1] = 0xefcdab89;
  context->state[2] = 0x98badcfe;
  context->state[3] = 0x10325476;
}
void MD5Update (context, input, inputLen)
MD5_CTX *context;                                        /* context */
unsigned char *input;                                /* input block */
unsigned int inputLen;                     /* length of input block */
{
  unsigned int i, index, partLen;
  index = (unsigned int)((context->count[0] >> 3) & 0x3F);
  if ((context->count[0] += ((UINT4)inputLen << 3))    < ((UINT4)inputLen << 3))
  context->count[1]++;
  context->count[1] += ((UINT4)inputLen >> 29);
  partLen = 64 - index;
  if (inputLen >= partLen) {
 MD5_memcpy((POINTER)&context->buffer[index], (POINTER)input, partLen);
 MD5Transform (context->state, context->buffer);
 for (i = partLen; i + 63 < inputLen; i += 64)
   MD5Transform (context->state, &input[i]);
 index = 0;
  }
  else
 i = 0;
  MD5_memcpy ((POINTER)&context->buffer[index], (POINTER)&input[i],inputLen-i);
}
void MD5Final (digest, context)
unsigned char digest[16];                         /* message digest */
MD5_CTX *context;                                       /* context */
{
  unsigned char bits[8];
  unsigned int index, padLen;
  Encode (bits, context->count, 8);
  index = (unsigned int)((context->count[0] >> 3) & 0x3f);
  padLen = (index < 56) ? (56 - index) : (120 - index);
  MD5Update (context, PADDING, padLen);
  MD5Update (context, bits, 8);
  Encode (digest, context->state, 16);
  MD5_memset ((POINTER)context, 0, sizeof (*context));
}
static void MD5Transform (state, block)
UINT4 state[4];
unsigned char block[64];
{
  UINT4 a = state[0], b = state[1], c = state[2], d = state[3], x[16];
  Decode (x, block, 64);
  FF (a, b, c, d, x[ 0], S11, 0xd76aa478); /* 1 */
  FF (d, a, b, c, x[ 1], S12, 0xe8c7b756); /* 2 */
  FF (c, d, a, b, x[ 2], S13, 0x242070db); /* 3 */
  FF (b, c, d, a, x[ 3], S14, 0xc1bdceee); /* 4 */
  FF (a, b, c, d, x[ 4], S11, 0xf57c0faf); /* 5 */
  FF (d, a, b, c, x[ 5], S12, 0x4787c62a); /* 6 */
  FF (c, d, a, b, x[ 6], S13, 0xa8304613); /* 7 */
  FF (b, c, d, a, x[ 7], S14, 0xfd469501); /* 8 */
  FF (a, b, c, d, x[ 8], S11, 0x698098d8); /* 9 */
  FF (d, a, b, c, x[ 9], S12, 0x8b44f7af); /* 10 */
  FF (c, d, a, b, x[10], S13, 0xffff5bb1); /* 11 */
  FF (b, c, d, a, x[11], S14, 0x895cd7be); /* 12 */
  FF (a, b, c, d, x[12], S11, 0x6b901122); /* 13 */
  FF (d, a, b, c, x[13], S12, 0xfd987193); /* 14 */
  FF (c, d, a, b, x[14], S13, 0xa679438e); /* 15 */
  FF (b, c, d, a, x[15], S14, 0x49b40821); /* 16 */
  GG (a, b, c, d, x[ 1], S21, 0xf61e2562); /* 17 */
  GG (d, a, b, c, x[ 6], S22, 0xc040b340); /* 18 */
  GG (c, d, a, b, x[11], S23, 0x265e5a51); /* 19 */
  GG (b, c, d, a, x[ 0], S24, 0xe9b6c7aa); /* 20 */
  GG (a, b, c, d, x[ 5], S21, 0xd62f105d); /* 21 */
  GG (d, a, b, c, x[10], S22,  0x2441453); /* 22 */
  GG (c, d, a, b, x[15], S23, 0xd8a1e681); /* 23 */
  GG (b, c, d, a, x[ 4], S24, 0xe7d3fbc8); /* 24 */
  GG (a, b, c, d, x[ 9], S21, 0x21e1cde6); /* 25 */
  GG (d, a, b, c, x[14], S22, 0xc33707d6); /* 26 */
  GG (c, d, a, b, x[ 3], S23, 0xf4d50d87); /* 27 */
  GG (b, c, d, a, x[ 8], S24, 0x455a14ed); /* 28 */
  GG (a, b, c, d, x[13], S21, 0xa9e3e905); /* 29 */
  GG (d, a, b, c, x[ 2], S22, 0xfcefa3f8); /* 30 */
  GG (c, d, a, b, x[ 7], S23, 0x676f02d9); /* 31 */
  GG (b, c, d, a, x[12], S24, 0x8d2a4c8a); /* 32 */
  HH (a, b, c, d, x[ 5], S31, 0xfffa3942); /* 33 */
  HH (d, a, b, c, x[ 8], S32, 0x8771f681); /* 34 */
  HH (c, d, a, b, x[11], S33, 0x6d9d6122); /* 35 */
  HH (b, c, d, a, x[14], S34, 0xfde5380c); /* 36 */
  HH (a, b, c, d, x[ 1], S31, 0xa4beea44); /* 37 */
  HH (d, a, b, c, x[ 4], S32, 0x4bdecfa9); /* 38 */
  HH (c, d, a, b, x[ 7], S33, 0xf6bb4b60); /* 39 */
  HH (b, c, d, a, x[10], S34, 0xbebfbc70); /* 40 */
  HH (a, b, c, d, x[13], S31, 0x289b7ec6); /* 41 */
  HH (d, a, b, c, x[ 0], S32, 0xeaa127fa); /* 42 */
  HH (c, d, a, b, x[ 3], S33, 0xd4ef3085); /* 43 */
  HH (b, c, d, a, x[ 6], S34,  0x4881d05); /* 44 */
  HH (a, b, c, d, x[ 9], S31, 0xd9d4d039); /* 45 */
  HH (d, a, b, c, x[12], S32, 0xe6db99e5); /* 46 */
  HH (c, d, a, b, x[15], S33, 0x1fa27cf8); /* 47 */
  HH (b, c, d, a, x[ 2], S34, 0xc4ac5665); /* 48 */
  II (a, b, c, d, x[ 0], S41, 0xf4292244); /* 49 */
  II (d, a, b, c, x[ 7], S42, 0x432aff97); /* 50 */
  II (c, d, a, b, x[14], S43, 0xab9423a7); /* 51 */
  II (b, c, d, a, x[ 5], S44, 0xfc93a039); /* 52 */
  II (a, b, c, d, x[12], S41, 0x655b59c3); /* 53 */
  II (d, a, b, c, x[ 3], S42, 0x8f0ccc92); /* 54 */
  II (c, d, a, b, x[10], S43, 0xffeff47d); /* 55 */
  II (b, c, d, a, x[ 1], S44, 0x85845dd1); /* 56 */
  II (a, b, c, d, x[ 8], S41, 0x6fa87e4f); /* 57 */
  II (d, a, b, c, x[15], S42, 0xfe2ce6e0); /* 58 */
  II (c, d, a, b, x[ 6], S43, 0xa3014314); /* 59 */
  II (b, c, d, a, x[13], S44, 0x4e0811a1); /* 60 */
  II (a, b, c, d, x[ 4], S41, 0xf7537e82); /* 61 */
  II (d, a, b, c, x[11], S42, 0xbd3af235); /* 62 */
  II (c, d, a, b, x[ 2], S43, 0x2ad7d2bb); /* 63 */
  II (b, c, d, a, x[ 9], S44, 0xeb86d391); /* 64 */
  state[0] += a;
  state[1] += b;
  state[2] += c;
  state[3] += d;
MD5_memset ((POINTER)x, 0, sizeof (x));
}
static void Encode (output, input, len)
unsigned char *output;
UINT4 *input;
unsigned int len;
{
  unsigned int i, j;
  for (i = 0, j = 0; j < len; i++, j += 4) {
 output[j] = (unsigned char)(input[i] & 0xff);
 output[j+1] = (unsigned char)((input[i] >> 8) & 0xff);
 output[j+2] = (unsigned char)((input[i] >> 16) & 0xff);
 output[j+3] = (unsigned char)((input[i] >> 24) & 0xff);
  }
}
static void Decode (output, input, len)
UINT4 *output;
unsigned char *input;
unsigned int len;
{
  unsigned int i, j;
  for (i = 0, j = 0; j < len; i++, j += 4)
 output[i] = ((UINT4)input[j]) | (((UINT4)input[j+1]) << 8) | (((UINT4)input[j+2]) << 16) | (((UINT4)input[j+3]) << 24);
}
static void MD5_memcpy (output, input, len)
POINTER output;
POINTER input;
unsigned int len;
{
  unsigned int i;
  for (i = 0; i < len; i++)
 output[i] = input[i];
}
static void MD5_memset (output, value, len)
POINTER output;
int value;
unsigned int len;
{
  unsigned int i;
  for (i = 0; i < len; i++)
 ((char *)output)[i] = (char)value;
}
static void MDPrint (digest)
unsigned char digest[16];
{
  unsigned int i;
  for (i = 0; i < 16; i++)
 printf ("%02x", digest[i]);
}
int main (argc, argv)
int argc;
char *argv[];
{
  FILE *file;
  MD5_CTX context;
  int len;
  unsigned char buffer[1024], digest[16];
  if ((file = fopen (argv[1], "rb")) == NULL)
 printf ("%s can't be opened\n", argv[1]);
  else {
 MD5Init (&context);
 while (len = fread (buffer, 1, 1024, file))
   MD5Update (&context, buffer, len);
 MD5Final (digest, &context);
 fclose (file);
 printf ("MD%d (%s) = ", 5, argv[1]);
 MDPrint (digest);
 printf ("\n");
  }
}

Lets now look at the code Ron Rivest wrote and kill two birds with one stone. The first bird is that we know how a great crypt analyst writes code and the second is that this is the best way to figure out how MD5 works. We have removed some comments and blank lines and some unnecessary code.

Back to the main page