QuarkLabs Python Security Challenge - Solution

Overview

What follows is a walkthrough of the Python security challenge posed on QuarkLabs and the steps I took to complete it. There are likely many other routes to get to the same goal, what follows is just what occurred to me as I drank 21st Amendment Hell or High Watermelon.

The Lambda Complex

At the best of times lambdas in Python can be a headfuck, but the start to the challenge was designed to be a deliberate mess of lambda pain that looked like this:

(lambda g, c, d: (lambda _: (_.__setitem__('$', ''.join([(_['chr'] if ('chr'
in _) else chr)((_['_'] if ('_' in _) else _)) for _['_'] in (_['s'] if ('s'
in _) else s)[::(-1)]])), _)[-1])( (lambda _: (lambda f, _: f(f, _))((lambda
__,_: ((lambda _: __(__, _))((lambda _: (_.__setitem__('i', ((_['i'] if ('i'
in _) else i) + 1)),_)[(-1)])((lambda _: (_.__setitem__('s',((_['s'] if ('s'
in _) else s) + [((_['l'] if ('l' in _) else l)[(_['i'] if ('i' in _) else i
)] ^ (_['c'] if ('c' in _) else c))])), _)[-1])(_))) if (((_['g'] if ('g' in
_) else g) % 4) and ((_['i'] if ('i' in _) else i)< (_['len'] if ('len' in _
) else len)((_['l'] if ('l' in _) else l)))) else _)), _) ) ( (lambda _: (_.
__setitem__('!', []), _.__setitem__('s', _['!']), _)[(-1)] ) ((lambda _: (_.
__setitem__('!', ((_['d'] if ('d' in _) else d) ^ (_['d'] if ('d' in _) else
d))), _.__setitem__('i', _['!']), _)[(-1)])((lambda _: (_.__setitem__('!', [
(_['j'] if ('j' in _) else j) for  _[ 'i'] in (_['zip'] if ('zip' in _) else
zip)((_['l0'] if ('l0' in _) else l0), (_['l1'] if ('l1' in _) else l1)) for
_['j'] in (_['i'] if ('i' in _) else i)]), _.__setitem__('l', _['!']), _)[-1
])((lambda _: (_.__setitem__('!', [1373, 1281, 1288, 1373, 1290, 1294, 1375,
1371,1289, 1281, 1280, 1293, 1289, 1280, 1373, 1294, 1289, 1280, 1372, 1288,
1375,1375, 1289, 1373, 1290, 1281, 1294, 1302, 1372, 1355, 1366, 1372, 1302,
1360, 1368, 1354, 1364, 1370, 1371, 1365, 1362, 1368, 1352, 1374, 1365, 1302
]), _.__setitem__('l1',_['!']), _)[-1])((lambda _: (_.__setitem__('!',[1375,
1368, 1294, 1293, 1373, 1295, 1290, 1373, 1290, 1293, 1280, 1368, 1368,1294,
1293, 1368, 1372, 1292, 1290, 1291, 1371, 1375, 1280, 1372, 1281, 1293,1373,
1371, 1354, 1370, 1356, 1354, 1355, 1370, 1357, 1357, 1302, 1366, 1303,1368,
1354, 1355, 1356, 1303, 1366, 1371]), _.__setitem__('l0', _['!']), _)[(-1)])
            ({ 'g': g, 'c': c, 'd': d, '$': None})))))))['$'])

(fig 1)

So on quick inspection of the mess we can see that the outermost lambda takes in three values g, c and d and returns the '$' element of a dictionary. In between that a lot of nested function calls take place.

In an effort to understand WTF is going on in the mess let’s break out the lambdas and allow ourselves to work through them one at a time. Given that each lambda is taking the result of the next lambda it makes sense to start at the inner most lambda and work our way backwards (out) from there.

The starting point is the establishment of dictionary object using the values passed into the outermost object (whatever they end up being), all very straight forward:

{ 'g': g, 'c': c, 'd': d, '$': None}

(fig 2)

Next come a couple fairly straight forward lambdas that add in extra values to the starting dictionary:

lambda _: (_.__setitem__('!',[1375,
1368, 1294, 1293, 1373, 1295, 1290, 1373, 1290, 1293, 1280, 1368, 1368,1294,
1293, 1368, 1372, 1292, 1290, 1291, 1371, 1375, 1280, 1372, 1281, 1293,1373,
1371, 1354, 1370, 1356, 1354, 1355, 1370, 1357, 1357, 1302, 1366, 1303,1368,
1354, 1355, 1356, 1303, 1366, 1371]), _.__setitem__('l0', _['!']), _)[(-1)]

(fig 3)

For those not super fluent in lambdas (yet!) this takes in a variable named _ as its argument and then proceeds to call the __setitem__ method on it, showing us that _ is a dictionary object and that initially the key '!' is being set to a list of integers, which is then itself finally reassigned to the key 'l0'. At the end of this our dictionary looks like:

{ 'g': g,
'c': c,
'd': d,
'$': None,
'!': [1375,
1368, 1294, 1293, 1373, 1295, 1290, 1373, 1290, 1293, 1280, 1368, 1368,1294,
1293, 1368, 1372, 1292, 1290, 1291, 1371, 1375, 1280, 1372, 1281, 1293,1373,
1371, 1354, 1370, 1356, 1354, 1355, 1370, 1357, 1357, 1302, 1366, 1303,1368,
1354, 1355, 1356, 1303, 1366, 1371],
'l0': [1375,
1368, 1294, 1293, 1373, 1295, 1290, 1373, 1290, 1293, 1280, 1368, 1368,1294,
1293, 1368, 1372, 1292, 1290, 1291, 1371, 1375, 1280, 1372, 1281, 1293,1373,
1371, 1354, 1370, 1356, 1354, 1355, 1370, 1357, 1357, 1302, 1366, 1303,1368,
1354, 1355, 1356, 1303, 1366, 1371] }

(fig 4)

The next lambda looks very similar and does much the same:

lambda _: (_.__setitem__('!', [1373, 1281, 1288, 1373, 1290, 1294, 1375,
1371,1289, 1281, 1280, 1293, 1289, 1280, 1373, 1294, 1289, 1280, 1372, 1288,
1375,1375, 1289, 1373, 1290, 1281, 1294, 1302, 1372, 1355, 1366, 1372, 1302,
1360, 1368, 1354, 1364, 1370, 1371, 1365, 1362, 1368, 1352, 1374, 1365, 1302
]), _.__setitem__('l1',_['!']), _)[-1]

(fig 5)

The only difference this time is the lambda takes the dictionary output from the previous lambda (fig4) as its input and then overwrites the '!' element with a new list of integers which then gets reassigned to the 'l1' element.

The third lambda is a little more complex than the previous two but introduces some conventions seen in the rest of the lambdas that are worth understanding. It looks like this (whitespace added for clarity):

lambda _: (_.__setitem__('!', [ (_['j'] if ('j' in _) else j)
    for _[ 'i'] in (_['zip'] if ('zip' in _) else zip)
((_['l0'] if ('l0' in _) else l0), (_['l1'] if ('l1' in _) else l1))
for _['j'] in (_['i']
               if ('i' in _) else i)]

), _.__setitem__('l', _['!']), _)[-1]

(fig 7)

We again see elements of the input dictionary _ being set but now there are a bunch of if, else and for keywords present. While logically these act as you would expect their syntax differs slightly from that which you see such functions used outside of a lambda.

The first construct you will see repeated in the rest of the lambdas is:

(_['j'] if ('j' in _) else j

(fig 8)

All this is really doing is looking for the presence of a key in a dictionary and returning one of two results based on whether the key is there or not, it’s somewhat equivalent to the get method on a dictionary with the optional default argument. We then hit a for loop:

for _[ 'i'] in (_['zip'] if ('zip' in _) else zip)

(fig 9)

This uses the above check for key in dictionary if logic to set the zip function (there is no key named 'zip' so the zip function is returned). Once we follow this and the next for loops through we end up setting yet another key in the dictionary, this time named 'l' that has the value of the lists from 'l0' and 'l1' combined into a single list.

The following lambda is the first time we see any of the variables (g, c, d) that are passed into the outer most lambda being referenced. This lambda XOR’s (^) the value of d with itself. Obviously any number XOR’d with itself just returns 0. This means that the value of the d variable can just be any integer and the 'i' key in the dictionary will get set to 0.

lambda _: (_.__setitem__('!',
                             ((_['d'] if ('d' in _) else d)
                              ^ (_['d'] if ('d' in _) else d))
), _.__setitem__('i', _['!']), _)[(-1)]

(fig 11)

The next lambda is the simplest yet, just adding a new element to the dictionary 's' and assigning it a value of an empty list (via the '!' element like all previous lambdas).

lambda _: (_.__setitem__('!', []), _.__setitem__('s', _['!']), _)[(-1)]

(fig 12)

The next lambda is a nested set of other lambdas and is a bit more involved:

lambda _: (lambda f, _: f(f, _))
((lambda ZZ,_: ((lambda _: ZZ(ZZ, _))
((lambda _: (_.__setitem__('i', ((_['i'] if ('i' in _) else i) +
1)),_)[(-1)])

((lambda _: (_.__setitem__('s',
            ((_['s'] if ('s' in _) else s) + [((_['l'] if ('l' in _)
else l)[(_['i'] if ('i' in _) else i)] ^ (_['c'] if ('c' in _) else
c))])), _)[-1])(_)))

 if (((_['g'] if ('g' in
_) else g) % 4) and ((_['i'] if ('i' in _) else i)< (_['len'] if ('len' in _
) else len)((_['l'] if ('l' in _) else l)))) else _)), _)

(fig 13)

Walking through this we again see a lot of the same conditional checks for keys in the dictionary, the 'i' value is then used as an iterator to walk through the values in the list 'l' from the dictionary and perform a mathematical operation on them.

First each item in the 'l' list is XOR’d (^) with the value held in the c variable and puts the result in a list in the 's' key.

Next the g variable is referenced from the dictionary and is subject to a modulo 4 operation:

if (((_['g'] if ('g' in _) else g) % 4)

(fig 14)

This condition means that for the iteration loop to continue the expression in fig 13 needs to return a non-zero value, which means the value of g can be any value as long as it is not a multiple of 4.

So at this point we know d can be any integer and g can be any integer that is not a multiple of 4. The task is to find out what the correct value of c should be so that when it’s XOR’d against the values in the list l it should produce the correct result.

The final lambda walks through the list produced in the previous lambda and stored in the 's' key and takes the chr() representation of the integers and joins them into a string and stores this in the '$' key. With the right value of c this should be our magic string.

lambda _: (_.__setitem__('$', ''.join([(_['chr']
                        if ('chr' in _)else chr)
 ((_['_'] if ('_' in _) else _)) for _['_'] in (_['s'] if ('s'
in _) else s)[::(-1)]])), _)[-1]

(fig 15)

Finding The Exit

OK we made our way through the lambda complex, we understand the operations that take place on the lists of integers contained within the first 2 lambdas and we know the values needed for 2 of the input variables (g, d). The singular unknown is the value needed for c that when XOR’d with the values in the list produces a meaningful string.

Being a man of limited mathematical genius I opted to just try lots of values of c and see what they produced and if anything looked like the secret URL. We just need to add a little extra logic around the lambdas to walk through guesses for the value of c and see what strings are returned:

g=1 #Not a multiple of 4
d=1 #Any integer
c=0 #What we need to find

for guess in range(c, 10000):
    try:
        print "%d: %s"%(guess, quark(g, guess, d))
    except:
        ##For certain values of c chr() throws an exception
        pass

(fig 16)

Running this code will produce a mess of strings scrolling up the screen, looking at the output you should see one string with a URL in - the magic value for this 1337 - cute:

1335:!lbai{o|e}bolmac!}zozgm!|k}a{|mk}!l9j6:=6jk>7hhhl?<k=7;>k9oj:79>o:o776:>=ljh=98=jj:?96ojh
1336:.cmnf/pt`sjrm`c/bnl.ru`uhb.sdrntsbdr.c6e9529ed18gggc03d2841d6`e5861`5`889512ceg2672ee5069`eg
1337:/blog.quarkslab.com/static/resources/b7d8438de09fffb12e3950e7ad4970a4a998403bdf3763dd4178adf
1338:,aold-rvbqhpoba-`ln,pwbwj`,qfplvq`fp,a4g;70;gf3:eeea21f0:63f4bg7:43b7b::;730age0450gg724;bge

(fig 17)

Going to the URL results in a file downloading.

A pure Python equivalent of the transformations done by the lambdas follows to aid any understanding that is needed:

l = [1375, 1373, 1368, 1281, 1294, 1288, 1293, 1373, 1373, 1290, 1295,
1294, 1290, 1375, 1373, 1371, 1290, 1289, 1293, 1281, 1280, 1280, 1368,
1293, 1368, 1289, 1294, 1280, 1293, 1373, 1368, 1294, 1372, 1289, 1292,
1280, 1290, 1372, 1291, 1288, 1371, 1375, 1375, 1375, 1280, 1289, 1372,
1373, 1281, 1290, 1293, 1281, 1373, 1294, 1371, 1302, 1354, 1372, 1370,
1355, 1356, 1366, 1354, 1372, 1355, 1302, 1370, 1360, 1357, 1368, 1357,
1354, 1302, 1364, 1366, 1370, 1303, 1371, 1368, 1365, 1354, 1362, 1355,
1368, 1356, 1352, 1303, 1374, 1366, 1365, 1371, 1302]
l.reverse()
c = 1337
out = []
for x in l:
    out.append(chr(x^c))

print ''.join(out)

(fig 18)

Customise y0 Pythons

Once downloaded a quick inspection of the data shows it to be a 64-bit ELF binary, time to spin up a fresh Linux VM.

$ file b7d8438de09fffb12e3950e7ad4970a4a998403bdf3763dd4178adf
b7d8438de09fffb12e3950e7ad4970a4a998403bdf3763dd4178adf: ELF 64-bit LSB
executable, x86-64, version 1 (SYSV), dynamically linked (uses shared
libs), for GNU/Linux 2.6.26, not stripped

(fig 19)

Running strings on the binary returns a wealth of strings that look very much like function names and modules from a C Python runtime e.g.

....
PySymtable_Build
PyUnicodeUCS2_DecodeUTF16Stateful
PyComplex_FromCComplex
fwrite@@GLIBC_2.2.5
_PyGILState_Init
PyStructSequence_InitType
PyString_Concat
PyAST_Check
....

(fig 20)

Running the binary however returns an error:

$ ./b7d8438de09fffb12e3950e7ad4970a4a998403bdf3763dd4178adf
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
ImportError: No module named site

(fig 21)

Yup! Looks like an unhappy Python runtime to me, lets set the PYTHONHOME environment variable and strace the execution to see what the binary is looking for:

$ export PYTHONHOME="/tmp/quark"
$ strace ./b7d8438de09fffb12e3950e7ad4970a4a998403bdf3763dd4178adf

<snipped for brevity>
stat("/tmp/quark/lib/python2.7", {st_mode=S_IFDIR|0755, st_size=4096,
...}) = 0
stat("/tmp/quark/lib/python2.7/lib-dynload", 0x7fff150b0eb0) = -1 ENOENT
(No such file or directory)
write(2, "ImportError", 11ImportError)             = 11
write(2, ": ", 2: )                       = 2
write(2, "No module named site", 20No module named site)    = 20

(fig 22)

OK as expected with a Python runtime initialising it is looking for the site module, specifically it is looking for it in the lib/python2.7 subdirectory of where PYTHONHOME was set to be. Let’s pop an empty site.py at that location and see if the mystery binary complains less:

$ ./b7d8438de09fffb12e3950e7ad4970a4a998403bdf3763dd4178adf
Python 2.7.8+ (nvcs/newopcodes:a9bd62e4d5f2+, Sep  1 2014, 11:41:46)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

(fig 23)

OK awesome, we are dropped to an interactive shell that has a strange banner but on the surface behaves like what we would expect. Now to begin to look for the secret song name.....

Jump Into The C

As the binary doesn’t come with any site-packages its functionality is very limited. The first question I had was whether the binary was using the standard opcode table mappings, a quick test for this is to see if the runtime can import a .pyc module created by a standard Python 2.7.x runtime. I tested this and it imports standard Python bytecode fine, so opcodes seem normal. This is good as if we want to add in functionality from existing Python modules such as inspect or pdb we can do so.

Let’s first see what the builtins are in this runtime:

>>> import sys
>>> sys.builtin_module_names
('__builtin__', '__main__', '_ast', '_codecs', '_sre', '_symtable',
'_warnings', '_weakref', 'do_not_run_me', 'errno', 'exceptions', 'gc',
'imp', 'marshal', 'posix', 'pwd', 'signal', 'sys', 'tb', 'tb_div',
'tb_mod', 'tb_mult', 'tb_sub', 'thread', 'xxsubtype', 'zipimport')

(fig 24)

Probably a safe guess that do_not_run_me is a non-standard builtin, everything else looks normal though. Let’s import it and see what we get:

>>> import do_not_run_me
>>> dir(do_not_run_me)
['__doc__', '__name__', '__package__', 'run_me']
>>> type(do_not_run_me.run_me)
<type 'builtin_function_or_method'>

(fig 25)

>>> do_not_run_me.run_me.__doc__
'There are two kinds of people in the world: those who say there is no
such thing as infinite recursion, and those who say ``There are two
kinds of people in the world: those who say there is no such thing as
infinite recursion, and those who say ...'

(fig 26)

OK docs not that helpful, let’s just run do_not_run_me.run_me function:

>>> do_not_run_me.run_me()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: function takes exactly 1 argument (0 given)

(fig 27)

Pass in a guessed arg:

>>> do_not_run_me.run_me(1337)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be string or read-only buffer, not int

(fig 28)

OK now a string arg:

>>> do_not_run_me.run_me("test")
Segmentation fault

(fig 29)

Hmmm, a segfault is far more indicative of something screwy in the C Python runtime than anything at the python language layer. This however didn’t stop me from dicking around for 30 mins or so getting the dis and pdb modules (and the million dependency modules) copied over to /tmp/quark/lib/python2.7/site-packages. Interestingly the itertools module was not available or builtin to the binary so to satisfy some dependencies pure Python versions of some of the functions had to be patched into the depending modules. Once done I played around to see what dis could tell me about the do_not_run_me module. In short, jack shit.

Right fuckit, let’s jump into the binary itself - fire up IDA Pro or if you don’t have that then Hopper has a free trial.

Symbols hadn’t been stripped so finding the code relating to the do_not_run_me module was pretty straightforward, a pseudo code example of them is below:

function initdo_not_run_me {
    rax = Py_InitModule4_64("do_not_run_me", HelloMethods, 0x0, 0x0, 0x3f5);
    return rax;
}

(fig 30)

function run_me {
    rsp = rsp - 0x28;
    rax = PyArg_ParseTuple(rsi, 0x56c70b, rsp, rsp + 0x10, r8, r9, rbx);
    LODWORD(rdx) = 0x0;
    if (LODWORD(rax) != 0x0) {
            rbx = *(*(*_PyThreadState_Current + 0x10) + 0x30);
            PyObject_Call(PyFunction_New(PyMarshal_ReadObjectFromString(incr.9351,
0x91), rbx), PyTuple_New(0x0), 0x0);
            rbp = PyObject_Call(PyFunction_New(PyMarshal_ReadObjectFromString(
                *rsp, *(rsp + 0x10)), rbx), PyTuple_New(0x0), 0x0);
            PyObject_Call(PyFunction_New(PyMarshal_ReadObjectFromString(decr.9357,
0x217), rbx), PyTuple_New(0x0), 0x0);
            rdx = rbp;
    }
    rax = rdx;
    return rax;
}

(fig 31)

The init code of do_not_run_me looks absolutely standard. The code associated with run_me shows the use of Python marshaling via the PyMarshal_ReadObjectFromString() function of two strings referenced as incr.9351 and decr.9357.

Looking in the binary the two blobs of data that are referenced as incr.9351 and decr.9357 can be easily found and carved out. As we know they are inputs to marshal let’s try to unmarshal them, the binary doesn’t have the marshal module builtin so use a standard Python runtime:

import marshal
import dis
x = "\x63\x00\x00\x00\x00\x01\x00\x00..."  # incr.9351 hex bytes

y = marshal.loads(x)
print y

print dis.disco(y)

(fig 34)

This produces:

<code object foo at 0x1048105b0, file "obfuscate/gen.py", line 5>
  6           0 LOAD_CONST               1 (1)
              3 LOAD_CLOSURE             0
Traceback (most recent call last):
  File "unm.py", line 8, in <module>
    print dis.disco(y)
  File ".../dis.py", line 107, in disassemble
    print '(' + free[oparg] + ')',
IndexError: tuple index out of range

(fig 35)

As can be inferred from the code object’s name this is not a marshaled object of standard Python bytecode and so the disassembler crashes out when it gets confused.

I assume that conversely whatever is given as the argument to do_not_run_me.run_me also needs to have an object using the same remapped crazy bytecode and so this is the source of the segfault when a passed in string does not unmarshal to a code object of the obfuscated type and gets passed to PyObject_Call. However, we can use the function to unmarshal and exec the crazy code objects we extracted from the binary and return their output upon execution.

Let’s use the marshaled strings carved from the binary as input to the do_not_run_me.run_me() function. Using the incr.9351 data as input returns 3 and does not segfault, this is a good sign \o/. Using the decr.9357 data as an input returns:

>>> do_not_run_me.run_me(decr_9357_str)
Traceback (most recent call last):
  File "obfuscate/gen.py", line 22, in foo
NameError: global name 'quarkslab' is not defined

(fig 36)

Let’s make a quarkslab list object:

>>> quarkslab = []
>>> do_not_run_me.run_me(decr_9357_str)
>>> quarkslab
['For The New Lunar Republic']

(fig 37)

That looks like a thing, let’s see if it hashes to the salted hash value given on the challenge page:

>>> import hashlib
>>> hashlib.sha256("bacalhau"+'For The New Lunar Republic').hexdigest()
'61b42c223973996c797a9a366c64c3595052ff71089b4ff13d3251b66b6366e9'

(fig 38)

Awesome, it looks like we are done.

Footnotes

The steps above are just the way I got through to the answer while drinking watermelon beer, there are likely other (smarter) ways that you could achieve the same results.

Big thanks to the Quarkslab folks for creating a fun Python focussed challenge.

If you find mistakes in the above let me know and I’ll try and correct them.