In a previous discussion, Tim Newsham said
“I would like to see someone reverse engineer some small Haskell programs. The compilation techniques are totally foreign to anyone familiar with standard imperative languages and there are no tools designed specifically for the task.”
He then provided a link to some examples to analyze. Another commenter brought up Standard ML, another functional language. (I assume he means the NJ Standard ML implementation, but it could also be OCaml or Moscow ML as Dan Moniz pointed out.) Tim responded:
“I don’t know entirely. I’m not familiar with ML compiler implementation. They could use similar compilation techniques, but might not. ML is not ‘pure’ (and additionally is strict) so the compilation techniques might be different.”
He also provided links to a couple papers on implementing compilers for functional language. One commenter took a brief look at Tim’s examples:
“I took a look. The compiled Haskell is definitely different from the compiled ML I looked at. Roughly the same order of magnitude as to how terrible it was, though. Mine actually used Peano arithmetic on lists for simple arithmetic operations. What was funny was the authors of that program bragging about how algorithmically fast their technology was. I couldn’t help but think, after examining some entire functions and finding that all of the code was dead except for a tiny fraction of the instructions, how much a decent back-end (something with constant propagation and dead-code elimination) could have improved the runtime performance.”
Since one common obfuscation technique is to implement a VM and then write your protection code in that enviroment, how obfuscated is compiled object code from standard functional programming languages?
well i also took a look at that a.out in the previous post with gdb it seems to be perfectly debuggable in gdb
tim do you have a stripped sstripped elfkicked
sort of reverseme compiled in haskell
i would like to take a look at it and see how tough it is to really find a pasword for name
Nate feel free to cut edit delete beautify the following crap output from session on the a.out
looks like the html markers > and < got stripped in the previous post
the disassembly looks pretty straight forward with symbols and names
like this
0x8087b4c <createThread>:push ebp
0x8087b4d <createThread+1>:push edi
0x8087b4e <createThread+2>:push esi
0x8087b4f <createThread+3>:push ebx
0x8087b50 <createThread+4>:sub esp,0xc
0x8087b53 <createThread+7>:mov eax,DWORD PTR [esp+32]
“it seems to be perfectly debuggable in gdb”
Yes, there are no anti-debugging tricks in the code. That’s not the point.
“the disassembly looks pretty straight forward with symbols and names”
The createThread function is part of the runtime, not one of the functions from the source (also listed on the same page). The assembly does look somewhat “normal” but recovering the semantics from the assembly is not straightforward. At the very least, it does not look like code you’d get from an imperative language.
re: “peano arithmetic” from the old thread that kicked off this new blog thread — Here’s some short python fun I was goofing around with last week:
http://www.thenewsh.com/~newsham/lambda.py
and here’s a cool paper that inspired me to goof with such things:
Click to access 03-JansenKoopmanPlasmeijer-EfficientInterpretation.pdf
“how obfuscated is compiled object code from standard functional programming languages?”
About as obfuscated as using -static on a large C/C++ program IMO. The objdump output looks a bit ‘ugly’, and its bloated as hell, but its certainly follow-able given enough time to recreate all the underlying functions.
Its pretty easy to whip through the disassembly of a small C/C++ program to recover its semantics.
well there seems to be a few hood hatted buddhas lurking out there that were probably designed to analyse and make sense out of this bloat
buddha refuses to get enlightned sitting under my computer
hood needs some green and i refuse to install them
hat is 5 years old so its probably dirty as hell
declarative debuggers nice generic term however
haskell 6.6.1 runs nice so probably i would try play with it and diff them to see how much stagnent pattern emerges