Cyber-weapon authors catch up on blog reading

One of the more popular posts on this blog was the one pointing out how Stuxnet was unsophisticated. Its use of traditional malware methods and lack of protection for the payload indicated that the authors were either “Team B” or in a big hurry. The post was intended to counteract the breathless praise in the press for the advent of sophisticated “cyber-weapons”.

This year, the New York Times published more information that supported both theories. The authors may not have had a lot of time due to political pressure and concern about Iran’s progress. The uneasy partnership between the US and Israel may have led to both parties keeping their best tricks in their back pockets.

A lot of people seemed skeptical about the software protection method I described called “secure triggers”. (I had also written about this before, calling it “hash-and-decrypt”.) The general idea is to gather information about the environment in order to generate a cryptographic key, which is used to decrypt the payload. If even one bit of info is incorrect, the payload can’t be decrypted. The analyst has to brute-force the proper environment, which can be made infeasible if there’s enough entropy and/or the validation method is too slow.

The critics claimed that secure triggers were too complicated or unable to withstand malware analyst scrutiny. However, this approach had been used successfully in everything from Core Impact to Blu-ray to Team Twiizers exploits, so it was feasible. Either the malware developers were not aware of this technique or there were other constraints, such as time, preventing it from being used.

Now we’ve got Gauss, which uses (surprise!) this exact technique. And, it turns out to be somewhat effective in preventing Kaspersky from analyzing the payload. We either predicted or caused the future, take your pick.

Is this the endgame? Not even, but it does mean we’re ready for the next stage.

The malware industry has had a stable environment for a while. Targeted attacks were rare, and most malware authors hadn’t spent much effort building custom protection for their payloads. Honeypots and local analysis methods assume the code and behavior remain stable between the malware analyst’s environment and the intended target.

In the next stage, proper use of mechanisms like secure triggers will divide malware analysis into two phases: infection and payload. The infection stage can be analyzed with traditional techniques in order to find the security flaws exploited, propagation method, etc. The payload stage will change drastically, with more effort being spent on in situ analysis.

When the payload only decrypts and runs on a single target system, the malware analyst will need direct access to the compromised host. There are several forms this might take. The obvious one is providing a remote shell to the analyst to log in, attach a debugger, try to get a memory dump of the process, etc. This is dangerous because it involves giving an outsider access to a trusted system, and one that might be critical to other operations. Even if a whole-system memory dump is generated, say by physical access or a cold-boot attack, there is still going to be a lot of sensitive information there.

Another approach is emulation. The analyst uses a VM that turns all local syscalls into remote ones. This is connected to the compromised target host (or a clone of it), which runs a daemon to answer the API queries. The malware sample or relevant portions of it (such as the hash-and-decrypt routine) are run in the analyst’s VM, but the information the routine gathers comes directly from the compromised host. This allows the analyst to gather the relevant information while not having full access to the compromised machine.
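A minimal sketch of how that split might look (the hostnames, environment keys, and values below are invented for the example): a small daemon on the compromised host answers environment queries over a socket, while the hash-and-decrypt routine runs in the analyst’s VM and gathers every input remotely.

```python
import hashlib
import json
import socket
import threading

# Hypothetical daemon on the compromised host: it answers environment
# queries so the analyst never needs an interactive shell. These
# key/value pairs are invented stand-ins for real system identity data.
ENV = {"hostname": "plant-ws-07", "cpu_serial": "0xDEADBEEF"}

def daemon(listener):
    conn, _ = listener.accept()
    rf, wf = conn.makefile("rb"), conn.makefile("wb")
    for line in rf:
        query = json.loads(line)
        wf.write((json.dumps(ENV.get(query["key"])) + "\n").encode())
        wf.flush()

class RemoteEnv:
    """Runs in the analyst's VM; forwards each lookup to the daemon."""
    def __init__(self, addr):
        conn = socket.create_connection(addr)
        self.rf, self.wf = conn.makefile("rb"), conn.makefile("wb")
    def get(self, key):
        self.wf.write((json.dumps({"key": key}) + "\n").encode())
        self.wf.flush()
        return json.loads(self.rf.readline())

# The key-derivation routine executes locally, but every input it
# gathers comes from the remote host.
def derive_key(env):
    material = (env.get("hostname") + env.get("cpu_serial")).encode()
    return hashlib.sha256(material).hexdigest()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=daemon, args=(listener,), daemon=True).start()

key = derive_key(RemoteEnv(listener.getsockname()))
```

The point of the design is that the analyst learns only what the decrypt routine itself asks for, rather than getting a shell on a machine that may be critical to other operations.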

In the next phase after this, malware authors add more anti-emulation checks to their payload decryption routine. They try to prevent this routine from being run in isolation, in an emulator. Eventually, you end up in a cat-and-mouse game of Core Wars on the live hardware. Malware keeps a closely-synchronized global heartbeat so that any attempt to dump and restart it on a single host corrupts its state irrecoverably. The payload, its triggers, and encryption keys evolve in coordination with the other hosts on the network and are tied closely to each machine’s identity.

Is this where we’re headed? I’m not sure, but I do know that software protection measures are becoming more, not less, relevant.

State space explosion in program analysis and crypto

While analyzing some software the other day, I was struck by the duality of cryptanalyzing block ciphers and program analysis techniques. Both present complex problems, and similar tools can be applied to each.

The three main program analysis techniques are dynamic analysis (e.g., execution traces or debugging), symbolic execution, and abstract interpretation. Each has its place but also has unique disadvantages.

Dynamic analysis tests one set of paths through a program with some variance in inputs (and thus program state). Fuzzing is an attempt to increase the path coverage and number of states for each path via random inputs. Smart fuzzing directs the choice of these inputs by discovering constraints via an SMT solver. While dynamic analysis is fast and doesn’t give any false positives (a crash is a crash), it is extremely limited in coverage, both of code paths and program states.
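A toy illustration of that coverage limit (the target function and its magic values are made up): a blind fuzzer trips a shallow one-byte check almost immediately, but a four-byte magic comparison is effectively out of reach without constraint solving.

```python
import random

# Invented target: a "shallow" bug behind a loose one-byte check and a
# "deep" bug behind an exact 4-byte magic value.
def parse(data: bytes):
    if len(data) >= 2 and data[0] == 0x4D and data[1] > 0x80:
        raise ValueError("shallow bug")
    if data[:4] == b"\xde\xad\xbe\xef":
        raise ValueError("deep bug")

def fuzz(trials, seed=1234):
    rng = random.Random(seed)
    found = {}
    for i in range(trials):
        data = bytes(rng.randrange(256) for _ in range(4))
        try:
            parse(data)
        except ValueError as e:
            found.setdefault(str(e), i)
    return found

found = fuzz(1_000_000)
# The shallow bug (probability ~1/516 per try) turns up within a few
# thousand iterations; the deep one needs ~2**32 random tries, so
# blind fuzzing essentially never reaches it. A smart fuzzer would
# instead solve the data[:4] == b"\xde\xad\xbe\xef" constraint directly.
```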

Symbolic execution covers all possible inputs and code paths but has really poor performance. Since it models the exact behavior of the program for each state and code path, it does not lead to false positives or false negatives. The downside is that it is much too slow to handle more than a few simple functions.

Abstract interpretation has characteristics in common with both. It uses three-valued logic (0, 1, and “unknown”) to predict a program’s behavior. While not fast, it is fast enough to be performed on the whole program (like dynamic analysis) and gives better coverage of inputs without the nondeterminism of fuzzing. Unlike symbolic execution, it is an over-approximation of behavior and thus leaves many questions unanswered. However, unlike fuzzing, you know exactly which states are indeterminate and can iterate on those areas.
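The three-valued logic itself is tiny; here is a sketch with None standing in for “unknown”:

```python
# Three-valued logic with None as "unknown": the kind of value an
# abstract interpreter propagates when a bit can't be decided statically.
def and3(a, b):
    if a == 0 or b == 0:
        return 0          # known even if the other input is unknown
    if a is None or b is None:
        return None
    return 1

def or3(a, b):
    if a == 1 or b == 1:
        return 1          # likewise dominated by a known 1
    if a is None or b is None:
        return None
    return 0

def not3(a):
    return None if a is None else 1 - a
```

The payoff is in the dominated cases: and3(0, unknown) is still a definite 0, so some facts survive even when inputs are indeterminate, and the rest are explicitly flagged as unknown rather than silently guessed.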

One big problem with the two static techniques is state space explosion. Every time a conditional branch is encountered, the number of possible states doubles. Thinking cryptographically, this is analogous to adding one bit to a cipher’s key or a 1-bit S-box.

All modern block ciphers are based on the substitution and permutation primitives. Permutation is a linear operation and is easy to represent with a polynomial. Substitution (e.g., an S-box) is non-linear and increases the degree of the polynomial drastically, usually squaring it.

Algebraic cryptanalysis is a means of solving for a key by treating a cipher as a system of overdetermined equations. What algorithms like XL do is convert a set of polynomials into linear equations, which are solvable by means such as Gaussian elimination. XL replaces each polynomial term with a single new variable, and then tries to reduce the equations in terms of the new variables. While it hasn’t broken AES yet, algebraic cryptanalysis will need to be accounted for as new ciphers are designed.
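A toy version of that linearization step (the system of equations is invented for the example): give each monomial over GF(2) its own column, then reduce the now-linear system by Gaussian elimination.

```python
# Toy XL-style linearization over GF(2). Equations (made up):
#   x*y + x + y = 1
#   x*y + x     = 0
#   x*y     + y = 0
# Linearize by treating the monomial x*y as a fresh variable.
# Columns: [xy, x, y | rhs]
rows = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

def gauss_gf2(rows, nvars):
    """Reduce an augmented GF(2) system to reduced row-echelon form."""
    rows = [r[:] for r in rows]
    pivot = 0
    for col in range(nvars):
        # Find a row with a 1 in this column to use as the pivot.
        for r in range(pivot, len(rows)):
            if rows[r][col]:
                rows[pivot], rows[r] = rows[r], rows[pivot]
                break
        else:
            continue
        # XOR the pivot row into every other row with a 1 in this column.
        for r in range(len(rows)):
            if r != pivot and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[pivot])]
        pivot += 1
    return rows

reduced = gauss_gf2(rows, 3)
# Reduces to xy = 1, x = 1, y = 1, and the solution is consistent with
# the nonlinear constraint (xy really equals x*y).
```

Real XL first multiplies the original polynomials by low-degree monomials to generate enough equations for the linearized system to have full rank; the toy skips that generation step.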

The duality between program analysis and cryptanalysis is interesting to me. Would it be useful to represent unknown conditional branches as bits of a key and the entire program as a cipher, then attempt to reduce with XL? What about converting cipher operations on bits of an unknown key to conditional branches (or jump tables for bytewise operations) and reducing using abstract interpretation?

While this musing doesn’t have practical applications, it’s still fun to find parallels between distinct areas of your work.

Stuxnet is embarrassing, not amazing

As the New York Times posts yet another breathless story about Stuxnet, I’m surprised that no one has pointed out its obvious deficiencies. Everyone seems to be hyperventilating about its purported target (control systems, ostensibly for nuclear material production) and not the actual malware itself.

There’s a good reason for this. Rather than being proud of its stealth and targeting, the authors should be embarrassed at their amateur approach to hiding the payload. I really hope it wasn’t written by the USA because I’d like to think our elite cyberweapon developers at least know what Bulgarian teenagers did back in the early 90’s.

First, there appears to be no special obfuscation. Sure, there are your standard routines for hiding from AV tools, XOR masking, and installing a rootkit. But Stuxnet does no better at this than any other malware discovered last year. It does not use virtual machine-based obfuscation, novel techniques for anti-debugging, or anything else to make it different from the hundreds of malware samples found every day.

Second, the Stuxnet developers seem to be unaware of more advanced techniques for hiding their target. They use simple “if/then” range checks to identify Step 7 systems and their peripheral controllers. If this was some high-level government operation, I would hope they would know to use things like hash-and-decrypt or homomorphic encryption to hide the controller configuration the code is targeting and its exact behavior once it infected those systems.

Core Labs published a piracy protection scheme including “secure triggers”, which is code that can only be executed given a particular configuration in the environment. One such approach is to encrypt your payload with a key that can only be derived on systems that have a particular configuration. Typically, you’d concatenate all the desired input parameters and hash them to derive the key for encrypting your payload. Then, you’d do the same thing on every system the code runs on. If any of the parameters is off, even by one, the resulting key is useless and the code cannot be decrypted and executed.
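A minimal sketch of the scheme (the parameter names and payload are invented, and a real implementation would use a proper cipher such as AES rather than this toy SHA-256 keystream):

```python
import hashlib

def derive_key(params):
    # Concatenate the environment parameters and hash them to get the key.
    return hashlib.sha256("|".join(params).encode()).digest()

def keystream(key, length):
    # Toy keystream: SHA-256 in counter mode. Illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def crypt(key, data):
    # XOR with the keystream; the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Attacker side: encrypt the payload under the target's configuration.
target = ["Step7", "controller-model-X", "fw-2.1"]
blob = crypt(derive_key(target), b"payload machine code")

# On an infected host: gather the same parameters and try to decrypt.
assert crypt(derive_key(target), blob) == b"payload machine code"

# A single parameter off by one version yields a useless key.
wrong = ["Step7", "controller-model-X", "fw-2.0"]
assert crypt(derive_key(wrong), blob) != b"payload machine code"
```

Note that nothing in the binary reveals which configuration unlocks it; the analyst sees only the hash-and-decrypt stub and an opaque blob.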

This is secure except against a chosen-plaintext attack. In such an attack, the analyst can repeatedly run the payload on every possible combination of inputs, halting once the right configuration is found to trigger the payload. However, if enough inputs are combined and their ranges are not too limited, you can make such a brute-force attack infeasible. If this was the case, malware analysts could only say “here’s a worm that propagates to various systems, and we have not yet found out how to unlock its payload.”

Stuxnet doesn’t use any of these advanced features. Either the authors did not care if their payload was discovered by the general public, they weren’t aware of these techniques, or they had other limitations, such as time. The longer they remained undetected, the more systems could be attacked and the longer Stuxnet could continue evolving as a deployment platform for follow-on worms. So disregard for detection seems unlikely.

We’re left with the authors being run-of-the-mill or in a hurry. If the former, then it was likely this code was produced by a “Team B”. Such a group would be second-tier in their country, perhaps a military agency as opposed to NSA (or the equivalent in other countries). It could be a contractor or loosely-organized group of hackers.

However, I think the final explanation is most likely. Whoever developed the code was probably in a hurry and decided using more advanced hiding techniques wasn’t worth the development/testing cost. For future efforts, I’d like to suggest the authors invest in a few copies of Christian Collberg’s book. It’s excellent and could have bought them a few more months of obscurity.

More details surface on PS Jailbreak

There have been a few new developments regarding the recent PS3 USB exploit. Working with impressive speed, Mathieulh and other developers have released an open-source version of the exploit called PS Groove. A much more detailed analysis of PS Jailbreak was also posted, although it is still not completely clear how the exploit works.

The PS Groove exploit uses an AT90USB board with the excellent LUFA library as I had expected. (By the way, send some donations to Dean Camera if you use that library. He’s a generous developer.) It attaches the proper config descriptors in the right order but contains a different payload. It will also allow you to disconnect the USB device after the exploit is complete.

Now that more details are public, the exploit is quite impressive. It is a heap overflow, not a stack overflow as Gamefreax had suggested. Also, I was right that they had misread the descriptor lengths (0x4D vs. 0xAD).

The exploit involves using various config/interface descriptors to align shellcode on the heap. Then through some still-unknown mechanism, a heap overflow gives a user-controllable function pointer, which is later called after free(). The bug appears to be related to how the PS3 enumerates Sony’s internal test JIG device. This device may be probed by a different portion of the kernel, which trusts the device’s USB descriptors more.

Go check out the code and read the analysis. I’d love to hear more about how this exploit works and how it was discovered.

Theories of how PS Jailbreak works

A company recently announced a modchip for the PS3 and claims they will be shipping them soon. It plugs into the USB port and allows running backup games. Inside the device is an ATmega8U2 USB microcontroller, something I’ve worked with before. While I didn’t have access to this device and don’t know PS3 internals, I spent a few minutes looking at a hex dump from a USB trace.

An article on Gamefreax (German, alternate in English) claimed to have reverse-engineered this device. They believe that a stack overflow in USB config descriptor processing allows the device to execute code on the PS3 host. However, their page shows a useless set of packets from a USB trace and some hex code obscured with a logo, so it’s not possible to verify their claims from the info given.

In a later comment, a user named Descrambler posted a more complete dump of hex data from the USB trace. It is still not fully complete, but it’s enough to look into more of the details.

The first trace starts with a standard configuration descriptor. It is followed by a relatively standard interface descriptor, except it is class 254, subclass 1, and protocol 2. Since this is in the reserved class range, perhaps this is related to an internal Sony test tool. Following this descriptor is some data and PowerPC code.

The second trace starts with a config descriptor that is a bit weird. It claims the total length is 77 bytes (68 bytes of interface descriptors after the 9-byte config descriptor). It also claims to support 10 interfaces. With the standard interface descriptor length of 9 bytes, the total length should be 99 bytes.
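The arithmetic behind that mismatch is simple enough to check:

```python
# Config descriptor math from the second trace: a 9-byte config
# descriptor plus ten standard 9-byte interface descriptors should
# give a wTotalLength of 99 bytes, but the descriptor claims 0x4D (77).
CONFIG_LEN = 9
INTERFACE_LEN = 9

def expected_total(num_interfaces):
    return CONFIG_LEN + num_interfaces * INTERFACE_LEN

claimed = 0x4D
assert expected_total(10) == 99
assert claimed == 77
# 22 bytes of descriptor data lie past the claimed total length --
# exactly the kind of trailing data a length-trusting parser mishandles.
assert expected_total(10) - claimed == 22
```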

There are multiple ways this might affect the PS3. If it trusts the total length field when sizing its buffer, the first bytes after the initial 77 could overflow a buffer (in this case, “00 00 fe 01 02 00 09 04” are what follows). Or it might simply copy it into a static buffer (if 256 bytes long, the first bytes to overflow would be “fe 01 02 00 09 04 00 00”).

If it’s not an overflow, it could be related to how the PS3 parses the first 10 interface descriptors. The sequence is not regular. Taken in 9-byte chunks, it diverges after the first 6 interface descriptors, giving a next descriptor of “09 00 09 04 00 00 00 fe 01”. This is not a valid descriptor. Or, if the PS3 parses lengths of descriptors, it will end up with a few very short ones (“02 00”).

These are all just theories. It’s quite possible the second trace is just a decoy, meant to slow down reversing. The behavior described by Gamefreax cannot be validated from the USB traces posted by Descrambler. It appears Gamefreax may have misread the trace (77 bytes in the second trace is 0x4d, but they claim a descriptor length of 0xAD). Also, Descrambler’s hex dumps are incomplete and don’t show the various phases described in the Gamefreax post.

It’s definitely too early to claim that the PS Jailbreak exploit has been reverse-engineered. However, it should be quite easy to clone since all the data needed to do so is present in a USB trace. Just paste the data into example code for an AT90USB and replay the same descriptors. You might have to add in a bus disconnect in the right place but it should be relatively simple.

Don’t use included version of IDAPython

I ran into a problem the other day with IDAPython. It’s a great tool but the version that comes by default with IDA is often out-of-date. Lesson: always update your version of IDAPython after installing or upgrading IDA.

The problem I saw was that the idautils.Functions() generator would never return any functions. I added lots of debugging prints and found that Segments() worked but Functions() never found a function, no matter what the range of addresses was. I then found that the Functions() routine would never locate any functions if the first address was not the exact EA of a function.

This was in IDA 5.5 with its default IDAPython. Here’s the commit that fixed the bug. Since there are other bugs in older releases, I recommend updating to 1.3.2 (Windows binary or SVN source).