Cyber-weapon authors catch up on blog reading

One of the more popular posts on this blog was the one pointing out how Stuxnet was unsophisticated. Its use of traditional malware methods and lack of protection for the payload indicated that the authors were either “Team B” or in a big hurry. The post was intended to counteract the breathless praise in the press for the advent of sophisticated “cyber-weapons”.

This year, the New York Times published more information that supports both theories. The authors may not have had a lot of time due to political pressure and concern about Iran’s progress. The uneasy partnership between the US and Israel may have led to both parties keeping their best tricks in their back pockets.

A lot of people seemed skeptical about the software protection method I described, called “secure triggers”. (I had written about this before, calling it “hash-and-decrypt”.) The general idea is to gather information about the environment in order to generate a cryptographic key, which is used to decrypt the payload. If even one bit of that information is incorrect, the payload can’t be decrypted. The analyst has to brute-force the proper environment, which can be made infeasible if there’s enough entropy and/or the key derivation is slow enough.
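
Here’s a minimal sketch of the idea in Python. The environment inputs and the SHA-256 counter-mode keystream are stand-ins for illustration, not real crypto or real trigger inputs; the point is only that the key is derived from the environment rather than shipped with the code.

    # Minimal sketch of a secure trigger (hash-and-decrypt). The environment
    # inputs and the SHA-256 counter-mode keystream are stand-ins; a real
    # implementation would use far higher-entropy inputs and real crypto.
    import hashlib
    import platform
    import socket

    def environment_fingerprint():
        # Hypothetical trigger inputs; a real trigger would pick values the
        # target is unlikely to share with an analyst's sandbox.
        parts = [platform.system(), platform.machine(), socket.gethostname()]
        return "|".join(parts).encode()

    def derive_key(fingerprint):
        # A deliberately slow KDF makes brute-forcing candidate environments costly.
        return hashlib.pbkdf2_hmac("sha256", fingerprint, b"trigger-salt", 200_000)

    def keystream(key, length):
        out = b""
        counter = 0
        while len(out) < length:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:length]

    def seal(payload, fingerprint):
        key = derive_key(fingerprint)
        ciphertext = bytes(a ^ b for a, b in zip(payload, keystream(key, len(payload))))
        # A hash of the plaintext lets the trigger tell a correct decryption
        # from the garbage produced by a wrong environment.
        return ciphertext, hashlib.sha256(payload).digest()

    def try_trigger(ciphertext, digest):
        key = derive_key(environment_fingerprint())
        candidate = bytes(a ^ b for a, b in zip(ciphertext, keystream(key, len(ciphertext))))
        return candidate if hashlib.sha256(candidate).digest() == digest else None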

The critics claimed that secure triggers were too complicated or unable to withstand malware analyst scrutiny. However, this approach had been used successfully in everything from Core Impact to Blu-ray to Team Twiizers exploits, so it was feasible. Either the malware developers were not aware of this technique or there were other constraints, such as time, preventing it from being used.

Now we’ve got Gauss, which uses (surprise!) this exact technique. And, it turns out to be somewhat effective in preventing Kaspersky from analyzing the payload. We either predicted or caused the future, take your pick.

Is this the endgame? Not even close, but it does mean we’re ready for the next stage.

The malware industry has had a stable environment for a while. Targeted attacks were rare, and most malware authors hadn’t spent much effort building custom protection for their payloads. Honeypots and local analysis methods assume the code and behavior remain stable between the malware analyst’s environment and the intended target.

In the next stage, proper use of mechanisms like secure triggers will divide malware analysis into two phases: infection and payload. The infection stage can be analyzed with traditional techniques in order to find the security flaws exploited, propagation method, etc. The payload stage will change drastically, with more effort being spent on in situ analysis.

When the payload only decrypts and runs on a single target system, the malware analyst will need direct access to the compromised host. There are several forms this might take. The obvious one is providing a remote shell to the analyst to log in, attach a debugger, try to get a memory dump of the process, etc. This is dangerous because it involves giving an outsider access to a trusted system, and one that might be critical to other operations. Even if a whole-system memory dump is generated, say by physical access or a cold-boot attack, there is still going to be a lot of sensitive information there.

Another approach is emulation. The analyst uses a VM that turns all local syscalls into remote ones. This is connected to the compromised target host (or a clone of it), which runs a daemon to answer the API queries. The malware sample or relevant portions of it (such as the hash-and-decrypt routine) are run in the analyst’s VM, but the information the routine gathers comes directly from the compromised host. This allows the analyst to gather the relevant information while not having full access to the compromised machine.
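
A rough sketch of the analyst-side stub is below, assuming a hypothetical daemon on the compromised host that answers environment queries over a socket. The API names, port, and JSON wire format are all invented for illustration.

    # Analyst-side stub: environment queries made by the sample's decryption
    # routine are forwarded to a daemon on the compromised host. The daemon,
    # port, API names, and JSON wire format are all hypothetical.
    import json
    import socket

    class RemoteEnvironment:
        def __init__(self, host, port=4444):
            self.addr = (host, port)

        def query(self, api_name, *args):
            # One request/response per query; a real setup would multiplex.
            with socket.create_connection(self.addr, timeout=10) as conn:
                conn.sendall(json.dumps({"call": api_name, "args": args}).encode() + b"\n")
                return json.loads(conn.makefile().readline())["result"]

    # The hash-and-decrypt routine runs in the analyst's VM, but every value it
    # gathers comes from the real target, so the derived key still matches.
    env = RemoteEnvironment("10.0.0.5")
    fingerprint = "|".join(str(v) for v in [
        env.query("GetComputerName"),
        env.query("GetVolumeSerialNumber", "C:"),
        env.query("RegQueryValue", r"HKLM\SOFTWARE\Vendor\App", "Version"),
    ]).encode()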

In the next phase after this, malware authors add more anti-emulation checks to their payload decryption routine. They try to prevent this routine from being run in isolation, in an emulator. Eventually, you end up in a cat-and-mouse game of Core Wars on the live hardware. Malware keeps a closely-synchronized global heartbeat so that any attempt to dump and restart it on a single host corrupts its state irrecoverably. The payload, its triggers, and encryption keys evolve in coordination with the other hosts on the network and are tied closely to each machine’s identity.

Is this where we’re headed? I’m not sure, but I do know that software protection measures are becoming more, not less, relevant.

Improving ASLR with internal randomization

Most security engineers are familiar with address space layout randomization (ASLR). In the classic implementation, the runtime linker or image loader chooses a random base offset for the program, its dynamic libraries, heap, stack, and mmap() regions.
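
To see the classic behavior concretely, here’s a quick check that the same library lands at a different base address on each run. It’s Linux-specific and assumes ASLR hasn’t been disabled.

    # Show classic ASLR: libc's base address differs between two runs of the
    # same interpreter. Linux-specific; assumes ASLR is enabled.
    import subprocess
    import sys

    probe = (
        "with open('/proc/self/maps') as f:\n"
        "    print(next(line.split('-')[0] for line in f if 'libc' in line))\n"
    )

    for run in range(2):
        base = subprocess.run([sys.executable, "-c", probe],
                              capture_output=True, text=True).stdout.strip()
        print(f"run {run}: libc base 0x{base}")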

At a higher level, these can all be seen as obfuscation. The software protection field has pioneered many of these improvements because cracking programs is a superset of exploiting them. That is, an attacker with full access to a program’s entire runtime state is in a much stronger position than one with only remote access to the process, filtered through an arbitrary protocol. Thus, I predict that exploit countermeasures will continue to recapitulate the historical progress of software protection.

The particular set of obfuscations used in ASLR was chosen for the ease of retrofitting it onto existing programs. The runtime linker/loader is a convenient location for randomizing various memory offsets, and its API is respected by most programs, with the most notable exceptions being malware and some software protection schemes. Other obfuscation mechanisms, like heap metadata checksumming, are hidden in the internals of system libraries. Standard libraries are a good, but less reliable, location than the runtime linker. For example, many programs have their own internal allocator, reducing the obfuscation gains of adding protection to the system allocator.

A good implementation of ASLR can require attackers to use a memory disclosure vulnerability to discover, or heap feng shui to create, a known memory layout for reliable exploitation. While randomizing chunks returned from the standard library allocator can make it harder for attackers to create a known state, memory disclosure vulnerabilities will always allow a determined attacker to subvert obfuscation. I expect we’ll see more creativity in exercising partial memory disclosure vulnerabilities as the more flexible bugs are fixed.

ASLR has already forced researchers to package multiple bugs into a single exploit, and we should soon see attackers follow suit. However, once the base offsets of various libraries are known, the rest of the exploit can be applied unmodified. For example, a ROP exploit may need addresses of gadgets changed, but the relative offsets within libraries and the code gadgets available are consistent across systems.

The next logical step in obfuscation would be to randomize the internals of libraries and code generation. In other words, you re-link the internal functions and data offsets within libraries or programs so that code and data are at different locations in DLLs from different systems. At the same time, code generation can also be randomized so that different instruction sequences are used for the same operations. Since all this requires deep introspection, it will require a larger change in how software is delivered.

Fortunately, that change is on the horizon for other reasons. LLVM and Google NaCl are working on link-time optimization and runtime code generation, respectively. What this could mean for NaCl is that a single native executable in LLVM bitcode format would be delivered to the browser. Then, it would be translated to the appropriate native instruction set and executed.

Of course, we already have a form of this today with the various JIT environments (Java JVM, Adobe ActionScript, JavaScript V8, etc.). But these environments typically cover only a small portion of the attack surface and don’t affect the browser platform itself. Still, randomized JIT is likely to become more common this year.

One way to implement randomized code delivery is to add this to the installer. Each program could be delivered as LLVM IR and then native code generation and link addresses could be randomized as it was installed. This would not slow down the installation process significantly but would make each installation unique. Or, if the translation process was fast enough, this could be done on each program launch.
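
Even without full IR-level codegen randomization, you can approximate the idea crudely with what today’s toolchains already allow, for example by shuffling link order at install time from a logged seed. The compiler invocation below is illustrative only; a real implementation would randomize code generation from the delivered IR.

    # Toy install-time randomization: shuffle the object-file link order with a
    # logged seed so each install's layout differs but can be recreated for
    # debugging. Real codegen randomization from LLVM IR would go much further.
    import random
    import subprocess
    import time

    def randomized_link(objects, output, seed=None):
        seed = seed if seed is not None else int(time.time())
        rng = random.Random(seed)
        shuffled = list(objects)
        rng.shuffle(shuffled)
        print(f"link seed: {seed}")   # log the seed to reproduce this layout later
        subprocess.run(["cc", "-o", output] + shuffled, check=True)
        return seed

    # randomized_link(["a.o", "b.o", "c.o", "d.o"], "program")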

Assuming this was successfully deployed, it would push exploit development to be an online process. That is, an exploit would include a built-in ROP gadget generator and SMT solver to generate a process/system-specific exploit. Depending on the limitations of available memory disclosure vulnerabilities and specific process state, it might not be possible to automatically exploit a particular instance. Targeted attacks would have to be much more targeted and each system compromised would require the individual attention of a highly-skilled attacker.
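
The gadget-discovery half of that is already easy to sketch; the hard part is the constraint solving. Here’s a naive scan for ret-terminated byte windows in an x86-64 binary, with the caveat that these file offsets would still need rebasing against a leaked runtime address.

    # Naive ROP gadget scan: collect short byte windows ending in ret (0xc3).
    # A real online generator would disassemble, classify gadgets, and hand
    # constraints to an SMT solver; this only shows the raw material.
    import sys

    def find_gadget_offsets(path, window=8):
        with open(path, "rb") as f:
            data = f.read()
        gadgets = {}
        pos = data.find(b"\xc3")
        while pos != -1:
            start = max(0, pos - window)
            gadgets.setdefault(data[start:pos + 1], pos)  # bytes -> file offset
            pos = data.find(b"\xc3", pos + 1)
        return gadgets

    if __name__ == "__main__":
        for raw, offset in list(find_gadget_offsets(sys.argv[1]).items())[:10]:
            print(f"{offset:#010x}  {raw.hex()}")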

I’m not certain software vendors will accept the nondeterminism of this approach. Obviously, it makes debugging production systems more difficult and installation-specific. However, logging the random seed used to initiate the obfuscation process could be used to recreate a consistent memory layout for testing.

For now, other obfuscation measures such as randomizing the allocator may provide more return on investment. As ROP-specific countermeasures are deployed, it will become easier to exploit a program’s specific internal logic (flags, offsets, etc.) than to try to get full code execution. It seems that, for now, exploit countermeasures will stay focused on randomizing and adding checksums to data structures, especially those in standard libraries.

But is this level of obfuscation where exploit countermeasures are headed? How long until randomized linking and code generation are part of a mainline OS?

State space explosion in program analysis and crypto

While analyzing some software the other day, I was struck by the duality between block cipher cryptanalysis and program analysis. Both present complex problems, and similar tools can be applied to each.

The three main program analysis techniques are dynamic analysis (e.g., execution traces or debugging), symbolic execution, and abstract interpretation. Each has its place but also has unique disadvantages.

Dynamic analysis tests one set of paths through a program with some variance in inputs (and thus program state). Fuzzing is an attempt to increase the path coverage and number of states for each path via random inputs. Smart fuzzing directs the choice of these inputs by discovering constraints via an SMT solver. While dynamic analysis is fast and doesn’t give any false positives (a crash is a crash), it is extremely limited in coverage, both of code paths and program states.
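
For reference, a dumb mutational fuzzer is only a few lines (the target command below is a placeholder); everything beyond this, like coverage feedback and constraint solving, is an attempt to fight that coverage problem.

    # Bare-bones mutational fuzzer: flip random bytes of a seed input and keep
    # anything that makes the target die on a signal. The target is a placeholder.
    import random
    import subprocess

    def mutate(data, max_flips=8):
        buf = bytearray(data)
        for _ in range(random.randint(1, max_flips)):
            buf[random.randrange(len(buf))] = random.randrange(256)
        return bytes(buf)

    def fuzz(target, seed_input, iterations=1000):
        crashes = []
        for _ in range(iterations):
            candidate = mutate(seed_input)
            try:
                proc = subprocess.run([target], input=candidate,
                                      capture_output=True, timeout=5)
            except subprocess.TimeoutExpired:
                continue
            if proc.returncode < 0:   # killed by a signal, e.g. SIGSEGV
                crashes.append(candidate)
        return crashes

    # crashes = fuzz("./parser_under_test", open("sample.bin", "rb").read())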

Symbolic execution covers all possible inputs and code paths but has really poor performance. Since it models the exact behavior of the program for each state and code path, it does not lead to false positives or false negatives. The downside is that it is much too slow to handle more than a few simple functions.
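
To make the contrast concrete, here’s a toy symbolic-execution step using the Z3 SMT solver (the z3-solver Python package); the branch condition is made up. Instead of running the branch on concrete inputs, we ask the solver which input reaches the interesting side.

    # Toy symbolic execution of:  if (x * 3 + 7 == 52 && x > 10) reach_bug();
    # Rather than enumerate inputs, ask the solver for one that takes both branches.
    from z3 import BitVec, Solver, sat

    x = BitVec("x", 32)                  # symbolic 32-bit input
    solver = Solver()
    solver.add(x * 3 + 7 == 52)          # path condition from the outer branch
    solver.add(x > 10)                   # ... and the nested branch

    if solver.check() == sat:
        print("input reaching the bug:", solver.model()[x])   # x = 15
    else:
        print("path is infeasible")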

Abstract interpretation has characteristics in common with both. It uses three-valued logic (0, 1, and “unknown”) to predict a program’s behavior. While not fast, it is fast enough to be performed on the whole program (like dynamic analysis) and gives better coverage of inputs without the nondeterminism of fuzzing. Unlike symbolic execution, it over-approximates behavior and thus leaves many questions unanswered. However, unlike fuzzing, you know exactly which states are indeterminate and can iterate on those areas.
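
A tiny sketch of the three-valued idea, at the boolean level only: values are 0, 1, or unknown, operations propagate what they can, and whatever is still unknown at a branch is exactly where you iterate.

    # Three-valued propagation: 0, 1, or UNKNOWN. A known operand can still force
    # a definite result even when the other operand is unknown.
    UNKNOWN = "?"

    def tv_and(a, b):
        if a == 0 or b == 0:
            return 0
        if a == 1 and b == 1:
            return 1
        return UNKNOWN

    def tv_or(a, b):
        if a == 1 or b == 1:
            return 1
        if a == 0 and b == 0:
            return 0
        return UNKNOWN

    def tv_not(a):
        return UNKNOWN if a == UNKNOWN else 1 - a

    # Abstractly evaluate: enabled = debug AND (licensed OR override),
    # with debug known to be 1 and the other two inputs unknown.
    debug, licensed, override = 1, UNKNOWN, UNKNOWN
    print(tv_and(debug, tv_or(licensed, override)))   # '?': indeterminate branch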

One big problem with the two static techniques is state space explosion. Every time a conditional branch is encountered, the number of possible states doubles. Thinking cryptographically, this is analogous to adding one bit to a cipher’s key or a 1-bit S-box.

All modern block ciphers are based on the substitution and permutation primitives. Permutation is a linear operation and is easy to represent with a polynomial. Substitution (e.g., an S-box) is non-linear and increases the degree of the polynomial drastically, usually squaring it.
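
You can see the degree jump directly by computing the algebraic normal form of an S-box’s output bits; the 3-bit S-box below is made up for illustration. A bit permutation, by contrast, stays at degree 1.

    # Algebraic degree of each output bit of a toy 3-bit S-box, via the binary
    # Moebius transform (truth table -> ANF coefficients over GF(2)).
    SBOX = [7, 6, 0, 4, 2, 5, 1, 3]      # made-up 3-bit S-box, for illustration

    def anf(truth_table):
        coeffs = list(truth_table)
        step = 1
        while step < len(coeffs):
            for i in range(len(coeffs)):
                if i & step:
                    coeffs[i] ^= coeffs[i ^ step]
            step <<= 1
        return coeffs

    for bit in range(3):
        table = [(SBOX[x] >> bit) & 1 for x in range(8)]
        degree = max(bin(i).count("1") for i, c in enumerate(anf(table)) if c)
        print(f"output bit {bit}: algebraic degree {degree}")   # degree 2 here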

Algebraic cryptanalysis is a means of solving for a key by treating a cipher as an overdetermined system of equations. What algorithms like XL do is convert a set of polynomials into linear equations, which are solvable by means such as Gaussian elimination. XL replaces each polynomial term with a single new variable, and then tries to reduce the equations in terms of the new variables. While it hasn’t broken AES yet, algebraic cryptanalysis will need to be accounted for as new ciphers are designed.
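
Here’s the linearization step on a toy system over GF(2). The equations are made up and already small enough to solve directly (real XL also multiplies equations by monomials to get enough rows); the point is just that each monomial becomes a fresh variable and plain Gaussian elimination does the rest.

    # Toy linearization in the spirit of XL: treat the monomial x*y as its own
    # variable and reduce the linear system over GF(2). Columns: [x*y, x, y | rhs].
    # The system below is satisfied by x=1, y=0 (so x*y=0):
    #   x*y + x + y = 1
    #   x*y + y     = 0
    #   x   + y     = 1
    #   x*y + x     = 1
    rows = [
        [1, 1, 1, 1],
        [1, 0, 1, 0],
        [0, 1, 1, 1],
        [1, 1, 0, 1],
    ]

    def gauss_gf2(rows, num_vars):
        rows = [r[:] for r in rows]
        pivot_row = 0
        for col in range(num_vars):
            pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col]), None)
            if pivot is None:
                continue
            rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
            for r in range(len(rows)):
                if r != pivot_row and rows[r][col]:
                    rows[r] = [a ^ b for a, b in zip(rows[r], rows[pivot_row])]
            pivot_row += 1
        return rows

    # Full rank for this toy system, so the first three rows give the answer:
    for name, row in zip(["x*y", "x", "y"], gauss_gf2(rows, 3)):
        print(name, "=", row[-1])        # x*y = 0, x = 1, y = 0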

The duality between program analysis and cryptanalysis is interesting to me. Would it be useful to represent unknown conditional branches as bits of a key and the entire program as a cipher, then attempt to reduce with XL? What about converting cipher operations on bits of an unknown key to conditional branches (or jump tables for bytewise operations) and reducing using abstract interpretation?

While this musing doesn’t have practical applications, it’s still fun to find parallels between distinct areas of your work.

Stuxnet is embarrassing, not amazing

As the New York Times posts yet another breathless story about Stuxnet, I’m surprised that no one has pointed out its obvious deficiencies. Everyone seems to be hyperventilating about its purported target (control systems, ostensibly for nuclear material production) and not the actual malware itself.

There’s a good reason for this. Rather than being proud of its stealth and targeting, the authors should be embarrassed at their amateur approach to hiding the payload. I really hope it wasn’t written by the USA because I’d like to think our elite cyberweapon developers at least know what Bulgarian teenagers did back in the early 90’s.

First, there appears to be no special obfuscation. Sure, there are your standard routines for hiding from AV tools, XOR masking, and installing a rootkit. But Stuxnet does no better at this than any other malware discovered last year. It does not use virtual machine-based obfuscation, novel techniques for anti-debugging, or anything else to make it different from the hundreds of malware samples found every day.

Second, the Stuxnet developers seem to be unaware of more advanced techniques for hiding their target. They use simple “if/then” range checks to identify Step 7 systems and their peripheral controllers. If this was some high-level government operation, I would hope they would know to use things like hash-and-decrypt or homomorphic encryption to hide the controller configuration the code is targeting and its exact behavior once it infected those systems.

Core Labs published a piracy protection scheme including “secure triggers”, routines that can only be executed given a particular configuration of the environment. One such approach is to encrypt your payload with a key that can only be derived on systems that have a particular configuration. Typically, you’d concatenate all the desired input parameters and hash them to derive the key for encrypting your payload. Then, you’d do the same thing on every system the code runs on. If any of the parameters is off, even by one, the resulting key is useless and the code cannot be decrypted and executed.

This is secure except against a chosen-plaintext attack. In such an attack, the analyst can repeatedly run the payload on every possible combination of inputs, halting once the right configuration is found to trigger the payload. However, if enough inputs are combined and their ranges are not too limited, you can make such a brute-force attack infeasible. If this was the case, malware analysts could only say “here’s a worm that propagates to various systems, and we have not yet found out how to unlock its payload.”
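
For a back-of-the-envelope feel for why (all of the counts and rates below are assumptions, not Stuxnet’s actual parameters):

    # Rough brute-force cost for a secure trigger. Parameter counts and the
    # derivation rate are assumptions, only to show how quickly this scales.
    num_params = 6                # environment values mixed into the key
    values_per_param = 50_000     # plausible candidates for each value
    derivations_per_sec = 10_000  # throughput limited by a deliberately slow KDF

    keyspace = values_per_param ** num_params
    years = keyspace / derivations_per_sec / (3600 * 24 * 365.25)
    print(f"keyspace ~ 2^{keyspace.bit_length() - 1}, worst case ~ {years:.1e} years")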

Stuxnet doesn’t use any of these advanced features. Either the authors did not care if their payload was discovered by the general public, they weren’t aware of these techniques, or they had other limitations, such as time. The longer they remained undetected, the more systems could be attacked and the longer Stuxnet could continue evolving as a deployment platform for follow-on worms. So disregard for detection seems unlikely.

We’re left with the authors being run-of-the-mill or in a hurry. If the former, then it was likely this code was produced by a “Team B”. Such a group would be second-tier in their country, perhaps a military agency as opposed to NSA (or the equivalent in other countries). It could be a contractor or loosely-organized group of hackers.

However, I think the final explanation is most likely. Whoever developed the code was probably in a hurry and decided using more advanced hiding techniques wasn’t worth the development/testing cost. For future efforts, I’d like to suggest the authors invest in a few copies of Christian Collberg’s book. It’s excellent and could have bought them a few more months of obscurity.

A new direction for homebrew console hackers?

A recent article on game console hacking focused on the Wii and a group of enthusiasts who hack it in order to run Linux or homebrew games. The article is very interesting and delves into the debate about those who hack consoles for fun and others who only care about piracy. The fundamental question behind all this: is there a way to separate the efforts of those two groups, limiting one more than the other?

Michael Steil and Felix Domke, who were mentioned in the article, gave a great talk about Xbox 360 security a few years ago. Michael compared the history of Xbox 360 security to the PS3 and Wii, among other consoles. (Here’s a direct link to the relevant portion of the video). Of all the consoles, only the PS3 had not been hacked at the time, although it has since been. Since the PS3 had an officially supported method of booting Linux, there was less reason for the homebrew community to attack it. It was secure from piracy for about 3 years, the longest of any of the modern consoles.

Michael’s claim was that all of the consoles had been hacked to run homebrew games or Linux, but the ultimate result was piracy. This was likely due to the hobbyists having more skill than the pirates, something which has also been the case with smartphones but less so with satellite TV. The case of the PS3 also supports his theory.

Starting back in the 1980’s, there has been a history of software crackers getting jobs designing new protection methods. So what if the homebrew hackers put more effort into protecting their methods from the pirates? There are two approaches they might take: software or hardware protection.

Software protection has been used for exploits before. The original Xbox save game exploit used some interesting obfuscation techniques to limit it to only booting Linux. It stored its payload encrypted in the JPEG header of a penguin image. It didn’t bypass code signature verification completely; instead, it modified the Xbox’s RSA public key to have a trivial factor, which allowed the author to sign his own images with a different private key.
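
For a sense of why a trivial factor is fatal, here’s the math in miniature, using toy-sized textbook primes (real moduli are 1024 bits or more): once you know the factorization of the substituted modulus, a matching private exponent falls out immediately.

    # A public modulus with a known (trivial) factorization lets you forge
    # signatures: knowing p and q makes the private exponent easy to compute.
    # Toy-sized numbers for illustration only.
    import hashlib

    p, q, e = 61, 53, 17                   # attacker-chosen factorization
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))      # private exponent, trivial to derive

    payload = b"boot this homebrew image"
    digest = int.from_bytes(hashlib.sha256(payload).digest(), "big") % n

    signature = pow(digest, d, n)          # "sign" with the forged private key
    assert pow(signature, e, n) == digest  # verifies against the swapped-in key
    print("forged signature verifies against n =", n)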

With all this work, it took about 3 months for someone to reverse-engineer it. At that point, the same hole could be used to run pirated games. However, this hack didn’t directly enable piracy because there were already modchip-based methods in use. So while obfuscation can buy some time before pirates gain access to an exploit, in this case it wasn’t much.

Another approach is to embed the exploit in a modchip. These have long been used by pirates to protect their exploits from other pirates. As soon as another group clones an exploit, the price invariably goes down. Depending on the exploitation method and protection skill of the designer, reverse-engineering the modchip can be as hard as developing the exploit independently.

The homebrew community does not release many modchips because of the development cost. But if they did, it’s possible they could reduce the risk of piracy from their exploits. It would be interesting to see a homebrew-only modchip, where games were signed by a key that certified they were independently developed and not just a copy of a commercial game. The modchip could even be a platform for limiting exploitation of new holes that were only used for piracy. In effect, the homebrew hackers would be setting up their own parallel system of control to enforce their own code of ethics.

Software and hardware protection could slow down pirates acquiring exploits. However, the approach that has already proven effective is to limit the attention of the homebrew hackers by giving them limited access to the hardware. Game console vendors should take into account the dynamics of homebrew hackers versus the pirates in order to protect their platform’s revenue.

But what can you do about it, homebrew hackers? Can you design a survivable system for keeping your favorite console safe from piracy while enabling homebrew? Enforce a code of ethics within your group via technical measures? If anyone can make this happen, you can.

PS3 hypervisor exploit reproduced

There’s a nice series of articles by xorloser on reproducing the recent PS3 hypervisor hack. He used a microcontroller to send the glitch and improved the software exploit to work on multiple firmware revisions. Here’s a picture of his final setup.

It remains to be seen what security measures Sony has taken to address a hypervisor compromise. One countermeasure would be to lock down the OtherOS environment, since the attack depends on the ability to manipulate low-level OS memory structures. They could be using a simpler hypervisor than the GameOS side (say, one that just prevents access to the GPU). Perhaps the SPEs have a disable bit that turns off the hardware decryption unit, and the hypervisor does this before booting OtherOS.

Beyond this, they may not be using a single global key that is shared amongst all SPEs. Broadcast encryption schemes have long been used in the pay TV industry to allow fine-grained revocation of keys that have leaked. They work by embedding a subset of keys from a matrix or tree in each device. If the keys leak, they can be excluded from subsequent software releases. This requires attackers to keep extracting keys and discarding the devices as they are revoked.
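
As a toy illustration of one such scheme (the complete-subtree method, with parameters and “encryption” that are obviously not Sony’s actual design): devices sit at the leaves of a key tree, hold the keys along their path to the root, and revocation wraps the content key only under subtrees that exclude the compromised devices.

    # Toy complete-subtree broadcast encryption: each device holds the keys of
    # every node on its path to the root of a binary tree. Revoked devices are
    # cut out by wrapping the content key only under subtrees they are not in.
    import os

    DEPTH = 3                                   # 2**DEPTH devices at the leaves
    NUM_NODES = 2 ** (DEPTH + 1)                # heap-style node numbering from 1
    node_keys = {i: os.urandom(16) for i in range(1, NUM_NODES)}

    def leaf(device_id):
        return 2 ** DEPTH + device_id           # device i lives at this leaf node

    def path_to_root(node):
        while node >= 1:
            yield node
            node //= 2

    def cover(revoked_devices):
        """Subtree roots covering every non-revoked leaf and no revoked one."""
        tainted = {n for d in revoked_devices for n in path_to_root(leaf(d))}
        if not tainted:
            return [1]                          # nobody revoked: the root covers all
        return [n for n in range(2, NUM_NODES)
                if n not in tainted and (n // 2) in tainted]

    def wrap(content_key, revoked_devices):
        # Toy "encryption": XOR with the node key. Use real AEAD in practice.
        return {n: bytes(a ^ b for a, b in zip(content_key, node_keys[n]))
                for n in cover(revoked_devices)}

    def unwrap(wrapped, device_id):
        for n in path_to_root(leaf(device_id)):
            if n in wrapped:
                return bytes(a ^ b for a, b in zip(wrapped[n], node_keys[n]))
        return None                             # revoked: no usable subtree key

    content_key = os.urandom(16)
    broadcast = wrap(content_key, revoked_devices={3})
    assert unwrap(broadcast, 0) == content_key  # still-trusted device decrypts
    assert unwrap(broadcast, 3) is None         # revoked device cannot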

Also, it’s possible there are software protection measures in place. For example, the SPE could request hashes of regions of the calling hypervisor and use this to detect patching. This results in a cat-and-mouse game where firmware updates (or even individual games) use different methods of detecting attackers. Meanwhile, attackers would try to come up with new ways to avoid these countermeasures. This has already been happening in the Xbox 360 world, as well as with nearly every other game console before now.
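
A minimal sketch of that kind of spot check, with made-up region names and a stand-in for how memory would actually be read:

    # Sketch of a region-hashing spot check: the protected component hashes a
    # few caller code regions against known-good digests. Region names and the
    # memory-reading callback are stand-ins for illustration.
    import hashlib
    import random

    def spot_check(read_region, expected_digests, samples=2):
        """read_region(name) -> bytes; expected_digests: name -> sha256 hex."""
        for name in random.sample(sorted(expected_digests), samples):
            digest = hashlib.sha256(read_region(name)).hexdigest()
            if digest != expected_digests[name]:
                return False        # caller has been patched; refuse to proceed
        return True

    # Varying which regions get checked (and when) across firmware or game
    # releases makes it harder to know which patches will go unnoticed.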

We’ll have to wait and see if Sony used this kind of defense-in-depth and planned for this eventuality or built a really tall wall with nothing more behind it.