Improving ASLR with internal randomization

Most security engineers are familiar with address randomization (ASLR). In the classic implementation, the runtime linker or image loader chooses a random base offset for the program, its dynamic libraries, heap, stack, and mmap() regions.

At a higher level, these can all be seen as obfuscation. The software protection field has led with many of these improvements because cracking programs is a superset of exploiting them. That is, an attacker with full access to a program’s entire runtime state is much more advantaged than one with only remote access to the process, filtered through an arbitrary protocol. Thus, I predict that exploit countermeasures will continue to recapitulate the historical progress of software protection.

The particular set of obfuscations used in ASLR were chosen for their ease of retrofitting existing programs. The runtime linker/loader is a convenient location for randomizing various memory offsets and its API is respected by most programs, with the most notable exceptions being malware and some software protection schemes. Other obfuscation mechanisms, like heap metadata checksumming, are hidden in the internals of system libraries. Standard libraries are a good, but less reliable location than the runtime linker. For example, many programs have their own internal allocator, reducing the obfuscation gains of adding protection to the system allocator.

A good implementation of ASLR can require attackers to use a memory disclosure vulnerability to discover or heap fung shui to create a known memory layout for reliable exploitation. While randomizing chunks returned from the standard library allocator can make it harder for attackers to create a known state, memory disclosure vulnerabilities will always allow a determined attacker to subvert obfuscation. I expect we’ll see more creativity in exercising partial memory disclosure vulnerabilities as the more flexible bugs are fixed.

ASLR has already forced researchers to package multiple bugs into a single exploit, and we should soon see attackers follow suit. However, once the base offsets of various libraries are known, the rest of the exploit can be applied unmodified. For example, a ROP exploit may need addresses of gadgets changed, but the relative offsets within libraries and the code gadgets available are consistent across systems.

The next logical step in obfuscation would be to randomize the internals of libraries and code generation. In other words, you re-link the internal functions and data offsets within libraries or programs so that code and data are at different locations in DLLs from different systems. At the same time, code generation can also be randomized so that different instruction sequences are used for the same operations. Since all this requires deep introspection, it will require a larger change in how software is delivered.

Fortunately, that change is on the horizon for other reasons. LLVM and Google NaCl are working on link-time optimization and runtime code generation, respectively. What this could mean for NaCl is that a single native executable in LLVM bitcode format would be delivered to the browser. Then, it would be translated to the appropriate native instruction set and executed.

Of course, we already have a form of this today with the various JIT environments (Java JVM, Adobe ActionScript, JavaScript V8, etc.) But these environments typically cover only a small portion of the attack surface and don’t affect the browser platform itself. Still, randomized JIT is likely to become more common this year.

One way to implement randomized code delivery is to add this to the installer. Each program could be delivered as LLVM IR and then native code generation and link addresses could be randomized as it was installed. This would not slow down the installation process significantly but would make each installation unique. Or, if the translation process was fast enough, this could be done on each program launch.

Assuming this was successfully deployed, it would push exploit development to be an online process. That is, an exploit would include a built-in ROP gadget generator and SMT solver to generate a process/system-specific exploit. Depending on the limitations of available memory disclosure vulnerabilities and specific process state, it might not be possible to automatically exploit a particular instance. Targeted attacks would have to be much more targeted and each system compromised would require the individual attention of a highly-skilled attacker.

I’m not certain software vendors will accept the nondeterminism of this approach. Obviously, it makes debugging production systems more difficult and installation-specific. However, logging the random seed used to initiate the obfuscation process could be used to recreate a consistent memory layout for testing.

For now, other obfuscation measures such as randomizing the allocator may provide more return on investment. As ROP-specific countermeasures are deployed, it will become easier to exploit a program’s specific internal logic (flags, offsets, etc.) than to try to get full code execution. It seems that, for now, exploit countermeasures will stay focused on randomizing and adding checksums to data structures, especially those in standard libraries.

But is this level of obfuscation where exploit countermeasures are headed? How long until randomized linking and code generation are part of a mainline OS?

11 thoughts on “Improving ASLR with internal randomization

  1. It would be interesting to see how performance might be affected by taking this to the n’th degree. Certainly MS compilers try to optimise for working set and cache performance, which if you randomised everything might cause the odd hiccup. Perhaps it doesn’t matter too much for general applications these days.

    On that note I have wondered if in this sort of scenario you could exploit cache misses to infer the layout of a randomised binary by evicting the cache and trying to do a timing attack against it, sure in theory it must be possible, but probably not very reliable :)

    1. On the instruction generation side, a lighter approach than full randomization is to try to use encodings without encoded RETs (0xc3) in them. But that is still vulnerable to pop/jmp or other instruction combinations.

      The real issue is that almost any subsequence of instructions can be repurposed for other use. For example, a routine that enables a “secure mode” flag can be used to disable that flag by returning to the middle of it with the appropriate register set to 0.

      Many permutations of instructions are equivalent, and many possibilities have little loss in performance. For example, a register can be loaded with a constant value by any number of arithmetic/logical instructions (add/sub/or/not/xor). This not only changes the behavior but also the internal offsets on variable-length instruction sets like x86.

      I’d be interested in seeing some actual research in this area.

    1. Thanks for that link. My main point is that these techniques have been used in software protection for decades but not yet present in any mainstream compilers. I expect that will be changing soon.

  2. Dynamically recompiling binaries was done in the Common Executable Format for Windows CE. The first time that the binary was delivered to the device, it was recompiled to native code specifically for that device. We might consider that to be unintentionally random since the compiler on each device was developed by the third party, not by Microsoft, so the resulting binary could look quite different across different devices.

  3. The easiest way to dynamically randomize x86 code would probably be to insert random lengths of:

    0xCC * n

    in between identified instructions. Little landmines that will cause a SIGTRAP if you return into them, but the normal code will skirt around. You wouldn’t need the IR to do this either, assuming well-behaved code that doesn’t jump into the middle of other instructions.

    1. Right, that is one measure I considered that disrupts addresses. However, it does not disrupt instruction flow, meaning every ROP gadget would still work the same, you’d just have to use your memory disclosure bug to fix up the relative offsets of each. This is a straightforward string search problem, treating the intermediate jmps as no-ops.

      Varying the instructions and semantics would mean you would need to generate new ROP gadgets and shellcode online with your target. This is significantly harder than just updating addresses of existing gadgets. In effect, you’d be writing a target-specific shellcode each time.

      Without major increases in automated exploit generation, this would give defenders an advantage by limiting the number of computers that could be compromised to the aggregate time skilled exploit authors could bring to bear on them. Worms would not be feasible.

      1. Ahh, I hadn’t realised that you were talking about online mapping of the text segment itself using memory disclosure vulnerabilities.

        One possibility to defeat that is to scatter no-access guard pages within the text segment, so that the process will die on a read fault if the attacker tries to scan its memory.

      2. That’s a good idea, even as a general defense against bad pointer reads. Perhaps that should be done with the page at 0 to counteract NULL pointer derefs before they can do damage.

  4. In order for this to be viable, the compilers have to be absolutely perfect. From experience, I can tell you that LLVM has a long way to go before we can trust it to be part of the installation process for deployed binaries. Right now, the code generation is relatively stable when you’re following the happy path of generating code from from LLVM IR that was built from a source compiler. The same is not yet the case for code that has been randomly transformed. They’ll get there though.

    There are two classes of problems in LLVM today that would prevent this approach. The first is one-off code gen bugs. These are easy to fix once they’re found, and should eventually approach zero as LLVM continues to improve. The second issue is that Apple has not yet prioritized link time optimizations (LTO). As such, the LTO infrastructure in LLVM is somewhat immature. It will often run out of memory. It’s possible to create LLVM IR that will fail to produce a binary due to having impossible to encode branches. I talked to the LLVM guys at WWDC – sadly it doesn’t sound like LTO will be important to them for some time.

    As you mentioned, this approach will make debugging a pain. Understandable runtime stack traces of crashes are really important to developers. If this was built as part of the platform’s development tools, it would be relatively straightforward to incorporate the randomization seed back to the application developer along with the stack trace. The tools could even be adjusted to automatically load that particular version of the binary.

    1. Great comment, thanks for sharing your experience.

      Debugging support for obfuscated binaries is something I’ve been meaning to post about. The idea, as you mention, is to provide the internal developers with the random seed that can “unlock” debugging support.

      Not only would it allow you to decrypt a symbol table from the binary but also provide hook points for emulating breakpoints. The goal of providing easy access to developers for adding breakpoints (if they have the seed) but making it hard for attackers is a whole paper in itself.

Comments are closed.