Memory address layout vulnerabilities

This post is about a programming mistake we have seen a few times in the field. If you live by TAOSSA (The Art of Software Security Assessment), you probably already avoid this, but it’s a surprisingly tricky and persistent bug.

Assume you’d like to exploit the function below on a 32-bit system. You control len and the contents of src, and they can be up to about 1 MB in size before malloc() or prior input checks start to error out early without calling this function.

int target_fn(char *src, int len)
{
    char buf[32];
    char *end;

    if (len < 0) return -1;
    end = buf + len;
    if (end > buf + sizeof(buf)) return -1;
    memcpy(buf, src, len);
    return 0;
}

Is there a flaw? If so, what conditions are required to exploit it? Hint: the obvious integer overflow using len is caught by the first if statement.

The bug is an ordinary integer overflow, but it is only exploitable under certain conditions, and those conditions depend entirely on the address of buf in memory. If the stack is located at the bottom of the address space on your system, or if buf is on the heap, it is probably not exploitable. If buf is near the top of the address space, as with most stacks, it may be. Exploitability comes down to the runtime location of buf, which depends on the containing program itself and how it uses other memory.

The issue is that buf + len may wrap the end pointer to memory below buf. This may happen even for small values of len, if buf is close enough to the top of memory. For example, if buf is at 0xffff0000, a len as small as 64 KB can be enough to wrap the end pointer. This allows the memcpy() to become unbounded, up to the end of RAM. If you’re on a microcontroller or other system that allows accesses to low memory, memcpy() could wrap internally after hitting the top of memory and continue storing data at low addresses.
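
To make the wrap concrete, here is a minimal sketch (my own illustration, simulating 32-bit pointer arithmetic with uint32_t so it runs anywhere) of how the end-pointer check passes for an oversized len, and of a check written in terms of lengths, which cannot wrap:

#include <stdint.h>
#include <stdio.h>

#define BUF_SIZE 32u

int main(void)
{
    uint32_t buf = 0xffff0000u;  /* hypothetical stack address near the top of memory */
    uint32_t len = 0x10000u;     /* 64 KB, well under the ~1 MB input limit */

    /* the flawed check: buf + len wraps past 2^32 and comes out small */
    uint32_t end = buf + len;    /* 0xffff0000 + 0x10000 == 0 (mod 2^32) */
    if (end > buf + BUF_SIZE)
        printf("pointer check: rejected\n");
    else
        printf("pointer check: accepted (end=0x%08x), memcpy of %u bytes into a 32-byte buffer\n",
               (unsigned)end, (unsigned)len);

    /* a safer check compares lengths, not pointers, so nothing can wrap; in
     * target_fn itself, "if (len > (int)sizeof(buf)) return -1;" after the
     * negative check is enough */
    if (len > BUF_SIZE)
        printf("length check: rejected\n");

    return 0;
}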

Of course, these kinds of functions are never neatly packaged in a small wrapper and easy to find. There’s usually a sea of them and the copy happens many function calls later, based on stored values. In this kind of situation, all of us (maybe even Mark Dowd) need some help sometimes.

There has been a lot of recent work on using SMT solvers to find boundary condition bugs. They are useful, but often limited. Every time you hit a branch, you have to add a constraint (or potentially double your terms, depending on the structure). Also, inferring the runtime contents of RAM is a separate and difficult problem.

We think the best approach for now is to use manual code review to identify potentially problematic sections, and then restrict the search space to that set of functions for automated verification. Despite some promising results, we’re still a long way from automated detection and exploitation of vulnerabilities. And even as the program verification field advances, mitigations such as ASLR, DEP, and software protection measures keep raising the cost of exploitation.

Over the next few years, it will be interesting to see if attackers can maintain their early lead by using program verification techniques. Microsoft has applied the same approach to defense, and it would be good to see this become general practice elsewhere.

Old programming habits die hard

While programming, it’s enlightening to be aware of the many influences you have. Decisions such as naming internal functions, coding style, organization, threading vs. asynchronous IO, etc. all happen because of your background. I think you could almost look at someone’s code and tell how old they are, even if they keep up with new languages and patterns.

When I think of my own programming background, I remember a famous quote:

“It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.”
— Edsger W. Dijkstra, June 18, 1975

Large memory allocations are a problem

A common mistake is keeping a fixed idea of reasonable allocation sizes in mind. Since our machines are still growing exponentially, even an assumption that scales linearly quickly falls behind.

Back in the ’90s, I would make an effort to keep stack frames within 4K or 8K total to avoid hitting a page fault to grow the stack. Deep recursion or copying from stack buffer to buffer was bad because it could trigger a fault into the kernel, which would grow the process’s stack and slow down execution. It was better to reuse data in place and pass around pointers.

Nowadays, you can malloc() gigabytes and servers have purely in-memory databases. While memory use is still important, the scale that we’re dealing with now is truly amazing (unless your brain treats performance as a log plot).

Never jump out of a for loop

The BASIC interpreter on early machines had limited garbage collection capability. If you used GOTO to exit a loop early, the loop’s stack entry was left around unless you followed some guidelines. Do this repeatedly and you’d eventually run out of memory.

Because of this, it always feels a little awkward in C to break out of a for loop, even though break is just a GOTO at the assembly level. Fortunately, C does a better job of stack management than BASIC.

Low memory addresses are faster

On the 6502, instructions that access zero-page addresses ($00–$ff) use a more compact instruction encoding than other addresses and also execute one cycle faster. In DOS, you may have spent a lot of time trying to swap things below the 1 MB barrier. On an Amiga, it was chip and fast RAM.

Thus, it always feels a bit faster to me to use the first few elements of an array, or an address with a lot of leading zeros. The former rule of thumb has morphed into cache-line access patterns, so it is still valid in a slightly different form. With virtualized addressing, the latter no longer applies.

Pointer storage is insignificant

In the distant past, programmers made attempts to fold multiple pointers into a single storage unit (the famous XOR trick). Memory became a little less scarce and the practice was denounced, due to its impact on debugging and garbage collection. Meanwhile, on the PC, segmented memory kept near pointers at 16 bits, so pointer storage remained insignificant. As developers moved to 32-bit protected-mode machines in the ’90s, RAM size was still not an issue because it had grown accordingly.

However, we’re at a peculiar juncture with RAM now. Increasing pointers from 32 to 64 bits uses 66% more RAM for a doubly-linked list implementation with each node storing a 32-bit integer. If your list took 2 GB of RAM, now it takes 3.3 GB for no good reason. With virtual addressing, it often makes sense to return to a flat model where every process in the system has non-overlapping address space. A data structure such as a sparse hash table might be better than a linked list.
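
To see the per-node cost, a quick sizeof comparison works (a rough illustration of my own; exact numbers depend on the compiler, and on typical LP64 systems alignment padding pushes the node to 24 bytes rather than the unpadded 20, so the real-world growth is closer to 2x):

#include <stdio.h>
#include <stdint.h>

/* a typical doubly-linked list node holding a 32-bit integer */
struct node {
    struct node *prev;
    struct node *next;
    int32_t      value;
};

int main(void)
{
    /* ILP32 build: 4 + 4 + 4 = 12 bytes per node.
     * LP64 build:  8 + 8 + 4 = 20 bytes unpadded, usually 24 with padding. */
    printf("sizeof(struct node) = %zu bytes\n", sizeof(struct node));
    printf("pointer overhead    = %zu of those bytes\n", 2 * sizeof(void *));
    return 0;
}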

Where working set size is less than 4 GB, it may make sense to stay with a 32-bit OS and use PAE to access physical RAM beyond that limit. You get to keep 32-bit pointers, but each process can only address 4 GB of RAM. However, you can just run multiple processes to take advantage of the extra RAM. Today’s web architectures and horizontal scaling mean this may be a better choice than 64-bit for some applications.

The world of computing changes rapidly. What kind of programming practices have you evolved over the years? How are they still relevant or not? In what ways can today’s new generation of programmers learn from the past?

Stuxnet is embarrassing, not amazing

As the New York Times posts yet another breathless story about Stuxnet, I’m surprised that no one has pointed out its obvious deficiencies. Everyone seems to be hyperventilating about its purported target (control systems, ostensibly for nuclear material production) and not the actual malware itself.

There’s a good reason for this. Rather than being proud of its stealth and targeting, the authors should be embarrassed at their amateur approach to hiding the payload. I really hope it wasn’t written by the USA, because I’d like to think our elite cyberweapon developers at least know what Bulgarian teenagers did back in the early ’90s.

First, there appears to be no special obfuscation. Sure, there are your standard routines for hiding from AV tools, XOR masking, and installing a rootkit. But Stuxnet does no better at this than any other malware discovered last year. It does not use virtual machine-based obfuscation, novel techniques for anti-debugging, or anything else to make it different from the hundreds of malware samples found every day.

Second, the Stuxnet developers seem to be unaware of more advanced techniques for hiding their target. They use simple “if/then” range checks to identify Step 7 systems and their peripheral controllers. If this was some high-level government operation, I would hope they would know to use things like hash-and-decrypt or homomorphic encryption to hide the controller configuration the code is targeting and its exact behavior once it did infect those systems.

Core Labs published a piracy protection scheme including “secure triggers”, code that can only be executed given a particular configuration in the environment. One such approach is to encrypt your payload with a key that can only be derived on systems that have a particular configuration. Typically, you’d concatenate all the desired input parameters and hash them to derive the key for encrypting your payload. Then, you’d do the same thing on every system the code runs on. If any of the parameters is off, even by one, the resulting key is useless and the code cannot be decrypted and executed.
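
Here is a rough sketch of the hash-and-decrypt idea in C with OpenSSL (my own illustration, not code from the Core Labs paper or from Stuxnet; the parameter values are made up, and a real trigger would use an authenticated cipher rather than relying on CBC padding to detect a wrong key):

/* compile with: cc trigger.c -lcrypto */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>
#include <openssl/evp.h>

/* Derive the payload key by hashing the concatenated environment parameters.
 * If any parameter differs from the targeted configuration, the key is wrong
 * and the payload stays opaque. */
void derive_key(const char *params[], size_t nparams,
                unsigned char key[SHA256_DIGEST_LENGTH])
{
    unsigned char buf[1024];
    size_t off = 0;

    for (size_t i = 0; i < nparams; i++) {
        size_t n = strlen(params[i]);
        if (off + n > sizeof(buf))
            break;                 /* sketch only; real code would handle this */
        memcpy(buf + off, params[i], n);
        off += n;
    }
    SHA256(buf, off, key);         /* 32-byte digest doubles as the AES-256 key */
}

/* Try to decrypt the payload with the derived key. Returns 1 only if the
 * padding verifies, i.e. this machine matched the targeted configuration. */
int try_decrypt(const unsigned char *ct, int ct_len,
                const unsigned char key[32], const unsigned char iv[16],
                unsigned char *pt, int *pt_len)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len = 0, total = 0, ok = 0;

    if (ctx &&
        EVP_DecryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv) == 1 &&
        EVP_DecryptUpdate(ctx, pt, &len, ct, ct_len) == 1) {
        total = len;
        if (EVP_DecryptFinal_ex(ctx, pt + total, &len) == 1) {
            total += len;
            ok = 1;
        }
    }
    EVP_CIPHER_CTX_free(ctx);
    *pt_len = total;
    return ok;
}

int main(void)
{
    /* hypothetical configuration parameters; a real trigger would read these
     * from the target environment (device IDs, firmware strings, and so on) */
    const char *params[] = { "6ES7-315-2", "fw-2.6", "DB890" };
    unsigned char key[SHA256_DIGEST_LENGTH];

    derive_key(params, 3, key);
    for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
        printf("%02x", key[i]);
    printf("\n");
    /* try_decrypt() would then be run against the stored ciphertext; it only
     * succeeds on a machine whose parameters hash to the right key */
    return 0;
}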

This is secure except against a chosen-plaintext attack. In such an attack, the analyst can repeatedly run the payload on every possible combination of inputs, halting once the right configuration is found to trigger the payload. However, if enough inputs are combined and their ranges are not too limited, you can make such a brute-force attack infeasible. If this was the case, malware analysts could only say “here’s a worm that propagates to various systems, and we have not yet found out how to unlock its payload.”

Stuxnet doesn’t use any of these advanced features. Either the authors did not care if their payload was discovered by the general public, they weren’t aware of these techniques, or they had other limitations, such as time. The longer they remained undetected, the more systems could be attacked and the longer Stuxnet could continue evolving as a deployment platform for follow-on worms. So disregard for detection seems unlikely.

We’re left with the authors being run-of-the-mill or in a hurry. If the former, then this code was likely produced by a “Team B”. Such a group would be second-tier in their country, perhaps a military agency as opposed to the NSA (or the equivalent in other countries). It could be a contractor or a loosely organized group of hackers.

However, I think the final explanation is most likely. Whoever developed the code was probably in a hurry and decided using more advanced hiding techniques wasn’t worth the development/testing cost. For future efforts, I’d like to suggest the authors invest in a few copies of Christian Collberg’s book. It’s excellent and could have bought them a few more months of obscurity.

An obvious solution to the password problem

Many organizations try to solve problems by making rules. For example, they want to prevent accounts from being compromised due to weak passwords, so they institute a password policy. But any policy with specific rules gets in the way of legitimate choices and is vulnerable to being gamed by the lazy. This isn’t because people are bad, it’s because you didn’t properly align incentives.

For example, a bank might require passwords with at least one capital letter and a number. However, things like “Password1” are barely more secure than “password”. (You get them on the second phase of running Crack, not the first phase. Big deal.) A user who chose that password was just trying to get around the rule, not choose something secure. Meanwhile, a much more secure password like “jnizwbier uvnqera” would fail the rule.

The solution is not more rules. It is twofold: give users a “why” and a “how”. You put the “why” up in great big red letters and then refer to the “how”. If users ignore this step, your “why” is not compelling enough. Come up with a bigger carrot or stick.

The “why” is a benefit or penalty. You could give accountholders a free coffee if their account goes 1 year without being compromised or requiring a password reset. Or, you can make them responsible for any money spent on their account if an investigation shows it was compromised via a password.

The “how” in this case is a short tutorial on how to choose a good passphrase, access to a good random password generator program, and a password field long enough (256 characters?) to avoid arbitrary limits on users’ choices.
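
For the generator, even a tiny tool beats asking users to invent randomness themselves. A minimal sketch (my own, assuming a POSIX system with /dev/urandom; not meant as a production generator):

#include <stdio.h>

int main(void)
{
    const char charset[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    const size_t nchars = sizeof(charset) - 1;
    enum { LEN = 20 };               /* ~119 bits of entropy with 62 symbols */
    unsigned char raw[LEN];
    char out[LEN + 1];

    FILE *f = fopen("/dev/urandom", "rb");
    if (!f || fread(raw, 1, LEN, f) != LEN) {
        fprintf(stderr, "could not read /dev/urandom\n");
        return 1;
    }
    fclose(f);

    for (size_t i = 0; i < LEN; i++)
        out[i] = charset[raw[i] % nchars];  /* modulo bias is tiny here; a
                                               careful version would reject
                                               out-of-range bytes */
    out[LEN] = '\0';
    printf("%s\n", out);
    return 0;
}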

That’s it. Once you align incentives and provide the means to succeed, rules are irrelevant. This goes for any system, not just passwords.