Last time, I asked the question, “why did it take 24 years for buffer overflow exploits to mature?” The factors relevant to answering this question are particular to three eras: academic computing (early 1970’s), the rise of the Internet (1988), and x86 unification (1996 to present).
In the first era, academic computing, few systems were networked, so buffer overflow flaws could only be used for privilege escalation attacks. (The main type of networking was dialup terminal access, a simpler interface that was usually not vulnerable to buffer overflows.) Gaining control of an application was relevant to the designers of the trusted computing criteria (the Rainbow Books), where objects are tagged with a persistent security level enforced by the TCB, not by the program that accesses them. Except for the most secure military systems, computers were mainly used by academia and businesses, which focused more on password security and preventing access by unsophisticated attackers. Why write a complex exploit when there are weak or no passwords on system accounts?
In 1988, the wide availability of Unix source code and a common CPU architecture (VAX) led to the first malicious buffer overflow exploit. Before the rise of the minicomputer, there were many different CPU architectures on the Arpanet. The software stacks were also changing rapidly (TOPS-20, RSX-11, VMS). But by the late 1980’s, the popularity of DEC, Unix, and BSD in particular had led to a significant number of BSD VAX systems on the Internet, probably the most homogeneous it had been up to that point.
I think the widespread knowledge of the VAX architecture, along with the number of BSD systems on the Internet, led to that platform being the target of the first malicious buffer overflow exploit. Building a worm to target a handful of systems wouldn’t have been nearly as much fun, and BSD on the VAX had hit a critical mass on the Internet. More general Unix flaws (e.g., the trust relationships used by the “r” utilities) were also exploited by the worm, but the VAX buffer overflow was the only such flaw the author chose to target, or perhaps the only one he was familiar enough with to create.
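For readers who weren’t around for it, the flaw the worm targeted in fingerd was the pattern this whole history revolves around: a fixed-size stack buffer filled with no length check (fingerd used gets() on a 512-byte buffer). Here is a minimal C sketch of that pattern, not the actual fingerd source; the function name is mine, and I use strcpy() rather than gets() so it compiles cleanly today, but the effect is the same.

```c
#include <stdio.h>
#include <string.h>

/* Minimal sketch of the vulnerable pattern (not the actual fingerd code):
 * a fixed-size buffer on the stack, filled with no bounds check.  Any
 * request longer than the buffer spills over the saved registers and the
 * return address, which is what the 1988 worm's VAX exploit overwrote. */
static void handle_request(const char *request)
{
    char line[512];                /* request buffer on the stack */

    strcpy(line, request);         /* no length check -- overflow here */
    printf("finger request: %s\n", line);
}

int main(int argc, char **argv)
{
    if (argc > 1)
        handle_request(argv[1]);   /* attacker controls the length */
    return 0;
}
```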
With this lone exception, any further exploitation of buffer overflows remained quite secret. Sun workstations and servers running 68k and then SPARC processors became the Internet hosts of choice by the 1990’s. However, there was significant fragmentation, with many hosts running IRIX on SGI MIPS, IBM AIX on the RT and RS, and HP-UX on PA-RISC. Even though Linux and the various BSD distributions had been launched, they were mostly used by hobbyists as client PCs, not servers.
Meanwhile, a network security community had developed around mailing lists such as Bugtraq and Zardoz. Most members were sysadmins and hackers, and few vendors actively participated (Sun was a notable exception). Exploits and fixes/workarounds were regularly published, most of them targeting logic flaws such as race conditions or file descriptor leakage across trust domains. (On a side note, I find it amusing that any debate about full disclosure back then usually involved how many days someone should wait between notifying the vendor and publishing an exploit, not an advisory. Perhaps we should be talking in those same terms today.)
In 1995, the buffer overflow exploits being published were only for non-x86 systems (SunOS and HP-UX, in particular). Such systems were still the most prevalent servers on the Internet. The network security community was focused on this kind of “big iron,” with few exploits published for open-source Unix systems or Windows, which was almost exclusively a client OS. This changed with the splitvt exploit for Linux on x86, although it took a year before the real impact of this change appeared.
One of the reasons buffer overflows were so rarely exploited on Unix workstations in the early 1990’s was the difficulty of getting CPU and OS architecture information. Vendors kept a lot of this information private or only provided it in costly books. Sure, you could read the binutils source code, but without an existing understanding of the assembly behavior, it was difficult to gain this kind of low-level knowledge. Also, common logic flaws were easier to exploit for the Unix-centric sysadmin community that populated Bugtraq.
In 1996, several worlds collided and buffer overflow exploitation grew rapidly. First, Linux and BSD x86 systems had finally become common enough on the Internet that people cared about privilege escalation exploits specific to that CPU architecture. Second, open-source Linux and BSD code made it easy to understand stack layout, trap frames, and other system details. Finally, the virus community had become interested in network security and brought years of low-level x86 experience.
The last factor is the one I haven’t seen discussed elsewhere. Virus enthusiasts (virus authors, defenders, and various interested onlookers) had been engaged in a cat-and-mouse game since the 1980’s. The game was played out almost exclusively on x86 DOS systems. During this period, attackers created polymorphic code generators (MtE) and defenders created a VM-based system for automated unpacking (TBAV). Even today, the impact of all this innovation is still slowly trickling into mainstream security products.
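To make that experience gap concrete, here is a toy C illustration of the decoder-loop idea behind polymorphic engines; this is my own sketch, not MtE or TBAV code. Real engines emitted the loop as x86 machine code and randomized the key, register choices, and instruction sequence on every infection, which is exactly why signature scanning failed and emulation (the TBAV approach) worked: run the stub in a VM until the constant payload appears in memory.

```c
#include <stdio.h>

/* Toy sketch of a polymorphic decoder loop: the payload is stored
 * encrypted and only a small decoder runs in the clear.  A polymorphic
 * generator varies the key and the code of this loop per infection, so
 * there is no fixed byte string for a scanner to match.  An emulator
 * simply executes the loop and scans the decrypted result. */
int main(void)
{
    unsigned char key = 0x5A;                    /* per-"infection" key */
    unsigned char payload[] = {                  /* "hi\n" XOR'd with key */
        'h' ^ 0x5A, 'i' ^ 0x5A, '\n' ^ 0x5A, 0x00 ^ 0x5A
    };

    for (size_t i = 0; i < sizeof(payload); i++) /* decoder loop */
        payload[i] ^= key;

    fputs((char *)payload, stdout);              /* "run" the payload */
    return 0;
}
```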
Once all three of these components were in place, buffer overflow exploits became common and the field advanced quickly through the present day. CPU architecture information is now freely available and shellcode and sample exploits exist for all major systems. New techniques for mitigating buffer overflows and subverting mitigations are also constantly appearing with no end in sight.
The advent of the Code Red and Slammer worms in the early 2000’s can be seen as a second coming of the 1988 worm. At that point, Windows servers were common enough on the Internet to be worth exploiting but did not yet hold data valuable enough to justify keeping the flaw secret. That period quickly ended as banking and virtual goods increased the value of compromising Internet hosts, and client-side exploits also became valuable. While some researchers still publish buffer overflow exploits, the difficulty of producing a reliable exploit and the associated commercial value mean that many are now sold instead. The race that started in the 1970’s continues to this day.
hmm. I don’t think VM-based AV was widely used until the early 2000s.
if I recall correctly, buffer overflow exploits really became standard practice with the publication of the ‘Smashing The Stack For Fun And Profit’ file from Aleph One, http://insecure.org/stf/smashstack.html , back in 1996. That introduced many of the techniques which made it a viable generic attack mechanism.
VM-based AV was not widely used in other systems because the big firms weren’t forced to adopt it earlier. However, ask anyone who was around in the early 1990’s: TBAV was amazing technology that beat all other vendors in any comparison, and it did so because of its VM approach. It was definitely widely used amongst those in the know.
I cited A1’s article in the first post. I agree it was very good and contributed to growth in this area. However, as someone who was there for all this, I strongly disagree it was the primary source of widespread attacks. As I describe above, there were larger social and technical forces at work, and A1’s paper was a symptom, not a cause, of a growing focus on exploiting buffer overflows.
fair enough. I’m unfamiliar with TBAV; I only encountered the innards of AV technology when I joined McAfee in the 00’s.
good read. I don’t agree with everything you said but most of it seems reasonable. One thing you didn’t mention that I think is a very important factor is this: security is a weakest-link phenomenon. An attacker will not bother with a more sophisticated attack (i.e., writing assembly code that might prove to be fairly tightly tied to a particular architecture, program, and environment) when a simpler attack will do just fine (i.e., sending mail to a uudecode alias). I think once people started exploring buffer overflows, after some of the simpler attack vectors went away, people started realizing how reusable buffer overflow attacks could be made. I don’t think it was as obvious early on.
Yes, I agree there was no need to target buffer overflows due to the prevalence of general logic flaws. I mention that as an aside above.
Good point that buffer overflows appeared hard to generalize at first. Most of the original exploits carried a hard-coded list of offsets for the various data they needed, one set per version of the target code and libc. This was later greatly improved. It also helped that x86 was becoming the common CPU architecture, whatever the OS.
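For illustration, the shape of those early exploits was roughly a table of per-version targets like the sketch below. The version strings, addresses, and offsets here are hypothetical, as is the struct name; the point is only that each entry had to be tuned by hand for one build of the target and its libc.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: hypothetical versions, addresses, and offsets.
 * Early public exploits typically shipped a table like this, one entry
 * per known build, because the buffer's stack address and the distance
 * to the saved return address shifted between versions. */
struct target {
    const char   *version;   /* build the entry was tuned for */
    unsigned long ret_addr;  /* guessed stack address to jump to */
    int           offset;    /* distance to the saved return address */
};

static const struct target targets[] = {
    { "Linux 1.2.13 / libc 4.7", 0xbffff6a0UL, 1052 },
    { "Linux 2.0.0  / libc 5.3", 0xbffff720UL, 1068 },
};

int main(int argc, char **argv)
{
    /* pick the entry matching the victim, e.g. "./exploit 1" */
    int i = (argc > 1) ? atoi(argv[1]) : 0;

    if (i < 0 || i >= (int)(sizeof(targets) / sizeof(targets[0])))
        return 1;

    printf("using %s: ret=0x%lx offset=%d\n",
           targets[i].version, targets[i].ret_addr, targets[i].offset);
    return 0;
}
```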