When reviewing source code, I often see developers making claims like “this can’t happen here” or “to reach this point, X condition must have occurred”. These kinds of statements make a lot of assumptions about the execution environment. In some cases the assumptions may be justified, but often they are not, especially in the face of a motivated attacker.
One recent gain for disk reliability has been ZFS. This filesystem has a mode that generates SHA-1 hashes for all the data it stores on disk. This is an excellent way to make sure a read error hasn’t been silently “corrected” by the disk to the wrong value, or to catch firmware that lies about the status of the physical media. SHA-1 support in filesystems is still a new feature due to the performance cost and the lingering doubt about whether it’s needed, since it ostensibly duplicates the built-in ECC in the drive.
Mirroring (RAID1) is great because if a disk error occurs, you can often fetch a copy from the other disk. However, it does not address the problem of being certain that an error actually occurred and that the data on the other drive is still valid. ZFS gives this assurance. Mirroring could actually be worse than no RAID if you were silently switched to a corrupted disk due to a transient timeout of the other drive. This could happen if its driver had bugs that caused it to time out on read requests under heavy load, even though the data was perfectly fine. This is not a hypothetical example; it actually happened to me with an old ATA drive.
Still, engineers resist adding any kind of redundant checks, especially of computation results. For example, you rarely see code that takes the result of a subtraction, adds back what was subtracted, and compares it against the original input value. Yet in a critical application, like calculating Bill Gates’ bank account statement, perhaps such paranoia would be warranted. In security-critical code, the developer needs even more paranoia since the threat is active manipulation of the computing environment, not accidental memory flips due to cosmic radiation.
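To make that concrete, here is a minimal sketch in C of an inverse check on a subtraction (the account values are invented for illustration):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Debit an account, then re-derive the original balance from the result
 * before committing it. A flipped bit or corrupted register between the
 * subtraction and the store has a good chance of failing the inverse
 * check instead of silently propagating. */
static int64_t debit(int64_t balance, int64_t amount)
{
    int64_t new_balance = balance - amount;

    if (new_balance + amount != balance) {  /* redundant inverse check */
        fprintf(stderr, "arithmetic integrity check failed\n");
        abort();
    }
    return new_balance;
}

int main(void)
{
    printf("%lld\n", (long long)debit(1000, 250));
    return 0;
}
```

Note that an optimizing compiler may prove such a check trivially true and delete it, so a real hardened build would route the intermediate values through volatile storage or separate compilation units to keep the check alive.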
Software protection is one area where redundancy is crucial. A common way to bypass various checks in a protection scheme is to patch them out. For example, the conditional branch that checks for a copied disc to determine if the code should continue to load a game could be inverted or bypassed. By repeating this check in various places, implemented in different ways, it can become more difficult for an attacker to determine if they have removed them all.
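As a rough illustration (the marker value and function names are made up), the same disc check might appear once as an obvious early exit and again later, folded into an unrelated-looking computation:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical marker a genuine disc is expected to return. */
#define GENUINE_MARKER 0x1337c0deu

/* Stub standing in for whatever actually probes the media. */
static uint32_t read_disc_marker(void) { return 0x1337c0deu; }

/* Check #1: the obvious comparison, early in the loader. */
static void check_disc_early(void)
{
    if (read_disc_marker() != GENUINE_MARKER)
        exit(1);
}

/* Check #2: the same test later, written differently. The XOR is zero only
 * for a genuine disc; on a copy it quietly corrupts state used much later,
 * so the failure shows up far from the check that caused it. */
static void check_disc_late(uint32_t *frame_seed)
{
    uint32_t delta = read_disc_marker() ^ GENUINE_MARKER;
    if (delta)
        *frame_seed = 0;
}

int main(void)
{
    uint32_t seed = 42;
    check_disc_early();
    check_disc_late(&seed);
    return seed != 42;
}
```

The patch that flips the first branch does nothing about the second, and the delayed, indirect failure makes it harder to tell whether every copy of the check has been found.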
As another example, many protection schemes aren’t very introspective. If there is a function that decrypts some data, often you can just grab a common set of arguments to that function and repeatedly call it from your own code, mutating the arguments each time. But if that function had a redundant check that walked the stack to make sure its caller was the one it expected, this could be harder to do. Now, if the whole program was sprinkled with various checks that validated the stack in many different ways, it becomes even harder to hack out any one piece to embed in the attacker’s code. To an ordinary developer, such redundant checks seem wasteful and useless.
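Here is a hedged sketch of the caller check using the GCC/Clang `__builtin_return_address` builtin; the function names and the size bound are assumptions for illustration, and a real scheme would layer several different checks rather than rely on one address range:

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_CALLER_SIZE 512   /* rough upper bound on the caller's compiled size (an assumption) */

static void load_level(void); /* the only caller we intend to allow */

/* Refuse to decrypt unless our return address lies inside the expected
 * caller. An attacker who lifts this function into their own harness will
 * return somewhere else entirely and trip the abort(). */
__attribute__((noinline))
static void decrypt_asset(unsigned char *buf, size_t len)
{
    uintptr_t ret    = (uintptr_t)__builtin_return_address(0);
    uintptr_t caller = (uintptr_t)&load_level;

    if (ret < caller || ret > caller + MAX_CALLER_SIZE)
        abort();

    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5a;       /* stand-in for the real decryption */
}

__attribute__((noinline))
static void load_level(void)
{
    unsigned char data[16] = {0};
    decrypt_asset(data, sizeof data);
}

int main(void)
{
    load_level();
    return 0;
}
```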
Besides software protection, crypto is another area that needs this kind of redundant checking. I previously wrote about how you can recover an RSA private key merely by disrupting any part of the signing operation (e.g., by injecting a fault). The solution to this is for the signer to verify the signature they just made before revealing the signature to an attacker. That is, a sign operation now performs both a sign and verify. If you weren’t concerned about this attack, this redundant operation would seem useless. But without it, an attacker can recover a private key after observing only a single faulty signature.
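As a sketch of the sign-then-verify pattern using OpenSSL’s older RSA interface (error handling trimmed, and the caller is assumed to have allocated RSA_size(key) bytes for the signature):

```c
#include <openssl/rsa.h>
#include <openssl/objects.h>
#include <string.h>

/* Sign a SHA-256 digest, then immediately verify the result with the public
 * half of the same key before releasing it. If a fault corrupted the
 * computation, the verify fails and the bad signature is wiped rather than
 * handed to an attacker. */
static int sign_with_check(RSA *key,
                           const unsigned char *digest, unsigned int digest_len,
                           unsigned char *sig, unsigned int *sig_len)
{
    if (RSA_sign(NID_sha256, digest, digest_len, sig, sig_len, key) != 1)
        return 0;

    if (RSA_verify(NID_sha256, digest, digest_len, sig, *sig_len, key) != 1) {
        memset(sig, 0, *sig_len);   /* never let a faulty signature escape */
        *sig_len = 0;
        return 0;
    }
    return 1;
}
```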
We need developers to be more paranoid about the validity of stored or computed values, especially in an environment where there could be an adversary. Both software protection and crypto are areas where a very powerful attacker is always assumed to be present. However, other areas of software development could also benefit from this attitude, increasing security and reliability by “wasting” a couple of cycles.
The more complex an application becomes due to these checks, the more likely the checks themselves will contain a vulnerability.
That can be said for any code (“the more complex it is, the more likely it contains a bug”). The question is, “is the increased value worth the risk?”
Note also that my examples illustrate my main point: it is the less complex code, the version without the redundant checks, that introduces the vulnerability.
As long as I have been involved in IT security, I have been stunned by the lack of data integrity checking. Most security people prefer more firewalls, IDS, redundant hardware, and the like over some good old checking of content.
With the risks moving up layer after layer, who knows, maybe one day we will find the resources to actually make this a priority.
Excellent to see you give it some exposure, thanks.
According to http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Tuning_ZFS_Checksums the checksum options are “fletcher2 | fletcher4 | sha256”, not sha1. As far as I know, nobody uses the cryptographically secure option — sha256 — instead everyone uses the insecure checksums in order to save CPU cycles.
Is there much security benefit to using SHA-256 though? You’re not getting any crypto benefit because it’s a pure hash and not a MAC (unless ZFS protects the checksums in some way that I’m not familiar with), and you’re paying a huge performance penalty just to get a wider checksum value. If all you’re worried about is catching errors, then 32-bit Fletcher (or insert-favourite-quick-check here) seems to be a good tradeoff.
Dave, yes, you’re right. However, ZFS is working to add better cryptographic support. It might be worth a review.
I’ve always thought disk encryption without integrity protection is worse from a reliability standpoint. You want an accidental bitflip in a sector to be reported to an upper layer (like a RAID mirror) so it can fail over to known good data from another source. CBC alone turns a single flipped bit into an entire garbled 16-byte block (plus a flipped bit in the next block) while still not detecting the error. So it makes things worse by itself.
Ross Anderson, possibly in “Programming Satan’s Computer”, commented that anyone who used computers in the 1950s would never dream of not following up a particular mathematical operation with its inverse and a compare with the original values just to make sure that nothing had gone wrong (although in that case it was because there was a much higher probability than today that something had actually gone wrong).
For this sort of thing I really like Bertrand Meyer’s design-by-contract, where each function has preconditions (and optionally postconditions, although in the presence of multiple exit points to a function that can be a bit harder to manage) that both document the programming contract enforced by the function and act as a sanity-check on the parameters passed to it and the state-of-the-world in general. I’ve been using this in my code for ages (although support in Meyer’s Eiffel was rather more elegant than in my preferred C) and I’m amazed at how many things it catches, and conversely how many not-really-valid things you can get away with doing in code without anything obviously breaking. Any performance penalty appears to be pretty much nonexistent (particularly when you compare it to the heavyweight crypto that’s going on elsewhere), and it definitely pays off both in terms of code maintenance and for quickly catching basic problems during code changes.
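For C, even a minimal pair of assert()-based macros captures the idea (this is just a sketch, the real thing ends up rather more elaborate):

```c
#include <assert.h>
#include <string.h>

/* Pre/postcondition macros in the spirit of Eiffel's require/ensure.
 * Defining NDEBUG compiles them away, just like assert(). */
#define REQUIRE(cond) assert((cond) && "precondition violated")
#define ENSURE(cond)  assert((cond) && "postcondition violated")

/* Copy at most dstLen - 1 bytes and always NUL-terminate. */
static size_t safeCopy(char *dst, size_t dstLen, const char *src)
{
    REQUIRE(dst != NULL && src != NULL);
    REQUIRE(dstLen > 0);

    size_t n = strlen(src);
    if (n >= dstLen)
        n = dstLen - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';

    ENSURE(strlen(dst) == n);
    ENSURE(n < dstLen);
    return n;
}
```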
I’ll have to keep reading this blog. This blog has some interesting posts and comments worth reading!
Dave has a great point. As the software stack becomes harder and harder to break, attackers will look for issues in the hardware itself. If the software is checking the hardware’s results, attacks based on a hardware flaw could be detected.
As a general comment on both software and (some) hardware protection, there’s a rather good book that’s just appeared with the somewhat misleading title of “Surreptitious Software” (the subtitle “Obfuscation, Watermarking, and Tamperproofing for Software Protection” is a bit more illuminating) that covers these issues in considerable detail. I don’t generally do endorsements but the authors of this one really know their stuff and cover a huge range of techniques, both attacks and defences.
One thing that I haven’t seen discussed much is robustness principles for software that go beyond the basic “check for integer overflows” and the like. For example one defence mechanism that MS adopted in recent versions of the Windows allocator is to check, before freeing (unlinking) a memory block, that current->prev->next == current and current->next->prev == current, to prevent common heap-overflow attacks. This is fairly easy to add to any linked-list manipulation code, but I don’t know of any list of design patterns that can be applied in the same way as, say, the CERT secure coding guidelines.
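In code the check is only a couple of lines; here’s a minimal sketch of the idea (not Microsoft’s actual implementation) for a generic doubly-linked list:

```c
#include <stdlib.h>

/* Doubly-linked free-list node, loosely modelled on an allocator's block header. */
struct block {
    struct block *prev;
    struct block *next;
};

/* Unlink with a consistency check: both neighbours must still point back at
 * the block being removed, otherwise the list has been corrupted (e.g. by a
 * heap overflow) and we fail closed rather than follow attacker-controlled
 * pointers. */
static void safe_unlink(struct block *b)
{
    if (b->prev->next != b || b->next->prev != b)
        abort();

    b->prev->next = b->next;
    b->next->prev = b->prev;
    b->prev = b->next = NULL;
}
```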
One particularly challenging issue is how to sanity-check the contents of a large structure containing state information. Say you have an SSL implementation and want to protect it against memory-corruption attacks (there was an unpublished attack on an SSH implementation a few years ago, for example, that allowed you to bypass the sign-in authentication by overflowing the password field so that it overwrote the nearby ‘is-authenticated’ field with the characters from the end of the password). Let’s say your SSL implementation holds all your session state in a ‘struct SSL’ containing a mix of mutable and immutable (after the session is established) fields. Moving the fields around to make the mutable and immutable ones adjacent isn’t really an option because you’d end up with a random mix of unrelated variables (in other words you’d end up with an ‘unstruct SSL’). How do you build a means of checking for unexpected modifications without moving fields around? You end up with a problem that’s a bit like the AH header checks in IPsec, but with many fields present in the struct and a far more complex field structure it’s much harder to do.
(This is an open question, I’d be interested in any comments on how to solve it).
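To make the question concrete, the obvious-but-unsatisfying approach is a side table of (offset, length) pairs over the immutable fields, checksummed from a few scattered call sites. The field names below are invented for illustration, and a keyed hash would be stronger than the plain checksum used here:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a session struct with interleaved mutable and immutable fields. */
struct ssl_session {
    uint32_t cipher_suite;        /* immutable once established */
    uint64_t bytes_sent;          /* mutable */
    uint8_t  master_secret[48];   /* immutable */
    int      is_authenticated;    /* immutable */
    uint64_t seq_number;          /* mutable */
};

/* Side table naming the bytes that must not change; the struct layout itself
 * is left alone. */
static const struct { size_t off, len; } immutable_fields[] = {
    { offsetof(struct ssl_session, cipher_suite),     sizeof(uint32_t) },
    { offsetof(struct ssl_session, master_secret),    48 },
    { offsetof(struct ssl_session, is_authenticated), sizeof(int) },
};

/* FNV-1a over just the immutable bytes. */
static uint64_t session_digest(const struct ssl_session *s)
{
    const uint8_t *base = (const uint8_t *)s;
    uint64_t h = 0xcbf29ce484222325ULL;

    for (size_t i = 0; i < sizeof immutable_fields / sizeof *immutable_fields; i++)
        for (size_t j = 0; j < immutable_fields[i].len; j++) {
            h ^= base[immutable_fields[i].off + j];
            h *= 0x100000001b3ULL;
        }
    return h;
}

/* Record the digest once, right after the handshake... */
static uint64_t session_seal(const struct ssl_session *s)
{
    return session_digest(s);
}

/* ...and call this from scattered places to catch unexpected writes. */
static void session_check(const struct ssl_session *s, uint64_t expected)
{
    if (session_digest(s) != expected)
        abort();
}
```

The weakness is that anything that can overwrite a field can usually overwrite the stored digest as well, so this raises the bar without really settling the question.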
Dave, insightful as always. The book “Surreptitious Software” is on my reading list. I’ve always respected Collberg’s academic articles on software protection, given that there aren’t many out there on this topic.
One thing I keep coming back to is the idea that software protection principles will eventually be deployed in every compiler. Just like stack cookies are expected to be enabled in production software today, future compilers will automatically insert compile-time-randomized hash integrity checks for both code and data. This will be great for reliability as well, since memory corruption errors (even self-induced, not due to an attacker) will be more obvious.
I’m interested in seeing research in this area because, as you point out, it’s difficult to do in an automated way since you have to explicitly tell the compiler how you will be treating struct members.
Speaking of protection built into compilers, it’s been interesting tracing the evolution of /GS in Microsoft’s compilers. This has been upgraded for pretty much every release since it was introduced in VS 2002 in response to new attacks (a few days ago I went to a talk by a MS security person who said that if you have no other reason to upgrade to Visual Studio 2010 then get it just so you can rebuild your existing code with the enhanced GS handling). There’s a nice summary of the evolution (although with most of the emphasis on recent changes) on the VS blog at http://blogs.msdn.com/vcblog/archive/2009/03/19/gs.aspx.
As for the checksum-insertion, Microsoft’s Phoenix compiler project (which allows you to extend the compiler with user-written add-ons) looks like one way of doing this. The downside is that it’s still a bit of a fixer-upper. The closest I’ve seen in the OSS world is Treehydra, a gcc plugin that provides Javascript access to the gcc AST, but that’s read-only access to help in writing static analysis tools rather than a user-written compiler extension capability. Now all we need is (handwave) someone else to write the necessary plugins…