Trapping access to debug registers

October 15, 2007 / October 14, 2007 ~ Nate Lawson

If you’re designing or attacking a software protection scheme, the debug registers are a great resource. Their use is mostly described in the Intel SDM Volume 3B, chapter 18. They can only be accessed by ring 0 software, but their breakpoints can be triggered by execution of unprivileged code.

The debug registers provide hardware support for setting up to four different breakpoints. They have been around since the 386, as this fascinating history describes. Each breakpoint can be set to occur on an execute, write, read/write, or IO read/write (i.e., in/out instructions). Each monitored address can be a range of 1, 2, 4, or 8 bytes.

DR0-3 store the addresses to be monitored. DR6 provides status bits that describe which event occurred. DR7 configures the type of event to monitor for each address. DR4-5 are aliases for DR6-7 if the CR4.DE bit is clear. Otherwise, accessing these registers yields an undefined opcode exception (#UD). This behavior might be useful for obfuscation.
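As a rough illustration of how DR7 is laid out, here is a minimal Python sketch that computes the value ring 0 code would load into DR7 to arm one breakpoint slot. It only models the bit arithmetic described in the Intel SDM; the names `dr7_enable`, `RW_*`, and `LEN_ENCODING` are made up for this example and are not part of any real API.

```python
# R/W field encodings in DR7 (two bits per breakpoint slot).
RW_EXECUTE = 0b00    # break on instruction execution
RW_WRITE = 0b01      # break on data writes
RW_IO = 0b10         # break on IO reads/writes (needs CR4.DE = 1)
RW_READWRITE = 0b11  # break on data reads or writes

# LEN field encodings, keyed by the size in bytes of the monitored range.
LEN_ENCODING = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}

DR7_GD = 1 << 13     # general detect: trap accesses to the debug registers


def dr7_enable(dr7, index, rw, length, local=True):
    """Return a DR7 value with breakpoint slot `index` (0-3) configured."""
    if not 0 <= index <= 3:
        raise ValueError("only DR0-DR3 hold breakpoint addresses")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("execute breakpoints must use a 1-byte length")
    # Each slot has a 4-bit field starting at bit 16: R/W low, LEN high.
    shift = 16 + 4 * index
    dr7 &= ~(0xF << shift)
    dr7 |= (rw | (LEN_ENCODING[length] << 2)) << shift
    # L<n> is bit 2n and G<n> is bit 2n + 1.
    dr7 |= 1 << (2 * index + (0 if local else 1))
    return dr7 & 0xFFFFFFFF


# Slot 0: break on 4-byte writes, locally enabled.
print(hex(dr7_enable(0, 0, RW_WRITE, 4)))
```

The monitored address itself goes in the matching DR0-3 register. The sketch ignores reserved bits and the LE/GE flags.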

When a condition is met for one of the four breakpoints, INT1 is triggered. This is the same exception as for a single-step trap (EFLAGS.TF = 1). INT3 is for software breakpoints and is useful when setting more than four breakpoints. However, software breakpoints require modifying the code to insert an int3 instruction and can’t monitor reads/writes to memory.
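To make the trade-off concrete, here is a small Python sketch of what a debugger does to plant a software breakpoint. It works on an in-memory copy of some code bytes; the helper names and the sample bytes are hypothetical and only serve as illustration.

```python
INT3 = 0xCC  # one-byte encoding of the int3 instruction


def set_soft_breakpoint(code, offset):
    """Overwrite one byte with int3 and return the byte that was there."""
    original = code[offset]
    code[offset] = INT3
    return original


def clear_soft_breakpoint(code, offset, original):
    """Put the saved byte back so the original instruction runs again."""
    code[offset] = original


# push ebp; mov ebp, esp; pop ebp; ret
code = bytearray(b"\x55\x89\xe5\x5d\xc3")
saved = set_soft_breakpoint(code, 1)
print(code.hex())   # the code bytes now differ from the original
clear_soft_breakpoint(code, 1, saved)
print(code.hex())
```

Because the code bytes change while the breakpoint is set, a protection that checksums its own code can notice it. A hardware breakpoint leaves the code untouched.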

One very useful feature of the debug registers is DR7.GD (bit 13). Setting this bit causes reads or writes to any of the debug registers to generate an INT1. This was originally intended to support ICE (In-Circuit Emulation) since some x86 processors implemented test mode by executing normal instructions. This mode was the same as SMM (System Management Mode), the feature that makes your laptop power management work. SMM has been around since the 386SL and is the original x86 hypervisor.

To analyze a protection scheme that accesses the debug registers, hook INT1 and set DR7.GD. When your handler is called, check DR6.BD (also bit 13). If it is set, the instruction at the faulting EIP was about to read or write to a debug register. You’re probably somewhere near the protection code. Since this is a faulting exception, the MOV DRx instruction has not executed yet and can be skipped by updating the EIP on the stack before executing IRET.
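The decision logic of such a handler can be sketched in Python. This only models the checks. A real handler runs in ring 0 and reads DR6 and the saved EIP itself. The function names are invented for this example, and the length helper assumes 32-bit code with no instruction prefixes, where MOV to or from a debug register is encoded as 0F 21 /r or 0F 23 /r.

```python
DR6_BD = 1 << 13  # debug register access detected (needs DR7.GD set)
DR6_BS = 1 << 14  # single step (EFLAGS.TF)
DR6_BT = 1 << 15  # task switch


def classify_debug_exception(dr6):
    """List the causes that DR6 reports for an INT1."""
    causes = [f"breakpoint {n}" for n in range(4) if dr6 & (1 << n)]
    if dr6 & DR6_BD:
        causes.append("debug register access")
    if dr6 & DR6_BS:
        causes.append("single step")
    if dr6 & DR6_BT:
        causes.append("task switch")
    return causes


def mov_dr_length(code):
    """Return 3 if `code` starts with an unprefixed MOV to/from DRx, else None."""
    if len(code) >= 3 and code[0] == 0x0F and code[1] in (0x21, 0x23):
        return 3
    return None


dr6 = DR6_BD                          # value a handler might read from DR6
faulting = bytes([0x0F, 0x21, 0xC0])  # mov eax, dr0
if dr6 & DR6_BD:
    skip = mov_dr_length(faulting)
    print(classify_debug_exception(dr6), "advance EIP by", skip)
```

Adding the returned length to the saved EIP before IRET skips the access. A handler that wants to fake the result would also need to fill in the destination register.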

If you’re designing software protection, there are some interesting ways to use this feature to prevent attackers from having easy access to the debug registers. I’ll have to leave that for another day.

Next Baysec: Oct 16 at O’Neills

October 7, 2007 ~ Nate Lawson ~ 2 Comments

The next Baysec meeting is at O’Neills again. Come out and meet fellow security people from all over the Bay Area. As always, this is not a sponsored meeting, there is no agenda or speakers, and no RSVP is needed.

See you on Tuesday, October 16th, 7-11 pm.

O’Neills Irish Pub
747 3rd St (at King), San Francisco

C64 screen memory and anti-debugging

October 5, 2007 / October 4, 2007 ~ Nate Lawson ~ 6 Comments

I think it’s fun to stir your creativity periodically by analyzing old software protection schemes. I prefer the C64 because emulators are widely available, disks are cheap and easy to import, and it’s the system I became most familiar with as a kid.

One interesting anti-debugging trick was to load the protection code into screen memory. Just like on the PC, the data on your screen is just a series of values stored in memory accessible to the main CPU. On the C64, screen memory typically was located at 0x400 – 0x7FF. Data could be loaded into this region by setting the block addresses in the file’s directory entry (a very simple version of a shared library load address) or by explicitly storing it at that address using the serial load routines.

To keep users from seeing garbage, the foreground and background colors were set to be the same. If you tried to break into a debugger, the prompt would usually overwrite the protection code. This could be worked around by relocating actual screen memory (by reprogramming the VIC-II chip) or by manually loading the code at a different address and disassembling it.
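For the relocation workaround, the arithmetic can be sketched in Python. On the VIC-II, the upper four bits of the memory control register at $D018 select the screen location in 1 KB steps within the current 16 KB video bank. The function names here are made up for illustration, and the sketch assumes the default video bank starting at address 0.

```python
def screen_address(d018, bank_base=0x0000):
    """Return the screen memory address selected by a $D018 value."""
    return bank_base + ((d018 >> 4) & 0x0F) * 0x400


def d018_for_screen(d018, screen_addr, bank_base=0x0000):
    """Return a new $D018 value that moves the screen to `screen_addr`."""
    offset = screen_addr - bank_base
    if offset % 0x400 or not 0 <= offset < 0x4000:
        raise ValueError("screen must sit on a 1 KB boundary inside the bank")
    # Keep the low nibble (character set selection) unchanged.
    return (d018 & 0x0F) | ((offset // 0x400) << 4)


default = 0x15                            # usual power-on value of $D018
print(hex(screen_address(default)))       # screen at 0x400
moved = d018_for_screen(default, 0x0C00)  # move the screen out of the way
print(hex(moved), hex(screen_address(moved)))
```

Writing the new value to $D018 before entering the debugger lets its prompt land in the relocated screen instead of on top of the protection code. The debugger's own screen pointer would need adjusting too, which this sketch does not cover.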

This is an example of anti-debugging based on utilization of shared resources. The logic is that a debugger needs to use the screen to run, so if the protection is using that resource also, the attacker will disrupt the system by activating the debugger. It is usually much more effective to use up a shared resource than to just check for signs that a debugger is present, an approach that is still important today.

DRM is passive and active

October 4, 2007 / October 3, 2007 ~ Nate Lawson ~ 4 Comments

In a post regarding DRM (based on another post), Alun Jones of Microsoft says:

“Passive DRM protects its content from onlookers who do not have a DRM-enabled client. Encryption is generally used for Passive DRM, so that the content is meaningless garbage unless you have the right bits in your client. I consider this ‘passive’ protection, because the data is inaccessible by default, and only becomes accessible if you have the right kind of client, with the right key.

Active DRM, then, would be a scheme where protection is only provided if the client in use is one that is correctly coded to block access where it has not been specifically granted. This is a scheme in which the data is readily accessible to most normal viewers / players, but has a special code that tells a DRM-enabled viewer/player to hide the content from people who haven’t been approved.”

The whole problem is that his two categories are a false distinction. You can’t arbitrarily draw a line through a system and say “this is passive, this is active.” For his CSS example, if you consider a given player’s decryption code along with an arbitrary encrypted DVD, you have a system with both active and passive elements. If you leave out either of those elements, you have a disc that won’t play or a player with no disc, the only perfectly secure system (assuming your cryptography is good).

When judging the efficiency of new compression schemes, the size of the decoder is added to the size of the compressed data to get a fair assessment of its efficiency. Otherwise you could win contests with a one-byte file and a 10 GB decoder program that simply contains all the actual data.

Whichever way you design a system, complexity is being pushed from one party to another but never eliminated. For DVD, where most of the complexity is in the player, there is a huge variety of player implementations that each have their own bugs. The author of every disc needs to test against many combinations of players because of that problem.

Likewise, if you push the complexity onto the disc by including executable code there, the player gets simpler but the disc could be buggy. However, in that case, the content author will get a bad reputation for the buggy disc (see the Sony rootkit fiasco he mentions).

This doesn’t just apply to DRM. While he might consider an MPEG-4 AVC video file as “passive” in his terminology, it is really a complex series of instructions to the decoder. Look at the number of different but valid ways to encode video and you’ll see it’s closer to a program than to “passive” data.

Now in his definition for “Active DRM”, he is not actually describing the general class of software protection techniques. He is describing a system that is poorly designed, often due to an attempt to retrofit DRM onto an existing system without it. Of course it makes sense that if you have two ways to access the content, one with DRM and the other without, the additional complexity makes no sense to the end-user or mass copiers. It may make economic sense to the content author, but they have to weigh the potential risks to their business also (annoying users vs. stopping some casual copying).

Even assuming his terminology makes sense, the Windows Media Center system he references is actually a combination of “active” and “passive”. The cable video stream is encrypted (“passive”), and the Windows DRM component is “active”. In particular, it has a “black box” DLL that checks the host environment and hashes various items to derive a key, hence the problem.

All I can distill from what Alun says is “an unprotected system is made more complex by adding DRM.” I agree, but this doesn’t say anything larger about “active” versus “passive” DRM.

Full disclosure: I was previously one of the designers of the Blu-ray protection layer (BD+), a unique approach to disc protection that involves both cryptography and software protection. You can consider me biased, but my analysis should be able to stand on its own.

IOMMU – virtualization or DRM?

October 3, 2007 ~ Nate Lawson ~ 5 Comments

Before deciding how to enable DMA protection, it’s important to figure out what current and future threats you’re trying to prevent. Since there are performance trade-offs with various approaches to adding an IOMMU, it’s important to figure out if you need one, and if so, how it will be used.

Current threats using DMA have centered around the easiest to use interface, Firewire (IEEE 1394). Besides being a peripheral interconnect method, Firewire provides a message type that allows a device to directly DMA into a host’s memory. Some of the first talks on this include “0wned by an iPod” and “Hit by a Bus”. I especially like the latter method, where the status registers of an iPod are spoofed to convince the Windows host to disable Firewire’s built-in address restrictions.

Yes, Firewire already has DMA protection built in (see the OHCI spec). There is a set of registers that the host-side 1394 device driver can program to specify what addresses are allowed. This allows legitimate data transfer to a buffer allocated by the OS while preventing devices from overwriting anything else. Matasano previously wrote about how those registers can be accessed from the host side to disable protection.

There’s another threat that is quite scary once it appears but is probably still a long way off. Researchers, including myself, have long talked about rootkits persisting by storing themselves in a flash-updateable device and then taking over the OS on each boot by patching it via DMA. This threat has not emerged yet for a number of reasons. It’s by nature a targeted attack since you need to write a different rootkit for each model of device you want to backdoor. Patching the OS reliably becomes an issue if the user reinstalls it, so it would be a lot of work to maintain an OS-specific table of offsets. Mostly, there are just so many easier ways to backdoor systems that it’s not necessary to go this route. So no one even pretends this is the reason for adding an IOMMU.

If you remember what happened with virtualization, I think there’s some interesting insight into what is driving the deployment of these features. Hardware VM support (Intel VT, AMD SVM) was being developed around the same time as trusted-computing chipsets (Intel SMX, AMD skinit). Likewise, DMA blocking (Intel NoDMA, AMD DEV) appeared before IOMMUs, which are only starting to ship in late 2007.

My theory about all this is that virtualization is something everyone wants. Servers, desktops, and even laptops can now fully virtualize the OS. Add an IOMMU and each OS can run native drivers on bare hardware. When new virtualization features appear, software developers rush to support them.

DRM is a bit more of a mess. Features like Intel SMX/AMD skinit go unused. Where can I download one of these signed code segments all the manuals mention? I predict you won’t see DMA protection being used to implement a protected path for DRM for a while, yet direct device access (i.e., faster virtualized IO) is already shipping in Xen.

The fundamental problem is one of misaligned interests. The people who have an interest in DRM (content owners) do not make hardware or software. Thus new capabilities that are useful for both virtualization and DRM, for example, will always first support virtualization. We haven’t yet seen any mainstream DRM application support TPMs, and those have been out for four years. So when is the sky going to fall?