Previously, we discussed how DMA works in the PC architecture. The northbridge is only aware of physical addresses and directs transactions to the appropriate devices or RAM based solely on that address.
Unlike within the CPU where there is virtual memory translation and protection, the chipset previously did not perform any translation of physical addresses or place restrictions on which addresses can be accessed. From the RAM’s perspective, a memory access that originated from the CPU or from the integrated Ethernet is exactly the same. Stability and security depended on the device driver properly programming the device to DMA only to physical addresses that were within the buffer assigned by the OS. This was fine except for device driver or hardware bugs, since ring 0 code was trusted.
With system-wide virtualization and DRM becoming more common, ring 0 code is no longer trusted. To avoid DMA corruption between guests or the host, the hypervisor previously would create a fake device for each OS instance. The guest talked to the fake device, and the host would multiplex the transactions over the real device. This has a lot of overhead, so it would be preferable to let the guest talk directly to the real device.
An IOMMU provides translation and protection for physical addresses. The hypervisor sets up a page table within the northbridge that groups page table entries by their device IDs. Then, when a DMA request arrives at the northbridge from a device, it is looked up by its ID, translated into the actual destination physical address, and allowed or denied based on the protection settings. If a write is denied, no data is transferred to RAM. If it’s a read, all bits are set to 1 in the response. Either way, an abort error is returned to the device as well.
DMA protection (AMD: DEV, Intel: NoDMA table) is currently available in shipping products and physical address translation (AMD: IOMMU, Intel: VT-d) is coming very soon. While these features were implemented separately, it is expected that they will usually be used together.
There have been a few surprising studies of IOMMU performance. The first paper, by IBM researchers, shows that the overhead in setting up and tearing down mappings consumed up to 60% more CPU than without. They discuss various mapping allocation strategies to address this. However, they all have their disadvantages. One of the strategies, setting up the mappings at guest startup and never changing them, interferes with the hypervisor strategy called “ballooning”, where resources are only allocated to a guest as it uses them. This is what allows VMware to run guests with more RAM available to them than the host actually has. Read the paper for more analysis of their other strategies.
Another paper, by Rice University researchers, proposes virtualization support built into the devices themselves (“CDNA”). They build a NIC that maintains a unique set of registers for each guest. Each guest believes it has direct access to the NIC, although requests to set up DMA go through the hypervisor. The NIC hardware manages the fair scheduling of DMA among all the register contexts, so actual packets going out on the wire will be balanced between the various guests sending them. This approach requires no IOMMU, but each device needs to be capable of maintaining multiple register contexts. Again, read this paper for a different take on device virtualization.
This research shows that an IOMMU is not the only way to achieve DMA protection, and it’s important to carefully design how a hypervisor uses an IOMMU to prevent a loss of performance. Next time, we’ll examine some usage scenarios for IOMMUs, both in virtualization and DRM.