Building a USB protocol analyzer

The recent effort by bushing’s team to develop an open-source USB protocol analyzer reminded me of a quick hack I did previously. I was debugging a tricky USB problem but only had an oscilloscope.

If you’ve been following this blog, you know one of my hobby projects has been designing a USB interface for old Commodore floppy drives. The goal is to archive old data, including the copy-protection bits, before the media fails. Back in January 2009, I was debugging the first prototype board. Most of the commands succeeded but one would fail immediately every time I sent it.

I tried a software USB analyzer, but it didn’t show any more information. The command was returning almost immediately with no data. Debugging output on the device’s UART didn’t show anything abnormal, except it was never receiving the problem command. So the problem had to be between the host and target’s USB stacks, and possibly was in the AVR’s hardware USB state machine. Only a bus analyzer could reveal what was going on.

Like other hobby developers, I couldn’t justify the cost of a dedicated USB analyzer just to troubleshoot this one problem, especially in a design I would be releasing for free. Since I did have an oscilloscope at work, I decided to build a USB decoding stack on top of it.

USB, like Ethernet and TCP/IP, is a combination of protocols. The lowest layer is the physical cabling and bit signalling. On top of this is packet framing and device addressing. Next, each device has a set of endpoints. These are analogous to TCP/UDP ports and support control, bulk, or interrupt message types. The standard control endpoint (address 0) handles a set of common configuration messages. Other endpoints are device-specific.

High-speed signalling (480 Mbit/s) is a bit different from full/low-speed, so I won’t describe it here. Suffice it to say, you can just put a USB 1.1 hub between your device and host to force it to downgrade speeds. Unless you’re trying to debug a problem with high-speed signalling itself, this is sufficient to debug protocol-level issues.

The USB physical layer uses differential current flow to signal bits. This balances the charge, decreasing the latency for line transitions and increasing noise rejection. I hooked up probes to the D+ and D- lines and saw a trace like this:

Each zero bit in USB is signalled by a transition, low to high or high to low. A one bit is signalled by no transition for the clock period. (This is called NRZI encoding.) Obviously, there’s a chance for sender and receiver clocks to drift out of sync if there are too many one bits in a row, so a zero bit is stuffed into the frame after every six consecutive one bits. The receiver discards this stuffed bit. An end-of-packet is signalled by a single-ended zero (SE0), which is both lines held low. You can see this at the beginning of the trace above.
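The NRZI and bit-stuffing rules above can be sketched in a few lines of Python. This is an illustration only, not the script I used at the time. The function names, and the assumption that the input is already one line level (0 or 1) per bit period, are mine.

```python
def nrzi_decode(levels, initial=1):
    """Turn per-bit-period line levels into data bits.

    A change of level means a zero bit; no change means a one bit.
    `initial` is the assumed idle level before the first bit.
    """
    prev = initial
    for level in levels:
        yield 1 if level == prev else 0
        prev = level


def unstuff(bits):
    """Drop the zero bit the sender inserts after six consecutive ones."""
    ones = 0
    expect_stuffed = False
    for bit in bits:
        if expect_stuffed:
            if bit != 0:
                raise ValueError("bit-stuffing violation: expected a zero")
            expect_stuffed = False
            ones = 0
            continue  # discard the stuffed bit
        yield bit
        if bit == 1:
            ones += 1
            if ones == 6:
                expect_stuffed = True
        else:
            ones = 0
```

Chaining the two, as in unstuff(nrzi_decode(levels)), gives the original data bits with the stuffed zeros removed.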

To start each packet, USB sends a 0x80 byte, least-significant bit first. This is 7 transitions followed by a one bit, allowing the receiver to synchronize its clock on it. You can see this in the trace above, just after the end-of-packet from the previous frame. After the sync bits, the rest of the frame is byte-oriented.
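To make the bit order concrete, here is a small Python sketch that checks for the sync pattern and packs the following bits into bytes, least-significant bit first. The helper names are hypothetical, and it assumes the bits have already been NRZI-decoded and unstuffed.

```python
# 0x80 sent least-significant bit first: seven zeros, then a one.
SYNC_BITS = [0, 0, 0, 0, 0, 0, 0, 1]


def bits_to_bytes(bits):
    """Pack a list of bits into bytes, LSB first.

    Any trailing bits that do not fill a whole byte are dropped.
    """
    out = bytearray()
    whole = len(bits) - len(bits) % 8
    for start in range(0, whole, 8):
        value = 0
        for pos, bit in enumerate(bits[start:start + 8]):
            value |= bit << pos
        out.append(value)
    return bytes(out)


def payload_after_sync(bits):
    """Return the bytes following the sync pattern, or None if it is absent."""
    if bits[:8] != SYNC_BITS:
        return None
    return bits_to_bytes(bits[8:])
```

For example, the bit sequence 1,0,1,1,0,1,0,0 after a sync packs to the single byte 0x2D.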

The host initiates every transaction. In a control transfer, it sends the command packet, generates an optional data phase (in/out from device), and ends with a status phase. If the transaction fails, the device returns an error byte (a STALL handshake) instead.

My decoding script implemented all the layers in the quickest way possible. After taking a scope trace, I’d dump the samples to a file. The script would then run through them, looking for the first edge. If this edge was part of a sync byte, it would begin byte-aligned decoding of a frame to pass up to higher-level functions. At the end of the packet, it would go back to scanning for the next edge. Using Python’s generators made this quite easy since it was just a series of nested loops instead of a complicated state machine.
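The generator style looks roughly like the following. This is a reconstruction of the idea rather than the original script. The 1.5 V threshold, the samples_per_bit parameter, and the function names are all assumptions for illustration, and it works from a single probe's samples.

```python
def edges(samples, threshold=1.5):
    """Yield the index of every sample where the trace crosses the threshold."""
    prev = samples[0] > threshold
    for i, volts in enumerate(samples):
        level = volts > threshold
        if level != prev:
            yield i
            prev = level


def bits_from_edges(edge_indices, samples_per_bit):
    """Convert edge positions to NRZI-decoded bits.

    Every edge is a zero bit. Each additional whole bit period that
    passed since the previous edge without a transition is a one bit.
    Ones after the final edge cannot be seen and are not emitted.
    """
    prev = None
    for idx in edge_indices:
        if prev is not None:
            periods = round((idx - prev) / samples_per_bit)
            for _ in range(periods - 1):
                yield 1
        yield 0
        prev = idx
```

Because each stage is a generator, the layers chain together as bits_from_edges(edges(samples), samples_per_bit), and higher layers can pull bits one at a time without any explicit state machine.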

Since this was a quick hack, I cut corners. To detect the SE0 end-of-packet, you really need to monitor both D+ and D-. At higher speeds, the peaks get lower since less current is exchanged. However, at lower speeds, you can ignore this and just put a scope probe on the D- line. Instead of proper decoding of the SE0, I’d just decode each frame until no more data was expected and then yield a fake EOP symbol to the upper layers.

After a few days of debugging, I found the problem. The LUFA USB stack I was using in my firmware had a bug. It had a filter for standard control messages (such as endpoint configuration) that it handled for you. Class-specific transactions were passed up to a handler in my firmware. The bug was that the filter was too permissive: all control transfers of type 6, even if they were class-specific, were captured by LUFA. This ended up returning an error without ever passing the message to my firmware. (By the way, the LUFA stack is excellent, and this bug has long since been fixed.)
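To illustrate this class of bug, here is a Python sketch of the two ways to match a control request. It is not LUFA's code (LUFA is written in C), and it assumes that "type 6" refers to the request code in the second byte of the 8-byte setup packet, where 6 is GET_DESCRIPTOR among the standard requests. The function names are hypothetical. Bits 5 and 6 of the first byte say whether a request is standard, class, or vendor.

```python
import struct

REQ_TYPE_MASK = 0x60      # bits 6..5 of bmRequestType
REQ_TYPE_STANDARD = 0x00
REQ_TYPE_CLASS = 0x20
GET_DESCRIPTOR = 0x06     # standard request code 6


def parse_setup(data):
    """Unpack the 8-byte setup packet (all fields little-endian)."""
    fields = struct.unpack("<BBHHH", data)
    names = ("bmRequestType", "bRequest", "wValue", "wIndex", "wLength")
    return dict(zip(names, fields))


def permissive_match(setup, request):
    """Too loose: looks only at the request code, so class requests match too."""
    return setup["bRequest"] == request


def is_standard_request(setup, request):
    """Also checks that the type bits mark the request as standard."""
    is_standard = (setup["bmRequestType"] & REQ_TYPE_MASK) == REQ_TYPE_STANDARD
    return is_standard and setup["bRequest"] == request
```

With a filter like permissive_match, a class-specific request that happens to use code 6 is swallowed by the standard handler, which is the kind of behaviour I was seeing.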

Back in the present, I’m glad to see the OpenVizsla project creating a cheaper USB analyzer. It should be a great product. Based on my experience, I have some questions about their approach I hope are helpful.

It seems kind of strange that they are going for high-speed support. Since the higher-level protocol messages you might want to reverse-engineer are the same regardless of speed, it would be cheaper to just handle low/full speed and use a hub to force devices to downgrade. I guess they might be dealing with proprietary devices, such as the Kinect, that refuse to operate at lower speeds. But if that isn’t the case, their namesake, the Beagle 12, is a great product for only $400.

I have used the Total Phase Beagle USB analyzers, and they’re really nice. As with most products these days, the software makes the difference. They support Windows, Mac, and Linux and have a useful API. They can output data in CSV or binary formats. They will be supporting USB 3.0 (5 Gbps) soon.

I am glad OpenVizsla will be driving down the price for USB analyzers and providing an option for hobbyists. At the same time, I have some concern that it will drive away business from a company that provides open APIs and well-supported software. Hopefully, Total Phase’s move upstream to USB 3.0 will keep them competitive for people doing commercial development and the OpenVizsla will fill an underserved niche.

10 thoughts on “Building a USB protocol analyzer”

  1. The only one I knew about is Wireshark. On Linux you can see the USB frames, but that is too high level.

    Thanks for the long explanation, it was very interesting.

  2. On high speed: Some devices refuse to operate at lower speeds. Cameras in particular; the host driver can say, “Oh, you’re not running at full speed, I’m not even going to light you up,” and you’ll have no chance at all to examine the traffic, because there won’t be any.

    1. That’s interesting. I haven’t run into anything like that yet. The $800 savings in buying the low/full speed Beagle analyzer has come with no loss of access to devices. Some run slower than is useful, but still work. If I ran into one of those devices that required high speed, I guess I’d upgrade.

      P.S. I enjoy your blog

    1. Cleverscope. I liked the software and the sample depth (8 MS @ 100 MHz) was good. If I were to get one now, I’d consider the Picoscope due to higher sample rates.

      Gage is good for PCI boards, but they’re even more pricey than Picotech.

      And, of course, standalone scopes are getting deeper capture buffers as well. However, I like the portability of a USB scope so I’ll probably stick with that for now.

  3. Hi Nate! Someone pointed this article to me at CCC; if I had seen it earlier, I would have responded earlier. Thanks for the thoughtful questions.

    It seems kind of strange that they are going for high-speed support. Since the higher-level protocol messages you might want to reverse-engineer are the same regardless of speed, it would be cheaper to just handle low/full speed and use a hub to force devices to downgrade. I guess they might be dealing with proprietary devices, such as the Kinect, that refuse to operate at lower speeds. But if that isn’t the case, their namesake, the Beagle 12, is a great product for only $400.

    This is a fair point, and one that actually hadn’t occurred to us. Forcing the downgrade seems like an ugly hack to me, especially knowing that the hardware required for high-speed is still much cheaper than $400, given “normal” retail markup vs. somewhat-obscene test-equipment markup. (As you note, the Kinect probably would not have worked with that hack, nor would such a trick help with debugging throughput issues on high-speed links in experimental hardware that a would-be hardware hacker might be trying to create.)

    I would go into a rant right about now about how the test equipment industry is some huge ripoff — $5000 JTAG probes vs $50 OpenOCD modules, $20000 logic analyzers vs $200 open-source/hackable logic analyzers — but I did have the opportunity to talk to a friend-of-a-friend at CCC who has designed and sold some test equipment in the past (at my “ripoff” prices). He did help give me some proper perspective — mainly that when a commercial, professional widget manufacturer is trying to work out some hardware bugs, they need some assurance that they will be able to get proper support from their equipment vendors to resolve any issues in the test equipment, lest they spend crucial engineering-time debugging phantom glitches that are caused by cheap test equipment.

    We’re at the completely other end of the spectrum — at the end of the day, we’re going to be delivering hardware with very little support beyond a small warranty against manufacturing defects. We have had many offers from friends to help write the software/firmware/etc, and I’m confident that we will be able to bring together enough volunteer resources to make a working product that does what we expect it to do. This is not a safe bet for commercial test equipment vendors to make, though maybe it will encourage them to sell cheaper unsupported hardware and to not harass people that post PCB photos of their hardware online.

    Also, for the record — I was not aware of any of the nice features (cross-platform support, API, useful data export) that the Total Phase software offers. Good for them — if they had put a little bit more effort into advertising all of that and brought their hardware prices down a bit, I may never have felt the need to dive into yet another yak-shaving project. :) As you say, there is probably enough room for different tiers of product, and hopefully the success of projects like ours will inspire others (either amateurs, or incumbent vendors) to serve the hobbyist/tinkerer in other areas.

    1. Thanks for the thoughtful comment. A forced downgrade via USB 1.1 hub does work fine, unless either your host insists on high-speed transfers or you’re trying to debug a problem in the J/K chirp or whatever.

      I agree test equipment has been quite expensive and doesn’t seem to be following Moore’s law. If you look at what $3k gets you in oscilloscopes over the past 10 years, it’s more of a slightly increasing line than an exponential curve (maybe 100 MHz -> 250 MHz over 10 years).

      Recently, however, it seems like there are more hobbyist-grade test equipment companies. The various PC scope firms, Total Phase, etc. are all driving down prices on equipment with decent support. It’s still a commercial product and not cheap, but better than Tek or LeCroy.

      Your project is a nice extension of this growth. While there have been various simple open source logic analyzers (e.g., SPI), I haven’t seen anything nearly as complex as USB 2.0. It’s great to see you moving things forward here. I was just concerned about the potential loss of revenue to Total Phase, but that is ultimately unavoidable. They will have to figure out how to compete anyway, if not with OpenVizsla, then with China.
