I’m starting to get some queries about the challenge Tom, Peter, and I issued to Joanna. In summary, we’ll be giving a talk at Blackhat showing how hypervisor-based rootkits are not invisible and the detector always has the fundamental advantage. Joanna’s work is very nice, but her claim that hypervisor rootkits are “100% undetectable” is simply not true. We want to prove that with code, not just words.
Joanna recently responded. In summary, she agrees to the challenge with the following caveats:
- We can’t intentionally crash or halt the machine while scanning
- We can’t consume more than 90% of the CPU for more than a second
- We need to supply five new laptops, not two
- We both provide source code to escrow before the challenge and it is released afterwards
- We pay her $416,000
The first two requirements are easy to agree to. Of course, the rootkit also shouldn’t do either of those or it is trivially detectable by the user.
Five laptops? Sure, ok. The concern is that even a random guess could be right with 50% probability. She is right that we can make guessing a bad strategy by adding more laptops. But we can also do the same by just repeating the test several times. Each time we repeat the challenge, the probability that we’re just getting lucky goes down significantly. After five runs, the chance that we guessed correctly via random guesses is only 3%, the baseline she established for acceptability. But if she wants five laptops instead, that’s fine too.
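To make the 3% figure concrete, here is a minimal sketch of the arithmetic. It assumes each round is an independent, fair two-way guess (the laptops are fully reverted between runs); the function name `chance_all_correct` is just an illustrative label, not part of any detector code.

```python
from fractions import Fraction

def chance_all_correct(rounds, choices=2):
    """Probability of guessing right in every round by pure luck.

    Assumes each round is independent and every choice is equally likely.
    """
    return Fraction(1, choices) ** rounds

# Show how quickly lucky guessing becomes implausible as rounds are added.
for n in range(1, 6):
    p = chance_all_correct(n)
    print(f"{n} round(s): {p} = {float(p):.5f}")
```

With five rounds this gives 1/32, or 0.03125, which is where "about 3%" comes from.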
I don’t have a problem open-sourcing the code afterwards. However, I don’t see why it’s necessary either. We can detect her software without knowing exactly how it’s implemented. That’s the point.
The final requirement is not surprising. She claims she has put four person-months of work into the current Blue Pill and it would require twelve more person-months for her to be confident she could win the challenge. Additionally, she has all the experience of developing Blue Pill for the entire previous year.
We’ve put about one person-month into our detector software and have not been paid a cent to work on it. However, we’re confident even this minimal detector can succeed, hence the challenge. Our Blackhat talk will describe the fundamental principles that give the detector the advantage.
If Joanna’s time estimate is correct, it’s about 16 times harder to build a hypervisor rootkit than to detect it. I’d say that supports our findings.
[Edit: corrected the cost calculation from $384,000]
In case you’re wondering how to get $416,000 from her numbers:
2 people * 8 hours/day * 5 days/wk * 26 wks * $200/hr = $416,000
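For anyone who wants to check the numbers, here is a small sketch of the same arithmetic, plus the effort ratio mentioned above. The constants come straight from the formula; the function names are made up for illustration. Passing 24 weeks instead of 26 happens to reproduce the earlier $384,000 figure, though that is only a guess about where it came from.

```python
# Figures taken from the formula above.
PEOPLE = 2
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WEEKS = 26              # two people for half a year = twelve person-months
RATE_USD_PER_HOUR = 200

def challenge_cost(weeks=WEEKS):
    """Total cost in dollars for the stated team, schedule, and hourly rate."""
    return PEOPLE * HOURS_PER_DAY * DAYS_PER_WEEK * weeks * RATE_USD_PER_HOUR

def effort_ratio(rootkit_months=4 + 12, detector_months=1):
    """Person-months for the rootkit (spent plus requested) per detector person-month."""
    return rootkit_months / detector_months

print(challenge_cost())     # cost using 26 weeks
print(challenge_cost(24))   # cost using 24 weeks (assumed source of the old figure)
print(effort_ratio())       # rootkit effort relative to detector effort
```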
Huzzah, Nate! May I call your attention to our HotOS submission, if you’re looking for inspiration? I actually have sample code for the TLB pressure detection mechanism we outline, as well.
Thanks,
Keith
Keith: thanks. Yep, I’m aware of your paper and a little miffed that since HotOS comes before Blackhat, you got there first (although arguably Peter’s TLB paper was even earlier.) :) We have some techniques that are still novel though.
As a side note, I see you’re with VMware. My inspiration came from working with the guts of the BIOS as FreeBSD’s ACPI maintainer. Perhaps the more horror you’ve seen in the depths of the x86 beast, the less likely you are to believe perfect emulation of it is possible.
Also: your comment earlier this year was right on. I hope this contest will at least provide some experimental evidence, versus dueling papers.
I think the purpose of open sourcing the code is to give something back to the community, as an example of both a rootkit, and a detection tool.
The first 4 changes seem reasonable. The money is completely ridiculous, however. Good luck!
“After five runs, the chance that we guessed correctly via random guesses is only 3%”
A little coin metaphor … either I don’t get randomness and probability, or you don’t:
There are two laptops: A and B.
She puts the rootkit on one of the laptops ==> she marks one of the laptops as being the “head” or the “tail” of the coin.
You say “it’s in laptop A” or “it’s in laptop B” ==> this is equivalent to “calling heads” or “calling tails”
The final part is throwing the coin: it flips and lands. You look at it and see if your guess matched randomness.
After repeating the process enough times you’ll get a 50/50 success rate.
How did you obtain your magical 3%?
(Nitpicking: I imply here that the algorithm doesn’t gain knowledge after being run once).
You must use conditional probability. The probability of guessing right 5 times by pure chance is equivalent to the probability that a sequence of 5 coin flips is exactly equal to another sequence of 5 coin flips, in the same order and all, not just the % of tails and heads. If the sequence is just one coin flip, the probability is 50%. If the sequence is 2 coin flips long, the probability is 0.25, and so on. For five it is 1/(2^5) = 0.03125, or about 3%.
Yeah, you’re right, I wasn’t thinking of “being right 5 times in a row” but, “what’s the probability of a random guess being correct if you repeat the experiment 5 times”.
One shouldn’t do probabilities, read Slashdot, and comment on blogs whilst being a bit drunkish.
atomico: exactly, repeating the experiment 5 times gives a conditional probability that we merely guessed right of ~3%. In other words, there’s a 97% chance we could not guess right 5 times in a row just by randomly guessing.
Of course, this assumes Joanna can revert the laptops back to a “clean” state after each test run. We already agreed to bring more than two laptops in case this isn’t easy for her to do.
I think the money part could be done differently. If she’s right, you pay her (at a fair rate — $200/hr seems high for a programmer in Russia, especially considering how much she’d get out of the contest). If you’re right, she gets nothing, and maybe pays you.
Peter, what organization(s) with $400K want to buy a rootkit? … that will be open-sourced after they buy it? … that if we’re right, is easily detectable?
Oh-oh-oh… Some specimens of womenfolk drive me mad. “Never tell never”…
Dear Joanna,
I understand that you need some money and reputation. Everybody needs that.
But remember that there exists some critical mass of crazy ideas, after which your image will be destroyed.
My suggestion to you: do something real instead of populism.
Best regards,
not your fan
dk: It’s more interesting to focus on the claims and research, not ad-hominem attacks.
Nate Lawson: Actually you are absolutely right. It was my “weak moment”. =)
But life has taught me that there is no absolute truth. And “blue pill” is a top weapon only until the moment development and testing begin. =)))
Joanna, I’m with you! My experience tells me that it’s rather easy to trick most AV progs))) Not 100%, of course)) But enough to do the job ;-) And then.. well, let them detect)))) I tested my tool with the most popular AV progs some days ago)) The results are as follows:
Symantec – Sleeps…
Kaspersky – Sleeps…
Panda – Sleeps…
Avast – Sleeps as well…
BitDefender – Sleeps…
And only NOD32 is a little bit nervous)))
One more thing I want to say is that the lion’s share of success falls to so-called social engineering)) The primary target to trick is the USER, not the AV prog!
2Zanon Zealous:
She is an expert in social engineering. =) This PR action about “blue pill” is one of her SE tools =)))
2 dk:
Right you are, my dear friend!)))) IMHO, this very SE tool is very popular among our superstars)))
Peter, for the record, she is Polish, not Russian. I don’t support the idea of paying any money for the development of BluePill for the contest, however.
atomico and Nate: Conditional probability while flipping a coin when the sequence DOES NOT matter? Were you sleeping in class on the subject or what? It doesn’t matter if you guess right 3 times in a row and then 2 times wrong, or the other way around. There is NO connection between those events, so in this case there is a big difference if you use 5 PCs at once or 2 PCs 5 times in a row. Such probabilities are not comparable.
I’m curious about the outcome of this contest though.
KubuS: I believe your comment should be addressed to atomico only. I didn’t say the particular order of guesses (right/wrong) mattered.
Your comment regarding 5 PCs versus 2 PCs 5 times doesn’t make sense. Remember, the goal was to protect against random guessing, assuming in the latter case that both PCs were fully reverted back to their original state (to keep each round fully independent). If you have 2 PCs and one of them has BP and one does not, a single guess is binary (left or right has BP), 50% probability. With 5 PCs and each of them has BP or not, each PC is a binary guess (BP or not), 50% probability. Probability-wise, these are EXACTLY the same.
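A quick way to check the equivalence is to enumerate every possible hidden configuration for both setups and count how many a fixed guess matches. This is only a sketch under the assumptions stated above: rounds are independent because the laptops are reverted, and each placement of BP is equally likely. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def repeated_two_laptop_rounds(rounds):
    """Two laptops, tested `rounds` times; each round hides BP on laptop 0 or 1."""
    outcomes = list(product((0, 1), repeat=rounds))
    guess = (0,) * rounds  # any fixed guess works; by symmetry all do equally well
    hits = sum(1 for outcome in outcomes if outcome == guess)
    return Fraction(hits, len(outcomes))

def independent_laptops(laptops):
    """`laptops` machines tested once; each one independently has BP or not."""
    outcomes = list(product((False, True), repeat=laptops))
    guess = (True,) * laptops  # again, any fixed guess gives the same odds
    hits = sum(1 for outcome in outcomes if outcome == guess)
    return Fraction(hits, len(outcomes))

print(repeated_two_laptop_rounds(5))  # chance of five lucky calls in a row
print(independent_laptops(5))         # chance of five lucky calls at once
```

Both setups have 32 equally likely configurations and a blind guesser matches exactly one of them, so the odds of an all-correct result by luck are 1 in 32 either way.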
The contest hasn’t occurred yet. Based on the code we’d written and the New Blue Pill code Joanna released, our checks would have detected BP. She would then have a chance to review our code, change NBP to detect our particular detector, and this would go on indefinitely (same as AV ecosystem).
Joanna has moved on from talking about whether hiding/detecting is possible to “detectors are just hacks and will be too complex to use in the real world”. The changing terms of the debate are frustrating, but a natural outcome of the fact that there is no actual virtualized malware in the world.