Wednesday, August 17, 2011

A note on malware detection performance

Yesterday, most news outlets published stories about a study done by NSS Labs showing that Internet Explorer 9 provides overwhelmingly better protection against phishing and malware sites than competing browsers, including Firefox. Specifically, the claim is that Internet Explorer detects 99.2% of all malicious internet content, compared to a measly 7.6% for Firefox.

This isn't the first such study. More than two years ago, at the release of Internet Explorer 8, a similar study was done with similar but less extreme results. That study was harshly criticized for a number of flaws, and there is no indication that any of the criticism was addressed in the new study.

Although many in the community have once again pointed out the problematic aspects of the study, and rightly put the results under scrutiny, I believe some issues are worthy of a more detailed explanation.

Accuracy of the reported results

The main issue with the study is that it does not consider the tradeoff between false positives (reporting malware when a download is in fact safe) and false negatives (allowing the user to visit a dangerous site or to download malware). Because of the way the test was conducted, false positives were mostly neglected. In other words, the browser that simply gave the most warnings, regardless of accuracy, had a large advantage over its competitors.

The SmartScreen service in Internet Explorer will give the user a warning as soon as he or she tries to download any uncommonly downloaded piece of software. This is done regardless of whether the download is malware or not. It should be obvious that by crying foul in every uncertain situation, you will cry foul in a dangerous situation, too, leading to a very high score in this test.
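The effect of "crying foul" on anything uncertain can be shown with a toy calculation. The sketch below compares two hypothetical policies: one that warns on every uncommon download, and one that warns only on downloads already known to be bad. All sample counts, field names and policy names are invented for illustration and are not data from the NSS Labs study.

```python
# Toy comparison of two hypothetical flagging policies.
# All counts below are invented for illustration; they are NOT data from the NSS Labs study.

def make_samples():
    """Build hypothetical download samples with a few boolean features."""
    samples = []
    # 100 malicious downloads: 98 are uncommon, 60 are already on a blocklist.
    for i in range(100):
        samples.append({"malicious": True, "uncommon": i < 98, "known_bad": i < 60})
    # 1000 benign downloads: 200 are uncommon (e.g. small projects), none blocklisted.
    for i in range(1000):
        samples.append({"malicious": False, "uncommon": i < 200, "known_bad": False})
    return samples


def evaluate(flag, samples):
    """Return (detection_rate, false_positive_rate) for a flagging policy."""
    malicious = [s for s in samples if s["malicious"]]
    benign = [s for s in samples if not s["malicious"]]
    detection_rate = sum(1 for s in malicious if flag(s)) / len(malicious)
    false_positive_rate = sum(1 for s in benign if flag(s)) / len(benign)
    return detection_rate, false_positive_rate


def flag_uncommon(sample):
    """Warn on anything rarely downloaded, regardless of its content."""
    return sample["uncommon"]


def flag_blocklisted(sample):
    """Warn only on downloads already known to be malicious."""
    return sample["known_bad"]


samples = make_samples()
for name, policy in [("flag_uncommon", flag_uncommon), ("flag_blocklisted", flag_blocklisted)]:
    detection, false_positives = evaluate(policy, samples)
    print(f"{name}: detection {detection:.0%}, false positives {false_positives:.0%}")
# prints:
# flag_uncommon: detection 98%, false positives 20%
# flag_blocklisted: detection 60%, false positives 0%
```

A test set made up only of malicious URLs sees just the first number, so it ranks the "warn on anything uncommon" policy far ahead, even though that policy warns on one in five harmless downloads in this made-up sample.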

Unfortunately, giving a large number of false positives severely diminishes the value of the warnings. If you claim "the sky is falling" at every opportunity, users will start to ignore the warnings, and the effectiveness of the protection drops from 99.2% to 0%, which is infinitely worse than what other browsers offer. The situation should be familiar to Microsoft, who faced similar problems with the initial versions of their User Account Control feature in Windows Vista. It wasn't until the number of false positives dropped that the protection really became effective.

The report's treatment of false positives is limited to the following paragraph:
In addition, NSS maintains a collection of ‘clean URLs’ which includes such sites as yahoo, Amazon, Microsoft, Google, NSS Labs, major banks, etc. Periodically clean URLs were run through the system to verify browsers were not over-blocking.
Aside from the lack of real data, it's astonishing how biased this sample is. As if any major browser vendor, or provider of malware protection, would put out an update and fail to notice that it was blocking its users from visiting a major website! Extending the sample even a little, for example to the homepages of private individuals, would have immediately shown Internet Explorer producing false positives.

The problem of trading false positives for false negatives, and of comparing filters that make different tradeoffs, is not new; it is well known from anti-spam filters, which face the same issue when they are compared against each other. Anti-spam research papers often present their results as ROC (receiver operating characteristic) curves, which plot the detection rate against the false positive rate as the filter's decision threshold varies. That would certainly have been more informative than a single percentage.
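To make the ROC idea concrete, here is a minimal sketch that computes ROC points and the area under the curve from a filter's scores. The scores and labels are hypothetical, and the function names are illustrative rather than taken from any particular library. A single "detection rate" corresponds to the true positive rate at one threshold on this curve, with the matching false positive rate left out.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    scores: higher means "more likely malicious".
    labels: True for malicious items, False for benign ones.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        tp = sum(flagged)  # malicious items flagged at this threshold
        fp = len(flagged) - tp  # benign items flagged at this threshold
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under an ROC curve; 1.0 is perfect, 0.5 is random guessing."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical filter scores for six URLs (not real data).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [True, True, False, True, False, False]
curve = roc_points(scores, labels)
for fpr, tpr in curve:
    print(f"FPR {fpr:.2f}  TPR {tpr:.2f}")
print(f"AUC {area_under_curve(curve):.3f}")  # prints "AUC 0.889"
```

Two filters can then be compared at the same false positive rate, or by their overall area under the curve, instead of by whichever point on the curve each vendor happens to operate at.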

It's quite possible that Internet Explorer does in fact offer the best malware and phishing protection of the popular browsers. Microsoft has worked hard to improve its security reputation, has clearly focused on this area, and I'm sure they have smart engineers capable of writing good software. However, studies like these do their work no justice, don't help users make informed choices, and don't help us identify the real risks users are exposed to. This is a missed opportunity.

User privacy in collaborative filtering

The SmartScreen Filter that Internet Explorer uses is a collaborative filter. By its nature, such a filter depends on gathering information from its users. The exact information gathered is described in Microsoft's privacy policy. I've reproduced some relevant paragraphs below.
If you opt in to SmartScreen Filter, it first checks the address of the webpage you are visiting against a list of high-traffic webpage addresses stored on your computer that are believed by Microsoft to be legitimate. Addresses that are not on the local list and the addresses of files you are downloading will be sent to Microsoft and checked against a frequently updated list of webpages and downloads that have been reported to Microsoft as unsafe or suspicious.

When you use SmartScreen Filter to check websites automatically or manually, the address of the website you are visiting will be sent to Microsoft, together with standard computer information and the SmartScreen Filter version number. To help protect your privacy, the information sent to Microsoft is encrypted. Information that may be associated with the address, such as search terms or data you entered in forms might be included.

Some information about files that you download from the web, such as name and file path, may also be sent to Microsoft. Some website addresses that are sent to Microsoft may be stored along with additional information, including web browser version, operating system version, SmartScreen Filter version, the browser language, the referring webpage, and information about whether Compatibility View was enabled for the website.
While the amount and nature of the information sent to Microsoft may indeed be necessary to achieve the level of protection SmartScreen is claimed to give, it obviously comes at a severe cost to user privacy.
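The lookup flow described in the quoted policy can be sketched as follows, mainly to show why search terms end up leaving the machine: anything not on the local list is sent as a full address, query string included. This is a hypothetical illustration, not Microsoft's implementation. The list contents, the function name, the report fields and the decision to match by host name are all assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the local list of high-traffic addresses believed to be
# legitimate. Matching by host name is an assumption; the policy only says "addresses".
LOCAL_ALLOWLIST = {"www.microsoft.com", "www.amazon.com"}


def report_for(url, browser_info, download_name=None):
    """Return what a SmartScreen-like filter would send upstream, or None.

    Follows the flow described in the quoted policy: addresses on the local list are
    not sent, while other addresses and all downloads are.
    """
    host = urlsplit(url).hostname
    if download_name is None and host in LOCAL_ALLOWLIST:
        return None  # resolved locally; nothing leaves the machine
    report = {"url": url, **browser_info}  # the full URL, including any query string
    if download_name is not None:
        report["file_name"] = download_name  # downloads are reported even from listed sites
    return report


info = {"browser_version": "9.0", "os_version": "6.1", "language": "en-US"}  # hypothetical values
print(report_for("https://www.amazon.com/", info))
print(report_for("https://search.example.org/?q=private+medical+question", info))
print(report_for("https://www.microsoft.com/setup.exe", info, download_name="setup.exe"))
```

Even in this simplified form, every visit to a site outside the local list reveals the full address, together with the browser and system details attached to it.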

I don't believe most people at Mozilla think it's reasonable to collect the information above, nor would I expect most of our user base to feel comfortable with it. For this reason, it's unlikely we would put up an identical service, let alone enable it by default.
The malware protection included in Firefox uses a cookie to provide Quality-of-Service information about updates to the providers of malware lists (currently Google). Although we felt that was a reasonable tradeoff, some users nevertheless object to it, and work is underway to remove even this.

That being said, our browser is extensible and has a wide variety of third-party add-ons, including some that extend it to give similar functionality, if the user feels the privacy versus security tradeoff is acceptable. A popular one seems to be Web-Of-Trust. Feel free to check out our security & privacy add-ons site to see additional protection available for Firefox.

Performance differences between Firefox, Chrome and Safari

One notable result of this study is that Chrome offers slightly better protection than Firefox or Safari. This is surprising, because all three browsers use exactly the same malware and phishing protection: the Google SafeBrowsing API. Firefox has some minor tweaks compared to Chrome to improve user privacy, but those should not have worsened the results, so we expected the three browsers to score similarly.

I spent some time looking into this. What happened is that Google has been enhancing its SafeBrowsing system to detect malware downloads, tested this protection in Chrome, and recently included it in stable releases. Because the public documentation for the API hasn't been updated, Firefox and Safari have not yet implemented this extension.

We are currently in discussions with Google about rectifying this, so Firefox will probably soon include the improved protection as well.

Concluding remarks

Being exposed to malicious content doesn't mean being infected by it. We do what we can to keep the browser secure and up to date, and to inform users when their plugins are out of date and potentially vulnerable. We have also made improvements that make it easier for users to spot when they are being phished. Detecting malware is just one protection. Making it ineffective is another.

Regardless of what we think of the Internet Explorer results, we're still left with a claimed 7.6% detection rate for Firefox out of the box. This means that our current default detection is largely ineffective: users are far more likely to be exposed to malicious content than to have it blocked by us. Even if the study's numbers are inaccurate, this order-of-magnitude result probably holds. I doubt we currently detect and block more than 50% of the malware out there.

That is not a result to be proud of, and we should improve it if possible.