Wednesday, August 17, 2011

A note on malware detection performance

Yesterday, most news outlets published stories about a study done by NSS Labs showing that Internet Explorer 9 provides overwhelmingly better protection against phishing and malware sites than competing browsers, including Firefox. Specifically, the claim is that Internet Explorer detects 99.2% of all malicious internet content, compared to a measly 7.6% for Firefox.

This isn't the first such study conducted. Over 2 years ago, at the release of Internet Explorer 8, a similar study was done with similar but less extreme results. That study was harshly criticized for having a number of flaws. There is no indication that any of that criticism was addressed in the new study.

Although many in the community have yet again pointed out the problematic aspects of the study, and hence rightly put the results under scrutiny, I believe some issues are worthy of a more detailed explanation.

Accuracy of the reported results

The main issue with the study is that it does not consider the tradeoff between false positives, namely, reporting malware when a download is in fact safe, and false negatives, namely, allowing the user to visit a dangerous site or to download malware. Due to the way the test was conducted, false positives were mostly neglected. In other words, the browser that simply gave the most warnings, regardless of accuracy, had a large advantage over its competitors.

The SmartScreen service in Internet Explorer will give the user a warning as soon as he or she tries to download any uncommonly downloaded piece of software. This is done regardless of whether the download is malware or not. It should be obvious that by crying foul in every uncertain situation, you will cry foul in a dangerous situation, too, leading to a very high score in this test.

Unfortunately, giving a large number of false positives severely diminishes the value of the warnings. If you claim "the sky is falling" at every opportunity, users will start to ignore the warnings, and the effectiveness of the protection drops from 99.2% to 0%, which is infinitely worse than what other browsers offer. The situation should be familiar to Microsoft, who faced similar problems with the initial versions of their User Account Control feature in Windows Vista. It wasn't until the number of false positives dropped that the protection really became effective.

The report's treatment of false positives is limited to the following paragraph.
In addition, NSS maintains a collection of ‘clean URLs’ which includes such sites as yahoo, Amazon, Microsoft, Google, NSS Labs, major banks, etc. Periodically clean URLs were run through the system to verify browsers were not over-blocking.
Aside from the lack of real data, it's astonishing how biased this sample is. As if any major browser vendor, or provider of malware protection, would put out an update and fail to notice that it is blocking its users from visiting a major website! Of course, extending the sample even a little, for example to the homepages of random people, would have immediately shown Internet Explorer producing false positives.

The problems of trading false positives for false negatives and evaluating differing tradeoffs are not new - they're a known topic for anti-spam filters, which must deal with similar tradeoffs when being compared against each other. One representation that is often used in anti-spam research papers is to present the results in the form of ROC curves. That certainly would have been more informative than a single percentage.
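To make this concrete, here is a small sketch of how ROC points fall out of sweeping a decision threshold over classifier scores. The scores and labels are made-up illustration data, not measurements from the study.

```python
# Sketch: deriving ROC points for a hypothetical URL classifier.
# The scores and labels below are made-up illustration data, not
# measurements from the NSS Labs study.

def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained
    by sweeping the decision threshold over every observed score."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score; each step lowers the threshold enough
    # to flag one more URL as malicious.
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for score, is_malicious in ranked:
        if is_malicious:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# A blocker that warns about everything reaches 100% detection, but
# only at a 100% false positive rate; a single percentage hides that.
print(roc_points([0.9, 0.8, 0.7, 0.6, 0.4, 0.2], [1, 1, 0, 1, 0, 0]))
```

The full curve shows, for every threshold, what detection rate a product achieves at what false-positive cost, which is exactly the comparison a single percentage obscures.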

It's quite possible that Internet Explorer does in fact offer the best malware and phishing protection of the popular browsers. Microsoft has worked hard to improve its security reputation, has clearly focused on this area, and I'm sure they have smart engineers capable of writing good software. However, studies like these do that work no justice, don't help users make informed choices, and don't help us identify the real risks users are exposed to. This is a missed opportunity.

User privacy in collaborative filtering

The SmartScreen Filter that Internet Explorer uses is a collaborative filter. By its nature, such a filter depends on gathering information from its users. The exact information gathered is described in Microsoft's privacy policy. I've reproduced some relevant paragraphs below.
If you opt in to SmartScreen Filter, it first checks the address of the webpage you are visiting against a list of high-traffic webpage addresses stored on your computer that are believed by Microsoft to be legitimate. Addresses that are not on the local list and the addresses of files you are downloading will be sent to Microsoft and checked against a frequently updated list of webpages and downloads that have been reported to Microsoft as unsafe or suspicious.

When you use SmartScreen Filter to check websites automatically or manually, the address of the website you are visiting will be sent to Microsoft, together with standard computer information and the SmartScreen Filter version number. To help protect your privacy, the information sent to Microsoft is encrypted. Information that may be associated with the address, such as search terms or data you entered in forms might be included.

Some information about files that you download from the web, such as name and file path, may also be sent to Microsoft. Some website addresses that are sent to Microsoft may be stored along with additional information, including web browser version, operating system version, SmartScreen Filter version, the browser language, the referring webpage, and information about whether Compatibility View was enabled for the website.
While the amount and nature of the information sent to Microsoft may indeed be necessary to achieve the level of protection SmartScreen is claimed to give, it obviously comes at a severe cost to user privacy.

I don't believe most people at Mozilla think it's reasonable to collect the information above, nor would I expect most of our user-base to feel comfortable with it. For this reason, it's unlikely we would put up an identical service, let alone enable it by default.
The malware protection included in Firefox uses a cookie to provide quality-of-service information about updates to the providers of the malware lists (currently Google). Although we felt that was a reasonable tradeoff, some users nevertheless object to it, and some effort is underway to remove even this.

That being said, our browser is extensible and has a wide variety of third-party add-ons, including some that extend it to give similar functionality, if the user feels the privacy versus security tradeoff is acceptable. A popular one seems to be Web-Of-Trust. Feel free to check out our security & privacy add-ons site to see additional protection available for Firefox.

Performance differences between Firefox, Chrome and Safari

One interesting thing that did come out of this study is that Chrome offers slightly better protection than Firefox or Safari. This is interesting, because all three browsers use exactly the same malware and phishing protection: the Google SafeBrowsing API. Firefox has some minor tweaks compared to Chrome to improve user privacy, but those should not have worsened the results, so it was expected that they would score similarly.

I spent some time looking at this, and what happened is that Google has been enhancing their SafeBrowsing system to detect malware downloads, tested this protection in Chrome, and recently included it in the stable releases. Because the public documentation for the API hasn't been updated, Firefox and Safari have so far not implemented this extension.

We are currently discussing with Google how to rectify this, so Firefox will probably soon include the improved protection as well.

Concluding remarks

Being exposed to malicious content doesn't mean being infected by it. We do what we can to keep the browser secure and updated, and to inform users when their plugins are out of date and potentially exposed. We have made improvements that allow a user to more easily spot when he or she is being phished. Detecting malware is just one layer of protection; making it ineffective when it does get through is another.

Regardless of what we think of the Internet Explorer results, we're still left with a claimed 7.6% detection rate for Firefox out of the box. This means that our current default detection is largely ineffective: users have far better odds of being exposed to malicious content than of having it blocked by us. Even if the study's numbers are inaccurate, this order-of-magnitude result probably does hold. I doubt we currently detect and block more than 50% of the malware out there.

That is not a result to be proud of, and we should improve it if possible.

Friday, July 29, 2011

Bringing SafeBrowsing to mobile devices

Firefox Mobile is in most respects the exact same browser as you get with Firefox on the desktop. Nevertheless, to make it usable on mobile devices, some small compromises have been made. One of these is that SafeBrowsing is currently not enabled in Mobile builds. We are now pretty close to fixing this, but there are some interesting issues remaining.

What is SafeBrowsing?
SafeBrowsing is a Google-designed protocol that allows Firefox and Chrome to alert the user whenever he or she visits a webpage that is either phishing for private information or contains malware. It works by incrementally downloading a compressed and encrypted list of "bad" URLs and checking every URL the user wants to visit against the locally stored list. If we find a potential match, we query the SafeBrowsing server directly to verify whether it's a true match, and to see if it is still valid. If it is, we pop up a warning message and block the page from loading.
Mozilla has added some additional features in Firefox to ensure our users' privacy when this feature is enabled. For example, when the Google server is queried to verify a URL, Firefox will send several random requests along with the true one, masking the exact website being visited.
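The two-step lookup described above can be sketched roughly as follows. The canonicalization here is deliberately simplified (the real SafeBrowsing canonicalization rules are far more involved), and the server confirmation is stubbed out.

```python
import hashlib

# Sketch of the two-step lookup: a check against a local set of
# 32-bit prefixes, then a server confirmation for any candidate hit.
# The canonicalization is deliberately simplified and the server
# call is a stub, for illustration only.

def url_prefix(url):
    """32-bit truncated SHA-256 of a (simplified) canonicalized URL."""
    canonical = url.lower().rstrip("/")
    return hashlib.sha256(canonical.encode()).digest()[:4]

def is_blocked(url, local_prefixes, confirm_with_server):
    """Return True if the URL should be blocked."""
    prefix = url_prefix(url)
    if prefix not in local_prefixes:
        return False  # fast path: the vast majority of lookups end here
    # Potential match: ask the server whether it is a true, still-valid hit.
    return confirm_with_server(prefix)

# Usage with a stub server that confirms every candidate:
local = {url_prefix("http://evil.example.com")}
print(is_blocked("http://evil.example.com", local, lambda p: True))  # True
print(is_blocked("http://mozilla.org", local, lambda p: True))       # False
```

Note how the local list acts purely as a filter: the server is only contacted on a candidate hit, which is what makes the privacy-masking random requests above necessary in the first place.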

Why is/was it not enabled in Firefox Mobile?

SafeBrowsing was initially prototyped in Google Toolbar for Firefox, and eventually contributed by Google engineers to Firefox itself just before version 3.0. The implementation in Firefox, while functional, isn't terribly efficient. There are several reasons for that, including the deadline to get it shipped in Firefox 3.0 and the need to make protocol changes to handle the entire Firefox user-base without putting undue stress on the Google-provided servers.
One of the shortcuts taken was to use SQLite to store the URL database. While SQLite is a fine database, it's not a terribly efficient way to store a highly compressed dataset. Furthermore, because it needs to be probed on each and every URL loaded, and the loads are essentially random, it requires caching to deliver good performance. When processing an update, the behavior can get really bad.
The net result is that the current SafeBrowsing implementation can use up to 50Mb of disk space and, in some circumstances, the same amount of memory. Gobbling up that amount of resources is not acceptable on a mobile device, so we had to disable it.
Because downloading and constructing the list happens in the background, the memory usage of Firefox can go up even when you are not visiting any sites. This causes many desktop users to mistake it for a memory leak, and it has hence been flagged as a priority in the MemShrink efforts, too.

How is this being solved?

The URL list is distributed as a set of SHA-256 hashes truncated to 32 bits (hereafter called prefixes), generated by passing the real URLs through a canonicalization algorithm and hashing the result. Each group of prefixes has an associated chunk-id, which is used during updates, and a list-id, which tells us whether it is malware or a phishing attempt. The real problem is how to store these 32-bit values efficiently and still be able to quickly search through them. The current SafeBrowsing database contains about 600K entries. Each entry contains both a 32-bit hash of the hostname and the prefix hash. We can drop the hostname hashes, as they are only a small performance optimization (*). The raw storage required, if we further assume the chunk-id and the list-id can be merged into a single 32-bit number, is about 5Mb. Simply keeping the list sorted by prefix allows O(log(n)) look-up and O(n) update processing, which is fast enough as long as we keep the list in memory.
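A minimal sketch of that sorted flat-array storage, with 32-bit prefixes held as integers and looked up via binary search. The class name and example values are mine, for illustration only.

```python
import bisect

# Minimal sketch of sorted flat-array prefix storage: 32-bit prefixes
# kept sorted, looked up with a binary search. Names and example
# values are illustrative, not from the Firefox implementation.

class PrefixSet:
    def __init__(self, prefixes):
        self.prefixes = sorted(prefixes)  # sort once, O(log n) lookups after

    def contains(self, prefix):
        i = bisect.bisect_left(self.prefixes, prefix)
        return i < len(self.prefixes) and self.prefixes[i] == prefix

    def raw_size_bytes(self):
        # 4 bytes per 32-bit prefix; with a merged 32-bit chunk/list id
        # alongside each one this doubles, giving the ~5Mb estimate
        # for 600K entries.
        return 4 * len(self.prefixes)

ps = PrefixSet([0xdeadbeef, 0x00000001, 0x12345678])
print(ps.contains(0x12345678))  # True
print(ps.contains(0x0badf00d))  # False
```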

While 5Mb is an order of magnitude improvement over the 50Mb we have now, it is interesting to try to improve it further. Mobile devices have a limited amount of storage, and particularly for devices that have no option to insert extra memory cards, space comes at a premium. Firefox consumes about 25Mb of flash storage (which already generates a lot of feedback that it is "way bloated"), and increasing that by 20% just for phishing protection seems excessive.

Current improvements

There are 2 interesting data-structures that allow further compression of this prefix set while still allowing efficient lookup: Bloom filters and Prefix Tries. Note that the prefixes themselves, being truncated SHA-256 hashes, are essentially random numbers and will hence resist traditional compression techniques.

Bloom filters work by passing each prefix through a series of hashes, and putting a mark in a bit array for each hash. Because the marks might collide, it is a probabilistic algorithm with potential false positives. On the upside, you can trade the storage requirements of a Bloom filter against the probability of false positives. We considered using a Bloom filter for our SafeBrowsing database, but the mere possibility of false positives requires secondary storage to correctly handle updates and to prevent leaking privacy information when it's not strictly necessary. Google found this an acceptable compromise for desktop usage and chose this solution in their re-implementation of SafeBrowsing in Chrome, backing up a 1.9Mb Bloom filter in memory with a larger raw store on disk.
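A minimal Bloom filter sketch of that idea. The bit-array size and hash count here are arbitrary illustration values, not the parameters Chrome actually uses.

```python
import hashlib

# Minimal Bloom filter sketch. size_bits and num_hashes are arbitrary
# illustration values, not Chrome's actual parameters.

class BloomFilter:
    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive the k hash positions from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add(b"\xde\xad\xbe\xef")
print(bf.might_contain(bf and b"\xde\xad\xbe\xef"))  # True
```

The one-sided error is visible in `might_contain`: a hit only tells you the item is probably present, which is exactly why a secondary store is needed for updates.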

Prefix Tries work by observing that, when sorted, many prefixes will have the leading part in common. By storing only the differences between those prefixes, good compression can be achieved. Given that the list of prefixes is sorted, and we occasionally code a full value to allow restarting the decompression, fast lookup is possible by doing a binary search in the uncompressed values and then decompressing the deltas. On the SafeBrowsing prefixes, about 50% compression is achievable while still maintaining fast lookup. Prefix Tries do not suffer from false positives, and do not need backup storage.
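A rough sketch of the scheme as described: sorted prefixes stored as deltas from their predecessor, with a full value restated at regular restart points so a lookup can binary-search the full values and then decode only a short run of deltas. The restart interval and sample values are illustrative choices of mine.

```python
import bisect

# Sketch of delta-coded sorted prefixes with restart points. RESTART
# and the sample prefixes are illustrative values, not the tuning
# used in Firefox.

RESTART = 4

def compress(sorted_prefixes):
    full, deltas = [], []
    for i, p in enumerate(sorted_prefixes):
        if i % RESTART == 0:
            full.append(p)               # uncompressed restart point
            deltas.append([])
        else:
            deltas[-1].append(p - prev)  # small numbers store cheaply
        prev = p
    return full, deltas

def contains(full, deltas, target):
    i = bisect.bisect_right(full, target) - 1
    if i < 0:
        return False
    value = full[i]
    if value == target:
        return True
    for d in deltas[i]:                  # decode at most RESTART - 1 deltas
        value += d
        if value == target:
            return True
    return False

full, deltas = compress([10, 17, 300, 301, 5000, 5002, 9000, 9100, 12345])
print(contains(full, deltas, 5002))  # True
print(contains(full, deltas, 5003))  # False
```

The restart interval is the knob: a larger interval means better compression but more deltas to decode per lookup.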

A preliminary implementation of Prefix Tries that allows us to reduce the memory requirements for desktop Firefox from a worst case of 50Mb to about 2.1Mb is now ready in bug 669410 and is expected to be in the Nightly builds soon. We will probably reduce this to 1.2Mb after verifying the performance impact of dropping the (*) optimization mentioned above. Work is ongoing to add (compressed) storage of chunk-id to this, removing SQLite entirely and reducing the disk space requirement to about 2.4Mb. This should be good enough to enable SafeBrowsing on mobile.

Future optimization

While this is a nice improvement, we do want "smaller! faster!" whenever possible, so it's interesting to study if we can go beyond this. Because the prefixes are random values, I personally don't believe much further gain can be had there (though I'd be thrilled to be proven wrong!). Perhaps if there were some way to obtain the original URLs, yes, but I understood from Google that they cannot do this as they license parts of the URL list from a third party.

The chunk information is an interesting candidate. The chunk numbers come from a more or less consecutive range, many chunk numbers reappear multiple times, and there are currently only about 16K distinct chunks in use. If we could sort the data by chunk-id and use delta compression on them, they could probably be stored in less than a bit per entry. Unfortunately, sorting the data by chunk-id would break the prefix compression, so we would gain 2 bytes/entry and lose 2 bytes/entry at the same time. Sorting by chunk also makes lookups slower, so this isn't useful.

Now I have a crazy idea: what if we didn't store the chunk-id information at all? According to my reading of the specification, chunk-id information is only really needed in 2 specific cases:
1. When the client connects, it has to tell the server which chunk ranges it currently has. There is no need to actually store the chunk-id for every prefix to know this. We can simply track how many prefixes are in every chunk, and increment or decrement this number on every chunk add or delete.
2. The protocol allows the server to send an adddel or subdel command, which requires the client to expire an entire chunk. Note that normal add or sub commands don't require us to know the chunk-id, as they include the prefixes on which they act as well.
So what do we do if we get an adddel or subdel?

We drop the entire database and start downloading again.

I know you must be thinking I am crazy and wasteful at this point, but observing how Google's server actually behaves, it might very well work perfectly. The key realization is that expiring entire chunks is not a normal occurrence. What is far more typical is that individual entries in a chunk are expired piece by piece in due time, until the chunk is empty. The only way I seem to be able to convince the server to expire entire chunks is to purposely claim I have ancient chunks in my database. In normal usage, meaning a regular connection to the internet, those expires don't seem to occur. And if we do have a database with ancient chunks, dropping and re-downloading it entirely isn't going to be that wasteful anyhow.
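Under this scheme, the only per-chunk state the client keeps is a count of live prefixes, which is enough to answer point 1 above. A minimal sketch; the class and method names are mine, not from any implementation.

```python
from collections import Counter

# Sketch of chunk bookkeeping without per-prefix chunk-ids: keep only
# a per-chunk count of live prefixes, enough to report held chunks to
# the server on connect. Names are illustrative.

class ChunkLedger:
    def __init__(self):
        self.counts = Counter()  # chunk-id -> number of live prefixes

    def add(self, chunk_id, num_prefixes=1):
        self.counts[chunk_id] += num_prefixes

    def sub(self, chunk_id, num_prefixes=1):
        self.counts[chunk_id] -= num_prefixes
        if self.counts[chunk_id] <= 0:
            del self.counts[chunk_id]  # chunk fully expired, piece by piece

    def held_chunks(self):
        """Chunk ids to report to the server on connect."""
        return sorted(self.counts)

ledger = ChunkLedger()
ledger.add(41, 3)
ledger.add(42, 1)
ledger.sub(42)               # the last entry of chunk 42 expires
print(ledger.held_chunks())  # [41]
```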

I believe this behavior would be perfectly fine to do and still comply with the current protocol, but obviously something like this shouldn't be implemented until the Google guys can confirm they don't think we're insane and that the server does indeed behave as described (or can be patched to do so).

One optimization that can be applied server-side is to estimate whether it would be cheaper to send an adddel/subdel, and hence drop the database, or to send the sub chunks for all entries it wants to remove. This ensures that the absolute worst-case behavior is in no case more than a doubling of bandwidth, with the typical case being no change in required bandwidth at all. Clients who need this optimization can be detected by the client-id and client-ver that the protocol already requires.

If we're willing to define a new protocol, the chunk-id numbers can go entirely, and be replaced by adding a sequential number to every update. In that case, the client only has to tell the server the first and last update it has ever seen, and neither server nor client needs to transmit or store chunk-ids at all.