Highly predictive blacklists: What, how, and caveats

General blacklisting is not always efficient, so SRI International and the SANS Institute have developed highly predictive blacklists. Find out more in this article.


Internet surfing is tedium-busting entertainment for employees, but a source of never-ending frustration for security professionals.

Add to this employees' penchant for clicking on anything that looks interesting and the black hats' increasing creativity, and you have the right conditions for network infrastructure meltdown.

For years, the most fundamental defense has been the perimeter firewall. Properly configured, it protects against malicious traffic in both directions, ingress and egress.

One way to prevent unwanted access to or intrusion from known problem sites is configuration of firewall packet filters, based on IP address blacklists. However, general blacklisting is not always efficient. To enable organizations to be more proactive, and minimize firewall processor allocation for blacklist filtering, SRI International and the SANS Institute have developed highly predictive blacklists (HPB).
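To make the idea of blacklist-based packet filtering concrete, here is a minimal sketch in Python. It is not how any particular firewall implements filtering; the function names and the blacklist entries (taken from reserved documentation address ranges) are assumptions for illustration only.

```python
import ipaddress

# Hypothetical blacklist entries, drawn from documentation address ranges.
BLACKLIST = [ipaddress.ip_network(n) for n in ("203.0.113.0/24", "198.51.100.7/32")]


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blacklisted network."""
    ip = ipaddress.ip_address(address)
    # Linear scan: every extra blacklist entry adds work for every packet.
    return any(ip in net for net in BLACKLIST)


def filter_packet(src: str, dst: str) -> str:
    """Drop a packet if either endpoint is blacklisted (ingress or egress)."""
    return "DROP" if is_blocked(src) or is_blocked(dst) else "ACCEPT"
```

The linear scan in this sketch shows why list size matters: the longer the blacklist, the more comparisons each packet costs, which is the processor overhead HPBs aim to reduce.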

Blacklists vs. whitelists
Theoretically, the ideal way to prevent employees from reaching problem sites is whitelisting. If a site's address is not in the list of approved destinations, packets are stopped at the firewall.

The same is true for incoming packets. Traffic from external sources that do not appear in a firewall's whitelist is dropped before it can enter the internal network. Whitelisting, however, is not usually practical in the real world.

Maintenance of whitelists is never-ending, with continuous requests for additional site access and countless access control groups representing the unique real or perceived needs of departments, management levels, etc. There is also the problem of ensuring sites on the whitelist don't turn to the dark side, either purposely or via unwanted infection.

Because of the problems surrounding whitelist implementation, most organizations opt for blocking known or suspected malicious sites, or blacklisting.
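The practical difference between the two policies comes down to the default action for an address that appears on neither list. The sketch below illustrates this under simple assumptions; the function names and address ranges are hypothetical.

```python
import ipaddress


def _contains(networks, address):
    """True if the address belongs to any of the given CIDR networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(n) for n in networks)


def whitelist_decision(address, approved):
    """Default deny: only explicitly approved addresses get through."""
    return "ACCEPT" if _contains(approved, address) else "DROP"


def blacklist_decision(address, blocked):
    """Default allow: only explicitly blocked addresses are stopped."""
    return "DROP" if _contains(blocked, address) else "ACCEPT"
```

An address nobody has classified yet is dropped under the whitelist policy and accepted under the blacklist policy, which is why a blacklist is only as good as the information feeding it.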

Traditional blacklist implementation
Blacklists fall into two categories: global worst offender lists (GWOL) and local worst offender lists (LWOL).

GWOLs include all known problematic addresses based on information gathered from hundreds or thousands of locations across the global Web. Sites like the SANS Institute's DShield.org provide lists of IP addresses and ports which present general threats to connected entities. GWOLs invariably block IP addresses that will never present a threat to a specific local network. Although this approach works in the short term, the firewall may eventually be overloaded by an unnecessarily large filter set. Your network might be safe, but at the cost of potentially unacceptable latency.

An LWOL is built at the organization level from traffic seen at the organization's own firewall. While a GWOL is relatively proactive, including information your firewall might not have actually seen yet, an LWOL can only block traffic your firewall has already seen. In other words, using an LWOL is completely reactive. By the time you decide to block an address or port, it might already be too late.
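The contrast between the two list types can be sketched with a few lines of Python. The log format, the contributor names, and the ranking by raw alert count are all assumptions made for illustration; real worst offender lists may rank sources differently.

```python
from collections import Counter

# Hypothetical alert log: one (contributor, source_ip) pair per firewall alert.
LOGS = [
    ("org_a", "203.0.113.5"),
    ("org_a", "203.0.113.5"),
    ("org_a", "198.51.100.9"),
    ("org_b", "192.0.2.77"),
    ("org_b", "192.0.2.77"),
    ("org_b", "192.0.2.77"),
    ("org_c", "192.0.2.77"),
    ("org_c", "203.0.113.5"),
]


def lwol(logs, contributor, size):
    """Local list: top sources seen only at one contributor's firewall."""
    counts = Counter(src for who, src in logs if who == contributor)
    return [src for src, _ in counts.most_common(size)]


def gwol(logs, size):
    """Global list: top sources across every contributor's logs."""
    counts = Counter(src for _, src in logs)
    return [src for src, _ in counts.most_common(size)]
```

In this sample, `gwol(LOGS, 2)` includes 192.0.2.77, a source that org_a has never logged, while `lwol(LOGS, "org_a", 2)` contains only addresses that have already reached org_a's firewall.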

Security engineers can use one or both of these approaches to block unwanted ingress or egress packets. However, neither GWOLs nor LWOLs provide proactive protection tailored to a specific connected network. Using a technique similar to Google's PageRank algorithm, Jian Zhang (SRI), Phillip Porras (SRI), and Johannes Ullrich (SANS Institute) have developed, tested, and documented a different approach: HPBs.

Overview of how HPB works
HPBs use a multiphase approach to produce blacklists potentially unique for each organization or organization type participating in the HPB process (refer to Figure 1).

Figure 1: HPB phases (Zhang, Porras, & Ullrich)

All phases feed, and are fed by, DShield.org. Contributors (anyone with a firewall log to share) upload firewall logs. In the first phase, logs are pre-filtered to remove unreliable alert content, including:
  • Invalid or unassigned address spaces
  • Network addresses of Internet measurement services, web crawlers, or common software update sources
  • Common source ports, such as TCP 53 (DNS) and 25 (SMTP)
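The pre-filtering rules listed above can be sketched as follows. The log entry fields, the network lists, and the sample data are hypothetical stand-ins; the actual HPB pre-filter and its reference lists are not described in detail here.

```python
import ipaddress

# Illustrative stand-ins for "invalid or unassigned" address space.
INVALID_NETWORKS = [
    ipaddress.ip_network(n) for n in ("0.0.0.0/8", "10.0.0.0/8", "240.0.0.0/4")
]
# Hypothetical range standing in for crawlers, measurement and update services.
BENIGN_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
# Source ports named in the list above: TCP 53 (DNS) and TCP 25 (SMTP).
NOISY_SOURCE_PORTS = {("tcp", 53), ("tcp", 25)}


def keep_entry(entry):
    """Return True if a log entry survives all three pre-filter rules."""
    ip = ipaddress.ip_address(entry["src_ip"])
    if any(ip in net for net in INVALID_NETWORKS + BENIGN_NETWORKS):
        return False
    if (entry["proto"], entry["src_port"]) in NOISY_SOURCE_PORTS:
        return False
    return True


def prefilter(entries):
    """Keep only the entries considered reliable enough to rank."""
    return [e for e in entries if keep_entry(e)]


SAMPLE = [
    {"src_ip": "203.0.113.5", "proto": "tcp", "src_port": 40000},    # kept
    {"src_ip": "10.1.2.3", "proto": "tcp", "src_port": 40001},       # invalid space
    {"src_ip": "198.51.100.20", "proto": "tcp", "src_port": 40002},  # benign service
    {"src_ip": "203.0.113.8", "proto": "tcp", "src_port": 53},       # noisy source port
]
```

Running `prefilter(SAMPLE)` leaves only the first entry, since each of the other three trips one of the rules.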
The logs are added to those of other contributors for the second phase, in which attack sources are prioritized for each contributor, based not only on how many instances of a specific type of attack were seen, but also on the types of networks and organizations targeted.
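The following is a deliberately simplified sketch of the relevance idea, not the authors' algorithm. It assumes that contributors who have logged many of the same attackers are "similar", and it weights each report by that similarity in a single step, whereas a PageRank-like method would propagate scores further through the graph of contributors. All names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical pre-filtered reports: contributor -> attack sources it logged.
REPORTS = {
    "bank_a": {"203.0.113.5", "203.0.113.6"},
    "bank_b": {"203.0.113.5", "203.0.113.6", "203.0.113.7"},
    "school": {"198.51.100.9"},
}


def overlap(a, b):
    """Jaccard overlap between two contributors' attacker sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relevance(reports, me):
    """Score each source by how similar its reporters are to contributor `me`."""
    scores = defaultdict(float)
    for other, sources in reports.items():
        weight = 1.0 if other == me else overlap(reports[me], sources)
        for src in sources:
            scores[src] += weight
    # Highest score first; ties broken by address string for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def top_entries(reports, me, size):
    """Truncate the ranking to a filter budget, skipping zero-relevance sources."""
    return [src for src, score in relevance(reports, me)[:size] if score > 0]
```

For bank_a, the ranking places 203.0.113.7 ahead of the source seen only by the school, even though bank_a has never logged either one. That is the proactive element: an attacker already hitting a similar network is treated as likely to show up next.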

Finally, phase three assesses the severity of each attack.

Relevance and severity rankings are based on a contributor's network characteristics (e.g., size and industry). The output of the HPB process is a prioritized blacklist for each contributor, optimized to help prevent unneeded firewall filter set entries while providing proactive defense against attacks seen by similar contributors.

The downside
HPB seems like a good idea, but it has issues. First, there is a privacy consideration. The HPB project is still in testing. Submitting your firewall logs to DShield.org might expose their contents to developers, analysts, third-party consultants, etc. Be sure you and your management team are OK with this before jumping into the pool.

Second, relevancy depends on other organizations like yours participating in the process by submitting logs to DShield.org. As the number of similar organizations increases, the relevance rating becomes increasingly, well, relevant.

Finally, HPBs are not the Holy Grail. Employees will continue to find ways around blacklists configured at the firewall, including online proxy services. Any firewall-based blacklisting solution should be supported with a Web filtering solution which blocks access to "workarounds". WebSense is a good solution for those with a budget. For others, OpenDNS might be sufficient.

The final word
HPBs seem like a great solution. However, we might have some distance to travel before we reach the point at which they provide information significantly more useful than traditional blacklisting approaches. Before depending on this new technology, be sure you understand what the downsides mean to your organization.

Tom Olzak is an IT professional with over 24 years of experience. He holds CISSP and MCSE certifications and an MBA. Currently, he is Director of Information Security for HCR Manor Care.