Device and Traffic Filtering Techniques: What You Should Know

April 7, 2017, Darren Spruell

In the vast World Wide Web, there is a multitude of precious commodities, but one rules above all: traffic. Traffic describes visitors to websites, with each single click and background request counting as a minuscule but significant drop in a vast pool of monitored, tracked, and often commoditized data points. Everyone wants a piece of the traffic pie, and with today’s heavily web-driven Internet, there’s enough to go around. 

However, not all traffic is created equal. Beggars can be choosers, and on both sides of the equation, legitimate web traffic buyers as well as black hat elements can be very picky about the makeup and characteristics of the traffic they desire. Depending on a traffic buyer's goals, some traffic can fetch a premium while other traffic may simply be dropped on the floor, never seeing the light of an analytics dashboard.

With the high demand for quality traffic, traffic filtering has become a lucrative operation. Here’s what you need to know.

What is traffic?

Traffic is what connects a computer with a web browser and an actual human being with digital content sitting on a hosting server. Without the former, the latter has no purpose or value whatsoever, making traffic an essential commodity for legitimate web companies and the criminal underground economy alike.

Traffic is redirected and shuffled around the web, with visitor information passed along as part of partnerships and business contracts. In fact, entire ecosystems and economies have developed around traffic to buy, sell, and trade it. Products and services have been designed and deployed to aid in analyzing and classifying it. Code that tracks traffic is deployed at several points along the way, usually on services no one readily sees or recognizes, each gathering a small amount of insight that can drive content and optimize the flow of traffic.

Traffic, and the insight derived from it, is critical to the giant exchanges in the online advertising space and the many niche affiliate network programs that help monetize and squeeze every last fraction of a penny of indirect marginal profit out of the edges of the URL visited. It is also a key commodity in the underground economy, playing an important role in malware delivery operations.

Monetizing traffic in malicious web space via traffic filtering

This practice runs rampant across the web traffic space, from web advertisements to malicious traffic distribution feeding the latest malware variant to the unpatched masses. The latter of these is arguably the most interesting because Internet attackers can cast wide nets to reach many potential victims in their campaigns. For instance:

  • Spam is distributed to large lists of recipients, sending malicious URLs and attachments to contacts that may have been stolen from address books, harvested from websites, collected from data breach dumps, or purchased from various sellers and marketing database suppliers.
  • Web traffic is hijacked and rerouted from thousands of compromised websites, bringing all kinds of visitors who are running various desktop operating systems and web browsers and using different types of mobile devices.
  • SEO campaigns push thousands of websites running cloned content, which visitors would never seek out on their own, to the top of search results, drawing more clicks.
  • Traffic is outright purchased, leveraging the economy of scale of the underground to supply more volume. Some actors tap into the advertising space and the significant amount of traffic it can provide. Others seek out the bounties of web traffic from the adult publisher space and low-quality ads space supplying traffic from popups, popunders, and unintended affiliate offer clicks.

Suffice it to say, at the end of the day, many avenues can bring enormous volumes of traffic. Once this traffic is on hand, many actors take on the task of refining it to capture the desired value.

For “traffers,” or actors dealing illegally in traffic, it's a matter of separating the wheat from the chaff. Traffers attempting to monetize web traffic directly or to sell it to others need to be able to deliver as clean and valuable a product as possible. For someone casting a wide net who wants to deliver a high-value product to buyers, pre-filter the noise out of collected data, or reduce the cost of processing that data with an external service, this means a potentially significant amount of filtering along the way. Filtering is what refines a flood of traffic into a higher-value set of traffic that can achieve intended goals. Filtering is also what some actors rely on to preserve the lifetime of their campaigns or reduce the chances of facing prosecution for their actions.

Filtering techniques

It is helpful to understand the many ways in which attackers filter clients in their infrastructure.

There are a few main reasons why filtering occurs:

  • Optimization: Making sure traffic is unique. Traffers often get paid only for unique traffic.
  • Targeting: Making sure traffic meets requirements for location, user or device type, and similar constraints.
  • Anti-analysis/anti-research: Making sure malicious traffic distribution and end-threats are not analyzed and detected by security companies or related investigators.

The following chart lists a few known techniques, showing whether they are considered lower-level or higher-level, whether detection logic is implemented on the remote server or via client-side code, and additional context:

Technique | Layer | Implemented | Type
Passive OS fingerprinting | Low (client IP stack) | Server-side | Anti-analysis
User-agent | Medium (specified in HTTP headers or via DOM) | Server-side or Client-side | Targeting, anti-analysis
Device language | Medium (specified in HTTP headers) | Server-side or Client-side | Targeting, anti-analysis
Traffic type | Low | Server-side | Targeting, anti-analysis
Geosource | Low (based on socket remote IP address) | Server-side | Targeting, optimization, anti-analysis
Device attributes/feature support | High | Client-side | Targeting, anti-analysis
Header anomalies | Medium | Server-side | Anti-analysis
Information leakage | Low | Client-side | Anti-analysis
Cookies | High | Server-side and Client-side | Optimization, anti-analysis
Ad platform | High | Server-side | Targeting
Time zones | Low | Server-side | Targeting, anti-analysis

As noted, many options exist for filtering devices and network traffic, operating at multiple levels and on both sides of the connection. Below, I’ve offered more information on some of these techniques.

  • Passive OS fingerprinting: a technique based on the ability to detect the operating system family, and sometimes version, by passively observing TCP/IP header attributes on inbound traffic to a host. Much of the research in this area originates from Michal Zalewski's work on p0f, and the capability to filter traffic by OS fingerprint has been implemented in forms such as OpenBSD's packet filter (PF) and the OSF extension for Linux Netfilter. In early 2016, Trustwave SpiderLabs shared observations around the use of this feature in connection with the Neutrino exploit kit, allowing the attackers to filter out remote Linux clients while accepting connections from Windows clients. We've more recently seen signs of this technique in use with other traffic distribution schemes such as NeutrAds. A redirector was observed at the following address in February of this year:

cdnsilo.space        45.32.107.117

45.32.107.117    AS20473 | US | AS-CHOOPA – Choopa LLC

Testing from a macOS client, traffic is allowed:

$ nc -w3 -vz 45.32.107.117 80

45.32.107.117 80 (http) open

Testing from an OpenBSD client, traffic is also allowed:

$ nc -w3 -vz 45.32.107.117 80

Connection to 45.32.107.117 80 port [tcp/www] succeeded!

However, testing from a Linux client, the connection times out and traffic is filtered:

$ nc -w3 -vz 45.32.107.117 80

nc: connect to 45.32.107.117 port 80 (tcp) timed out: Operation now in progress

And of course, traffic from Windows clients (the primary target operating system in these kinds of attacks) is allowed.

  • User agent. User-agent-based filtering is crude but broadly effective; it works by simply dropping or routing traffic when the User-Agent request header doesn't match the desired device/browser type. This filtering can be implemented server-side, but it is common to see it handled client-side in JavaScript, leading unwanted clients to filter themselves out, as sketched after the figure below.

Fig-1 User-agent based filtering in malicious traffic distribution chain
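To illustrate the client-side variant, below is a minimal sketch, assuming a hypothetical gate page with placeholder destination URLs (not code recovered from any particular campaign), of how JavaScript can forward only the desired device/browser combination while everything else filters itself out:

```typescript
// Minimal sketch of client-side user-agent filtering on a hypothetical gate page.
// The destination URLs are placeholders, not real infrastructure.
const ua: string = navigator.userAgent;

// Only forward Windows clients that look like MSIE/Trident; everything else,
// including obvious bots and crawlers, is shunted to benign decoy content.
const isWindows = /Windows NT/.test(ua);
const isMsie = /MSIE |Trident\//.test(ua);
const isBotLike = /bot|crawler|spider|curl|wget|python/i.test(ua);

if (isWindows && isMsie && !isBotLike) {
  // Desired traffic: send it onward down the distribution chain.
  window.location.href = "https://example.invalid/landing";
} else {
  // Unwanted traffic filters itself out.
  window.location.href = "https://example.invalid/decoy";
}
```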

  • Traffic type. Often, actors will filter web traffic by traffic type without focusing on device characteristics. For example, it is possible to filter traffic by source such that desktop/residential traffic can be distinguished from server/hosting traffic, or traffic from mobile carriers can be identified to target mobile clients, and so on. Often, this is achieved by mapping the routing autonomous system (AS) advertising the client's source IP address to a traffic type and implementing filters based on AS; a sketch combining this with geosourcing follows the next item.
  • Geosourcing. This is a common and widely used means of traffic selection based on the apparent geographical origin of traffic, used in both legitimate targeting and black hat traffic filtering. For example, an actor attempting to distribute a banker trojan to a small group of countries in which fraud will be committed to cash out cards may want to distribute the trojan only to victims within the target countries to bypass fraud protections at banks. Or he may want to leverage an existing network of money mules within the target region.

    By geolocking campaigns to specific countries or regions, upstream traffic distribution systems (TDS) may filter or redirect traffic closer to the client, before traffic is forwarded to backend systems such as exploit kits and malware hosting infrastructure. Another advantage of geosource-based filtering is that traffic from security researchers or eCrime investigators originating in other regions may be dropped, limiting disclosure of details that could impact the malicious actor's operation.
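As a server-side illustration of both of these source-based techniques, here is a minimal sketch; the lookupAsn() and lookupCountry() helpers are hypothetical stand-ins for whatever ASN/GeoIP database an operator would actually use, and the AS numbers and country codes are arbitrary examples:

```typescript
// Sketch of server-side source filtering by ASN (traffic type) and by country.
// The lookup helpers below are stubs standing in for a real ASN/GeoIP database.
type Verdict = "forward" | "drop";

// Example ASNs associated with hosting/cloud providers rather than
// residential or mobile eyeballs (arbitrary examples).
const hostingAsns = new Set<number>([20473, 16509, 14061]);

// Example target countries for a geolocked campaign.
const targetCountries = new Set<string>(["US", "GB", "DE"]);

function lookupAsn(ip: string): number {
  // In practice this would query a BGP/ASN dataset; stubbed for illustration.
  return ip.startsWith("45.32.") ? 20473 : 0;
}

function lookupCountry(ip: string): string {
  // In practice this would query a GeoIP database; stubbed for illustration.
  return "US";
}

function filterBySource(clientIp: string): Verdict {
  // Traffic-type filter: drop anything sourced from hosting/cloud ASNs,
  // which is where sandboxes, crawlers, and researchers tend to live.
  if (hostingAsns.has(lookupAsn(clientIp))) {
    return "drop";
  }
  // Geosource filter: only forward clients in the campaign's target region.
  if (!targetCountries.has(lookupCountry(clientIp))) {
    return "drop";
  }
  return "forward";
}

// Example: a client sourced from a hosting ASN never reaches the backend.
console.log(filterBySource("45.32.107.117")); // "drop"
```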
  • Device attributes/feature support. This manner of filtering overlaps with the area of device and browser fingerprinting, which has advanced with the evolution of newer web standards such as HTML5 and features like WebGL and richer multimedia support in browsers. It is trivial today to fingerprint many devices or browsers at various levels, and filtering traffic based on the presence or absence of specific features on the client is a step beyond simple user-agent-based logic.

This technique has been seen in various forms in malicious traffic distribution; one recent example is the pre-landing page used by the RIG exploit kit to filter out bot traffic. A sample is shown below:


Fig-2 RIG exploit kit bot filtering code relies on detection of actual browser features vs. the user-agent advertised by the client
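The general approach behind such checks, independent of RIG's actual code, can be sketched as follows: compare what the User-Agent claims against features the browser actually exposes, and treat mismatches (or tell-tale headless characteristics) as likely bots. The specific checks below are illustrative assumptions, not a reconstruction of the kit's logic:

```typescript
// Sketch of feature-vs-user-agent consistency checks run in the browser.
// A client whose advertised UA does not match its actual capabilities is
// treated as a bot or emulated client and filtered out.
function looksLikeRealBrowser(): boolean {
  const ua = navigator.userAgent;

  // A client claiming to be MSIE/Trident should expose ActiveX support.
  const claimsMsie = /MSIE |Trident\//.test(ua);
  const hasActiveX = "ActiveXObject" in window;
  if (claimsMsie && !hasActiveX) return false;

  // A client claiming to be Chrome should expose the window.chrome object.
  const claimsChrome = /Chrome\//.test(ua) && !/Edge\//.test(ua);
  const hasChromeObject = typeof (window as any).chrome !== "undefined";
  if (claimsChrome && !hasChromeObject) return false;

  // Headless or emulated environments often report no plugins and
  // implausible screen metrics.
  if (navigator.plugins.length === 0 && screen.width === 0) return false;

  return true;
}

if (!looksLikeRealBrowser()) {
  // Unwanted client: serve benign content instead of the real landing page.
  window.location.href = "https://example.invalid/decoy";
}
```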

  • Information leakage. Web browsers support many features allowing remote websites to query information and capabilities from the client, but one thing browsers should not do is allow a site to query low-level information from the underlying operating system. However, browsers like MSIE have done this in various ways over the years. A class of information-disclosure vulnerabilities in components such as Internet Explorer and Microsoft XML (MSXML) has allowed malicious web apps such as exploit kits and redirectors to query for the presence of specific files and read data from files on the client filesystem.

    These vulnerabilities allow attackers to abort attacks early in the process when a remote system is determined to be running specific anti-malware or virtualization software, reducing or postponing the likelihood of detection. They also enable attackers to perform the type of anomaly-based detection described above and filter out defensive tools such as virtual honeypot clients, sandboxes, and crawlers based on the determination that they lack files such as video drivers or OEM manufacturer logo files on their filesystem, files that are commonplace across many legitimate desktop systems (the intended victims of the actors).

    This technique has been utilized across threat actor classes in the space, leveraged both in targeted cyber espionage operations and in widespread criminal malware distribution campaigns.

Examples of information leakage vulnerabilities used include:

  • CVE-2013-7331 – MSIE XMLDOM res://; various threats; patched with MS14-052 (reference)
  • CVE-2015-2413 – onload res:// variant, used in Magnitude and a redirector to Magnitude, Angler; MSIE 6-11 patched with MS15-065 (reference)
  • CVE-2016-0162 – identified in AdGholas, MSIE 9-11 patched with MS16-037 (reference)
  • CVE-2016-3351 – MimeType – MSIE 9-11 patched with MS16-105 (reference)
  • CVE-2016-3298 – MSIE 9-11 patched with MS16-118, MS16-126 (reference)

A recently disclosed occurrence from Trend Micro and Kafeine shows continuation of use with the AdGholas malvertising group:

  • CVE-2017-0022 – identified in AdGholas and added to Neutrino EK, MSXML patched with MS17-022 (reference)

Fig-3 Information disclosure related device probing utilized during malvertising campaign
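In simplified form, and limited to legacy MSIE, the res:// probing pattern works roughly as follows: attempt to load a resource out of a PE file at a path that only exists when a given product is installed, and use the load/error outcome to decide whether the visitor looks like a real victim or an analysis system. The sketch below illustrates that general pattern under those assumptions; it is not the exploit code behind any of the CVEs above, real exploits depended on specific MSIE/MSXML behaviors, and the file paths are examples only:

```typescript
// Illustrative sketch of res:// file probing in legacy MSIE.
// Loading res://<path-to-PE-file>/<resource> can succeed or fail depending on
// whether the file exists on disk, leaking its presence to the page.
function probeFile(peFilePath: string, onResult: (exists: boolean) => void): void {
  const img = new Image();
  img.onload = () => onResult(true);   // resource loaded: the file is present
  img.onerror = () => onResult(false); // load failed: absent, blocked, or patched
  img.src = "res://" + peFilePath + "/#2/#32512"; // resource fragment shown for illustration
}

// Example probe targets: analysis tooling an attacker might want to avoid.
const analysisTools = [
  "C:\\Program Files\\Fiddler2\\Fiddler.exe",
  "C:\\Program Files\\Wireshark\\wireshark.exe",
];

let abortAttack = false;
for (const path of analysisTools) {
  probeFile(path, (exists) => {
    if (exists) abortAttack = true; // analysis software found: bail out early
  });
}
```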

  • Ad platforms. Ad platforms such as demand-side platforms (DSPs) provide the tools advertisers need to target audiences by leveraging data gathered in the ad ecosystem. Attackers in the ads space may tap into capabilities of ad platforms to gain access to targeted audiences based on criteria such as geography, device/browser types, interests, publisher classes, and more. This allows malvertisers, for example, to push traffic filtering closer to the origin and avoid having to deal with it further downstream.

A continuing trend

As noted, many ways of filtering traffic and devices exist today, and this remains an evolving and active area of research for both legitimate parties and Internet attackers. It is likely that as the window of opportunity to capitalize on the web delivery space closes with the continual hardening of the desktop browser and OS ecosystem, attackers will seek new avenues to optimize their efforts and counter mitigations that impede their profits. In particular, the techniques that use information leakage vulnerabilities to probe remote clients for virtual machines and bots present interesting opportunities to adversaries, since these vulnerabilities are often slow to be mitigated by affected vendors and sometimes slower still to be patched among the victim base, who view such fixes as low-priority enhancements.

Awareness of these types of tactics is valuable for defenders, as it can aid in detecting such activity, classifying various attacker operations, and highlighting advances undertaken by relatively advanced threat groups as they attempt to maintain their positions and profits in the underground. RiskIQ monitors this type of activity on a continual basis using our web data collection platform, driven by virtual users and URL Intelligence services. Staying informed on the use of tactics such as these in the threat space is helpful in today's battle to protect users and systems from external threats.

Questions? Feedback? Email research-feedback@riskiq.net to contact our research team.
