defocus view of city night.
In the vast World Wide Web, there is a multitude of precious commodities, but one rules above all: traffic. Traffic describes visitors to websites, with each single click and background request counting as a minuscule but significant drop in a vast pool of monitored, tracked, and often commoditized data points. Everyone wants a piece of the traffic pie, and with today’s heavily web-driven Internet, there’s enough to go around.
However, not all traffic is created equal. Beggars can be choosers, and on both sides of the equation, legitimate web traffic buyers, as well as black hat elements, can be very picky in the makeup and characteristics of traffic they desire. Depending on a traffic buyer’s goals, some traffic can fetch a premium while some other may simply be dropped on the floor, never seeing the light of an analytics dashboard.
With the high demand for quality traffic, traffic filtering has become a lucrative operation. Here’s what you need to know.
Traffic is what connects a computer with a web browser and an actual human being with digital content sitting on a hosting server. Without the former, the latter has no purpose nor value whatsoever making traffic an essential commodity for legitimate web companies and the criminal underground economy alike.
Traffic is redirected and shuffled around the web, the visitor info of which is passed along as part of partnerships and business contracts. In fact, entire ecosystems and economies have developed around traffic that buy, sell, and trade it. Products and services have been designed and deployed to aid in analyzing and classifying it. Code that tracks traffic is deployed at several points along the way, usually on services no one readily sees or recognizes, each gathering their small amount of insight that can drive content to optimize the flow of traffic.
Traffic and the insight thereof is critical to the giant exchanges in the online advertising space and the very many niche affiliate network programs that help monetize and squeeze every last fraction of a penny of indirect marginal profit out of the edges of the URL visited. It is also a key commodity in the underground economy, playing an important role in malware delivery operations
This practice runs rampant across the web traffic space, from web advertisements to malicious traffic distribution feeding the latest malware variant to the un-patched masses. The latter of these is arguably the most interesting because Internet attackers can cast wide nets to reach many potential victims in their campaigns. For instance:
Suffice to say, at the end of the day, many avenues can bring enormous volumes of traffic. Once this traffic is on hand, many actors take on the task of refining it to capture the desired value.
For “traffers,” or actors dealing illegally in traffic, it’s a matter of separating the wheat from the chaff. Traffers attempting to monetize web traffic directly or to sell it to others need to be able to deliver as clean and valuable a product as possible. For someone casting a wide net and wanting to deliver a high-value product to buyers, or pre-filter the noise on collected information or reduce costs processing data with an external service, this means a potentially significant amount of filtering along the line. Filtering is what refines a flood of traffic into a higher-value set of traffic that can achieve intended goals. Filtering is also what some actors rely on to preserve the lifetime of their campaigns or reduce the chances of facing prosecution for their actions.
It is helpful to understand the many ways in which attackers filter clients in their infrastructure.
There are a few main reasons why filtering occurs:
The following chart lists a few known techniques, showing whether they are considered lower-level or higher-level, whether detection logic is implemented on the remote server or via client-side code, and additional context:
| Technique | Layer | Implemented | Type |
| Passive OS fingerprinting | Low (client IP stack) | Server-side | Anti-analysis |
| User-agent | Medium (specified in HTTP headers or via DOM) | Server-side or Client-side | Targeting, anti-analysis |
| Device language | Medium (specified in HTTP headers) | Server-side or Client-side | Targeting, anti-analysis |
| Traffic type | Low | Server-side | Targeting, anti-analysis |
| Geosource | Low (based on socket remote IP address) | Server-side | Targeting, optimization, anti-analysis |
| Device attributes/feature support | High | Client-side | Targeting, anti-analysis |
| Header anomalies | Medium | Server-side | Anti-analysis |
| Information leakage | Low | Client-side | Anti-analysis |
| Cookies | High | Server-side and Client-side | Optimization, anti-analysis |
| Ad platform | High | Server-side | Targeting |
| Time zones | Low | Server-side | Targeting, anti-analysis |
As noted, many options exist for filtering devices and network traffic, operating at multiple levels and on both sides of the connection. Below, I’ve offered more information on some of these techniques.
cdnsilo.space 45.32.107.117
45.32.107.117 AS20473 | US | AS-CHOOPA – Choopa LLC
Testing from a macOS client, traffic is allowed:
$ nc -w3 -vz 45.32.107.117 80
45.32.107.117 80 (http) open
Testing from an OpenBSD client, traffic is also allowed:
$ nc -w3 -vz 45.32.107.117 80
Connection to 45.32.107.117 80 port [tcp/www] succeeded!
However testing from a Linux client, the connection is refused and traffic is filtered:
$ nc -w3 -vz 45.32.107.117 80
nc: connect to 45.32.107.117 port 80 (tcp) timed out: Operation now in progress
And of course, traffic from Windows clients (the primary target operating system in these kinds of attacks) is allowed.
Fig-1 User-agent based filtering in malicious traffic distribution chain
This technique has been seen in various ways in malicious traffic distribution, and one recent example is the manner in which the pre-landing page implemented with the RIG exploit kit uses it to filter out bot traffic. A sample is shown below:
Fig-2 RIG exploit kit bot filtering code relies on detection of actual browser features vs. the user-agent advertised by the client
Examples of information leakage vulnerabilities used include:
A recently disclosed occurrence from Trend Micro and Kafeine shows continuation of use with the AdGholas malvertising group:
Fig-3 Information disclosure related device probing utilized during malvertising campaign
As noted, many ways of filtering traffic and devices exist today, and this remains an evolving and active area of research for both legitimate parties as well as Internet attackers. It is likely that as the window of opportunity to capitalize on the web delivery space closes with the continual hardening of the desktop browser and OS ecosystem, attackers in this space will seek new avenues to optimize their efforts and counter mitigation efforts that impede their profits. Particularly, the techniques around using information leakage vulnerabilities to probe remote clients for detection of virtual machines and bots presents interesting opportunities to adversaries since these vulnerabilities are often slow to be mitigated by affected vendors and sometimes slower to be patched among the victim base, who view such vulnerabilities as low priority enhancements.
Awareness of these types of tactics is valuable for defenders as it can aid in the detection of such activity, in classification of various attacker operations, and highlight advances undertaken by relatively advanced threat groups who utilize these tactics as they attempt to maintain their positions and profits in the underground. RiskIQ monitors this type of activity on a continual basis using our web data collection platform driven by virtual users and URL Intelligence services. Staying informed on the use of tactics such as these in the threat space is helpful in today’s battle to protect users and systems from external threats.
Questions? Feedback? Email research-feedback@riskiq.net to contact our research team.
At RiskIQ, we track many different Magecart groups. We continually observe evolutions in the techniques they employ to skim card…
At the request of our customers, March 9th, RiskIQ's team of trained intelligence analysts began compiling disparate data and intelligence…
At the request of our customers, March 9th, RiskIQ's team of trained intelligence analysts began compiling disparate data and intelligence…
For the past ten years, RiskIQ has been crawling and passive-sensing the internet to help security teams prepare for a…
The COVID-19 pandemic is making life unrecognizable for most of us and has presented a host of new, unique challenges…
On Thursday, February 20th, around 3 pm GMT, criminals RiskIQ identifies as Magecart Group 8 placed a JavaScript skimmer on…