RiskIQ uses multiple methods of automated detection that all play off each other to track threat patterns, which enables our customers to protect their digital presence from threats. Every piece of data we collect helps train and optimize the accuracy of our models—even a simple redirector that leads to a run-of-the-mill phishing page.

What’s in a Redirector?

Sometimes, detections are straightforward. During the loading process, RiskIQ’s web crawlers gather a web page’s full document object model (DOM) and the page sequence of a crawl. For example, the crawl below shows a simple redirection script that sends users to a typical phishing page:

Fig-1 Response body captured by RiskIQ Crawlers showing the malicious redirect

 

Fig-2 The phishing page it leads to

As part of our crawl analysis, we examine the structure of a rendered web page, looking for indications of foul play. In this case, our machine-learning model detected the phishing page immediately. However, not all phishing detections are as straightforward as this one, so we often layer additional methods on top, as we’ll describe below.

With any machine-learning algorithm, active feedback and retraining are crucial to maintaining the quality of the model. RiskIQ uses an active learning feedback loop, whereby humans regularly review pages that are then fed back into our machine learning models. In the course of reviewing pages, a RiskIQ data scientist came across the pages above and labeled each one—a phishing page, and a redirector to a phishing page.

These labeled pages were then fed into all of our machine-learning models, including our dynamic model, which performs a behavioral analysis of a web page as it loads. This behavioral analysis quickly identified several other pages with similar patterns.

At first glance, behavioral analysis may not seem like a great tool for detecting this redirector chain. After all, there’s not much to it: the script simply sets the `loc` variable and then runs `self.location.replace = loc; window.location = loc;`. However, given that JavaScript allows for almost endless variations in even simple tasks (see: 535 ways to reload a page with JavaScript), even this small signature was distinct enough that almost all of the pages the model flagged were malicious, presumably written by the same bad actor. We did still see false positives at a rate of about 10%, so to reduce that number, we turned to yet another method of detection at our disposal: signature-based detection.

Signature-based detection commonly targets patterns in a script, a URI, or some other feature of a page. Our in-house signature-based detection engine can target these, plus many other features enriched by our crawler, our machine-learning algorithms, and our numerous data sets. These features include, for example, all dependent requests of a page, full redirection sequences, web components that are present (such as servers or JavaScript libraries), IP addresses, ASNs, results from natural language processing classifiers, SSL certificates, and more. In the case of this redirector, we were able to quickly add a signature detection script that reduced our false positive rate to zero.
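To make the idea concrete, here is a minimal Python sketch of how a script signature might be layered on top of the pages a behavioral model has already flagged. It is not RiskIQ’s actual engine or signature: the regular expression, the function names `matches_signature` and `confirm_flagged_pages`, and the example URLs are all assumptions made for illustration, built only from the redirector statements quoted above.

```python
import re

# Hypothetical signature for the redirector described in this post:
# a `loc` assignment to a quoted string, followed later by the two
# statements `self.location.replace = loc; window.location = loc;`.
# The exact pattern is an assumption, not RiskIQ's real signature.
REDIRECTOR_SIGNATURE = re.compile(
    r"\bloc\s*=\s*[\"'][^\"']+[\"']\s*;"      # loc = "<target>";
    r".*?"                                    # anything in between
    r"self\.location\.replace\s*=\s*loc\s*;"  # self.location.replace = loc;
    r"\s*window\.location\s*=\s*loc\s*;?",    # window.location = loc;
    re.DOTALL,
)


def matches_signature(script_text: str) -> bool:
    """Return True if the script contains the redirector pattern."""
    return REDIRECTOR_SIGNATURE.search(script_text) is not None


def confirm_flagged_pages(flagged: dict) -> list:
    """Keep only behaviorally flagged pages whose script also matches.

    `flagged` maps a page URL to the script text captured for that page.
    """
    return [url for url, script in flagged.items() if matches_signature(script)]
```

Given a page whose script reads `var loc = "http://phish.example/login"; self.location.replace = loc; window.location = loc;` and another whose script is just `window.location = "/home";`, only the first survives `confirm_flagged_pages`. Requiring both detectors to agree is one plausible way a second layer can trim false positives from the first.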

Security professionals are often playing a game of whack-a-mole. Once a new threat is detected, threat actors will change their modus operandi, rendering previous detections obsolete. With RiskIQ’s varied detection techniques, however, small changes are not enough to evade detection. If an actor tries to evade signature-based detection by changing URI patterns, we can track them by dynamic analysis of the page or page structure, and vice versa. Even if they obfuscate their code, update their page structure, change their URI patterns, upgrade their web components, and change their scripts (quite a lot of work!), we can still track them through our massive amounts of other data: whois, passive DNS, tracking IDs, cookies, and our proprietary host pairs dataset, to name several. Signing up for RiskIQ Community Edition now will allow you to start pivoting on these data sets today.
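The layering argument can be sketched as a set of independent detectors, each looking at a different feature of the same crawl. Everything in this Python sketch is hypothetical: the `PageObservation` fields, the detector names, the URI and script substrings, and the tracking ID are illustrative stand-ins, not RiskIQ data or APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PageObservation:
    """Hypothetical record of what a crawl captured for one page."""
    url: str
    script: str
    tracking_id: str  # e.g. an analytics ID reused across an actor's pages


# Illustrative value only; it does not come from any real data set.
KNOWN_TRACKING_IDS = {"UA-0000000-1"}


def uri_signature(page: PageObservation) -> bool:
    # Stand-in for a URI-pattern signature.
    return "/redirect.php?id=" in page.url


def script_signature(page: PageObservation) -> bool:
    # Stand-in for a script signature or behavioral match.
    return "window.location = loc" in page.script


def tracking_id_pivot(page: PageObservation) -> bool:
    # Stand-in for pivoting on a data set the actor forgot to change.
    return page.tracking_id in KNOWN_TRACKING_IDS


DETECTORS: Dict[str, Callable[[PageObservation], bool]] = {
    "uri_signature": uri_signature,
    "script_signature": script_signature,
    "tracking_id_pivot": tracking_id_pivot,
}


def fired_detectors(page: PageObservation) -> List[str]:
    """Return the names of every detector that fires for this page."""
    return [name for name, detect in DETECTORS.items() if detect(page)]
```

In this sketch, a page that changes its URI pattern and obfuscates its script but keeps the same tracking ID still trips `tracking_id_pivot`, so the actor has to change every observable feature at once to slip through.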

Security will always be a game of whack-a-mole, but through large-scale data collection across a variety of data types and automated detection algorithms, RiskIQ continues to make it harder for the moles to hide.

Contact us for more information about how RiskIQ collects internet data and uses it to automate and optimize threat detection.
