Derived Data Sets

Correlate attacks based on shared web page attributes and associations

RiskIQ extracts and analyzes terabytes of data collected from across the internet to create new data sets that aid in discovering, understanding, and mitigating digital threats.

These data sets give customers insight into web page attributes and associations observed by RiskIQ’s vast crawling infrastructure, and give security analysts new ways to investigate and track attacks against their organizations.

Website Metadata and Trackers

RiskIQ gathers the full DOM during the loading process of pages that we crawl. We extract details such as website trackers, analytics codes, social network accounts, and other unique identifiers. These values can provide insights into additional infrastructure that typically goes unnoticed by static data sets. RiskIQ’s tracker data includes IDs from providers like Google, Yandex, Mixpanel, New Relic, Clicky, and more.
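
As a rough illustration of the kind of extraction described above, the following Python sketch pulls tracker IDs out of page source with regular expressions. It is not RiskIQ's implementation: the function name, the type labels, and the two simplified ID patterns are assumptions made for the example, and a real crawler would inspect the rendered DOM and cover many more providers.

```python
import re

# Simplified, assumed patterns for two common Google ID formats.
# These are illustrative only and do not cover every valid ID.
TRACKER_PATTERNS = {
    "GoogleAnalyticsID": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "GoogleTagManagerID": re.compile(r"\bGTM-[A-Z0-9]{4,8}\b"),
}


def extract_trackers(html: str) -> dict:
    """Return the tracker IDs found in the page source, grouped by type."""
    found = {}
    for kind, pattern in TRACKER_PATTERNS.items():
        matches = set(pattern.findall(html))
        if matches:
            found[kind] = matches
    return found
```

Storing the extracted IDs alongside the host they were seen on is what makes later searches across pages possible.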

For example, when a website’s HTML is scraped and reposted for something like a phishing campaign, malicious actors often don’t bother to change things like the associated Google Analytics ID, tracking pixels, cookies, or social network connections. Searching for an organization’s official tracking codes can surface pages where the threat actor has forgotten to change this information, helping security teams find and shut down a malicious campaign.

Also, like most digital organizations, some hacking organizations use tools like Google Analytics to measure the success of their malicious campaigns. By searching for a malicious actor’s analytics tracker, we can find other places across the internet where the same tracker has been observed and uncover additional campaigns associated with that actor.
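
The pivot described above can be sketched as a simple grouping step. The example below assumes crawl results are available as (host, tracker ID) tuples; the function name, the data shape, and the example hosts are hypothetical and chosen only to show the idea.

```python
from collections import defaultdict


def hosts_sharing_trackers(observations, seed_host):
    """Map each tracker ID seen on seed_host to the other hosts that reuse it."""
    hosts_by_tracker = defaultdict(set)
    for host, tracker_id in observations:
        hosts_by_tracker[tracker_id].add(host)
    return {
        tracker_id: sorted(hosts - {seed_host})
        for tracker_id, hosts in hosts_by_tracker.items()
        if seed_host in hosts and len(hosts) > 1
    }


# Hypothetical observations: a copied login page kept the official tracker ID.
observations = [
    ("bank.example", "UA-1111111-1"),
    ("bank-login.example", "UA-1111111-1"),
    ("unrelated.example", "UA-2222222-1"),
]
print(hosts_sharing_trackers(observations, "bank.example"))
# prints {'UA-1111111-1': ['bank-login.example']}
```

The same function works in the other direction: seeding it with a host known to belong to a threat actor returns other hosts carrying that actor's tracker.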

Host Pairs

Host pairs are unique relationships between pages that RiskIQ observes when we crawl a web page. Each pair has a direction (child or parent) and a cause that describes how the two are connected. These values provide insight into redirection sequences, dependent requests, or specific actions within a web page when it loads.

The connection could range from a top-level redirect (HTTP 302) to something more complex like an iframe or script source reference. What makes this data set powerful is the ability to understand relationships between hosts based on details from visiting the actual page. Host pairs rely on knowing website content, so they are likely to surface values that other sources, such as passive DNS and SSL certificates, do not.
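
To make the parent, child, and cause structure concrete, here is a minimal Python sketch that derives host pairs from a single crawled page using only the standard library. The `HostPair` class, the cause labels ("script.src", "iframe.src", "redirect"), and the `host_pairs` function are assumptions for illustration, not RiskIQ's schema or method.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urlparse


@dataclass(frozen=True)
class HostPair:
    parent: str  # host of the page that was crawled
    child: str   # host that the page referenced or redirected to
    cause: str   # assumed labels: "script.src", "iframe.src", "redirect"


class _PairCollector(HTMLParser):
    """Collect cross-host script and iframe references from page HTML."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.pairs = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "iframe"):
            return
        src = dict(attrs).get("src")
        host = urlparse(src).hostname if src else None
        # Relative URLs have no hostname and are skipped, as are same-host references.
        if host and host != self.page_host:
            self.pairs.add(HostPair(self.page_host, host, f"{tag}.src"))


def host_pairs(page_url, html, redirect_target=None):
    """Return the set of host pairs observed for one crawled page."""
    page_host = urlparse(page_url).hostname
    collector = _PairCollector(page_host)
    collector.feed(html)
    pairs = set(collector.pairs)
    if redirect_target:
        target_host = urlparse(redirect_target).hostname
        if target_host and target_host != page_host:
            pairs.add(HostPair(page_host, target_host, "redirect"))
    return pairs
```

Because each pair records a cause, an analyst can filter for, say, only script references when looking for injected third-party code.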


Web Cookies

As RiskIQ virtual users crawl the internet, they capture everything that happens under the hood when they visit a website. This includes any cookies the site drops to track user behavior or note the status of the user’s machine. Cookies are yet another source of information that can tie pieces of infrastructure together across attack campaigns or connect seemingly unrelated assets. RiskIQ correlates cookie source name and data with the infrastructure hosting the cookies, allowing analysts to pivot and find other sites with related cookies.

Threat actors often use cookies to mark users who have already been delivered a malicious payload, so that they do not try to infect the same user again. Threat hunters who are investigating a cookie as a possible indicator of compromise can search the RiskIQ internet database for that cookie.
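
The cookie pivot described in this section can be approximated with a small index from cookie name to the hosts that set it. The sketch below assumes crawl results are available as (host, Set-Cookie header value) tuples and uses Python's standard `http.cookies` module for parsing; the function names and input shape are hypothetical.

```python
from collections import defaultdict
from http.cookies import SimpleCookie


def build_cookie_index(crawl_responses):
    """Index hosts by the names of the cookies they set.

    crawl_responses: iterable of (host, Set-Cookie header value) tuples.
    """
    index = defaultdict(set)
    for host, header in crawl_responses:
        cookie = SimpleCookie()
        cookie.load(header)
        for name in cookie:
            index[name].add(host)
    return index


def related_hosts(index, cookie_name):
    """Return every host observed setting a cookie with this name."""
    return sorted(index.get(cookie_name, set()))
```

An analyst who finds an unusual cookie name on one suspicious host could call `related_hosts` with that name to list other hosts worth investigating.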