Magecart Strikes Again
Ticketmaster, British Airways, and Newegg have all been compromised. Who’s next? Read our research to see how we discovered the breaches.
Learn More
IDG Connect: 2017 State of Enterprise Digital Defense Report
Findings quantify the security management gap and business impact of external web, social, and mobile threats.
Get the Research Report
RiskIQ Digital Threat Management Platform Datasheet
Learn about our platform and products.
Read the Datasheet
Frost & Sullivan: The Digital Threat Management Platform Advantage
The material benefits of a platform-based approach to security outside the firewall.
Read the Report
Rackspace Accelerates External Digital Threat Investigation with RiskIQ PassiveTotal
Download Case Study
EMA Radar™ Q4 2017 Report
RiskIQ ranked a technology and value leader in digital threat intelligence management.
Get the Analyst Report
April 19, 2016, Steve Ginty
Did you realize that in loading this blog post, your web browser made over 50 network requests for resources in order to construct it? The modern web is a complex graph of dependent requests made up of images, code libraries, page content and other references. Every day, RiskIQs crawling technology makes nearly 2 billion HTTP requests online and saves the contents of the session inside of a database. Using years of this data, engineers at RiskIQ put together our latest public dataset, host pairs.
Simply put, host pairs are two domains (a parent and a child) that shared a connection observed from a RiskIQ crawl. The connection could range from a top-level redirect (HTTP 302) to something more complex like an iframe or script source reference. What makes this new dataset powerful is the ability to understand relationships between hosts based on details from visiting the actual page. Unlike our other datasets, host pairs relies on knowing the website content, so its likely to surface different values that other sources like passive DNS and SSL certificates could miss.
To illustrate how an analyst could use this data, take the domain antivirus.safetynote[.]xyz as an example. This domain was observed attempting to phish users and was placed on the RiskIQ blacklist.
PassiveTotal provides us with a bunch of infrastructure we could explore, but let’s see what a simple query to host pairs could give us. Running a query for all children of antivirus.safetynote.xyz reveals over 30 different domains:
Without knowing much about these domains, its clear that most seemed to be a typo-squatted version of a legitimate brand. In parenthesis, we can also see that the connection between our original query and the children were mostly through top-level redirects. From here, we can pick one of these domains, say wsj.om, and look at all the parent domains.
Running some of the other .xyz domains in PassiveTotal reveals that they too are blacklisted and were used as phishing sites. From here, we could continue exploring other domains parent and child relationships to build a larger graph and find more suspect items. In this particular example, performing those queries eventually leads to a bunch of shady ad providers and other blacklisted infrastructure. It’s worth noting that without host pairs, most of these sites would have remained undiscovered since there was no immediate infrastructure connection.
Starting today, users can access the host pair data directly from the PassiveTotal web interface. When available, parent and child hosts will be shown inside of a tab sorted on the last time they were observed in a crawl. Each listing will also include the cause of the relationship which users can use to help prioritize their investigations.
In the example above, we quickly went from one domain to over 50 based on just two host pair relationships. As you could imagine, showing that data in a text-based format can quickly become unwieldy. Fortunately, PassiveTotal has transforms for Maltego and have added the host pairs data into that system.
Sticking with the same example above, we can explore this data at scale without having to worry about lists of domains coming back. Instead, we see several hub-spoke connections detailing the relationship and the context as to where it was observed.
What makes Maltego nice is that we can then use the Get Enrichment transform to then identify any of the entities with tags suggesting they are malicious. Additionally, this helps cluster some of the activity based on the tag value which makes it easier to identify exactly what we may want to block.
Similar to the text-based data, graphs too can become large, so we recommend going slowly and keeping your work saved to avoid adding too many connections that would clutter up your work.
If you’d like to start playing around with host pairs data in your own application, you can access it directly using our API. An example script can be found in our Python library examples folder and highlights the use case of showing domains from a host pair relationship that has at least one tag associated with it.
This script is simple, but quickly lets you perform some of the same work we did in this post without having to leave your terminal. Using the PassiveTotal libraries, it’s easy to generate your own logic or use cases.
The addition of host pairs inside of the PassiveTotal platform is a massive step in improving infrastructure connections and providing detailed context to queries. Having full session details, cookies, web page content and more has allowed us to move beyond the traditional static datasets and offer insight into more dynamic processes.
Over time, we’d like to begin merging more of the RiskIQ blacklist/threat data directly into this data source so users could begin filtering out host pairs based on known malicious ones instead of seeing them all. For now, we would love to hear any feedback from you no matter how you use the data!