Internet Data Sets

The Most Comprehensive, Internet-Scale Data Sets Available

Petabytes of Internet Data at your Fingertips

At RiskIQ, data is in our DNA. Since our inception, we have been gathering petabytes of passive DNS and WHOIS data, and through our crawling of the entire internet, have amassed data sets that include SSL certificates, newly observed domains, web and analytics trackers, mobile apps, and the components that make up the web pages we see every day.

These data sets can be used by security professionals and threat analysts to connect the dots between threat infrastructure and understand the attack vectors and patterns used by attackers.

White Paper: Using Internet Data Sets to Understand Digital Threats

Passive DNS

The Domain Name System (DNS) is a lot like a phonebook for the internet. It tells your browser the IP address of the server that hosts the website associated with a domain name. Passive DNS collection involves gathering domain requests and the corresponding IP responses from DNS providers across the internet as they happen. This provides insight into when a domain's resolution changes and which IP address it changes to.

RiskIQ collects 1,000 gigabytes of passive DNS data daily

Threat actors need to establish infrastructure to conduct their attacks, and one of these infrastructure elements is often DNS. For example, a piece of malware may include a hardcoded domain name that is seemingly legitimate. To execute an attack, a threat actor may change that domain’s DNS record to resolve to a malicious IP address that delivers a payload or encrypts data through ransomware. RiskIQ also includes sources of active DNS resolution when a specific domain or IP is queried.
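To illustrate how passive DNS history can show when a resolution changes and where it changes to, here is a minimal sketch in Python. The record layout, the sample data, and the function name `resolution_changes` are assumptions made for this example; they do not represent RiskIQ's data format or API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical passive DNS observations: (domain, resolved IP, date observed).
OBSERVATIONS = [
    ("example.test", "192.0.2.10", date(2020, 1, 1)),
    ("example.test", "192.0.2.10", date(2020, 1, 5)),
    ("example.test", "198.51.100.7", date(2020, 1, 9)),
    ("other.test", "203.0.113.4", date(2020, 1, 2)),
]


def resolution_changes(observations):
    """Return {domain: [(date, old_ip, new_ip), ...]} for each change in resolution."""
    by_domain = defaultdict(list)
    for domain, ip, seen in observations:
        by_domain[domain].append((seen, ip))

    changes = {}
    for domain, rows in by_domain.items():
        rows.sort()  # chronological order
        domain_changes = []
        # Compare each observation with the one immediately before it.
        for (_, prev_ip), (seen, ip) in zip(rows, rows[1:]):
            if ip != prev_ip:
                domain_changes.append((seen, prev_ip, ip))
        if domain_changes:
            changes[domain] = domain_changes
    return changes


if __name__ == "__main__":
    for domain, events in resolution_changes(OBSERVATIONS).items():
        for seen, old_ip, new_ip in events:
            print(f"{seen}: {domain} moved from {old_ip} to {new_ip}")
```

A domain that suddenly moves to an unfamiliar IP address, as `example.test` does in the sample data, is the kind of event an analyst might investigate further.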

WHOIS Records

Thousands of times a day, domains are bought and transferred between individuals, and domain registrants must provide information about themselves when registering. This information gets stored in a WHOIS record associated with the domain.

WHOIS is a protocol that lets anyone query for ownership information about a domain, IP address, or subnet. RiskIQ has a vast database of WHOIS data, which is available to query for registrant information. WHOIS records provide information that includes the name, email address, street address, and phone number of the individual who registered the domain.

Attackers need to establish infrastructure from which to launch their attacks, and they need to set up servers to communicate with their malware. Often, attackers register multiple domains at the beginning of an attack campaign for use during all phases of their operations.

WHOIS data can provide an organization with insight into who is behind an attack campaign. Using domain registration information, an organization can unmask an attacker’s infrastructure by linking a suspicious domain to other domains registered using the same or similar information.
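The pivot described above, linking a suspicious domain to other domains registered with the same details, can be sketched as follows. This is an illustration under stated assumptions: the record fields, the sample records, and the function name `related_domains` are hypothetical and are not RiskIQ's schema or API. Matching here is exact after lowercasing; matching "similar" information would need fuzzier comparison.

```python
from collections import defaultdict

# Hypothetical, simplified WHOIS records.
WHOIS_RECORDS = [
    {"domain": "bad-login.test", "email": "Actor@mail.test", "phone": "+1.5550100"},
    {"domain": "bad-update.test", "email": "actor@mail.test", "phone": "+1.5550111"},
    {"domain": "bad-cdn.test", "email": "other@mail.test", "phone": "+1.5550100"},
    {"domain": "benign.test", "email": "owner@corp.test", "phone": "+1.5550199"},
]


def related_domains(records, suspicious_domain, fields=("email", "phone")):
    """Return domains that share any of the given registrant fields with the suspicious domain."""
    # Index every (field, normalized value) pair to the domains that use it.
    index = defaultdict(set)
    for rec in records:
        for field in fields:
            value = rec.get(field, "").strip().lower()
            if value:
                index[(field, value)].add(rec["domain"])

    target = next((r for r in records if r["domain"] == suspicious_domain), None)
    if target is None:
        return set()

    related = set()
    for field in fields:
        value = target.get(field, "").strip().lower()
        if value:
            related |= index[(field, value)]
    related.discard(suspicious_domain)
    return related


if __name__ == "__main__":
    print(sorted(related_domains(WHOIS_RECORDS, "bad-login.test")))
```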

URL Intelligence

Through RiskIQ’s internet crawls, we have records of billions of pages and their associated HTTP requests. Using this vast database of URL intelligence, combined with other industry-leading blacklists and URL data feeds, threat analysts can query a specific URL for any information that RiskIQ has on it. If RiskIQ does not have any information, the URL will be crawled and evaluated for malicious code, iframes, redirects, or drive-by downloads that could lead to compromise.
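The query flow described above (return what is already known about a URL, otherwise crawl and evaluate it) follows a common lookup-with-fallback pattern. The sketch below assumes a plain dictionary as the record store and a caller-supplied `crawl` function; both, along with the name `url_intelligence`, are hypothetical stand-ins and not RiskIQ's implementation.

```python
def url_intelligence(url, known_records, crawl):
    """Return the stored record for a URL, crawling it first if nothing is known.

    known_records: dict mapping URL -> record (hypothetical store).
    crawl: callable that takes a URL and returns a new record (hypothetical).
    """
    record = known_records.get(url)
    if record is None:
        record = crawl(url)          # evaluate the page on demand
        known_records[url] = record  # keep the result for later queries
    return record


if __name__ == "__main__":
    def placeholder_crawl(url):
        # A real crawler would fetch the page and inspect it; this stub
        # only returns an empty evaluation for demonstration purposes.
        return {"url": url, "iframes": [], "redirects": []}

    store = {}
    print(url_intelligence("http://example.test/", store, placeholder_crawl))
```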

SSL Certificates

Securing user transactions and interactions on the internet is an essential part of everyday life. SSL certificates are files that digitally bind a cryptographic key to a set of user-provided details and assist in providing this security. Beyond securing your data, certificates are a way for analysts to connect disparate malicious network infrastructure. SSL certificates can provide context in three ways: they help show whether a domain or IP is legitimate, they distinguish self-signed certificates from those issued by a third-party certificate authority, and they reveal clusters of IPs and additional certificates that are linked through shared certificates.

RiskIQ has collected more than 30,000,000 SSL certificates since 2013

SSL certificates are typically used by malicious actors in a few different ways. Some are self-signed, so they have no real credibility, and are associated with a website or web server performing a malicious function that RiskIQ has seen in the wild. Some SSL certificates are used to encrypt command and control communications for a piece of malware, so the data isn’t visible. And sometimes information in the certificate can be used to surface connections among the subject alternative names that certificates list.
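Two of the pivots described in this section, grouping IPs that present the same certificate and flagging certificates that appear to be self-signed, can be sketched as below. The scan records, field names, and function names are assumptions made for illustration and are not RiskIQ's schema. Comparing subject and issuer is only a heuristic: a full check would also verify the certificate's signature against its own public key.

```python
from collections import defaultdict

# Hypothetical scan results: one certificate observation per IP.
SCANS = [
    {"ip": "192.0.2.10", "sha1": "aa11", "subject": "CN=shop.test", "issuer": "CN=shop.test"},
    {"ip": "192.0.2.11", "sha1": "aa11", "subject": "CN=shop.test", "issuer": "CN=shop.test"},
    {"ip": "198.51.100.7", "sha1": "bb22", "subject": "CN=www.corp.test", "issuer": "CN=Example CA"},
]


def shared_certificates(scans):
    """Return {fingerprint: set of IPs} for certificates seen on more than one IP."""
    ips_by_cert = defaultdict(set)
    for scan in scans:
        ips_by_cert[scan["sha1"]].add(scan["ip"])
    return {sha1: ips for sha1, ips in ips_by_cert.items() if len(ips) > 1}


def looks_self_signed(scan):
    """Heuristic: a certificate whose subject equals its issuer is likely self-signed."""
    return scan["subject"] == scan["issuer"]


if __name__ == "__main__":
    for sha1, ips in shared_certificates(SCANS).items():
        print(f"certificate {sha1} is shared by {sorted(ips)}")
    for scan in SCANS:
        if looks_self_signed(scan):
            print(f"{scan['ip']} presents a certificate that looks self-signed")
```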

Deep and Dark Web

RiskIQ automates the detection, monitoring, and remediation of threats from outside the firewall targeting your organization, employees, and customers. But, like the visible part of an iceberg, the surface web is only one part of the digital risk equation. Visibility across the surface web, the deep web, and the dark web is critical to protecting your organization from digital risk.

Each of these realms requires specialized technology to detect threats so that security operations teams can take the appropriate action.

We now also enable customers to search across deep and dark web forums where threat actors may be collaborating on impending attacks, planning campaigns, disclosing information about your organization or customers, or selling or discussing a data breach related to your business.

OSINT - Open Source Intelligence

Open source intelligence is community-gathered information that is available from public sources. RiskIQ gathers this information from media and press sites, web-based community sites, and public data available via search engines, and consolidates it into our platform for easy access and correlation with existing data.

Mobile Apps

RiskIQ crawls and scans more than 150 mobile app stores (yes, there are more than just the Apple App Store and Google Play store) daily, taking inventory of the apps, versions, and code that exist in each of the stores. With RiskIQ’s knowledge of nearly 20,000,000 apps, organizations can ensure that their official mobile apps have not been compromised and are hosted only in stores authorized for distribution.

RiskIQ has a database of nearly 20,000,000 mobile apps

Threat actors often download the application binary (the app’s code), make small changes that infect users with malware, spyware, or viruses, and then re-post the app to an unmonitored app store where an unsuspecting user might download it, thinking it’s official and legitimate.

Monitoring for these occurrences of rogue and unofficial apps in RiskIQ’s Mobile App data set safeguards your customers and their mobile devices from attacks.
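One simple way to picture this kind of monitoring is to compare each store listing against the publisher's official releases. The sketch below flags a listing when its binary does not match a known official SHA-256 hash or when it appears in a store that is not authorized for distribution. The listing format, store names, and function names are hypothetical and chosen for this example; they do not describe how RiskIQ performs its analysis.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a binary as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_rogue_listings(official_hashes, authorized_stores, listings):
    """Return (store, modified, unauthorized) for every listing that fails a check.

    official_hashes: set of SHA-256 hex digests of official releases.
    authorized_stores: set of store names approved for distribution.
    listings: iterable of {"store": str, "binary": bytes} (hypothetical format).
    """
    rogue = []
    for listing in listings:
        modified = sha256_hex(listing["binary"]) not in official_hashes
        unauthorized = listing["store"] not in authorized_stores
        if modified or unauthorized:
            rogue.append((listing["store"], modified, unauthorized))
    return rogue


if __name__ == "__main__":
    official = b"official-build-1.0"  # placeholder bytes standing in for an app binary
    listings = [
        {"store": "official-store", "binary": official},
        {"store": "unknown-store", "binary": official + b"-tampered"},
    ]
    print(find_rogue_listings({sha256_hex(official)}, {"official-store"}, listings))
```

An exact hash comparison only detects that a binary differs from the official release; it does not say what was changed, so a flagged listing would still need closer inspection.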