Internationalized domain name (IDN) homographic attacks, or “Punycode” attacks, made headlines back in April and are once again gracing front pages. IDN homographic attacks, a type of typosquatting, occur when threat actors register domains containing non-ASCII characters that, once rendered by the browser, look strikingly similar to existing brand names. Threat actors keep returning to this tactic because it is so effective at tricking end users.

These attacks are possible because hostnames in URLs may only contain a limited set of characters (a subset of ASCII): letters, digits, and hyphens. Hostnames that contain other characters are therefore encoded via Punycode, an algorithm that transforms a Unicode string into an ASCII string that can be used as a hostname label. Punycode-encoded labels always start with the “xn--” prefix.

For example, “riskiqbånk.com” would be encoded as “xn--riskiqbnk-ora.com.” A web browser would then decode this hostname and display it as “riskiqbånk.com” to the user.
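The round trip described above can be tried with Python's built-in `idna` codec. This is a minimal sketch, not part of any RiskIQ tooling; the helper names `to_ascii` and `to_unicode` are made up for illustration, and the encoded form is printed rather than hard-coded because the suffix after the last hyphen depends on the exact code point and its position in the label.

```python
def to_ascii(hostname: str) -> str:
    """Encode every non-ASCII label of a hostname as an 'xn--' Punycode label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Decode 'xn--' labels back to the Unicode form a browser may display."""
    return hostname.encode("ascii").decode("idna")


unicode_name = "riskiqb\u00e5nk.com"   # U+00E5 is the letter a with a ring above
ascii_name = to_ascii(unicode_name)

print(ascii_name)                       # ASCII-only form stored in DNS
# ascii() keeps the output printable on terminals without UTF-8 support
print(ascii(to_unicode(ascii_name)))    # decodes back to the original name
```

Only the label that contains a non-ASCII character is rewritten; the `com` label passes through unchanged.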

Fig-1 A prime example of a Punycode attack against Wikipedia inside the RiskIQ tool. In this example, the first “i” in the “wïkipedia.com” domain has a subtle umlaut over it

The subtle visual difference between “riskiqbånk.com” and “riskiqbank.com” is easy to miss, arguably more so than other types of typosquatting such as misspellings of the brand name (“riskiqbannk.com”). Because they read so similarly to the sites users trust, domains registered with Punycode can be convincing enough to get people to click on malicious links or provide sensitive information.

Furthermore, many popular and widely available methods for detecting domain threats (e.g., string matching and edit-distance-based algorithms) may fail to detect this domain because the brand name is obfuscated in the Punycode-encoded version “xn--riskiqbnk-ora.com,” leaving many organizations potentially vulnerable to this type of domain threat.
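To see why the encoded form defeats these checks, the sketch below compares a brand label against an ordinary typo, the displayed Unicode name, and its Punycode encoding using a plain Levenshtein distance. The function `edit_distance` is a textbook implementation written for this example, not a specific vendor's detector.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


brand = "riskiqbank"
typo = "riskiqbannk"                                 # ordinary misspelling
displayed = "riskiqb\u00e5nk"                        # what the user sees
encoded = displayed.encode("idna").decode("ascii")   # what appears in DNS data

for candidate in (typo, displayed, encoded):
    # ascii() keeps the output printable on terminals without UTF-8 support
    print(ascii(candidate), edit_distance(brand, candidate), brand in candidate)
```

The typo and the displayed name are each one edit away from the brand, while the encoded label is much farther away and no longer contains the brand as a substring, so a detector that only looks at the encoded form is likely to miss it.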

Given the number of new IDNs getting registered all the time—RiskIQ analyzed over 10 million new IDN registrations and 227 million newly observed subdomain names created in the past month alone, 1,000 of which were detected as dangerous to a RiskIQ External Threats customer—this should be a concerning gap in visibility for any security or digital risk management program.

How Can RiskIQ Help?

RiskIQ External Threats detects IDN homographic attacks and other types of domain threats (such as subdomain infringements) that pose a risk to the security of an organization, its employees, and its customers.

First, we decode any Punycode-encoded hostnames to Unicode and map homoglyphs to their ASCII equivalents. This step ensures that the domain or hostname we analyze is what a user would likely interpret it to be. In the example above, we would take “xn--riskiqbnk-ora.com,” which would display to a user as “riskiqbånk.com,” and convert it to “riskiqbank.com” to examine its similarity to brand names or other keywords of interest.
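A rough sketch of this normalization step is shown below. It is an assumption-laden illustration rather than RiskIQ's implementation: the function name `ascii_skeleton` and the tiny `HOMOGLYPHS` table are hypothetical, and a production system would need a far larger look-alike mapping.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike table for illustration only.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0131": "i",  # Latin dotless i
}


def ascii_skeleton(hostname: str) -> str:
    """Reduce an ASCII (possibly Punycode) hostname to what a user likely reads."""
    # 1. Decode any 'xn--' labels to their Unicode form.
    unicode_name = hostname.encode("ascii").decode("idna")
    # 2. Split accented letters into base letter + combining mark, drop the marks.
    decomposed = unicodedata.normalize("NFKD", unicode_name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 3. Replace remaining look-alike characters with ASCII equivalents.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in stripped).lower()


# Build Punycode inputs with the stdlib codec instead of hard-coding them.
samples = [
    "riskiqb\u00e5nk.com",   # a with ring above
    "w\u00efkipedia.com",    # i with diaeresis
    "riskiqb\u0430nk.com",   # Cyrillic a in place of Latin a
]
for name in samples:
    encoded = name.encode("idna").decode("ascii")
    print(encoded, "->", ascii_skeleton(encoded))
```

Accent stripping handles characters such as “å” and “ï,” while the lookup table covers characters from other scripts that have no decomposition to an ASCII letter.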

Next, we compare the similarity of observed domains and hosts to the brand names and official domain names of an organization. Each keyword term can be specified to allow exact matches, regular expression matches, or “fuzzy” matches in a domain or subdomain name. While a basic string distance or other simple algorithm may give us words that are roughly similar, it will not interpret a hostname as a human user would and may result in both missed detections and false positives. For example, “friskyband.com” has a small word distance to “riskiqbank.com,” but a human user would not think that “friskyband.com” was somehow associated with the brand “RiskIQ Bank.”

To mimic as closely as possible how a user would interpret a hostname, we augment simple word distance calculation with a dynamic programming approach to parse a domain into its most probable word segments. For example, the parser would deduce that “riskiqbank.com” should be tokenized as “riskiq / bank / com.” Since many domain infringement attempts will take advantage of common misspellings and common “fat finger” typos, such as “riskiqbanck.com” or “riskqbank.com” or “riskiqvank.com,” the parser will also look for slightly misspelled words when it parses a hostname.
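The general idea of typo-tolerant segmentation can be sketched with a small dynamic program. This is not RiskIQ's parser: the vocabulary, the cost constants, and the one-edit tolerance are all assumptions chosen to keep the example short, where a real system would presumably use word frequencies and a much larger dictionary.

```python
from typing import List, Optional, Tuple

# Hypothetical vocabulary: brand keywords plus a few ordinary words.
VOCAB = {"riskiq", "bank", "com", "login", "secure", "frisky", "band"}
MAX_TOKEN_LEN = max(len(w) for w in VOCAB) + 1  # room for one inserted letter

EXACT_COST = 1.0    # token is a known word
FUZZY_COST = 3.0    # token is one edit away from a known word
UNKNOWN_COST = 6.0  # single character that matches nothing


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def closest_word(token: str) -> Optional[str]:
    """Return a vocabulary word exactly one edit from token, if any."""
    if len(token) < 4:  # very short tokens fuzzy-match almost anything
        return None
    for word in sorted(VOCAB):
        if abs(len(word) - len(token)) <= 1 and edit_distance(token, word) == 1:
            return word
    return None


def segment(label: str) -> List[Tuple[str, Optional[str]]]:
    """Split label into (token, matched word or None) pairs of minimum cost."""
    n = len(label)
    cost = [0.0] + [float("inf")] * n
    back: List[Optional[Tuple[int, Optional[str]]]] = [None] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_TOKEN_LEN), end):
            token = label[start:end]
            if token in VOCAB:
                step, match = EXACT_COST, token
            else:
                match = closest_word(token)
                if match is not None:
                    step = FUZZY_COST
                elif len(token) == 1:
                    step = UNKNOWN_COST
                else:
                    continue  # multi-character junk is built from single chars
            if cost[start] + step < cost[end]:
                cost[end] = cost[start] + step
                back[end] = (start, match)
    tokens, pos = [], n
    while pos > 0:  # walk the back-pointers from the end of the label
        start, match = back[pos]
        tokens.append((label[start:pos], match))
        pos = start
    return tokens[::-1]


for label in ("riskiqbank", "riskiqbanck", "riskqbank", "riskiqvank", "friskyband"):
    print(label, segment(label))
```

With this toy vocabulary the three typo variants still resolve to the words “riskiq” and “bank,” while “friskyband” parses into two ordinary words and yields no brand keyword, which is the distinction a plain edit-distance threshold cannot make.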

Then, an algorithm developed by the RiskIQ Data Science team uses several features of the parsed hostname to consider the context, such as whether it includes the keywords we are looking for, whether they are misspelled, and how many other “real” words are in the hostname (vs. random characters) to decide if a hostname is likely infringing or not.
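The actual algorithm is not described in detail here, so the following is only a toy illustration of the kinds of features mentioned. Everything in it is hypothetical: the `BRAND_KEYWORDS` set, the `HostnameFeatures` fields, the `looks_infringing` rule, and its `min_ratio` threshold. It assumes a parsed hostname is available as a list of (token, matched word) pairs, with `None` marking characters that match no known word.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[str]]  # (text as seen, known word it matched or None)

BRAND_KEYWORDS = {"riskiq"}  # hypothetical keyword list for one customer


@dataclass
class HostnameFeatures:
    brand_hits: int              # tokens that resolve to a brand keyword
    misspelled_brand_hits: int   # brand tokens that were not an exact match
    real_word_ratio: float       # share of characters covered by known words


def extract_features(tokens: List[Token]) -> HostnameFeatures:
    total_chars = sum(len(text) for text, _ in tokens)
    known_chars = sum(len(text) for text, match in tokens if match is not None)
    brand = [(text, match) for text, match in tokens if match in BRAND_KEYWORDS]
    return HostnameFeatures(
        brand_hits=len(brand),
        misspelled_brand_hits=sum(1 for text, match in brand if text != match),
        real_word_ratio=known_chars / total_chars if total_chars else 0.0,
    )


def looks_infringing(features: HostnameFeatures, min_ratio: float = 0.6) -> bool:
    """Toy rule: a brand keyword is present and most of the name is real words."""
    return features.brand_hits > 0 and features.real_word_ratio >= min_ratio


typo_host = [("riskq", "riskiq"), ("bank", "bank")]
unrelated_host = [("frisky", "frisky"), ("band", "band")]
noisy_host = [("riskiq", "riskiq")] + [(ch, None) for ch in "x7q9zk2vw8"]

for tokens in (typo_host, unrelated_host, noisy_host):
    features = extract_features(tokens)
    print(features, looks_infringing(features))
```

A hand-written threshold like this is only a stand-in; the point is that the decision uses context from the whole parsed hostname rather than a single distance score.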

Notably, this approach can also handle infringements that “span” multiple parts of a domain as an additional layer of obfuscation, which is becoming increasingly popular with threat actors, e.g., “risk.iqbank.com.” This is another type of threat that can be easily missed by other detection methods.
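One simple way to catch a keyword that spans labels is to check the hostname both label by label and with the dots removed. The sketch below assumes this approach for illustration; the function name `spans_labels` is hypothetical and the check is far cruder than a full parser.

```python
def spans_labels(hostname: str, keyword: str) -> bool:
    """True if keyword is split across labels, i.e. visible only without dots."""
    labels = hostname.lower().split(".")
    if any(keyword in label for label in labels):
        return False  # keyword sits inside a single label, not a spanning case
    return keyword in "".join(labels)


for host in ("risk.iqbank.com", "riskiqbank.com", "example.com"):
    print(host, spans_labels(host, "riskiq"))
```

A per-label keyword search sees only “risk,” “iqbank,” and “com” in the first hostname, so it would not flag “riskiq” unless the labels are also examined together.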

Fig-2 RiskIQ’s unique approach to detecting Punycode attacks

Monitoring and Responding to Domain Threats

Any practical approach to managing risks from third-party domains does not end after initial detection. To truly know what kind of risk a domain or hostname poses to an organization, you need additional context about who owns that infrastructure and how they are using it. You also need a highly automated and scalable process for using that information to identify the highest risk items from a large set of possible threats.

External Threats leverages the core Internet intelligence data sets and capabilities of the larger RiskIQ platform to exclude legitimate domains owned by your organization automatically and enrich detected threats with additional details, including:

  • Whether domains have live website content hosted on them
  • Whether visitors navigating to a domain get redirected elsewhere
  • Whether there is malware, phishing content, or brand-related text or images in web content associated with these domains or their redirect destinations
  • Whether domains are parked
  • Whether domain names are capable of sending and receiving email
  • Whether a domain is related to other observed threats to your organization

Prioritization according to key risk indicators, combined with automated continuous monitoring that alerts on changes in threat level over time, makes it easy for users to focus on a small number of urgent items today without losing sight of the larger set of threats that might become tomorrow’s top priority. Once you have all the information necessary to decide how to respond to a potential threat, RiskIQ’s built-in threat mitigation capabilities allow you to quickly report abuse to the appropriate parties inside or outside of your organization and automatically track your reported items through to resolution.

Contact us today to find out more about how RiskIQ can help defend your organization from Internationalized domain name (IDN) homographic attacks.
