MarkOfTheWeb: A Calling Card for Careless Russian Agents
Digital interference from the Russian Federation is nothing new. Their virtual trespassing efforts have been outed and heavily discussed in the news—even more so in recent months (as you've probably noticed). Russian digital incursion into the United States political climate allows them to adjust the direction of discourse and push buttons when and where needed to help achieve a desirable outcome for the Kremlin. To carry out these active measures, the Russian state relies not only on agents and spies who do physical work but also those who operate digitally.
Luckily, not all Russian digital agents are as smooth as James Bond. Sometimes they slip up and leave traces of their origins. One such slip-up occurred recently. In August 2017, the staff lead of Missouri Democratic Senator Claire McCaskill was spear-phished by the digital arm of the Russian state in an attempt that resembled the infamous attacks against John Podesta and Colin Powell.
By downloading a login page directly from the internet, the agent attempted to fool the high-ranking staffer into giving up his credentials. Unfortunately for our hapless Russian agent, however, the phishing page they spun up included more information than intended. Following the breadcrumbs they left behind, we were able to tap into RiskIQ’s repository of internet data to trace the agent’s origin and uncover other targets. These targets gave us clues about the agent’s motives, which, suspiciously, seem to align with those of Russia.
Following Breadcrumbs to the Kremlin
To follow this agent to the doorstep of the Kremlin, we first needed to break down their infrastructure into its most basic parts. At RiskIQ, we do a lot of web crawling. We process over two billion pages every day, and that number is growing all the time. Playing in such a big sandbox of web data yields unique insights, especially as we build on the archive of information we’ve collected over a decade of crawling the web. As we crawl, we not only pull in the webpage content but also record everything that happens on the page. We can then pinpoint why each resource is pulled in and which function executes it.
In this article, we’ll be looking at how we pair some of our component detection with extracted attributes and how we used this to uncover Russian attacker infrastructure, find phishing pages, and perform attribution on a foreign state that failed to realize they left artifacts.
A Russian spear-phishing email was sent to Senator McCaskill’s staff lead, Kyle Simpson, notifying him of a security incident that required him to update his password. The email linked to the following URL:
The domain used in this phishing setup, qov.info, has since been sinkholed by Microsoft, but at the time it displayed the following page (shown reduced in size to fit in the report):
Although phishing itself isn’t super advanced in any way, what is interesting here is how this page was created. It was, in fact, copied from the official Office 365 website. If we jump into the source of the page we find the following marking at the top:
Make note of the second line, which starts with ‘<!--’. This HTML comment is a marker generated by the browser. Where does it come from?
When setting up a phishing page, the actors had a few options: building a completely new page, mimicking the existing page, or reusing the original page. In this case, the operators downloaded the original page by visiting it in their browser and choosing right-click → “Save As”. This saves the page as displayed, along with all the child resources it needs to function properly. The telltale addition in the saved copy is a marker the browser inserts into the content right before the opening HTML tag. This marker is called the MarkOfTheWeb marker.
The format of the MarkOfTheWeb tag is always the same. We’ll explain the exact reasoning behind this marking in the next section, but for now, just note that this mark tells us from where the operators copied the page.
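For reference, Microsoft documents the marker for Internet Explorer as a comment of the form `<!-- saved from url=(NNNN)URL -->`, where the four-digit number is the length of the URL in characters (for example, `(0014)about:internet`). The following is a minimal Python sketch, assuming that documented form, that builds such a comment and recovers the source URL from saved HTML by trusting the declared length. The function names are our own, and variations other browsers may write are not handled.

```python
import re
from typing import Optional

# Documented shape of the marker: <!-- saved from url=(NNNN)URL -->
# NNNN is the URL length in characters, zero-padded to four digits.
MOTW_PREFIX = re.compile(r"<!--\s*saved from url=\((\d{4})\)", re.IGNORECASE)


def make_motw(url: str) -> str:
    """Build the comment a browser would write for a page saved from `url`."""
    return f"<!-- saved from url=({len(url):04d}){url} -->"


def parse_motw(html: str) -> Optional[str]:
    """Return the source URL from the first MarkOfTheWeb comment, or None."""
    match = MOTW_PREFIX.search(html)
    if match is None:
        return None
    length = int(match.group(1))
    start = match.end()
    url = html[start:start + length]
    # Sanity check: the comment should close right after the declared URL.
    if len(url) != length or not html[start + length:].lstrip().startswith("-->"):
        return None
    return url


saved = "<!DOCTYPE html>\n" + make_motw("https://login.live.com/") + "\n<html>"
print(parse_motw(saved))  # https://login.live.com/
```

Slicing by the declared length, rather than scanning ahead to the closing `-->`, keeps the extraction exact even when the URL itself contains unusual characters.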
From this marking, we can tell that the page was saved from the official Microsoft Azure page used to log into Office 365 online. Furthermore, if we look at the page, we find the following details:
The page that was saved was, in fact, the actual password reset page for Kyle Simpson’s account, which saved the operator some time setting up the page for this specific target. Anyone can reach the password reset field for any legitimate account, and the Russian operator in this attack used this functionality to get the entire phishing page set up. All they needed was a way to grab the credentials as they were posted.
When RiskIQ researchers were pointed to this phish, it was already attributed to a Russian state operation, but we also gathered our own evidence pointing to Russia based on this page. Remember the MarkOfTheWeb tag in the page? There’s something interesting going on with it.
When you visit Google from French IP space, you will be redirected to google.fr, which serves the French localized Google search page. Although you can change the language back to English, you will remain on the google.fr domain. Something similar happened to the Russian operator who saved the phishing page. Let’s take a closer look at the source URL below:
Something should jump out at you immediately: there’s a tag at the end of the URL with a country code. When visiting the localized Office 365 login page, the user’s origin is tagged in the URL (the domain stays the same). For example, visiting this page from France would get you frO365 as the final added tag. The ruO365 tag tells us the user who copied this page was coming from Russian IP space.
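Pulling that tag out of an extracted source URL takes a single regular expression. The sketch below assumes, based only on the `ruO365` and `frO365` examples above, that the tag is two lowercase letters followed by `O365` at the very end of the URL; the pattern, function name, and sample URLs are hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern inferred from the "ruO365"/"frO365" examples: a
# two-letter country code immediately followed by "O365" at the end of the URL.
COUNTRY_TAG = re.compile(r"([a-z]{2})O365/?$")


def o365_country_tag(source_url: str) -> Optional[str]:
    """Return the two-letter country code tagged onto the URL, if any."""
    match = COUNTRY_TAG.search(source_url)
    return match.group(1) if match else None


print(o365_country_tag("https://example.com/reset?ref=ruO365"))  # ru
```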
There are even more markers on this page, and they leak another victim of this operator’s phishing campaign who, as far as we know, hasn’t been publicly reported. When you go to the Office 365 login page and attempt to log into a different account, the service automatically remembers the previous account. As with many other services, that account stays listed until you explicitly remove the previous session.
While remembering login information makes it easier for users to switch quickly between accounts, it’s also the Achilles’ heel for this Russian operator. If we scroll all the way to the bottom of the page, we find artifacts left behind by the previously used account:
Highlighted in the red box is a previous account used on the service: DrachukS@rferl.org. The email domain belongs to Radio Free Europe/Radio Liberty, a United States-funded news outlet focused on getting news to areas of oppression and conflict. The address belongs to a writer named Serhiy Drachuk (https://www.radiosvoboda.org/author/17595.html), who writes articles on events in Ukraine with a focus on Crimea, the former Ukrainian territory recently annexed by Russia.
Given the political tensions between Ukraine and Russia, it is very likely that Mr. Drachuk’s work at Radio Liberty led the Russian state to target him in its phishing efforts.
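We don’t reproduce the exact markup Office 365 used to render the remembered account, so the sketch below does not target any specific element. It simply scans a saved page for email-like strings and groups them by domain, which is enough to surface a leaked address like DrachukS@rferl.org. The regular expression is a loose, assumed pattern meant for triage rather than address validation, and the sample markup is hypothetical.

```python
import re

# Loose email pattern for triage; it will miss or over-match some edge cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def leaked_accounts(html: str) -> dict[str, set[str]]:
    """Map each email domain found in the page to the addresses seen with it."""
    found: dict[str, set[str]] = {}
    for address in EMAIL.findall(html):
        domain = address.rsplit("@", 1)[1].lower()
        found.setdefault(domain, set()).add(address)
    return found


page = '<div class="tile">DrachukS@rferl.org</div>'  # hypothetical markup
print(leaked_accounts(page))  # {'rferl.org': {'DrachukS@rferl.org'}}
```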
In the investigation above, we showed how small artifacts left by attackers can yield additional insights. The MarkOfTheWeb artifact does not tell the whole story on its own, but because the bad guys tend to overlook it, it can be a crucial clue in connecting operators with their malicious campaigns. The marking itself exists for security: it governs how a browser treats locally saved page content when that content is opened again. Its technical effect on security-zone lockdowns in Internet Explorer is well documented on MSDN.
Other browsers also make use of MarkOfTheWeb in a similar fashion. The marking tells the browser in which context, zone, or website the page was originally running, so that any dynamic content still works and executes safely. MarkOfTheWeb also helps prevent dubious cross-domain content requests.
With MarkOfTheWeb, the browser loads the page the way it normally would, assigning it the zone of the URL it was copied from instead of treating it as trusted local content. This supports the browser’s same-origin protections, which restrict certain resources to working only with content from the domain the page is on or the domain it was loaded from. These protections stop an operator from abusing the page a user is on to suddenly talk to a different domain, such as Facebook.com, to steal the user’s session.
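To make the idea concrete, here is a simplified Python sketch of the same-origin comparison browsers apply, treating the URL recovered from a MarkOfTheWeb comment as the saved page’s origin. It illustrates the general rule that scheme, host, and port must all match; it is not a reproduction of Internet Explorer’s zone logic, which maps URLs to security zones rather than comparing origins directly. The URLs used are illustrative.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, Optional[int]]:
    """Reduce a URL to its (scheme, host, port) origin tuple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, parts.hostname or "", port


def same_origin(page_url: str, request_url: str) -> bool:
    """True when a request targets the same origin as the page."""
    return origin(page_url) == origin(request_url)


saved_from = "https://login.live.com/"  # illustrative source URL
print(same_origin(saved_from, "https://login.live.com:443/other"))  # True
print(same_origin(saved_from, "https://www.facebook.com/"))  # False
```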
While MarkOfTheWeb is useful as a security mechanism, RiskIQ likes to use it for different purposes: finding the bad guys.
Making Use of MarkOfTheWeb
We’ve been observing the artifacts shown above for a long time and decided to add MarkOfTheWeb to our tagged (web) components and extracted attributes. This means that you can now browse RiskIQ’s data repository to look for any host that had, at some point in time, a page that was copied from somewhere else.
We extract the source URL the page was copied from and split it into several smaller chunks so analysts can make some interesting pivots. To give you an idea of what this entails, let’s walk through a few of them, starting by pulling up every host that has a MarkOfTheWeb component: https://community.riskiq.com/search/components/Content/MarkOfTheWeb
As of this article’s publication, nearly 40,000 hosts are tagged with MarkOfTheWeb, and this data set will only grow. Scrolling through it, we can spot some obvious suspects, such as supportmicrosoft.webcindario.com. Let’s dig into it a bit more by pulling up its information in RiskIQ PassiveTotal:
Looking at this host, we have some indicators to start with: the components tell us what the website is running and that it includes the MarkOfTheWeb tag. The name, “support microsoft,” is highly suspicious. If we look at the tags at the top, we see that RiskIQ’s systems have already classified the page as a phishing page:
What is interesting with MarkOfTheWeb is that our systems will also try to extract the source URL from the tag in various ways:
- Full source URL which is named MarkOfTheWebSourceURL
- Full source host which is named MarkOfTheWebSourceHost
- If the source URL was hosted on a bare IP address, we extract the source host as MarkOfTheWebSourceAddress
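The splitting described in this list can be sketched in a few lines of Python. This is our illustration, not RiskIQ’s implementation: it keeps the attribute names listed above and assumes that a source host which parses as an IPv4 or IPv6 address is reported as MarkOfTheWebSourceAddress instead of MarkOfTheWebSourceHost.

```python
import ipaddress
from urllib.parse import urlsplit


def motw_attributes(source_url: str) -> dict[str, str]:
    """Split a MarkOfTheWeb source URL into pivotable attributes (sketch)."""
    attrs = {"MarkOfTheWebSourceURL": source_url}
    host = urlsplit(source_url).hostname  # lowercased, IPv6 brackets removed
    if not host:
        return attrs  # e.g. "about:internet" has no host to pivot on
    try:
        ipaddress.ip_address(host)
    except ValueError:
        attrs["MarkOfTheWebSourceHost"] = host  # a normal hostname
    else:
        attrs["MarkOfTheWebSourceAddress"] = host  # a bare IP address
    return attrs


print(motw_attributes("https://login.live.com/"))
print(motw_attributes("http://192.0.2.10/login.php"))
```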
If we go to the ‘Trackers’ tab on PassiveTotal we can see the extracted attributes:
We can see two extracted full URLs as well as two extracted source hosts. The page was copied from Microsoft’s official live login prompt and put up as a phish. Pulling a snapshot of what the page looked like around the time this marking was visible confirms the phishing page:
As we’ve shown, MarkOfTheWeb can tell us where a page was copied from and hint at the intentions of the page’s creator. Even more interesting, our data sets can be queried in two directions: we can see which host a page was copied from, or we can ask RiskIQ PassiveTotal for all hosts whose pages were copied from the same source host.
To find host associations in RiskIQ PassiveTotal, an analyst can simply click the artifact value, which can be queried here: https://community.riskiq.com/search/trackers/MarkOfTheWebSourceHost/login.live.com. This gives us a list of hosts on which a page was seen that was copied from login.live.com. You’ll notice many domains in the list that don’t look (ph)fishy per se. Phishers often compromise a legitimate website to set up a phishing page, so although these hosts likely hosted a phishing page at some point, that wasn’t their original purpose.
For example, in the list, we will find schoorsteenveger-scheemda.nl, which in itself isn’t a bad website (it belongs to a Dutch chimney sweeping company). However, the website was compromised and abused for phishing. Opening it up in RiskIQ PassiveTotal, we find MarkOfTheWeb trackers:
We can see that, at some point, it hosted a page that was copied from the Microsoft Live login page, just like our previous example. In our crawl data, we find that the page with the MarkOfTheWeb markings was hosted at 'http://schoorsteenveger-scheemda.nl/drdqfa/working/7tt30e87kn2zcwnw0e93xgb3.php'
The page itself looked like this:
This is a pretty obvious case of a compromised host abused for phishing purposes, like so many others RiskIQ sees on a daily basis.
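Conceptually, the two query directions described above are just a forward and a reverse index over the same (host, source host) observations. The Python sketch below shows that idea on a toy input built from the two hosts discussed in this article; the function name and data layout are hypothetical and say nothing about how PassiveTotal actually stores its data.

```python
from collections import defaultdict
from typing import Iterable


def build_indexes(observations: Iterable[tuple[str, str]]):
    """Index (host, source_host) pairs in both directions."""
    copied_from: dict[str, set[str]] = defaultdict(set)  # host -> source hosts
    copies_of: dict[str, set[str]] = defaultdict(set)    # source host -> hosts
    for host, source_host in observations:
        copied_from[host].add(source_host)
        copies_of[source_host].add(host)
    return copied_from, copies_of


observations = [
    ("supportmicrosoft.webcindario.com", "login.live.com"),
    ("schoorsteenveger-scheemda.nl", "login.live.com"),
]
copied_from, copies_of = build_indexes(observations)
print(sorted(copies_of["login.live.com"]))
# ['schoorsteenveger-scheemda.nl', 'supportmicrosoft.webcindario.com']
```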
Lazy Tech Support Scammers
The extracted MarkOfTheWeb attributes also give us insight into the operations of bad guys active in the tech support scam business. We’ve linked infrastructure that wasn’t publicly known, and we’ve seen different groups apparently stealing each other’s page designs.
For example, here is a host that tech support scammers frequently copied from, even though it was itself a tech support scam page. We don’t know the exact reasoning behind this behavior, but we think it is simply laziness on the criminals’ part. If someone else made a nice page, why not copy it and swap in your own support phone number? Easy.
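One way to surface this copy-of-a-copy pattern in MarkOfTheWeb data is to look for source hosts that also show up as copies themselves. The Python sketch below does that over (host, source host) pairs and counts how many distinct hosts copied from each one; the function name and the example hostnames are hypothetical.

```python
from collections import Counter
from typing import Iterable


def copied_copies(observations: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Return source hosts that carried a MarkOfTheWeb themselves.

    Each observation is (host, source_host): a page on `host` had a MOTW
    comment naming `source_host`. Values count distinct copying hosts.
    """
    pairs = {(host, src) for host, src in observations if host != src}
    copies = {host for host, _ in pairs}
    times_copied = Counter(src for _, src in pairs)
    return {src: n for src, n in times_copied.items() if src in copies}


observed = [
    ("scam-b.example", "scam-a.example"),
    ("scam-c.example", "scam-a.example"),
    ("scam-a.example", "original.example"),
]
print(copied_copies(observed))  # {'scam-a.example': 2}
```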
The MarkOfTheWeb marker is an extremely useful artifact when investigating websites and their origins. In many cases, the bad guys won’t want to rebuild the page from scratch and will copy a page they intend to mimic for phishing campaigns by saving it to disk manually. Luckily, this leaves breadcrumbs for us that lead us directly to the actor!
As we’ve now seen, MarkOfTheWeb can even be used to attribute attacks by Russian agents on United States officials, part of Russia’s ongoing efforts to disrupt Western institutions. Our unique perspective, provided by our massive data sets of web content, lets us see MarkOfTheWeb and all other web components and attributes in ways that others cannot. When we detect and correlate these web components, we can help uncover the relationships between malicious sites and actors.
While our phishing models don’t rely on MarkOfTheWeb, it can add confidence to your conclusions. Using RiskIQ PassiveTotal, analysts can hunt for MarkOfTheWeb artifacts and their values. To give you a starting point for your own investigations, here are some queries that will yield interesting results:
- Pages saved from the Uber registration page:
- Pages saved from the official Facebook login prompt:
- Pages saved from the PayPal website:
- Pages saved from the Google account login website:
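These queries follow the same URL pattern as the login.live.com tracker search linked earlier in this article. Assuming that pattern holds for any tracker type and value, a small helper can build such links; percent-encoding each path segment is our own precaution, and the helper name is hypothetical.

```python
from urllib.parse import quote

TRACKER_SEARCH = "https://community.riskiq.com/search/trackers"


def tracker_search_url(tracker: str, value: str) -> str:
    """Build a PassiveTotal tracker search link (pattern assumed from the article)."""
    # safe="" also encodes "/" so a full URL value stays in one path segment.
    return f"{TRACKER_SEARCH}/{quote(tracker, safe='')}/{quote(value, safe='')}"


print(tracker_search_url("MarkOfTheWebSourceHost", "login.live.com"))
```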