This investigation was a joint effort between Malwarebytes Threat Intelligence’s Jérôme Segura, DeepSee’s Rocky Moss and Antonio Torres.
Over a dozen unique domains were found selling ad inventory through Google Ad Manager, even though the pages were embedded invisibly under the content of illegal movie & porn streaming sites
Streaming sites in the DeepStreamer fraud ring generated an estimated 221,823,394 visits in January 2023, as measured by SimilarWeb
No single seller was common to all of the sites used for laundering (the “money sites”), but most offered their inventory for sale through Google Ad Manager
Using extremely conservative estimates, which factor in a 50% ad-block rate & 70% ad-unit fill rate, we project advertiser spend on this scheme between $120k – $1.2 million in January 2023 alone
Working with a leading ad buying platform, we were able to confirm there were hundreds of millions of bid requests generated for these domains between January and February 2023
Online video streaming sites have always been some of the most visited destinations on the web. Legitimate ones will typically require a subscription fee or rely on advertising as part of their business model. Unfortunately, at any given point in time, there are thousands of sites that allow users to illegally stream pirated content, and they often manage to devise strategies that allow them to monetize their illegally sourced content with programmatic advertising.
Researchers at DeepSee and Malwarebytes have identified an invalid traffic scheme that has gone undetected for over one year via a number of illegal video streaming platforms. DeepStreamer used different techniques to evade detection and forge traffic by surreptitiously loading “money sites” (ad-monetized sites used to monetize/launder the human traffic to pirate sites) filled with Google ads completely hidden from view, while internet users were watching movies.
Not only are these streaming sites breaking the law by using copyrighted material, they are also defrauding advertisers of as much as $1.2 million per month, based on conservative estimates.
A deceptive business model
DeepSee researchers contacted Malwarebytes about a scheme they had observed recently via a video streaming website called moviesjoy[.]to. DeepSee’s crawlers had observed the site mikerin[.]com loading ads deep under the content of moviesjoy, but it wasn’t exactly clear how this was happening.
Interestingly, the site claims to offer free HD movies and TV series with “absolutely zero ads on our site. Once you hit the play button, you can start streaming right away, without any interruptions in the middle.”
On the internet, if something is “free”, it usually means you are the product in some shape or form. Hosting and streaming cost money that needs to be recouped so the service can stay online.
What we identified was not entirely surprising but was quite clever. The platform does indeed rely on ads, but rather than making them visible on the site, it embeds and hides them.
While the site owner could display ads to their visitors, there is no way legitimate advertisers (meaning those that would pay more) would accept traffic coming from a site offering pirated movies.
The trick consists of loading ads from seemingly regular websites and not showing them to anyone. Those “legitimate” websites are embedded in the page as hidden iframes while users are watching movies.
There are 4 Google ads that load per page and the pages reload periodically. Advertisers are buying ad space for mainstream content but on websites that are inserted as invisible iframes into illegal video streaming platforms.
Rather than using simpler techniques such as popunders, DeepStreamer relies on intermediary domains that create hidden iframe containers within the existing page.
The code that they use is highly obfuscated and detects the presence of debuggers. Capturing network traffic externally will only show some static elements, and not the dynamically created iframes.
Here is the overall traffic view, from the streaming site (moviesjoy) to the money site (mikerin):
There are several anti-debugging tricks being used, the first one actually from the online video streaming site itself:
The domain hosted at adtrue[.]top (or adtrue[.]info) plays an important role in loading the money domains by performing a HEAD /dynamic/ads/ HTTP request, and yet it shows an enigmatic 404 code response.
We were able to replay the attack by putting a breakpoint on adtrue[.]info using an external web debugger (Fiddler) and observed that it started loading the domain immediately responsible for rendering the money site.
It appears, though, that all these intermediary domains are connected and keep watch on one another.
Hidden iframe containers
Let’s look at the difference between static and dynamically rendered content using mikerin[.]ml, which is related to mikerin[.]com (the money site with ads). Viewed statically, the page only appears to load jquery.js:
However, we can take a shortcut and see what the Document Object Model (DOM) looks like by saving the current webpage as a complete (*.htm, *.html) file using the browser UI.
While the HTML saved from mikerin[.]ml showed very little information, the DOM provides a lot more useful information since it shows objects that have been rendered by the browser.
There is a new element called “containerIframeBlog____” that is referring to the money sites which are ordinary looking blogs with Google ads. The iframe’s properties make it so that nothing is visible to the user.
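For readers who want to triage a saved DOM snapshot themselves, here is a minimal sketch that lists iframes whose inline attributes would hide them. The specific hiding properties it looks for (display, visibility, opacity, near-zero dimensions) are assumptions for illustration, since the exact properties DeepStreamer sets are not reproduced here. The sample markup and the money-site.example URL are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed hiding techniques; the real script may use others.
HIDING_PATTERNS = {
    "display:none": re.compile(r"display\s*:\s*none"),
    "visibility:hidden": re.compile(r"visibility\s*:\s*hidden"),
    "opacity:0": re.compile(r"opacity\s*:\s*0(?:\.0+)?\s*(?:;|$)"),
    "tiny size": re.compile(r"(?:width|height)\s*:\s*[01](?:px)?\s*(?:;|$)"),
}


class HiddenIframeFinder(HTMLParser):
    """Collects iframes whose own attributes suggest they are invisible."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (src, reasons)

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attr = {name: (value or "") for name, value in attrs}
        style = attr.get("style", "").lower()
        reasons = [label for label, rx in HIDING_PATTERNS.items() if rx.search(style)]
        if attr.get("width") in ("0", "1") or attr.get("height") in ("0", "1"):
            reasons.append("tiny size attribute")
        if reasons:
            self.findings.append((attr.get("src", ""), reasons))


# Hypothetical snapshot with one hidden and one ordinary iframe.
sample_dom = """
<div><iframe src="https://money-site.example/post" style="opacity: 0; width: 1px;"></iframe></div>
<iframe src="https://player.example/embed" style="width: 640px; height: 360px;"></iframe>
"""

finder = HiddenIframeFinder()
finder.feed(sample_dom)
for src, reasons in finder.findings:
    print(src, reasons)
```

This only inspects each iframe's own attributes. An iframe hidden through a parent container or an external stylesheet would need a rendered-layout check instead.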
One way to confirm those iframes without triggering the anti-debugger code is by launching Chrome’s Task Manager:
What we refer to as the money sites are WordPress sites with a number of blog articles and Google ads. At first glance, everything looks legitimate but that is simply a decoy to fool everyone.
What we noticed are articles that are completely clean, while others contain ad fraud code. Of course, you will only get to the latter if your referer is one of the movie streaming sites.
There is one problem, though. If visitors truly came from a pirated site, then ad networks would not allow their customers’ ads through.
This is where referral forging comes into play. We can see that DeepStreamer is spoofing the referer, choosing one from its own list (Google, Bing or Facebook):
Another issue is that the invisible iframes will not reflect user activity, and yet it is important to pretend humans are scrolling and clicking on the articles. The next piece of code from the ad fraud script does just that:
If the money site was not hidden as an iframe, this is what it would look like while performing ad fraud:
Perhaps as a measure to avoid creating too many ad requests, these embedded pages do not often refresh ad units within the context of a single page-view. Instead, they generate a visit to a new spoofed page every 2-3 minutes, as demonstrated in this code snippet (looking at the interval object in particular for details on timing):
This is also confirmed by our packet captures from manually generated visits to these pirate sites; a new page is loaded every 2-3 minutes.
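The same cadence check can be repeated on any capture. The sketch below assumes you have already extracted the timestamps (in seconds) of each new page load on the money site; the sample values are hypothetical, not taken from our captures.

```python
from statistics import median


def reload_gaps(timestamps):
    """Return the gaps, in seconds, between consecutive page loads."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def matches_reported_cadence(timestamps, low=120.0, high=180.0):
    """True if the median gap falls inside the 2-3 minute window."""
    gaps = reload_gaps(timestamps)
    if not gaps:
        return False  # a single load says nothing about cadence
    return low <= median(gaps) <= high


# Hypothetical load times, in seconds since the first page load.
sample_loads = [0.0, 148.0, 301.0, 447.0, 602.0]
print(reload_gaps(sample_loads))
print(matches_reported_cadence(sample_loads))
```

The median is used rather than the mean so that one stalled or dropped reload does not skew the result.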
(Un)intended 3rd Party Measurement Evasion
One interesting side effect of embedding the money domains the way DeepStreamer does is that estimates from SimilarWeb were completely thrown off. Take, for example, the SimilarWeb results for 2 money sites that generated hundreds of millions of ad opportunities in the same measurement period (Nov ‘22 to Jan ‘23):
SimilarWeb has no indication that these sites exist or that they generate this kind of ad traffic volume. This suggests that SimilarWeb measures traffic for domains that are navigated to in the browser address bar and does not account for hidden or embedded pages. This could be both a blessing and a curse.
On the plus side: many ad exchanges check for 3rd party traffic metrics from tools like SimilarWeb before making a publisher’s inventory available, and organizations doing that basic check will protect themselves from exposure to sites like this. Put another way: a quality specialist would see that there’s no traffic to mikerin[.]com, or guiadosabor[.]com, and the sites would not be approved for the platform subsequently.
This raises the question: how were these publishers able to sell their inventory through Google’s ad exchange? What checks and balances were in place to ensure that the traffic volumes to those sites were believable?
One negative outcome of this measurement scenario is that researchers who rely on SimilarWeb insights cannot know about the “money” sites’ connections to pirate domains; the connection from source -> “money” site is lost given the absence of SimilarWeb “related sites” data.
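The basic check described above could be sketched as follows. This is an illustration only: the domain names, counts and the min_requests threshold are hypothetical, and it does not represent any exchange's actual approval process.

```python
def flag_unexplained_volume(bid_requests, measured_visits, min_requests=1_000_000):
    """Return domains offering many ad opportunities with no measured visits.

    bid_requests: dict of domain -> bid requests seen in a period
    measured_visits: dict of domain -> third-party visit estimate, same period
    """
    flagged = []
    for domain, requests in sorted(bid_requests.items()):
        visits = measured_visits.get(domain, 0)
        if requests >= min_requests and visits == 0:
            flagged.append(domain)
    return flagged


# Hypothetical inputs for illustration.
requests_seen = {
    "quiet-blog.example": 250_000_000,
    "known-news.example": 90_000_000,
    "tiny-site.example": 12_000,
}
third_party_visits = {
    "known-news.example": 30_000_000,
}

print(flag_unexplained_volume(requests_seen, third_party_visits))
```

A domain absent from the third-party dataset is treated as having zero measured visits, which is the situation described for the money sites.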
DeepSee’s crawl data revealed ground-truth connections between the pirate & “money” sites, but it could not account for the volume of traffic directed at the “money” sites. Luckily, since these sites load every time someone visits the pirate sites, it’s possible to estimate the visit counts to the “money” domains by understanding traffic volumes to the pirate sites which embed them.
The Roster of Embedded Sites
By working with the team at Malwarebytes, DeepSee was better able to profile the activity of a monetized site involved in this scheme, and set about the task of mapping the active ones to their pirate/source domains. What we found are 14 active content domains, loaded by 250+ unique pirate sites, which cumulatively generated hundreds of millions of visits in January:
In order to arrive at the estimated visit statistics, we used data from Similarweb. Not every pirate domain was found in their dataset due to recent registration, or low traffic volumes.
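The attribution logic can be sketched in a few lines. It assumes, as described above, that every visit to a pirate site loads its embedded money site once. The domains and visit counts below are hypothetical placeholders, not entries from our dataset.

```python
from collections import defaultdict

# Hypothetical Source:Money mapping.
source_to_money = {
    "pirate-a.example": "money-1.example",
    "pirate-b.example": "money-1.example",
    "pirate-c.example": "money-2.example",
    "pirate-d.example": "money-2.example",
}

# Hypothetical monthly visit estimates; pirate-d.example is missing,
# as happens with recently registered or low-traffic domains.
monthly_visits = {
    "pirate-a.example": 1_000_000,
    "pirate-b.example": 250_000,
    "pirate-c.example": 40_000,
}


def estimate_money_site_visits(mapping, visits):
    """Sum pirate-site visits per money domain; report unmeasured sources."""
    totals = defaultdict(int)
    unmeasured = []
    for source, money in mapping.items():
        if source in visits:
            totals[money] += visits[source]
        else:
            unmeasured.append(source)
    return dict(totals), unmeasured


totals, unmeasured = estimate_money_site_visits(source_to_money, monthly_visits)
print(totals)
print(unmeasured)
```

Sources without visit data are listed separately rather than counted as zero, so the totals read as a lower bound.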
Now that we had identified a sample of ad-monetized domains, we needed to make sure these ad units were actually firing off impression trackers, meaning the advertiser would be charged for presenting their ads on the page.
In order to confirm this, DeepSee analyzed data its crawlers had gathered when visiting the pirate sites in question, and compared the number of Google ad requests generated to the number of corresponding Google impression trackers fired.
This dataset, composed of 6,748 crawls performed between January 1st and February 27th 2023 showed the following:
Of the 35,269 Google ad requests measured, DeepSee measured 25,387 corresponding impression trackers, making for a fill rate of ~72%
The “money” sites loaded a median 4 ad units per-page load; confirmed by manual inspection performed by Malwarebytes
In DeepSee’s limited manual tests, generated by visiting the pirate sites & running packet capture software, there was a measured fill rate of ~80%
Perhaps more troubling, ~98% of the sessions that DeepSee crawlers generated were from known data centers, performed without any attempt to cloak the IP.
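The fill rate from the crawl data is the number of impression trackers divided by the number of ad requests. A minimal sketch using the two counts reported above (the function name is ours, for illustration):

```python
def fill_rate(ad_requests: int, impression_trackers: int) -> float:
    """Share of ad requests that resulted in a fired impression tracker."""
    if ad_requests <= 0:
        raise ValueError("ad_requests must be positive")
    return impression_trackers / ad_requests


# Counts from the crawl dataset described above.
crawl_fill = fill_rate(ad_requests=35_269, impression_trackers=25_387)
print(f"{crawl_fill:.1%}")
```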
(For more information on how to do this kind of auditing yourself, check out this explainer from MonetizeMore)
These data points in hand, we could now construct an estimate of how much advertisers might be spending on this inventory. For complete insights into the dataset we used to create these estimates, alongside the complete list of Source:Money domain mappings, check out our companion document
After matching the pirate source domains to SimilarWeb data, and summing the visit counts, we counted 221,823,394 cumulative visits generated.
Using the visit data, and the time-on-site metrics from SimilarWeb, we arrived at a weighted average time-on-site of ~7.75 minutes per visit
Each visit immediately loads 4 ads, and another 4 ads load each time the page reloads, which happens every 2.5 minutes on average. At ~7.75 minutes per visit, this makes for an average of 16.40 ad exposures per visit (4 + 4 × 7.75 / 2.5)
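The two averages above can be reproduced with a short sketch. The per-site inputs in the weighted-average example are hypothetical and chosen only to show the method. The exposure model treats reloads as a continuous rate (fractional reloads count), which is an assumption that matches the 16.40 figure.

```python
def weighted_avg_minutes(sites):
    """sites: list of (visits, avg_minutes_on_site) pairs."""
    total_visits = sum(visits for visits, _ in sites)
    return sum(visits * minutes for visits, minutes in sites) / total_visits


def exposures_per_visit(minutes_on_site, ads_per_load=4, reload_minutes=2.5):
    """Ads on the initial load plus ads from reloads during the visit."""
    return ads_per_load + ads_per_load * (minutes_on_site / reload_minutes)


# Hypothetical two-site example of the visit-weighted average.
example_sites = [(1_000, 10.0), (3_000, 7.0)]
print(weighted_avg_minutes(example_sites))

# Figures from the text: ~7.75 minutes per visit, 4 ads, 2.5-minute reloads.
print(round(exposures_per_visit(7.75), 2))
```

Multiplying the rounded 16.40 by the visit count gives a slightly different total than the one quoted next, presumably because the published total was computed from unrounded inputs.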
Multiplying average exposures per user by the number of visits yielded a total of 3,636,840,849 estimated ad exposures in January, but we had to add a few modifiers to this figure:
According to data compiled by Statista, ~50% of desktop web users block ads, and that number is ~30% for mobile browser users. We chose to use the more conservative 50% figure, and removed half of the projected impressions from the pool, leaving 1,818,420,425 estimated ad exposures in January
As we previously mentioned, DeepSee crawlers measured a fill rate of ~72% for Google ad units on the money sites during our visits. Factoring in a slightly more conservative 70% fill rate left us with 1,272,894,297 estimated ad exposures in January
Given our final figure of 1,272,894,297 estimated ad exposures in January, the advertiser spend was estimated to be between $127k and $1.27 million, depending on the average price of these advertisements, which was never disclosed to us. We broke our estimates down across several probable price points for this media:
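The arithmetic behind these figures can be reproduced with the sketch below. The $0.10 and $1.00 CPM endpoints are an assumption inferred from the $127k to $1.27 million range, not disclosed prices. Decimal is used only to keep the rounding predictable.

```python
from decimal import Decimal, ROUND_HALF_UP


def half_up(value: Decimal) -> int:
    """Round to the nearest whole number, with halves rounding up."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


RAW_EXPOSURES = Decimal(3_636_840_849)  # estimated exposures before modifiers
AD_BLOCK_RATE = Decimal("0.50")         # share of visitors assumed to block ads
FILL_RATE = Decimal("0.70")             # share of ad units assumed to fill

after_ad_block = RAW_EXPOSURES * (1 - AD_BLOCK_RATE)
billable = after_ad_block * FILL_RATE


def spend(impressions: Decimal, cpm: Decimal) -> Decimal:
    """Advertiser spend in dollars at a given cost per 1,000 impressions."""
    return impressions / 1000 * cpm


print(half_up(after_ad_block))
print(half_up(billable))
for cpm in (Decimal("0.10"), Decimal("1.00")):  # assumed CPM endpoints
    print(f"${cpm} CPM -> ${half_up(spend(billable, cpm)):,}")
```

Because spend scales linearly with CPM, any other price point can be checked by passing a different value to spend.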
At this point, it was clear that advertisers were really buying this space, so we started asking around for evidence that could point us to who was selling the space.
The Non-Google DSP Perspective
The data in this section was provided to DeepSee by a leading DSP (demand-side advertising platform) with global reach, who agreed to participate under condition of anonymity (we’ll call them DSP “A”). They provided reporting, from their perspective, on the count of bid requests generated by the money domains dating back to August of 2020. Most helpfully, they also provided the supply path related to each opportunity, which tells us the exchange & seller name behind it.
As a disclaimer, there are a few limitations of this dataset:
This is just the perspective of one DSP, and we can’t claim to know that these sellers created a similarly large share of opportunities presented to all other DSPs. We suspect they do, but without input from Google in particular, it can’t be confirmed.
These sites seemed to monetize extremely poorly outside of Google; fewer than 1% of requests resulted in an ad being delivered via the DSP we polled.
That low fill rate was echoed by another non-Google exchange we polled, who told us that only 0.1% of opportunities they created resulted in ads being loaded
On the other hand, we observed that Google filled these ad units upwards of 70% of the time, implying spend was mainly coming from users of Google’s DSP
Understanding the above, the below table shows the top sellers offering space on these money domains, and the ad exchange the opportunity came through.
Google Was the Top Exchange Offering These Opportunities; There Was No Single Seller in Common
Top Seller Per Domain, Ordered by Magnitude of Ad Opportunities Presented to DSP “A” Since August 2020
In this investigation, we uncovered a network of streaming websites and bogus domains created for the purpose of illicitly gaining revenue from advertisements by a threat actor we called DeepStreamer.
We were impressed by the technical complexity of the code and underlying infrastructure. The perpetrators took many steps to prevent reverse engineering, and third-party tracking metrics did not accurately represent the scale of the abuse at play.
We have notified Google and other industry partners and some actions have already taken place. Malwarebytes users are not participating in this invalid traffic scheme defrauding advertisers as we already block the fraudulent domains used.
Malwarebytes believes that when people and organizations are free from threats, they are free to thrive. Founded in 2008, Malwarebytes CEO Marcin Kleczynski had one mission: to rid the world of malware. Today, Malwarebytes’ award-winning endpoint protection, privacy and threat prevention solutions and its world-class team of threat researchers protect millions of individuals and thousands of businesses across the globe.
The effectiveness and ease-of-use of Malwarebytes solutions are consistently recognized by independent third parties including MITRE Engenuity, MRG Effitas, AVLAB, AV-TEST (consumer and business), Gartner Peer Insights, G2 Crowd and CNET.
The company is headquartered in California with offices in Europe and Asia. For more information and career opportunities, visit https://www.malwarebytes.com.
DeepSee uses highly sophisticated crawlers, combined with rigorous network analysis, in order to capture the behaviors websites present when visited by actual humans, and contextualize those behaviors within the graph of the internet.
DeepSee uses this data to arm advertising professionals with ground-truth signals about content appropriateness, ad-density, on-page technologies, backlink makeup, and more.
This dataset enables the sell-side to effectively & automatically moderate the quality of the inventory they offer, and empowers the buy-side to quickly generate robust blocking / targeting lists.
Indicators of Compromise
Domains launching invisible iframes: