A recent study by NewsGuard, trackers of online misinformation, makes some alarming discoveries about the role of artificial intelligence (AI) in content farm generation. If you’ve previously held your nose at the content mill grind, it’s probably going to become a lot more unpleasant.
Content farms are the pinnacle of search engine optimisation (SEO) shenanigans. Take a large collection of likely underpaid writers, set up a bunch of similar-looking sites, and then plaster them with adverts. The sites are covered with articles expressly designed to float up to the top of search rankings, and then generate a fortune in ad clicks.
If you’ve ever searched for something and walked into a site which spends about 4 paragraphs slowly describing your question back to you before (maybe) answering it, congratulations. I share your pain.
The worst part about this kind of content production is that, in recent years, many otherwise legitimate sites have started to write like this too. The pattern to look out for is as follows:
- A paragraph or two describing your problem back to you as if you’re ten years old.
- A paragraph break with a large advert.
- Another 3 paragraphs which may or may not answer your question.
On top of that, sites don’t just populate with reasonable, genuine questions. They now fill up with ludicrous questions, or answer the questions badly. Not only is garbage like this unhelpful itself, it also keeps you away from the good stuff.
This is the current state of play before we throw AI-generated content into the mix. What did NewsGuard find?
49 news and information sites which appear to be “almost entirely written by artificial intelligence software”. There’s a broad spread of languages used on these sites, ranging from Chinese and Tagalog to English and French. This helps ensure the content is being seen by as many people as possible, as well as clogging up search engines that little bit more. Some of the key points:
- Lack of disclosure of ownership / control, making it hard to assess bias or possible political leanings.
- Topics include entertainment, finance, health, and technology.
- “Hundreds of articles per day” published on some of the sites.
- False narratives are pushed by some of the sites.
- High advertising saturation.
- Generic names like “News Live 79” and “Daily Business Post”.
As for the written content itself, it is said to be filled with “bland language and repetitive phrases”. This is one of the key indicators of AI-generated content. Additionally, many of the sites began publishing just as AI content creation tools like ChatGPT started to be used by the public. Quite a coincidence!
Other strong indicators include:
- Phrases in articles which are often used by AI in response to prompts. One example given is “I am not capable of producing 1500 words… However, I can provide you with a summary of the article”.
- No bylines given for authors. Reverse image searches for a handful of supposed authors reveal that images have been scraped from other sources.
- Generic and incomplete About Us or Privacy Policy pages, some of which even link to About Us page generation tools.
If a smoking gun were even required at this point, the dead giveaway would be the inclusion of actual error messages produced by AI text generation tools. One example, from an article published in March of this year, includes the following text:
“As an AI language model”, and “I cannot complete this prompt”.
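For anyone wanting to spot-check pages for leftovers like these, the idea can be sketched in a few lines of Python. This is a rough illustration only, not NewsGuard's method: the function names and the phrase list are assumptions made up for this example, and the list contains nothing beyond the phrases quoted in this article. A clean result proves nothing, since a site owner only has to delete the offending text.

```python
import re

# Phrases quoted in this article. The list is illustrative, not exhaustive,
# and is an assumption of this sketch rather than anything NewsGuard published.
TELLTALE_PHRASES = [
    "as an ai language model",
    "i cannot complete this prompt",
    "i am not capable of producing",
]


def normalize(text: str) -> str:
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text)


def find_telltale_phrases(text: str) -> list[str]:
    """Return the telltale phrases found in the text, in list order."""
    normalized = normalize(text)
    return [phrase for phrase in TELLTALE_PHRASES if phrase in normalized]


if __name__ == "__main__":
    sample = "As an AI language model, I cannot complete this prompt."
    # Prints both matching phrases from the list above.
    print(find_telltale_phrases(sample))
```

Plain substring matching is used on purpose: these strings are left in by accident, so there is no need for anything clever, and a hit is a prompt for a human to take a closer look rather than a verdict.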
Despite this, site owners remain cautious about admitting to any use of AI to produce the content farm rings. In April of this year, NewsGuard attempted to get some answers from the websites as to who, or what, is creating the content. The results are not encouraging.
Of the 49 sites studied, NewsGuard contacted the 29 sites which included some form of contact details. Two sites confirmed use of AI, 17 did not respond, eight provided invalid contact details, and two didn’t answer the questions provided.
Since the story broke, Google has removed adverts from some pages across the various sites flagged. Ads were removed completely from sites where the search giant found “pervasive violations”. Although two dozen sites were reported to be making use of Google’s ad services, the use of AI-generated content is “not inherently a violation” of ad policies.
Nonetheless, given the content created is likely to be low value and little more than clickbait, it seems likely that this kind of site is not long for Google’s ad world. A number of other ad-based organisations pulled their ads when contacted by Bloomberg. Even so, this is very much a game of whack-a-mole with the SEO spammers in the driving seat.
It’s very likely we’ll see campaigns like the above dedicated to other unpleasant online activities. What if the spam-filled SEO magnet sites churn out endless content to lure visitors to phishing pages? Or bogus sign-up forms? It’s not a stretch to imagine dozens of sites fired out by AI generators linking to fake downloads and bogus browser extensions.
As many people have noted in the articles linked above, the high speed and low cost of generation here are key to getting bad things online as quickly as possible. When you can register sites in bulk and have AI bots fill all of them with a text firehose, the fear is that advertising networks and abuse departments may not be able to keep up. All this happened in the same week that AI “Godfather” Geoffrey Hinton left Google, warning of the dangers posed by rogues misusing AI.
If you run an advertising division, now is probably a very good time to check if AI-generated content is addressed by your policies and update accordingly. Just don’t run it through an AI first.