With large language models needing ever-increasing amounts of training data, content creators have to deal with growing numbers of AI scrapers trying to use their content to train these models. AI models effectively get to use human creativity for free and answer search queries on the results page itself rather than directing users to external sources like blogs. This has the potential to hurt website traffic, because search engines have less need to send their users to external sites.
I’m curious how creators are dealing with AI scrapers on their sites and what solutions you’ve decided on. I can see three general positions on this issue, with room for granularity in between:
1. Accept that AI scrapers are here to stay and let them use your content without trying to stop or inhibit them.
2. Use the robots.txt file to block AI scrapers on your site (see the example after this list). This only works for bots that respect the rules in robots.txt; Perplexity was recently accused of lying about its user agent and ignoring robots.txt. And Ghost Pro Starter users can’t upload a custom theme to change the default robots.txt file, so this option may have minimal impact.
3. Proxy your site behind a CDN like Cloudflare that lets you block AI scrapers (a sketch of a blocking rule follows below). This may be the most draconian approach and may also catch some false positives, but it may be the only way to block bots that don’t honor robots.txt. AFAIK, though, this option won’t work with Ghost Pro, because they provide a CDN through Fastly that isn’t configurable by the end user.
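For option 2, a starting point might look something like the snippet below. The user-agent list is my own assumption about common AI training crawlers (GPTBot, ClaudeBot, CCBot, PerplexityBot, Google-Extended, Bytespider); there are more out there, and only well-behaved bots will honor it.

```
# robots.txt - block common AI training crawlers
# (user-agent list is an assumption; extend or trim as needed)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: Bytespider
Disallow: /

# Everyone else is still allowed
User-agent: *
Disallow:
```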
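For option 3, if you do control your own Cloudflare zone, a custom WAF rule with a Block action can match the same user agents directly. This is only a sketch of the expression syntax using the assumed crawler list above; Cloudflare also has managed bot-blocking settings that may be simpler to turn on.

```
(http.user_agent contains "GPTBot")
or (http.user_agent contains "ClaudeBot")
or (http.user_agent contains "CCBot")
or (http.user_agent contains "PerplexityBot")
or (http.user_agent contains "Bytespider")
```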
So what approaches are you using for your blog, or are you not worried about it at all?
Make sure the unique content you create is locked behind the paywall. Make some portion of your posts (SEO, linkbait, listicles) fully public, but keep the good, high-quality info behind your membership so readers have to log in to read it.
Since gated posts aren’t rendered without signing in, that content is fully isolated and stays yours in that setup.
I won’t argue with that, but I have begun to split my content like this:
Old-style SEO keyword posts - public for the general info, members-only for anything after that in real, unique detail
Pinterest content - public
Social questions - simple answer in the first 3 paragraphs, members-only for anything after in real detail
Actually unique content - simple answer in the first 3 paragraphs, members-only for anything after in real detail
Affiliate content - 100% public, no wall
To me, offering a free tier that I can email afterwards is more valuable than just giving it all away for free. Google started this war; now they should face the consequences of those bad decisions.
Work harder to find those 1,000 superfans, and stop trying to be the answer to everything, since Google now punishes that haphazardly.
Has anyone tried Anubis or any of the other scraper blockers? I understand Iocaine might be a bit heavier than Anubis, and I’ve heard Cloudflare is good enough (though I think the splash page sucks). I’m very much not a developer, but I’m definitely willing to sacrifice SEO to keep my content from being scraped by generative AI companies.
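I haven’t used Anubis myself, so I won’t pretend to know its config, but if your site is self-hosted behind Nginx, a cruder version of the same idea is to refuse requests from known AI user agents at the web server. This is just a sketch under the assumption that Nginx proxies to a local Ghost install on its default port, and the user-agent list is the same assumed one as above; bots that fake their user agent will still get through, which is exactly where tools like Anubis or Cloudflare come in.

```nginx
# http context (e.g. /etc/nginx/conf.d/block-ai-bots.conf)
# Map known AI crawler user agents to a flag.
# The list below is an assumption - extend or trim it as needed.
map $http_user_agent $is_ai_bot {
    default          0;
    ~*GPTBot         1;
    ~*ClaudeBot      1;
    ~*CCBot          1;
    ~*PerplexityBot  1;
    ~*Bytespider     1;
}

server {
    listen 80;
    server_name example.com;  # hypothetical domain

    # Refuse matched crawlers before the request reaches the site.
    if ($is_ai_bot) {
        return 403;
    }

    location / {
        proxy_pass http://127.0.0.1:2368;  # Ghost's default port; adjust to your install
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```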
Many of us aren’t on Ghost(Pro), but if you are, you can open a conversation with them about a setup that allows SSL to be handled in Cloudflare instead of in Ghost(Pro). From their side, though, it’s something they need to re-evaluate, since they currently provide no help in the fight against AI bots and scrapers.