With large language models needing ever more training data to feed their algorithms, content creators have to deal with increasing numbers of AI scrapers trying to use their content to train these models. AI models effectively get to use human creativity for free, and AI-powered search answers queries on the results page itself rather than directing users to external sources like blogs. This has the potential to reduce website traffic because search engines have less need to send their users to external sites.
I’m curious how creators are dealing with AI scrapers on their sites and what solutions you’ve decided on. I can see three general positions on this issue, with room for granularity in between:
- Accept that AI scrapers are here to stay and let them use your content without trying to stop or inhibit them.
- Use the `robots.txt` file to block AI scrapers on your site. This only works for bots that respect the rules in `robots.txt`, and Perplexity was recently accused of lying about its user agent and ignoring `robots.txt`. Also, Ghost Pro starter users can’t upload a custom theme to change the default `robots.txt` file, so this option may have minimal impact. (See the example `robots.txt` after this list.)
- Proxy your site behind a CDN like Cloudflare that lets you block AI scrapers. This is the most draconian approach and may block some false positives, but it may also be the only way to stop bots that don’t honor `robots.txt`. AFAIK this option won’t work with Ghost Pro, because Ghost Pro already sits behind a CDN (Fastly) that isn’t configurable by the end user. (See the sketch of a Cloudflare rule after this list.)
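For anyone taking the second approach, here’s a minimal sketch of a `robots.txt` that blocks some of the better-known AI crawlers. The user-agent tokens below are the published ones for OpenAI, Common Crawl, Google’s AI training crawler, Anthropic, and Perplexity, but the list is partial and new bots appear all the time:

```
# Partial list of known AI training crawlers; new ones appear regularly
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```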
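And for the third approach, Cloudflare offers both a one-click setting for blocking AI bots and WAF custom rules. As a sketch, a rule expression along these lines could block the same crawlers at the edge, though it matches on self-reported user agents, so it won’t catch bots that lie about theirs:

```
(http.user_agent contains "GPTBot") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "PerplexityBot")
```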
So what approaches are you using for your blog, or are you not worried about this at all?