How do you deal with AI scrapers on your blog?

With large language models needing ever more training data, content creators have to deal with increasing numbers of AI scrapers trying to use their content to train these models. AI models effectively get to use human creativity for free and can answer search queries on the search results page itself rather than directing users to external sources like blogs. This could reduce website traffic, because search engines have less need to send their users to external sites.

I’m curious how other creators are dealing with AI scrapers on their sites and which solutions they have settled on. I can see three general positions on this issue, with room for granularity in between:

  1. Accept that AI scrapers are here to stay and let them use your content without trying to stop or inhibit them.
  2. Use the robots.txt file to block AI scrapers on your site. This only works for bots that respect the rules in robots.txt. Perplexity was recently accused of lying about its user agent and ignoring robots.txt. And Ghost Pro starter users can’t upload a custom theme and change the default robots.txt file, so this option may have minimal impact.
  3. Proxy your site behind a CDN like Cloudflare that lets you block AI scrapers. This may be the most draconian approach and may also produce false positives (blocking legitimate visitors), but it may be the only way to block bots that don’t honor robots.txt. But AFAIK, this option won’t work with Ghost Pro, because Ghost Pro provides its own CDN through Fastly, which end users can't configure.
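For option 2, here is a minimal sketch of a robots.txt that opts out of a few AI crawlers, checked with Python's standard `urllib.robotparser`. This is roughly the check a well-behaved crawler runs before fetching a page; nothing forces a crawler to run it, which is exactly the weakness described above. The user-agent tokens (GPTBot, CCBot, Google-Extended, ClaudeBot, PerplexityBot) are ones the vendors have published, but the list changes over time, so treat it as an example rather than a complete blocklist. `example.com/my-post/` is just a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: one group listing several AI crawler tokens that are
# disallowed everywhere, then a catch-all group that allows everyone else.
# The token list is illustrative and will go out of date.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "https://example.com/my-post/"  # placeholder URL
    for bot in ["GPTBot", "CCBot", "Googlebot", "Bingbot"]:
        # can_fetch() only tells a *cooperating* crawler what the rules say;
        # a scraper that ignores robots.txt never asks.
        status = "allowed" if parser.can_fetch(bot, url) else "blocked"
        print(f"{bot}: {status}")
```

Running it should report GPTBot and CCBot as blocked and Googlebot and Bingbot as allowed, so regular search indexing keeps working while the listed AI crawlers are asked to stay away.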

So what approaches are you using for your blog or are you not worried about it at all?


Make sure the unique content you create is locked behind the paywall. Make some portion (SEO, linkbait, listicles) fully accessible, but keep the good, high-quality info behind your membership so that readers have to log in to read it.

Since that content isn't rendered without signing in, it's fully isolated from scrapers and only available to your members.


Good point about the protected content. The hardest decision is knowing what to make public and what to protect.

I won't argue that this is the right way, but I have started splitting it like this:

Old-style SEO keyword content - public for the general info, members-only for anything after that in real, unique detail

Pinterest - public

Social questions - simple answer in the first 3 paragraphs, members-only for anything after that in real detail

Actual unique content - simple answer in the first 3 paragraphs, members-only for anything after that in real detail

Affiliate content - 100% public, no wall

To me, offering a free tier whose members I can email afterward is now more valuable than just giving it all away for free. Google started a war; now they should be punished for those bad decisions.

Work harder to find those 1,000 superfans, and stop trying to be the answer to everything, since Google now punishes this haphazardly.
