Fastly WAF impacting SEO?

Issue Summary

I do get results on Google (checked using the **site:**callof.monster filter), but only the pages I submitted manually for indexing in Google Search Console are displayed (and they rank very poorly).

Some other tools, like the free sitemap finder, have no issue finding the sitemap themselves:

When I ask AI assistants like ChatGPT or Grok to fetch the sitemap.xml file, Grok gets a 403 error and ChatGPT a 500:

What I suspect:

  • I strongly suspect that the Fastly WAF is not properly configured on the callof.monster property for these bots.

I opened a ticket with Ghost support a few days ago to report the problem (no answer yet), so they can take a quick look at the WAF logs and see which rule is denying the traffic here; there is probably an allowlist to edit to include these non-malicious agents, or a rule that needs a bit of tweaking.

But I was wondering: have others here observed similar behavior?

  • In Google Search Console, can you fetch your sitemap.xml properly?
  • If you ask one of these AIs to check it, does it work for you?

One thing that might matter for my issue: when I first created my Ghost account and website a few months back, the site title (and therefore the ghost.io URL) was different; I changed it about 2-3 weeks ago. At the time the website was not open/ready, so I did not check the search tools to see whether everything was fine. But I was wondering whether a change in this property could somehow affect the WAF part.

One last thing I suspected was that the .monster TLD might not be handled properly by Google, but given that other healthy services also have trouble crawling the site, I really suspect an issue with how the Fastly WAF property is configured by default.

So it is most likely something that can only be solved by Ghost staff, who have access to my Fastly property, but I was wondering if anyone has run into similar issues recently (as I’m new to Ghost)?


Setup information

Ghost Version
6.0.5-0-g524d7c21+moya - Hosted w/ a Ghost(pro) subscription (creator).

Can’t help with the overall issue (site not indexed), but without seeing the EXACT requests that Grok or ChatGPT make, you cannot draw a conclusion here.

Especially the Grok screenshot is, well… questionable. To be fair, I have never used Grok myself, but every other AI chat interface shows tool calls. The screenshots you provided just look like plain text output, literally as if Grok hallucinated and pieced things together.

LLMs are not intelligent. They are just predicting what the next word (well, token) should be. If you tell an LLM that you have issues with a sitemap, it is not unlikely that it will use that context to “predict” that the next phrase should be “well, that’s because there’s a 500 error”.

So, as I said: without seeing the exact request that Grok/ChatGPT make, the conclusion is flawed.

However, even if Grok/ChatGPT make actual requests, a WAF is able to tell requests from Google and those from AI chat interfaces apart. So even if actual requests are being made, the only logical conclusion you can draw is that the AI interfaces are blocked, not that the WAF is blocking Google.
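As an aside, the usual way services tell genuine Googlebot traffic apart from bots that merely spoof its user agent is the reverse DNS check that Google documents. A minimal Python sketch (function names are my own; the full verification needs DNS access, the suffix check does not):

```python
import socket

# Hostname suffixes that Google documents for genuine Googlebot hosts.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_hostname(hostname: str) -> bool:
    """Pure check: does a reverse-resolved hostname belong to Google?"""
    return hostname.endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the hostname suffix, then
    forward-confirm the hostname maps back to the same IP.
    Requires network (DNS) access."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
```

A spoofed user agent passes no part of this check, which is exactly why a WAF can treat “Googlebot” strings from random IPs differently from the real crawler.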

What does it say in your Google Search Console?

Also, as harsh as it sounds, Google does not index everything. It’s very selective: it might have had a look at your content and determined that it’s not a priority to index at the moment.

2 Likes

Hi Jannis :waving_hand: ,

First of all, thanks for taking the time to read about my problem :+1: , I really appreciate it (you must have good time-management skills, and automation, because it’s hard to imagine how you run all those sites with their associated support ‘and’ still come here from time to time to look at other people’s issues :smiley: !).

I’m sorry, I should have posted what I can see in Google Search Console, since everything started there (I thought I did :/). Here is what I can see:

…and when I click on that to see the details, it claims “Sitemap could not be read”:

I totally get your point about the “conclusion”. I see the flaw in “grass is green, my bag is green, therefore my bag is grass”: the fact that AI agents have browsing issues clearly doesn’t mean the root cause is the same.

But starting from this “Sitemap could not be read”, it triggered some intuition (not logic ;p) about “why” it can’t be read, hence I only say that I “suspect” (lol) something with the WAF (and hence my request that support take a quick look at the WAF logs for the Google bot on this URL, to see if anything looks suspicious).

Regarding the AI side and what the exact query looks like, clearly I won’t get it. I did ask what user agent was used and what the raw request looked like, and it is “supposed to” look like this:

So I tried the following directly from home:

curl -X GET "https://callof.monster/sitemap.xml" -H "User-Agent: GrokBot/1.0 (+https://x.ai; grok@x.ai)" -H "Accept: */*" -H "Host: callof.monster"

and it went well (no issue getting the result):

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="//callof.monster/sitemap.xsl"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://callof.monster/sitemap-pages.xml</loc>
        <lastmod>2025-08-27T07:33:57.528Z</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://callof.monster/sitemap-posts.xml</loc>
        <lastmod>2025-08-26T18:18:56.000Z</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://callof.monster/sitemap-authors.xml</loc>
        <lastmod>2025-08-27T08:08:29.974Z</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://callof.monster/sitemap-tags.xml</loc>
        <lastmod>2025-08-21T21:25:53.000Z</lastmod>
    </sitemap>
</sitemapindex>
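For anyone who would rather script this kind of check than eyeball the XML, a sitemap index like the one above can be parsed with nothing but Python’s standard library (a quick sketch, not specific to Ghost):

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol (visible in the XML above).
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def child_sitemaps(xml_text: str) -> list[str]:
    """Return the <loc> URLs listed in a sitemap index document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(f"{NS}loc")]
```

Combined with the curl call above, this makes it easy to loop over a few user agents and compare what each one gets back.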

Also, as harsh as it sounds, Google does not index everything.

Indeed, but I thought that applied to poorly structured content of low quality (this may have changed), and, let me know if I’m wrong, I don’t think it affects fetching sitemap.xml itself: the fetch should still succeed in the Google console even if the pages end up not being indexed for whatever reason.

Something I tried more recently is to see whether I could fetch the sitemap of my forums (hosted somewhere other than Ghost’s hosting service), located at https://forum.callof.monster/sitemap.xml .

I get the same error in Google Search Console (“Sitemap could not be read”), so… now I’m starting to wonder if there’s maybe something wrong with the .monster TLD -_- (when I have a bit more time I’ll see if there’s any contact for Google support).

ps: I saw that the forums are hosted at Hetzner Online GmbH, which I believe is your primary DC for Magic Pages as well ^^.
If this issue persists over time, I’d be curious to try my website at Magic’, to see how it behaves!

In case it helps:
I, for example, block AI bots (and as many other bots as I can) on my servers, so a random request from any AI will return an error.

Try submitting the sitemap to Google again and see if it can be fetched.

Also, Google can index your website without a sitemap; it is not a requirement, although it helps, of course.

1 Like

Ok thanks for the feedback :wink: .

Just curious, any specific reason for you to do that?

Try submitting the sitemap to Google again and see if it can be fetched.

Yep! I do it from time to time, since nothing says it’s useless to do so (contrary to page indexing requests, which enter a dedicated queue). So far no change ^^.

Also, Google can index your website without a sitemap; it is not a requirement, although it helps, of course.

Yes! I guess I’ll submit the main categories for indexing as well (if it helps).
I’ve never contacted Google directly; I’ll see if there’s any chance I can open a ticket somewhere, as they certainly have more details on why the sitemap (and robots.txt) can’t be fetched.

I would second that. Or…maybe just wait. In your screenshot it shows that you just submitted it today. Might be worth just leaving it as is and then checking back in a few days.

Indexing/ranking on Google is never a “I do something once and it will work” action.

While I’d be more than happy to welcome you at Magic Pages, I doubt that this will have any influence at all on the sitemap. The hosting infrastructure is the last thing I’d check, and tens of thousands of sites rank well on Google within the Ghost(Pro) setup.

1 Like

Yep, I re-submitted it today. Going to leave it as-is and wait a few days, as suggested ;).

Yes, it would indeed be very weird for it to work for others but not for my site behind the same WAF configuration (assuming there’s nothing “specific” per provisioned website). I can definitely exclude the WAF lead here (especially after seeing that I have the exact same issue on forum.callof.monster).

I’m going to wait patiently to see if anything changes in the next few days, invest in some advertising here and there to grow the audience, and see if I can submit a question somewhere to Google’s support services (no real hope here, as I’m just one among millions). As I’ve never hit this issue before, I’d like to push the investigation as far as I can, just to ‘understand’ what’s happening exactly ^^.

If I manage to get any feedback on this, I’ll make sure to reply here later, in case someone else encounters the same kind of “issue”.

Thanks again guys for taking the time to read all this :+1: .

2 Likes

Bots consume my servers’ traffic and resources, and they are legion these days; I pay for the servers for my readers, not for bots :wink:
I might lose some links from the AIs, but I am happy with that.

Good luck with the indexing!

1 Like

Ok, yes, I get it regarding the bots: dropping something on the web is now like dropping a piece of meat in a river full of piranhas. At my job I can see that on our main corporate website at least a third of the traffic comes from bots (over 200 hits/sec out of a 600 hits/sec average), and 5% is completely scrubbed.

But habits change, and I realize I now tend to turn first to AI agents when I want to ask something (instead of going through the usual search engine). Obviously that’s not the case for everyone (yet), and it depends on the audience, but if I could tune the WAF I think I’d make sure ChatGPT (and co) could access the content, so they can answer someone’s question related to my website’s content ^^ (they now tend to cite their sources) and eventually lead people to me (though it could be double-edged if the person gets the answer without visiting my website).

Thanks for the indexing luck! Updates so far:

  • Ghost support thinks it ‘could’ indeed be related to my TLD (they gave me the example of .xyz, which apparently has a poor reputation with Google). I have a few other domains with more standard TLDs that I purchased, and I will 301-redirect them to my current website to see if it helps (learning about SEO progressively).
  • On the Google community forums, people point out issues with sitemap fetching. Apparently it can show up as “Couldn’t fetch” or “Sitemap could not be read” when in reality it is fetchable and just pending (related to what @jannis mentioned earlier: “wait” looks like the plan in that case). And indeed, if I run a URL inspection on my sitemap URL in Google Search Console, I can see that it is reachable by Google, so Google has absolutely no issue actually “fetching” it.
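On the 301 idea in the first point: for what it’s worth, if those extra domains end up pointed at a server I control (this is a sketch only; it assumes nginx, and the alternate domain name below is a placeholder), the redirect would just be a small catch-all block that preserves the request path:

```nginx
# Sketch only: permanently redirect an alternate domain to the main site,
# keeping the request path so deep links carry over.
server {
    listen 80;
    server_name example-alternate.com www.example-alternate.com;

    return 301 https://callof.monster$request_uri;
}
```

Preserving `$request_uri` matters for SEO, since it lets any links to the old domain’s pages map onto the equivalent pages of the new one.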

I’ll continue building up the community, writing articles and so on… give it a month or so to see how the ranking progresses, and if it’s still in super slow motion I may switch to a standard ‘.com’ TLD to see if that makes a difference for Google.