Bulk Changes to Mobiledoc | Migration from WordPress

Hello everyone,

I am looking for a way to make bulk changes to my content.

The challenge looks as follows:

I have migrated my blog from WordPress to Ghost. The website is https://bitcoin-2go.de

Most of the old WordPress content contains a code pattern similar to

<a href="/content/images/wordpress/202x/"><img src="..."/></a>

Since I have not migrated the images, all of these anchor tags link to non-existent files and therefore return a 404 error.

So I need to get rid of all anchor tags that link to the old folder structure. To do so, I used the following regular expression:

<a href="/content/images/wordpress/(.*?)</a>
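For readers following along, here is a minimal sketch of what a non-greedy pattern of this kind does to a plain HTML string. The sample HTML (including the `2020/01/chart.png` path and the `/about/` link) is made up for illustration, and the pattern targets `/content/images/wordpress/` as described above; it is not the exact content of the blog.

```python
import re

# Hypothetical sample; the real content lives in the Ghost export file.
html = (
    '<p><a href="/content/images/wordpress/2020/01/chart.png">'
    '<img src="/content/images/wordpress/2020/01/chart.png"/></a>'
    '<em>Caption</em></p>'
    '<p><a href="/about/">About</a></p>'
)

# The non-greedy .*? stops at the first closing </a>, so unrelated
# links later in the document are left alone.
pattern = re.compile(r'<a href="/content/images/wordpress/.*?</a>')

cleaned = pattern.sub("", html)
print(cleaned)
```

Only the anchor and the image inside it are removed; the surrounding `<p>` and `<em>` tags and the unrelated link stay in place.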

After re-uploading the edited mobiledoc file, I initially considered the job done, since all the broken images had been removed from the content. So far, so good.

After running a crawl with Ahrefs, however, I found that new URLs had appeared for all existing articles, following this scheme:

https://bitcoin-2go.de/link-to-article/__GHOST_URL__/content/images/2021/07/b2go-lightmode.svg

where the last part, “__GHOST_URL__/content/images/2021/07/b2go-lightmode.svg”, is the path to the site logo.

As a consequence, this produces an enormous number of 404 errors. How can I get rid of this, or what did I do wrong?

Thanks for your help

__GHOST_URL__ is replaced with your configured site URL when rendering. This means that changing your URL (even changing the subdirectory) is possible without having to touch your content at all.
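As an illustration only: the crawled URLs have the shape a crawler would produce if a literal `__GHOST_URL__` reached a page unreplaced, because a value without a scheme or leading slash is resolved relative to the page URL. Whether that is what happened here is not confirmed in this thread. The `render` function below is a hypothetical stand-in for the substitution, not Ghost's actual code.

```python
from urllib.parse import urljoin

SITE_URL = "https://bitcoin-2go.de"  # site URL taken from the thread
page_url = "https://bitcoin-2go.de/link-to-article/"
stored = "__GHOST_URL__/content/images/2021/07/b2go-lightmode.svg"


def render(value: str, site_url: str) -> str:
    """Hypothetical stand-in for the render-time substitution."""
    return value.replace("__GHOST_URL__", site_url)


# Placeholder substituted: the result is an absolute URL.
resolved_ok = urljoin(page_url, render(stored, SITE_URL))

# Placeholder left as-is: it is treated as a path relative to the page.
resolved_raw = urljoin(page_url, stored)

print(resolved_ok)
print(resolved_raw)
```

The second value has the same form as the 404 URLs reported by the crawl.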

It’s not clear from your description what replacement you actually performed. Can you provide a before, after-expected, and after-actual? How did you perform the change, via the API or directly in the database?

Hey Kevin,

thanks for the reply. Okay, let me be more concrete about the changes I made.

My goal was to automatically remove all anchor tags linking to a non-existent WordPress image. So I used a regex to remove all a-tags containing the path /content/images/wordpress or /wp-content/uploads.

All changes were made in the .json file containing the content of my blog.

I have used the following regex:

<a href=\\\\\\"/content/images/wordpress/(.*?)</a>

and

<a href=\\\\\\"/wp-content/uploads/(.*?)</a>
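To show why so many backslashes can be needed when editing the raw file, here is a small sketch. It assumes the HTML sits in an HTML card inside a mobiledoc document, which is itself stored as a JSON string inside the export JSON; the sample post and image path are invented. With two layers of JSON encoding, each quote in the HTML appears in the file preceded by three backslashes, and the regex has to escape each of those.

```python
import json
import re

# Hypothetical HTML card content.
html = (
    '<p><a href="/content/images/wordpress/2020/01/chart.png">'
    '<img src="chart.png"/></a><em>Caption</em></p>'
)

# Assumed shape: mobiledoc is serialized to a string, then embedded in the export.
mobiledoc = json.dumps({"version": "0.3.1", "cards": [["html", {"html": html}]]})
export_text = json.dumps({"posts": [{"mobiledoc": mobiledoc}]})

# In the file text, each HTML quote is now three backslashes plus a quote.
assert '<a href=\\\\\\"/content' in export_text

# Six backslashes in the regex match three literal backslashes in the text.
pattern = re.compile(r'<a href=\\\\\\"/content/images/wordpress/(.*?)</a>')
edited_text = pattern.sub("", export_text)

# Both JSON layers should still parse after the raw-text edit.
post = json.loads(edited_text)["posts"][0]
cleaned_html = json.loads(post["mobiledoc"])["cards"][0][1]["html"]
print(cleaned_html)
```

Parsing the file again after the replacement is a cheap way to confirm that the edit did not cut through an escape sequence and break the JSON.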

What would be the expected result?

The expected result would be that all previously broken images within my articles are now removed. And this is indeed true. They are all removed.

However, I ran a crawl with Ahrefs and received the following unexpected result:

I do not know why, but the path to my site logo is apparently appended to the permalink of every single article I have.

Now I want to get rid of all those 404 errors and have a clean link structure.

Hopefully this makes my actual goal clearer.

Best Regards

You show the regexes you’re using for matching but what are you replacing the matches with?

The actual result of one of the replacements would also be a lot more useful than Ahrefs reports.

I am deleting the matched patterns since all of the anchor tags refer to non-existing images.

Attached are screenshots of the process.

Step 1: Regex match

Step 2: Click “Replace all” leading to the following result

And now we come back to the Ahrefs report. After I deleted all the anchor tags with the broken images, Ghost somehow appends the path to my site logo to all existing permalinks.

This is where the confusion arises.

How does the data look in the database after you’ve imported - both the mobiledoc and rendered html fields? And what is the HTML output when viewed on your site?
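One way to answer this without a database client is to look at both fields in a fresh export. The sketch below assumes the export keeps posts under `db[0].data.posts` with `slug`, `mobiledoc` and `html` keys; that layout, the helper names and the sample post are assumptions for illustration, so adjust them to what your file actually contains.

```python
import json

# Paths that should no longer appear after the cleanup.
NEEDLES = ("/content/images/wordpress", "/wp-content/uploads")


def load_export(path: str) -> dict:
    """Read an export file from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def report(export: dict, slug: str) -> dict:
    """List which needles still occur in a post's mobiledoc and html fields."""
    posts = export["db"][0]["data"]["posts"]  # assumed export layout
    post = next(p for p in posts if p["slug"] == slug)
    return {
        field: [n for n in NEEDLES if n in (post.get(field) or "")]
        for field in ("mobiledoc", "html")
    }


# Made-up sample: mobiledoc was cleaned, but the rendered html was not.
sample = {"db": [{"data": {"posts": [{
    "slug": "link-to-article",
    "mobiledoc": '{"cards":[["html",{"html":"<p><em>Caption</em></p>"}]]}',
    "html": '<p><a href="/wp-content/uploads/old.png">x</a></p>',
}]}}]}

result = report(sample, "link-to-article")
print(result)
```

Checking the two fields separately shows whether the stored mobiledoc and the rendered html disagree for a given post.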

For the answer, I will refer to the article for which I have sent the screenshot above.

Question 1: How does the data look in the database after you’ve imported?

The mobiledoc part looks exactly the same:

The HTML part looks fine as well:

The rendered result looks fine too. Since I deleted neither the surrounding <p> tag nor the <em> tag, the caption is still visible. However, this is fine.