Advice on importing 250+ MB of posts?


#1

I am trying to import over 250 MB of articles from an old CMS into Ghost 2.2.4. I successfully imported 50, 100 and 500 articles. Now I wanted to have a go at the full import of almost 30,000 articles, but the import fails for no clear reason. Chrome’s console just says “Error: Server was unreachable”.

Looking at GitHub, it seems others have already asked for a CLI import option, so that does not seem to be an option at the moment.

Is importing chunks of data (e.g. 1000 articles at a time) the only option? How would you approach such an import?

To be clear: this is not a supported configuration. We use HardenedBSD (a fork of FreeBSD) as the operating system and H2O as the web server. Everything seems to run smoothly and, as mentioned above, even the import works (with smaller files).


#2

Very large imports currently suffer from the fact that we don’t use polling, tracked here. And if your process does not have enough memory, it will probably die with such a large file. The maximum size I have tested was 25 MB.

So I would try to import the 250 MB file locally with a script, and you need to ensure that you give the Node process enough memory. As soon as you have imported the file successfully, you can dump your MySQL database and upload it to your server.

The alternative is splitting the big file into multiple smaller files (as you tried already), but I guess that takes very long.
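A minimal sketch of the splitting approach, in Python. It assumes the Ghost export layout `{"db": [{"meta": …, "data": {"posts": […], …}}]}` and only slices the `posts` array, so all other tables (tags, users, relations, etc.) are copied whole into every chunk; the `split_export` helper and the output file naming are hypothetical, not part of Ghost:

```python
import copy
import json


def split_export(path, chunk_size=1000):
    """Split a Ghost JSON export into smaller import files.

    Assumes the export layout {"db": [{"meta": ..., "data": {...}}]}
    and slices only the "posts" array; everything else is copied
    into each chunk unchanged. Yields the written file names.
    """
    with open(path) as f:
        export = json.load(f)

    posts = export["db"][0]["data"]["posts"]
    for i in range(0, len(posts), chunk_size):
        # Deep-copy the whole export so meta and the other tables survive,
        # then replace the posts array with the current slice.
        chunk = copy.deepcopy(export)
        chunk["db"][0]["data"]["posts"] = posts[i:i + chunk_size]

        out = f"{path}.part{i // chunk_size + 1}.json"
        with open(out, "w") as f:
            json.dump(chunk, f)
        yield out
```

Each yielded file can then be uploaded through the Labs import screen one at a time.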


#3

Thanks a lot. I went with the chunked approach. That went so-so.

FYI:
I tried to import 1,000 posts at a time, and sometimes only part of a batch was imported and the unspecified (network?) error showed up.

Trying again and again, I managed to import almost everything. But 9,000+ entries were duplicates, triplicates, etc. I could easily identify them in the database because their slugs end in -2, -3, etc. Thus, I managed to delete them with this SQL:

DELETE p, a FROM posts p JOIN posts_authors a ON p.id = a.post_id WHERE p.slug REGEXP '-[2-9]$';

Now, I have 27863 posts in the database and have to identify the ones that were not exported/imported.
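One way to find the posts that were not imported is to compare the slugs in the old export file against the slugs now in the database (e.g. the output of `SELECT slug FROM posts`). A small Python sketch, again assuming the `{"db": [{"data": {"posts": […]}}]}` export layout; `missing_slugs` is a hypothetical helper, not a Ghost API:

```python
import json


def missing_slugs(export_path, imported_slugs):
    """Return slugs present in the old export but absent from the live DB.

    imported_slugs would come from e.g. `SELECT slug FROM posts`.
    Assumes the Ghost export layout {"db": [{"data": {"posts": [...]}}]}.
    """
    with open(export_path) as f:
        exported = {p["slug"] for p in json.load(f)["db"][0]["data"]["posts"]}
    return sorted(exported - set(imported_slugs))
```

The returned slugs identify the posts to re-import in a final, smaller chunk.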


#4

The network error you see in the browser is a timeout on the browser’s part; it doesn’t mean that the import has failed. Although the browser drops the connection, the import is still being processed by the server and will continue in the background. This is probably why you ended up seeing duplicates when you re-tried the same import.

This is why the polling solution was mentioned - we want to be able to show the progress of the background import process rather than having to make the browser wait until the import has finished.


#5

Thank you @Kevin! Now I understand the connection to the polling solution.


#6

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.