Looking for an HTML to Ghost process

Hello!

A friend has a website with MANY HTML pages, with words/photos/YouTube videos in them.
I have a list of every page.

Does anyone know of a good process to get these across to Ghost without having to manually create 500+ pages?

There must be a good tool everyone uses?

Thanks in advance
Sam

Hello @samjamesnz. That’s an interesting question! Unfortunately, I’m afraid I don’t have a perfect solution for you.

But you could convert all of the HTML pages into a JSON file and then import that file into Ghost, which would save you the tedious work of creating the pages one by one in the admin dashboard.

But then, you still would need to convert all of this existing HTML content into a JSON file, ready for importing. You can have a look at what one of these JSON files looks like by starting with the demo installation of Ghost and exporting the content from the admin dashboard to your computer. Open it up in a text editor and you’ll see the structure of the file and where everything would need to go, for Ghost to properly create individual pages as you intend.
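If the export is too big to read comfortably in a text editor, a small script can print just its structure. This is a rough sketch that assumes nothing about Ghost's field names: it walks whatever JSON it is given and lists each key with its type. `summarize` and `summarize_file` are names I made up for illustration.

```python
import json


def summarize(node, depth=0, max_depth=4):
    """Return indented lines describing the keys of a nested JSON value."""
    lines = []
    if depth > max_depth:
        return lines
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append("  " * depth + f"{key}: {type(value).__name__}")
            lines.extend(summarize(value, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Only inspect the first entry; items in one list usually share a shape.
        lines.extend(summarize(node[0], depth, max_depth))
    return lines


def summarize_file(path):
    """Load a JSON file from disk and return its structure as one string."""
    with open(path, encoding="utf-8") as handle:
        return "\n".join(summarize(json.load(handle)))
```

Calling `print(summarize_file("my-export.json"))` (the file name is a placeholder) should show where titles, slugs and content live in your own export.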

There might be some automated or semi-automated conversion process that can parse the HTML pages and convert them all into this JSON file for you, but I’m not aware of any, off the top of my head. I’ll update this post if I find something. It would certainly be possible to write some code that could automate this process, but then it’s a matter of weighing up whether that will take longer and/or more expertise than copying and pasting the content into a JSON file manually.
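For what it's worth, the "write some code" route does not have to be big. Below is a minimal Python sketch of assembling an import file. Big caveat: the layout (`meta`, `data`, `posts`) and the field names (`title`, `slug`, `html`, `type`, `status`) are my assumptions from memory and may differ between Ghost versions, so check them against an export from your own install. `ghost_version` is deliberately a required argument so that you copy it from your export instead of trusting a value from me. All function names are hypothetical.

```python
import json
import re
import time


def slugify(name):
    """Turn a file name or title into a lowercase, hyphen-separated slug."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_import(pages, ghost_version):
    """Build an import-shaped dict from (title, html) pairs.

    The key names below are assumptions; verify them against a real export.
    """
    posts = []
    for index, (title, html) in enumerate(pages, start=1):
        posts.append({
            "id": index,
            "title": title,
            "slug": slugify(title),
            "html": html,
            "type": "page",     # assumed marker for a page rather than a post
            "status": "draft",  # import as drafts so nothing goes live unreviewed
        })
    return {
        "meta": {"exported_on": int(time.time() * 1000), "version": ghost_version},
        "data": {"posts": posts},
    }


def write_import(pages, ghost_version, out_path):
    """Serialize the import dict to a JSON file on disk."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(build_import(pages, ghost_version), handle, indent=2)
```

I would try it with two or three pages on a throwaway install first, before running all 500+ through it.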

Keep us updated. I am interested to know how you go with this.

When you say you have a “list” of all the pages, do you mean a list of URLs? Does your friend have copies of all the HTML files on their local computer? Or do they not have access to these files, or not know how to get them from the server or wherever the site is hosted?

If they don’t have the files and don’t know how to access them directly, the first step might be either figuring out how to access the files (SFTP) or alternatively using some software to “scrape” their own website, which will effectively download all of their HTML files, photos etc. so that they do have a local copy of the entire website. This certainly isn’t the most efficient way to get a copy of your own website files, but it might be the easiest for somebody who is not familiar with more streamlined processes.
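If the list is a list of URLs, the download step can be a short script rather than a full scraping tool. Here is a sketch using only Python's standard library. It fetches just the pages in the list (not the photos they reference) and mirrors the URL paths into a local folder. `local_path_for`, `download_all` and the `site_copy` folder name are placeholders of mine.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def local_path_for(url, root):
    """Map a page URL to a file path under root, mirroring the URL path."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    if not path or path.endswith("/"):
        # Directory-style URLs get an explicit file name.
        path += "index.html"
    return Path(root) / parsed.netloc / path


def download_all(urls, root="site_copy"):
    """Fetch each URL and save the raw response bytes to disk."""
    for url in urls:
        target = local_path_for(url, root)
        target.parent.mkdir(parents=True, exist_ok=True)
        with urlopen(url, timeout=30) as response:
            target.write_bytes(response.read())
```

It has no retries or rate limiting, so for a large site it may be kinder to the server to add a short pause between requests.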

I had to do a bulk import when migrating to Ghost. We were able to extract key aspects of each story (e.g., title) programmatically. We then embedded the story minus those elements within an HTML card.
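Roughly, that kind of extraction could look like the sketch below. This is not our actual script. It assumes each page has a `<title>` and a `<body>`, uses regular expressions (fragile on messy HTML), and wraps the body in a single HTML card using the Mobiledoc layout as I remember it. Newer Ghost versions use a different editor format, so compare against an export of a post that contains an HTML card before relying on this.

```python
import html
import json
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body[^>]*>(.*)</body>", re.IGNORECASE | re.DOTALL)


def split_page(page_html):
    """Return (title, body_html) for one page, with fallbacks if tags are missing."""
    title_match = TITLE_RE.search(page_html)
    body_match = BODY_RE.search(page_html)
    title = html.unescape(title_match.group(1).strip()) if title_match else "Untitled"
    body = body_match.group(1).strip() if body_match else page_html
    return title, body


def html_card_mobiledoc(body_html):
    """Wrap raw HTML in one HTML card and return it as a JSON string.

    The card name and section layout are assumptions; verify against an export.
    """
    doc = {
        "version": "0.3.1",
        "atoms": [],
        "cards": [["html", {"html": body_html}]],
        "markups": [],
        "sections": [[10, 0]],  # one card section pointing at card index 0
    }
    return json.dumps(doc)
```

The title goes into the post's title field and the returned string into its content field, so the old markup is carried over untouched inside the card.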

Of course, this does not work for photos that are hosted locally, but we reduced the amount of hand work that needed to be done significantly. You could process the files to extract photo references and then programmatically build a Ghost Gallery object for them.
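As a starting point for the photo step, here is a sketch that pulls the image references out of a page with Python's built-in HTML parser and separates the ones that point at the old site's own files. It stops short of building the gallery card itself, since I would not want to guess its field names. `ImageCollector`, `image_sources` and `local_images` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(page_html):
    """Return all image references found in the page."""
    collector = ImageCollector()
    collector.feed(page_html)
    return collector.sources


def local_images(page_html):
    """Return only references with no host, i.e. files served by the old site."""
    return [src for src in image_sources(page_html) if not urlparse(src).netloc]
```

The `local_images` list is effectively the to-do list of photos that still need uploading to Ghost.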

We also created an “archivist” staff member and a Stacks tag. All imported stories are organized that way, making it obvious that they are legacy content.