For anyone looking for a full-text, fully configurable search engine, Algolia does the job. It has a web scraper, so you can scrape all your posts into the index. Very smooth.
If you have a paywall or members-only posts, the Algolia scraper can't reach the content of the restricted posts, so it never gets indexed. Frustrating.
How to get content behind a paywall into the search index?
Cathy has a post on how to do this if you’re comfortable with JavaScript.
I implemented it with a no-code solution using Make. Here’s the scenario for adding new posts to the index when they are published in Ghost…
- Webhook Post Published
- Strip HTML from content
- Create the JSON object to send to Algolia
- Send the JSON object to Algolia’s API along with the keys
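For anyone who'd rather see the middle two steps as code, here is a rough Python sketch of "strip HTML" and "create the JSON object". It is not what Make does internally, just an equivalent. The Ghost field names it reads (`id`, `title`, `url`, `html`, `excerpt`, `custom_excerpt`) are my assumption about what the webhook delivers, and `strip_html` and `build_record` are hypothetical helper names.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags plus <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join text nodes with a space so adjacent paragraphs don't run
        # together, then collapse repeated whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the plain text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_record(post):
    """Map a Ghost post dict (assumed field names) to the Algolia record."""
    return {
        "url": post["url"],
        "content": strip_html(post.get("html") or ""),
        "headline": post["title"],
        "objectID": post["id"],
        "description": post.get("custom_excerpt") or post.get("excerpt") or "",
    }
```

Because text nodes are joined with spaces, inline tags can leave a stray space before punctuation. That is harmless for a search index but worth knowing if you display the text.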
The POST request that submits a post to Algolia goes to
https://{APPLICATION_ID}.algolia.net/1/indexes/{INDEX_NAME}
with two headers, x-algolia-application-id and x-algolia-api-key, and this JSON payload:
{
"url":"GHOST_URL",
"content":"STRIPPED_HTML",
"headline":"HEADLINE",
"objectID":"GHOST_POST_ID",
"description":"GHOST_EXCERPT"
}
You can add more fields here; Algolia will happily index whatever fields you send it. The objectID (the Ghost post ID) is the unique key: if you submit the post again, Algolia will update the existing record instead of creating a new one.
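If you want to test the call outside Make, the request above could be built like this in Python, using only the standard library. `build_algolia_request` is a hypothetical helper name, and the application ID, API key, and index name are placeholders you would supply yourself.

```python
import json
import urllib.parse
import urllib.request


def build_algolia_request(app_id, api_key, index_name, record):
    """Build (but do not send) the POST that submits one record."""
    url = (
        f"https://{app_id}.algolia.net/1/indexes/"
        f"{urllib.parse.quote(index_name, safe='')}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(record).encode("utf-8"),
        headers={
            "x-algolia-application-id": app_id,
            "x-algolia-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send it (needs a key that is allowed to write to the index):
# with urllib.request.urlopen(build_algolia_request(...)) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

Keeping the build step separate from the send step makes it easy to inspect the URL, headers, and body before anything hits your index.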
For the initial index build, replace “Watch post” with “Search Posts” and filter on Status = published. Run it once to submit all existing posts and create the index.
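If you script the initial build instead of running it through Make, one option is Algolia's batch endpoint (POST to `/1/indexes/{INDEX_NAME}/batch`), which takes many records in one call. To be clear, that is a different technique from the Make scenario, which submits one post per request. In this sketch, `build_batch_payload` and the `to_record` callback are hypothetical names, and I'm assuming each post dict carries a `status` field.

```python
def build_batch_payload(posts, to_record):
    """Build the JSON body for Algolia's batch endpoint.

    posts:     iterable of Ghost post dicts (assumed to have "status")
    to_record: function mapping one post to an Algolia record that
               includes "objectID"
    """
    requests = [
        # "updateObject" adds the record, or replaces it if the objectID
        # already exists, so re-running the build is safe.
        {"action": "updateObject", "body": to_record(post)}
        for post in posts
        if post.get("status") == "published"  # skip drafts and scheduled
    ]
    return {"requests": requests}
```

Usage would look like `build_batch_payload(all_posts, my_record_builder)`, with the result sent as the JSON body using the same two headers as the single-post call.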
And, as @Cathy_Sarisky writes in the blog post, restrict the post content from being returned by the API. Otherwise bad humans can pull your paywalled content straight out of Algolia. The content will still be searchable.
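One way Algolia supports this is the `unretrievableAttributes` index setting, which keeps an attribute searchable but leaves it out of search responses. Whether that matches the exact approach in the linked post is my assumption. Here is a sketch of the settings call in Python, with `build_settings_request` as a hypothetical helper name and `content` being the field from the payload above.

```python
import json
import urllib.parse
import urllib.request


def build_settings_request(app_id, admin_api_key, index_name):
    """Build (but do not send) the PUT that hides "content" from results."""
    url = (
        f"https://{app_id}.algolia.net/1/indexes/"
        f"{urllib.parse.quote(index_name, safe='')}/settings"
    )
    body = {"unretrievableAttributes": ["content"]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-algolia-application-id": app_id,
            # Changing settings needs a key with settings permission,
            # for example the admin key. Never ship that key to the browser.
            "x-algolia-api-key": admin_api_key,
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

Note that searches made with the admin key ignore this setting, so the site's search box should use a search-only key.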