Vector database for search with AI embeddings

Hello Community,

I am planning to improve Ghost's search by adding a chatbot that does more than a keyword search: it should make sense of the content and give proper answers (while still referring to the article for further information). My content is pretty long, and finding the right answer can be tricky, especially when the keyword is not part of the headline.

This can be done with chatbot software like Flowise (open source), for example. However, data-wise, I would need my Ghost content to be stored in a vector database like Pinecone, Qdrant, SimpleStore, Supabase, etc., and then use embeddings (I'd do this with OpenAI) to search and cluster the knowledge and to generate answers.
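To make the retrieval step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embeddings API call (it only matches literal words, which is exactly what real embeddings are meant to improve on), the plain list stands in for the vector database, and the sample posts are made up.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embeddings API: unit-length bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def chunk(text, max_words=50):
    """Split long posts into smaller pieces so each vector covers one topic."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_index(posts):
    """One row per chunk, keeping the URL so answers can link back to the article."""
    index = []
    for post in posts:
        for piece in chunk(post["text"]):
            index.append({"url": post["url"], "text": piece, "vector": embed(piece)})
    return index


def search(index, query, top_k=3):
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row["vector"]), reverse=True)
    return ranked[:top_k]


# Made-up sample content, not real Ghost data.
posts = [
    {"url": "/sourdough/", "text": "Feed the starter with flour and water every day until it bubbles."},
    {"url": "/bike-repair/", "text": "Patch the inner tube, then inflate the tire and check for leaks."},
]
index = build_index(posts)
best = search(index, "how do I fix a flat tire", top_k=1)[0]
print(best["url"])
```

In a real setup the top chunks would be passed to the language model as context, and the stored URL is what lets the answer point back to the full article.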

I know there is an Algolia integration available, but I don't find Algolia the best solution for what I'm trying to achieve. It is a nice way to improve indexing and auto-complete searching, and for sure a step up from the native search. But it is not made to provide proper AI-generated short answers based on my content.

So my idea is to somehow duplicate my Ghost content into a vector database to use it from there for the desired search queries. Not the smoothest solution because there would be a lot of syncing required.

Does anyone have an idea for a different approach, or has anyone done something similar? Is there maybe an easy syncing solution available between Ghost and a vector store?

Same approach as the Algolia integration? Set up webhooks so that new/updated content from Ghost gets added to the vector database?
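A rough sketch of what the receiving side of those webhooks could look like, in Python. The event names are Ghost's webhook events, but the payload shape (`post.current` / `post.previous`) is an assumption worth checking against a real delivery, and `handle_webhook` and the dict used as a store are hypothetical. As far as I know the event name is not part of the payload, so the sketch assumes the caller knows it, for example by registering one URL per event.

```python
# Events that should (re)index a post, and events that should remove it.
UPSERT_EVENTS = {"post.published", "post.published.edited"}
DELETE_EVENTS = {"post.unpublished", "post.deleted"}


def handle_webhook(event, payload, store):
    """Map one webhook delivery to an action on a (hypothetical) store.

    Assumed payload shape: {"post": {"current": {...}, "previous": {...}}}.
    """
    post = payload.get("post", {})
    current = post.get("current") or {}
    previous = post.get("previous") or {}

    if event in UPSERT_EVENTS:
        # A real pipeline would chunk and embed the HTML here.
        store[current["id"]] = {
            "title": current.get("title", ""),
            "html": current.get("html", ""),
        }
        return "upserted"

    if event in DELETE_EVENTS:
        # Assumption: for a deleted post, "current" may be empty and the id
        # is only available under "previous".
        post_id = current.get("id") or previous.get("id")
        store.pop(post_id, None)
        return "deleted"

    return "ignored"
```

The function is kept free of any web framework on purpose; it can be called from whatever receives the HTTP POST (a small server, n8n, a serverless function).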

Good point. Will try to do that. My current challenge is to find a solution that not only writes new content as new vectors, but also deletes the old ones when a post has been updated.

I've seen solutions like Airbyte doing that, but I'm still trying to gather more information on how vectors work.
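One common way to handle the update case is to tag every vector with the ID of the post it came from, then delete everything with that ID before inserting the fresh chunks. A minimal sketch, assuming a hypothetical in-memory `VectorStore` class (real vector databases generally offer some form of delete-by-metadata or delete-by-ID, but the exact call differs per product):

```python
class VectorStore:
    """Hypothetical in-memory stand-in for a vector database collection."""

    def __init__(self):
        # vector id -> row; a real row would also hold the embedding vector
        self.rows = {}

    def delete_by_post(self, post_id):
        """Remove every chunk that belongs to the given post."""
        stale = [vid for vid, row in self.rows.items() if row["post_id"] == post_id]
        for vid in stale:
            del self.rows[vid]
        return len(stale)

    def replace_post(self, post_id, chunks):
        """Delete the old chunks first, then insert the new ones.

        Deleting first matters when the updated post is shorter: otherwise
        leftover chunks from the old version would stay searchable.
        """
        removed = self.delete_by_post(post_id)
        for i, text in enumerate(chunks):
            self.rows[f"{post_id}:{i}"] = {"post_id": post_id, "text": text}
        return removed


store = VectorStore()
store.replace_post("p1", ["intro", "middle", "end"])
store.replace_post("p2", ["other post"])
removed = store.replace_post("p1", ["rewritten intro"])  # post p1 got shorter
print(removed, sorted(store.rows))
```

The same `delete_by_post` call covers the case where a post is deleted or unpublished entirely.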

Ghost has a webhook for deleted content, but I can’t tell you what to do on the vector side. :)

How about accessing members-only content via the API or a webhook? As far as I can see, the API can only access content that is published and not behind a paywall.

You could use either a webhook (which ignores content restrictions) or the Admin API, which can retrieve content for any tier. (And also drafts, so you'll probably want some filtering!)
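A sketch of that filtering in Python, under a few assumptions: the helper names are made up, the `status` and `visibility` values are the ones I'd expect on a Ghost post object and should be checked against your own data, and authentication is left out entirely (the Admin API needs a signed token, which this does not show).

```python
from urllib.parse import urlencode


def admin_posts_url(site_url, page=1):
    """Build an Admin API browse URL that asks for published posts only."""
    query = urlencode({
        "filter": "status:published",  # server-side: skip drafts and scheduled posts
        "formats": "html",             # ask for rendered HTML in the response
        "limit": 50,
        "page": page,
    })
    return f"{site_url.rstrip('/')}/ghost/api/admin/posts/?{query}"


def indexable(posts, allowed_visibility=("public", "members", "paid", "tiers")):
    """Client-side safety net: keep only published posts with an allowed visibility."""
    return [
        p for p in posts
        if p.get("status") == "published" and p.get("visibility") in allowed_visibility
    ]


# Made-up sample data for illustration.
sample = [
    {"id": "1", "status": "published", "visibility": "public"},
    {"id": "2", "status": "draft", "visibility": "public"},
    {"id": "3", "status": "published", "visibility": "paid"},
]
print(admin_posts_url("https://example.com/"))
print([p["id"] for p in indexable(sample)])
```

Narrowing `allowed_visibility` is also a simple way to keep paid content out of a chatbot that anonymous visitors can query.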

Perfect. Thanks so much for that @Cathy_Sarisky

You can use n8n for process automation. I do not have experience with Ghost, but n8n has a Ghost integration. I am into AI, RAG and ML, so I can help you with n8n and vector databases as well.

Thank you. I was going for the open source solution airbyte.com. However, I'll have a look at n8n, pipedream and activepieces (I've been a heavy make.com user so far).