TidySearch: open-source, ad-free, Java-free, no-tracking search engine

This is not an advertisement.

Hi, I love Ghost (Self hosting) very much. You all have helped a lot in getting our news site running and again, I thank you all.

So, as a way to give back, I’d like to tell you about this project called TidySearch. I’ve been building it for over a year now and we are approaching 2,000 sites. As of the time of this post it has indexed 444,372 links (when the corporate controllers don’t block it).

I was over at https://explore.ghost.org/ just now and even found MPAQ News :smiling_face_with_three_hearts: . According to the page, “Last week, 13,806 brand new publications got started with Ghost.” I’ve added a few of them, but I’m wondering how I could get a list of sites, preferably with the RSS links (or, better still, Site Name, Website, RSS, Language), to import into the database to help boost all these hardworking folks.

At this point, it is still hosted entirely in one location, but I’d like to someday make it a decentralized network (I’m still trying to make the API) to help kill those corporate-controlled systems, as the fediverse and XMPP are doing.

So, if it is possible, I would like some kind of list that I can add to the search engine, again, to help the publications, because TidySearch can help them.

If you would like to visit it, it is at https://search.mpaq.org, and my email/XMPP is bob@mpaq.org

As far as I am aware, these are npm installs (that’s at least the answer I got in the past at some point). The number roughly fits https://www.npmjs.com/package/ghost from last week.

Quite easily. Have a look at https://explore.ghost.org/sitemap.xml :wink:

That will list all sites in Ghost Explore. Then you can grab the URL from there.

Keep in mind crawling etiquette though.
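For anyone who wants to try this, here is a rough sketch of pulling the `<loc>` entries out of a sitemap and checking a URL against a site's robots.txt before fetching it. It uses only the Python standard library. The function names and the `TidySearchBot` user agent are made up for illustration, and both functions take the already-downloaded text so you can fetch it however you like.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


def allowed(robots_txt, user_agent, url):
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical sample data, not the real explore.ghost.org sitemap.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a/</loc></url>"
        "<url><loc>https://example.com/b/</loc></url>"
        "</urlset>"
    )
    robots = "User-agent: *\nDisallow: /private/\n"
    for url in sitemap_urls(sample):
        if allowed(robots, "TidySearchBot", url):
            print("ok to fetch:", url)
```

If the sitemap turns out to be an index of other sitemaps, the same function returns those child sitemap URLs, so you would call it again on each one.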

Unfortunately, that just lists “categories” but thanks anyway. Getting old adding one at a time :old_woman:

TS is very careful about obeying the rules, unlike those corp scums running AI and stealing people’s hard work. If there is the meta tag that says noindex, nofollow … and all that, TS automatically sets the bot id high so it never gets hit again, LOL
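For anyone curious, a check like that could look roughly like the sketch below, assuming the directives arrive in a `<meta name="robots">` tag. This is not TidySearch's actual code; the class and function names are made up, and it only uses Python's built-in `html.parser`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html_text):
    """Return the set of robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives


def may_index(html_text):
    """False when the page asks not to be indexed ("none" implies noindex)."""
    directives = robots_directives(html_text)
    return "noindex" not in directives and "none" not in directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, nofollow"></head>'
    print(sorted(robots_directives(page)), may_index(page))
```

Sites can also send the same directives in an `X-Robots-Tag` HTTP header, which this sketch does not look at.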

At the moment, it just uses the RSS feeds because sites are so infested with so much garbage that I have yet to come up with a “site scraper”.
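As an illustration of the RSS-only approach, here is a small sketch that reads an RSS 2.0 feed and builds the kind of record mentioned earlier in the thread (Site Name, Website, RSS, Language) plus the item links. The field names and the `parse_rss` function are assumptions for the example, not TidySearch's real schema.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text, feed_url):
    """Return (site_record, item_links) for an RSS 2.0 document."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document")

    def text(node, name):
        # Missing elements become empty strings instead of None.
        return (node.findtext(name) or "").strip()

    site = {
        "site_name": text(channel, "title"),
        "website": text(channel, "link"),
        "rss": feed_url,
        "language": text(channel, "language"),  # optional in RSS 2.0
    }
    links = [text(item, "link") for item in channel.iter("item")]
    return site, [link for link in links if link]


if __name__ == "__main__":
    # Hypothetical feed, just to show the shape of the output.
    sample = (
        '<rss version="2.0"><channel>'
        "<title>Example News</title>"
        "<link>https://example.com/</link>"
        "<language>en</language>"
        "<item><title>Post</title><link>https://example.com/post/</link></item>"
        "</channel></rss>"
    )
    print(parse_rss(sample, "https://example.com/rss/"))
```

Atom feeds use a different layout (`<feed>` and `<entry>`), so they would need their own branch.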