Substack import 429 error

I’m using the migrate CLI to convert Substack posts for import into Ghost. Unfortunately, it’s failing, with a string of errors like this:

  Error: Unable to scrape URL <substack post URL>
  at ScrapeError (/usr/local/lib/node_modules/@tryghost/migrate/node_modules/@tryghost/mg-webscraper/lib/WebScraper.js:24:17)
  at WebScraper.scrape (/usr/local/lib/node_modules/@tryghost/migrate/node_modules/@tryghost/mg-webscraper/lib/WebScraper.js:109:23)
  at runMicrotasks (<anonymous>)
  at processTicksAndRejections (internal/process/task_queues.js:93:5)
  at async WebScraper.scrapeUrl (/usr/local/lib/node_modules/@tryghost/migrate/node_modules/@tryghost/mg-webscraper/lib/WebScraper.js:126:24)
  at async Task.task (/usr/local/lib/node_modules/@tryghost/migrate/node_modules/@tryghost/mg-webscraper/lib/WebScraper.js:160:59) {
errorType: 'ScrapeError',
scraper: 'Web',
url: '<Substack post URL>',
code: 'HTTPERROR',
statusCode: 429
}

HTTP 429 is “Too Many Requests”, which makes sense here, since I’m trying to scrape hundreds of posts from Substack. Is there any way to rate limit the scraper so that it fetches the posts over a few minutes instead of all at once?

I ended up splitting posts.csv into several files of about 20 posts each and running the migrate command on each file individually. It’s not clever, but it works.
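
For anyone who would rather script the split than do it by hand, here is a rough sketch in Python. It assumes posts.csv has a single header row that every smaller file should repeat. The function name `split_csv`, the output file naming, and the default chunk size of 20 are my own choices for illustration and are not part of the migrate tool.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, chunk_size=20):
    """Split a CSV into files of at most chunk_size data rows each.

    The header row is repeated at the top of every output file.
    Returns the list of files written. (Hypothetical helper, not part
    of the migrate CLI.)
    """
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # csv.reader copes with quoted fields that contain commas or newlines,
    # which a naive line-by-line split would cut in half.
    with source.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return []  # empty input, nothing to split
        rows = list(reader)

    written = []
    for start in range(0, len(rows), chunk_size):
        index = start // chunk_size + 1
        part = out_dir / f"{source.stem}-{index:03d}.csv"
        with part.open("w", newline="", encoding="utf-8") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(rows[start:start + chunk_size])
        written.append(part)
    return written


if __name__ == "__main__":
    import sys

    # Usage: python split_posts.py posts.csv output_directory
    if len(sys.argv) >= 3:
        for path in split_csv(sys.argv[1], sys.argv[2]):
            print(path)
```

Each file it writes can then be passed to the migrate command in turn, the same way as the hand-split files.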