Dear Ghost team,
Y’all are awesome, and I’m loving all the new features. What I’m not loving is accidental breakage on a Friday that won’t be followed by a fixed release until Monday. Friday’s roll-out of 5.61.0 is an example. It broke the admin panel, making new installs unusable. There was a bunch of extra forum traffic this weekend as new users with fresh installs (and some folks who’d just run an update) discovered that their admin panel was broken. How many folks tried out Ghost and gave up, without finding the forum? This was not the first time that the latest version of Ghost has been broken over a weekend. A weekend’s a long time to stay broken.
Ghost Pro doesn’t seem to get updated to the latest version immediately (and good thing - there’d have been a lot of unhappy users over the weekend!), but having the latest version shipped out to anyone installing via the CLI is making things harder for self-hosters, and making it harder for the volunteers on the forum to keep up with community support needs. Friday releases increase the length of time that the self-hosted version risks being broken when a breaking bug sneaks out into an npm build, since npm build problems on Fridays seem to get resolved on Mondays.
I like weekends as much as the next person, and I’m not asking the Ghost team to squash bugs on weekends! But… if the team isn’t going to fix new and breaking bugs in new releases over the weekend, maybe Friday is not the right day for new releases, or maybe the behavior of the Ghost CLI should be changed to have a ‘bleeding edge’ version and a ‘currently active on Ghost Pro’ version, with the safe version set as the default.
My unsolicited two cents.
@Cathy_Sarisky I agree. I used to be a tech lead for the largest pet adoption website in the US and we made that a formal rule:
- No production deploys on Fridays (outside of emergencies)
We’d had some on-call weekends ruined prior to instituting that rule, but none afterwards. It worked so well, we added another:
- No deploys in the last hour of the work day. Now with multi-timezone teams, that doesn’t make as much sense, but I think it’s still worth considering whether anyone is going to be around to monitor and possibly roll back a release immediately after it goes out.
Thanks for taking the time to put together this feedback.
I wanted to shed a bit more light on our release and rollout process, which will hopefully enable you, and others, to match your own processes in a way that will give you some of the confidence and stability you are looking for:
- As I’m sure you know, the Ghost team commits to the main repository continuously throughout the week.
- On Friday, 4pm UK time, we cut a new release.
- On Monday that release is rolled out to 10% of our Pro users.
- On Tuesday that release is rolled out to the remaining 90% of Pro users.
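For anyone who wants to mirror that kind of staged rollout on their own fleet of sites, here is a minimal sketch. It is an illustration only, not how Ghost(Pro) actually picks its 10%: the hashing scheme and the names `rollout_bucket`, `rollout_percent` and `is_enabled` are all assumptions made up for this example.

```python
import hashlib


def rollout_bucket(site_id: str) -> int:
    """Map a site id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(site_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def rollout_percent(days_since_start: int) -> int:
    """Hypothetical schedule: 10% on the first day, everyone from the next day."""
    if days_since_start < 0:
        return 0
    if days_since_start == 0:
        return 10
    return 100


def is_enabled(site_id: str, days_since_start: int) -> bool:
    """A site gets the release once its bucket falls under the current percentage."""
    return rollout_bucket(site_id) < rollout_percent(days_since_start)
```

Because the bucket is derived from the site id, a site that gets the release on the first day keeps it on the second, and the first-day cohort stays roughly the same size from one release to the next.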
In nearly all cases, the release that is cut on Friday is exactly the same release that is rolled out to Pro users. On some occasions we discover bugs that require a patch release between weekly release and rollout, but this is rare, and nearly always unrelated to the actual release, usually coming from support tickets raised by Pro users. We also patch bugs like this throughout the week.
Regarding the issue you cited with release 5.61.0, this was an unlucky bug in the build process that meant the 4pm release was completely broken. Once it was discovered, an engineer came online out of hours to deprecate the version at 11pm that evening, which prevented anyone else being affected. A fixed build was then released first thing on Monday. It is very unusual to have an issue like this that prevents new sites being created.
We take bugs and issues, and their impact, extremely seriously and fix them with a speed and priority that in my own experience is unrivalled. That being said, prevention is always better than the cure, and we are also always looking to improve our testing to avoid these types of issues.
I really appreciate the time our experts put in to support Ghost users here on the forum and GitHub, and part of what makes being an Open Source product great is that when we do have unforeseen issues we have an active community who can raise them with us quickly before the impact spreads too far.
@Nick_Moreton Nick, thanks for taking the time to reply! And big kudos to the engineer who did the out-of-hours deprecation.
I very much appreciate how fast the Ghost Pro team is to patch the easier bugs as they get identified.
I didn’t intend to suggest that Ghost Pro users were getting a different release, just that they get a better release schedule. Is there a reason for cutting the npm release on Friday at the end of business? Could it not be cut at the start of business on Monday, just before the Ghost Pro release starts? I do most of my work as a small team of one, but my experience is very similar to Mark’s larger group experience - deploys late in the work day/week are undesirable. I try not to break stuff right before bedtime! Not trying to be critical, just genuinely curious.
No worries :)
In terms of releasing on Friday rather than Monday, the truth is that we tried the alternatives and we learned this was the most healthy and effective way for us to work and ship quality code. We used to have the hard “no releases on Friday” rule and would release on Monday, but this actually led to more problems - people would overwork, they would push changes on the weekend and ultimately this led to more bugs and issues.
Obviously the original issue you mentioned affected new installs, but putting that aside, when we talk purely about updating Ghost I would encourage you, and others for whom Ghost is such an important part of their stack, to consider switching up your own update schedule to match our schedule for rolling out on Pro. So rather than pulling updates on Friday, hold off until Monday or Tuesday. We know some users stay 1-2 weeks behind on their installed releases for this very reason.
The Friday release is really for us to guide development cycles, and whilst we have enough confidence in what we release that we roll it out to our Pro instances at the earliest opportunity, we have no expectation that those self-hosting should seek to update immediately.
There should be no sort of automatic update enforcement in any of our self-hosting tools and systems that I am aware of, but please let me know if I’ve got that wrong.
One part worked well: after the release was deprecated, `ghost update` would no longer update to that release, so self-hosters stopped being updated to it from that point on. There wasn’t a visible change on the GitHub Releases page, but at least those upgrades were stopped.
After this happened, I reviewed the `.npmignore` file to see if there were more deadly regexes that could cause the same kind of failure in the future and I don’t see any.
I wonder if there’s a feasible way to issue a release first as a beta, followed by automated promotion to stable unless the process is stopped because significant issues are reported. (Maybe 24 hours later?)
With the (Pro) rollout, the change goes out to 10% before 100%. Some kind of beta channel could allow a percentage of self-hosters to opt in, so that only a percentage of them would get those releases before they’re stable, also minimizing the upgrade risk for that group. The question is: would enough people opt in to get the releases earlier to be a useful testing cohort and make the extra release step worth the implementation effort?
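To make the beta-then-promote idea concrete, here is a rough sketch of the promotion rule. None of this exists in Ghost or the Ghost CLI as far as I know: the `Release` class, the `halted` flag, the `channel` function and the 24-hour soak period are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Release:
    version: str
    published_at: datetime
    halted: bool = False  # set by a human when significant issues are reported


def channel(release: Release, now: datetime,
            soak: timedelta = timedelta(hours=24)) -> str:
    """Return the channel a release belongs to at time `now`."""
    if release.halted:
        # Promotion was stopped, so the release never reaches stable.
        return "halted"
    if now - release.published_at >= soak:
        # The soak period passed with no stop, so promote automatically.
        return "stable"
    return "beta"
```

The point of the design is that promotion is the default and stopping it is the manual step, so nobody has to be online over the weekend for a good release to go out.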
A variation on that idea would be to bake in the option to wait a couple of weeks after a release before upgrading. This could be another variable in `config.production.json` that the Ghost CLI could factor in when `ghost update` is run.
For example, a new question could be added to `ghost install` about whether you want delayed updates for stability or not.
Any release marked as a security release would always be considered for immediate use.
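Here is a small sketch of how that selection could work. To be clear, this is a hypothetical: the Ghost CLI has no such setting that I know of, and `min_age_days`, the `security` flag and the version numbers and dates in the usage note are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Release:
    version: str
    published_at: datetime
    security: bool = False  # hypothetical marker for security releases


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_update_target(releases: List[Release], now: datetime,
                       min_age_days: int) -> Optional[Release]:
    """Newest release that is old enough, or that is a security release."""
    min_age = timedelta(days=min_age_days)
    eligible = [
        r for r in releases
        if r.security or now - r.published_at >= min_age
    ]
    return max(eligible, key=lambda r: version_key(r.version), default=None)
```

With a 14-day delay, a site would skip a release published yesterday and settle on the newest one that is at least two weeks old, unless a newer release is marked as a security release, in which case that one is picked right away.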