Techniques for semi-automatic theme translation

Hello fellow Ghost devs!

I’m getting started on preparing a theme for translation. The work is tediously repetitive and begs to be automated!

I thought this could be a place for Ghost devs to share the techniques they have used to speed up the process.

I see that @GBJsolution, @NickAbs, and @PriorityVision have done fantastic work on automating the process of converting translation helper strings into locale files, with the following projects:

I’m sure @Cathy_Sarisky has dealt with this too, and come up with some clever solution.

However, I haven’t found any projects that tackle the “finding all the strings that need to be wrapped in translation helpers” part of the process.

(Previously, I found the strings that needed translating by reading through the code and the running site, and wrapped each one in the translation helper by hand. Then I searched the theme files for the translation helper, copied the strings into locale .json files, and translated those with DeepL/Google Translate plus what fluency I had.)

So here is my first baby step:

find . -name '*.hbs' -exec cat {} \; | sed -e 's|<[^>]*>||g' -e 's|{{[^}]*}}||g' | strings -- | grep -F -f /usr/share/dict/words

This bash one-liner sends all relevant theme files through a pipeline that strips out HTML tags and Handlebars includes/comments/helpers, and then searches what is left for English words.

The good thing is that it gets stuff that’s hidden on the front-end, like error and success messages.

Obviously, this will only get me part of the way there: it will miss translatable strings inside Handlebars helpers, such as prefixes and pluralization, and it will also miss misspelled, uncommon, or made-up words. But it’s a start! Once I have wrapped these strings (I hope to automate this, but I might not bother this time around), I will be able to search for prefixes, suffixes, and plural helpers fairly easily. But I’ll probably write a snippet for that too, once I get to it.
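For the helper strings mentioned above, a follow-up search could look something like this Python sketch. It is only an illustration: it assumes the usual Ghost hash attributes (empty, singular, plural, prefix, suffix, separator), the function name find_helper_strings is made up, and it skips values that are already wrapped in a (t "...") sub-expression.

```python
import re

# Any {{ ... }} expression, including ones that span several lines.
MUSTACHE = re.compile(r"\{\{.*?\}\}", re.DOTALL)

# Hash attributes that commonly carry visible text in Ghost helpers.
# Assumption: this attribute list is illustrative, not exhaustive.
HASH_STRING = re.compile(
    r"""\b(empty|singular|plural|prefix|suffix|separator)=(["'])(.*?)\2"""
)


def find_helper_strings(template: str) -> list[tuple[str, str]]:
    """Return (attribute, text) pairs for string literals inside helpers.

    Values such as prefix=(t "...") are not string literals, so they are
    skipped, which means already-translated attributes are not reported.
    """
    found = []
    for expr in MUSTACHE.finditer(template):
        for match in HASH_STRING.finditer(expr.group(0)):
            found.append((match.group(1), match.group(3)))
    return found
```

Running it over each .hbs file and printing the pairs gives a checklist of helper attributes that still need wrapping. Strings that contain }} or escaped quotes would confuse it.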


If any of you have snippets you would care to share, or have found projects that do a better job of this, let me know!

I’ve got a very rudimentary prototype[1] that walks the Handlebars AST[2] and then uses cheerio to find text in leaf elements. Running it against Source:

post.hbs: —
post.hbs: Read more
partials/email-subscription.hbs: Subscribe
partials/post-card.hbs: By 
partials/post-card.hbs: , 
partials/icons/lock.hbs: lock-1
partials/components/featured.hbs: Featured
partials/components/footer.hbs: Ghost
partials/components/header-content.hbs: Search posts, tags and authors
partials/components/navigation.hbs: Sign in
partials/components/navigation.hbs: Subscribe
partials/components/navigation.hbs: Sign in
partials/components/navigation.hbs: Account
partials/components/post-list.hbs: Latest
partials/components/post-list.hbs: See all 
partials/components/post-list.hbs: Subscribe
partials/components/post-list.hbs: Upgrade
partials/components/post-list.hbs: Recommendations
partials/components/post-list.hbs: See all 

It doesn’t analyze any Handlebars expressions right now, but it wouldn’t take much more effort to add support (especially if you or someone else can share examples).

Also, since it’s an AST[3], it’s possible to replace text literals with a translation helper, and analyze usage of the translation helper to find missing/unneeded translation strings
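As a rough illustration of the usage analysis (not how the prototype works: this uses a regex instead of the AST), here is a Python sketch. It assumes the theme uses Ghost’s {{t "..."}} helper and that a locale file is a flat JSON object keyed by the English string. The name audit_locale is hypothetical.

```python
import json
import re

# Matches {{t "..."}} and sub-expressions like (t "..."), with either quote.
# Assumption: no escaped quotes inside the string.
T_CALL = re.compile(r"""[{(]\s*t\s+(["'])(.*?)\1""")


def audit_locale(templates: dict[str, str], locale_json: str):
    """Compare the strings used in templates against a locale file.

    templates maps a file name to its source. Returns (missing, unused):
    strings passed to t that have no locale key, and locale keys that no
    template uses. Both lists are sorted.
    """
    used = {
        match.group(2)
        for source in templates.values()
        for match in T_CALL.finditer(source)
    }
    keys = set(json.loads(locale_json))
    return sorted(used - keys), sorted(keys - used)
```

For example, with a template containing {{t "Read more"}} and {{t "Subscribe"}} and a locale that only defines "Read more" and "Sign in", it reports "Subscribe" as missing and "Sign in" as unused.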


  1. No nerd sniping to see here :disguised_face:

  2. 95% of this came from gscan :smiley:

  3. It’s not completely simple, because Handlebars doesn’t parse the HTML. This means the location would have to be rebuilt (handle removing marker tokens and translating a cheerio location to a Handlebars location) and it could be error-prone.

@vikas, very cool! This sounds like a way less hacky approach. Do you have any plans to open-source (or sell) your helper script?

I was hoping it would be a simple sed replace to wrap the strings once I’d found them, but most of the translatable strings include dynamic data from other handlebars helpers. This means the script to programmatically wrap them in translation helpers has to get the interpolation syntax right… cue xkcd #1319 :innocent:
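To show why a plain sed replace falls short, here is a minimal Python sketch of the wrapping step. It assumes Ghost’s t helper takes single-brace placeholders filled from hash arguments, e.g. {{t "Page {page} of {pages}" page=pagination.page pages=pagination.pages}}. The name wrap_with_t is hypothetical, and anything more complex than a simple path is refused rather than guessed at.

```python
import re

# A "simple" expression: {{some.path}} with no helper arguments.
SIMPLE_EXPR = re.compile(r"\{\{\s*([A-Za-z_@][\w.@/]*)\s*\}\}")


def wrap_with_t(text: str) -> str:
    """Wrap template text in a t helper, turning simple paths into placeholders."""
    args: dict[str, str] = {}

    def to_placeholder(match: re.Match) -> str:
        path = match.group(1)
        # Name the placeholder after the last path segment.
        base = re.sub(r"\W", "_", path.split(".")[-1])
        name, suffix = base, 2
        # Avoid clashes when two different paths end in the same segment.
        while name in args and args[name] != path:
            name = f"{base}{suffix}"
            suffix += 1
        args[name] = path
        return "{" + name + "}"

    message = SIMPLE_EXPR.sub(to_placeholder, text)
    if "{{" in message:
        # Helpers with arguments, block helpers, triple-stash etc. need a human.
        raise ValueError(f"unsupported expression in: {text!r}")
    escaped = message.replace('"', '\\"')
    hash_args = "".join(f" {name}={path}" for name, path in args.items())
    return '{{t "' + escaped + '"' + hash_args + "}}"
```

Calling wrap_with_t("Page {{pagination.page}} of {{pagination.pages}}") returns {{t "Page {page} of {pages}" page=pagination.page pages=pagination.pages}}, while text containing something like {{plural total}} raises an error so it can be handled by hand.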

That xkcd is spot on. Most themes just don’t have enough text to merit a bunch of automation work, so if you can get Vikas’ script to identify the text, you’ll be almost done.

That was my experience too. If you are using a modern editor like VS Code, you can run a regex search straight from the search box:

For example, searching for ["']\s*>\s*[A-Za-z0-9]+ in the .hbs files will find most of the strings (you will still need to double-check for hard-coded default strings in forms, etc.).
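If you would rather script the search than run it in an editor, a Python sketch of the same idea might look like this. It is a variation, not the exact search above: it uses the character class [A-Za-z0-9] and captures the whole run of text up to the next tag or Handlebars expression. The name find_tag_text is made up for illustration.

```python
import re

# Text that directly follows a tag ending in a quoted attribute, e.g.
# <a class="btn">Sign in</a>. Like the editor search, it does NOT see
# text after tags without attributes, such as <p>Hello</p>.
TEXT_AFTER_TAG = re.compile(r"""["']\s*>\s*([A-Za-z0-9][^<{]*)""")


def find_tag_text(template: str) -> list[tuple[int, str]]:
    """Return (line number, text) pairs for candidate translatable strings."""
    hits = []
    for lineno, line in enumerate(template.splitlines(), start=1):
        for match in TEXT_AFTER_TAG.finditer(line):
            hits.append((lineno, match.group(1).strip()))
    return hits
```

Looping over the theme’s .hbs files and printing the file name next to each hit gives a list you can work through by hand.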

It is a quick job to add the translation helper around the strings - the hard bit is getting good quality translations :)

Yep, once it’s not so hacky I’ll publish it! I mostly prodded at it because it’s been a bit since I’ve done AST work :grinning:

Perhaps this is a good juncture to talk about helping theme developers source translations?

I initially thought we might be able to crowdsource the translations for all themes, but having (inadvertently) analysed half a dozen translated themes in my attempt to source translations (see link above), I now worry that it would quickly become unmanageable (even the small number of themes I looked at contain 200 distinct strings)

A possible alternative would be to create a ‘starter kit’ set of theme locales with a small number of theme-appropriate strings, say 20 to 30 strings so as to not deter volunteer translators unduly. A few random thoughts on this approach:

  • If these locales were distributed with Ghost core, the translation effort could perhaps be tied into the very effective approach already adopted for core, using the same GitHub-based process to manage the work.
  • At some point it would be marvellous if the ‘starter kit’ translations (and the translations in core) could be made available in the theme automatically, but in the meantime theme developers who wanted to use the locales could copy them manually.
  • The initial list of translation candidates could be chosen by a benevolent dictator and voted on if needed.
  • We could ‘seed’ the v1 translations using a technique similar to my slightly hacky script above.

Downsides:

  • Obviously this will limit theme developers’ choice of wording, although nothing prevents them from seeking additional translations, as they have to do today anyway.
  • Perhaps a lightweight process could be established to add new strings to the ‘starter kit’ should there be enough demand?

What do you think? Is there a better way?

@vikaspotluri123 Fantastic! Thank you. I’ll be making use of it :)

@NickAbs I think that’s a good idea. However, automated translation tools are usually adequate for the “Página N de N” type text that would be common to many themes. Where they fall short is on the longer texts, where more nuance is needed — but these long strings are less likely to be common to many themes.

I ended up using:

  1. my bash one-liner (plus searches for “prefix”, “plural”, etc.) to find the strings. (Next time I’ll use @vikaspotluri123’s tool instead.)
  2. @NickAbs’s shell scripts to generate JSON from them. (They introduced a few syntax errors in the JSON, whose source I didn’t manage to track down, but they still saved a lot of time.)
  3. @parvineyvazov/json-translator to populate basic translations. You can run it like so, swapping in your own provider and target languages: jsontt locales/en.json --module google --from en --to fr es
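Since generated locale files can end up with syntax errors, a quick check before translating might save some head-scratching. This is a minimal Python sketch using only the standard json module. The name check_locale is hypothetical, and the duplicate-key check is an extra I am assuming is useful, because a repeated key silently overrides the earlier one.

```python
import json


def check_locale(text: str) -> list[str]:
    """Return a list of problems found in a locale file's JSON text."""
    problems: list[str] = []

    def record_pairs(pairs):
        # Called once per JSON object with its (key, value) pairs in order.
        seen = set()
        for key, _value in pairs:
            if key in seen:
                problems.append(f"duplicate key: {key!r}")
            seen.add(key)
        return dict(pairs)

    try:
        json.loads(text, object_pairs_hook=record_pairs)
    except json.JSONDecodeError as err:
        # lineno and colno point at where the parser gave up.
        problems.append(f"line {err.lineno}, column {err.colno}: {err.msg}")
    return problems
```

Reading locales/en.json and printing whatever check_locale returns points at the line and column of the first syntax error, which is usually a trailing comma or an unescaped quote.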

I’ve published an initial version, gt3 (on npm), that should cover steps 1 and 2 above.

I’ve tested it against Source and Attila with promising results.

  • There is no error handling for files with syntax errors (yet)
  • There are some unimplemented Handlebars features (e.g. decorators) because I don’t have an example to test against
  • Not all text is detected yet - please file an issue so I can handle the edge cases!

@vikaspotluri123 This is fantastic! Thank you so much for open-sourcing this.
