I don’t have the faintest idea how these files are created: spreadsheets with translatable data that contain tons of HTML markup. Sometimes there are fairly complex HTML structures (like tables), lots of incomplete fragments, and in many cases even some items that are already translated. Possibly somebody has to do a lot of manual copy and paste, but I could never find out …
When you try to process files like this as normal spreadsheets you’ll run into problems: markup can neither be evaluated nor protected, segmentation (and leverage) will be poor, and the result will most likely look like a mess.
The following recipe shows a way to prepare these files for translation.
More than three years ago the W3C published an excellent note on Best Practices for XML Internationalization. To illustrate their intentions the authors of this note included a lot of examples of bad design – which is indeed very helpful.
How come so many authors of XML data and (even worse) authors of tools that create XML data have got this wrong and apparently use the bad design examples as templates for their data? Among others, I have come across the following in recent years:
- Translatable content in CDATA sections
- Translatable attribute values (in some cases even multiple paragraphs!)
- Elements named like <x_01>, <x_02>, <x_03> and so on
- Documents that contain four or more languages
When it comes to localization, XML can be an excellent format, if only the W3C recommendations were observed.
From time to time I come across XML data – most likely exports from content management systems – with fairly large portions of HTML embedded in CDATA sections. This brings up a problem: due to the nature of CDATA the markup can’t be processed properly because it is recognized as literal text. And apart from this: in most cases the embedded HTML is definitely not well-formed.
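To make the problem concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The sample element `item` and its content are made up for illustration and do not come from any particular content management system. It only shows the effect described above: the parser hands back the embedded HTML as one plain string and sees no child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical CMS export: HTML wrapped in a CDATA section.
SAMPLE = '<item id="1"><![CDATA[<p>Hello <b>world</b></p>]]></item>'


def inspect_item(xml_text):
    """Return (text content, number of child elements) of the root element."""
    root = ET.fromstring(xml_text)
    return root.text, len(root)


text, child_count = inspect_item(SAMPLE)
print(repr(text))   # the markup arrives as literal characters
print(child_count)  # no <p> or <b> elements are visible to the parser
```

Because the tags are just characters at this point, a filter that works on the XML structure has nothing to protect or evaluate.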
SDL has published a little HowTo in the Translator’s Workbench User Guide, but its effects are somewhat limited. The main drawbacks are:
- Externals will not be recognized at all
- Translatable attributes like alt or title will be locked
The following will show you a more general (but still simple) approach to dealing with this nasty situation. It has proved to work well with Trados 7.5+ and can most likely be adapted to the CAT tool of your choice.
In HTML, authors often have the bad habit of “formatting” text using <br /> elements. A chunk of text from such a file may look like this:
<p> This is the first sentence.<br />This is the second sentence.</p>
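As a rough illustration of what such a chunk means for segmentation, the following Python sketch splits a paragraph at every `<br>` element. The function name `split_at_br` and the regular expressions are assumptions made for this example; this is not how any particular CAT tool handles the element, and whether a `<br />` should count as a segment break at all depends on the file and on your filter settings.

```python
import re

# Matches <br>, <br/> and <br /> in any letter case.
BR_PATTERN = re.compile(r"<br\s*/?>", re.IGNORECASE)


def split_at_br(paragraph_html):
    """Split the inner text of a single <p> element at <br> elements."""
    # Strip the enclosing <p ...> and </p> tags, if present.
    inner = re.sub(r"^\s*<p[^>]*>|</p>\s*$", "", paragraph_html,
                   flags=re.IGNORECASE)
    # Drop empty pieces and surrounding whitespace.
    return [part.strip() for part in BR_PATTERN.split(inner) if part.strip()]


chunk = "<p> This is the first sentence.<br />This is the second sentence.</p>"
print(split_at_br(chunk))
```

Here the split yields two separate pieces, one per sentence. If the `<br />` were instead treated as an inline tag, both sentences would stay in one chunk with a tag in the middle.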
How many of you are still working with Windows XP? I guess a lot …
Launchy is a keystroke launcher that gives you very fast access to your applications. Once the relevant folders (e.g. c:\program files) are indexed, all you have to do is press ALT+Space (the shortcut can be configured), and a small dialog pops up.
A good text editor is an essential tool for any localizer. I know that many of you use UltraEdit, but there are excellent open source solutions around as well. From my daily experience I can recommend two of them: