How to create a localized version of OOo


This document describes how to create a localized version and gives some brief background information about ISO codes and the resource system.

Build environment

First of all you need the source code and a working build environment. Please visit

http://tools.openoffice.org/

Building Linux
http://tools.openoffice.org/dev_docs/build_linux.html

Building Windows
http://tools.openoffice.org/dev_docs/build_windows_tcsh.html
http://tools.openoffice.org/dev_docs/build_windows.html

I suggest joining the dev@l10n mailing list, where you can find many people involved in localization:

http://l10n.openoffice.org/servlets/ProjectMailingListList

ISO Codes

Languages are handled within the build system using ISO codes. As discussed in Issuezilla task #i8252#, we support a subset of RFC 3066 [1] for the language identifier.

RFC 3066 basically says:
1. Use ISO 639-1 [2] if possible
2. Use ISO 639-2 [2] if no ISO 639-1 code exists
3. Use an ISO 3166 [3] country code if necessary to distinguish two languages with the same language code, e.g. US English and British English.

This means we'll have these codes for example:

ISO 639-1 sv Swedish
ISO 3166 en-US US English

To prevent matching problems, languages with identical language and country code like “de-DE” are reduced to the language code “de”. Currently the build environment doesn't support ISO 639-2.
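
The reduction rule above can be sketched in a few lines of Python. This is only an illustration, not code from the build environment: the function name reduce_language_tag is hypothetical, and "identical" is assumed to mean a case-insensitive match between the language part and the country part.

```python
def reduce_language_tag(tag: str) -> str:
    """Drop the country part when it merely repeats the language code.

    Hypothetical helper: "identical" is assumed to mean a
    case-insensitive match, as in "de-DE".
    """
    parts = tag.split("-")
    if len(parts) == 2 and parts[0].lower() == parts[1].lower():
        return parts[0].lower()
    # Anything else (plain language code, or a country that differs) is kept.
    return tag


for tag in ("de-DE", "en-US", "sv"):
    print(tag, "->", reduce_language_tag(tag))
```

With this sketch “de-DE” becomes “de”, while “en-US” and “sv” are left unchanged.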

Resourcesystem

The OpenOffice.org source code contains several file types, where strings and messages are declared:


*.src / *.hrc : Contains main UI strings
*.xrm: Contains readme strings
*.xcu: Contains configuration strings
*.ulf: Container for strings converted to several custom formats (.rc, .par)
*.xhp: Contains online help strings

An example of a src resource file:

...

CheckBox CB_READ_ONLY
{
Text [ de ] = "Nu~r lesen" ;
Text [ en-US ] = "~Read-only" ;
};

...


Untranslatable strings in src files have no language code:

Text = "50%"
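
To make the difference between the two kinds of Text lines concrete, here is a minimal Python sketch that reads the language code and the string from a single line. It only covers the simple one-line form shown in the examples above; the pattern and the name parse_text_line are assumptions for illustration and not the parser used by the OOo tools.

```python
import re

# Matches e.g.   Text [ en-US ] = "~Read-only" ;   and   Text = "50%"
# (assumed simplification: one statement per line, no escaped quotes)
TEXT_LINE = re.compile(
    r'^\s*Text\s*(?:\[\s*([A-Za-z-]+)\s*\])?\s*=\s*"(.*)"\s*;?\s*$'
)


def parse_text_line(line):
    """Return (language, string); language is None for untranslatable strings.

    Returns None when the line is not a Text statement at all.
    """
    match = TEXT_LINE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


for sample in ('Text [ de ] = "Nu~r lesen" ;', 'Text = "50%"'):
    print(parse_text_line(sample))
```

A line without a language code yields None as the language, which is how the sketch marks an untranslatable string.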


The untranslatable strings in xml files are marked by the reserved language identifier “x-no-translate”:

...
<Text lang="x-no-translate">OOo</Text>
...
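
A tool that processes such xml files could skip these strings roughly as follows. This is a sketch under the assumption that the identifier sits in a plain lang attribute, exactly as in the excerpt above; real files may differ, and the name is_translatable is hypothetical.

```python
import xml.etree.ElementTree as ET

# Reserved identifier taken from the text above.
NO_TRANSLATE = "x-no-translate"


def is_translatable(element):
    """True unless the element carries the reserved x-no-translate marker."""
    return element.get("lang") != NO_TRANSLATE


fragment = '<Text lang="x-no-translate">OOo</Text>'
print(is_translatable(ET.fromstring(fragment)))
```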

Another reserved language identifier, used for documentation purposes, is “x-comment”. These strings are known as the old developer English and are no longer used; the comments are outdated. The standard string encoding in all formats such as src, ulf, xcs and xcu is UTF-8.

The interface between the source code and the translation tooling is the so-called sdf file format (also known as gsi). The sdf intermediate file format is introduced here:

http://l10n.openoffice.org/L10N_Framework/Intermediate_file_format.html

Note that this format is strict and must not be violated. You can use the tool “gsicheck” in the module transex3 to perform simple format checks. In a deviation from the sdf file format description, the language column has been changed from numeric language identifiers to ISO codes.
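
To give an idea of what a simple format check looks like, the following Python sketch reports lines whose column count is wrong. It is not a replacement for gsicheck and does not reproduce its checks. It assumes that the sdf columns are separated by tabs, and the expected number of columns is deliberately a parameter that you have to take from the format description linked above; the name check_sdf_lines is hypothetical.

```python
def check_sdf_lines(lines, expected_fields):
    """Return (line_number, field_count) for every line with a wrong column count.

    Assumption: columns are tab-separated. expected_fields must be taken
    from the sdf format description; it is not hard-coded here.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        if not stripped:
            continue  # ignore empty lines
        count = len(stripped.split("\t"))
        if count != expected_fields:
            problems.append((number, count))
    return problems


# Toy data with three columns, only to show the call; real sdf lines are wider.
sample = ["a\tb\tc\n", "a\tb\n"]
print(check_sdf_lines(sample, expected_fields=3))
```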

The source files contain only English and German strings. An sdf file particle called “localize.sdf” is placed inside each directory and contains the strings for all other languages.

sw/source/ui/frmdlg/
    frmpage.cxx
    frmpage.src
    localize.sdf
    wrap.src
    ...

The content of the sdf particle is merged into temporary copies of the source files during the build. From those temporary source files, which now contain all strings, the resource files are created. You can use the localize tool to collect strings and merge them back into the sdf particles.
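
If you want to see which directories of a source tree already carry such a particle, a small Python sketch like the following can list them. It only looks for the file name localize.sdf mentioned above; the function name find_sdf_particles and the path in the usage line are examples.

```python
import os


def find_sdf_particles(root):
    """Return the sorted paths of all localize.sdf files below root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "localize.sdf" in filenames:
            found.append(os.path.join(dirpath, "localize.sdf"))
    return sorted(found)


# Example call; "sw" is the module used in the listing above.
for path in find_sdf_particles("sw"):
    print(path)
```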

Real life

What steps are needed to create an OOo build for Khmer?

Several steps are necessary to add a new language to the OOo build environment.

First, introduce the ISO code for Khmer (“km”) to the build environment by adding a new entry in solenv/inc/postset.mk:

...

completelangiso=af ar be-BY bg br bn bn-BD bn-IN bs ca cs cy da de el en-GB en-US en-ZA eo es et eu fa fi fr ga gl gu-IN he hi-IN hr hu it ja km kn-IN ko lo lt lv mk ms ne nb nl nn nr ns pa-IN pl pt pt-BR ru rw sk sl sh-YU sr-CS ss st sv sw sw-TZ sx ta-IN th tn tr ts ve vi xh zh-CN zh-TW zu

...
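
A quick way to verify the new entry is to split the completelangiso line and look for the code. The Python sketch below works on a shortened sample line rather than on the real file, and the function name languages_in_postset is hypothetical.

```python
def languages_in_postset(line):
    """Split a 'completelangiso=...' line into its list of ISO codes."""
    name, _, value = line.partition("=")
    if name.strip() != "completelangiso":
        raise ValueError("not a completelangiso line")
    return value.split()


# Shortened sample line for illustration; the real list is much longer.
sample = "completelangiso=de en-US km sv"
print("km" in languages_in_postset(sample))
```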


To build the Khmer resources, set the environment variable WITH_LANG with “setenv WITH_LANG "km"” every time before building. You can also use configure to set this variable in the LinuxIntelEnv script.

Now introduce the language in tools/source/intntl/isolang.cxx

...

{ LANGUAGE_KHMER, "km", "KH" },

...

and tools/inc/lang.hxx

...

#define LANGUAGE_KHMER 0x0453

...


The hexadecimal value in lang.hxx is the Microsoft language identifier. The MS language IDs are used in core code, and are of course also needed when storing documents in MS binary file formats. They are not relevant for UI localization, except in one place where they are used for values in a language listbox, see svx/source/dialog/langtab.src. For further information about this identifier, visit the Microsoft websites [4].
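
For orientation, a Microsoft language identifier combines a primary language ID in the low 10 bits with a sublanguage ID in the upper 6 bits. The following Python sketch splits the Khmer value from lang.hxx accordingly; it is an illustration only, the name split_langid is hypothetical, and the authoritative description of the layout is the Microsoft documentation [4].

```python
def split_langid(langid):
    """Split a 16-bit MS language identifier into (primary, sublanguage)."""
    primary = langid & 0x3FF   # low 10 bits
    sublanguage = langid >> 10  # upper 6 bits
    return primary, sublanguage


LANGUAGE_KHMER = 0x0453  # value from tools/inc/lang.hxx above
primary, sublanguage = split_langid(LANGUAGE_KHMER)
print(hex(primary), hex(sublanguage))  # prints: 0x53 0x1
```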

Additional steps are needed for the complete introduction of this language; please consult this document: http://www.khmeros.info/tools/localization_of_openoffice_2.0.html

Ensure that you have sourced the file LinuxIntelEnv so you have a proper environment.
First build the whole office. This is needed to build all necessary tools.

cd instsetoo_native
build --all

Use the localize tool to extract and merge the strings.

Extracting the strings:

perl -w localize.pl -e -l km=en-US -f khmer.sdf

Note that the Khmer strings fall back to US English where no translation exists yet. Choose the language that fits best as the fallback.

Now translate the strings in the newly created file “khmer.sdf”. You can simply translate the file by hand or use tooling. Common approaches are web-based translation or conversion to the PO file format and use of the corresponding translation tools such as KBabel. Please have a look here: http://www.khmeros.info/tools/oo2.0_program_translation.html

Merge the strings back:

perl -w localize.pl -m -l km -f khmer.sdf

The tool distributes the translated strings into the sdf particles.
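
If you script the extract and merge steps, the two command lines can be assembled as in the following Python sketch. It uses only the options shown in the commands above (-e, -m, -l, -f) and only builds the argument list without running anything; the helper name localize_command and its parameters are hypothetical.

```python
def localize_command(mode, language, sdf_file, fallback=None):
    """Build the localize.pl argument list for mode 'extract' or 'merge'."""
    if mode not in ("extract", "merge"):
        raise ValueError("mode must be 'extract' or 'merge'")
    flag = "-e" if mode == "extract" else "-m"
    # For extraction the fallback language is appended as "km=en-US".
    lang_spec = f"{language}={fallback}" if fallback else language
    return ["perl", "-w", "localize.pl", flag, "-l", lang_spec, "-f", sdf_file]


print(" ".join(localize_command("extract", "km", "khmer.sdf", fallback="en-US")))
print(" ".join(localize_command("merge", "km", "khmer.sdf")))
```

The resulting list can be handed to subprocess.run once the environment has been set up as described above.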

Create the localized build

cd instsetoo_native
setenv WITH_LANG "km"
build --all

After a successful build you should find localized install sets in your instsetoo_native/<platform>/OpenOffice/<package_format>/rpm/install tree.

Links

[0] More detailed additional documentation:
http://www.khmeros.info/tools/

[1] For more information about RFC 3066:
http://www.faqs.org/rfcs/rfc3066.html

[2] For more information about ISO 639-1 and ISO 639-2:
http://www.loc.gov/standards/iso639-2/

[3] For more information about ISO 3166:
http://www.iso.org/iso/en/prods-services/iso3166ma/index.html

[4] Microsoft Language Identifier


The complete list, not necessarily supported by Windows: List of Locale ID (LCID) Values as Assigned by Microsoft
http://www.microsoft.com/globaldev/reference/lcid-all.mspx

NLS information page
http://www.microsoft.com/globaldev/nlsweb/

Table of Language Identifiers
http://msdn.microsoft.com/library/en-us/intl/nls_238z.asp

Primary Language Identifiers
http://msdn.microsoft.com/library/en-us/intl/nls_61df.asp

SubLanguage Identifiers
http://msdn.microsoft.com/library/en-us/intl/nls_19ir.asp

WD2000: Supported Language ID Reference Numbers (LCID)
http://support.microsoft.com/default.aspx?scid=KB;en-us;q221435