Apache OpenOffice (AOO) Bugzilla – Issue 108682
RTF: Non-ascii metadata in exported RTF not properly encoded
Last modified: 2017-05-20 11:19:27 UTC
When Writer saves a file as RTF, some non-ASCII characters in metadata (e.g. the author name) are written literally without hex-escaping. This conflicts with the RTF specification and with other software that expects all RTF content to be ASCII.

For example, starting from a new empty Writer document:
1. Set first/last name in Tools->Options->User Data to Ján Ľos;
2. Update document metadata using File->Properties->General->Reset (with 'Apply user data' checked);
3. Save the file as RTF;

you'll end up with this declaration inside the RTF:
{\info{\upr{\author Ján ?os}{\*\ud{\author J\'e1n \u317\'3fos}}} ...

Note that U+00E1 LATIN SMALL LETTER A WITH ACUTE in \upr is written as the literal byte 0xe1, not as the proper (and later used) escape sequence \'e1, as the RTF specification requires for all characters beyond 7-bit ASCII. The same issue applies to other (but not all) characters in the codepage used in the file.

For another example, using the name Ján Hraško in locale sk_SK.UTF-8 with all language settings set to Slovak results in a file encoded in codepage 1250, with the relevant code:
{\info{\author Ján Hraško}
(converted; the bytes are again 0xe1 and 0x9e, above 127 but not escaped).
Sorry, a correction: U+0161 LATIN SMALL LETTER S WITH CARON is 0x9a in cp1250, not 0x9e.
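The expected hex-escaping can be illustrated with a minimal Python sketch. This is not OpenOffice's export code; `rtf_escape` is a hypothetical helper, and it assumes a single-byte codepage and ignores control characters below 0x20.

```python
def rtf_escape(text: str, codepage: str = "cp1250") -> str:
    """Hypothetical helper: return `text` as 7-bit ASCII RTF content.

    Bytes above 0x7f become \\'hh escapes; characters the codepage
    cannot represent are replaced with '?' before escaping.
    """
    out = []
    for byte in text.encode(codepage, errors="replace"):
        ch = chr(byte)
        if ch in "\\{}":
            # RTF syntax characters must be backslash-escaped.
            out.append("\\" + ch)
        elif byte < 0x80:
            out.append(ch)
        else:
            # Anything beyond 7-bit ASCII is written as \'hh.
            out.append("\\'%02x" % byte)
    return "".join(out)


print(rtf_escape("Ján Hraško"))          # J\'e1n Hra\'9ako
print(rtf_escape("Ján Ľos", "cp1252"))   # J\'e1n ?os
```

The first call confirms the cp1250 byte values discussed above (0xe1 for á, 0x9a for š); the second shows why Ľ degrades to '?' when the file's codepage cannot represent it.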
MRU->HBRINKM: The correct format for non-ASCII characters in RTF metadata (as written by MS Word) is: {\info{\upr{\author J\'e1n ?os}{\*\ud\uc0{\author J\'e1n {\uc1\u317 Los}}}}
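As a rough illustration of the Unicode branch of such output, the following Python sketch emits \'hh for characters the codepage can represent and \uN followed by one fallback character otherwise. It is an assumption-laden sketch, not Word's or OpenOffice's algorithm: `rtf_unicode_escape` is a hypothetical name, the codepage is assumed to be single-byte, the fallback is assumed to be one ASCII character, and the surrounding \upr, \ud and \uc groups are not generated.

```python
def rtf_unicode_escape(text: str, codepage: str = "cp1252",
                       fallback: str = "?") -> str:
    """Hypothetical helper: escape `text` for the Unicode part of an RTF group."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 0x80:
            out.append(ch)
        else:
            try:
                # Assumes a single-byte codepage: exactly one byte per character.
                (byte,) = ch.encode(codepage)
                out.append("\\'%02x" % byte)
            except UnicodeEncodeError:
                # \u takes a signed 16-bit decimal, one per UTF-16 code unit,
                # each followed by the fallback character that \uc1 readers skip.
                data = ch.encode("utf-16-le")
                for i in range(0, len(data), 2):
                    unit = int.from_bytes(data[i:i + 2], "little")
                    if unit > 32767:
                        unit -= 65536
                    out.append("\\u%d %s" % (unit, fallback))
    return "".join(out)


print(rtf_unicode_escape("Ján Ľos"))            # J\'e1n \u317 ?os
print(rtf_unicode_escape("Ľ", fallback="L"))    # \u317 L
```

With `fallback="L"` the second call reproduces the `\u317 L` fragment from the example above; a real exporter would pick the fallback per character rather than use one fixed value.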
I cannot reproduce this bug using ooo320-m19.
Reset assignee to the default "issues@openoffice.apache.org".