Issue 108682 - RTF: Non-ascii metadata in exported RTF not properly encoded
Summary: RTF: Non-ascii metadata in exported RTF not properly encoded
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: save-export
Version: OOo 3.1.1
Hardware: Unknown All
Importance: P3 Trivial with 1 vote
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-26 15:54 UTC by miroslavlos
Modified: 2017-05-20 11:19 UTC
CC List: 1 user

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description miroslavlos 2010-01-26 15:54:59 UTC
When Writer saves a file as RTF, some non-ASCII characters in metadata (e.g.
author name) are written literally without hex-escaping, in conflict with the
RTF specification and with other software that expects all RTF content to be
ASCII.

For example, if you (starting from a new, empty Writer document):
1. Set first/last name in Tools->Options->User Data to Ján Ľos;
2. Update document metadata using File->Properties->General->Reset (with 'Apply
user data' checked);
3. Save the file as RTF;

you'll end up with this declaration inside the RTF:
{\info{\upr{\author Ján ?os}{\*\ud{\author J\'e1n \u317\'3fos}}} ...

Note that U+00E1 LATIN SMALL LETTER A WITH ACUTE in the first group of \upr is
written as the literal byte 0xe1, not as the proper escape sequence \'e1 (which
is used later, in the \ud group), as the RTF specification requires for all
characters beyond 7-bit ASCII.
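
A minimal Python sketch of the escaping described above, for illustration only. The function name rtf_escape and the default codepage cp1252 are assumptions made for this example, not OpenOffice code. Characters available in the codepage become \'hh, and anything else becomes \uN followed by the fallback \'3f ('?'), as in the \ud group shown above. Control characters are passed through unchanged, which a real exporter would need to handle.

```python
def rtf_escape(text: str, codepage: str = "cp1252") -> str:
    """Return text as 7-bit ASCII RTF (hypothetical helper for illustration)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # RTF control characters must be backslash-escaped.
            out.append("\\" + ch)
        elif ord(ch) < 0x80:
            out.append(ch)
        else:
            try:
                encoded = ch.encode(codepage)
            except UnicodeEncodeError:
                # Not in the codepage: emit \uN per UTF-16 code unit,
                # as a signed 16-bit decimal, with '?' (\'3f) as fallback.
                units = ch.encode("utf-16-be")
                for i in range(0, len(units), 2):
                    code = int.from_bytes(units[i:i + 2], "big", signed=True)
                    out.append("\\u%d\\'3f" % code)
            else:
                # In the codepage: hex-escape each byte as \'hh.
                out.append("".join("\\'%02x" % b for b in encoded))
    return "".join(out)


print(rtf_escape("Ján Ľos"))
```

With the name from the steps above, this prints J\'e1n \u317\'3fos, and the result contains no byte above 0x7f.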

The same issue applies to other (but not all) characters in the codepage used
in the file.

As another example, using the name Ján Hraško in locale sk_SK.UTF-8 with all
language settings set to Slovak results in a file encoded in codepage 1250,
with the relevant code:
{\info{\author Ján Hraško}
(shown converted here; the bytes are again 0xe1 and 0x9e, above 127 but not
escaped).
Comment 1 miroslavlos 2010-01-26 15:56:47 UTC
Sorry, correction: U+0161 LATIN SMALL LETTER S WITH CARON is 0x9a in cp1250,
not 0x9e.
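
The byte values can be checked with a short Python sketch. It encodes the \author fragment in cp1250 and lists every byte above 0x7f, which is the kind of check a strict RTF consumer might apply. The function name find_unescaped_bytes and the sample string are assumptions made for this illustration.

```python
def find_unescaped_bytes(rtf: bytes):
    """Return (offset, byte) pairs for every byte above 0x7f."""
    return [(i, b) for i, b in enumerate(rtf) if b > 0x7F]


# Hypothetical sample: the metadata fragment as written in codepage 1250.
sample = r"{\info{\author Ján Hraško}}".encode("cp1250")

for offset, byte in find_unescaped_bytes(sample):
    print(f"offset {offset}: 0x{byte:02x}")
```

For this sample the two reported bytes are 0xe1 (á) and 0x9a (š).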
Comment 2 michael.ruess 2010-01-28 10:25:52 UTC
MRU->HBRINKM: the correct format for non-ASCII characters in RTF metadata (as
MS Word writes it) should be:

{\info{\upr{\author J\'e1n ?os}{\*\ud\uc0{\author J\'e1n {\uc1\u317 Los}}}}
Comment 3 Miklos Vajna 2010-07-30 19:09:51 UTC
I cannot reproduce this bug using ooo320-m19.
Comment 4 Marcus 2017-05-20 11:19:27 UTC
Reset assignee to the default "issues@openoffice.apache.org".