103402 – need to skip diacritics in Hebrew spellchecking

Summary: need to skip diacritics in Hebrew spellchecking

Status:	CONFIRMED

Alias:	None

Product:	General
Classification:	Code
Component:	spell checking
Version:	3.3.0 or older (OOo)
Hardware:	Unknown All

Importance:	P3 Trivial with 4 votes
Target Milestone:	---
Assignee:	AOO issues mailing list
QA Contact:

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-07-08 07:18 UTC by alan
Modified:	2013-03-12 11:44 UTC
CC List:	7 users

See Also:
Issue Type:	PATCH
Latest Confirmation in:	---
Developer Difficulty:	---

Attachments
proposed patch (2.88 KB, patch) 2009-07-08 07:19 UTC, alan	no flags
revised - changed a < to <= (2.88 KB, patch) 2009-07-08 07:35 UTC, alan	no flags

Description alan 2009-07-08 07:18:20 UTC

Hebrew is usually written without diacritics. However, sometimes the diacritics
are written as special marks located within, above, or below consonants. The
diacritics are represented internally as separate Unicode characters. Hebrew
dictionaries check for words without diacritics and will continue to do so for
the foreseeable future.

This patch filters the diacritics out of a word before spellchecking it.
(Using breakiterator is not appropriate, since we don't want word-breaking at
the diacritics.)

I don't know whether this functionality is needed for other languages as well,
perhaps Arabic or Persian, or maybe some LTR languages. The patch is written in
a generalized way, so that adding a language is fairly easy:

1) add another "case LANGUAGE_WHATEVER" to the "switch (nLanguage)" statement,
and create a string with the diacritics to be skipped 

2) add "|| nLanguage == LANGUAGE_WHATEVER" to the assignments of the boolean
variables
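
For illustration only, the filtering idea and the per-language switch described above might look roughly like the following C++ sketch. This is not the attached patch. The names Lang, skippedDiacritics and stripDiacritics are hypothetical, std::u16string stands in for the office string class, and the real code switches on nLanguage with LANGUAGE_* constants. The Hebrew set used here is an assumption based on the combining marks in the Unicode Hebrew block (cantillation marks and points); punctuation such as maqaf (U+05BE) is deliberately left in place.

```cpp
#include <cassert>   // only needed for the usage check below
#include <cstdint>
#include <string>

// Hypothetical language identifiers (stand-ins for LANGUAGE_* constants).
enum class Lang : std::uint16_t { Hebrew, Other };

// Returns the characters to skip for a language; empty if there are none.
// Supporting another language would mean adding another case here.
std::u16string skippedDiacritics(Lang lang)
{
    std::u16string marks;
    switch (lang)
    {
        case Lang::Hebrew:
            // Cantillation marks and vowel points U+0591..U+05BD.
            for (char16_t c = 0x0591; c <= 0x05BD; ++c)
                marks.push_back(c);
            // Rafe, shin/sin dots, upper/lower dots, qamats qatan.
            marks += u"\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7";
            break;
        default:
            break;
    }
    return marks;
}

// Returns a copy of the word without the diacritics of the given language.
// The word is not split; marks are simply dropped from the sequence.
std::u16string stripDiacritics(const std::u16string& word, Lang lang)
{
    const std::u16string marks = skippedDiacritics(lang);
    std::u16string result;
    result.reserve(word.size());
    for (char16_t c : word)
        if (marks.find(c) == std::u16string::npos)
            result.push_back(c);
    return result;
}
```

As a usage check, a pointed word such as shalom should come out as its bare consonants, e.g. assert(stripDiacritics(u"\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD", Lang::Hebrew) == u"\u05E9\u05DC\u05D5\u05DD"); and the stripped form is what would then be handed to the dictionary.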

Comment 1 alan 2009-07-08 07:19:35 UTC

Created attachment 63421
proposed patch

Comment 2 alan 2009-07-08 07:35:24 UTC

Created attachment 63425
revised - changed a < to <=

Comment 3 elisko 2009-07-08 10:34:58 UTC

As I understand it, Sanskrit-based languages such as Hindi also employ diacritics.

Comment 4 kaplanlior 2010-08-14 19:02:09 UTC

#99796 has a very similar problem; I think the two should be fixed together
(probably in the same code). Note that this is not the same problem, just a similar one.

Comment 5 thomas.lange 2010-08-18 07:45:17 UTC

taking ownership as well.

tl->ayaniger: If you provide patches for linguistic, please assign them
directly to me. If by bad luck I don't see them on the issues ML and nobody
else assigns them to me, they will just sit there, probably until someone
makes a new comment and they appear on the ML once more.

tl->nemeth: wouldn't it be possible to take care of this in the spell check
dictionary or hunspell itself? I'm just asking because removing them in the
SpellCheckerDispatcher will have the following two side effects:

a) the replacement word will probably not contain diacritics either, which may
look somewhat odd if all the surrounding text uses them.

b) if there ever were another spell checker implementation for Hebrew that could
properly work with diacritics and provide them in replacements as well, then the
patch will effectively suppress that feature. 

Thus I'm a little hesitant until I'm told that this patch really is the
solution to go with.

Comment 6 kaplanlior 2010-08-21 18:16:21 UTC

#51772 also has a very similar problem; I think they should be fixed together
(probably in the same code). Note that this is not the same problem, just a similar one.

Comment 7 Martin Hollmichel 2011-03-16 11:56:13 UTC

Set target to 3.x; not relevant for the 3.4 release.

Comment 8 Rob Weir 2013-03-11 15:01:35 UTC

I'm adding this comment to all open issues with Issue Type == PATCH.  We have 220 such issues, many of them quite old.  I apologize for that.  

We need your help in prioritizing which patches should be integrated into our next release, Apache OpenOffice 4.0.

If you have submitted a patch and think it is applicable for AOO 4.0, please respond with a comment to let us know.

On the other hand, if the patch is no longer relevant, please let us know that as well.

If you have any general questions or want to discuss this further, please send a note to our dev mailing list:  dev@openoffice.apache.org

Thanks!

-Rob

Comment 9 kaplanlior 2013-03-12 11:44:15 UTC

The patch is Hebrew-specific; I think it should be more general.