Issue 93870 - let spellcheck filter rarely-used words
Summary: let spellcheck filter rarely-used words
Status: UNCONFIRMED
Alias: None
Product: General
Classification: Code
Component: spell checking
Version: 3.3.0 or older (OOo)
Hardware: PC Linux, all
Importance: P3 Trivial
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: needhelp
Depends on:
Blocks:
 
Reported: 2008-09-14 01:37 UTC by nicklevinson
Modified: 2014-02-24 18:01 UTC
CC List: 6 users

See Also:
Issue Type: FEATURE
Latest Confirmation in: ---
Developer Difficulty: ---


Description nicklevinson 2008-09-14 01:37:13 UTC
One barrier to providing a long dictionary is that rare words are often similar
in spelling to common-word misspellings. Thus, a misspelling can be accepted by
OOo because a rarity with a different meaning matches it.

Words that are correct but rarely appear should be approved after the user is
reminded of their rarity, so they can reconsider usage or misspelling. This
includes spelling variants, technical terminology, foreign borrowings, words
used in a field of scholarship but not elsewhere, and ancient language being quoted.

A workaround already exists: create or install multiple dictionaries, then load
all of them at some times and only the main one at others. One supplemental
dictionary would list rarities only. But it would be easier to simply click a
button in the Spellcheck dialog to approve rarities of one or more categories.
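
As a rough illustration of approving rarities by category, here is a minimal
Python sketch. It is not how OOo stores or loads dictionaries; the word lists,
the category names and the function names are all hypothetical.

```python
# Hypothetical word lists; a real spellchecker would load these from
# dictionary files rather than hard-code them.
MAIN_WORDS = {"the", "from", "form", "their"}

# Hypothetical supplemental list: each rare word is tagged with a category.
RARE_WORDS = {
    "floccinaucinihilipilification": "technical",
    "thane": "archaic",
    "colour": "variant",
}


def accepted_words(approved_categories=()):
    """Return the main list plus rare words in the approved categories."""
    approved = set(approved_categories)
    rare = {word for word, category in RARE_WORDS.items()
            if category in approved}
    return MAIN_WORDS | rare


def is_accepted(word, approved_categories=()):
    """True if the word passes the check with the given categories approved."""
    return word.lower() in accepted_words(approved_categories)
```

With no categories approved, `is_accepted("thane")` is false; passing
`["archaic"]` accepts it while leaving the other rare categories challenged.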

In short, if floccinaucinihilipilification were in a document and in a
Spellcheck word list, running Spellcheck would show the word as challenged but
instead of describing it as wrong or not in the dictionary it would be shown as
being in the dictionary but rare.

This rarity check could be optional, with radio buttons to treat all rarities as
either always wrong or always right or as rarities to be considered at every
appearance.
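
The three-way result and the three radio-button settings described above could
be sketched as follows. This is only an illustration under assumed names:
`Status`, `RarityPolicy`, `check` and the tiny word lists are hypothetical and
not part of any existing OOo API.

```python
from enum import Enum


class Status(Enum):
    ACCEPTED = "accepted"
    RARE = "in the dictionary but rare"
    MISSPELLED = "not in the dictionary"


class RarityPolicy(Enum):
    ALWAYS_WRONG = "treat rarities as always wrong"
    ALWAYS_RIGHT = "treat rarities as always right"
    ASK = "consider rarities at every appearance"


# Hypothetical word lists, for illustration only.
COMMON = {"a", "is", "word", "rare"}
RARE = {"floccinaucinihilipilification"}


def check(word, policy=RarityPolicy.ASK):
    """Classify a word, applying the user's rarity policy."""
    w = word.lower()
    if w in COMMON:
        return Status.ACCEPTED
    if w in RARE:
        if policy is RarityPolicy.ALWAYS_RIGHT:
            return Status.ACCEPTED
        if policy is RarityPolicy.ALWAYS_WRONG:
            return Status.MISSPELLED
        return Status.RARE  # challenged, but labelled as rare
    return Status.MISSPELLED
```

Under the default `ASK` policy the long word is reported as `Status.RARE`, so a
dialog could describe it as "in the dictionary but rare" instead of as wrong.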

I'm using OOo Writer 2.4.0 without Java Runtime Environment on Linux Fedora Core
4 with Gnome 2.10.0 desktop on a Pentium 4 laptop. I didn't see this feature.

Thank you.

-- 
Nick
Comment 1 auberon 2008-09-14 10:23:59 UTC
There have been discussions about similar spelling of rare words and common
words for the French dictionary which I maintain.

Removing rare words suggests that these words have the same spelling as common
words. And people usually don't know how to write rare ones.

Rare words, in French, don't interfere so often with common-word misspellings,
but common words often appear similar to common-word misspellings. So the
problem is not so much about rarity.

IMHO, it is the job of a grammar checker to analyse sentences and indicate where
there is a possible confusion.
And this is already what LanguageTool does. We can write rules to prevent
confusion between two correct words. I don't know what has been done for
English, but rules about this problem have been created for French.
http://www.languagetool.org/
http://community.languagetool.org/
Comment 2 eric.savary 2008-09-14 10:36:22 UTC
Reassigned to SBA
Comment 3 nicklevinson 2008-09-16 08:46:07 UTC
In English, grammar checkers are horrible. My main experience with them is with
the one in Microsoft Word. It's my understanding that few people install
standalone grammar checkers. I think only once have I gotten useful advice from
any (I think it was Word's). I could try tweaking them for register and
particulars, but I'd have to have a bunch of -- not just rules -- whole rule
sets to switch between according to purpose and, in my experience, what rule
applies is a judgment call. So I'm going to differ often even from a computer
I've carefully tweaked for my own use. Example: something may be correct but
likely to be misperceived by the reader for whom I'm writing. Writing these
rules is not a project to be completed in a week or two of hard work. It's
enormously complicated. What would you do with the clause "Believe you me"?
Verb-subject-object is a very unusual ordering in most languages, but this one
is idiomatically correct. When would you forbid splitting an infinitive? Winston
Churchill is said to have answered a critique that he had ended a sentence with
a preposition, "There are some things up with which I will not put.", which I
take as allowing a final preposition sometimes. The sentence adverb was considered wrong until
it was eventually accepted. French has l'Académie française as a prescriptive
institutional authority on the language; English has nothing of the sort.

Spelling alone: If rare words interfering with common-word misspellings is not a
problem in some languages, the feature can be disabled (dimmed) for those
languages, to avoid making a user suffer feature overload. But, in my experience
with English, the problem is with some parts of English, some writing
assignments, some kinds of training, and the like. I often add words to my Lotus
spellchecker dictionaries (I use Word Pro on my Win95a platform), including
rarities. As a result, misspellings probably escape detection. People who never
use rare words may never encounter that problem, but if they do add them to
their user dictionary the nondetection problem grows, and is itself hard to
detect. You don't know that you made a mistake, and the very purpose of a
spellcheck is eroded.
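
To make the nondetection problem concrete, the sketch below lists which rare
words sit one edit away from a common word, so that a typo of the common word
would pass unnoticed once the rare word is in the dictionary. It uses plain
Levenshtein distance as an assumed similarity measure; the function names and
sample words are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def risky_rare_words(rare_words, common_words, max_distance=1):
    """Map each rare word to the common words whose typos it could mask."""
    risky = {}
    for rare in rare_words:
        near = sorted(c for c in common_words
                      if 0 < edit_distance(rare, c) <= max_distance)
        if near:
            risky[rare] = near
    return risky


# Hypothetical sample lists.
common = {"than", "from", "weight"}
rare = {"thane", "fro", "wight", "floccinaucinihilipilification"}
masked = risky_rare_words(rare, common)
```

Here `masked` pairs "thane" with "than", "fro" with "from" and "wight" with
"weight", while the very long word masks nothing. Words like the first three
are the ones a separate "rare" signal would be most useful for.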

An option is not to add rarities, but then I get a ton of alleged misspellings
marking up my document. A better option is to have 2 different graphical signals
that a word might be problematic: one if definitely misspelled and the other if
the spelling is questionable because it matches a rarity.

-- 
Nick
Comment 4 stefan.baltzer 2010-10-06 13:35:00 UTC
This is not an enhancement but a full-blown feature (request) => Change issue
type. It would bring "statistical values" into the spell check proposal list,
with the long-term goal that the software ALWAYS knows what you wanted to write
when you misspelled a word. Would it end in something like a self-adjusting
autocorrection?

Note that "rarely used" might be measured via "internet statistics", but this
hardly ever reflects the individual view of "I want this proposal when I
mis-type that word, thus I want THAT proposal and no other". After all, whether
to use or to avoid single words is influenced by the subject and by personal
skill, taste and direction when choosing words.

Change component to lingucomponent.