Posted to java-user@lucene.apache.org by Johnny Jenkins <je...@gmail.com> on 2013/09/11 04:24:36 UTC

SpellChecker adding and removing words

I’d like to hijack the SpellChecker class as a general spell-checking and
word-suggestion tool. The idea of using this class was to avoid writing my
own.  At first it seemed to fit the bill.

However, once I’ve used indexDictionary() I cannot seem to easily add/remove
words.  I can re-instantiate the SpellChecker instance and do another
indexDictionary() with the most recent dictionary; however, this takes time
and I want ZERO downtime.

Adding words:

I’ve attempted to add documents through an IndexWriter and I can
successfully add words into the F_WORD field.  This allows me to test a word
against exist() with a successful outcome.  However, when I ask for
suggestions for a similar misspelled word, the newly added word is not
returned.  This is because I never added the appropriate content to
all the other index fields (gramN, endN, startN, etc.) which the suggest
methods appear to rely on.



I’ve debugged the creation of the dictionary and can see that there are
some very useful private methods in the SpellChecker class
(createDocument(), addGram()).  It appears to lack word removal methods.
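
To make the missing fields concrete, here is a minimal stand-alone sketch of
the field layout those private methods appear to build. It is an assumption
based on my reading of the Lucene 4.x SpellChecker source (gram sizes chosen
from the word length; fields named word, gramN, startN, endN), so check it
against your version. SpellGramSketch, minGram, maxGram and gramFields are
hypothetical names, and no Lucene classes are used; the method only lists
the field/value pairs a manually added document would need.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: lists the fields a spell-index entry would carry. */
final class SpellGramSketch {

    // Smallest n-gram size for a word of this length (assumed from the 4.x source).
    static int minGram(int length) {
        if (length > 5) return 3;
        if (length == 5) return 2;
        return 1;
    }

    // Largest n-gram size for a word of this length (assumed from the 4.x source).
    static int maxGram(int length) {
        if (length > 5) return 4;
        if (length == 5) return 3;
        return 2;
    }

    // Returns field name -> values, in the order the fields are first produced.
    static Map<String, List<String>> gramFields(String word) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        // "word" is the value of SpellChecker.F_WORD.
        fields.computeIfAbsent("word", k -> new ArrayList<>()).add(word);
        int len = word.length();
        for (int ng = minGram(len); ng <= maxGram(len); ng++) {
            String last = null;
            for (int i = 0; i + ng <= len; i++) {
                String gram = word.substring(i, i + ng);
                fields.computeIfAbsent("gram" + ng, k -> new ArrayList<>()).add(gram);
                if (i == 0) {
                    // The first gram is also recorded as the word's prefix.
                    fields.computeIfAbsent("start" + ng, k -> new ArrayList<>()).add(gram);
                }
                last = gram;
            }
            if (last != null) {
                // The final gram is also recorded as the word's suffix.
                fields.computeIfAbsent("end" + ng, k -> new ArrayList<>()).add(last);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        gramFields("lucene").forEach((field, values) ->
                System.out.println(field + " = " + values));
    }
}
```

Under these assumptions, a six-letter word such as "lucene" gets gram3/gram4,
start3/start4 and end3/end4 values in addition to the word field, which is
why a document that only carries F_WORD passes exist() but never shows up in
suggestions.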



Am I trying to use this for something it is not designed for?  (I gather
its roots are around ‘did you mean’?)

Fwd: SpellChecker adding and removing words

Posted by Johnny Jenkins <je...@gmail.com>.
I've now managed to add words into the spell checker index (i.e. not just
the F_WORD field). Unfortunately it is a dirty hack, which wouldn't have
been necessary if those private methods were made protected.

Next problem...

Deleting words:
1) (a) create index (b) add word (c) delete word = SUCCESS
2) (a) create index (b) delete existing word = FAIL

With both I'm using the same process, which is:
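// directory is the spell index Directory; iwc is an IndexWriterConfig
IndexWriter iw = new IndexWriter(directory, iwc);
// delete every document whose F_WORD field holds this word
Term term = new Term(SpellChecker.F_WORD, word);
iw.deleteDocuments(term);
iw.close(); // commits the deletion
spellChecker.setSpellIndex(directory); // required so changes are available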

Using Luke I can see that deleting new Term(SpellChecker.F_WORD, word)
works fine in scenario 1. In scenario 2 the document is shown as
deleted and hasDeletions = 1, yet its terms still remain in the index.

Any thoughts would be greatly appreciated!
