Posted to java-user@lucene.apache.org by Christian Aschoff <ch...@uni-ulm.de> on 2007/10/12 15:48:23 UTC

Problems with stemming/SpellChecker

Hi,

I tried to implement a 'did you mean' function, and it works to some
extent. However, the hints returned by the SpellChecker are the stemmed
versions of the keywords.

For example, a search for the misspelled word 'wasseraalfingen' produces
the hint 'wasseralfing', but it should be 'wasseralfingen'. My first
attempt was to use a field with the option 'NO_NORMS', but that did not
work.

[...]
    indexWriter = new IndexWriter(MiscConstants.luceneDir,
            new GermanAnalyzer(), create);
[...]
    Field didyoumean = new Field(LuceneFieldNames.didyoumean,
            content.toString(), Field.Store.YES,
            Field.Index.NO_NORMS, Field.TermVector.NO);
    document.add(didyoumean);
[...]
    IndexReader indexReader = IndexReader.open(MiscConstants.luceneDir);
    SpellChecker spellChecker = new SpellChecker(
            FSDirectory.getDirectory(MiscConstants.luceneDYMDir));
    spellChecker.indexDictionary(
            new LuceneDictionary(indexReader, LuceneFieldNames.didyoumean));
    indexReader.close();
[...]

Does anyone have a suggestion for how I can get 'unstemmed' hints from
SpellChecker? Do I have to create some kind of 'unstemmed' index just
to build the SpellChecker index?

Regards,
Christian Aschoff


---
Dipl. Ing. (FH) Christian Aschoff

Office:
Universität Ulm
Kommunikations- und Informationszentrum
Abt. Informationssysteme
Raum O26/5403
Albert-Einstein-Allee 11
89081 Ulm

Tel. 0731 50-22432
Fax. 0731 50-22471
christian.aschoff@uni-ulm.de

Private:
Fabristr. 13
89075 Ulm
Germany/Old Europe

Tel. 0731 602 803 60
Fax. 0731 602 803 61
Mob. 0171 272 03 04
caschoff@mac.com

Help out: www.meyers-konversationslexikon.de



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Problems with stemming/SpellChecker

Posted by Daniel Naber <lu...@danielnaber.de>.
On Saturday 13 October 2007 07:57, Christian Aschoff wrote:

> But as far as I can see (in the API docs), the GermanAnalyzer is attached
> to the IndexWriter; I can't find a way to attach an analyzer to a
> single field... Or am I missing something?

See PerFieldAnalyzerWrapper.
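
To illustrate the idea, here is a minimal, library-free sketch of per-field analyzer dispatch: one default analyzer for most fields and an override for a single field. Everything in it is a hypothetical stand-in and not Lucene's API: the class name, the toy "strip a trailing en" stemmer (which is not what GermanAnalyzer really does), and the field names "content" and "didyoumean". In Lucene itself the corresponding calls should be `new PerFieldAnalyzerWrapper(defaultAnalyzer)` followed by `addAnalyzer(fieldName, analyzer)`, with the wrapper then passed to the IndexWriter in place of the GermanAnalyzer; please check the Javadoc of your Lucene version.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; this is NOT Lucene's PerFieldAnalyzerWrapper.
class PerFieldDispatchDemo {
    private final UnaryOperator<String> defaultAnalyzer;
    private final Map<String, UnaryOperator<String>> perField = new HashMap<>();

    PerFieldDispatchDemo(UnaryOperator<String> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register an analyzer that overrides the default for one field.
    void addAnalyzer(String fieldName, UnaryOperator<String> analyzer) {
        perField.put(fieldName, analyzer);
    }

    // Use the field-specific analyzer if one is registered, else the default.
    String analyze(String fieldName, String token) {
        return perField.getOrDefault(fieldName, defaultAnalyzer).apply(token);
    }

    // Toy stand-in for a stemming analyzer: lowercase, then strip a trailing "en".
    static String toyStem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return t.endsWith("en") ? t.substring(0, t.length() - 2) : t;
    }

    // Toy stand-in for a non-stemming analyzer: lowercase only.
    static String lowercaseOnly(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        PerFieldDispatchDemo wrapper = new PerFieldDispatchDemo(PerFieldDispatchDemo::toyStem);
        wrapper.addAnalyzer("didyoumean", PerFieldDispatchDemo::lowercaseOnly);

        System.out.println(wrapper.analyze("content", "Wasseralfingen"));    // wasseralfing
        System.out.println(wrapper.analyze("didyoumean", "Wasseralfingen")); // wasseralfingen
    }
}
```

All other fields keep the stemming behaviour, and only the field that feeds the spell checker is left unstemmed.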

Regards
 Daniel

-- 
http://www.danielnaber.de



Re: Problems with stemming/SpellChecker

Posted by Christian Aschoff <ch...@uni-ulm.de>.
But as far as I can see (in the API docs), the GermanAnalyzer is attached
to the IndexWriter; I can't find a way to attach an analyzer to a
single field... Or am I missing something? (There are plenty of other
fields in the index where GermanAnalyzer fits perfectly.)

On 12.10.2007 at 19:01, Daniel Naber wrote:

> On Friday 12 October 2007 15:48, Christian Aschoff wrote:
>
>>  indexWriter = new IndexWriter(MiscConstants.luceneDir,
>> new GermanAnalyzer(), create);
>> [...]
>
> The problem is not NO_NORMS but GermanAnalyzer. Try StandardAnalyzer on the
> field you get the suggestions from.
>
> Regards
>  Daniel



Re: Problems with stemming/SpellChecker

Posted by Daniel Naber <lu...@danielnaber.de>.
On Friday 12 October 2007 15:48, Christian Aschoff wrote:

>  indexWriter = new IndexWriter(MiscConstants.luceneDir,  
> new GermanAnalyzer(), create);
> [...]

The problem is not NO_NORMS but GermanAnalyzer. Try StandardAnalyzer on the
field you get the suggestions from.
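
As a rough illustration of why the analyzer matters here: a spell checker can only suggest terms that exist in its dictionary, so if the dictionary is built from stemmed terms, the closest match is itself a stem. The sketch below is library-free and hypothetical throughout. The toy stemmer just strips a trailing "en" (GermanAnalyzer's real rules differ), and the suggestion logic is a plain Levenshtein nearest match, which is an assumption for illustration and not how Lucene's SpellChecker is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration only; not Lucene's SpellChecker.
class StemmedSuggestionDemo {
    // Toy stand-in for a stemmer: strips a trailing "en".
    static String toyStem(String token) {
        return token.endsWith("en") ? token.substring(0, token.length() - 2) : token;
    }

    // Plain Levenshtein edit distance using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Returns the dictionary term with the smallest edit distance to the query.
    static String suggest(Collection<String> dictionary, String query) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String term : dictionary) {
            int d = distance(query, term);
            if (d < bestDistance) {
                bestDistance = d;
                best = term;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> unstemmed = Arrays.asList("wasseralfingen", "ulm");
        List<String> stemmed = new ArrayList<>();
        for (String word : unstemmed) {
            stemmed.add(toyStem(word));
        }

        System.out.println(suggest(stemmed, "wasseraalfingen"));   // wasseralfing
        System.out.println(suggest(unstemmed, "wasseraalfingen")); // wasseralfingen
    }
}
```

With the stemmed dictionary the full word is simply not available as a candidate, which matches the 'wasseralfing' hint reported in the original question.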

Regards
 Daniel

-- 
http://www.danielnaber.de
