Posted to java-user@lucene.apache.org by Wojtek H <wo...@gmail.com> on 2008/04/01 11:58:55 UTC

stemming in Lucene

Hi all,

Snowball stemmers are part of Lucene, but only for a few languages. We
have documents in various languages and so need stemmers for many
languages (in particular Polish). One idea is to use ispell
dictionaries. There are ispell dictionaries for many languages, so this
solution suits a multilingual environment. Maybe this is not the
perfect place to ask, but does anyone know of a Java stemmer that uses
ispell dictionaries?
There is an aspell-like Java spell-checker (Jazzy), but I could not see
how to use it for stemming. We are considering porting part of the
Postgres tsearch module to Java, because tsearch uses ispell
dictionaries for stemming.
But maybe there is a better way, or there are people already working on
something like that?

Thanks and regards,
wojtek

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: stemming in Lucene

Posted by Karl Wettin <ka...@gmail.com>.
Wojtek H wrote:
> Snowball stemmers are part of Lucene, but only for a few languages. We

org.apache.lucene.analysis contains a few more stemmers.

> have documents in various languages and so need stemmers for many
> languages (in particular Polish).

Have you seen Stempel?

http://www.getopt.org/stempel/



       karl



Re: stemming in Lucene

Posted by Mathieu Lecarme <ma...@garambrogne.net>.
Wojtek H wrote:
> Hi all,
>
> Snowball stemmers are part of Lucene, but only for a few languages. We
> have documents in various languages and so need stemmers for many
> languages (in particular Polish). One idea is to use ispell
> dictionaries. There are ispell dictionaries for many languages, so this
> solution suits a multilingual environment. Maybe this is not the
> perfect place to ask, but does anyone know of a Java stemmer that uses
> ispell dictionaries?
> There is an aspell-like Java spell-checker (Jazzy), but I could not see
> how to use it for stemming. We are considering porting part of the
> Postgres tsearch module to Java, because tsearch uses ispell
> dictionaries for stemming.
> But maybe there is a better way, or there are people already working on
> something like that?
>
ispell data is nice for phonetic matching and for enumerating a huge
list of words. But an ispell dictionary works in one direction only:
pseudo root => word. The inverse function looks hard to build, because
a lemma is split across multiple affixes. The data can still be used to
find stemming rules, just like http://www.getopt.org/stempel/ does.
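
To illustrate the one-way point above (pseudo root => word), here is a
minimal sketch of one possible workaround: expand every root with its
affix rules up front and keep a reverse index from each generated word
back to its root. Everything in it is an assumption made for
illustration. The class IspellInverseSketch, the SuffixRule type, and
the toy English rules are hypothetical, and real ispell affix files use
character-class conditions and prefixes that this sketch reduces to a
literal suffix match. It is not how tsearch or Stempel work.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: invert ispell-style suffix rules by expanding every root. */
final class IspellInverseSketch {

    /**
     * Simplified suffix rule (an assumption, not the real ispell format):
     * if the root ends with `condition`, remove `strip` and append `add`.
     */
    static final class SuffixRule {
        final char flag;
        final String condition;
        final String strip;
        final String add;

        SuffixRule(char flag, String condition, String strip, String add) {
            this.flag = flag;
            this.condition = condition;
            this.strip = strip;
            this.add = add;
        }

        /** Returns the derived word, or null when the rule does not apply to this root. */
        String apply(String root) {
            if (!root.endsWith(condition) || !root.endsWith(strip)) {
                return null;
            }
            return root.substring(0, root.length() - strip.length()) + add;
        }
    }

    /**
     * Forward direction is root => word. This builds the inverse, word => roots.
     * `dictionary` maps each root to its flag string, e.g. "walk" -> "DG".
     * A word maps to a set of roots because different roots can yield the same form.
     */
    static Map<String, Set<String>> buildInverseIndex(
            Map<String, String> dictionary, List<SuffixRule> rules) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String root = entry.getKey();
            String flags = entry.getValue();
            // The root is a valid form of itself.
            index.computeIfAbsent(root, k -> new TreeSet<>()).add(root);
            for (SuffixRule rule : rules) {
                if (flags.indexOf(rule.flag) < 0) {
                    continue; // this root does not carry the rule's flag
                }
                String word = rule.apply(root);
                if (word != null) {
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(root);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Toy data, invented for this example.
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("walk", "DG");
        dictionary.put("try", "DG");

        List<SuffixRule> rules = Arrays.asList(
                new SuffixRule('D', "y", "y", "ied"), // try  -> tried
                new SuffixRule('D', "k", "", "ed"),   // walk -> walked
                new SuffixRule('G', "", "", "ing"));  // walk -> walking

        Map<String, Set<String>> index = buildInverseIndex(dictionary, rules);
        System.out.println("tried   -> " + index.get("tried"));
        System.out.println("walking -> " + index.get("walking"));
    }
}
```

The trade-off is memory: the index grows with the number of generated
forms, which can be very large for a highly inflected language.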

M.
