Posted to java-user@lucene.apache.org by Wojtek H <wo...@gmail.com> on 2008/04/01 11:58:55 UTC
stemming in Lucene
Hi all,
Snowball stemmers are part of Lucene, but only for a few languages. We
have documents in various languages, so we need stemmers for many
languages (Polish in particular). One idea is to use ispell
dictionaries. ispell dictionaries exist for many languages, so this
approach would suit a multilingual environment. This may not be the
perfect place to ask, but does anyone know of a Java stemmer that uses
ispell dictionaries?
There is an aspell-like Java spell checker (Jazzy), but I could not see
how to use it for stemming. We are considering porting part of the
PostgreSQL tsearch module to Java, because tsearch uses ispell
dictionaries for stemming.
But maybe there is a better way, or there are people already working on
something like this?
Thanks and regards,
wojtek
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: stemming in Lucene
Posted by Karl Wettin <ka...@gmail.com>.
Wojtek H wrote:
> Snowball stemmers are part of Lucene, but for few languages only. We
org.apache.lucene.analysis contains a few more stemmers.
> have documents in various languages and so need stemmers for many
> languages (in particular polish).
Have you seen Stempel?
http://www.getopt.org/stempel/
karl
Re: stemming in Lucene
Posted by Mathieu Lecarme <ma...@garambrogne.net>.
Wojtek H wrote:
> Hi all,
>
> Snowball stemmers are part of Lucene, but for few languages only. We
> have documents in various languages and so need stemmers for many
> languages (in particular polish). One of the ideas is to use ispell
> dictionaries. There are ispell dicts for many languages and so this
> solution is good for multilingual environment. Maybe this is not
> perfect place to ask, but does anyone know about java stemmer using
> ispell dicts?
> There is aspell-like java spell-checker (Jazzy) but I could not see
> how to use it for stemming. We are considering porting part of
> postgres tsearch module to java, because tsearch uses ispell dicts for
> stemming.
> But maybe there is a better way or there are people working on
> something like that?
>
ispell data is useful for phonetic matching and for enumerating a huge
list of words. An ispell dictionary works in one direction only:
pseudo-root => word. Building the inverse function looks hard, because
a lemma is split across multiple affix entries.
But the data can be used to derive rules, just as
http://www.getopt.org/stempel/ does.
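To illustrate the "one way" point, here is a minimal sketch in plain Java.
It assumes a heavily simplified rule format (a flag, a suffix to strip, a
suffix to add); real ispell affix files also have prefixes and match
conditions. The class and method names (IspellInverseSketch, SuffixRule,
expand, invert) are made up for this example and do not come from ispell,
tsearch or Lucene. The forward direction is a direct rule application; the
inverse is obtained here only by enumerating every generated form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class IspellInverseSketch {

    /** Hypothetical, simplified suffix rule: strip a suffix from the root, then append another. */
    static final class SuffixRule {
        final char flag;
        final String strip;
        final String add;

        SuffixRule(char flag, String strip, String add) {
            this.flag = flag;
            this.strip = strip;
            this.add = add;
        }
    }

    /** Forward direction (pseudo-root => words): apply every rule whose flag the root carries. */
    static List<String> expand(String root, String flags, List<SuffixRule> rules) {
        List<String> forms = new ArrayList<>();
        forms.add(root); // the root itself is also a valid word
        for (SuffixRule rule : rules) {
            if (flags.indexOf(rule.flag) >= 0 && root.endsWith(rule.strip)) {
                String base = root.substring(0, root.length() - rule.strip.length());
                forms.add(base + rule.add);
            }
        }
        return forms;
    }

    /** Inverse direction (word => roots), built by brute-force enumeration of all forms. */
    static Map<String, Set<String>> invert(Map<String, String> dictionary, List<SuffixRule> rules) {
        Map<String, Set<String>> formToRoots = new HashMap<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            for (String form : expand(entry.getKey(), entry.getValue(), rules)) {
                formToRoots.computeIfAbsent(form, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return formToRoots;
    }

    public static void main(String[] args) {
        // Toy rules and dictionary entries, invented for illustration only.
        List<SuffixRule> rules = Arrays.asList(
                new SuffixRule('D', "", "ed"),
                new SuffixRule('G', "", "ing"),
                new SuffixRule('N', "e", "ion"));
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("walk", "DG");
        dictionary.put("create", "N");

        Map<String, Set<String>> formToRoots = invert(dictionary, rules);
        System.out.println(formToRoots.get("walking"));  // [walk]
        System.out.println(formToRoots.get("creation")); // [create]
    }
}
```

The lookup table grows with the number of generated forms, which is one
reason deriving compact rules from the data (the Stempel approach) may be
more attractive for a highly inflected language.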
M.