Posted to dev@lucene.apache.org by ka...@nokia.com on 2010/11/17 21:18:05 UTC

Stemming using automata

Folks,
I had an interesting conversation with Simon a few weeks back. It occurred to me that it might be possible to build an automaton that handles stemming and pluralization on searches. Just a thought...
Karl

Re: Stemming using automata

Posted by Robert Muir <rc...@gmail.com>.

Karl, you are right.

this is one of the ways i originally used this thing.

i've done some relevance experiments along these lines (some summary
results here http://www.slideshare.net/otisg/finite-state-queries-in-lucene).

there i compared 3 setups: index-time porter stemming, index-time
plural stemming, and query-time plural stemming (with an automaton).

in general you get similar relevance, with slower queries but more
flexibility. for instance, you could have a queryparser that
implements a stem() operator without indexing everything twice.
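
To make the query-time idea concrete, here is a minimal sketch in Python. It is not Lucene's API and not the code behind the experiments above: the function names, the three English plural rules, and the toy term dictionary are all assumptions made for illustration, and a compiled regular expression stands in for the automaton. The index keeps unstemmed terms, and the query term is expanded to its plural variants only when the search runs.

```python
import re


def plural_variants_pattern(term):
    """Compile a pattern that accepts `term` and a simple English plural of it.

    The rules below are hypothetical and deliberately crude; a real
    stemmer would handle far more cases.
    """
    if len(term) > 1 and term.endswith("y") and term[-2] not in "aeiou":
        # party -> party | parties
        body = re.escape(term[:-1]) + "(?:y|ies)"
    elif term.endswith(("s", "x", "z", "ch", "sh")):
        # box -> box | boxes
        body = re.escape(term) + "(?:es)?"
    else:
        # cat -> cat | cats
        body = re.escape(term) + "s?"
    return re.compile(body)


def expand(term, term_dictionary):
    """Return the indexed terms that the pattern for `term` accepts in full."""
    pattern = plural_variants_pattern(term)
    return [t for t in term_dictionary if pattern.fullmatch(t)]


# Toy stand-in for an index's sorted term dictionary (unstemmed terms).
terms = sorted(["box", "boxer", "boxes", "cat", "catalog", "cats",
                "parties", "party"])

print(expand("cat", terms))
print(expand("box", terms))
print(expand("party", terms))
```

This sketch scans every term, which is the naive version of the slower-query cost; the flexibility is that a hypothetical stem(cat) operator and a plain, exact cat can run against the same unstemmed index.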

probably pretty boring for most people, but in some cases (e.g. lots
of languages) query-time starts to become more attractive...

On Wed, Nov 17, 2010 at 3:18 PM,  <ka...@nokia.com> wrote:
> Folks,
>
> I had an interesting conversation with Simon a few weeks back.  It occurred
> to me that it might be possible to build an automata that handles  stemming
> and pluralization on searches.  Just a thought…
>
> Karl
>
>
