Posted to java-user@lucene.apache.org by "Yilmazel, Sibel" <sy...@navisite.com> on 2006/02/13 19:41:52 UTC

Stemmer algorithms

Hello all,

We have done some preliminary research on the Porter2 and K-Stem algorithms
and have some questions.

Porter2 was found to be a 'strong' stemming algorithm, in that it strips
off both inflectional suffixes (-s, -es, -ed) and derivational suffixes
(-able, -aciousness, -ability). K-Stem seemed to be a 'weak' stemming
algorithm, as it strips off only the inflectional suffixes (-s, -es,
-ed).
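To make the weak/strong distinction concrete, here is a toy Java sketch. It is deliberately not Porter2 or K-Stem (both have far more rules, and K-Stem also consults a dictionary); the suffix list and the 3-letter minimum stem are illustrative assumptions. The "weak" stemmer strips only inflectional endings, while the "strong" one additionally strips a few derivational endings, so it conflates more word forms.

```java
import java.util.List;

/** Toy contrast of weak (inflectional) vs strong (inflectional + derivational) stemming. */
final class ToyStemmers {
    // Illustrative derivational suffixes only; real stemmers use many more rules.
    private static final List<String> DERIVATIONAL = List.of("ability", "able", "ness");
    private static final int MIN_STEM = 3; // assumed: never leave a stem shorter than this

    /** Weak: strips only an inflectional "-ed" or "-s" (but not from "-ss" or "-us"). */
    static String weak(String word) {
        if (word.endsWith("ss") || word.endsWith("us")) {
            return word;
        }
        if (word.endsWith("ed") && word.length() - 2 >= MIN_STEM) {
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("s") && word.length() - 1 >= MIN_STEM) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Strong: weak stemming first, then strip the first matching derivational suffix. */
    static String strong(String word) {
        String stem = weak(word);
        for (String suffix : DERIVATIONAL) {
            if (stem.endsWith(suffix) && stem.length() - suffix.length() >= MIN_STEM) {
                return stem.substring(0, stem.length() - suffix.length());
            }
        }
        return stem;
    }

    public static void main(String[] args) {
        for (String w : List.of("reads", "readable", "readability", "kindness", "jumped")) {
            System.out.printf("%-12s weak=%-12s strong=%s%n", w, weak(w), strong(w));
        }
    }
}
```

With these toy rules, "reads", "readable" and "readability" all reduce to "read" under the strong stemmer, while the weak stemmer only conflates "reads" with "read". More conflation generally means more matches, at some risk to precision.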

In IR, it is usually recommended to use a "weak" stemmer, as a weak
stemmer seldom hurts performance and usually provides a significant
improvement in precision.

However, Porter2 is the most widely used stemming algorithm, AND it is a
'strong' stemmer, which runs contrary to the recommendation above.

Can you share your ideas and experiences with stemming algorithms?
Thanks in advance.

Sibel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Stemmer algorithms

Posted by jason <gi...@gmail.com>.
Hi,

I have tested some stemming algorithms in my application. However, I think
we'd be better off writing a weaker algorithm. I mean, Porter and some other
algorithms are too strong. Maybe an algorithm that converts plural nouns to
singular is enough.
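A plural-only stemmer of the kind suggested above can be very small. The Java sketch below follows the three rules usually given for the so-called S-stemmer, checked in order with the first applicable rule winning. The class name and the 3-character minimum are assumptions for illustration, and this is a standalone function, not a Lucene TokenFilter.

```java
import java.util.List;
import java.util.Locale;

/** Plural-to-singular stemmer following the S-stemmer rules (sketch). */
final class PluralStemmer {

    /** Applies the first applicable rule only; words shorter than 3 characters are left alone. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() < 3) {
            return w; // assumed guard: too short to strip safely
        }
        // Rule 1: "-ies" -> "-y", unless the word ends in "-eies" or "-aies"
        if (w.endsWith("ies") && !w.endsWith("eies") && !w.endsWith("aies")) {
            return w.substring(0, w.length() - 3) + "y";
        }
        // Rule 2: "-es" -> "-e" (i.e. drop the final "s"), unless the word ends in "-aes", "-ees" or "-oes"
        if (w.endsWith("es") && !w.endsWith("aes") && !w.endsWith("ees") && !w.endsWith("oes")) {
            return w.substring(0, w.length() - 1);
        }
        // Rule 3: drop a final "-s", unless the word ends in "-us" or "-ss"
        if (w.endsWith("s") && !w.endsWith("us") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : List.of("queries", "horses", "trees", "cats", "glass", "status")) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Because it only touches plural endings, it leaves "connection" and "connected" distinct, which is the "weak" behavior discussed in this thread. It still produces non-words such as "boxe" from "boxes".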

Re: Stemmer algorithms

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I can't share any experience with K-Stem, but I do remember the K-Stem people contributing a piece of code that integrated their K-Stem work with Lucene a few (2?) years ago. Their code had a funky license attached, so it never made it into Lucene, but it was available for download, so you should be able to try both K-Stem and Porter and compare.
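One simple way to try both and compare is to look at which words each stemmer conflates on your own vocabulary. The Java sketch below groups words by the stem a given stemmer returns. It takes the stemmer as a plain UnaryOperator<String> so that any implementation (for example, a wrapper you write around a K-Stem or Porter stemmer) can be plugged in; the two lambdas in main are stand-ins, not K-Stem or Porter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

/** Groups a vocabulary into conflation classes so two stemmers can be compared side by side. */
final class StemmerComparison {

    /** Maps each stem to the words that reduce to it (sorted by stem for readable output). */
    static Map<String, List<String>> conflationClasses(List<String> words, UnaryOperator<String> stemmer) {
        Map<String, List<String>> classes = new TreeMap<>();
        for (String word : words) {
            classes.computeIfAbsent(stemmer.apply(word), stem -> new ArrayList<>()).add(word);
        }
        return classes;
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("connect", "connects", "connected", "connection", "connections");

        // Stand-ins only: replace these lambdas with calls into the real stemmers being compared.
        UnaryOperator<String> dropFinalS =
                w -> w.endsWith("s") && !w.endsWith("ss") ? w.substring(0, w.length() - 1) : w;
        UnaryOperator<String> aggressive =
                w -> dropFinalS.apply(w).replaceFirst("(ion|ed)$", "");

        System.out.println("weak:   " + conflationClasses(vocabulary, dropFinalS));
        System.out.println("strong: " + conflationClasses(vocabulary, aggressive));
    }
}
```

Running the same comparison over the terms in your own index shows directly which merges a stronger stemmer adds, which you can then judge against your relevance requirements.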

Otis
