Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/09/04 01:21:52 UTC

Re: SnowballPorterFilterFactory stemming word question

: If i give "machine" why is that it stems to "machin", now from where does
: this word come from
: If i give "revolutionary" it stems to "revolutionari", i thought it should
: stem to revolution.
: 
: How does stemming work?

the porter stemmer (and all of the stemmers provided with solr) are 
programmatic stemmers ... they don't actually know the root of any word; they 
use an approximate algorithm to compute a *token* from a word based on a 
set of rules ... these tokens aren't necessarily real words (and most of 
the time they aren't words) but the same token tends to be produced from 
words with similar roots.

if you want to see the actual root word, you'll have to use a 
dictionary-based stemmer.
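
To make the "set of rules" idea concrete, here is a minimal sketch of a rule-based suffix stripper. It is deliberately NOT the Porter algorithm (which has many more steps and conditions); the class name, the rule table, and the minimum stem length are all assumptions chosen for illustration. It only shows how ordered suffix rules can turn related words into the same non-word token.

```java
import java.util.Locale;

// Toy illustration only: a handful of ordered suffix rules, not the Porter algorithm.
final class ToySuffixStemmer {
    // Hypothetical (suffix, replacement) rules; the first rule that matches wins.
    private static final String[][] RULES = {
        {"ies", "i"}, {"ing", ""}, {"es", ""}, {"ed", ""}, {"s", ""}, {"e", ""}, {"y", "i"},
    };
    // Assumed guard so that very short words are left alone.
    private static final int MIN_STEM_LENGTH = 3;

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String[] rule : RULES) {
            String suffix = rule[0];
            int stemLength = w.length() - suffix.length();
            if (w.endsWith(suffix) && stemLength >= MIN_STEM_LENGTH) {
                // Strip the suffix and append the replacement (possibly empty).
                return w.substring(0, stemLength) + rule[1];
            }
        }
        return w; // No rule applied: the token is the word itself.
    }

    public static void main(String[] args) {
        String[] words = {"machine", "machines", "machined", "revolutionary"};
        for (String word : words) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

With these toy rules, "machine", "machines" and "machined" all reduce to the token "machin", and "revolutionary" becomes "revolutionari". Neither token is a dictionary word, which is the behaviour described above: the rules never consult a dictionary, they only rewrite suffixes.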


-Hoss


Re: SnowballPorterFilterFactory stemming word question

Posted by darniz <rn...@edmunds.com>.
The link to download KStem is not working.

Is there any other link, please?




-- 
View this message in context: http://www.nabble.com/SnowballPorterFilterFactory-stemming-word-question-tp25180310p25404615.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SnowballPorterFilterFactory stemming word question

Posted by darniz <rn...@edmunds.com>.
Thanks Yonik.
I have a task where my user gives me 20 English dictionary words, and
I have to run a program that generates a report with all the stemmed words.

I have to use EnglishPorterFilterFactory and SnowballPorterFilterFactory to
check which one is faster and gives the best results.

Should I write a Java module and use the library that comes with Solr?
Is there any code snippet I can use?

My rough idea is to create an EnglishPorterFilter
from EnglishPorterFilterFactory by passing it a tokenizer, etc.

I would appreciate it if someone could give me a hint on this.

thanks
darniz












Re: SnowballPorterFilterFactory stemming word question

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Sep 7, 2009 at 2:49 AM, darniz<rn...@edmunds.com> wrote:
> Does solr provide any implementation for dictionary stemmer, please let me
> know

The Krovetz stemmer is dictionary-based (English only):
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

But from your original question, maybe you are concerned when the
stemmer doesn't return real words? For normal search, don't be.
During index time, words are stemmed, and then later the query is
stemmed.  If the results match up, you're good.  For example, a
document containing the word "machines" may stem to "machin" and then
a query of "machined" will stem to "machin" and thus match the
document.
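
A minimal sketch of why matching works even when the stems are not real words: the same function is applied at index time and at query time, so only agreement between the two matters. The `stem` method here is a hypothetical stand-in that strips a few suffixes (it is not Solr's stemmer), and the class and method names are made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index that stems tokens on the way in and query terms on the way out.
final class StemmedIndexDemo {
    // Hypothetical stand-in for a real stemmer: strips a few common suffixes.
    static String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"es", "ed", "e", "s"}) {
            int stemLength = t.length() - suffix.length();
            if (t.endsWith(suffix) && stemLength >= 3) {
                return t.substring(0, stemLength);
            }
        }
        return t;
    }

    // Stemmed token -> ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index time: every whitespace-separated token is stemmed before it is stored.
    void index(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(stem(token), k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Query time: the query term goes through the same stemmer before lookup.
    Set<Integer> search(String queryTerm) {
        return postings.getOrDefault(stem(queryTerm), Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        StemmedIndexDemo demo = new StemmedIndexDemo();
        demo.index(1, "the shop sells machines");
        demo.index(2, "a quiet garden");
        System.out.println("machined -> docs " + demo.search("machined"));
    }
}
```

Here the document text "machines" is stored under "machin", and the query "machined" is also reduced to "machin", so document 1 is found. Nobody ever has to look at the stored token itself.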


-Yonik
http://www.lucidimagination.com

Re: SnowballPorterFilterFactory stemming word question

Posted by Chris Hostetter <ho...@fucit.org>.
: Thanks Hoss
: Could you please provide with any example
: 
: Does solr provide any implementation for dictionary stemmer, please let me

As mentioned on the wiki...

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Expansion stemming -- Takes a root word and 'expands' it to all of its 
various forms -- can be used either at insertion time or at query time. 
One way to approach this is by using the SynonymFilterFactory.
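
As a rough sketch of the expansion idea (not of how SynonymFilterFactory is implemented), the snippet below expands a root word into its known forms using a hand-written table. The table contents, class name, and method name are all assumptions for illustration; in practice the list of forms would come from whatever dictionary or synonym list you maintain.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy expansion "stemmer": root word -> all surface forms we know about.
final class ExpansionStemmingDemo {
    // Hypothetical hand-maintained table; a real one would be loaded from a file.
    private static final Map<String, List<String>> FORMS = new HashMap<>();
    static {
        FORMS.put("machine", Arrays.asList("machine", "machines", "machined", "machining"));
        FORMS.put("revolution", Arrays.asList("revolution", "revolutions", "revolutionary"));
    }

    // Returns every known form of the term, or just the term itself if it is unknown.
    static List<String> expand(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        List<String> forms = FORMS.get(key);
        return forms != null ? forms : Collections.singletonList(key);
    }

    public static void main(String[] args) {
        // At query time, the expanded forms would be searched as alternatives.
        System.out.println(expand("machine"));
        System.out.println(expand("solr"));
    }
}
```

Unlike the rule-based stemmers, every token produced this way is a real word, at the cost of having to maintain the table of forms.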


-Hoss


Re: SnowballPorterFilterFactory stemming word question

Posted by darniz <rn...@edmunds.com>.
Thanks Hoss.
Could you please provide an example?

Does Solr provide any implementation of a dictionary stemmer? Please let me
know.

Thanks
Rashid


