Posted to solr-user@lucene.apache.org by Hung Huynh <hh...@vinaconsulting.com> on 2008/04/21 17:59:03 UTC

better stemming engine than Porter?

I recall reading somewhere in one of the mailing-list archives that
someone had developed a better stemming algorithm for Solr than the
built-in Porter stemmer. Does anyone have a link to that stemming module?

Thanks,

HH 


RE: better stemming engine than Porter?

Posted by "Wagner,Harry" <wa...@oclc.org>.
Thanks Ryan. I just opened SOLR-546. Please let me know if I can provide
further help. Cheers! h

-----Original Message-----
From: Ryan McKinley [mailto:ryantxu@gmail.com] 
Sent: Monday, April 21, 2008 2:33 PM
To: solr-user@lucene.apache.org
Subject: Re: better stemming engine than Porter?

Hey-

to create an issue, make an account on jira and post it...
https://issues.apache.org/jira/browse/SOLR

Give that a try and holler if you have trouble.

ryan







Re: better stemming engine than Porter?

Posted by Chris Hostetter <ho...@fucit.org>.
: to create an issue, make an account on jira and post it...
: https://issues.apache.org/jira/browse/SOLR
: 
: Give that a try and holler if you have trouble.

To elaborate (and save some time answering questions about the
correct procedure) ...

http://wiki.apache.org/solr/HowToContribute

(note the "contributing code" section)


-Hoss


Re: better stemming engine than Porter?

Posted by Ryan McKinley <ry...@gmail.com>.
Hey-

to create an issue, make an account on jira and post it...
https://issues.apache.org/jira/browse/SOLR

Give that a try and holler if you have trouble.

ryan



On Apr 21, 2008, at 12:31 PM, Wagner,Harry wrote:
> Hi HH,
> Here's a note I sent Solr-dev a while back:
>
> ---
> I've implemented a Solr plug-in that wraps KStem for Solr use (someone
> else had already written a Lucene wrapper for it).  KStem is considered
> to be more appropriate for library usage since it is much less
> aggressive than Porter (i.e., searches for organization do NOT match on
> organ!). If there is any interest in feeding this back into Solr I would
> be happy to contribute it.
> ---
>
> I believe there was interest in it, but I never opened an issue for it
> and I don't know if it was ever followed-up on. I'd be happy to do that
> now. Can someone on the Solr-dev team point me in the right direction
> for opening an issue?
>
> Thanks... harry
>
>
> -----Original Message-----
> From: Hung Huynh [mailto:hh@vinaconsulting.com]
> Sent: Monday, April 21, 2008 11:59 AM
> To: solr-user@lucene.apache.org
> Subject: better stemming engine than Porter?
>
> I recall reading somewhere in one of the mailing-list archives that
> someone had developed a better stemming algorithm for Solr than the
> built-in Porter stemmer. Does anyone have a link to that stemming module?
>
> Thanks,
>
> HH
>
>
>


RE: better stemming engine than Porter?

Posted by "Wagner,Harry" <wa...@oclc.org>.
Hi Jay,
I did not do a timing comparison either, but I noticed no change in performance after switching to KStem.  Cheers... h
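
Since neither message in this exchange has timing numbers, here is a minimal sketch of how such a comparison could be set up. Everything in it is an assumption made for illustration: rule_stem and lexicon_stem are toy placeholders (not the real Porter or KStem code), and SUFFIXES, LEXICON and WORDS are made-up sample data. To get a meaningful result, swap in the actual stemmer implementations and a realistic word list.

```python
import timeit

# Hypothetical sample data, for illustration only.
SUFFIXES = ("ization", "ation", "ing", "es", "s")
LEXICON = {"organ", "organization", "search", "stem"}
WORDS = ["organization", "organs", "searches", "stemming", "library"] * 200


def rule_stem(word):
    """Placeholder rule-based stemmer: strip the first matching suffix."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lexicon_stem(word):
    """Placeholder dictionary-backed stemmer: only accept stems found in LEXICON."""
    if word in LEXICON:
        return word
    candidate = rule_stem(word)
    return candidate if candidate in LEXICON else word


def time_stemmer(stem, words, repeats=5, loops=20):
    """Best-of-`repeats` wall time, in seconds, for `loops` passes over `words`."""
    timer = timeit.Timer(lambda: [stem(w) for w in words])
    return min(timer.repeat(repeat=repeats, number=loops))


if __name__ == "__main__":
    for name, fn in (("rule", rule_stem), ("lexicon", lexicon_stem)):
        print(f"{name}: {time_stemmer(fn, WORDS):.4f} s")
```

Taking the minimum over several repeats reduces noise from other processes. The numbers from the toy functions say nothing about the real stemmers; only the harness is meant to carry over.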

-----Original Message-----
From: Jay [mailto:yu@AI.SRI.COM] 
Sent: Tuesday, April 22, 2008 12:26 PM
To: solr-user@lucene.apache.org
Subject: Re: better stemming engine than Porter?

Hi Wagner,

Thanks for the introduction to KStem! I quickly scanned the original
KStem paper by Robert Krovetz but could not find any timing data
comparing KStem with the Porter stemmer. Based on your use in your
application, how fast or slow is KStem compared to Porter?

Jay




Re: better stemming engine than Porter?

Posted by Jay <yu...@AI.SRI.COM>.
Hi Wagner,

Thanks for the introduction to KStem! I quickly scanned the original
KStem paper by Robert Krovetz but could not find any timing data
comparing KStem with the Porter stemmer. Based on your use in your
application, how fast or slow is KStem compared to Porter?

Jay


RE: better stemming engine than Porter?

Posted by "Wagner,Harry" <wa...@oclc.org>.
Mathieu,
It's not my KStem. It was written by someone at UMass Amherst. More info here:
http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi 

Someone else had already ported it to Lucene. I simply modified that wrapper to work with Solr. I'll open an issue for it so that it can (hopefully) be integrated into the project.

Cheers... harry

-----Original Message-----
From: Mathieu Lecarme [mailto:mathieu@garambrogne.net] 
Sent: Tuesday, April 22, 2008 3:57 AM
To: solr-user@lucene.apache.org
Subject: Re: better stemming engine than Porter?

The Porter stemmer is not only aggressive, it is ugly, too. The generated
code is too old, not object-oriented enough, and probably too slow.
If your KStem compiles with Java 1.4, why don't you suggest it for Lucene
core?

M.





Re: better stemming engine than Porter?

Posted by Mathieu Lecarme <ma...@garambrogne.net>.
The Porter stemmer is not only aggressive, it is ugly, too. The generated
code is too old, not object-oriented enough, and probably too slow.
If your KStem compiles with Java 1.4, why don't you suggest it for Lucene
core?

M.



RE: better stemming engine than Porter?

Posted by "Wagner,Harry" <wa...@oclc.org>.
Hi HH,
Here's a note I sent Solr-dev a while back:

---
I've implemented a Solr plug-in that wraps KStem for Solr use (someone
else had already written a Lucene wrapper for it).  KStem is considered
to be more appropriate for library usage since it is much less
aggressive than Porter (i.e., searches for organization do NOT match on
organ!). If there is any interest in feeding this back into Solr I would
be happy to contribute it.
---

I believe there was interest in it, but I never opened an issue for it
and I don't know if it was ever followed-up on. I'd be happy to do that
now. Can someone on the Solr-dev team point me in the right direction
for opening an issue?

Thanks... harry
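
The organization/organ point in the note above can be illustrated with a small sketch. It is a toy under stated assumptions, not the real algorithms: aggressive_stem is a bare suffix stripper standing in for a rule-based stemmer in the Porter style, conservative_stem is a stand-in for a dictionary-guarded approach, and SUFFIXES and LEXICON are hypothetical sample data. (The real Porter algorithm does reduce "organization" to "organ", but through a longer sequence of rules than shown here.)

```python
# Toy illustration only: neither function is the real Porter or KStem algorithm.

# Hypothetical suffix list, longest first.
SUFFIXES = ("ization", "ations", "ation", "izes", "ize", "ing", "es", "s")

# Hypothetical mini-lexicon; a real dictionary-backed stemmer uses a large word list.
LEXICON = {"organ", "organize", "organization", "search", "library"}


def aggressive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def conservative_stem(word: str) -> str:
    """Keep known words as-is; accept a stripped form only if it is a known word."""
    if word in LEXICON:
        return word
    candidate = aggressive_stem(word)
    return candidate if candidate in LEXICON else word


if __name__ == "__main__":
    for w in ("organization", "organ", "organs", "searches"):
        print(f"{w:14} aggressive={aggressive_stem(w):8} conservative={conservative_stem(w)}")
```

With the aggressive version, "organization" and "organ" collapse to the same index term, so a search for one matches the other. The lexicon check keeps them apart while still conflating plain plurals such as "organs".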


-----Original Message-----
From: Hung Huynh [mailto:hh@vinaconsulting.com] 
Sent: Monday, April 21, 2008 11:59 AM
To: solr-user@lucene.apache.org
Subject: better stemming engine than Porter?

I recall reading somewhere in one of the mailing-list archives that
someone had developed a better stemming algorithm for Solr than the
built-in Porter stemmer. Does anyone have a link to that stemming module?

Thanks,

HH