Posted to solr-user@lucene.apache.org by climbingrose <cl...@gmail.com> on 2008/07/01 04:45:52 UTC

Re: Limit Porter stemmer to plural stemming only?

I modified the original English Stemmer, written in the Snowball language,
and regenerated the Java implementation using the Snowball compiler. It's
been working for me so far. I can certainly share the modified Snowball
English Stemmer if anyone wants to use it.
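
For a rough idea of what plural-only stemming amounts to, here is a minimal
sketch of step 1a of the original Porter algorithm (the step mentioned in the
quoted messages further down this thread). This is not the modified Snowball
stemmer described above, and the Snowball English stemmer's own rules differ
in detail. The class name PluralOnlyStemmerSketch and the method stem are
hypothetical, and the lowercasing is an assumption added to mirror the
examples in the original question.

```java
import java.util.Locale;

/** Hypothetical sketch: applies only step 1a of the original Porter algorithm. */
public class PluralOnlyStemmerSketch {

    /** Strips plural suffixes using the four step 1a rules, checked longest first. */
    public static String stem(String word) {
        // Assumption: lowercase first, as an analyzer chain usually would.
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2); // SSES -> SS
        }
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2); // IES -> I
        }
        if (w.endsWith("ss")) {
            return w;                              // SS -> SS (unchanged)
        }
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1); // S -> (removed)
        }
        return w;
    }

    public static void main(String[] args) {
        String[] samples = {"Accountant", "Accountants", "Account", "Accounts"};
        for (String s : samples) {
            System.out.println(s + " -> " + stem(s));
        }
    }
}
```

With these rules "Accountants" and "Accountant" both reduce to "accountant",
while "Accounts" and "Account" reduce to "account", so the two word families
stay distinct. Note that the bare S rule is unconditional in this sketch, so
short words such as "is" are also trimmed.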

Cheers,
Cuong

On Tue, Jul 1, 2008 at 4:12 AM, Mike Klaas <mi...@gmail.com> wrote:

> If you find a solution that works well, I encourage you to contribute it
> back to Solr.  Plural-only stemming is probably a common need (I've
> definitely wanted to use it before).
>
> cheers,
> -Mike
>
>
> On 30-Jun-08, at 2:25 AM, climbingrose wrote:
>
>> Ok, it looks like step 1a in the Porter algorithm does what I need.
>>
>> On Mon, Jun 30, 2008 at 6:39 PM, climbingrose <cl...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> The Porter stemmer is really good in general. However, there are some
>>> cases where it doesn't work well. For example, "accountant" matches
>>> "Accountant" as well as "Account Manager", which isn't desirable. Is it
>>> possible to use this analyser for plural words only? For example:
>>> +Accountant -> accountant
>>> +Accountants -> accountant
>>> +Account -> account
>>> +Accounts -> account
>>>
>>> Thanks.
>>>
>>> --
>>> Regards,
>>>
>>> Cuong Hoang
>>>
>>>
>>
>>
>> --
>> Regards,
>>
>> Cuong Hoang
>>
>
>


-- 
Regards,

Cuong Hoang

Re: Limit Porter stemmer to plural stemming only?

Posted by "jerry.jacob@gmail.com" <je...@gmail.com>.
Hi,

Would you mind attaching the plural-only stemmer? I can't find it in this post.

Thanks
Jerry




Re: Limit Porter stemmer to plural stemming only?

Posted by climbingrose <cl...@gmail.com>.
Attached is the modified Snowball source code for the plural-only English
stemmer. You need to compile it to Java using the instructions here:
http://snowball.tartarus.org/runtime/use.html. Essentially, you need to:

1) Download the Snowball distribution (Snowball, algorithms, and libstemmer
library) from http://snowball.tartarus.org/dist/snowball_code.tgz and
compile the Snowball compiler itself using this command:
gcc -O -o snowball compiler/*.c
2) Compile the attached file to Java:
./snowball stem_ISO_8859_1.sbl -java -o EnglishStemmer -name EnglishStemmer

You can change EnglishStemmer to whatever you like, for example,
PluralEnglishStemmer. After that, you need to modify the generated Java
class so that it references the appropriate classes in the net.sf.snowball.*
package instead of the ones from the Snowball website. I think the only two
classes you need to import are Among and SnowballProgram.

Once you have the new stemmer ready, write something similar to
EnglishPorterFilterFactory to use it within Solr.

Hope this helps.

Cheers,
Cuong


On Tue, Jul 1, 2008 at 6:07 PM, Guillaume Smet <gu...@gmail.com>
wrote:

> Hi Cuong,
>
> On Tue, Jul 1, 2008 at 4:45 AM, climbingrose <cl...@gmail.com>
> wrote:
> > I modified the original English Stemmer, written in the Snowball language,
> > and regenerated the Java implementation using the Snowball compiler. It's
> > been working for me so far. I can certainly share the modified Snowball
> > English Stemmer if anyone wants to use it.
>
> Yeah, it would be nice. A step-by-step explanation of how to
> regenerate the Java files would be nice too (or a pointer to such
> documentation if you found any).
>
> Thanks,
>
> --
> Guillaume
>

Re: Limit Porter stemmer to plural stemming only?

Posted by Guillaume Smet <gu...@gmail.com>.
Hi Cuong,

On Tue, Jul 1, 2008 at 4:45 AM, climbingrose <cl...@gmail.com> wrote:
> I modified the original English Stemmer, written in the Snowball language,
> and regenerated the Java implementation using the Snowball compiler. It's
> been working for me so far. I can certainly share the modified Snowball
> English Stemmer if anyone wants to use it.

Yeah, it would be nice. A step-by-step explanation of how to
regenerate the Java files would be nice too (or a pointer to such
documentation if you found any).

Thanks,

-- 
Guillaume