Posted to java-user@lucene.apache.org by Jan Tosovsky <j....@email.cz> on 2017/10/01 20:53:51 UTC

A new Snowball stemmer

Dear All,

I'd like to integrate a new Snowball stemmer [1] into Lucene for my
experiments, but I can see some incompatibilities between the original
Snowball stemmers (produced by the Snowball compiler) and Lucene's current
Snowball stemmers [2].

Especially:
* different constructors for the Among class: new Among("ce", -1, 1) vs.
  new Among("ce", -1, -1, "", methodObject)
* the find_among_b() method accepts only two parameters

What is the procedure for producing Lucene-compatible stemmers from an SBL
file? Is there any automation, or should I modify the original compiled file
manually?

Thanks,
Jan

_________
[1] It is actually a Czech stemmer, see
https://issues.apache.org/jira/browse/LUCENE-4042, even though the original
author has stated in LUCENE-3883: "I wouldn't recommend the aggressive mode,
and I regret that I left it uncommented. If you really think an alternative
would be welcome, it would be quite easy to get the best of both (in fact, I
spent roughly half the time on that trying to beat Snowball into
overstemming to match the original)."

[2] Lucene stemmers can be found here:
https://github.com/apache/lucene-solr/tree/master/lucene/analysis/common/src/java/org/tartarus/snowball/ext


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: A new Snowball stemmer

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

There is an Ant task for patching the Snowball compiler output:

lucene/analysis/common/ $ ant patch-snowball

I am not 100% sure whether this still works with the latest Snowball compiler, but it was used to convert the files at the time. You may need to use an older Snowball version so that the regexes match.
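
To illustrate why the regexes depend on the compiler version, here is a minimal sketch of a regex-based patch step. It is not the actual patch-snowball task: the class name AmongPatchSketch, the method patchLine, and the pattern itself are hypothetical, and it assumes the only change needed is appending the two extra constructor arguments ("", methodObject) seen in Lucene's stemmers. If a newer Snowball compiler formats the Among entries differently, a pattern like this simply stops matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the code behind "ant patch-snowball".
public class AmongPatchSketch {

    // Matches a three-argument constructor call such as: new Among("ce", -1, 1)
    // Group 1: quoted suffix, group 2: first integer, group 3: second integer.
    private static final Pattern THREE_ARG_AMONG = Pattern.compile(
            "new Among\\(\\s*(\"[^\"]*\")\\s*,\\s*(-?\\d+)\\s*,\\s*(-?\\d+)\\s*\\)");

    // Rewrites every three-argument Among call in the line to a five-argument
    // form. The appended arguments ("", methodObject) are an assumption based
    // on the constructor shape quoted in this thread.
    public static String patchLine(String line) {
        Matcher m = THREE_ARG_AMONG.matcher(line);
        return m.replaceAll("new Among($1, $2, $3, \"\", methodObject)");
    }

    public static void main(String[] args) {
        // A line in the shape produced by the Snowball compiler.
        System.out.println(patchLine("new Among(\"ce\", -1, 1),"));
        // A line that already has five arguments does not match the pattern.
        System.out.println(patchLine("new Among(\"ce\", -1, -1, \"\", methodObject),"));
    }
}
```

Lines that already use the five-argument form are left untouched, because the pattern requires the closing parenthesis directly after the third argument.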

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


