Posted to solr-user@lucene.apache.org by Leonardo Dias <le...@catho.com.br> on 2009/02/18 18:40:45 UTC

Snowball and protected words

Hi there!

Is there a way to make the snowball algorithm work with a protwords.txt 
file?

EnglishPorter works fine. It would be great if the snowball algorithm 
could do the same to avoid searches with irrelevant results.

Best,

Leonardo

Re: Snowball and protected words

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 19, 2009, at 8:37 AM, Leonardo Dias wrote:
> Erik just said it wouldn't be hard to bring that functionality to  
> Snowball. Erik, do you know what needs to be done in order to  
> achieve that? Don't you guys have plans for that? I'm sure that I'm
> not the only one with that problem using Solr with the Portuguese
> language (or any other language).

Leonardo - so many development things are not hard... practically  
everything is easy... there's just so darn many of them!

For you:  <http://issues.apache.org/jira/browse/SOLR-1026>

I'm ready to commit as-is: the test case passes, and using analysis.jsp
with the "text" field type modified to use Snowball instead of EnglishPorter:

    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>

it all worked as expected for stemmed and protected words.
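
As a rough sketch of the same setup for another language, assuming the
SOLR-1026 patch is applied: the field type name "text_pt", the tokenizer
choice, and the lowercase filter below are illustrative assumptions, not
part of the patch. Only the "protected" attribute on the Snowball filter
comes from the change described above.

```xml
<!-- schema.xml: hypothetical field type; the name "text_pt" is illustrative -->
<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Words listed in protwords.txt are meant to pass through unstemmed -->
    <filter class="solr.SnowballPorterFilterFactory"
            language="Portuguese" protected="protwords.txt"/>
  </analyzer>
</fieldType>
<!-- conf/protwords.txt holds one protected word per line;
     lines starting with # are treated as comments -->
```

Since the lowercase filter runs before the stemmer in this sketch, it is
probably safest to write the protwords.txt entries in lowercase so they
match the tokens as they reach the Snowball filter.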

	Erik


Re: Snowball and protected words

Posted by Leonardo Dias <le...@catho.com.br>.
Hello Walter.

We believe this kind of thing is better managed by a content team that
works with user feedback. It would be costly if, every time we found a
word that brings irrelevant results, we needed to build a new stemmer to
correct it. It's much better to have a simple interface that lets anyone
define the protected words we need, based on user feedback, in a simple,
easy way.

Erik just said it wouldn't be hard to bring that functionality to 
Snowball. Erik, do you know what needs to be done in order to achieve 
that? Don't you guys have plans for that? I'm sure that I'm not the only
one with that problem using Solr with the Portuguese language (or any
other language).

Thank you very much for your help,

Leonardo.

Walter Underwood wrote:
> You can define exceptions in the Snowball language and generate
> a new stemmer. See the examples here:
>
> http://snowball.tartarus.org/algorithms/english/stemmer.html
>
> wunder
>
> On 2/18/09 9:56 AM, "Erik Hatcher" <er...@ehatchersolutions.com> wrote:
>
>   
>> On Feb 18, 2009, at 12:40 PM, Leonardo Dias wrote:
>>     
>>> Is there a way to make the snowball algorithm work with a
>>> protwords.txt file?
>>>       
>> Currently, and unfortunately, no - the protected words feature is not
>> available in the SnowballPorterFilterFactory. It wouldn't take much
>> effort to bring that capability across though.
>>
>> Erik

Re: Snowball and protected words

Posted by Walter Underwood <wu...@netflix.com>.
You can define exceptions in the Snowball language and generate
a new stemmer. See the examples here:

http://snowball.tartarus.org/algorithms/english/stemmer.html

wunder

On 2/18/09 9:56 AM, "Erik Hatcher" <er...@ehatchersolutions.com> wrote:

> 
> On Feb 18, 2009, at 12:40 PM, Leonardo Dias wrote:
>> Is there a way to make the snowball algorithm work with a
>> protwords.txt file?
> 
> Currently, and unfortunately, no - the protected words feature is not
> available in the SnowballPorterFilterFactory. It wouldn't take much
> effort to bring that capability across though.
> 
> Erik



Re: Snowball and protected words

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 18, 2009, at 12:40 PM, Leonardo Dias wrote:
> Is there a way to make the snowball algorithm work with a  
> protwords.txt file?

Currently, and unfortunately, no - the protected words feature is not  
available in the SnowballPorterFilterFactory. It wouldn't take much
effort to bring that capability across though.

	Erik