Posted to solr-user@lucene.apache.org by Leonardo Dias <le...@catho.com.br> on 2009/02/18 18:40:45 UTC
Snowball and protected words
Hi there!
Is there a way to make the snowball algorithm work with a protwords.txt
file?
EnglishPorter works fine with it. It would be great if the Snowball
algorithm could do the same, to avoid searches with irrelevant results.
Best,
Leonardo
Re: Snowball and protected words
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 19, 2009, at 8:37 AM, Leonardo Dias wrote:
> Erik just said it wouldn't be hard to bring that functionality to
> Snowball. Erik, do you know what needs to be done in order to
> achieve that? Don't you guys have plans for that? I'm sure that I'm
> not the only one with that problem using Solr with Portuguese
> (or any other language).
Leonardo - so many development things are not hard... practically
everything is easy... there's just so darn many of them!
For you: <http://issues.apache.org/jira/browse/SOLR-1026>
I'm ready to commit as-is: the test case passes, and I checked it in
analysis.jsp with "text" modified to use Snowball instead of EnglishPorter:
  <filter class="solr.SnowballPorterFilterFactory"
          language="English" protected="protwords.txt"/>
It all worked as expected for stemmed and protected words.
Erik
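For readers unfamiliar with the protected-words mechanism discussed in this thread, here is a minimal Python sketch of the idea. It is not Solr's or Snowball's actual code. It assumes protwords.txt holds one word per line, with lines starting with `#` treated as comments. `toy_stem` is a hypothetical stand-in for a real stemmer, and `load_protected` and `stem_tokens` are illustrative names only.

```python
def load_protected(lines):
    """Parse protwords.txt-style content into a set of protected words.

    Assumption: one word per line; blank lines and '#' comment lines are skipped.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "es", "s"):
        # Only strip when a stem of at least three characters remains.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def stem_tokens(tokens, protected):
    """Stem every token except those listed as protected."""
    return [t if t in protected else toy_stem(t) for t in tokens]


if __name__ == "__main__":
    protected = load_protected(["# words that must not be stemmed", "news", ""])
    print(stem_tokens(["news", "cats", "jumping"], protected))
```

The point of the design is that the protected list is checked before the stemmer runs, so a content team can correct a bad stem by editing a text file rather than regenerating the stemmer.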
Re: Snowball and protected words
Posted by Leonardo Dias <le...@catho.com.br>.
Hello Walter.
We believe this kind of thing is better managed by a content team that
works with user feedback. It would be costly if, every time we found a
word that brings irrelevant results, we had to build a new stemmer to
correct it. It's a lot better to create a simple interface that lets
anyone define the protected words we need, based on user feedback, in a
simple, easy way.
Erik just said it wouldn't be hard to bring that functionality to
Snowball. Erik, do you know what needs to be done in order to achieve
that? Don't you guys have plans for that? I'm sure that I'm not the only
one with that problem using Solr with Portuguese (or any other
language).
Thank you very much for your help,
Leonardo.
Walter Underwood wrote:
> You can define exceptions in the Snowball language and generate
> a new stemmer. See the examples here:
>
> http://snowball.tartarus.org/algorithms/english/stemmer.html
>
> wunder
>
> On 2/18/09 9:56 AM, "Erik Hatcher" <er...@ehatchersolutions.com> wrote:
>
>
>> On Feb 18, 2009, at 12:40 PM, Leonardo Dias wrote:
>>
>>> Is there a way to make the snowball algorithm work with a
>>> protwords.txt file?
>>>
>> Currently, and unfortunately, no - the protected words feature is not
>> available in the SnowballPorterFilterFactory. It wouldn't take much
>> effort to bring that capability across though.
>>
>> Erik
Re: Snowball and protected words
Posted by Walter Underwood <wu...@netflix.com>.
You can define exceptions in the Snowball language and generate
a new stemmer. See the examples here:
http://snowball.tartarus.org/algorithms/english/stemmer.html
wunder
On 2/18/09 9:56 AM, "Erik Hatcher" <er...@ehatchersolutions.com> wrote:
>
> On Feb 18, 2009, at 12:40 PM, Leonardo Dias wrote:
>> Is there a way to make the snowball algorithm work with a
>> protwords.txt file?
>
> Currently, and unfortunately, no - the protected words feature is not
> available in the SnowballPorterFilterFactory. It wouldn't take much
> effort to bring that capability across though.
>
> Erik
Re: Snowball and protected words
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 18, 2009, at 12:40 PM, Leonardo Dias wrote:
> Is there a way to make the snowball algorithm work with a
> protwords.txt file?
Currently, and unfortunately, no - the protected words feature is not
available in the SnowballPorterFilterFactory. It wouldn't take much
effort to bring that capability across though.
Erik