Posted to solr-user@lucene.apache.org by rashi gandhi <ga...@gmail.com> on 2020/04/09 13:10:37 UTC

handling stopwords for special scenarios

Hi All,

We are using the stopword filter factory at both index and search time to
omit stopwords.

However, in one particular case, we are getting "here" as a search query,
and "here" is one of the words in the title/name representing our client.
We return zero results because "here" is one of the English stopwords,
which are omitted both while indexing and while searching.

One solution would be to remove "here" from the list of stopwords,
but that does not look feasible.

Is there any way to handle this kind of case, where
stopwords are meant to be actual search terms?

Any leads would be appreciated.

Re: handling stopwords for special scenarios

Posted by Walter Underwood <wu...@wunderwood.org>.
Agreed, leave the stopwords alone. I ran into this same problem
thirteen years ago at Netflix. Even before that, I wasn’t removing 
stopwords, but I accidentally left them in the Solr 1.3 config.

https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

Re: handling stopwords for special scenarios

Posted by Erick Erickson <er...@gmail.com>.
1> Why use stopwords at all? They’re largely a holdover from the
    bad old days when memory was limited. I usually recommend
    people just start by not using stopwords at all.

2> Assuming <1> doesn’t work for you, why doesn’t it look feasible
    to remove “here” from the stopword list? True, you have to re-index.

But what you’re asking for is not possible. Stopwords are simply gone
from the index without a trace; there’s absolutely no way to reconstruct
them.
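
The point about stopwords disappearing from the index can be illustrated
with a toy inverted index. This is only a sketch and not Solr code: the
stopword set, the document titles, and the helper names (analyze,
build_index, search) are all made up for illustration, and real Solr
analysis chains do much more than lowercase and split on whitespace.

```python
# Toy inverted index; NOT Solr code. Stopwords and titles are hypothetical.
STOPWORDS = {"a", "the", "is", "here", "there"}

def analyze(text, stopwords):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [tok for tok in text.lower().split() if tok not in stopwords]

def build_index(docs, stopwords):
    """Map each surviving token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for tok in analyze(text, stopwords):
            index.setdefault(tok, set()).add(doc_id)
    return index

def search(index, query, stopwords):
    """Return ids of documents matching every analyzed query token."""
    tokens = analyze(query, stopwords)
    if not tokens:
        return set()  # every query term was a stopword: nothing left to match
    result = None
    for tok in tokens:
        ids = index.get(tok, set())
        result = ids if result is None else result & ids
    return result

docs = {1: "Here", 2: "The Map Company"}

with_stop = build_index(docs, STOPWORDS)  # "here" never reaches the index
no_stop = build_index(docs, set())        # nothing is filtered out

print(search(with_stop, "here", STOPWORDS))  # set()
print(search(no_stop, "here", set()))        # {1}
```

In the filtered index there is no entry for "here" at all, so no
query-time trick can bring document 1 back; only re-indexing without
that stopword (or without any stopwords) does.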

I’d also argue that this is an N+1 situation. Sure, you’ll solve the “here”
problem by removing it from the stopword list, but then you’ll have
the same problem with “there”…
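
One way to see how far the N+1 problem reaches is to check which titles
would be wiped out entirely by the current stopword list. The following is
a rough sketch under stated assumptions: the stopword set and the titles
are examples only, fully_stopped is a hypothetical helper, and the
whitespace tokenization is a stand-in for whatever the real analysis
chain does.

```python
# Hypothetical audit helper; the stopword list and titles are examples only.
STOPWORDS = {"a", "an", "the", "here", "there", "who", "that"}

def fully_stopped(titles, stopwords):
    """Return titles whose every token is a stopword.

    Such titles leave no tokens in the index after stopword filtering,
    so they cannot be found by searching for their own name.
    """
    hits = []
    for title in titles:
        tokens = title.lower().split()
        if tokens and all(tok in stopwords for tok in tokens):
            hits.append(title)
    return hits

titles = ["Here", "There", "The Who", "Acme Maps"]
print(fully_stopped(titles, STOPWORDS))  # ['Here', 'There', 'The Who']
```

If a check like this turns up more than a handful of names, that is a
hint that patching the stopword list one word at a time will not scale.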

Best,
Erick
