Posted to solr-user@lucene.apache.org by Steven White <sw...@gmail.com> on 2020/04/24 12:32:59 UTC

Stopwords impact on search

Hi everyone,

What impact, if any, do stopwords have on my search ranking quality?
Will my ranking improve if I do not index stopwords?

I'm trying to figure out if I should use the stopword filter or not.

Thanks in advance.

Steve

Re: Stopwords impact on search

Posted by Steven White <sw...@gmail.com>.
Thanks, Walter. Much appreciated.

To the Solr dev team: it would be a great help if Walter's IDF summary
were added to the Stop Filter documentation:
https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#stop-filter

Steve

On Fri, Apr 24, 2020 at 8:49 PM Walter Underwood <wu...@wunderwood.org> wrote:

> IDF (inverse document frequency) and stopword removal are different approaches to the same problem.
>
> Removing stopwords is a binary decision about how important common words
> are for search. It says some words are completely useless.
>
> IDF is a proportional measure of how important common words are for search.
>
> Instead of removing a list of words that are assumed to be common and less
> useful, let the engine actually measure how common the words are and factor
> that into the relevance.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

Re: Stopwords impact on search

Posted by Walter Underwood <wu...@wunderwood.org>.
IDF (inverse document frequency) and stopword removal are different approaches to the same problem.

Removing stopwords is a binary decision about how important common words
are for search. It says some words are completely useless.

IDF is a proportional measure of how important common words are for search.

Instead of removing a list of words that are assumed to be common and less
useful, let the engine actually measure how common the words are and factor
that into the relevance.
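
A minimal Python sketch of the difference, using a hypothetical four-document corpus and a naive lowercase/whitespace tokenizer (both assumptions, not Solr's analysis chain). It assumes the BM25 IDF formula, log(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number that contain the term; this is the formula Lucene's BM25Similarity uses. A word that occurs in every document still counts, just very little, instead of being dropped outright:

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25-style IDF: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def idf_table(docs: list[str]) -> dict[str, float]:
    """Return an IDF weight for every term in a small whitespace-tokenized corpus."""
    doc_freq: dict[str, int] = {}
    for doc in docs:
        # Count each term at most once per document (document frequency).
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: bm25_idf(len(docs), n) for term, n in doc_freq.items()}


# Hypothetical corpus: "and" occurs in every document, "cool" in only one.
docs = [
    "solr and lucene are so cool",
    "lucene and elasticsearch",
    "cats and dogs are pets",
    "solr and the reference guide",
]

weights = idf_table(docs)
for term in ("and", "are", "solr", "cool"):
    print(f"{term}: {weights[term]:.3f}")
# and: 0.105
# are: 0.693
# solr: 0.693
# cool: 1.204
```

Note that "are" ends up with the same weight as "solr": the engine weighs words by how often they actually occur in your documents, not by whether they appear on a stopword list.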

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 24, 2020, at 5:39 PM, Steven White <sw...@gmail.com> wrote:
> 
> Hi everyone,
> 
> I get why, and when, not indexing stopwords is a bad idea: it can give you
> zero or incomplete results. But what about the quality of search results
> when stopwords are indexed vs. not indexed?
> 
> 1) Stopwords are removed, and I do a word search (not a phrase search) for
> "solr and lucene are so cool".
> 2) Stopwords are not removed, and I do a word search (not a phrase search)
> for "solr and lucene are so cool".
> 
> Now, if "and", "are" and "or" are stopwords, will the search quality and
> ranking for #1 be better than for #2? What about if I turn the above into a
> phrase search?
> 
> Thanks
> 
> Steve


Re: Stopwords impact on search

Posted by Steven White <sw...@gmail.com>.
Hi everyone,

I get why, and when, not indexing stopwords is a bad idea: it can give you
zero or incomplete results. But what about the quality of search results
when stopwords are indexed vs. not indexed?

1) Stopwords are removed, and I do a word search (not a phrase search) for
"solr and lucene are so cool".
2) Stopwords are not removed, and I do a word search (not a phrase search)
for "solr and lucene are so cool".

Now, if "and", "are" and "or" are stopwords, will the search quality and
ranking for #1 be better than for #2? What about if I turn the above into a
phrase search?
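
The two cases can be made concrete by looking at what the analyzer emits. This is a minimal Python sketch, not Solr's analysis chain: it assumes a naive lowercase/whitespace tokenizer and only the three stopwords named above, and it mimics the way Lucene's StopFilter keeps position gaps where it drops a word. `analyze` and `phrase_matches` are hypothetical helpers written for this illustration.

```python
# Stopwords from the question; a real schema would load a list such as stopwords.txt.
STOPWORDS = {"and", "are", "or"}


def analyze(text: str, remove_stopwords: bool) -> list[tuple[str, int]]:
    """Tokenize on whitespace, lowercase, and return (term, position) pairs.

    Removed stopwords leave a gap: surviving terms keep their original positions.
    """
    tokens = [(term, pos) for pos, term in enumerate(text.lower().split())]
    if remove_stopwords:
        tokens = [(term, pos) for term, pos in tokens if term not in STOPWORDS]
    return tokens


def phrase_matches(query: str, doc: str, remove_stopwords: bool) -> bool:
    """Exact phrase match: each query term must occur at the same relative offset."""
    q = analyze(query, remove_stopwords)
    if not q:
        return False  # an all-stopword phrase leaves nothing to match
    positions: dict[str, set[int]] = {}
    for term, pos in analyze(doc, remove_stopwords):
        positions.setdefault(term, set()).add(pos)
    first_term, first_pos = q[0]
    for start in positions.get(first_term, set()):
        if all(start + (pos - first_pos) in positions.get(term, set()) for term, pos in q):
            return True
    return False


text = "solr and lucene are so cool"
print([t for t, _ in analyze(text, remove_stopwords=True)])   # case 1
print([t for t, _ in analyze(text, remove_stopwords=False)])  # case 2
# ['solr', 'lucene', 'so', 'cool']
# ['solr', 'and', 'lucene', 'are', 'so', 'cool']

# Phrase search: with stopwords removed, "and" and "or" both become the same gap.
print(phrase_matches("solr and lucene", "solr or lucene", remove_stopwords=True))   # True
print(phrase_matches("solr and lucene", "solr or lucene", remove_stopwords=False))  # False
```

For a word search, case 1 simply has fewer terms to score. For a phrase search, each dropped word becomes a gap that any word can fill, so "solr and lucene" also matches "solr or lucene".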

Thanks

Steve


On Fri, Apr 24, 2020 at 10:53 AM Walter Underwood <wu...@wunderwood.org> wrote:

> I’m astonished that the default configsets still include the stopword filter.
> It was a bad idea in Solr 1.3, when it bit my ass.
>
> We help people with this about once a month and the advice is always the
> same. Imagine all the poor people who never ask about it and run with that
> default!
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

Re: Stopwords impact on search

Posted by Walter Underwood <wu...@wunderwood.org>.
I’m astonished that the default configsets still include the stopword filter. It was a bad idea
in Solr 1.3, when it bit my ass.

We help people with this about once a month and the advice is always the same.
Imagine all the poor people who never ask about it and run with that default!

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 24, 2020, at 7:34 AM, Erick Erickson <er...@gmail.com> wrote:
> 
> +1 to removing stopword filters.


Re: Stopwords impact on search

Posted by Jan Høydahl <ja...@cominvent.com>.
It turns out there is already a JIRA issue for this, SOLR-10992 <https://issues.apache.org/jira/browse/SOLR-10992>,
where both you and I have already commented :) But it’s three years old...

> On 24 Apr 2020, at 16:34, Erick Erickson <er...@gmail.com> wrote:
> 
> +1 to removing stopword filters.


Re: Stopwords impact on search

Posted by Erick Erickson <er...@gmail.com>.
+1 to removing stopword filters.

> On Apr 24, 2020, at 10:28 AM, Jan Høydahl <ja...@cominvent.com> wrote:
> 
> I tend to agree. Should we simply remove the stopword filters from the default configsets shipping with Solr?
> 
> Jan


Re: Stopwords impact on search

Posted by Jan Høydahl <ja...@cominvent.com>.
I tend to agree. Should we simply remove the stopword filters from the default configsets shipping with Solr?

Jan

> On 24 Apr 2020, at 14:44, David Hastings <ha...@gmail.com> wrote:
> 
> you should never use the stopword filter unless you have a very specific
> purpose


Re: Stopwords impact on search

Posted by Rohan Kasat <ro...@gmail.com>.
So should we use the stopword filter as part of the query analyzer, to avoid
highlighting these stopwords?
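
A toy Python sketch of the idea behind the question (not Solr's highlighter; the stopword list and the regex-based `highlight` helper are hypothetical): if the stop filter runs only in the query analyzer, the index still contains every word, but stopwords never become query terms, so a highlighter driven by query terms has nothing to mark for them.

```python
import re

# Hypothetical stopword list, applied only when the query is analyzed.
STOPWORDS = {"and", "are", "or", "the"}


def query_terms(query: str, query_time_stopwords: bool) -> set[str]:
    """Lowercase/whitespace-tokenize the query, optionally dropping stopwords."""
    terms = set(query.lower().split())
    return terms - STOPWORDS if query_time_stopwords else terms


def highlight(text: str, terms: set[str]) -> str:
    """Wrap each word of the stored text that matches a query term in <em> tags."""
    def mark(match: re.Match) -> str:
        word = match.group(0)
        return f"<em>{word}</em>" if word.lower() in terms else word

    return re.sub(r"\w+", mark, text)


stored = "Solr and Lucene are so cool"
print(highlight(stored, query_terms("solr and lucene", query_time_stopwords=False)))
# <em>Solr</em> <em>and</em> <em>Lucene</em> are so cool
print(highlight(stored, query_terms("solr and lucene", query_time_stopwords=True)))
# <em>Solr</em> and <em>Lucene</em> are so cool
```

The trade-off is that query-time removal still drops those words from matching and ranking, and a query made only of stopwords is left with no terms at all.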

Regards,
Rohan

On Fri, Apr 24, 2020 at 7:45 AM Walter Underwood <wu...@wunderwood.org> wrote:

> Agreed. Here is an article from 13 years ago, when I accidentally turned on
> stopword removal at Netflix. It caused bad problems.
>
> https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/
>
> Infoseek was not removing stopwords when I joined them in 1996. Since then,
> I’ve always left stopwords in the index. Removing stopwords is a desperate
> speed hack from the days of 16-bit machines.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)

Re: Stopwords impact on search

Posted by Walter Underwood <wu...@wunderwood.org>.
Agreed. Here is an article from 13 years ago, when I accidentally turned on
stopword removal at Netflix. It caused bad problems.

https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/

Infoseek was not removing stopwords when I joined them in 1996. Since then,
I’ve always left stopwords in the index. Removing stopwords is a desperate
speed hack from the days of 16-bit machines.
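
The all-stopword problem is easy to reproduce. A minimal Python sketch, assuming a naive lowercase/whitespace tokenizer and a handful of words that appear on common English stopword lists (the set below is an assumption, not Solr's shipped file); the titles are only examples:

```python
# A few words found on typical English stopword lists (assumed for illustration).
STOPWORDS = {"a", "an", "and", "be", "it", "not", "of", "or", "the", "to"}


def remove_stopwords(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return [term for term in text.lower().split() if term not in STOPWORDS]


for title in ("To Be or Not to Be", "It", "The Thing"):
    terms = remove_stopwords(title)
    if terms:
        print(f"{title!r} -> {terms}")
    else:
        print(f"{title!r} -> no terms left to search for")
# 'To Be or Not to Be' -> no terms left to search for
# 'It' -> no terms left to search for
# 'The Thing' -> ['thing']
```

If the same filter also runs at index time, those titles have no indexed terms either, so no query can match them by title.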

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 24, 2020, at 5:44 AM, David Hastings <ha...@gmail.com> wrote:
> 
> you should never use the stopword filter unless you have a very specific
> purpose


Re: Stopwords impact on search

Posted by David Hastings <ha...@gmail.com>.
you should never use the stopword filter unless you have a very specific
purpose

On Fri, Apr 24, 2020 at 8:33 AM Steven White <sw...@gmail.com> wrote:

> Hi everyone,
>
> What impact, if any, do stopwords have on my search ranking quality?
> Will my ranking improve if I do not index stopwords?
>
> I'm trying to figure out if I should use the stopword filter or not.
>
> Thanks in advance.
>
> Steve
>