Posted to solr-user@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2020/07/12 04:06:10 UTC

Re: Query in quotes cannot find results

On 6/30/2020 12:07 PM, Permakoff, Vadim wrote:
> Regarding removing the stopwords, I agree that there are many cases when you don't want to remove them, but there is one very compelling case when you do.
> 
> Imagine you have one document with the following text:
> 1. "to expand the methods for mailing cancellation"
> And another document with the text:
> 2. "to expand methods for mailing cancellation"
> 
> The user query is (without quotes): q=expand the methods for mailing cancellation
> I don't want to return all the documents that match with q.op=OR, because that finds too many unrelated documents, so I want to search with q.op=AND. Unfortunately, document 2 will not be found, as it does not contain the stopword "the".
> What should I do now?

Do these users want imprecise matches to only show up when there is a 
well-known stopword involved, or do they also want imprecise matches to 
show up with ANY word missing, added, or moved?  If I were betting on 
it, I'd say they want the latter, not the former.  Erick already gave 
you the solution to that -- phrase slop.
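
To illustrate why a slop of 1 is enough for the two example documents,
here is a toy Python model. It is only a sketch under stated
assumptions, not Lucene's actual sloppy phrase matching: it assumes a
stopword filter that removes "to", "the" and "for" but leaves position
gaps behind (the situation this thread started with), it assumes the
query is sent as a phrase, and it measures the slop needed as the
spread between the position offsets of the matched terms. The function
names and the stopword list are made up for the example.

```python
import re
from itertools import product

# Hypothetical stopword list, just enough for the two example documents.
STOPWORDS = {"to", "the", "for"}


def analyze(text):
    """Lowercase, split into words, and drop stopwords while keeping
    each surviving word's original position (i.e. leaving gaps)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in STOPWORDS]


def required_slop(phrase, document):
    """Return the smallest slop at which the phrase would match in this
    simplified model, or None if a query term is missing entirely."""
    query = analyze(phrase)
    if not query:
        return None

    # Map each document term to every position where it occurs.
    doc_positions = {}
    for pos, tok in analyze(document):
        doc_positions.setdefault(tok, []).append(pos)

    try:
        candidates = [doc_positions[tok] for _, tok in query]
    except KeyError:
        return None  # slop cannot make up for a term that is not there

    best = None
    for combo in product(*candidates):
        # How far each term sits from where the query expects it.
        offsets = [d - q for d, (q, _) in zip(combo, query)]
        spread = max(offsets) - min(offsets)
        if best is None or spread < best:
            best = spread
    return best


phrase = "expand the methods for mailing cancellation"
doc1 = "to expand the methods for mailing cancellation"
doc2 = "to expand methods for mailing cancellation"

print(required_slop(phrase, doc1))
print(required_slop(phrase, doc2))
```

In this model document 1 lines up with the query exactly, while
document 2 is off by one position because it has no gap where "the"
used to be, which is the mismatch that a slop of 1 absorbs.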

In modern times, the only valid reason I can think of to implement a 
stopword filter is for situations where you want it to be impossible to 
search for certain words -- some might want expletives in this category, 
for example.

Tuning a Solr config for good results is an exercise in tradeoffs.  The 
core tradeoff in most situations is the standard "precision vs. recall" 
discussion.  A change that increases precision will almost always reduce 
recall, and vice versa.  I know from experience that you'll get more 
complaints about reducing recall than you will about reducing precision. 
Implementing a hard-coded phrase slop value of 1 will reduce precision 
by an amount that's hard to determine, and GREATLY increase recall. 
Chances are good that most users will appreciate the change.  If you 
make the phrase slop setting configurable by the user, that's even better.
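
As a rough sketch of what a configurable slop could look like on the
client side, the Python below wraps the user's text in quotes and
appends the standard "phrase"~N proximity syntax before sending it to
Solr. The base URL, the collection name "mycollection", the field name
"text" and the function names are all assumptions for illustration;
adjust them to your own setup.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with your own host and collection.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_params(user_text, slop=1, field="text"):
    """Build Solr request parameters for a sloppy phrase query.

    The default slop of 1 plays the role of the hard-coded value; pass
    a different value to make it user-configurable.
    """
    slop = int(slop)
    if slop < 0:
        raise ValueError("slop must be zero or greater")
    # Escape backslashes and quotes so the user's text stays inside the phrase.
    escaped = user_text.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": f'{field}:"{escaped}"~{slop}',
        "q.op": "AND",
    }


def build_url(user_text, slop=1, field="text"):
    """Return the full select URL with the parameters URL-encoded."""
    return SOLR_SELECT_URL + "?" + urlencode(build_params(user_text, slop, field))


print(build_params("expand the methods for mailing cancellation")["q"])
print(build_url("expand the methods for mailing cancellation", slop=2))
```

If you use the dismax or edismax parser, the qs parameter is another
way to apply a slop to phrases that appear in the user's query, without
rewriting the query string yourself.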

Thanks,
Shawn